Filling the Gaps: Using AI to Complete Satellite Images for Better Climate Monitoring
How deep learning techniques are helping scientists see through the clouds to monitor land surface temperatures in South Africa
Introduction
Imagine trying to monitor the health of crops across a region, but every time you look up at the sky, clouds are blocking your view. This is exactly the challenge scientists face when using satellite data to monitor land surface temperature, a measurement critical for understanding drought conditions, fire risks, and agricultural health. This blog post explores how we can solve this problem by teaching a deep learning model to 'see through' the clouds and fill in the missing data.
Satellites like NASA's MODIS (Moderate Resolution Imaging Spectroradiometer) circle the Earth, capturing thermal images that show how hot or cool the land surface is. These temperature measurements are useful for:
Detecting crop stress and irrigation needs,
Identifying drought-affected areas before conditions become severe,
Assessing fire risk by spotting unusually hot, dry conditions that could spark wildfires,
Supporting climate research by tracking how land temperatures change over time.
These measurements, however, face a fundamental problem: clouds block thermal infrared radiation, creating gaps in the satellite data. In South Africa's Western Cape, a region crucial for agriculture and particularly vulnerable to drought, these gaps can persist for days or weeks, especially during critical growing seasons.
The figure below shows two maps of the Western Cape: the one on the left covers a period with minimal cloud cover, while the one on the right covers a period with heavy cloud cover. The red dots mark ground stations where in-situ measurements are taken.
When we lose days or weeks of temperature data during crucial periods like drought monitoring or fire season, the consequences can be severe. Farmers might miss early signs of crop stress, firefighters might not get adequate warning of dangerous conditions, and researchers lose valuable data for understanding climate patterns.
The Solution: Teaching a Deep Learning Model to Fill in the Blanks
This research focused on developing a system that can intelligently fill in the missing pieces of satellite temperature data. The approach combines two powerful deep learning techniques: Partial Convolutions and U-Net architecture.
Partial Convolutions
Traditional computer vision techniques treat all parts of an image equally, including missing or invalid areas. This is like calculating a class average while counting the students who never submitted their work as zeros: the result will be misleading.
Partial convolutions solve this by being "aware" of which parts of the data are valid and which are missing. The operation is defined as:

$$
x' = \begin{cases} W^{T}\left(X \odot M\right)\dfrac{\operatorname{sum}(\mathbf{1})}{\operatorname{sum}(M)} + b, & \text{if } \operatorname{sum}(M) > 0 \\ 0, & \text{otherwise} \end{cases}
$$

where $X$ holds the feature values in the current sliding window, $M$ is the corresponding binary mask, $W$ and $b$ are the convolution weights and bias, $\odot$ denotes element-wise multiplication, and $\operatorname{sum}(\mathbf{1})$ is the number of pixels in the window.

The formula basically says: "If there are valid pixels in a given window, compute the convolution using just them and scale the result to compensate for the missing ones. If there aren't any valid pixels, set the result to zero." After each layer, the binary mask (a checklist marking each pixel as usable (1) or missing (0)) is updated so that more and more pixels become "valid" as the network fills in the gaps.
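To make this concrete, here is a minimal NumPy sketch of the computation over a single window. The function and variable names are illustrative only, not taken from the project's implementation:

```python
import numpy as np

def partial_conv_window(x, m, w, b=0.0):
    """Partial convolution over a single window (illustrative names).

    x: feature values in the window, m: binary validity mask (1 = valid),
    w: kernel weights, b: bias term.
    """
    valid = m.sum()
    if valid == 0:
        return 0.0, 0                  # no valid pixels: output 0, mask stays 0
    scale = m.size / valid             # sum(1) / sum(M): compensates for gaps
    out = float((w * (x * m)).sum() * scale + b)
    return out, 1                      # window saw valid data: mask becomes 1

# A 3x3 window whose right column is cloud-contaminated (the 99s are junk)
x = np.array([[10., 12., 99.],
              [11., 13., 99.],
              [10., 12., 99.]])
m = np.array([[1., 1., 0.],
              [1., 1., 0.],
              [1., 1., 0.]])
w = np.full((3, 3), 1.0 / 9.0)         # simple averaging kernel
out, m_new = partial_conv_window(x, m, w)
```

With an averaging kernel, the output is simply the mean of the six valid pixels; the contaminated column contributes nothing, and the window's mask entry flips to 1.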
The image below further illustrates this process:
This process repeats across layers, gradually painting in the missing areas of the image until, by the end, the network has reconstructed a complete picture.
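A toy NumPy sketch of this layer-by-layer filling (a plain 3×3 partial-averaging pass, not the learned network) shows how repeated passes shrink a synthetic cloud hole until the mask is completely valid:

```python
import numpy as np

def partial_avg_pass(x, m):
    """One 3x3 partial-averaging pass with zero padding (illustrative only)."""
    h, w = x.shape
    xp = np.pad(x * m, 1)
    mp = np.pad(m, 1)
    out = np.zeros_like(x)
    m_new = np.zeros_like(m)
    for i in range(h):
        for j in range(w):
            win_m = mp[i:i + 3, j:j + 3]
            v = win_m.sum()
            if v > 0:
                out[i, j] = xp[i:i + 3, j:j + 3].sum() / v  # mean of valid neighbours
                m_new[i, j] = 1.0
    # keep known pixels as-is; only fill the holes
    return np.where(m > 0, x, out), m_new

x = np.full((7, 7), 20.0)
m = np.ones((7, 7))
m[2:5, 2:5] = 0.0            # a 3x3 "cloud" hole
x = x * m

x1, m1 = x.copy(), m.copy()
while m1.sum() < m1.size:    # repeat "layers" until the hole is gone
    x1, m1 = partial_avg_pass(x1, m1)
```

The hole's rim is filled on the first pass; the centre pixel, which has no valid neighbours yet, only becomes valid on the second pass, mirroring how deeper layers of the network reach further into the gap.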
Understanding U-Net
U-Net is a type of neural network architecture that's particularly good at understanding images at multiple scales simultaneously. Originally developed for medical image analysis, it has a distinctive U-shaped design that makes it perfect for tasks like ours.
Here's how the U-Net works:
The Encoder (Left side of the U): This part progressively reduces the image size while extracting increasingly abstract features. It's like zooming out to understand the "big picture" like regional weather patterns, large-scale temperature gradients, and overall landscape features.
The Bottleneck (Bottom of the U): At the deepest level, the network has a compressed understanding of the entire scene. Missing data has a minimal impact here because the view is so zoomed out.
The Decoder (Right side of the U): This part gradually increases resolution while reconstructing the final image. It's like zooming back in, but now with knowledge of both local details and global context.
Skip Connections (The bridges): These directly connect corresponding levels on both sides, ensuring that fine details from the original image aren't lost during the compression process.
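The data flow can be sketched as a toy NumPy shape trace, using plain average pooling and nearest-neighbour upsampling in place of learned convolutions. This is purely illustrative and not the actual architecture:

```python
import numpy as np

def down(x):
    """Encoder step: 2x2 average pooling halves the resolution."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    """Decoder step: nearest-neighbour upsampling doubles the resolution."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(0)
x0 = rng.random((32, 32))    # input "image"

e1 = down(x0)                # 16x16 encoder feature map
e2 = down(e1)                # 8x8 bottleneck: coarse, global view
d1 = (up(e2) + e1) / 2       # decoder step + skip connection from e1
d0 = (up(d1) + x0) / 2       # decoder step + skip connection from the input
```

Each decoder step combines the upsampled coarse signal with the matching encoder feature via a skip connection, which is how fine detail survives the compression down to the bottleneck.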
The diagram below shows the general architecture of a U-Net:
We combined the two by replacing every convolutional layer in the U-Net with a partial convolution layer. The resulting system can handle the irregular gaps caused by realistic cloud patterns, going beyond the limitations of simple interpolation methods, while preserving important spatial relationships: it learns that nearby areas should have similar temperatures, yet still respects natural boundaries such as coastlines and mountain ranges.
Training the Model: Learning from Incomplete Images
To teach our Deep Learning model how to fill gaps, we designed a clever training approach. We began with nearly complete satellite images that had minimal cloud cover (at least 90% valid pixels), then created artificial gaps by overlaying cloud patterns from heavily clouded images onto them. This produced realistic missing regions, and the model was trained to reconstruct the original complete image from these artificially gapped versions, learning the relationships needed to restore cloud-covered areas.
The schema of this approach is shown below.
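A minimal NumPy sketch of this gap-overlay step might look as follows, with synthetic data and hypothetical names standing in for the project's actual preprocessing pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(clear_img, cloud_mask):
    """Overlay a real cloud mask onto a nearly clear image.

    clear_img: an LST image with >= 90% valid pixels (the training target).
    cloud_mask: a binary mask (1 = valid) borrowed from a heavily clouded scene.
    Returns (artificially gapped input, mask, complete target).
    """
    gapped = clear_img * cloud_mask     # zero out the borrowed "cloudy" pixels
    return gapped, cloud_mask, clear_img

clear = rng.uniform(280.0, 320.0, (64, 64))        # synthetic LST in kelvin
mask = (rng.random((64, 64)) > 0.4).astype(float)  # synthetic cloud pattern
x_in, m, target = make_training_pair(clear, mask)
```

Because the complete image is kept as the target, the model can be scored on exactly the pixels the borrowed mask removed.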
Reliable gap-filling, however, requires not just realistic training data but also a loss function that ensures the model learns from valid information only. For this purpose, we employ a masked mean squared error (MSE) loss, which restricts the error computation to the valid pixels defined by a binary mask, preventing the model from 'hallucinating' or learning from the noisy, cloud-covered parts of the image. The loss function is expressed as:

$$
\mathcal{L} = \frac{1}{|V|} \sum_{i \in V} \left( \hat{y}_i - y_i \right)^2
$$

where $V$ denotes the set of valid pixel indices, $|V|$ is the number of such pixels, and $\hat{y}_i$ and $y_i$ are the predicted and observed values at pixel $i$. This design ensures the model learns only from trustworthy observations and avoids biasing updates with unreliable or missing data. For numerical stability, the loss is computed over temperature values normalized to the [0, 1] range, and when a sample contains no valid pixels at all, the loss returns a zero tensor with gradient tracking, preventing training failures.
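A NumPy version of this masked loss can be sketched as follows; the actual implementation operates on framework tensors with gradient tracking, so this sketch only mirrors the logic:

```python
import numpy as np

def masked_mse(pred, target, valid_mask):
    """Mean squared error restricted to valid pixels (values assumed in [0, 1])."""
    v = valid_mask.astype(bool)
    if not v.any():
        return 0.0      # no valid pixels: zero loss (the real implementation
                        # returns a zero tensor with gradient tracking)
    diff = pred[v] - target[v]
    return float((diff ** 2).mean())

pred = np.array([[0.50, 0.90],
                 [0.20, 0.00]])
target = np.array([[0.40, 0.10],
                   [0.20, 1.00]])
mask = np.array([[1, 0],
                 [1, 0]])               # only the left column is trusted
loss = masked_mse(pred, target, mask)   # large right-column errors are ignored
```

The right column disagrees wildly between prediction and target, but because those pixels are masked out, they contribute nothing to the loss.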
The Results: Seeing Through the Clouds
We evaluated model performance using four standard metrics: RMSE, MAE, bias, and R².
RMSE (Root Mean Square Error) measures how large the prediction errors are on average, giving more weight to bigger mistakes. MAE (Mean Absolute Error) also measures average error but treats all mistakes equally, making it easier to interpret. Bias shows whether the model consistently overestimates or underestimates temperatures. Finally, R² (the coefficient of determination) tells us how well the predictions match the observed data, with values closer to 1 indicating a better fit.
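For reference, the four metrics can be computed in a few lines of NumPy (illustrative code, not the project's evaluation script):

```python
import numpy as np

def evaluate(pred, obs):
    """Compute RMSE, MAE, bias, and R^2 between predictions and observations."""
    err = pred - obs
    rmse = float(np.sqrt((err ** 2).mean()))
    mae = float(np.abs(err).mean())
    bias = float(err.mean())            # positive = systematic overestimation
    ss_res = float((err ** 2).sum())
    ss_tot = float(((obs - obs.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, bias, r2

obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = obs + 1.0                 # predictions uniformly one degree too warm
rmse, mae, bias, r2 = evaluate(pred, obs)
```

A constant warm offset like this shows up directly in the bias (+1 here), while R² drops because the errors are large relative to the spread of the observations.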
By combining Partial Convolutions and the U-Net architecture, our model significantly outperformed traditional gap-filling methods on the metrics described above:
Looking at the table above, we see that our proposed model reduced errors from 4.3 °C to 0.9 °C, produced natural-looking temperature patterns, and achieved near-zero bias. Most importantly, the reconstructed temperature fields looked and behaved like real land surface temperature data, maintaining the spatial patterns and relationships that scientists depend on for their analyses.
A sample gap-filled land surface temperature reconstruction is presented below, showing the cloud-masked input, model prediction, and corresponding prediction bias.
Looking Forward: From Research to Reality
This research demonstrates that advanced Deep Learning techniques can substantially improve our ability to monitor environmental conditions from space. While this study focused on the Western Cape, the methodology could be extended to other regions and even other types of satellite data.
The implications extend beyond academic research:
Operational drought monitoring: More complete temperature data could improve early warning systems
Agricultural decision support: Farmers could receive more reliable information about crop conditions
Fire risk assessment: Better spatial coverage during high-risk periods
Climate research: More complete datasets for understanding long-term trends
As we face increasing climate variability and extreme weather events, tools like these become essential for building resilience and supporting evidence-based decision-making. By training Deep Learning models to ‘see through’ the clouds, we're helping ensure that critical environmental monitoring can continue even when the sky isn't clear.
Code and Reproducibility
The complete codebase, project structure, and implementation details utilized in this research are publicly available and documented in the associated repository.
This includes:
Data preprocessing pipelines for MODIS LST and weather station integration
Partial Convolution U-Net implementation with training scripts
Baseline LightGBM model development and evaluation
Visualization tools for results analysis
Documentation for reproducing the experimental setup
Repository: Github