Predicting Sentinel-2 optical data using

multitemporal Sentinel-1 radar data for

cloud gap reconstruction

Hamelberg, MA

12 May 2020

Geo-Information Science and Remote Sensing

Thesis Report GIRS-2020-25

Author

Hamelberg, MA ¹

Registration number: 910824-302-050 | [email protected]

¹ MSc, Geo-Information Science and Remote Sensing, Wageningen University

Supervisors

Clevers, JGPW ²

[email protected]

Reiche, J ²

[email protected]

² Laboratory of Geo-Information Science and Remote Sensing, Wageningen University and Research Centre

12 May 2020, Wageningen, Netherlands

A thesis report submitted in partial fulfilment of the degree of Master of Science at Wageningen University

and Research Centre, The Netherlands

Keywords

Thesis; WUR; Deep learning; Predicting; Sentinel-1; radar data; Sentinel-2; optical data; Cloud gap

reconstruction; Remote Sensing

Abstract

Dense and continuous land surface mapping and monitoring is hampered by cloud cover. The Sentinel-

2 satellite mission provides optical data that suffers from this problem, creating so-called cloud gaps. There

is a demand to predict missing optical data using alternative sources to reconstruct these cloud gaps. The

spatiotemporally similar Sentinel-1 satellite mission is one of these sources providing radar data that is

able to bypass clouds. The research objective is to predict incomplete regions of Sentinel-2 optical data

using multitemporal Sentinel-1 radar data from cloud free regions. Dissimilarities between the data types

render this a difficult task. A U-Net deep learning model is applied to provide an advanced solution. The

model is designed for semantic segmentation and image reconstruction and has been successfully applied

in previous research in the field of remote sensing. Initial preprocessing of the Sentinel data is performed

to prepare for an optimal training and prediction phase. Google Earth Engine and various Python libraries

are the tools of choice. A basic machine learning random forest regressor is implemented to form a

prediction performance baseline. The performance of both prediction models is tested on two study

areas with iterations of 10%, 20%, and 30% artificial cloud cover. The U-Net has a promising performance

with consistent results for both study areas (R² > 0.70; MSE ~0.02; RPD > 2.00; SSIM > 0.70). Significant

differences are observed between the prediction models (p-value < 0.05), favoring the U-Net. No

significant differences are observed in prediction performances between cloud coverages, suggesting

consistent performances when cloud cover varies. The U-Net retains the image structure between cloud

cover iterations, outperforming the baseline model that generally sees noisy and subpar results. More

(artificial) study areas and cloud cover iterations need to be added for rigorous model performance testing

and methodological approaches need to be aligned to other research for valid comparisons.

Thesis code number: GRS-80436

Thesis Report: GIRS-2020-25

Wageningen University and Research Centre

Laboratory of Geo-Information Science and Remote Sensing


Contents

1 Introduction

1.1 The Digital Earth

1.2 Sentinel missions

1.3 Deep learning

1.4 Literature review

1.5 Problem statement

1.6 Research objective

Research questions

2 Methodology

2.1 Overview

2.2 Materials

Google Earth Engine and Python

Data specifications

2.3 Methods

Study areas

Preprocessing Sentinel data

Artificial clouds

Predicting with deep learning

U-Net performance testing

Comparison to basic machine learning

3 Results

3.1 U-Net predictions

3.2 Comparing prediction models

3.3 Comparing cloud cover predictions

3.4 Test results & visualizations

4 Discussion

4.1 Interpreting the U-Net results

Response labels

Cloud cover iterations

4.2 Literature comparison

4.3 Methodological approach

Study areas and preprocessing

Artificial cloud improvements

Prediction models and testing

5 Conclusion

6 Appendix

6.1 Extended tables

7 References



1 Introduction

1.1 The Digital Earth

Earth is constantly evolving, and these changes are captured by numerous remote sensing satellites (ESDS, 2019). Al Gore

coined the term ‘The Digital Earth’ in his 1998 speech, ushering in an age of spatial information with a data-driven

and computer-generated twin Earth, which could greatly advance our understanding of the natural

processes and human influences on the planet (Gore, 1998). The representation of the surface in this digital Earth

should be dense, continuous, and contemporary. These goals are becoming a reality thanks to the

increasing amount of public and private satellite imagery, as well as the capabilities to store, process,

manage, and analyze this huge amount of data on a planetary scale (Gorelick et al., 2017; Mateo-García et

al., 2018; Stuhler et al., 2016). These satellites are equipped with many types of sensors, capturing different

aspects of Earth’s surface. A majority of the sensors capture visible and infrared sunlight reflected from the

surface. These so-called passive optical sensors register biophysical and chemical surface properties, making

them useful for a multitude of sectors, including agriculture (Gao et al., 2016; Wang et al., 2017; Zheng

et al., 2016) and ecology (Nagendra et al., 2013; Pettorelli et al., 2014). Data from these optical sensors

offer a human perspective of interpreting and understanding Earth’s surface (Wang and Patel, 2018).

Many tools have been developed that rely on optical data, for example navigation and visualization

applications (e.g. Google Maps), radiative transfer models (e.g. PROSAIL by Jacquemoud et al. (2009)),

and landcover classification libraries (Chen et al., 2017; Wu et al., 2017).

However, there are limitations to optical sensors, namely, atmospheric interferences obstructing

solar reflectance. One of these interferences is cloud cover, causing a lack of information forming ‘cloud

gaps’, hampering a dense and continuous optical perspective of Earth’s surface, especially in the tropics

(Loff, 2015). An alternative data source could be used to reconstruct these cloud gaps. A good candidate

is radar (radio detection and ranging), which uses radio waves that are able to bypass cloud cover.

Radar consists of an active sensor transmitting beams of radio waves that scatter on a surface and (partially)

reflect back to a receiver. This process is called backscatter. Radar looks at physical properties of a surface

and is therefore different from the chemically oriented optical data. The goal is to transform radar data

to pseudo-optical data, or in other words predict optical data using radar data, to reconstruct cloud gaps

using the relationship between the dissimilar datasets from nearby cloud free regions.

1.2 Sentinel missions

This thesis will focus on the publicly available Sentinel-1 (S1) and Sentinel-2 (S2) missions by the

European Space Agency. S1 has radar sensors and S2 optical sensors. Each mission has two satellites,

providing a fine spatiotemporal resolution with near global coverage. A spatiotemporal resolution has

two components. The first component is the spatial resolution, which is the smallest possible feature

detectable within a resolution cell (i.e. pixel) at surface level (Liang et al., 2012). The second component

is the temporal resolution, which is the revisit time of a satellite platform capturing the same area (Small

et al., 2018). S1 and S2 have spatial resolutions of 5-20m and 10-60m respectively and temporal resolutions

of 6 and 5 days at the equator respectively (2–3 days at mid-latitudes). A fine spatiotemporal resolution is

hereby defined as the resolutions of S1 and S2. The data provided by the missions is freely available and

accessible from various geoportals, benefiting research and real-world applications.



The radar systems equipped by S1 are active sensors using the technique of synthetic aperture radar

(SAR) to register C-band electromagnetic radiation within the microwave spectrum. Roughness and water

content on a surface can be inferred from the backscatter values that make up radar data (Woodhouse,

2017). Surface roughness is measured by obliquely transmitted microwaves that either deflect away from

a transmitter or scatter back to a receiver close to the transmitter. For example, a flat surface (e.g. a calm

lake) deflects most microwaves resulting in lower backscatter values, whilst a rough surface (e.g. a forested

area) causes diffusion, scattering more microwaves towards the receiver resulting in higher backscatter

values. Backscatter of water content is affected by the dielectric properties of water: higher dielectric

constants increase backscatter. These two processes differ slightly depending on the polarization of

microwaves when transmitted and received. Two common polarizations are vertical-vertical (VV) and

vertical-horizontal (VH), each giving particular cues about surface properties.

Objects in a resolution cell scatter transmitted microwaves in all sorts of directions before making

their way back to a receiver. This causes granular interference called speckle, appearing as spike noise (i.e.

“salt & pepper” noise), even on a seemingly flat surface. Another disadvantage of radar is that it is heavily

influenced by topography. To capture distances by a radar system, the sensor must be at an oblique angle,

resulting in slant-range scale distortion (features compress closer to the sensor), foreshortening (feature

slopes are compressed), layover (sloped features appear closer), and radar shadow effects (sloped features

obstruct radar beams). The speckle and topographic effects of radar data make it difficult to interpret,

preprocess, analyze, and geographically co-register to other satellite data (Mou et al., 2017; Schmitt et al.,

2017; Tzouvaras et al., 2019). These disadvantages limit the usage of radar data. However, the

aforementioned capability to bypass most atmospheric interferences and continuously capture the surface

remains a huge advantage over optical sensors.

The optical sensors equipped on S2 register visible (blue, green, red), near infrared (nir), and short-

wave infrared (swir) electromagnetic radiation. The sensors include the vegetation red edge spectral

domain, cloud screening, and atmospheric correction bands. Bands are attributes with digital pixel values

of sections on the electromagnetic spectrum. Optical data by S2 is captured close to the sensors’ nadir,

mitigating angular effects. The data is easy to interpret, and spatial resolutions are higher due to the

shorter wavelengths. Speckle is not an issue, so the data reflects the surface more accurately than radar data.

Figure 1 shows the difference between optical and radar data captured in the same area.

Figure 1 (left) S2 optical data displaying visible light (captured on 2019-08-26); (right) VV backscatter values of

descending polar orbit S1 radar data (captured on 2019-08-25).



1.3 Deep learning

Accurately predicting optical data using dissimilar radar data is challenging considering their

differences and limitations. Deep learning (DL), which is an advanced subset of machine learning (ML),

could provide a solution to this transformation problem. Even though DL is often considered as a ‘black

box’ approach, a considerable amount of research has been done using DL algorithms with superior

results compared to traditional ML algorithms. Nowadays, DL is making its way into the field of remote

sensing and is becoming more popular due to its high performance in satellite image analysis (Belgiu and

Stein, 2019; Ma et al., 2019; Zhu et al., 2018, 2017). A common DL algorithm used for image analysis is a

convolutional neural network (CNN), which is especially well-suited for image object recognition

(Krizhevsky et al., 2012), and with its many iterations and improvements, such as convolutional

autoencoders (CAEs), greatly applicable to semantic segmentation and image reconstruction (Cresson et

al., 2019; Shelhamer et al., 2017).

1.4 Literature review

Reconstruction of missing information in optical data using radar data has seen a rise in the past

decade (Gao et al., 2020; Schmitt et al., 2017; Shen et al., 2015). Eckardt et al. (2013) made initial strides

by transforming pixel values in extensive radar data to pseudo-optical data using spatial statistics. The

composites show promising performances with different percentages and distributions of artificial cloud

cover. The technique is developed for multitemporal and multifrequency radar data with very fine

resolutions. The approach may yield limited results on the simpler S1 data. Nevertheless, it did set a trend

in furthering this research field and should not be overlooked for potential alternative implementations.

It was followed by several ML approaches using dictionary learning with sparse representation for pixel-

based reconstruction (Huang et al., 2015; Li et al., 2014; Xu et al., 2016). With the advent of advanced DL

approaches in remote sensing (Ma et al., 2019; Zhu et al., 2017), a deviation from earlier statistical and

basic ML algorithms towards these ‘deeper’ and more extensive algorithms came to fruition. In late 2017,

Zhang et al. (2017) looked at the effective fusion of multimodal remote sensing data using a fully CNN

for semantic segmentation. This was followed shortly by research using this technique for transforming

and compositing radar and optical data. In early 2018, Scarpa et al. (2018) used a compact CNN to transform

radar to vegetation indices, after which Wang and Patel (2018) used a cascade architecture of CNNs

extended by a generative adversarial network (GAN) to generate pseudo-optical data from radar data for

interpretation purposes. With the increased interest in radar to optical data transformation, a preprocessed

dataset of co-registered radar and optical data was offered by Schmitt et al. (2018) to advance research. This

included complex urban areas that are heavily subject to topographic effects (Wang and Zhu, 2018).

At a conference in the same month, July 2018, Grohnfeldt et al. (2018) and Liu and Lei (2018) addressed

the transformation topic, suggesting further improvements to previous research, discussing future

directions, and addressing the use of more advanced DL algorithms, such as conditional GANs

(cGANs). Bermudez et al. (2018) and He and Yokoya (2018) used a CNN and a cGAN for advanced optical

data prediction using radar data, where Bermudez et al. (2018) included a cloud gap reconstruction after

the prediction phase using DL algorithms, improving their method in further research (Bermudez et al.,

2019). A cGAN has limitations according to Cresson et al. (2019) due to its generative nature, suggesting

the use of CAEs that rely more on reconstruction by estimation. These CAEs often have a U-Net

architecture (Ronneberger et al., 2015) and are widely used in DL applications, reaping the

aforementioned benefits of high performances in semantic segmentation and image reconstruction.



During the writing of this thesis, Gao et al. (2020) published a paper describing the use of a U-Net for

predicting optical data using several radar data sources, introducing a novel method of iteratively tweaking the

reconstructed cloud gaps (that were slightly deviating from the ground truth) with a cGAN before the

final fusion, obtaining prediction results with proper spectral information and fine textures.

1.5 Problem statement

Predicting optical data using radar data with the help of DL algorithms is still a young research field

whose potential has not yet been fully exploited. For example, Gao et al. (2020) suggest

using multitemporal radar data to incorporate changes over time as explanatory variables, whilst Cresson

et al. (2019) still rely on optical data after the prediction period. Research preceding these novel methods

does not use the powerful U-Net, risking suboptimal results. Most recent studies limit their reconstruction to a

single percentage of (artificial) cloud cover and do not systematically test over varying percentages.

Although previous research has addressed the reconstruction of cloud gaps, there is room for improvement

in the proposed prediction methods. Current prediction models still do not perfectly predict

optical data using radar data. The physical obstruction by clouds in the optical range of the

electromagnetic spectrum remains a problem to sectors that rely on optical remote sensing data.

1.6 Research objective

The cloud penetrating capability of radar provides a solution to reconstruct cloud gaps in optical

data. This can be achieved by forming a relationship between the VV and VH bands of radar data

(hereafter named training features) and the visible and infrared bands of optical data (hereafter named

response labels) in cloud free regions. For this process, a U-Net is implemented. The S1 training features

are multitemporal and solely use information during and before the prediction time interval within

cloud free regions. The optical data prediction results, spanning over multiple study areas, are extensively

and systematically tested whilst iterating over varying artificial cloud cover percentages.

Research questions

The research objective can be summarized in one main research question with two sub-questions:

How accurately can a U-Net predict optical S2 data in cloud gaps using multitemporal S1 radar data

from cloud free regions?

1. Does the advanced U-Net model outperform a basic machine learning model?

2. To what degree do increasing cloud cover percentages affect prediction performances?



2 Methodology

2.1 Overview

Figure 2 Scheme of the core methodological steps.

2.2 Materials

Google Earth Engine and Python

S1 and S2 (S1/2) provide a large influx and a constant stream of globally and dynamically available

data preprocessed and accessible through Google Earth Engine (GEE). GEE is a spatial cloud computing

platform and is freely available for research, education, and nonprofit use (Google Earth Engine, n.d.).

When registered for GEE, a Python application programming interface (API) can be used in this

programming language’s coding environment with an authentication key. For this thesis, the freely

available ‘Jupyter’ notebook coding environment ‘Colaboratory’ (Google Colab, n.d.) is used with Python

3.x. Colaboratory allows access to virtual machines with graphics processing units that enable fast

training of DL algorithms. TensorFlow (TF) wrapped in Keras is the DL library of choice. Other libraries

include Folium and Matplotlib for visualization, Numpy and Pandas for data structuring, and Scipy and

Scikit-Learn/Image for statistics and ML processes. A single web-based notebook, auxiliary to the thesis

report, addresses the research question. To run the notebook, a Google account with a GEE registration

and an active Google Cloud Platform subscription are required.

Data specifications

S1 data stored in GEE is available as level-1 C-band SAR ground range detected images. These are

calibrated and orthorectified using the S1 Toolbox. S2 data is available as level-2A surface reflectance

images in the GEE database. These are atmospherically corrected and orthorectified. Other relevant

specifications are displayed in Table 1.



Table 1 Relevant S1/2 specifications.

Specification | Sentinel-1 (S1) | Sentinel-2 (S2)

Selected bands | VV (Vertical-Vertical polarized, ~5.404 GHz); VH (Vertical-Horizontal polarized, ~5.404 GHz); and μ (i.e. mean) of VV and VH (μ_VV,VH) | B2 (blue ~493nm); B3 (green ~560nm); B4 (red ~665nm); B8 (nir ~833nm); B11 (swir1 ~1610nm); and B12 (swir2 ~2190nm)

Spatial resolution | VV and VH: 10x10m | B2-B8: 10x10m; and B11 and B12: 20x20m

Temporal resolution | 6 days at the equator (2-3 days at mid-latitudes) | 5 days at the equator (2-3 days at mid-latitudes)

Availability | 2014-10-03 to present; and near global coverage | 2017-03-28 to present; and near global coverage

Other | instrument mode: interferometric wide swath (IW); orbit pass: ascending and descending; and resolution: high | -

2.3 Methods

Study areas

Two study areas (i.e. test sites) are selected based on a study time interval where zero to minimal

atmospheric interferences are present, providing clean reference data for testing. The study time interval

can be of arbitrary length as long as it contains S1/2 data; for this thesis, an interval of 7 days is used to

ensure complete coverage of S1/2 data. The size of the study area must be large enough to encompass at

least two distinct features, such as a forested patch and an urban area or various agricultural fields with

different growth stages. The terrain in a study area must be topographically simple, minimizing added

uncertainties by the aforementioned limitations of optical and radar data. It is assumed that both datasets

are correctly preprocessed and devoid of atmospheric interferences. An in situ ‘laboratory’ situation is

created that meets these assumptions as closely as possible. The study area images are visually inspected

for each study time interval to meet the assumptions. It is difficult to achieve complete elimination of all

uncertainties, as preprocessing tools and atmospheric interference indications are not perfect. Table 2

provides specifications for the two selected study areas, which are partially visualized in Figure 4.

Table 2 Two study areas that meet the uncertainty assumptions as closely as possible. The study areas are in the

coordinate reference system of EPSG:4326.

Specification | Flevoland, NL (study area 1) | Amazon, BR (study area 2)

Description | Various rectangular agricultural fields with different growth stages. | Partially deforested area with rivers and a town.

Upper left (x, y) | (5.670, 52.765) | (-69.865, -6.650)

Lower left (x, y) | (5.670, 52.725) | (-69.865, -6.686)

Lower right (x, y) | (5.735, 52.725) | (-69.828, -6.686)

Upper right (x, y) | (5.735, 52.765) | (-69.828, -6.650)

Area / Perimeter | ~19.46 km² / ~17.68 km | ~16.36 km² / ~16.14 km

Prediction time interval | [2019-08-20, 2019-08-27) | [2019-07-25, 2019-08-01)
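The area and perimeter in Table 2 can be approximated directly from the corner coordinates. The following is a minimal sketch, assuming a spherical Earth and an equirectangular approximation, which is reasonable for boxes of only a few kilometres. The function names and the radius constant are illustrative and not taken from the thesis notebook, which may have used a different (e.g. ellipsoidal) computation.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical Earth is an assumption


def bbox_size_km(lon_min, lat_min, lon_max, lat_max):
    """Approximate width and height (km) of a small longitude/latitude box."""
    km_per_degree = math.pi / 180 * EARTH_RADIUS_KM
    mid_lat = math.radians((lat_min + lat_max) / 2)
    # Meridians converge towards the poles, so scale the width by cos(latitude).
    width = (lon_max - lon_min) * km_per_degree * math.cos(mid_lat)
    height = (lat_max - lat_min) * km_per_degree
    return width, height


def area_and_perimeter(lon_min, lat_min, lon_max, lat_max):
    """Return (area in km2, perimeter in km) of the box."""
    width, height = bbox_size_km(lon_min, lat_min, lon_max, lat_max)
    return width * height, 2 * (width + height)


# Corner coordinates from Table 2 (EPSG:4326)
flevoland = area_and_perimeter(5.670, 52.725, 5.735, 52.765)
amazon = area_and_perimeter(-69.865, -6.686, -69.828, -6.650)
print(flevoland, amazon)
```

Under these assumptions the sketch gives roughly 19.46 km² for study area 1 and 16.36 km² for study area 2, with perimeters within a few tens of metres of the tabulated values.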



Preprocessing Sentinel data

S1/2 data is loaded from the GEE database and filtered based on metadata and spatial properties,

which include the required image bands, desired study time interval (i.e. 7 days), and the boundaries of

the study area. S1’s instrument mode, orbit pass, and resolution (as mentioned in Table 1) are filtered

based on the metadata as well. The filtered S1/2 data is each loaded in an image collection (stack of images)

for further processing using tools within GEE. The stacked images in each image collection are aggregated

to a median value at each pixel if there are two or more overlapping images. Empirically, this reduces or

eliminates extreme values (e.g. clouds, haze, shadows, sensor artifacts, and speckle), especially with

larger time intervals, and is common practice within the GEE community.
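The effect of the per-pixel median can be illustrated outside GEE. The sketch below is a NumPy stand-in, not the GEE code used in the thesis: it assumes the image collection has already been exported as an array of shape (time, height, width) with masked pixels stored as NaN, and the sample values are invented for illustration.

```python
import numpy as np


def median_composite(stack):
    """Per-pixel median over the time axis, ignoring masked (NaN) pixels.

    stack: array of shape (time, height, width).
    """
    return np.nanmedian(stack, axis=0)


# Three hypothetical acquisitions of a 2x2 scene.
# 9.00 mimics an extreme value (e.g. a cloud or speckle spike); NaN is a masked pixel.
stack = np.array([
    [[0.10, 0.20], [0.30, 0.40]],
    [[0.12, 9.00], [0.28, np.nan]],
    [[0.11, 0.22], [0.31, 0.42]],
])

composite = median_composite(stack)
print(composite)  # the 9.00 outlier does not survive the median
```

With only three images the outlier is already discarded, which is why longer time intervals (more images per pixel) tend to give cleaner composites.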

Furthermore, S2 is resampled to 30x30m by a single focal pass sampling the median within a circular

kernel, whilst S1 retains its 10x10m spatial resolution. This means that for 9 training feature pixels, one

response label pixel value is available, reducing the amount of variation during training. The larger

resample size of S2 data makes it heuristically more similar to the naturally fuzzier S1 data. It also reduces

spatial inaccuracies within the S2 data. When tested on prediction performance, the 30x30m resample size

outperforms the 10x10m resample size. However, when resample sizes become too large, say 60x60m,

spatial detail is lost. The final predicted pseudo-optical images have a spatial resolution of 30x30m.
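The 9-to-1 relation between 10x10m and 30x30m pixels can be sketched as follows. This is a simplification: it takes the median over square, non-overlapping 3x3 blocks in NumPy, whereas the thesis describes a focal pass with a circular kernel in GEE. The function name and the toy band are illustrative.

```python
import numpy as np


def block_median(band, factor=3):
    """Downsample a 2D band by taking the median of factor x factor blocks."""
    height, width = band.shape
    # Trim edges that do not fill a complete block.
    h_trim, w_trim = height - height % factor, width - width % factor
    blocks = band[:h_trim, :w_trim].reshape(
        h_trim // factor, factor, w_trim // factor, factor
    )
    # Bring the two within-block axes together and flatten them.
    blocks = blocks.transpose(0, 2, 1, 3).reshape(
        h_trim // factor, w_trim // factor, factor * factor
    )
    return np.median(blocks, axis=-1)


band_10m = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 band at 10 m
band_30m = block_median(band_10m)                    # 2x2 band at 30 m

print(band_30m)
print(band_10m.size // band_30m.size)  # 9 fine pixels per coarse pixel
```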

Additionally, the statistical Lee speckle filter (Lee, 1980) is applied to S1 data, reducing speckle whilst

retaining feature fidelity. This renders it more similar to the relatively smooth S2 data. Moreover, the S1

images taken by ascending and descending orbit passes are aggregated by their median at each pixel,

resulting in reduced speckle and denser representations within a time interval. Fuzziness may increase

when fusing the orbital passes, as they capture the surface from different angles and directions. This effect

is especially apparent in complex terrains where topographic effects distort the ground truth. Finally, extra

temporal datapoints are added to the S1 data by repeating the above-mentioned S1 preprocessing steps on

more than one time interval to indicate changes in backscatter values (VV and VH) over time, thus adding

a multitemporal component to the S1 training features (see Figure 3).
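The principle behind the Lee speckle filter can be sketched with local statistics. The version below is a simplified additive-noise form and not necessarily the variant applied in the thesis: the window size, the scene-wide noise variance estimate, and the function name are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lee_filter(image, size=5, noise_variance=None):
    """Simplified Lee filter: blend each pixel with its local mean.

    Homogeneous areas (low local variance) are pulled towards the local mean,
    while edges and textured areas (high local variance) are largely preserved.
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="reflect")
    windows = sliding_window_view(padded, (size, size))
    local_mean = windows.mean(axis=(-2, -1))
    local_var = windows.var(axis=(-2, -1))
    if noise_variance is None:
        # Crude scene-wide estimate of the noise variance (assumption).
        noise_variance = local_var.mean()
    # Weight near 0: smooth strongly; weight near 1: keep the original value.
    weight = local_var / np.maximum(local_var + noise_variance, 1e-12)
    return local_mean + weight * (image - local_mean)


# A flat synthetic scene with added noise standing in for speckle
rng = np.random.default_rng(0)
noisy = np.ones((32, 32)) + rng.normal(0.0, 0.2, (32, 32))
filtered = lee_filter(noisy)
print(noisy.var(), filtered.var())  # the filtered scene varies less
```

Because the weight depends on the local variance, a sharp field boundary is smoothed less than the interior of a homogeneous field, which is what "retaining feature fidelity" refers to.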

Figure 3 Scheme of multitemporal S1 datapoints. The first time interval (t1) of S1 is the same as the time interval

of S2 and has a length of 7 days. This is the time interval where S2 optical data is predicted and contains the first S1

temporal training features. Extra features are extracted by a time interval (t2) one week prior to the prediction time

interval. Lastly, three consecutive features are extracted from time intervals of a month each (t3, t4, and t5).
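The interval scheme of Figure 3 can be written out as half-open date ranges, in the same [start, end) notation as Table 2. This is a sketch under two assumptions that the text does not state explicitly: a month is taken as 30 days, and t3 to t5 directly precede t2 going back in time. The function name is illustrative.

```python
from datetime import date, timedelta


def temporal_intervals(prediction_start, short_days=7, long_days=30, n_long=3):
    """Return half-open [start, end) intervals t1..t5, most recent first."""
    intervals = []
    end = prediction_start + timedelta(days=short_days)
    # t1 (prediction interval) and t2 (the week before it)
    for _ in range(2):
        start = end - timedelta(days=short_days)
        intervals.append((start, end))
        end = start
    # t3..t5: consecutive month-long intervals (30 days each, by assumption)
    for _ in range(n_long):
        start = end - timedelta(days=long_days)
        intervals.append((start, end))
        end = start
    return intervals


# Study area 1: prediction interval [2019-08-20, 2019-08-27)
intervals = temporal_intervals(date(2019, 8, 20))
for name, (start, end) in zip(("t1", "t2", "t3", "t4", "t5"), intervals):
    print(name, start.isoformat(), end.isoformat())
```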

Individual radar images are heavily subject to topographic effects and speckle. Aggregation of

multiple images will result in a median value that resembles spectral reality more accurately as extremes

are reduced and recurring features dominate. However, surface changes are washed out over longer time

intervals. This is the reason why both long (see t3, t4 and t5 in Figure 3) and short (see t1 and t2 in Figure

3) time intervals are used as training features. This effect is visualized in Figure 4, where the left S1 images

(short time interval) are fuzzier and noisier as compared to the center S1 images (long time interval). The

right S2 images are within the same prediction time interval as the left S1 images.



Figure 4 Top to bottom: Study area 1; Study area 2. Left to right: False color representation of S1 training

features in t1 (red: VV, green: VH, blue: μ_VV,VH); False color representation of S1 training features in t3 to t5 (red:

VV of t3, green: VV of t4, blue: VV of t5); True color of S2 response labels in t1.

Artificial clouds

Artificial clouds are generated using various magnitudes of gradient (Perlin) noise. The clouds can

vary in scatteredness and patch size. Shadows are not modeled for simplicity’s sake, as only the final mask is of

importance. A threshold can be set to indicate the cloud cover percentage in a study area. For the

experiments in this thesis, an initial seed-driven pseudo-random state of artificial cloud dispersion is

generated resembling a single layer of medium-sized cumulus cloud formations. This dispersion is

consistent, and the boundary of the initial state expands evenly when the cloud cover threshold increases

(see Figure 5). The artificial clouds allow extensive testing on reference data (i.e. ground-truth data) in

masked regions.

Figure 5 Artificial cloud cover in study area 1. Left to right: 10%; 20%; 30%.
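The thresholding idea can be sketched as follows. Note the substitution: repeatedly blurred white noise stands in for the gradient (Perlin) noise used in the thesis, so the cloud shapes will differ. What the sketch does preserve is the seed-driven reproducibility and the way one fixed noise field yields nested masks as the cover threshold increases. All names and parameters are illustrative.

```python
import numpy as np


def smooth_noise(shape, seed=0, passes=8):
    """Seeded smooth noise field (a simple stand-in for Perlin noise)."""
    rng = np.random.default_rng(seed)
    field = rng.random(shape)
    for _ in range(passes):
        # 3x3 box blur built from shifted copies (edges wrap around).
        field = sum(
            np.roll(field, (dy, dx), axis=(0, 1))
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)
        ) / 9.0
    return field


def cloud_mask(field, cover):
    """Boolean mask where roughly `cover` (0..1) of the pixels are cloud."""
    threshold = np.quantile(field, 1.0 - cover)
    return field > threshold


field = smooth_noise((128, 128), seed=42)
masks = {cover: cloud_mask(field, cover) for cover in (0.10, 0.20, 0.30)}
for cover, mask in masks.items():
    print(f"target {cover:.0%}, masked fraction {mask.mean():.3f}")
```

Because every mask is cut from the same field, each pixel that is cloudy at 10% is also cloudy at 20% and 30%, so the cloud patches grow outwards from the same initial state.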



Predicting with deep learning

All bands (i.e. response labels) in the preprocessed S1/2 datasets are masked on regions with artificial

cloud cover. These partially masked datasets will be used as input data to the U-Net (see Figure 6). The

model architecture of this U-Net is based on TF documentation (tensorflow/models, n.d.) and is similar

to the code provided in a notebook demo by Google (google/earthengine-api, n.d.).

Figure 6 A schematic overview of a U-Net architecture (Meados et al., 2019). The pixelwise relation between

training features (multitemporal S1 data) and response labels (S2 data) is encoded to a latent space where the

input (x) is spatially reduced to granular features. From this latent space the input is decoded (reconstructed) with

input and encoder layer concatenations to a predicted output (y).

The U-Net uses the process of semantic segmentation to transform continuous pixel values from

one image to another, in this case, from multitemporal S1 to S2 data. It looks at the relation between

individual pixels, but also considers their spatial context by convolving each layer to a lower spatial

dimension using different convolutional filters. This extends its predictive power to a larger set of

granular features within the data. To be able to extract training features and the response label, a single

multidimensional image (𝑤𝑖𝑑𝑡ℎ (𝑖𝑛 𝑝𝑖𝑥𝑒𝑙𝑠) × ℎ𝑒𝑖𝑔ℎ𝑡 (𝑖𝑛 𝑝𝑖𝑥𝑒𝑙𝑠) × 𝑑𝑒𝑝𝑡ℎ (𝑖. 𝑒. 𝑙𝑎𝑦𝑒𝑟𝑠)) is created of the

preprocessed S1/2 image collections (11 layers (i.e. training features and response labels) in total). At each

pixel of this image a neighborhood array (i.e. patch) with a kernel size of 64 × 64 pixels is stored with a

depth of the training features and response labels (64 × 64 × 11). The kernel size should not be too small,

as spatial contexts may be lost due to limited training features. It should not be too large either, as cloud

masked regions may distort feature continuation within the patch and the patch may extend

too far beyond the boundaries of a study area.
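
The patch sampling can be sketched with NumPy as follows. The sketch assumes that the first 10 layers of the multidimensional image hold the S1 training features and that the last layer holds one S2 response label. It also samples only patches that lie fully inside the image. Both choices, as well as the function and variable names, are assumptions made for illustration.

```python
import numpy as np

PATCH_SIZE = 64
N_FEATURES = 10  # assumed layer order: 10 S1 features, then 1 S2 label


def sample_patches(image, n_patches, patch_size=PATCH_SIZE, seed=0):
    """Pseudo-randomly sample square patches from a (height, width, depth) image.

    Returns the training features and the response label as separate arrays.
    """
    rng = np.random.default_rng(seed)
    height, width, _ = image.shape
    rows = rng.integers(0, height - patch_size + 1, size=n_patches)
    cols = rng.integers(0, width - patch_size + 1, size=n_patches)
    patches = np.stack([
        image[r:r + patch_size, c:c + patch_size, :]
        for r, c in zip(rows, cols)
    ])
    return patches[..., :N_FEATURES], patches[..., N_FEATURES:]


# Hypothetical preprocessed image with 11 layers.
image = np.random.default_rng(1).random((200, 300, 11))
features, labels = sample_patches(image, n_patches=16)
print(features.shape, labels.shape)
```

One call with 16 patches yields arrays with the shape of a single training batch.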

The patches are pseudo-randomly sampled from the image and stored into a TF record, which is a

special file format for the TF library to optimize DL workflows. The TF record is written into a TF dataset

in random access memory that shuffles the patches randomly up to a buffer size of 2000. This prevents

a bias when iterating over the training data during the training phase. The buffer has 2000 batches of 16

shuffled patches with the training features (16 × 64 × 64 × 10) and the response label (16 × 64 × 64 ×

1). This is fed into the U-Net as input nodes. The model runs for an optimal number of epochs (the loss

usually stabilizes between 30 and 50 epochs with 30% cloud cover) and with 1000 steps per epoch.

The U-Net provides faster results with normalization of input data as this accelerates the process of

gradient descent. Before training, each input band for S1/2 is linearly normalized to a range of 0 to 1

between the 1st and 99th percentile of the original data range. This excludes extremes and often centers the

data around the median more evenly. The normalization can be reversed using the original band

ranges, but the extremes outside the percentile range will be lost after denormalization. This can be



avoided by normalizing to the minimum and maximum value. However, retained extremes will have a

negative effect on training times and prediction performance.
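
A minimal sketch of this percentile normalization and its inverse is given below. It assumes that values outside the 1st to 99th percentile range are clipped to 0 and 1, which is one way to read the loss of extremes described above. The function names and the synthetic band are hypothetical.

```python
import numpy as np


def fit_percentile_range(band, lower=1, upper=99):
    """Return the band values at the lower and upper percentile."""
    low, high = np.nanpercentile(band, [lower, upper])
    return float(low), float(high)


def normalize(band, low, high):
    """Linearly scale to [0, 1]; values outside [low, high] are clipped."""
    scaled = (band - low) / (high - low)
    return np.clip(scaled, 0.0, 1.0)


def denormalize(scaled, low, high):
    """Map normalized values back to the original band range."""
    return scaled * (high - low) + low


rng = np.random.default_rng(seed=0)
band = rng.normal(loc=0.2, scale=0.05, size=10_000)
band[:10] = 5.0  # a few extreme outliers

low, high = fit_percentile_range(band)
normalized = normalize(band, low, high)
restored = denormalize(normalized, low, high)
print(band.max(), restored.max())
```

The restored maximum equals the 99th percentile value rather than the original outlier. The same `low` and `high` values have to be reused for any additional data that is fed to a trained model.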

U-Net performance testing

The performance by the U-Net is tested by relating cloud free optical reference data to the predicted

optical data. Input data with increasing percentages (10%, 20%, and 30%) of artificial cloud cover are

trained in the U-Net. Each iteration has a pseudo random distribution of a similar cloud cover dispersion

by an initial seed to generate consistent results. The outputs are tested using several statistical metrics.

Basic metrics include the coefficient of determination (R2), the mean squared error (MSE) and the residual

prediction deviation (RPD). R2 values close to 1 indicate a strong correlation between datasets, lower MSE values

indicate a narrower fit around the identity line, and RPD values indicate prediction strength, where values

above 3.00 are preferred. These metrics, together with their associated scatter plots, provide insight in the

predictive power of the model for each response label and increasing cloud cover percentages. However,

these metrics are limited in assessing complex image characteristics, such as image structure. Two images

can have the same MSE (e.g. by auto cancellation of spike noise) whilst having a completely different

image structure (e.g. texture). The structural similarity index (SSIM) solves this problem (Wang et al.,

2004; Wang and Bovik, 2009). This advanced metric elaborates upon the preceding peak signal-to-noise

ratio (PSNR) metric that only assesses noise. The SSIM provides an index that typically ranges from 0 to 1, where 1 is

an identical structural similarity between images and 0 indicates large differences in image structure. All

metrics assess differences in continuous data as found in the pixels from the predicted images and together

provide a comprehensive indication of the model performance.

It should be noted that before applying the statistical metrics, the test images are dimensionally

reduced (2D to 1D) and encompass purely the regions within the 10% cloud cover mask. This is done to test the

absolute prediction performance of the models without interference of cloudless reference data or

additional data found in the larger cloud cover percentage masks. The image is flattened before testing

because the SSIM image quality model does not take empty pixels (i.e. masked regions) as inputs. The

null values of the masked regions are removed from the flattened image array.
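
The flattening step and the basic metrics can be sketched as follows. The definitions are assumptions based on common usage: R2 as one minus the ratio of residual to total sum of squares, and RPD as the standard deviation of the reference divided by the root mean squared error. Under these definitions RPD equals 1 / sqrt(1 - R2), which agrees with the values in Table 5 to within rounding (e.g. R2 = -0.45 gives RPD = 0.83). All names and the synthetic data are hypothetical.

```python
import numpy as np


def flatten_valid(reference, prediction):
    """Flatten two 2D images to 1D, keeping only pixels valid in both."""
    valid = ~np.isnan(reference) & ~np.isnan(prediction)
    return reference[valid], prediction[valid]


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


def rpd(y_true, y_pred):
    # Assumed definition: standard deviation of the reference over the RMSE.
    return y_true.std() / np.sqrt(mse(y_true, y_pred))


rng = np.random.default_rng(seed=0)
reference = rng.random((100, 100))
cloud = np.zeros((100, 100), dtype=bool)
cloud[20:60, 30:80] = True  # hypothetical artificial cloud mask

# Only pixels inside the cloud mask are tested; all others are set to NaN.
reference_masked = np.where(cloud, reference, np.nan)
prediction_masked = np.where(
    cloud, reference + rng.normal(0.0, 0.1, reference.shape), np.nan)

y_true, y_pred = flatten_valid(reference_masked, prediction_masked)
print(r_squared(y_true, y_pred), mse(y_true, y_pred), rpd(y_true, y_pred))
```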

Comparison to basic machine learning

The basic ML method of random forest regression is used to compare the U-Net performance to a

baseline. The random forest regressor (RFR) from the Python library Scikit-Learn, with default settings,

takes as training features the multitemporal S1 data and as response labels the S2 data per pixel. The

response label predictions are then tested using the same performance testing methods as applied to the

U-Net predictions. The RFR uses the same preprocessed S1/2 data as the U-Net, and the RFR predictions

are compared to the same reference data. Furthermore, as a control, both the RFR and U-Net predictions

are compared to random noise generated from the input data ranges.
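
The random noise control can be illustrated with a short sketch. It assumes that the noise is drawn uniformly between the minimum and maximum of the reference band. The distribution, the names and the synthetic reference are assumptions. Noise that is independent of the reference adds its own variance to the residuals, so the resulting R2 drops below zero, as seen in the noise columns of Table 5.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical normalized reference band values.
reference = rng.beta(2.0, 5.0, size=50_000)

# Control: uniform noise within the data range of the reference.
noise = rng.uniform(reference.min(), reference.max(), size=reference.size)

ss_res = np.sum((reference - noise) ** 2)
ss_tot = np.sum((reference - reference.mean()) ** 2)
noise_r2 = 1.0 - ss_res / ss_tot
print(f"R2 of the noise control: {noise_r2:.2f}")
```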

The comparison of each statistical metric derived from the RFR and U-Net predictions is

quantified by indicating metric value differences expressed in percentages. The differences between

prediction performances are further quantified by applying an independent two-sample t-test between the

statistical metrics of the models, as well as between different cloud cover percentages.
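
Both comparisons can be sketched with the Python standard library. The pooled-variance (Student) form of the two-sample t-test and the numerical integration of the t density are assumptions, as the exact implementation is not specified here. The example uses the rounded R2 values of the blue response label in study area 1 from Table 5. These give a p-value of roughly 0.07, in line with the corresponding entry in Table 3.

```python
import math
from statistics import mean, variance


def percent_difference(baseline, value):
    """Difference of `value` relative to `baseline`, in percent."""
    return (value - baseline) / abs(baseline) * 100.0


def students_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / dof
    t_stat = ((mean(sample_a) - mean(sample_b))
              / math.sqrt(pooled * (1 / n_a + 1 / n_b)))
    return t_stat, dof


def two_sided_p(t_stat, dof, steps=20_000):
    """Two-sided p-value by Simpson integration of the Student t density."""
    norm = (math.gamma((dof + 1) / 2)
            / (math.sqrt(dof * math.pi) * math.gamma(dof / 2)))

    def pdf(x):
        return norm * (1 + x * x / dof) ** (-(dof + 1) / 2)

    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1.0 - 2.0 * (total * h / 3.0)


# R2 of the blue label in study area 1 at 10%, 20% and 30% cloud cover.
rfr_r2 = [0.74, 0.69, 0.67]
unet_r2 = [0.78, 0.74, 0.76]

t_stat, dof = students_t(rfr_r2, unet_r2)
p_value = two_sided_p(t_stat, dof)
print(f"t = {t_stat:.2f}, dof = {dof}, p = {p_value:.3f}")
print(f"difference at 20% cloud cover: {percent_difference(0.69, 0.74):+.1f}%")
```

The percent differences in Appendix 6.1 can deviate slightly from this sketch, as the sketch starts from rounded table values.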



3 Results

3.1 U-Net predictions

The statistical metrics as seen in Table 5 display consistent results for the U-Net predictions in both

study areas. These results apply for most response labels and cloud cover percentages. The R2 is usually

above 0.70, suggesting a correlation between the reference and prediction data. The MSE is mostly below

0.02 in a data range from 0.00 to 1.00. Study area 2 has similar MSE values for all response labels, while

study area 1 shows a lower performance for the response labels in the visible range of the electromagnetic

spectrum. This pattern is seen in the RPD values as well. The RPD does not exceed the desired value of

3.00 for any response label and averages around 2.20. The last metric, the SSIM, is stable and hovers around

0.70, indicating similarities in image structures even when other metrics show variation.

It is noticeable that response labels with long wavelengths, such as nir, swir1 and swir2, outperform

the other labels. The nir label has an especially good performance in study area 1 and swir1 performs

relatively well in study area 2. The MSAVI vegetation index has a consistent and decent performance in

both study areas.

A small difference, usually between 1% and 5%, is observed in most statistical metrics when

considering cloud coverages. Counterintuitively, decreasing cloud cover percentages do not always follow

an expected pattern of increasing performances. For example, in study area 1, the blue label always

performs better with a 30% cloud cover as compared to the 20% cloud cover. This effect differs per

response label and study area.

3.2 Comparing prediction models

Overall, the U-Net outperforms the RFR. The statistical metric values between the predictions of

both models differ by around 5% to 50% for study area 1. Even larger differences are observed in study

area 2, ranging from around 10% to above 100%. Performance differences between increasing cloud cover

percentages are small for the U-Net predictions. Larger differences between increasing cloud cover

percentages for the RFR predictions are observed. Table 3 displays a significant difference (p-value < 0.05,

n=21) between the statistical metrics applied to the RFR and U-Net predictions when comparing all labels

and cloud coverages. The differences mostly favor the U-Net. An exception to the significant

difference concerns the nir label when grouping the statistical metrics based on individual response labels

in study area 1. This also applies to the swir1 and MSAVI response labels in study area 1, excluding the

SSIM metric, suggesting similar average pixel values whilst the image structure differs. Moreover, a

significant difference (p-value < 0.05, n=7) can be observed in the SSIM results when grouping cloud

coverages between the RFR and U-Net predictions. This can be explained by an increase in noise seen in

the RFR predictions with higher cloud coverages (see Figure 13), as compared to the more consistent

predictions of the U-Net. Study area 2 shows significant differences in most groups, excluding the nir,

MSAVI, and 10% groups when considering the SSIM results.

The differences between the models are further visualized for certain response labels (see Figure 9

and Figure 11). The selected response labels of the U-Net prediction show a narrower fit around the

identity line, with fewer outliers compared to the RFR predictions. The narrow fit is especially noticeable

in the histograms, where the edges of the marginal distribution are more aligned. The histograms of the

U-Net also show fewer spikes in study area 2.



Lastly, visual inspection of Figure 13 and Figure 14 confirms the differences between the RFR and U-Net

predictions. RFR predictions are noisier, display fewer features (especially visible in the urban

region found in the west of study area 2), and have more outliers (mainly visible in the supposedly smooth

agricultural fields of study area 1). For both the RFR and U-Net, the prediction regions are fuzzier

compared to the reference data. This is because the predictions are based on the naturally fuzzy radar data.

Table 3 The p-values quantify the difference in performance between the RFR and U-Net predictions based on

(grouped) statistical metrics. The p-values are derived from results found in Table 5.

Type             Grouped  p-values | Study area 1             p-values | Study area 2             Sample
                          R2       MSE      RPD      SSIM     R2       MSE      RPD      SSIM     size (n)
Response labels  blue     0.07     0.07     0.06     < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   3
                 green    0.08     0.08     0.07     < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   3
                 red      < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   3
                 nir      0.56     0.56     0.60     0.17     < 0.05   < 0.05   < 0.05   0.22     3
                 swir1    0.16     0.16     0.17     < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   3
                 swir2    < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   3
                 MSAVI    0.12     0.12     0.13     < 0.05   < 0.05   < 0.05   < 0.05   0.41     3
Cloud cover      10%      0.31     0.23     0.25     < 0.05   < 0.05   < 0.05   < 0.05   0.10     7
                 20%      0.20     0.13     0.17     < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   7
                 30%      0.07     0.06     0.06     < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   7
                 all      < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   < 0.05   21

3.3 Comparing cloud cover predictions

The differences between cloud coverage performances are not significant (see Table 4) considering

the statistical metrics found in Table 5. This is noticeable in the graphs visualizing certain response labels

of the U-Net predictions (see Figure 10 and Figure 12) and can be observed in the geographic

representations (see Figure 13 and Figure 14). As mentioned before, the U-Net displays an unexpected

pattern where lower cloud cover percentages do not necessarily produce better prediction results. For

study area 1, the 10% cloud cover predictions always outperform the higher coverages, while in study area

2, 30% cloud cover predictions sometimes outperform the lower cloud coverages (e.g. the red response

label). This contrasts with the RFR predictions, as these follow the expected decreasing performance pattern and

smaller cloud cover percentages always outperform the larger ones. The U-Net shows consistent and high

p-values between cloud cover variations, and the p-values are larger compared to the RFR predictions.

Table 4 The p-values between statistical metrics as shown in Table 5. These compare the RFR and U-Net prediction

performance between cloud cover percentages over all response labels.

Model   Grouped   p-values | Study area 1     p-values | Study area 2     Sample
                  R2     MSE    RPD    SSIM   R2     MSE    RPD    SSIM   size (n)
U-Net   10%-20%   0.42   0.34   0.37   0.51   0.78   0.86   0.85   0.79   7
        20%-30%   0.96   0.86   0.84   0.97   0.40   0.69   0.40   0.91   7
        10%-30%   0.35   0.26   0.24   0.51   0.69   0.86   0.61   0.70   7
RFR     10%-20%   0.24   0.23   0.23   0.50   0.36   0.52   0.43   0.46   7
        20%-30%   0.60   0.58   0.58   0.79   0.81   0.86   0.84   0.94   7
        10%-30%   0.11   0.09   0.09   0.35   0.26   0.42   0.33   0.41   7



3.4 Test results & visualizations

Table 5 The statistical metrics of S2 optical data predictions within artificially cloud masked regions. Random

noise control metrics are included. Appendix 6.1 extends the table by displaying percent differences between the

metrics. Figure 7 and Figure 8 in this paragraph visualize the table. Figure 9 to Figure 12 visualize the regression

and histogram graphs of some metrics. Figure 13 and Figure 14 geographically display the prediction results.

Study  Response  Cloud   R2                     MSE                        RPD                   SSIM
area   label     cover   Noise   RFR    U-Net   Noise    RFR      U-Net    Noise  RFR    U-Net   Noise  RFR    U-Net
1      blue      10%     -0.45   0.74   0.78    0.1297   0.0231   0.0195   0.83   1.96   2.14    0.09   0.66   0.71
                 20%             0.69   0.74             0.0275   0.0229          1.80   1.97           0.64   0.69
                 30%             0.67   0.76             0.0296   0.0217          1.74   2.03           0.64   0.71
       green     10%     -0.51   0.67   0.70    0.1085   0.0236   0.0213   0.81   1.74   1.83    0.10   0.68   0.72
                 20%             0.58   0.68             0.0298   0.0233          1.55   1.76           0.65   0.71
                 30%             0.54   0.69             0.0329   0.0223          1.48   1.79           0.64   0.71
       red       10%     -0.33   0.79   0.84    0.1494   0.0239   0.0179   0.87   2.17   2.51    0.08   0.65   0.73
                 20%             0.74   0.83             0.0297   0.0192          1.95   2.42           0.63   0.73
                 30%             0.71   0.80             0.0329   0.0227          1.85   2.23           0.63   0.71
       nir       10%     -0.92   0.86   0.85    0.1239   0.0093   0.0094   0.72   2.63   2.62    0.09   0.78   0.80
                 20%             0.82   0.82             0.0113   0.0113          2.38   2.39           0.76   0.78
                 30%             0.81   0.84             0.0123   0.0102          2.29   2.51           0.76   0.78
       swir1     10%     -0.50   0.77   0.80    0.1086   0.0168   0.0142   0.82   2.07   2.26    0.10   0.73   0.78
                 20%             0.72   0.75             0.0203   0.0180          1.89   2.00           0.72   0.77
                 30%             0.69   0.76             0.0224   0.0171          1.80   2.06           0.72   0.78
       swir2     10%     -0.39   0.81   0.88    0.1293   0.0174   0.0115   0.85   2.31   2.85    0.09   0.72   0.80
                 20%             0.77   0.85             0.0213   0.0142          2.09   2.56           0.70   0.79
                 30%             0.76   0.83             0.0227   0.0154          2.02   2.46           0.70   0.78
       MSAVI     10%     -0.50   0.85   0.88    0.1551   0.0154   0.0122   0.82   2.59   2.91    0.08   0.74   0.79
                 20%             0.82   0.86             0.0188   0.0142          2.35   2.70           0.72   0.77
                 30%             0.80   0.84             0.0211   0.0167          2.21   2.49           0.71   0.76
2      blue      10%     -1.42   0.48   0.79    0.1763   0.0381   0.0155   0.64   1.38   2.17    0.05   0.54   0.66
                 20%             0.39   0.78             0.0444   0.0160          1.28   2.13           0.50   0.68
                 30%             0.37   0.77             0.0459   0.0165          1.26   2.10           0.51   0.67
       green     10%     -1.14   0.45   0.78    0.1509   0.0386   0.0158   0.68   1.35   2.11    0.07   0.55   0.67
                 20%             0.37   0.79             0.0447   0.0150          1.26   2.17           0.52   0.67
                 30%             0.34   0.77             0.0464   0.0161          1.23   2.10           0.51   0.67
       red       10%     -1.09   0.47   0.73    0.1713   0.0437   0.0222   0.69   1.37   1.92    0.06   0.50   0.58
                 20%             0.39   0.74             0.0498   0.0211          1.28   1.97           0.47   0.61
                 30%             0.37   0.75             0.0518   0.0206          1.26   2.00           0.47   0.65
       nir       10%     -0.96   0.73   0.77    0.1051   0.0143   0.0122   0.72   1.94   2.10    0.11   0.67   0.67
                 20%             0.70   0.76             0.0160   0.0129          1.83   2.04           0.63   0.66
                 30%             0.70   0.75             0.0163   0.0134          1.82   2.00           0.63   0.67
       swir1     10%     -0.64   0.62   0.81    0.0907   0.0210   0.0107   0.78   1.63   2.27    0.11   0.67   0.76
                 20%             0.57   0.80             0.0239   0.0112          1.52   2.22           0.64   0.76
                 30%             0.55   0.79             0.0247   0.0117          1.50   2.17           0.64   0.76
       swir2     10%     -1.11   0.52   0.72    0.1243   0.0285   0.0167   0.69   1.44   1.88    0.08   0.65   0.72
                 20%             0.44   0.75             0.0328   0.0148          1.34   1.99           0.62   0.72
                 30%             0.42   0.72             0.0341   0.0166          1.31   1.88           0.62   0.72
       MSAVI     10%     -0.80   0.69   0.79    0.1154   0.0198   0.0135   0.75   1.80   2.18    0.10   0.69   0.68
                 20%             0.65   0.79             0.0227   0.0133          1.68   2.20           0.66   0.68
                 30%             0.63   0.78             0.0238   0.0140          1.64   2.14           0.65   0.67



Figure 7 Visualizing statistical metrics of study area 1.

Figure 8 Visualizing statistical metrics of study area 2.



Figure 9 The regressions (top) and histograms (bottom) visualize the relationship between predictions and cloud

free reference data of the swir1 response label in study area 1 with a cloud coverage of 20%. Left to right: random

noise; RFR predictions; U-Net predictions.

Figure 10 The regressions (top) and histograms (bottom) visualize the relationship between the U-Net predictions

and cloud free reference data of the red response label in study area 1 with increasing cloud coverages. Left to

right: 10%; 20%; 30%.



Figure 11 The regressions (top) and histograms (bottom) visualize the relationship between predictions and cloud

free reference data of the green response label in study area 2 with a cloud coverage of 20%. Left to right: random

noise; RFR predictions; U-Net predictions.

Figure 12 The regressions (top) and histograms (bottom) visualize the relationship between the U-Net predictions

and cloud free reference data of the MSAVI response label in study area 2 with increasing cloud coverages. Left to

right: 10%; 20%; 30%.



Figure 13 Images visualize a subsection in study area 1 of the response labels red, green, and blue. Top to bottom:

cloud coverages: 10%; 20%; 30%. Left to right: Cloud free reference image; artificially cloud masked image; RFR

prediction composite; U-Net prediction composite.

Figure 14 Images visualize the complete study area 2 of the response labels nir, swir1, and swir2. Top to bottom:

cloud coverages: 10%; 20%; 30%. Left to right: Cloud free reference image; artificially cloud masked image; RFR

prediction composite; U-Net prediction composite.



4 Discussion

4.1 Interpreting the U-Net results

As of now, the experiments indicate similar results between the two study areas with different

features. However, there are some differences. Study area 2 has more consistent results compared to study

area 1, whereas the latter has a higher performance. The former could be explained by the lack of variation

in historic training data in study area 2 (as seen in the bottom-middle image of Figure 4). Too much

variation in historic training data could potentially cause completely covered response labels to be

misclassified in the prediction phase. This is because the prediction model was not fed with the

corresponding set of training features. The higher performance in study area 1 may be explained by the

relative smoothness of agricultural fields and their recurring patterns. Even though the study area is

heterogeneous, many fields are similar to each other. The major outliers are likely the cloud covered fields

with a unique set of historic training feature information. The U-Net outperforms the RFR in both study

areas. This is likely because the RFR only predicts on an individual pixel level, ignoring the spatial context.

This is visible in the noisier RFR predictions that are caused by the spike noise in the radar data. Reducing

this initial spike noise in the radar data while maintaining feature fidelity could improve prediction results.

Alternative DL algorithms could step in as denoisers to replace the currently used Lee filter, where GANs

are good candidates (Chen et al., 2018, 2020).

Although the prediction models are extensively tested, the lack of diverse and numerous study areas

still poses a major limitation to the rigor of the results. The results should therefore be seen as

experiments that provide an empirical approximation of the performance by the proposed U-Net. More

study areas of different sizes should be tested with variations in artificial cloud cover distributions and

percentages.

Response labels

A pattern is observed where some response labels consistently outperform other labels, mainly seen

in study area 1. The nir response label has a good performance compared to labels (i.e. bands) in the visible

part of the electromagnetic spectrum. This may be explained by its relatively high reflectance on cellular

structures (e.g. from foliage), possibly creating easier-to-distinguish classification features, as the structural

reflectance increases the representation of physical surface properties. A similar principle contributes to the

decent performance of the MSAVI label, which additionally combines multiple bands to

suppress non-vegetation elements, providing less variation in training features. The better performance

seen in the swir1/2 label could be explained by similar effects of the nir label as well as the larger resolution

cell sizes found in these response labels, avoiding geographic mismatches. The lower resolution makes the

swir1/2 response label more similar to the fuzzier S1 data. This also means that higher resample sizes of

S2 data could lead to improved prediction performances, at a cost of spatial accuracy. Lastly, the R2, MSE

and RPD values often have more variation compared to the SSIM. For example, the MSE values of nir

outperform the swir2 prediction results in study area 1, while the SSIM is almost identical. This could mean

that the U-Net retains decent prediction image structures, even when pixel values differ as a whole. The

same could be said for the small performance differences between cloud cover percentages, especially seen

in study area 2.



Cloud cover iterations

Decreasing cloud cover percentages do not necessarily lead to improved prediction performances

regarding the U-Net. Several factors could be at play, including the neighborhood random sampling

method of the U-Net input data preparation, the variation in the study area (as this effect differs per

response label for each study area), the cloud cover distribution, or other undetected reasons. For example,

the blue response label in study area 1 always sees a decreased performance in the 20% cloud cover

predictions compared to 10% and 30%. This may be due to training features that are partially included in

the 20% cloud cover data, whilst they are completely excluded in the 30% cloud cover data and completely

included in the 10% cloud cover data. Certain outlier features could disturb prediction performance by

incorrectly predicting the response labels in the 20% cloud cover data. This effect may be apparent in

study area 1 as seen in Figure 13 (top right to bottom right images), where at 20% cloud cover data, some

agricultural fields are cut off by a cloud gap transition, whilst at the 30% cloud cover data this transition

is coincidentally located more conveniently. This also applies to study area 2 as seen in Figure 14, where

occasionally improved performances (e.g. red response label) are observed with higher cloud cover

percentages. In this case it could be explained by the less detailed prediction outputs at higher cloud

coverages, potentially reducing sharp outliers.

4.2 Literature comparison

The statistical metrics used to indicate performance of the model are used in other research as well

and provide an insight of its prediction accuracy. Comparing these predictions to other research is

challenging, mainly because of the usage of different study areas, artificial cloud cover

variations/techniques, resample sizes, preprocessing steps, input data sources, selected statistical metrics,

etcetera. For example, Cresson et al. (2019) use a single statistical metric for testing, the PSNR, to

determine prediction performance. This does not extensively analyze their model performance.

Furthermore, their preprocessing and testing methods differ, such as using cloud cover consisting of

relatively small and similarly sized rectangles over larger study areas. Their method also differs by the

inclusion of posterior training data, instead of purely utilizing historic or contemporary data. This renders

their predictions less timely compared to the approach of this thesis. Posterior data points could be

included in the multitemporal training features suggested in this thesis to improve results, at a cost of

timeliness. Recent research by Gao et al. (2020) uses the R2 and SSIM metrics seen in this thesis. However,

they leave out the prediction results by S1/2 data, rendering a direct comparison impossible. For the

advanced radar data utilized, a decent performance is observed with R2 and SSIM values above 0.90,

potentially by their addition of a cGAN for finetuning U-Net prediction results as well as the selection of

relatively homogeneous study areas. This extra step of adding a cGAN could be added to the method of

this thesis to further improve prediction performances. Lastly, earlier research poses problems where

methods deviate significantly, for example by Wang and Petal (2018), who did not use a U-Net for

prediction purposes and did not generate artificial cloud cover for testing. They did use similar statistical

metrics for testing as seen in this thesis, namely the SSIM, which for their study areas resulted in a mean

value of 0.853. The bottom line is that direct comparisons with previous research are difficult. Therefore,

further research should align the proposed models, study areas, and testing strategies of this thesis and the

above mentioned research to gain a comprehensive overview of their differences in performance.



4.3 Methodological approach

Study areas and preprocessing

The S1/2 data in the study areas need to be well preprocessed in order to get optimal results from

the U-Net. One of these preprocessing steps is input normalization. Prediction results without input

normalization had reduced performances and longer training durations. It must be noted that when

normalizing data, additional input data should be normalized using the same range in order to work

properly. Another preprocessing step concerns multitemporal time interval selection of the S1 radar data.

Currently, these intervals last from one week up to a month and are selected relatively arbitrarily, subject to two

criteria: no posterior training data is used and total coverage of S1/2 data is assured. Additional time

intervals of various lengths could be added that may improve prediction performance. Alternatively,

individual images could be used as training features instead of the currently used median aggregations

over the mentioned time intervals. This results in temporally discrete training features sampled from

individual S1/2 images within a study area. These individual images provide more training features.

However, each training image is subject to an increase in spike noise, topographic effects, and atmospheric

interference because extremes are not filtered out. It also often results in incomplete coverage of the study

area. These issues pose a problem with individual images, both due to the nature of datasets and the

selected study area.

The quality of the training images varies strongly within each study area, contributing to a set of

uncertainties. In practice, these uncertainties are never completely eliminated, even with longer time

intervals. When studying topographically complex terrains or clouded areas, prediction performances may

be severely hampered. For example, relating to topographic effects, current methods applying

orthorectification on S1 data and terrain shadow removal of S2 data are often insufficient. Masking clouds

in S2 data is not perfect either. Even with tools developed to deal with this problem, a perfect and

automated elimination of atmospheric interference remains out of reach (Coluzzi et al., 2018; Zhu and

Helmer, 2018). The method suggested in this thesis could be expanded to study areas that incorporate

more uncertainties to further test the performance, such as mountainous or hazy areas (e.g. long lasting

forest fires). Even though the current study areas are empirically examined on uncertainties, a complete

elimination of all possible interferences and effects is unrealistic. This could possibly be solved by

artificially generating clean S1/2 data to train and test prediction models.

Artificial cloud improvements

Artificial cloud cover generation could be improved as well. Currently, a simple gradient noise is

used that roughly emulates one type of cloud cover. This could be expanded by adding additional layers of

cloud cover with variations in opacity, distribution, shape, etcetera. Cloud shadows should also be added

for a more realistic approach, where potentially these cloud shadows, together with parts of the artificial

clouds, could be left in the training data to simulate real world imperfections in cloud masking. Another

product arising from the cloud cover generation is the scatteredness of the clouds, which affects the continuity

within the training patches. Increased cloud scatteredness means less continuity in the training patches

fed into the U-Net. This is also the reason why generating artificial clouds was preferred over just selecting

rectangular cloud free training areas and cloud covered prediction areas. By generating somewhat realistic

clouds within one study area, a real world situation is created where certain irregular parts of a study area

are covered by clouds that need to be reconstructed.



Prediction models and testing

For this thesis, a mildly modified preexisting U-Net architecture was used. The main contributions

were hyperparameter optimization and custom input data preprocessing. The performance of the U-Net

may be improved by adjusting its hyperparameters using Bayesian optimization and structured prediction

(Zhang et al., 2015). Further improvements can be made by adjusting its architecture and training

durations based on empirically optimal loss values to avoid over or underfitting instead of the fixed 30

epochs. The U-Net model should be compared to applying other models found in Python libraries (e.g.

Scikit-learn/image), where expert knowledge of the model is disregarded focusing mainly on the

application and results of the model. This is similar to applying the RFR model, developed by experts in

the field and used by researchers within different domains. Nevertheless, prediction results of these

models should be scientifically assessed and tested in order to prove their validity. Having a solid test

scheme for prediction models is important to provide insight in their accuracy. The artificial cloud cover

provided clean reference data for prediction testing on a pixel level within masked regions. This gives the

approach of this thesis an edge over prediction problems where reference data (i.e. verification or ground-

truth data) is hard to acquire, inaccurate, or incomplete. Using this approach gives freedom to expand to

other study areas, improving and testing the prediction models even further.
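
The suggestion above to base the training duration on the loss instead of a fixed number of epochs can be sketched as a patience-based stopping rule. The function name, the patience value and the loss values are hypothetical. The sketch is independent of any DL library and only illustrates the selection logic.

```python
def best_epoch(validation_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch with the lowest loss seen before stopping.

    Training is considered stopped after `patience` consecutive epochs
    without an improvement of more than `min_delta`.
    """
    best_loss = float("inf")
    best = 0
    waited = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best


# Hypothetical validation losses per epoch.
losses = [0.90, 0.50, 0.30, 0.25, 0.26, 0.27, 0.26, 0.28, 0.29, 0.10]
print(best_epoch(losses, patience=5))
```

With a patience of 5, training stops before the late improvement in the final epoch is reached, which shows that the patience itself is a hyperparameter that needs tuning. In Keras, a comparable rule is available through the `EarlyStopping` callback.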

A final note on the testing results is that all statistical metrics are based on flattened prediction

images, as described in the methodology. This dimension reduction was necessary to use the prediction

data in the SSIM tool. Not flattening the images could result in different statistical metric values,

potentially being higher than the current values. Future research should custom-build an SSIM tool that

can exclude non-numerical values (e.g. the unmasked training regions) in a 2D image.
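Excluding non-numerical values is straightforward for the pixel-wise metrics, as the following minimal sketch shows for flattened reference and prediction vectors. It assumes the common definitions MSE = mean squared residual, R2 = 1 - SS_res / SS_tot and RPD = SD / RMSE with the population standard deviation. These definitions are assumptions made here and may differ in detail from those in the methodology; SSIM is not covered. The function name `masked_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def masked_metrics(reference: Sequence[float],
                   predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MSE, R2 and RPD over pixel pairs where both values are numeric.

    NaN entries (e.g. pixels outside the artificial cloud mask) are skipped.
    """
    pairs = [(r, p) for r, p in zip(reference, predicted)
             if not (math.isnan(r) or math.isnan(p))]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pixel pairs")

    mean_ref = sum(r for r, _ in pairs) / n
    ss_res = sum((r - p) ** 2 for r, p in pairs)     # residual sum of squares
    ss_tot = sum((r - mean_ref) ** 2 for r, _ in pairs)  # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("reference values are constant; R2 is undefined")

    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    # RPD = SD / RMSE; infinite for a perfect prediction.
    rpd = math.inf if mse == 0.0 else math.sqrt(ss_tot / n) / math.sqrt(mse)
    return {"n": n, "mse": mse, "r2": r2, "rpd": rpd}


# Hypothetical flattened vectors; NaN marks a pixel to be excluded.
nan = float("nan")
metrics = masked_metrics([0.1, 0.2, nan, 0.4, 0.5],
                         [0.1, 0.3, 0.9, 0.4, 0.4])
```

Under these definitions RPD equals 1 / sqrt(1 - R2), so the two metrics carry the same information on different scales.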



5 Conclusion

The prediction results of the U-Net model display promising performances for both study areas.

The performance is consistent for all response labels and cloud cover percentages. The U-Net significantly

outperforms the RFR model for most response labels. This suggests that using advanced DL is preferred

over basic ML when prediction accuracy is important. A similar conclusion was drawn by Gao et al.

(2020). This is further encouraged by the fact that most convolutional DL models, such as the U-Net, are

able to process large amounts of imagery data, which allows scaling to larger and more numerous study areas.

The discussed U-Net uses only historic and contemporary multitemporal training data. Together with the

fact that it is entirely cloud computing based, this makes the approach appropriate for applications where

timely and dynamic predictions are required. Predicting optical data using radar data with the help of DL

is still relatively new, and this thesis hopes to contribute to maturing this research field.



6 Appendix

6.1 Extended tables

Table 6 Study area 1 statistical metrics of Table 5 extended with percent differences. Horizontal arrows with a percentage in the middle indicate the difference between the RFR and U-Net performance for the relating metric. The vertical arrows with percentages on each side indicate the difference between the performance of each cloud cover iteration. Each metric cell reads RFR → ∆% → U-Net; each ↓ ∆% row gives the RFR change on the left and the U-Net change on the right; the Noise row gives the noise value per label and metric.

| Label | Cloud cover | R2 | MSE | RPD | SSIM |
|---|---|---|---|---|---|
| blue | Noise | -0.45 | 0.1297 | 0.83 | 0.09 |
| blue | 10% | 0.74 → +5.1% → 0.78 | 0.0231 → -18.3% → 0.0195 | 1.96 → +8% → 2.14 | 0.66 → +7.1% → 0.71 |
| blue | ↓ ∆% | -7.1% ↓ -5.1% | +16% ↓ +14.8% | -9.1% ↓ -8.3% | -3.4% ↓ -3.4% |
| blue | 20% | 0.69 → +6.9% → 0.74 | 0.0275 → -20% → 0.0229 | 1.8 → +8.7% → 1.97 | 0.64 → +7.1% → 0.69 |
| blue | ↓ ∆% | -3.5% ↓ +1.9% | +7.1% ↓ -5.8% | -3.8% ↓ +2.8% | -0.8% ↓ +3.1% |
| blue | 30% | 0.67 → +11.8% → 0.76 | 0.0296 → -36.6% → 0.0217 | 1.74 → +14.5% → 2.03 | 0.64 → +10.8% → 0.71 |
| green | Noise | -0.51 | 0.1085 | 0.81 | 0.10 |
| green | 10% | 0.67 → +4.6% → 0.7 | 0.0236 → -10.8% → 0.0213 | 1.74 → +5% → 1.83 | 0.68 → +5.2% → 0.72 |
| green | ↓ ∆% | -14.9% ↓ -4.1% | +20.9% ↓ +8.5% | -12.4% ↓ -4.5% | -4% ↓ -0.6% |
| green | 20% | 0.58 → +13.5% → 0.68 | 0.0298 → -28.1% → 0.0233 | 1.55 → +11.7% → 1.76 | 0.65 → +8.2% → 0.71 |
| green | ↓ ∆% | -8% ↓ +1.9% | +9.5% ↓ -4.3% | -5.1% ↓ +2.1% | -1.4% ↓ +0% |
| green | 30% | 0.54 → +21.5% → 0.69 | 0.0329 → -47.6% → 0.0223 | 1.48 → +17.7% → 1.79 | 0.64 → +9.5% → 0.71 |
| red | Noise | -0.33 | 0.1494 | 0.87 | 0.08 |
| red | 10% | 0.79 → +6.3% → 0.84 | 0.0239 → -33.3% → 0.0179 | 2.17 → +13.4% → 2.51 | 0.65 → +11.2% → 0.73 |
| red | ↓ ∆% | -7% ↓ -1.3% | +19.5% ↓ +6.4% | -11.4% ↓ -3.4% | -2.6% ↓ -0.7% |
| red | 20% | 0.74 → +11.3% → 0.83 | 0.0297 → -55% → 0.0192 | 1.95 → +19.7% → 2.42 | 0.63 → +12.9% → 0.73 |
| red | ↓ ∆% | -4% ↓ -3.9% | +9.8% ↓ +15.6% | -5.3% ↓ -8.9% | -1.2% ↓ -2.1% |
| red | 30% | 0.71 → +11.3% → 0.8 | 0.0329 → -44.9% → 0.0227 | 1.85 → +16.9% → 2.23 | 0.63 → +12.2% → 0.71 |
| nir | Noise | -0.92 | 0.1239 | 0.72 | 0.09 |
| nir | 10% | 0.86 → -0.2% → 0.85 | 0.0093 → +1.1% → 0.0094 | 2.63 → -0.5% → 2.62 | 0.78 → +1.7% → 0.8 |
| nir | ↓ ∆% | -3.8% ↓ -3.6% | +17.9% ↓ +16.8% | -10.4% ↓ -9.6% | -2.9% ↓ -2.4% |
| nir | 20% | 0.82 → +0.1% → 0.82 | 0.0113 → -0.3% → 0.0113 | 2.38 → +0.1% → 2.39 | 0.76 → +2.2% → 0.78 |
| nir | ↓ ∆% | -1.8% ↓ +2% | +7.7% ↓ -10.3% | -4.1% ↓ +4.8% | -0.4% ↓ 0% |
| nir | 30% | 0.81 → +3.8% → 0.84 | 0.0123 → -19.9% → 0.0102 | 2.29 → +8.7% → 2.51 | 0.76 → +2.6% → 0.78 |
| swir1 | Noise | -0.50 | 0.1086 | 0.82 | 0.10 |
| swir1 | 10% | 0.77 → +4.6% → 0.8 | 0.0168 → -18.7% → 0.0142 | 2.07 → +8.2% → 2.26 | 0.73 → +7.1% → 0.78 |
| swir1 | ↓ ∆% | -6.6% ↓ -7.1% | +16.9% ↓ +21.4% | -9.7% ↓ -12.8% | -0.6% ↓ -1.2% |
| swir1 | 20% | 0.72 → +4.1% → 0.75 | 0.0203 → -12.3% → 0.018 | 1.89 → +5.7% → 2 | 0.72 → +6.6% → 0.77 |
| swir1 | ↓ ∆% | -4.2% ↓ +1.8% | +9.4% ↓ -5.7% | -5.1% ↓ +2.8% | -0.9% ↓ +0.3% |
| swir1 | 30% | 0.69 → +9.6% → 0.76 | 0.0224 → -31.1% → 0.0171 | 1.8 → +12.7% → 2.06 | 0.72 → +7.7% → 0.78 |
| swir2 | Noise | -0.39 | 0.1293 | 0.85 | 0.09 |
| swir2 | 10% | 0.81 → +7.3% → 0.88 | 0.0174 → -51.9% → 0.0115 | 2.31 → +18.9% → 2.85 | 0.72 → +10.6% → 0.8 |
| swir2 | ↓ ∆% | -5.4% ↓ -3.5% | +18.3% ↓ +19.2% | -10.6% ↓ -11.3% | -1.9% ↓ -2% |
| swir2 | 20% | 0.77 → +9% → 0.85 | 0.0213 → -50.2% → 0.0142 | 2.09 → +18.4% → 2.56 | 0.7 → +10.6% → 0.79 |
| swir2 | ↓ ∆% | -2.1% ↓ -1.6% | +6.4% ↓ +8% | -3.4% ↓ -4.3% | -0.7% ↓ -0.2% |
| swir2 | 30% | 0.76 → +9.5% → 0.83 | 0.0227 → -47.6% → 0.0154 | 2.02 → +17.7% → 2.46 | 0.7 → +11% → 0.78 |
| MSAVI | Noise | -0.50 | 0.1551 | 0.82 | 0.08 |
| MSAVI | 10% | 0.85 → +3.5% → 0.88 | 0.0154 → -26.3% → 0.0122 | 2.59 → +11% → 2.91 | 0.74 → +5.6% → 0.79 |
| MSAVI | ↓ ∆% | -4% ↓ -2.2% | +17.8% ↓ +13.7% | -10.3% ↓ -7.6% | -2.9% ↓ -2.9% |
| MSAVI | 20% | 0.82 → +5.2% → 0.86 | 0.0188 → -32.7% → 0.0142 | 2.35 → +13.2% → 2.7 | 0.72 → +5.6% → 0.77 |
| MSAVI | ↓ ∆% | -2.8% ↓ -3% | +11.1% ↓ +15.3% | -6.1% ↓ -8.7% | -1.9% ↓ -0.3% |
| MSAVI | 30% | 0.8 → +5.1% → 0.84 | 0.0211 → -26.3% → 0.0167 | 2.21 → +11% → 2.49 | 0.71 → +7% → 0.76 |



Table 7 Study area 2 statistical metrics of Table 5 extended with percent differences. Horizontal arrows with a percentage in the middle indicate the difference between the RFR and U-Net performance for the relating metric. The vertical arrows with percentages on each side indicate the difference between the performance of each cloud cover iteration. Each metric cell reads RFR → ∆% → U-Net; each ↓ ∆% row gives the RFR change on the left and the U-Net change on the right; the Noise row gives the noise value per label and metric.

| Label | Cloud cover | R2 | MSE | RPD | SSIM |
|---|---|---|---|---|---|
| blue | Noise | -1.42 | 0.1763 | 0.64 | 0.05 |
| blue | 10% | 0.48 → +39.3% → 0.79 | 0.0381 → -145.5% → 0.0155 | 1.38 → +36.2% → 2.17 | 0.54 → +19.1% → 0.66 |
| blue | ↓ ∆% | -22.1% ↓ -0.9% | +14.2% ↓ +3.2% | -8% ↓ -1.6% | -6.1% ↓ +2.3% |
| blue | 20% | 0.39 → +49.9% → 0.78 | 0.0444 → -177.1% → 0.016 | 1.28 → +39.9% → 2.13 | 0.5 → +25.5% → 0.68 |
| blue | ↓ ∆% | -5.7% ↓ -0.9% | +3.3% ↓ +3.1% | -1.7% ↓ -1.6% | +0.1% ↓ -1.4% |
| blue | 30% | 0.37 → +52.1% → 0.77 | 0.0459 → -177.9% → 0.0165 | 1.26 → +40% → 2.1 | 0.51 → +24.5% → 0.67 |
| green | Noise | -1.14 | 0.1509 | 0.68 | 0.07 |
| green | 10% | 0.45 → +41.6% → 0.78 | 0.0386 → -144.3% → 0.0158 | 1.35 → +36% → 2.11 | 0.55 → +17.7% → 0.67 |
| green | ↓ ∆% | -23.4% ↓ +1.4% | +13.6% ↓ -5.1% | -7.6% ↓ +2.5% | -6.4% ↓ +1.1% |
| green | 20% | 0.37 → +53.3% → 0.79 | 0.0447 → -197.2% → 0.015 | 1.26 → +42% → 2.17 | 0.52 → +23.5% → 0.67 |
| green | ↓ ∆% | -7% ↓ -1.9% | +3.6% ↓ +6.5% | -1.9% ↓ -3.4% | -0.5% ↓ -1.1% |
| green | 30% | 0.34 → +55.5% → 0.77 | 0.0464 → -188.3% → 0.0161 | 1.23 → +41.1% → 2.1 | 0.51 → +23% → 0.67 |
| red | Noise | -1.09 | 0.1713 | 0.69 | 0.06 |
| red | 10% | 0.47 → +35.8% → 0.73 | 0.0437 → -96.4% → 0.0222 | 1.37 → +28.6% → 1.92 | 0.5 → +13% → 0.58 |
| red | ↓ ∆% | -18.9% ↓ +1.8% | +12.2% ↓ -5.2% | -6.7% ↓ +2.5% | -6% ↓ +6% |
| red | 20% | 0.39 → +47% → 0.74 | 0.0498 → -135.5% → 0.0211 | 1.28 → +34.8% → 1.97 | 0.47 → +22.9% → 0.61 |
| red | ↓ ∆% | -6.5% ↓ +0.9% | +3.8% ↓ -2.6% | -2% ↓ +1.3% | -0.3% ↓ +6% |
| red | 30% | 0.37 → +50.7% → 0.75 | 0.0518 → -151.2% → 0.0206 | 1.26 → +36.9% → 2 | 0.47 → +27.8% → 0.65 |
| nir | Noise | -0.96 | 0.1051 | 0.72 | 0.11 |
| nir | 10% | 0.73 → +5.1% → 0.77 | 0.0143 → -17.2% → 0.0122 | 1.94 → +7.6% → 2.1 | 0.67 → -0.3% → 0.67 |
| nir | ↓ ∆% | -4.4% ↓ -1.7% | +10.3% ↓ +5.4% | -5.6% ↓ -2.8% | -6.5% ↓ -1.6% |
| nir | 20% | 0.7 → +7.5% → 0.76 | 0.016 → -23.6% → 0.0129 | 1.83 → +10% → 2.04 | 0.63 → +4.3% → 0.66 |
| nir | ↓ ∆% | -0.8% ↓ -1.3% | +1.9% ↓ +3.8% | -1% ↓ -1.9% | -0.5% ↓ +0.8% |
| nir | 30% | 0.7 → +7.1% → 0.75 | 0.0163 → -21.3% → 0.0134 | 1.82 → +9.2% → 2 | 0.63 → +5.6% → 0.67 |
| swir1 | Noise | -0.64 | 0.0907 | 0.78 | 0.11 |
| swir1 | 10% | 0.62 → +23% → 0.81 | 0.021 → -95.8% → 0.0107 | 1.63 → +28.5% → 2.27 | 0.67 → +11.9% → 0.76 |
| swir1 | ↓ ∆% | -9.1% ↓ -1.1% | +12% ↓ +4.4% | -6.6% ↓ -2.3% | -4.7% ↓ +0% |
| swir1 | 20% | 0.57 → +28.6% → 0.8 | 0.0239 → -112.7% → 0.0112 | 1.52 → +31.4% → 2.22 | 0.64 → +15.8% → 0.76 |
| swir1 | ↓ ∆% | -2.8% ↓ -1.2% | +3.5% ↓ +4.4% | -1.8% ↓ -2.3% | -0.8% ↓ -0.8% |
| swir1 | 30% | 0.55 → +29.7% → 0.79 | 0.0247 → -110.9% → 0.0117 | 1.5 → +31.1% → 2.17 | 0.64 → +15.8% → 0.76 |
| swir2 | Noise | -1.11 | 0.1243 | 0.69 | 0.08 |
| swir2 | 10% | 0.52 → +28% → 0.72 | 0.0285 → -70.7% → 0.0167 | 1.44 → +23.5% → 1.88 | 0.65 → +9.4% → 0.72 |
| swir2 | ↓ ∆% | -16.5% ↓ +4.4% | +13.1% ↓ -13% | -7.3% ↓ +5.9% | -4.4% ↓ +0.6% |
| swir2 | 20% | 0.44 → +40.9% → 0.75 | 0.0328 → -121.9% → 0.0148 | 1.34 → +32.9% → 1.99 | 0.62 → +13.7% → 0.72 |
| swir2 | ↓ ∆% | -5.2% ↓ -4.3% | +3.8% ↓ +11% | -2% ↓ -6% | -0.7% ↓ +0.1% |
| swir2 | 30% | 0.42 → +41.5% → 0.72 | 0.0341 → -105.4% → 0.0166 | 1.31 → +30.2% → 1.88 | 0.62 → +14.4% → 0.72 |
| MSAVI | Noise | -0.80 | 0.1154 | 0.75 | 0.10 |
| MSAVI | 10% | 0.69 → +12.5% → 0.79 | 0.0198 → -46.9% → 0.0135 | 1.8 → +17.5% → 2.18 | 0.69 → -1.7% → 0.68 |
| MSAVI | ↓ ∆% | -7.1% ↓ +0.3% | +12.9% ↓ -1.2% | -7.2% ↓ +0.6% | -5% ↓ +0.1% |
| MSAVI | 20% | 0.65 → +18.5% → 0.79 | 0.0227 → -70.8% → 0.0133 | 1.68 → +23.5% → 2.2 | 0.66 → +3.2% → 0.68 |
| MSAVI | ↓ ∆% | -2.7% ↓ -1.4% | +4.6% ↓ +5% | -2.4% ↓ -2.6% | -1.1% ↓ -0.7% |
| MSAVI | 30% | 0.63 → +19.6% → 0.78 | 0.0238 → -70.2% → 0.014 | 1.64 → +23.4% → 2.14 | 0.65 → +3.6% → 0.67 |
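The percent differences in Tables 6 and 7 appear to be computed relative to the second value of each pair, i.e. ∆% = (second - first) / second × 100. This is an inference from the tabulated numbers and is not a formula stated in the text. It would explain why MSE reductions can exceed -100%. The function name below is hypothetical.

```python
def percent_difference(first: float, second: float) -> float:
    """Percent difference from `first` to `second`, relative to `second`.

    Assumed convention (inferred from Tables 6 and 7, not stated in the text):
    delta% = (second - first) / second * 100
    """
    if second == 0:
        raise ValueError("second value must be non-zero")
    return (second - first) / second * 100.0


# Horizontal comparison (RFR -> U-Net), values taken from Table 6.
r2_gain = percent_difference(0.74, 0.78)
# Vertical comparison (one cloud cover level to the next), values from Table 6.
mse_change = percent_difference(0.0231, 0.0275)
```

With the values above, the first call gives roughly +5.1 and the second roughly +16, in line with the tabulated entries. Because the tables list rounded metric values, recomputing other entries can deviate from the printed percentages by a few tenths of a percent or more.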



7 References

Belgiu, M. and Stein, A.: Spatiotemporal Image Fusion in Remote Sensing, Remote Sens., 11(7), 818,

doi:10.3390/rs11070818, 2019.

Bermudez, J. D., Happ, P. N., Oliveira, D. A. B. and Feitosa, R. Q.: SAR to Optical Image Synthesis for

Cloud Removal with Generative Adversarial Networks, ISPRS Ann. Photogramm. Remote Sens. Spat.

Inf. Sci., IV–1, 5–11, doi:10.5194/isprs-annals-IV-1-5-2018, 2018.

Bermudez, J. D., Happ, P. N., Feitosa, R. Q. and Oliveira, D. A. B.: Synthesis of Multispectral Optical

Images From SAR/Optical Multitemporal Data Using Conditional Generative Adversarial Networks,

IEEE Geosci. Remote Sens. Lett., 16(8), 1220–1224, doi:10.1109/LGRS.2019.2894734, 2019.

Chen, B., Huang, B. and Xu, B.: Multi-source remotely sensed data fusion for improving land cover

classification, ISPRS J. Photogramm. Remote Sens., 124, 27–39, doi:10.1016/j.isprsjprs.2016.12.008, 2017.

Chen, J., Chen, J., Chao, H. and Yang, M.: Image Blind Denoising with Generative Adversarial Network

Based Noise Modeling, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.

3155–3164, IEEE, Salt Lake City, UT., 2018.

Chen, Z., Zeng, Z., Shen, H., Zheng, X., Dai, P. and Ouyang, P.: DN-GAN: Denoising generative

adversarial networks for speckle noise reduction in optical coherence tomography images, Biomed. Signal

Process. Control, 55, 101632, doi:10.1016/j.bspc.2019.101632, 2020.

Coluzzi, R., Imbrenda, V., Lanfredi, M. and Simoniello, T.: A first assessment of the Sentinel-2 Level 1-C

cloud mask product to support informed surface analyses, Remote Sens. Environ., 217, 426–443,

doi:10.1016/j.rse.2018.08.009, 2018.

Cresson, R., Ienco, D., Gaetano, R., Ose, K. and Ho Tong Minh, D.: Optical image gap filling using deep

convolutional autoencoder from optical and radar images, IEEE, Yokohama, Japan., 2019.

Eckardt, R., Berger, C., Thiel, C. and Schmullius, C.: Removal of Optically Thick Clouds from Multi-

Spectral Satellite Images Using Multi-Frequency SAR Data, Remote Sens., 5(6), 2973–3006,

doi:10.3390/rs5062973, 2013.

ESDS: Remote Sensors | Earthdata, [online] Available from: https://earthdata.nasa.gov/learn/remote-

sensors (Accessed 8 October 2019), 2019.

Gao, F., Anderson, M. C. and Xie, D.: Spatial and temporal information fusion for crop condition

monitoring, in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp.

3579–3582., 2016.

Gao, J., Yuan, Q., Li, J., Zhang, H. and Su, X.: Cloud Removal with Fusion of High Resolution Optical

and SAR Images Using Generative Adversarial Networks, Remote Sens., 12(1), 191,

doi:10.3390/rs12010191, 2020.

Google Colab: Welcome To Colaboratory - Colaboratory, [online] Available from:

https://colab.research.google.com/notebooks/welcome.ipynb (Accessed 3 October 2019), n.d.



Google Earth Engine: FAQ – Google Earth Engine, [online] Available from:

https://earthengine.google.com/faq/ (Accessed 3 October 2019), n.d.

google/earthengine-api: google/earthengine-api, GitHub [online] Available from:

https://github.com/google/earthengine-api (Accessed 10 February 2020), n.d.

Gore, A.: The digital earth, Aust. Surv., 43(2), 89–91, doi:10.1080/00050326.1998.10441850, 1998.

Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, D., Thau, D. and Moore, R.: Google Earth Engine:

Planetary-scale geospatial analysis for everyone, Remote Sens. Environ., 202, 18–27,

doi:10.1016/j.rse.2017.06.031, 2017.

Grohnfeldt, C., Schmitt, M. and Zhu, X.: A Conditional Generative Adversarial Network to Fuse Sar And

Multispectral Optical Data For Cloud Removal From Sentinel-2 Images, in IGARSS 2018 - 2018 IEEE

International Geoscience and Remote Sensing Symposium, pp. 1726–1729., 2018.

He, W. and Yokoya, N.: Multi-Temporal Sentinel-1 and -2 Data Fusion for Optical Image Simulation,

ISPRS Int. J. Geo-Inf., 7(10), 389, doi:10.3390/ijgi7100389, 2018.

Huang, B., Li, Y., Han, X., Cui, Y., Li, W. and Li, R.: Cloud Removal From Optical Satellite Imagery With

SAR Imagery Using Sparse Representation, IEEE Geosci. Remote Sens. Lett., 12(5), 1046–1050,

doi:10.1109/LGRS.2014.2377476, 2015.

Jacquemoud, S., Verhoef, W., Baret, F., Bacour, C., Zarco-Tejada, P. J., Asner, G. P., François, C. and

Ustin, S. L.: PROSPECT+SAIL models: A review of use for vegetation characterization, Remote Sens.

Environ., 113, S56–S66, doi:10.1016/j.rse.2008.01.026, 2009.

Krizhevsky, A., Sutskever, I. and Hinton, G. E.: ImageNet Classification with Deep Convolutional Neural

Networks, in Advances in Neural Information Processing Systems 25, edited by F. Pereira, C. J. C. Burges,

L. Bottou, and K. Q. Weinberger, pp. 1097–1105, Curran Associates, Inc. [online] Available from:

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

(Accessed 9 February 2020), 2012.

Lee, J.-S.: Digital Image Enhancement and Noise Filtering by Use of Local Statistics, IEEE Trans. Pattern

Anal. Mach. Intell., PAMI-2(2), 165–168, doi:10.1109/TPAMI.1980.4766994, 1980.

Li, X., Shen, H., Zhang, L., Zhang, H., Yuan, Q. and Yang, G.: Recovering Quantitative Remote Sensing

Products Contaminated by Thick Clouds and Shadows Using Multitemporal Dictionary Learning, IEEE

Trans. Geosci. Remote Sens., 52(11), 7086–7098, doi:10.1109/TGRS.2014.2307354, 2014.

Liang, S., Li, X. and Wang, J., Eds.: Chapter 1 - A Systematic View of Remote Sensing, in Advanced

Remote Sensing, pp. 1–31, Academic Press, Boston., 2012.

Liu, L. and Lei, B.: Can SAR Images and Optical Images Transfer with Each Other?, in IGARSS 2018 -

2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 7019–7022., 2018.

Loff, S.: Cloudy Earth, NASA [online] Available from: http://www.nasa.gov/image-feature/cloudy-earth




Ma, L., Liu, Y., Zhang, X., Ye, Y., Yin, G. and Johnson, B. A.: Deep learning in remote sensing applications:

A meta-analysis and review, ISPRS J. Photogramm. Remote Sens., 152, 166–177,

doi:10.1016/j.isprsjprs.2019.04.015, 2019.

Mateo-García, G., Gómez-Chova, L., Amorós-López, J., Muñoz-Marí, J. and Camps-Valls, G.:

Multitemporal Cloud Masking in the Google Earth Engine, Remote Sens., 10(7), 1079,

doi:10.3390/rs10071079, 2018.

Meados, C., Nair, R. and Plascencia, D.: Pix2pix Utilizing the Deep Learning Reference Stack, 01.org

[online] Available from: https://01.org/blogs/2019/pix2pix-utilizing-deep-learning-reference-stack


Mou, L., Schmitt, M., Wang, Y. and Zhu, X.: A CNN for the Identification of Corresponding Patches in

SAR and Optical Imagery of Urban Scenes., 2017.

Nagendra, H., Lucas, R., Honrado, J. P., Jongman, R. H. G., Tarantino, C., Adamo, M. and Mairota, P.:

Remote sensing for conservation monitoring: Assessing protected areas, habitat extent, habitat condition,

species diversity, and threats, Ecol. Indic., 33, 45–59, doi:10.1016/j.ecolind.2012.09.014, 2013.

Pettorelli, N., Laurance, W. F., O’Brien, T. G., Wegmann, M., Nagendra, H. and Turner, W.: Satellite

remote sensing for applied ecologists: opportunities and challenges, J. Appl. Ecol., 51(4), 839–848,

doi:10.1111/1365-2664.12261, 2014.

Ronneberger, O., Fischer, P. and Brox, T.: U-Net: Convolutional Networks for Biomedical Image

Segmentation, ArXiv150504597 Cs [online] Available from: http://arxiv.org/abs/1505.04597 (Accessed 8

February 2020), 2015.

Scarpa, G., Gargiulo, M., Mazza, A. and Gaetano, R.: A CNN-Based Fusion Method for Feature Extraction

from Sentinel Data, Remote Sens., 10(2), 236, doi:10.3390/rs10020236, 2018.

Schmitt, M., Tupin, F. and Zhu, X. X.: Fusion of SAR and optical remote sensing data — Challenges and

recent trends, in 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp.

5458–5461., 2017.

Schmitt, M., Hughes, L. H. and Zhu, X. X.: The SEN1-2 Dataset for Deep Learning in SAR-Optical Data

Fusion, ArXiv180701569 Cs [online] Available from: http://arxiv.org/abs/1807.01569 (Accessed 16 January

2020), 2018.

Shelhamer, E., Long, J. and Darrell, T.: Fully Convolutional Networks for Semantic Segmentation, IEEE

Trans. Pattern Anal. Mach. Intell., 39(4), 640–651, doi:10.1109/TPAMI.2016.2572683, 2017.

Shen, H., Li, X., Cheng, Q., Zeng, C., Yang, G., Li, H. and Zhang, L.: Missing Information Reconstruction

of Remote Sensing Data: A Technical Review, IEEE Geosci. Remote Sens. Mag., 3(3), 61–85,

doi:10.1109/MGRS.2015.2441912, 2015.

Small, C., Okujeni, A., van der Linden, S. and Waske, B.: 6.07 - Remote Sensing of Urban Environments,

in Comprehensive Remote Sensing, edited by S. Liang, pp. 96–127, Elsevier, Oxford., 2018.



Stuhler, S., Leiterer, R., Joerg, P., Wulf, H. and Schaepman, M.: Generating a cloud-free, homogeneous

Landsat-8 mosaic of Switzerland using Google Earth Engine., 2016.

tensorflow/models: tensorflow/models, GitHub [online] Available from:

https://github.com/tensorflow/models (Accessed 10 February 2020), n.d.

Tzouvaras, M., Kouhartsiouk, D., Agapiou, A., Danezis, C. and Hadjimitsis, D. G.: The Use of Sentinel-1

Synthetic Aperture Radar (SAR) Images and Open-Source Software for Cultural Heritage: An Example

from Paphos Area in Cyprus for Mapping Landscape Changes after a 5.6 Magnitude Earthquake, Remote

Sens., 11(15), 1766, doi:10.3390/rs11151766, 2019.

Wang, P. and Patel, V. M.: Generating High Quality Visible Images from SAR Images Using CNNs,

ArXiv180210036 Cs [online] Available from: http://arxiv.org/abs/1802.10036 (Accessed 15 February 2020),

2018.

Wang, X., Jia, K., Liang, S., Li, Q., Wei, X., Yao, Y., Zhang, X. and Tu, Y.: Estimating Fractional Vegetation

Cover From Landsat-7 ETM+ Reflectance Data Based on a Coupled Radiative Transfer and Crop Growth

Model, IEEE Trans. Geosci. Remote Sens., 55(10), 5539–5546, doi:10.1109/TGRS.2017.2709803, 2017.

Wang, Y. and Zhu, X. X.: The SARptical Dataset for Joint Analysis of SAR and Optical Image in Dense

Urban Area, in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, pp.

6840–6843, IEEE, Valencia., 2018.

Wang, Z. and Bovik, A. C.: Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures,

IEEE Signal Process. Mag., 26(1), 98–117, doi:10.1109/MSP.2008.930649, 2009.

Wang, Z., Bovik, A. C., Sheikh, H. R. and Simoncelli, E. P.: Image quality assessment: from error visibility

to structural similarity, IEEE Trans. Image Process., 13(4), 600–612, doi:10.1109/TIP.2003.819861, 2004.

Woodhouse, I. H.: Introduction to microwave remote sensing, CRC press., 2017.

Wu, B., Huang, B., Cao, K. and Zhuo, G.: Improving spatiotemporal reflectance fusion using image

inpainting and steering kernel regression techniques, Int. J. Remote Sens., 38(3), 706–727,

doi:10.1080/01431161.2016.1271471, 2017.

Xu, M., Jia, X., Pickering, M. and Plaza, A. J.: Cloud Removal Based on Sparse Representation via

Multitemporal Dictionary Learning, IEEE Trans. Geosci. Remote Sens., 54(5), 2998–3006,

doi:10.1109/TGRS.2015.2509860, 2016.

Zhang, W., Huang, H., Schmitz, M., Sun, X., Wang, H. and Mayer, H.: Effective Fusion of Multi-Modal

Remote Sensing Data in a Fully Convolutional Network for Semantic Labeling, Remote Sens., 10(1), 52,

doi:10.3390/rs10010052, 2017.

Zhang, Y., Sohn, K., Villegas, R., Pan, G. and Lee, H.: Improving Object Detection With Deep

Convolutional Networks via Bayesian Optimization and Structured Prediction, pp. 249–258. [online]

Available from:

http://openaccess.thecvf.com/content_cvpr_2015/html/Zhang_Improving_Object_Detection_2015_CVPR_paper.html (Accessed 2 April 2020), 2015.



Zheng, Y., Wu, B., Zhang, M. and Zeng, H.: Crop Phenology Detection Using High Spatio-Temporal

Resolution Data Fused from SPOT5 and MODIS Products, Sensors, 16(12), 2099, doi:10.3390/s16122099,

2016.

Zhu, X. and Helmer, E. H.: An automatic method for screening clouds and cloud shadows in optical

satellite image time series in cloudy regions, Remote Sens. Environ., 214, 135–153,

doi:10.1016/j.rse.2018.05.024, 2018.

Zhu, X., Cai, F., Tian, J. and Williams, T. K.-A.: Spatiotemporal Fusion of Multisource Remote Sensing

Data: Literature Survey, Taxonomy, Principles, Applications, and Future Directions, Remote Sens., 10(4),

527, doi:10.3390/rs10040527, 2018.

Zhu, X. X., Tuia, D., Mou, L., Xia, G.-S., Zhang, L., Xu, F. and Fraundorfer, F.: Deep Learning in Remote

Sensing: A Comprehensive Review and List of Resources, IEEE Geosci. Remote Sens. Mag., 5(4), 8–36,

doi:10.1109/MGRS.2017.2762307, 2017.
