ESSnet Big Data II

Grant Agreement Number: 847375-2018-NL-BIGDATA

https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata
https://ec.europa.eu/eurostat/cros/content/essnetbigdata_en

Work Package H

Earth Observation

Deliverable H1

Interim technical report

Version 2019-09-30

Work package Leader:

Marek Morze (CSO, Poland)

[email protected]

telephone : +48 89 524 36 66

mobile phone :

Prepared by WPH team

Table of contents

1. Introduction
2. Methodological framework
3. Satellite Earth Observation data sources
4. Report on thematic task 1 – Agriculture
4.1. Case study 1 - Crop recognition, mapping and monitoring
4.1.1. Pre-works
4.1.2. Stage 1
4.1.3. Stage 2
4.1.4. Stage 3
4.2. Case study 2 - Monitoring of the off-season vegetation cover
4.2.1. Pre-works
4.2.2. Stage 1
4.2.3. Stage 2
4.2.4. Stage 3
4.3. Case study 3 - Crop recognition with very high-resolution aerial data
4.3.1. Pre-works
5. Report on thematic task 2 - Build-up area
5.1. Case study 4 - Implementing SDG indicator 11.7.1
5.1.1. Pre-works
5.1.2. Stage 1
5.1.3. Stage 2
5.1.4. Stage 3
5.2. Case study 5 - Urban sprawl across urban areas in Europe
5.2.1. Pre-works
5.2.2. Stage 1
5.3. Case study 6 - Combination of administrative and Earth Observation data to determine the quality of housing
5.3.1. Pre-works
5.3.2. Stage 1
5.3.3. Stage 2
5.3.4. Stage 3
6. Report on thematic task 3 - Land cover
6.1. Case study 7 - Comparing «in-situ» and «remote-sensing» collection mode for land cover data
6.1.1. Pre-works
6.1.2. Stage 1
6.1.3. Stage 2
6.1.4. Stage 3
6.2. Case study 8 - Land cover maps at very detailed scale
6.2.1. Pre-works
6.2.2. Stage 1
6.2.3. Stage 2
6.2.4. Stage 3
7. Report on thematic task 4 - Settlements, Enumeration Areas and Forestry
7.1. Case study 9 - Update the INSPIRE Theme Statistical Units dataset and preventing forest fire
7.1.1. Pre-works
7.1.2. Stage 1
7.1.3. Stage 2
7.1.4. Stage 3
8. Report on the meetings
9. Bibliography

1. Introduction

Work Package H (WPH) is one of the four pilot projects carried out within ESSnet Big Data II. It is implemented by a partnership of nine institutions from Poland, Belgium, Germany, France, Italy, Finland, the Netherlands and Portugal. The aim of the pilot is to support areal statistics with Earth Observation (EO) data; the project results in experimental statistics based on remote sensing data. From the technological point of view, WPH uses new methods such as machine learning algorithms for image analysis. EO provides an enormous dataset and thereby creates an unprecedented opportunity, in Europe and worldwide, for the development of operational remote sensing applications. EO has recently become increasingly sophisticated technologically. The market offers EO data from high to low resolution, gathered by platforms ranging from unmanned aerial vehicles through aircraft to satellites. In particular, the launch of the Sentinels of the Copernicus Programme opened a new chapter in the applicability of remote sensing data by ensuring free and open access to continuously and systematically acquired satellite images. One of the important economic and commercial applications of EO data is official statistical production and landscape mapping for various thematic purposes. There is now an evident need to facilitate and improve the mandatory statistical registers. In the era of geospatialization of information, the use of EO data is reasonable and promising, particularly in view of the upcoming Census 2021 and the Agricultural Census, as well as other commitments to the European Commission or the United Nations. The crucial goal of WPH is to use EO data from different sources to help build the geospatial framework that supports these registers. Within this project, the practical use of EO data is proposed as a way to fill the gap between statistical and geographical information, named the “geospatial breakdown”. The main objectives of WPH are implemented through case studies divided into thematic tasks: agriculture, build-up area, land cover, and settlements, enumeration areas and forestry. An overview of the thematic tasks and case studies is presented in Figure 1.1.

Figure 1.1 Overview of thematic tasks and case studies.

2. Methodological framework

Based on the thematic fields addressed in the project, a methodological framework was jointly developed (Figure 2.1). The framework is closely related to quality, metadata and IT infrastructure issues and is divided into five general stages: Pre-works and Stages 1 to 4. Pre-works cover all activities related to the background study, including in-depth research into the state of the art, definition of the statistical products that use EO data, and a study of the available data sources and toolkits. Stage 1 focuses on specifying the test area and collecting the EO data (ordering, downloading etc.) together with other types of data (administrative, cadastral etc.). Stage 2 is the preparation of the acquired and downloaded data for the main processing and analysis. This can mean unzipping the original data and importing it into the expected format, data reformatting, database re-shaping and extraction of the necessary information, radiometric and geometric correction of images, SAR pre-processing from single look complex or ground range data to calibrated sigma nought orthoimages, etc. Stage 3 is the development of the methods and procedures to be used for producing statistics. This stage includes data processing (e.g. image segmentation, classification, machine learning) and complex analysis of the results. Stage 4 is the last part and includes pilot production, validation and final conclusions.

Additionally, quality assessment (in dark blue in Figure 2.1) should be performed for Pre-works and Stages 1-3, and IT infrastructure (in green) and metadata (in aquamarine) issues should be addressed for Stages 1-3.

Figure 2.1 The methodological framework of WPH.

3. Satellite Earth Observation data sources

The tasks are based on EO data from various sources, some of which are common to several case studies. Each data source offers many different products. The products used vary from one case study to another, which is why detailed product specifications are included in the particular case studies. General information on the common data sources is given below.

Copernicus data

The Copernicus programme is the European programme for the establishment of a European capacity for Earth Observation. Copernicus includes satellite missions called Sentinels. Sentinel-1A/B, -2A/B, -3A/B and -6 are dedicated satellites, while Sentinel-4 and -5 are instruments on board EUMETSAT’s weather satellites. Figure 3.1 shows all Sentinels of the Copernicus Programme.

Figure 3.1 Sentinels of Copernicus Programme

Sentinel-1 operates day and night and performs C-band synthetic aperture radar imaging, which makes it possible to acquire imagery regardless of the weather. Sentinel-1 is a two-satellite constellation of Sentinel-1A (launched in 2014) and Sentinel-1B (launched in 2016), which provides acquisitions at a 6-day interval. Sentinel-1 works with four nominal operational modes on each spacecraft:

Stripmap mode (SM): 80 km swath, 5 m x 5 m resolution

Interferometric Wide Swath mode (IWS): 240 km swath, 5 m x 20 m resolution

Extra Wide Swath mode (EWS): 400 km swath, 25 m x 80 m resolution

Wave mode (WM): 20 km x 20 km, 20 m x 5 m resolution

Sentinel-2 consists of two polar-orbiting satellites (786 km above sea level), Sentinel-2A (since June 2015) and Sentinel-2B (since March 2017). Together they provide total coverage of the Earth with a 5-day repeat cycle and deliver images with a 290 km swath and a resolution of 10 to 60 m, depending on the spectral band, in bands ranging from the visible to the infrared. Sentinel-2 products contain surface reflectance data in a total of 13 spectral bands, 3 of which are in the short-wave infrared (SWIR) (Figure 3.2).

Figure 3.2 Sentinel-2 spectral bands. Source ESA.

The Sentinel-5 Precursor mission is dedicated to monitoring the atmosphere. It consists of one satellite carrying the TROPOspheric Monitoring Instrument (TROPOMI). The satellite was launched in 2017.

Access to Sentinels data

The open data policy adopted for the Copernicus programme provides access to Sentinel remote sensing data for all users after a simple registration. Sentinel data are available at no cost.

Sentinel data are available through the Copernicus Open Access Hub (https://scihub.copernicus.eu/). This platform provides full access to all Sentinel-1, -2 and -3 user products. One possibility is to interactively select an area and a time period of interest on the Copernicus Open Access Hub. Figure 3.3 shows how images can be searched for and filtered by cloud coverage, area and date.

Figure 3.3 Selection of area and time period of interest on the Copernicus Open Access Hub

Data can also be downloaded through the API Hub, a dedicated interface that gives users access from scripts.
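As a rough illustration of scripted access, the sketch below only composes a search URL that filters by platform, sensing period, area and cloud coverage, mirroring the interactive filters described above. The endpoint path, the query keywords (`platformname`, `beginposition`, `footprint`, `cloudcoverpercentage`) and the bounding polygon are assumptions made for illustration and should be checked against the hub's own API documentation; no request is sent.

```python
from urllib.parse import urlencode

# Assumed search endpoint of the API Hub (not stated in this report).
ASSUMED_SEARCH_ENDPOINT = "https://scihub.copernicus.eu/apihub/search"


def build_search_url(base_url, platform, start, end, wkt_polygon, max_cloud=None):
    """Compose a catalogue search URL; all keyword names are assumptions."""
    terms = [
        f"platformname:{platform}",
        f"beginposition:[{start}T00:00:00.000Z TO {end}T23:59:59.999Z]",
        f'footprint:"Intersects({wkt_polygon})"',
    ]
    # Cloud cover only makes sense for optical products such as Sentinel-2.
    if max_cloud is not None:
        terms.append(f"cloudcoverpercentage:[0 TO {max_cloud}]")
    query = " AND ".join(terms)
    return base_url + "?" + urlencode({"q": query})


# Illustrative bounding box (lon lat pairs), not an official boundary.
AREA_WKT = "POLYGON((19.1 53.1,22.8 53.1,22.8 54.5,19.1 54.5,19.1 53.1))"

url = build_search_url(
    ASSUMED_SEARCH_ENDPOINT, "Sentinel-2", "2017-10-01", "2018-09-30",
    AREA_WKT, max_cloud=30,
)
```

The resulting URL would then be requested with the user's registered credentials; keeping URL construction separate from the download step makes the filter logic easy to test offline.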

Landsat mission

Landsat, a joint program of the USGS and NASA, has been observing the Earth continuously since 1972. Landsat satellites currently provide global coverage at 30-meter resolution about once every two weeks, with multispectral and thermal data.

The Landsat satellites are a series of civil NASA Earth observation satellites for remote sensing of the Earth’s continental surface and coastal regions. Since 1972, eight satellites of this series have been launched (one of which failed to reach orbit), spread over four series. The latest satellite of the Landsat program is Landsat 8, launched by NASA in February 2013. It is equipped with the OLI and TIRS sensors, which deliver images in various spectral ranges of visible light and infrared with ground pixel resolutions of 15 to 100 m. The thermal bands 10 and 11 cover the range 10.6 – 12.5 µm, while the resolution in the visible and near infrared is 30 m.

Terra/Aqua MODIS

Terra and Aqua are joint Earth observing missions of the United States, Japan and Canada within NASA's ESE (Earth Science Enterprise) program. One of the sensors carried by Terra and Aqua is the Moderate Resolution Imaging Spectroradiometer (MODIS). MODIS has 36 channels between 0.44 µm and 15 µm with spatial resolution ranging from 250 m to 1 km. The objective of MODIS is to measure biological and physical processes on a global basis on time scales of 1 to 2 days. MODIS provides information on, for example, cloud and aerosol properties, surface temperature at 1 km resolution and chlorophyll concentration.

Access to data

Landsat-8 and MODIS data can be freely downloaded directly from the USGS service Earth Explorer (https://earthexplorer.usgs.gov/).

4. Report on thematic task 1 – Agriculture

4.1. Case study 1 - Crop recognition, mapping and monitoring

4.1.1. Pre-works

State of the art

Earth observation satellite systems have been providing information on the Earth’s surface for several decades. The number of active EO systems has increased significantly, to several hundred, which is over 30% of operational satellite systems. This is due to the constantly growing demand for multiscale geoinformation in many sectors of life. Thanks to technological development, the data are accurate, timely and easily accessible. Several official databases of EO missions and sensors show how large the market for satellite images is, for example the eoPortal of ESA - European Space Agency (https://directory.eoportal.org/web/eoportal/satellite-missions), the website of CEOS – Committee on Earth Observation Satellites (http://database.eohandbook.com/) and OSCAR, developed by WMO - World Meteorological Organization (https://www.wmo-sat.info/oscar/satellites). Over the years, international cooperation between individuals, organizations, institutions and the industrial sector has developed strongly, leading to the formation of many associations and societies. The most important international societies of remote sensing users are the Geoscience and Remote Sensing Society (GRSS), formed in 1961; the International Society for Photogrammetry and Remote Sensing (ISPRS), established under this name in 1980 but founded in 1910 as the International Society for Photogrammetry; the European Association of Remote Sensing Laboratories (EARSeL), founded in 1977; the Committee on Earth Observation Satellites (CEOS), established in 1984; the Group on Earth Observations (GEO), formed in 2005; and the UN Committee of Experts on Global Geospatial Information Management (UN-GGIM), established in 2011. A broad look at the market for satellite images shows that remote sensing techniques are widely used and continuously developing. One of the applications of remote sensing is official statistics. Brief summaries of the use of satellite imagery and earth observation technology in official statistics were recently presented in (UNECE-CES 2019; United Nation 2017; GSARS 2017).

The main thematic field in which remote sensing is used in official statistics is agriculture, which is also the subject of case study 1. Remote sensing can be used for many tasks, from cropland mapping, crop acreage estimation, biomass and yield estimation, and vegetation vigour and drought stress monitoring to monitoring of overwintering. Most of the research is based on observations over a longer period (Bargiel 2017; Demarez et al. 2019; Inglada et al. 2015; Navarro et al. 2017). This is reasonable because of the seasonality of crops and their phenology. Additionally, crop mapping is difficult because a single crop type can vary due to different cultivation practices, soil type or moisture. In the context of digital image classification, this is called intra-class variability. According to the literature, two types of satellite images are used for mapping agricultural areas: those gathered by passive sensors (Rufin et al. 2019; Feng et al. 2019; Dimitrov et al. 2019) and those gathered by active sensors (Bargiel 2017; Bargiel et al. 2014; Hütt and Waldhoff 2018). Optical sensors are passive and well suited to crop mapping and monitoring of growth conditions (Atzberger 2013). Unfortunately, optical sensors cannot provide information when it is cloudy. Active systems such as Synthetic Aperture Radar (SAR), on the other hand, generate their own energy at wavelengths of a few centimetres, which penetrates clouds and allows imaging regardless of weather conditions. In general, SAR and optical data both accurately reproduce crop growth cycles and may be combined to obtain full gap-free time series (Veloso et al. 2017; De Bernardis et al. 2016). In this study, combined data from optical and SAR systems are investigated. Two classification approaches can be found in the literature: pixel-based (Xie et al. 2019; Sonobe 2019; Navarro et al. 2017) and object-based (Peña-Barragán et al. 2011; Q. Li et al. 2015; Csillik et al. 2019). Since our study will use object-oriented administrative data, object-based classification will be performed. Machine learning algorithms such as support vector machine, random forest, decision tree, maximum likelihood and artificial neural network will be tested. Examples of implementations of these algorithms can be found in the literature (Feng et al. 2019; Gómez et al. 2019; Jia et al. 2012; Sonobe et al. 2015).
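To make the difference between pixel-based and object-based classification more concrete, the following minimal sketch aggregates per-pixel time series into one feature vector per parcel, which is the kind of input an object-based classifier would receive. It is an illustration under assumptions, not the procedure used in the case study: the parcel identifiers, the sample values and the choice of the mean as the aggregate are all hypothetical.

```python
from collections import defaultdict


def parcel_mean_features(pixel_records):
    """Average per-pixel time series within each parcel.

    pixel_records: iterable of (parcel_id, values), where values holds one
    backscatter or reflectance value per acquisition date.
    Returns {parcel_id: [mean value per date]}.
    """
    sums = {}
    counts = defaultdict(int)
    for parcel_id, values in pixel_records:
        if parcel_id not in sums:
            sums[parcel_id] = [0.0] * len(values)
        elif len(values) != len(sums[parcel_id]):
            raise ValueError("all pixels must share the same acquisition dates")
        for i, value in enumerate(values):
            sums[parcel_id][i] += value
        counts[parcel_id] += 1
    return {pid: [s / counts[pid] for s in total] for pid, total in sums.items()}


# Hypothetical pixels: two dates, two pixels in parcel_A, one in parcel_B.
pixels = [
    ("parcel_A", [0.10, 0.20]),
    ("parcel_A", [0.30, 0.40]),
    ("parcel_B", [0.05, 0.15]),
]
features = parcel_mean_features(pixels)
```

Each parcel then contributes a single row to the training table regardless of its size, so within-field variation is smoothed before any of the listed classifiers is applied.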

It is worth mentioning that ESA has ongoing projects related to agricultural crops:

Sen2Agri - Sentinel-2 for Agriculture (http://www.esa-sen2agri.org/). This project exploits optical satellite systems such as Sentinel-2 and Landsat 8 for agriculture monitoring.

Sen4CAP - Sentinels for Common Agricultural Policy (http://esa-sen4cap.org/). The project provides European and national stakeholders of the CAP with validated algorithms, products, workflows and best practices for agriculture monitoring relevant to the management of the CAP, based on Sentinel-1 and Sentinel-2 data. Sen4CAP was set up by ESA in direct collaboration with, and at the request of, DG-Agri, DG-Grow and DG-JRC.

SEN4Stat – Sentinels for Statistics (http://www2.rosa.ro/index.php/en/esa/oferte-furnizori/3153-sen4stat-sentinels-for-statistics). The aim of SEN4Stat is to facilitate the uptake of Sentinel data in NSOs in support of agricultural statistics. It will develop and demonstrate EO products as well as best practices for agricultural monitoring relevant to SDG reporting and to monitoring progress towards the SDGs at national scale.

Only Sen2Agri is currently operational; the other projects are still under development. Moreover, in 2014 ESA started the Thematic Exploitation Platforms (TEP), one of which is dedicated to food security and takes into account the use of EO data for agriculture monitoring. The TEP is a collaborative, virtual work environment providing access to EO data, tools and processors. This TEP platform is still being built.

Statistical product definition

The main purpose of the “Agriculture - Crop recognition, mapping and monitoring” case study is to use EO data gathered by the Sentinel-1 and Sentinel-2 satellites for agricultural crop mapping and area estimation in Northern European conditions. EO data combined with administrative geodata are a promising tool for statistics production. The pilot project demonstrates a methodology for using EO data with machine learning algorithms.

The main product of case study 1 is a crop map. The area of crops is estimated from this map. The crop map can support agricultural statistics in further projects, including crop yield and growth models.
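One simple way to derive crop areas from a classified map is pixel counting: the number of pixels assigned to each crop is multiplied by the ground area of one pixel. The sketch below assumes a 10 m pixel (the size to which the Sentinel-1 data are resampled) and uses made-up labels; the report does not state which estimator is actually applied, and pixel counting ignores classification errors, so this is only an illustration.

```python
from collections import Counter


def crop_areas_ha(class_labels, pixel_size_m=10.0):
    """Estimate the area per crop class in hectares by pixel counting."""
    # One 10 m x 10 m pixel covers 100 m2, i.e. 0.01 ha.
    pixel_area_ha = pixel_size_m * pixel_size_m / 10_000.0
    counts = Counter(class_labels)
    return {crop: n * pixel_area_ha for crop, n in counts.items()}


# Hypothetical classified pixels.
labels = ["rye"] * 250 + ["oat"] * 100
areas = crop_areas_ha(labels)
```

With parcel-level classification the same idea applies at object level: the areas of the parcels assigned to a crop are summed instead of counting pixels.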

Data source and toolkit

The list of satellite systems acquiring earth observation data is long. For detailed crop identification, systems with high and medium resolution (up to 30 m) should be chosen. Examples of SAR satellite systems that are or could be used for agriculture are shown in Figure 4.1.

Figure 4.1 Timelines of SAR satellite systems.

Case study 1 requires tools for three types of processing: satellite data processing, administrative data processing and machine learning. Many commercial as well as open source or free software packages are available on the market. The list of potential software is shown in Table 4.1.

Table 4.1 List of potential software to use.

Purpose; Type of software; Name; Type of data: SAR images, Optical images, Vector data, Text

Satellite data processing
Commercial ENVI/SARscape + + + +
Commercial GAMMA software + +
Commercial PCI Geomatics + + + +
Commercial Erdas Imagine + + +
Commercial TerrSET + + +
Commercial SARPROZ +
Open source ESA/SNAP + + +
Open source PolSARpro + +
Open source RAT (Radar Tools) +
Open source Sen2Agri + +
Open source Ilwis +
Open source MapReady + +

Administrative data processing
Commercial ArcGIS +
Open source QGIS +
Open source SAGA +
Open source GRASS +
Commercial Microsoft Office +
Open source Libre Office +

Machine learning
Commercial eCognition + + +
Commercial ENVI + + +
Open source ORFEO Toolbox + + +
Open source LEOworks + + +

4.1.2. Stage 1

Test site definition

The investigated area is the Warmian-Masurian Voivodship (24 173 km²), located in north-eastern Poland (Figure 4.2). It covers 7.7% of Poland’s area and is the fourth largest voivodship by area.

Figure 4.2 Geographical localization of the test area

Agricultural land makes up the largest part of the Warmian-Masurian Voivodship – 54.4% (mainly arable land – 66.4%, permanent pastures – 16.9% and permanent meadows – 12.2%). Forests cover 32.2% of the area. Warmia and Mazury has many lakes and rivers; water covers 5.7% of the voivodship. The soils of the test area are highly variable. The dominant soil types are brown soils (covering about 70% of the area) and hydrogenic soils (about 14% of the area). The arable land is dominated by good and medium soils in terms of agricultural usefulness. The soils are mainly of the 3rd and 4th quality classes, which cover 73.8% of the voivodship’s area. Twenty main types of crops are under investigation. The crops to be recognized are the following:

sugar beets

buckwheat

spring barley

winter barley

corn

cereal mixes

oat

fruit trees plantations

fruit bushes plantations

spring wheat

winter wheat

spring triticale

winter triticale

spring rape

winter rape

grassland

potatoes

rye

mustard

leguminous crops

Data collection

Two types of data are used for case study 1: satellite images and administrative geodata. For the pilot study, a long time series of Sentinel-1 images is the basis of the research, and Sentinel-2 optical images support the investigation.

The Sentinel-1 data gathered are Ground Range Detected (GRD) products acquired in Interferometric Wide Swath imaging mode and resampled to a 10 m pixel size. The products are acquired in dual polarization mode: VV (vertically transmitted and vertically received signal) and VH (vertically transmitted and horizontally received signal). This means that there are physically two images for each acquisition date. A single Sentinel-1 image covers 270 km by 200 km, so two scenes are needed for the whole test area. The Sentinel-1 data characteristics are shown in Table 4.2. The second type of image collected comes from the optical sensor of Sentinel-2. The Sentinel-2 images are bottom-of-atmosphere corrected images in cartographic geometry (processing level 2A). The optical data contain 13 bands at 10 m, 20 m or 60 m resolution, depending on the spectral band. A single Sentinel-2 image covers 100 km by 100 km; thus, twelve images are required to cover the whole test area. The Sentinel-2 data characteristics are shown in Table 4.2.

Table 4.2 Characteristics of Sentinel-1 and Sentinel-2 data.

a) Sentinel-1 data characteristics

Imaging mode IW

Product GRD

Relative orbit 51

Pass direction descending

Polarization VV/VH

Resolution 10m

Scene size 270 x 200 km

File size 1.5-2 GB per scene

b) Sentinel-2 data characteristics

Processing level 2A

Product S2 MSI 2A

Relative orbit 79

Pass direction descending

Resolution 10m

Tile size 100 x 100 km

File size 1 GB per tile

Figure 4.3 shows the footprints of the Sentinel images for the test area.

Figure 4.3 Footprints of Sentinel-1 images covering the test area are presented in red; footprints of Sentinel-2 images are shown in orange.

The analysed time series covers the 2018 vegetation season. The acquisition of satellite data started in October 2017, when the winter crops had emerged, and ended in September 2018, after the harvest. The satellite image collection includes 26 acquisitions of Sentinel-1 and 3 acquisitions of Sentinel-2. The timeline of acquisition dates is shown in Figure 4.4.

Figure 4.4 Timeline of Sentinel-1 and Sentinel-2 acquisitions.
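The figures given in this section allow a back-of-the-envelope estimate of the raw data volume to be handled. The calculation below is only a sketch: it assumes that every Sentinel-1 acquisition consists of the two scenes needed to cover the test area and that every Sentinel-2 acquisition consists of all twelve tiles, which the report does not state explicitly. The file sizes are those of Table 4.2.

```python
def volume_gb(acquisitions, units_per_acquisition, gb_per_unit):
    """Total volume in GB for a number of acquisitions of several scenes or tiles."""
    return acquisitions * units_per_acquisition * gb_per_unit


# Sentinel-1: 26 acquisitions, assumed 2 scenes each, 1.5-2 GB per scene.
s1_low_gb = volume_gb(26, 2, 1.5)
s1_high_gb = volume_gb(26, 2, 2.0)

# Sentinel-2: 3 acquisitions, assumed 12 tiles each, 1 GB per tile.
s2_gb = volume_gb(3, 12, 1.0)

# Two polarizations (VV, VH) per Sentinel-1 date give this many SAR layers.
sar_layers = 26 * 2
```

Under these assumptions the Sentinel-1 archive amounts to roughly 78 to 104 GB and the Sentinel-2 archive to about 36 GB, and the SAR time series provides 52 layers per location.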

The second type of data used for case study 1 is geodata obtained from administrative sources: the Land Parcel Identification System (LPIS), the Agency for Restructuring and Modernisation of Agriculture (ARMA) and Statistics Poland. LPIS provides details of feature boundaries and land use information in vector format. ARMA provides information on the crops declared by farmers, and Statistics Poland provides information on crop type from field campaigns (in-situ measurements).

4.1.3. Stage 2

Data pre-processing

Pre-processing of the satellite data is a necessary step before further analysis and classification. The Sentinel-1 data are images in SAR geometry and contain uncalibrated radar backscatter values. Pre-processing therefore consists of radiometric and geometric transformations that produce orthorectified sigma nought maps. In this case study it was done using the open source software SNAP 7.0. The pre-processing workflow is presented in Figure 4.5.

Figure 4.5 Scheme of Sentinel-1 data pre-processing

The pre-processing of Sentinel-1 data includes:

• Selecting an image subset when the test area is smaller than half of one Sentinel-1 scene, because pre-processing the whole scene is more time-consuming
• Thermal noise removal
• GRD border noise removal: masking the "no-value" samples efficiently with a thresholding method
• Radiometric calibration to sigma nought in linear scale
• Slice assembly if the area of interest is located on the border of two consecutive images along track
• Sub-pixel coregistration of SAR images
• Speckle filtering based on time series, with respect to each polarization
• Stack creation: joining the two datasets after multitemporal filtering
• Geocoding: geometric correction to a cartographic system using a digital elevation model (SRTM)
• Spatial subset selection in cartographic coordinates
• Conversion from linear scale to logarithmic scale in decibels
• Additional median filtering with a window size of 3 by 3 pixels
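The last two steps of the list above can be illustrated outside SNAP. The following is only a minimal NumPy sketch, not the SNAP implementation: the function names, the clipping floor and the edge-padding rule at the image border are assumptions made for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def linear_to_db(sigma0_linear, floor=1e-6):
    """Convert sigma nought from linear power scale to decibels."""
    # Clip to a small positive floor (an assumption) so log10 never receives zero.
    return 10.0 * np.log10(np.clip(sigma0_linear, floor, None))


def median_filter_3x3(image):
    """Apply a 3 x 3 median filter; border pixels reuse the nearest edge value."""
    padded = np.pad(image, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.median(windows, axis=(-2, -1))


# Toy 3 x 3 sigma nought patch with one bright outlier in the centre.
sigma0 = np.array([[0.10, 0.10, 0.10],
                   [0.10, 1.00, 0.10],
                   [0.10, 0.10, 0.10]])

sigma0_db = median_filter_3x3(linear_to_db(sigma0))
print(sigma0_db)
```

The order follows the list: values are first converted to decibels and then smoothed, so the single outlier in the toy patch is replaced by the median of its neighbourhood.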

An example of the pre-processing result is shown in Figure 4.6.

Figure 4.6 Sentinel-1 colour composition of Warmian-Mazurian Voivodeship (R: 9 May 2018, G: 8 June 2018, B: 7 August 2018; pixel size: 10x10m; polarization VH)

For Sentinel-2 data, the key pre-processing steps are image mosaicking and cloud masking. The pre-processing is performed using SNAP 7.0, following the steps presented in Figure 4.7. The main outputs are the 4 VNIR bands and the calculated NDVI images.
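For reference, NDVI is the normalized difference of the near-infrared and red bands, here B8 and B4. The sketch below shows the calculation with NumPy under stated assumptions: the reflectance values are made up, the optional cloud mask is a hypothetical boolean array, and the case study itself computed NDVI in SNAP.

```python
import numpy as np


def ndvi(nir, red, cloud_mask=None):
    """Return (NIR - Red) / (NIR + Red), with NaN where the result is undefined."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denominator = nir + red
    result = np.full(denominator.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, denominator, out=result, where=denominator != 0)
    if cloud_mask is not None:
        # Pixels flagged as cloud (hypothetical mask) are excluded.
        result[np.asarray(cloud_mask, dtype=bool)] = np.nan
    return result


# Made-up B8 (NIR) and B4 (red) reflectances for a 2 x 2 patch.
b8 = np.array([[0.5, 0.2], [0.0, 0.3]])
b4 = np.array([[0.1, 0.2], [0.0, 0.1]])
ndvi_patch = ndvi(b8, b4)
print(ndvi_patch)
```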

Figure 4.7 Scheme of Sentinel-2 data pre-processing

An example of the Sentinel-2 pre-processing result is shown in Figure 4.8.

Figure 4.8 Sentinel-2 natural colour composition of Warmian-Mazurian Voivodeship (R: B4, G: B3, B: B2 of 7 June

2018; pixel size: 10x10m)

The full list of images gathered for further analysis, 72 in total, is given in Table 4.3.

Table 4.3 List of output images after pre-processing.

1 Sigma0_VH_2017_10_17_db 25 Sigma0_VH_2018_08_31_db 49 Sigma0_VV_2018_08_19_db

2 Sigma0_VH_2017_10_23_db 26 Sigma0_VH_2018_09_06_db 50 Sigma0_VV_2018_08_25_db

3 Sigma0_VH_2018_04_03_db 27 Sigma0_VV_2017_10_17_db 51 Sigma0_VV_2018_08_31_db

4 Sigma0_VH_2018_04_09_db 28 Sigma0_VV_2017_10_23_db 52 Sigma0_VV_2018_09_06_db

5 Sigma0_VH_2018_04_15_db 29 Sigma0_VV_2018_04_03_db 53 B2_Blue_443nm_2018_03_19

6 Sigma0_VH_2018_04_21_db 30 Sigma0_VV_2018_04_09_db 54 B3_Green_560nm_2018_03_19

7 Sigma0_VH_2018_04_27_db 31 Sigma0_VV_2018_04_15_db 55 B4_Red_665nm_2018_03_19

8 Sigma0_VH_2018_05_03_db 32 Sigma0_VV_2018_04_21_db 56 B8_NIR_842nm_2018_03_19

9 Sigma0_VH_2018_05_09_db 33 Sigma0_VV_2018_04_27_db 57 B2_Blue_443nm_2018_04_13

10 Sigma0_VH_2018_05_15_db 34 Sigma0_VV_2018_05_03_db 58 B3_Green_560nm_2018_04_13

11 Sigma0_VH_2018_05_21_db 35 Sigma0_VV_2018_05_09_db 59 B4_Red_665nm_2018_04_13

12 Sigma0_VH_2018_05_27_db 36 Sigma0_VV_2018_05_15_db 60 B8_NIR_842nm_2018_04_13

13 Sigma0_VH_2018_06_08_db 37 Sigma0_VV_2018_05_21_db 61 B2_Blue_443nm_2018_05_08

14 Sigma0_VH_2018_06_14_db 38 Sigma0_VV_2018_05_27_db 62 B3_Green_560nm_2018_05_08

15 Sigma0_VH_2018_06_20_db 39 Sigma0_VV_2018_06_08_db 63 B4_Red_665nm_2018_05_08

16 Sigma0_VH_2018_06_26_db 40 Sigma0_VV_2018_06_14_db 64 B8_NIR_842nm_2018_05_08

17 Sigma0_VH_2018_07_02_db 41 Sigma0_VV_2018_06_20_db 65 B2_Blue_443nm_2018_06_07

18 Sigma0_VH_2018_07_08_db 42 Sigma0_VV_2018_06_26_db 66 B3_Green_560nm_2018_06_07

19 Sigma0_VH_2018_07_14_db 43 Sigma0_VV_2018_07_02_db 67 B4_Red_665nm_2018_06_07

20 Sigma0_VH_2018_07_20_db 44 Sigma0_VV_2018_07_08_db 68 B8_NIR_842nm_2018_06_07

21 Sigma0_VH_2018_08_01_db 45 Sigma0_VV_2018_07_14_db 69 NDVI_2018_03_19

22 Sigma0_VH_2018_08_07_db 46 Sigma0_VV_2018_07_20_db 70 NDVI_2018_04_13

23 Sigma0_VH_2018_08_19_db 47 Sigma0_VV_2018_08_01_db 71 NDVI_2018_05_08

24 Sigma0_VH_2018_08_25_db 48 Sigma0_VV_2018_08_07_db 72 NDVI_2018_06_07

Geodata from administrative sources have also been pre-processed. The ingestion of geodata involves reformatting, database re-shaping and extraction of the necessary information. In this case study, two tasks are required. The first task is to select the area used only for agricultural practices. For this purpose, information from cadastral parcels (LPIS) and land use (ARMA) was combined (Figure 4.9).

Figure 4.9 Cadastral parcels borders on the left (red lines), Land Use borders in the middle (white lines), border

of agricultural parcels on the right (blue lines filled with white).

The second task is to choose representative parcels of each crop type for supervised classification. For this task, information on crops declared by farmers (ARMA) and cadastral parcels (LPIS) was used. The parcels were chosen according to three conditions: i) the parcel is larger than 1 ha, ii) there is one type of crop on one cadastral parcel, iii) the declared crop type covers at least 98% of the cadastral parcel (Figure 4.10).
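The three selection conditions can be written as a simple filter. The sketch below is illustrative only: the `Parcel` record, its field names and the sample values are hypothetical and do not reflect the actual LPIS or ARMA data model.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    parcel_id: str
    area_ha: float          # cadastral parcel area in hectares
    declared_crops: dict    # crop name -> declared area in hectares


def is_representative(parcel, min_area_ha=1.0, min_cover=0.98):
    """Apply the three conditions used to select training parcels."""
    # i) the parcel is larger than 1 ha
    if parcel.area_ha <= min_area_ha:
        return False
    # ii) exactly one crop type is declared on the cadastral parcel
    if len(parcel.declared_crops) != 1:
        return False
    # iii) the declared crop covers at least 98% of the cadastral parcel
    (declared_area,) = parcel.declared_crops.values()
    return declared_area / parcel.area_ha >= min_cover


# Hypothetical sample parcels.
parcels = [
    Parcel("A", 2.5, {"winter wheat": 2.48}),
    Parcel("B", 0.8, {"corn": 0.8}),
    Parcel("C", 3.0, {"oat": 1.5, "rye": 1.5}),
    Parcel("D", 4.0, {"corn": 3.0}),
]

selected = [p.parcel_id for p in parcels if is_representative(p)]
print(selected)
```

Only parcel A passes all three conditions; B is too small, C has two crops, and the declared crop on D covers only 75% of the parcel.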

Figure 4.10 Examples of representative parcels (green) and excluded parcels (red) on colour composition of

Sentinel-1 (R: 9 May 2018 VH, G: 8 Jun 2018 VH, B: 7 Aug 2018 VH).

The output of this task is a set of parcel boundaries that facilitate image segmentation, object aggregation and validation. A summary of the selected parcels is given in Table 4.4.

Table 4.4 Summary of the selected parcels for image classification.

Crop type                   Number of samples   Sum of area samples [ha]   % of agriculture area
sugar beets                 75                  520                        0.04
buckwheat                   268                 1174                       0.10
spring barley               848                 3835                       0.32
winter barley               156                 539                        0.04
corn                        1163                5451                       0.45
cereal mixes                553                 1862                       0.15
oat                         610                 2577                       0.21
fruit trees plantations     91                  291                        0.02
fruit bushes plantations    85                  227                        0.02
spring wheat                1097                5618                       0.46
winter wheat                2250                11404                      0.94
spring triticale            222                 626                        0.05
winter triticale            1422                6033                       0.50
spring rape                 147                 792                        0.07
winter rape                 1250                7335                       0.60
grassland                   12013               58108                      4.78
potatoes                    87                  225                        0.02
rye                         878                 3264                       0.27
mustard                     75                  218                        0.02
leguminous crops            755                 4097                       0.34
SUM                         24045               114195                     9.40

4.1.4. Stage 3

Main data processing

To map agricultural crops and estimate their area, which is the goal of Case Study 1, object-based image classification will be performed. Because the analysis is based on time series, the subsets of images most suitable for object-based classification will be selected first. The selection depends on the quality of the images and the information they provide. Merging temporal and polarimetric features of the crops should make it possible to extract subsets that give maximum separability of crops with a greatly reduced data volume. This task is critical for the final results. It should show which crops can be effectively recognized and mapped using very long Sentinel-1A/B time series.

To achieve these goals, object-based classification using machine learning algorithms will be performed. Key elements of this task are:

• testing the parameters of the mean shift segmentation algorithm for delineating homogeneous areas (segments) on preprocessed Sentinel-1 and Sentinel-2 data with the open source CNES Orfeo ToolBox software,

• testing the parameters of machine learning algorithms (support vector machine, decision tree, artificial neural network, random forest and KNN classifiers) to obtain the best accuracy for crop recognition with the open source CNES Orfeo ToolBox software.
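Object-based classification works on per-segment features rather than on individual pixels. The sketch below illustrates only that aggregation idea with NumPy; it is not the Orfeo ToolBox workflow. It assumes that a segmentation is already available as an integer label raster, and the toy arrays and function name are invented for the example.

```python
import numpy as np


def segment_means(stack, segments):
    """Average every band of an image stack over each segment.

    stack:    array of shape (bands, rows, cols)
    segments: integer label raster of shape (rows, cols) with labels 0..n-1,
              each label occurring at least once (an assumption)
    returns:  array of shape (n_segments, bands)
    """
    labels = segments.ravel()
    n_segments = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=n_segments)
    sums = np.stack(
        [np.bincount(labels, weights=band.ravel(), minlength=n_segments)
         for band in stack],
        axis=1,
    )
    return sums / counts[:, None]


# Toy stack with two bands (e.g. two acquisition dates) and two segments.
stack = np.array([[[1.0, 3.0], [5.0, 7.0]],
                  [[10.0, 10.0], [20.0, 40.0]]])
segments = np.array([[0, 0],
                     [1, 1]])

features = segment_means(stack, segments)
print(features)
```

Each row of `features` is the feature vector of one segment and could be passed to any of the classifiers listed above.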

4.2. Case study 2 - Monitoring of the off-season vegetation cover

4.2.1. Pre-works

State of the art

Radar remote sensing data enable the classification of the arable land surface at critical moments, such as in spring when the snow is thawing. Passive optical satellite sensors, e.g. on Sentinel-2, rely on the sun as the illumination source and cannot record surface reflectance through clouds, water vapour and aerosols. A radar satellite such as Sentinel-1 is an active system, which allows it to operate day and night, and because radar uses longer wavelengths it can also penetrate clouds. Radar is also well suited to separating surfaces based on their roughness and water content. Thus, it can be assumed that bare, snow-free arable land can be distinguished with fairly good accuracy from land covered in vegetation in the time window between snow thaw and the beginning of the growing season.

Operating at microwave wavelengths, the radar beam can also penetrate the soil to some extent. Therefore, soil properties affect the signal and should be included in the model. Whereas radar is sensitive to the roughness and water content of the surface, optical reflectance in the short-wave infrared (SWIR) region responds to cellulose absorption by dead vegetation, which makes it sensitive to plant residue levels on the soil surface. For example, the Normalized Difference Tillage Index (NDTI) could give the model additional information on the residue cover of the soil. In summary, integrating Sentinel-1 and Sentinel-2 imagery is a promising approach to classifying soil cover.

Statistical product definition

In Finland, the statistics on the structure of agricultural and horticultural enterprises are produced by the Natural Resources Institute Finland. The statistics include information such as the number of enterprises, land use, type of production, and educational level of the owners of agricultural holdings. Regulation (EU) 2018/1091 on integrated farm statistics provides the framework for the statistics on

the structure of agricultural and horticultural enterprises. According to a new implementing regulation, a new variable will be introduced in 2023: the proportion of agricultural area under vegetation cover in winter. Large-scale information on off-season vegetation cover will improve the estimation of soil erosion and nutrient loads to water bodies. The statistics on off-season vegetation cover would also relate to the UN's Sustainable Development Goals, Indicator 2.4.1: Proportion of agricultural area under productive and sustainable agriculture. This method would provide grounds for establishing an indicator on sustainable agriculture, as land management practices are closely related to sustainability. The statistical product would report the proportion of agricultural area under vegetation cover in winter at regional level.

Data source & toolkit

The spaceborne radar remote sensing data used in this study are acquired by the Sentinel-1 mission. The Sentinel-1 data have been radiometrically and geometrically corrected and are provided on the Finnish Geospatial Platform as 11-day mosaics covering Finland with a spatial resolution of 20 meters. The Sentinel-1 data are preprocessed by the National Satellite Data Center (NSDC) at the Finnish Meteorological Institute. NSDC also provides preprocessed NDTI mosaics from the Sentinel-2 mission at a spatial resolution of 60 meters.

Sentinel-1 data and Sentinel-2 based vegetation indices are examples of preprocessed data sets made

publicly available on the Finnish Geospatial Platform. The platform collects spatial data from various

providers and makes them openly available to users. The aim of the platform is to harmonise and

improve services provided by the public administration, to improve data-based decision-making and

increase transparency, as well as to save public administration costs, e.g. by enabling the efficient

maintenance of data resources, removing overlapping activities and harmonising datasets. The

responsible party of the Geospatial Platform project is the Ministry of Agriculture and Forestry.

Participating in the preparation and implementation of the project are the Ministry of Finance, the

Ministry of the Environment, the Finnish Environment Institute, the National Land Survey of Finland and

other partners from the private and public sectors.

As reference data, we use existing administrative data from the national Integrated Administration and Control System (IACS) operated by the Finnish Agency for Rural Affairs. IACS provides open data on annual agricultural land use (Land Parcel Identification System, LPIS) and agricultural payment entitlements. Under the EU’s Common Agricultural Policy, the Agri-Environmental Support (AES) Schemes provide financial support for Member States to design and implement agri-environment measures. Farmers who subscribe, on a voluntary basis, to environmental commitments related to the preservation of the environment and maintenance of the countryside receive payments. IACS also contains data on agri-environment measures at field parcel level. Several measures commit farmers to off-season vegetation cover. For each field parcel, we set a vegetation cover type based on the information on the farmer's environmental commitments. To combine satellite and reference data, field parcel geometries (LPIS) are used to extract backscatter intensity values from the satellite data per field parcel. All IACS data used in this study are openly available upon application, subject to a non-disclosure agreement.

Our initial plan was to utilize ancillary data on precipitation, temperature and soil properties. The idea was that meteorological data would let us determine when the time window between snow thaw and the beginning of the growing season occurs each year. When we learned that preprocessed 11-day mosaics are available in analysis-ready format, which effectively solve the problem of changing weather conditions within the time window, we decided to use the mosaics instead. Moreover, we found a meteorological product giving the date of the start of the growing season. The date helps to set the time window suitable for

monitoring the soil cover regionally. In Southern Finland the 11-day mosaic would be around the 11th or 21st of April, in the North around the 1st or 11th of May. The meteorological product giving the date of the start of the growing season is also open data from the Finnish Meteorological Institute.

Data on soil properties are freely available from the Finnish Soil Database. Given the spatial variability of soil properties at field scale, we need to aggregate them to one class per parcel. Our plan is to append the soil data only after the first modelling results. The hypothesis is that adding soil type may improve the model.

All data in this project is processed with open source Python libraries. Python is suitable for spatial data

and also for downstream data analysis. Note that radar satellite images were already preprocessed, and

we can use products in analysis ready format.

4.2.2. Stage 1

Test site definition

The area of interest (AOI) comprises 23200 field parcels totalling 91500 ha of arable land in the primary agricultural production region in southwestern Finland. Approximately 93% of the arable land is on mineral soils and 7% on organic soils.

Data collection

In this project, data will be retrieved from a WMS service (GeoTIFFs), by email (IACS data) and from a

database (meteorological data). Files can be saved on standard PC storage.

4.2.3. Stage 2

Data pre-processing

The AOI was masked using LPIS. Parcels smaller than 1 ha or with holes were masked out. The resulting set of field parcels was merged with the AES data. The soil cover class was decided based on the variables of the agri-environment measures subscribed to for a parcel. If a parcel was not subscribed to any measure, it was considered potentially ploughed, that is, bare soil. The following year’s parcel data were checked to confirm that no autumn crop had been sown. In the end, we decided that if the parcel was growing a spring crop the following year, it had most probably been ploughed. After preprocessing we have 15000 parcels covering 59000 ha of arable land that have vegetation cover in the winter, either actual vegetation or vegetation residues (reduced tillage), and 8300 ploughed parcels covering 35000 ha.
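The labelling rule described above can be summarised as a small decision function. This is a hedged sketch only: the measure codes, the mapping from measures to classes and the `next_year_crop_season` field are hypothetical, since the report does not list the actual AES variables.

```python
# Hypothetical mapping from agri-environment measure codes to soil cover classes.
MEASURE_TO_CLASS = {
    "winter_vegetation": "vegetation cover",
    "reduced_tillage": "reduced tilling",
}


def soil_cover_class(aes_measure, next_year_crop_season):
    """Derive a soil cover label for one parcel.

    aes_measure:           measure code, or None if the parcel has no measure
    next_year_crop_season: "spring" or "autumn", taken from next year's parcel data
    returns:               a class label, or None if the parcel cannot be labelled
    """
    if aes_measure is not None:
        return MEASURE_TO_CLASS.get(aes_measure)
    # No measure: potentially ploughed. Accept it as bare soil only if a
    # spring crop follows, i.e. no autumn crop was sown.
    if next_year_crop_season == "spring":
        return "bare soil"
    return None


print(soil_cover_class("reduced_tillage", "spring"))
print(soil_cover_class(None, "spring"))
print(soil_cover_class(None, "autumn"))
```

Parcels for which the function returns `None` would be left out of the reference set in this sketch.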

We use Sentinel-1 ground range detected mosaics in two polarization channels: horizontally polarized backscatter of a vertically polarized radar pulse (VH) and vertically polarized backscatter of a vertically polarized radar pulse (VV). There is usually an overpass over Finland every 2-3 days. Because several measurements are combined over 11 days, we obtain the maximum, minimum, mean and standard deviation of these measurements. Sentinel-2 optical sensors acquire 13 spectral bands in the visible, near-infrared and short-wave infrared (SWIR) wavelengths; the data used here have a spatial resolution of 60 m. After preprocessing, NDTI is calculated from SWIR bands 11 and 12. The 15-day mosaics contain the index averaged per pixel.
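The mosaics used in this case study come preprocessed from NSDC, so the following NumPy sketch is only meant to make the two calculations concrete. It assumes a stack of overpasses with NaN for missing observations and uses the usual NDTI definition, (B11 - B12) / (B11 + B12); the function names and sample values are invented.

```python
import numpy as np


def mosaic_statistics(observations):
    """Per-pixel statistics over a stack of overpasses.

    observations: array of shape (n_overpasses, rows, cols), NaN = no data
    """
    return {
        "max": np.nanmax(observations, axis=0),
        "min": np.nanmin(observations, axis=0),
        "mean": np.nanmean(observations, axis=0),
        "std": np.nanstd(observations, axis=0),
    }


def ndti(b11, b12):
    """Normalized Difference Tillage Index from the two SWIR bands."""
    b11 = np.asarray(b11, dtype=float)
    b12 = np.asarray(b12, dtype=float)
    denominator = b11 + b12
    result = np.full(denominator.shape, np.nan)
    np.divide(b11 - b12, denominator, out=result, where=denominator != 0)
    return result


# Three made-up VH backscatter values (dB) for one pixel within an 11-day window.
vh_stack = np.array([[[-12.0]], [[-10.0]], [[-14.0]]])
stats = mosaic_statistics(vh_stack)

# Made-up SWIR reflectances for one pixel.
ndti_value = ndti(np.array([0.30]), np.array([0.20]))
print(stats["mean"], ndti_value)
```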

4.2.4. Stage 3

Main data processing

For each parcel we have a distribution of backscatter coefficients for the two radar polarisations VV and VH, their difference, and NDTI. The target is a class variable with three soil cover values: “bare soil”, “reduced tilling” and “vegetation cover”. For the classification task, we divide the data set into training, validation and test sets in a 60-20-20 ratio to train a convolutional neural network, and into training and test sets in an 80-20 ratio for a Random Forest baseline.
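The two splits can be sketched with the Python standard library. The report does not state whether the split is stratified by class, so a plain random split is assumed here; the function name, the seed and the sample count are arbitrary choices for this example.

```python
import random


def split_indices(n_samples, fractions, seed=0):
    """Shuffle sample indices and cut them into consecutive parts.

    fractions: e.g. (0.6, 0.2, 0.2); the last part takes whatever remains,
               so rounding never drops a sample.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    parts, start = [], 0
    for fraction in fractions[:-1]:
        stop = start + round(fraction * n_samples)
        parts.append(indices[start:stop])
        start = stop
    parts.append(indices[start:])
    return parts


n_parcels = 100  # arbitrary number of parcels for the example

# 60-20-20 split for the convolutional neural network.
train, validation, test = split_indices(n_parcels, (0.6, 0.2, 0.2))

# 80-20 split for the Random Forest baseline.
rf_train, rf_test = split_indices(n_parcels, (0.8, 0.2))

print(len(train), len(validation), len(test), len(rf_train), len(rf_test))
```

Using the same seed for both calls keeps the shuffled order identical, so the Random Forest test set coincides with the network's test set in this sketch.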

4.3. Case study 3 - Crop recognition with very high-resolution aerial data

4.3.1. Pre-works

Introduction

Research question

The use of satellite data with an intermediate resolution (10 m for Sentinel-1/2) for agriculture and more

specifically for crop recognition might not be adequate in areas where the size of parcels is relatively

small and diversity of crops is relatively high. In those cases, aerial photography data with a higher

resolution would offer a solution. This case study addresses two concrete questions:

• Data availability: are aerial photography data available for official statistics, with characteristics and conditions of use that allow them to be used realistically?

• Testing in practice whether high-resolution aerial photography data can be used for crop recognition at a more diversified, detailed and small-scale level, using machine learning algorithms.

Obviously, the second objective can only be attained after the first one has been addressed successfully.

Concrete approach

Belgium is a federal state where competencies linked to territory (such as environment, agriculture,

land use, zoning regulations, construction and housing, GIS, …) are exercised at the level of Belgium’s

regions Flanders, Wallonia and Brussels. In order to tackle both research questions, Statbel opted to

limit itself to Flanders, partnering with Statistics Flanders.

Flanders has 6.6 million inhabitants (about 58% of Belgium’s population) for a surface area of 13,500

km², resulting in a high population density of about 490 persons per km². As a result, its agricultural area

(7,425 km² or 55%) consists of a great number of relatively small plots which may vary considerably as

to the crops under cultivation. Consequently, the use of satellite data with a fairly low resolution is

probably less effective to assess crops than in predominantly agricultural regions characterised by large

plots and monocultures.

Statistics Flanders, being interested in the exploitation of satellite and aerial imagery for various

statistical purposes, has agreed to act as ‘unofficial’ non-refunded partner in WPH. Its role is essential

for connecting to the departments and units in the Flemish administration owning satellite and aerial

photography data and/or using machine learning/deep learning/artificial intelligence to analyse these

data.

Data availability

In order to assess the availability of aerial photography data with the required resolution, frequency and

access conditions, Statbel and Statistics Flanders had e-mail and face-to-face exchanges with the major

organisations and units responsible within the Flemish administrations: EODaS (Earth Observation Data

Science) programme of the Department Informatie Vlaanderen (Information Flanders), VITO (Vlaams

Instituut voor Technologisch Onderzoek, Flemish Institute for Technological Research) and the GIS

unit/Geoloket (Geocounter) of the department Agriculture and Fisheries.

EODaS Programme, Ghent (BE)

https://overheid.vlaanderen.be/informatie-vlaanderen/producten-diensten/earth-observation-data-science (Dutch)

The EODaS programme manages the collection of earth observation data for Flanders and their

dissemination as open data in a standardised way via web services.

EODaS has several potentially useful datasets freely available (for Sentinel data, see below, VITO):

• 10-yearly high-resolution aerial photography (RGB) data, resolution 10 cm, and LiDAR data, resolution 25 cm, collected 2013-2015 for DHMVII (Digitaal Hoogtemodel Vlaanderen, Digital Elevation Model Flanders);

• triannual aerial photography datasets, summer, resolution 40 cm;

• annual aerial photography datasets, winter, resolution 25 cm.

Data can be accessed via:

• the ‘Orthophoto mosaic webservice’ for aerial photography (http://www.geopunt.be/catalogus/webservicefolder/418e8e4a-12c1-80a8-8306-fcf4-799c-581d-c4e38594), with among others the most recent aerial photography dataset 2019.02, ground resolution 25 cm (http://www.geopunt.be/catalogus/datasetfolder/50134be3-f0cd-47c5-8f6e-4a0936287947);

• the OpenData Viewer for LiDAR raw remote sensing data collected 2013-2015 (https://remotesensing.vlaanderen.be/apps/openlidar/), including RGB.

Other Links

• Beeldverwerkingsketen (BVK) (Image processing chain): https://overheid.vlaanderen.be/bvk-algemeen

• LiDAR DHMV (Digitaal Hoogtemodel Vlaanderen, Digital Elevation Model Flanders): http://www.geopunt.be/catalogus/datasetfolder/7e40413e-9c17-492b-ac24-e72d37251e5a

• Geopunt, central Inspire-compliant gateway to Flemish geographic government data made accessible to government agencies, citizens, organizations and companies: http://www.geopunt.be/over-geopunt

• Presentation (Dutch) Source Data DHMV: https://overheid.vlaanderen.be/sites/default/files/media/documenten/informatie-vlaanderen/producten/BVK/documenten/Infosessie_BVKOpenData_LiDAR_21022017%20%281%29.pdf

VITO, Ghent (BE)

VITO (Vlaams Instituut voor Technologisch Onderzoek, Flemish Institute for Technological

Research) undertakes research projects of public interest in various domains, among others

using remote sensing (https://vito.be/nl/technologie-groep/remote-sensing), with some 90

persons employed. They collaborate extensively with EODaS which provides data, but apart

from that they collect or buy remote sensing data (satellite, airplane manned or unmanned,

drone) paid for by projects’ clients.

They provide free access to Sentinel satellite data via the Terrascope data platform

(https://remotesensing.vito.be/case/terrascope).

Agriculture and Fisheries – GIS & Geocounter (Geoloket), Brussels (BE)

Landbouw en Visserij - Geoloket Landbouw (Agriculture and Fisheries - Geocounter Agriculture)

conducts agricultural surveys on crop areas under cultivation, for the calculation of EU CAP

subsidies. To that end, they use machine learning algorithms to automatically recognise crops

as far as is possible; when this proves unfeasible, the traditional method of surveying is used

and its results are fed into the AI algorithm to improve it gradually. LV (Landbouw en Visserij) uses data provided by EODaS. See https://www.landbouwvlaanderen.be/eloket/Domain.Eloket.Portaal.Wui/Content/Help/132.Geoloket%20landbouw/geoloket.htm (Dutch only).

Conclusions

A large amount of fairly recent Earth observation data is freely available online as open data, including high-resolution and very-high-resolution aerial photography. Unfortunately, these images cannot be used for crop recognition via machine learning, because they are recorded at too low a frequency (annually and tri-annually, and even 10-yearly for the highest-resolution data) to constitute the time series needed for analysis. Furthermore, many images cannot be used for crop recognition because they were recorded at a time unfit for this purpose (during winter).

Analysis

Because the first subtask, assessing the data situation, came to the conclusion that high-resolution aerial

photography data are not readily available with the required frequency, the next foreseen step of

testing machine learning methods to analyse the data could not be taken.

Nevertheless, a first assessment of the data science capabilities of the various partners to address the

research question was conducted.

Statbel

Statbel has personnel capable of performing this type of analysis, but all are more than fully occupied

with operational business. Although there is awareness of the need to invest in building AI and machine

learning capacity, at this moment the budgetary situation does not allow doing so. Other urgent

priorities unfortunately have to take precedence.

Statistics Flanders

The recruitment of experts able to conduct the type of analysis required by this task is planned.

VITO

VITO has machine learning expertise and in fact has executed or is executing various quite similar

projects using aerial photography (e.g., detecting asbestos roofs from aerial photography, creating Solar

Map of Flanders). However, and although these experts are perfectly willing to provide advice,

comments and feedback when asked, this analytical capacity is only available against payment.

Dept. Agriculture and Fisheries – GIS/Geocounter

The Department Agriculture and Fisheries of the Flemish administration conducts agricultural surveys

on crop areas under cultivation, for the calculation of EU CAP subsidies. They use machine learning algorithms to determine as much as possible of the land use and, where possible, the crops from satellite images and aerial photography, thus greatly reducing the need to survey farmers. These survey responses are then fed

back into the AI algorithms to improve them. LV is also willing to provide advice, comments and feedback, but their core business is of course not conducting projects. Moreover, their need for additional precision is not high, as they always have the option to ask for information about specific crops, and they need to do so to obtain the level of precision required for the correct assignment of EU subsidies.

Annotated literature overview

Another potentially useful outcome of Case study 3 is a commented overview of reports, documents

and webpages on similar applications and projects by the organisations and units contacted in the

course of the case study.

EODaS, VITO: Detection of asbestos-containing roofs from the sky via artificial neural networks (AI) - https://overheid.vlaanderen.be/asbestdaken-monitoren-vanuit-de-lucht-aan-de-hand-van-ai (Dutch)

Asbestos was used as a building material in many houses and buildings in Flanders dating from the 1970s and 1980s, among other things as slate or corrugated-sheet roof covering. Because of the health hazard of

asbestos, OVAM (the Flemish Public Waste Agency) is elaborating an asbestos removal plan, part of

which is the gradual replacement of asbestos-containing slates and corrugated sheets. In order to create

an inventory of asbestos-containing roofs in Flanders, OVAM has turned to remote sensing expertise

provided by EODaS and VITO. To this end the high-resolution aerial images obtained in the context of

the Flanders Digital Height Model (DHMVII 2013-2015) are being analysed with deep-learning

algorithms within the EODaS Machine Learning workflow.

EODaS, VITO: Unmanned aircraft for the operations of VLM (Flemish Land Agency) -

https://overheid.vlaanderen.be/onbemande-vliegtuigen-voor-de-werking-van-de-vlm (Dutch)

EODaS, VITO, VLM and ANB (Agency for Nature and Forests) are analysing the possible uses of imagery collected by unmanned aircraft over a limited natural test area, to be analysed via AI techniques and other methods. The project aims to assess the possible added value for determining groundwater levels, plant growth, relief or woody vegetation, especially in the less accessible parts.

Solar Map of Flanders (Zonnekaart Vlaanderen) - https://overheid.vlaanderen.be/bvk-zonnepotentieel-vlaanderen-voorbeeldprojecten

By analysing the very precise LiDAR-based elevation measures from EODaS’ Digital Height Model

Flanders II (DHMV II, 2013-2015) to determine the surface area, orientation and inclination of some 2.5

million roofs, and by combining these data with meteorological, land registry and address data, the

Flemish Energy Agency (VEA) and VITO created the Solar Map of Flanders which shows the ‘solar score’

for all buildings or parts of buildings, and their ‘solar potential’.

An overview of all ‘Image Processing Chain’ remote sensing projects by Information Flanders can be found here: https://overheid.vlaanderen.be/BVK-remote-sensing-projecten-bij-Informatie-Vlaanderen (Dutch).

Conclusions

The research question, whether aerial photography rather than satellite data is needed to recognise

crops in areas characterised by relatively small plots and higher diversity of crops, could not be answered

due to unavailability of images with both a high resolution and high frequency. Satellite data, at a fairly

low resolution of 10 m, are freely available at a frequency (every 5 days) seemingly adequate for crop

recognition in areas characterised by fairly large plots and a limited variety of crops. Aerial photography

data with a high resolution, on the other hand, are available only at an annual or even multi-annual

frequency, and sometimes in the winter season when hardly any crops are present.

A possible way out of this dilemma would be to pay for the creation of a high-resolution high-frequency

dataset for a carefully selected area, limited in surface area but predominantly agricultural, with small

and varied plots. The cost might not be as prohibitive as before, due to the continuous evolution of

aerial photography techniques, notably the development of unmanned aerial vehicles (UAVs or

‘drones’).

This may eventually result in a two-way approach to crop recognition, on the one hand using lower-

resolution satellite data for regions with large plots and a limited variety of crops, complemented on

the other hand with high-resolution aerial photography for regions with fairly small plots and a larger

range of different crops.

5. Report on thematic task 2 - Build-up area

5.1. Case study 4 - Implementing SDG indicator 11.7.1

5.1.1. Pre-works

State of the art

As our goal is to implement the UN methodology for computing SDG indicator 11.7.1, we mainly used

the methodological reports written by UN-Habitat. We received several documents from UN-Habitat

describing the methodology, some of them still in draft form.

Moreover, several documents and articles deal with defining population agglomerations or cities, which is

at the heart of the SDG indicator. Eurostat, for example, defines the concept of Cities within the Tercet

typology on the basis of 1 km² grid cells. For that purpose, Eurostat has released a Methodological manual

on territorial typologies (https://ec.europa.eu/eurostat/documents/3859598/9507230/KS-GQ-18-008-EN-N.pdf/a275fd66-b56b-4ace-8666-f39754ede66b).

One similarity with the UN-Habitat methodology is that one has to cluster contiguous cells to define areas.

Statistical product definition

The statistical product is SDG indicator 11.7.1, the “average share of the built-up area of

cities that is open space for public use for all, by sex, age and persons with disabilities”. It consists of a

number between 0 and 1 for every city in France.

Moreover, the statistical product can be refined by:

- distinguishing, within the “open space for public use”, between streets, green open public

spaces and other open public spaces;

- giving the proportion of city residents (by sex and age) who have access to these

open public spaces (except streets), meaning that they live less than 400 m from such a space.
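The two quantities described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the UN-Habitat implementation: open spaces and residents are reduced to hypothetical point coordinates in a metric projection, distances are straight-line, and the function names are invented for the example.

```python
from math import hypot

def open_space_share(open_space_area, built_up_area):
    """Share of the built-up area that is open public space (between 0 and 1)."""
    if built_up_area <= 0:
        raise ValueError("built_up_area must be positive")
    return open_space_area / built_up_area

def share_with_access(residents, spaces, max_dist=400.0):
    """Fraction of residents living less than max_dist metres from a space.

    residents, spaces: lists of (x, y) points in metres (an assumption;
    real open spaces are polygons, not points).
    """
    if not residents:
        return 0.0
    near = sum(
        1 for (rx, ry) in residents
        if any(hypot(rx - sx, ry - sy) < max_dist for (sx, sy) in spaces)
    )
    return near / len(residents)
```

In practice the distance would be measured to the polygon of each open space, and the resident counts would be broken down by sex and age before taking the ratio.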

Data source & toolkit

Data

OSO – OSO is a land cover map (raster) with a resolution of 20 m. It is derived from Sentinel 2 imagery

processed through the iota2 chain developed by CESBIO, a research laboratory based in

Toulouse, France. It covers the whole of metropolitan France, and the latest version is based on 2018 imagery.

Bdtopo – The Bdtopo consists of multiple vector maps (layers) describing the human constructions

(roads, buildings, bridges, parks, parking, cemeteries, etc.) as well as the natural elements (waterways,

ground elevation, forests, etc.). It is maintained by the national geographical institute (IGN) in France. It

covers the whole French territory and is updated every year.

OSM – OpenStreetMap is a well-known open data platform giving access to detailed geographical

information. It is available in France, but the quality may vary between areas.

Sentinel 2 – Sentinel 2 is a pair of Earth observation satellites: Sentinel-2A,

launched in 2015, and Sentinel-2B, launched in 2017. Their nominal lifetime is 7

years. They provide 20 m resolution raster imagery of the whole Earth approximately every 5 days.

The spectral bands lie in the visible and infrared ranges.

Cadastral plan – Vectorized cadastral plan with some additional information on land use and owners.

Software

R (packages raster, sf)

5.1.2. Stage 1

Test site definition

We first considered the 31 urban units in France that contain more than 200 000 inhabitants. These

cities are listed in the table below.

Table 5.1 List of cities.

City name Population

1 Paris 10 923 026

2 Lyon 1 668 841

3 Marseille 1 443 980

4 Nice 1 111 658

5 Lille 1 004 759

6 Toulouse 955 238

7 Bordeaux 928 517

8 Nantes 659 454

9 Toulon 596 700

10 Douai 510 698

11 Avignon 480 117

12 Rouen 456 875

13 Grenoble 454 016

14 Strasbourg 440 724

15 Montpellier 427 441

16 Tours 350 648

17 Rennes 321 135

18 Valenciennes 314 183

19 Metz 284 303

20 Saint-Étienne 283 825

21 Armentières 281 176

22 Orléans 280 029

23 Nancy 271 940

24 Clermont-Ferrand 268 475

25 Bayonne 238 495

26 Mulhouse 238 358

27 Angers 235 886

28 Le Havre 234 250

29 Dijon 229 250

30 Le Mans 226 563

31 Reims 207 692

These cities are located all over metropolitan France; none of them is in an overseas

department.

Data collection

Data sources

The OSO land cover map was obtained free of charge from http://osr-cesbio.ups-tlse.fr/~oso/. The Bdtopo

and the cadastral plan are available within the institute (Insee): agreements between Insee

and IGN and between Insee and DGFiP (the Directorate General of Public Finances) give the institute

access to these data. OSM and Sentinel 2 data are freely accessible online.

Repositories

The Bdtopo and the cadastral plan are stored on a secured server. The non-confidential data can be

stored on a local platform (non-protected servers) and used on virtual machines.

Collection

For the moment, only the OSM and Sentinel 2 data are not comprehensive: not all of the available data have

been downloaded, because of data volume and download time.

5.1.3. Stage 2

Data pre-processing

The OSO database has been pre-processed: pixels are classified into 23 classes, four of which

correspond to built-up area (dense built-up area, diffuse built-up area, industrial and commercial areas,

roads). After pre-processing, each pixel takes one of two values (0 for non-built-up area and 1 for

built-up area).
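The reclassification to a binary raster can be sketched as below. The class codes are hypothetical placeholders (the actual codes must be taken from the OSO nomenclature), and the raster is represented as a plain list of lists rather than read with a raster library.

```python
# Hypothetical codes for the four built-up classes; check the OSO nomenclature.
BUILT_UP_CODES = {1, 2, 3, 4}

def to_binary(land_cover, built_up_codes=BUILT_UP_CODES):
    """Map each pixel of a 2-D grid of class codes to 1 (built-up) or 0."""
    return [[1 if code in built_up_codes else 0 for code in row]
            for row in land_cover]
```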

The pre-processing of the BDTopo consists of keeping the geographic elements corresponding to open

public spaces. Fortunately, a variable in the database makes it possible to target those spaces directly.

As for the BDTopo, we keep only some elements of the OSM database: those whose fclass variable is equal to “park”,

“recreation_ground” or “forest” and not equal to “residential”.

Pre-processing of Sentinel 2 data consists of computing the NDVI for each pixel. Moreover, we only

select images with low cloud cover over the area of interest (the city boundary) and, because we want

to detect green areas, we only take into account images recorded between March and September.
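The scene selection rule can be illustrated with a small filter. The 10% cloud threshold and the dictionary keys are assumptions made for the example; the report does not state the threshold actually used.

```python
from datetime import date

def select_scenes(scenes, max_cloud=10.0, first_month=3, last_month=9):
    """Keep scenes with low cloud cover acquired between two months (inclusive).

    scenes: list of dicts with hypothetical keys 'date' (datetime.date) and
    'cloud_pct' (cloud cover over the area of interest, in percent).
    """
    return [s for s in scenes
            if s["cloud_pct"] <= max_cloud
            and first_month <= s["date"].month <= last_month]
```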

The cadastral plan is still being studied to determine what new information it can provide: this source has yet to

be exported.

5.1.4. Stage 3

Main data processing

The data processing consists of two main steps. The first step is to derive the city boundaries

from the OSO land use map. The second step is to delineate, within those boundaries, the areas which are open

for public use.

The first step is based on pixel clustering. First, we compute for each built-up pixel the proportion of

other built-up pixels located within a 1 km² disk around it. This proportion can vary from 0 (no other built-up pixels

around the considered pixel) to 1 (every neighbouring pixel is also a built-up pixel). Then we only take into

account built-up pixels whose proportion is higher than 0.25. Finally, we cluster those pixels according

to a proximity rule: two pixels that share a common point (an edge or a corner) are considered related.

The equivalence classes generated by this relation are the final clusters. Among all the

clusters we only keep the biggest one located on the administrative boundary of the city. All the pixels

of this cluster are then merged (unioned) into a single polygon, whose outline is the city boundary.
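The density filter and the clustering rule can be sketched in plain Python as follows. This is an illustrative sketch, not the R code used in the study: the raster is a list of lists, the radius is given in pixels (a 1 km² disk has a radius of about 564 m, i.e. roughly 28 pixels at 20 m), and ignoring neighbours that fall outside the grid is an assumption.

```python
from collections import deque

def dense_built_up(grid, radius, threshold=0.25):
    """Keep built-up pixels whose share of built-up neighbours exceeds threshold.

    grid: 2-D list of 0/1 values; radius: disk radius in pixels.
    """
    rows, cols = len(grid), len(grid[0])
    offsets = [(dr, dc)
               for dr in range(-radius, radius + 1)
               for dc in range(-radius, radius + 1)
               if (dr or dc) and dr * dr + dc * dc <= radius * radius]
    keep = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            inside = [(r + dr, c + dc) for dr, dc in offsets
                      if 0 <= r + dr < rows and 0 <= c + dc < cols]
            built = sum(grid[i][j] for i, j in inside)
            if inside and built / len(inside) > threshold:
                keep[r][c] = 1
    return keep

def clusters(mask):
    """Group pixels equal to 1 that touch by an edge or a corner."""
    rows, cols = len(mask), len(mask[0])
    seen, result = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            queue, members = deque([(r, c)]), []
            seen.add((r, c))
            while queue:
                i, j = queue.popleft()
                members.append((i, j))
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        n = (i + di, j + dj)
                        if (0 <= n[0] < rows and 0 <= n[1] < cols
                                and mask[n[0]][n[1]] and n not in seen):
                            seen.add(n)
                            queue.append(n)
            result.append(members)
    return result
```

The largest cluster overlapping the city would then be selected, for example with `max(candidates, key=len)` over the clusters that satisfy the location criterion.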

The second step is trickier, as no single source accurately provides all the open public spaces. Moreover,

we must clearly define what type of areas we are looking for. We can take advantage of having multiple

databases: by combining them carefully we can improve the quality of the delineation of open public spaces.

How to combine those sources is still under study.

Results analysis

Step 1 of the UN methodology works well. We can define the geographical delineation of every city in France

based on the OSO land use map. Computing time is around 30 minutes for one city,

depending on the size of the city (it is slower for big cities). For the moment, we have only applied the method

to the 20 largest cities in France. The results appear to be very close to the urban units. The urban unit

is a concept based on the continuity of the built-up fabric: any two buildings

less than 200 m apart are considered contiguous.

Nevertheless, some important differences sometimes appear. For example, for the city of Rouen, located

north-west of Paris, the UN methodology does not link the northern and southern parts of the

city, which are separated by the Seine river (Figure 5.1).

Figure 5.1 The city of Rouen along the Seine river

The blue area is the city as defined by the UN methodology. The orange area with a red border is the

extent of the urban unit defined by Insee. The green areas are the other urban areas according to the

UN methodology.

Unlike the city of Rouen, the match between the UN result and the urban unit result is almost perfect

for the city of Reims (Figure 5.2).

Figure 5.2 City of Reims

The blue area is the city as defined by the UN methodology. The orange area with a red border is the

extent of the urban unit defined by Insee. The green areas are the other urban areas according to the

UN methodology.

Moreover, the sensitivity of the results to the input parameters has been explored for some cities.

The results are not very sensitive to the radius within which neighbouring pixels are

considered, but they can change dramatically with the threshold on the share of neighbouring pixels which are also

built-up.

Step 2 is still under development as it is not straightforward to delineate the open public spaces.

Multiple sources have to be used to precisely create the boundaries of the open public spaces.

5.2. Case study 5 - Urban sprawl across urban areas in Europe

5.2.1. Pre-works

State of the art

This case study aims at characterizing urban sprawl across urban areas in the Netherlands by means of

data-driven machine learning methods, in order to evaluate to what extent NSOs can benefit from

Earth observation to monitor and report on built-up area at local to national level. Urban sprawl was

only recently officially acknowledged as an issue in Europe (Hennig et al. 2016), and numerous attempts

at characterizing urban sprawl have been made in recent years. In its 2016 report, the EEA estimated that

sprawl is most pronounced in wide rings around city centres, along large transport corridors, and along

many coastlines. It further identified the two largest clusters of high-sprawl values in Europe. The first

spans from north-eastern France to western Germany, including Belgium and the Netherlands, and the

second is in the United Kingdom between London and the Midlands. Hennig et al. (2015) concluded that

increasing urban sprawl in Europe causes land-use conflicts and threatens sustainable land use.

At country scale, in order to preserve the nature and the environment, the Netherlands has striven for

the past 60 years to keep existing cities compact to avoid extensive and uncontrolled urban and

suburban sprawl. Although the urban compaction policy has prevented urban sprawl in the Netherlands,

most rural-urban fringes in the Netherlands have seen substantial urbanization in recent years

(Nabielek, Hamers, and Evers 2016). Indeed, growing welfare, global economic forces, improved

transportation networks and increased mobility have made it possible for people to live and work further

away from the cities while retaining most of the cities’ advantages. Further, the OECD (2018) estimated

that Dutch urban areas are less fragmented, but also more decentralized, than the OECD average.

In the Netherlands, the population density dispersion across urban space lies indeed far below the OECD

average. It was also evaluated that between 2000 and 2014 the fragmentation index had reduced by

9%, i.e. new development has been constructed in a more contiguous manner.

In this study we follow the EEA approach (Hennig et al. 2016) and define urban sprawl by applying the

method of ‘weighted urban proliferation’ (WUP). This method quantifies the degree of urban sprawl for

any given landscape through a combination of three components: (i) the size of the built-up areas; (ii)

the spatial configuration (dispersion) of the built-up areas in the landscape; and (iii) the uptake of built-up

area per inhabitant or job.

Thus, the first step here is to perform a spatial analysis to delimit the built-up area of urban

agglomerations. The last two decades have seen an exponential increase in the number of satellite

missions acquiring high-resolution image time series, and a general consensus has been reached that

satellite remote sensing provides a viable means for measurement-based characterization of land use

(Corbane et al. 2017; Pesaresi et al. 2016) and land cover on regional to global scales (Pelletier

et al. 2016). Recently, the focus has been on ensemble learning methods. Two approaches will be used

to evaluate the extent of a built-up area.

First, following the study by Belgiu et al. (2016), we will make use of a state-of-the-art Random Forest

classifier. Random Forest (Breiman 2001) is an ensemble method which constructs many decision trees

and classifies a new instance by a majority vote among them. Each decision tree node uses a subset of

attributes randomly selected from the original set of attributes. Additionally, each tree uses a different

bootstrap sample of the data. The decision rule will be learned from the NDVI and NDBI spectral

features, which are used for vegetation depiction and building detection respectively.

The second approach will explore the added value of deep convolutional neural networks. Indeed, with

the increasing availability of high-resolution images (spatial resolution of 10 m or finer), Deep Learning

(Lecun, Bengio, and Hinton 2015) algorithms have recently seen a massive rise in popularity in the

remote sensing community. Deep Learning is characterized by an "end-to-end" learning approach and

depends on a multilayer task module to achieve the final goal. It attempts to mimic the activity in layers

of neurons in the neocortex. The applications span analysis tasks such as image fusion and/or

registration, scene classification, object detection, segmentation, and object-based image analysis.

Most studies have focused on land use and land cover (LULC) classification, where information is retrieved

from hyper-spectral or high-resolution images (Ma et al. 2019) by means of CNNs. Further, most studies

investigated supervised Deep Learning models, which require a large amount of training data to unlock

their image classification power. However, the preparation of training datasets is tedious and highly time-

and/or cost-intensive. Therefore, techniques such as data augmentation, transfer learning and active learning are

being investigated in recent studies to increase the size and/or efficiency of the training set (Liu, Zhang,

and Eom 2017).

It appears from available LULC studies that built-up area has been one of the most important features to be

extracted from remote sensing images by means of Deep Learning models. Several factors such as complex

structure, diverse texture and varied backgrounds are typical challenges for the task of built-up area

extraction (Ehrlich et al. 2018; Ma et al. 2019). Unfortunately, most available studies used only a single

image scene; some used multisource remote sensing data, but time series were rarely

analysed using DL algorithms. For example, Ienco et al. (2017) used multi-temporal remote sensing data

(Pléiades images) and RNNs to perform LULC classification, but only three dates of imagery were used

for this analysis (July and September 2012, and March 2013); see Table 5.2. Therefore, further

developments are needed to exploit the wealth of information present in long time records, such

as Landsat and Sentinel time series.

Table 5.2 Deep Learning for remote sensing related work.

Approach | Sensors | Pre-processing | Features | Method*

Ruswurm et al 2018 | Sentinel-2 | None | TOA | ConvRNN

Ruswurm et al 2017 | Sentinel-2 | Atmospheric correction | BOA | RNN

Schiachalou et al 2015 | Landsat, RapidEye | Geometric correction and image registrations | TOA | HMM

Hao et al 2015 | MODIS | Atmospheric correction and image registrations | Statistical phen. features | RF

Nougueria et al 2016 | Aerial images, SPOT | - | - | CNN

Ienco et al 2017 | Pleiades | Mosaic of orthorectified and atmospherically corrected scenes | Mean and std of bands, NDVI | RNN

* ConvRNN – Convolutional Recurrent neural network, RNN – Recurrent neural network, HMM - Hidden Markov model, RF –

Random Forest, CNN – Convolutional Neural Network

Statistical product definition

The percentage of built-up area (PBA) is the ratio of the size of the built-up areas to the size of the total

area of the reporting unit and is given as a percentage.

The utilization density (UD) measures the number of people working or living (N Inh + Jobs) in a built-up

area (per km²). Built-up areas with more workplaces and/or inhabitants are considered more intensively

used, and hence less sprawled, than areas with a lower density of workplaces and/or inhabitants. The

reciprocal of UD, the land uptake per person (LUP), can also be used: it is the area of land used per inhabitant or workplace. High

LUP values indicate that more space is used per inhabitant or workplace than in areas of low LUP values.

Urban sprawl is quantified by means of Weighted Urban Proliferation (WUP), which is the product of the

dispersion (DIS), a weighting of the dispersion, the percentage of built-up area (PBA) and a weighting of the land

uptake per person (LUP), that is, land uptake per inhabitant or workplace. It is measured in urban

permeation units (UPU) per square metre of landscape (UPU.m−2).

The dispersion (DIS) quantifies the spatial distribution of built-up areas, expressed as UPU per m² of built-up

area (UPU.m−2). The more dispersed the built-up areas, the larger the value of DIS. Therefore, more

compact built-up areas have lower values of DIS than less compact built-up areas. Urban permeation

(UP) is a measure of the permeation of a landscape by built-up areas. It accounts for the DIS and the

PBA in the reporting unit. It is measured in UPU per m² of landscape.
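How these quantities combine can be sketched as below. This is a hedged illustration only: the dispersion and the two weightings are taken as inputs, because their formulas are not given in this report, and the function and argument names are invented for the example.

```python
def sprawl_metrics(built_up_m2, total_m2, inhabitants, jobs, dis, w_dis, w_lup):
    """Combine the sprawl components for one reporting unit.

    dis   : dispersion in UPU per m2 of built-up area (computed elsewhere)
    w_dis : weighting of the dispersion (dimensionless, supplied by the caller)
    w_lup : weighting of the land uptake per person (dimensionless, supplied)
    """
    people = inhabitants + jobs
    if built_up_m2 <= 0 or total_m2 <= 0 or people <= 0:
        raise ValueError("areas and number of people must be positive")
    pba = 100.0 * built_up_m2 / total_m2      # percentage of built-up area
    ud = people / (built_up_m2 / 1e6)         # people per km2 of built-up area
    lup = 1.0 / ud                            # km2 per inhabitant or job
    up = dis * pba / 100.0                    # UPU per m2 of landscape
    wup = up * w_dis * w_lup                  # UPU per m2 of landscape
    return {"PBA": pba, "UD": ud, "LUP": lup, "UP": up, "WUP": wup}
```

For example, a unit of 10 km² with 2 km² of built-up area, 3 000 inhabitants and 1 000 jobs has a PBA of 20% and a UD of 2 000 people per km² of built-up area.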

The Normalized Difference Vegetation Index (NDVI) is the best-known and most widely used vegetation index.

It normalizes green leaf scattering in the near-infrared wavelength and chlorophyll absorption in the

red wavelength.

NDVI = (NIR - RED) / (NIR + RED)

NDVI values range from -1 to 1. Negative values (values approaching -1) indicate the presence of water,

while values close to zero (-0.1 to 0.1) correspond to barren areas of rock, sand, or snow. Low, positive

values ranging from 0.2 to 0.4 represent shrub and grassland, and high values (values approaching 1)

indicate temperate and tropical rainforests.

The Normalized Difference Built-up Index (NDBI) is used to extract built-up features; its values

range from -1 to 1. Built-up areas and bare soil reflect more SWIR than NIR, while water bodies do not

reflect in the infrared spectrum. For vegetated surfaces, the reflection of NIR is higher than that of SWIR.

Negative NDBI values

represent water bodies, whereas higher values represent built-up areas. The NDBI value for vegetation is

low. The NDBI is simple to compute.

NDBI = (SWIR - NIR) / (SWIR + NIR)

The Built-up Index (BU) allows for analysis of urban patterns using NDBI and NDVI: in the resulting image,

higher positive values indicate built-up and barren areas.

BU = NDBI - NDVI
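The three indices translate directly into code. The sketch below works on single reflectance values; returning 0.0 when both bands are zero is an assumption made to avoid a division by zero, not a convention stated in the report.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both reflectances are zero."""
    total = a + b
    return (a - b) / total if total else 0.0

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return normalized_difference(nir, red)

def ndbi(swir, nir):
    """Normalized Difference Built-up Index."""
    return normalized_difference(swir, nir)

def built_up_index(nir, red, swir):
    """BU = NDBI - NDVI; positive values suggest built-up or barren surfaces."""
    return ndbi(swir, nir) - ndvi(nir, red)
```

A vegetated pixel (high NIR, low red) gives a strongly negative BU, while a pixel with SWIR above NIR and similar NIR and red reflectance gives a positive BU.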

Data source & toolkit

Data Sources and Access

Satellites Data

Sentinel-1 and Sentinel-2 data were acquired from the Copernicus Open Access Hub. To this end, we

make use of the API Hub, a dedicated interface that gives users scripted access to the data.

The API Hub Access is currently available for all users registered on SciHub. MODIS and Landsat data

were accessed via the Application for Extracting and Exploring Analysis Ready Samples

(https://lpdaacsvc.cr.usgs.gov/appeears/; AppEEARS) which offers a simple and efficient way to access

and transform geospatial data from a variety of Earth Observation datasets from NASA.

Administrative Datasets

When data were not available in house, the Public Services On the Map (https://www.pdok.nl; PDOK) website was used to

access the latest official release of open access geo-information. These datasets can be accessed via geo

web services and RESTful APIs, and are available as downloads and linked data. This is current and reliable

data for both the public and private sectors. PDOK makes digital geo-information available as data

services and files. The PDOK services are based on open data and are therefore freely available to

everyone.

Software

Programming language

In this study, all tasks are carried out in the Python (https://www.python.org/about/gettingstarted/)

programming language. First released in 1991, Python is a general-purpose, high-level programming

language. Python is particularly common in the data science and machine learning fields. Furthermore,

Python is developed under an OSI-approved open source license, making it freely usable and

distributable, even for commercial use. Jupyter notebooks and/or PyCharm will be used for executing

Python code.

Classification methods

Here, the first challenge is to classify land use cover by means of earth observation datasets and machine

learning methods. To this end, the Python module scikit-learn is used to implement the classifiers. This

module integrates a wide range of state-of-the-art machine learning algorithms for medium-scale

supervised and unsupervised problems (Pedregosa et al. 2011). For comparison purposes, two

traditional machine learning approaches are under consideration for implementation:

- The Support Vector Machine (SVM) classifier has been widely used and reported as an

outstanding classifier (Cortes and Vapnik 1995). The basic idea of SVM is to classify the input

vectors into two classes using a hyperplane with maximal margin.

- Random Forest (RF) (Breiman 2001) is an ensemble method which constructs many decision

trees and classifies a new instance by a majority vote. Each decision tree node

uses a subset of attributes randomly selected from the original set of attributes. Additionally,

each tree uses a different bootstrap sample of the data.
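The bootstrap, random attribute selection and majority vote described for Random Forest can be illustrated with a deliberately small toy model. This is not Breiman's full algorithm nor the scikit-learn implementation that the study uses: each "tree" here is a single-split stump on one randomly chosen attribute, labels are restricted to 0 and 1, and all names are invented for the example.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Find the threshold on one feature that best separates classes 0 and 1."""
    best = None
    for t in sorted({row[feature] for row in X}):
        pred = [1 if row[feature] > t else 0 for row in X]
        acc = sum(p == label for p, label in zip(pred, y)) / len(y)
        # The stump may be inverted (class 1 at or below the threshold).
        for flip, score in ((False, acc), (True, 1 - acc)):
            if best is None or score > best[0]:
                best = (score, feature, t, flip)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps, each on a bootstrap sample and a random attribute."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feature = rng.randrange(n_features)          # random attribute
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Classify one instance by majority vote over all stumps."""
    votes = Counter()
    for feature, t, flip in forest:
        label = 1 if row[feature] > t else 0
        votes[1 - label if flip else label] += 1
    return votes.most_common(1)[0][0]
```

With rows of (NDVI, NDBI) values labelled 0 for vegetation and 1 for built-up, `predict(fit_forest(X, y), (0.0, 0.4))` returns the majority label of the stumps for a pixel with low NDVI and high NDBI.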

Image processing

Image processing will be carried out mainly by means of Python code. To this end, various libraries and

packages will be used to read and process the data in order to prepare them for the classification step.

Table 5.4 provides an explicit listing of the packages used for image processing.

Visualization

During this project the visualization of geo-spatial data will be made with Folium (https://python-visualization.github.io/folium/),

a Python library which makes it easy to visualize data on an interactive

Leaflet map. It allows for both creating choropleth maps and passing rich vector/raster/HTML

visualizations as markers on the map. Folium supports Image, Video, GeoJSON and TopoJSON

overlays. Moreover, this library has a number of built-in tilesets from OpenStreetMap, Mapbox, and

Stamen, and supports custom tilesets with Mapbox or Cloudmade API keys. In addition, Matplotlib

(https://matplotlib.org/), a plotting library for the Python programming language, and its numerical

mathematics extension NumPy will be used for generating more traditional plots such as

histograms, power spectra, bar charts, error charts, scatter plots, etc.

An overview of the satellite missions used in this study is presented in Table 5.3. The packages and

libraries used are listed and described in Table 5.4.

Table 5.3 Overview of satellite missions used in this study.

Mission | Sensors | Applications | Repeat cycle | Spatial resolution | Formats

Sentinel-1 | C-Band SAR | Sea ice, oil spills, marine winds and waves, land-use change, emergency response | 12 days | 5 m, 20 m, 20 m | .SAFE with GeoTIFF, XML, PNG, XSD, HTML and netCDF files

Sentinel-2 | MSI (13 bands from 443 nm to 2,190 nm) | Agriculture, forests, land-use and land-cover change; mapping biophysical variables; monitoring coastal and inland waters; risk and disaster mapping | 10 days | 10 m, 20 m, 60 m | .SAFE with JPEG2000, XML, GML and HTML files

Landsat 8 | OLI (9 spectral bands from 0.43 μm to 1.38 μm) | Agriculture, forestry and range resources; land use and mapping; geology; hydrology; coastal resources; environmental monitoring | 16 days | 30 m | GeoTIFF (ARD via AppEEARS)

Terra/Aqua | MODIS (36 discrete spectral bands from 405 nm to 14.085 μm) | Atmosphere, land, cryosphere and ocean | 16 days | 250 m, 500 m, 1000 m | GeoTIFF (ARD via AppEEARS)

Table 5.4 List and description of the packages and libraries used.

Name Description

SCIPY a Python-based ecosystem of open-source software for mathematics, science, and engineering.

https://www.scipy.org/

OSGEO/GDAL

open source X/MIT licensed translator library for raster and vector geospatial data formats. It presents a

single raster abstract data model and single vector abstract data model to the calling application for all

supported formats.

https://gdal.org/

OPENCV

open source computer vision and machine learning software library. OpenCV was built to provide a

common infrastructure for computer vision applications and to accelerate the use of machine perception

in the commercial products.

https://opencv.org/

PANDAS

open source, BSD-licensed library providing high-performance, easy-to-use data structures and data

analysis tools for the Python programming language. The perfect tool for bridging the gap between rapid

iterations of ad-hoc analysis and production quality code.

https://pandas.pydata.org/

GEOPANDAS

open source project to make working with geospatial data in python easier. GeoPandas extends the

datatypes used by pandas to allow spatial operations on geometric types.

http://geopandas.org/

5.2.2. Stage 1

Test site definition

Two study areas are selected in the Netherlands (Figure 5.3). The first study area is the most

densely populated part of the Netherlands, the Randstad. In 2016, it was estimated that almost 50% of

the Dutch population lived in the Randstad, which represents about 25% of the country’s surface area. The

second study area is Zuid-Limburg, where the number of inhabitants has decreased in past decades

and, according to forecasts, is projected to shrink by more than 10% by 2027

(Nabielek, Hamers, and Evers 2016).

Figure 5.3 Location of test areas

Data collection

In this study, we will exploit the spectral and temporal information present in four satellite datasets.

The satellite datasets will be acquired over the domain of interest for May, June and July of 2017, 2018 and

2019.

Earth Observation Data

MODIS

In this study, we make use of the 16-day NDVI data composite from MODIS Terra and Aqua. The

MOD13Q1 Version 6 product provides a Vegetation Index (VI) value on a per-pixel basis for two primary

vegetation layers: the first is the Normalized Difference Vegetation Index (NDVI) and the second is the Enhanced

Vegetation Index (EVI), which has improved sensitivity over high-biomass regions. A detailed description

of the MODIS NDVI retrieval algorithm can be found in Didan (2015). This dataset runs from 2004

to the present with a fortnightly (16-day) temporal resolution and a 250 to 500 m spatial resolution. The datasets are provided

with a 16-day pixel reliability layer, which allows for cleaning of the data as well as masking sea pixels.

Landsat

In this study, we will make use of the Landsat 8 OLI Collection 1 Tier 1 orthorectified scenes, using the

computed surface reflectance to create a 16-day NDVI data composite. OLI collects data in eight bands at

30 m resolution and one panchromatic band at 15 m resolution. The atmospheric correction, which converts the raw

Landsat data to surface reflectance, is performed by means of the LEDAPS algorithm.

Sentinel 1&2

In this study, the Level-1 Ground Range Detected (GRD) products in dual-pol (VV+VH) Interferometric

Wide swath mode (IW) of Sentinel-1 data are used. The Sentinel-2 constellation, launched in 2015 and 2017,

aims to provide full and systematic global coverage at a spatial resolution as high as 10 m. In this study,

we will solely make use of the atmospherically corrected bidirectional surface reflectance from the 10 m

resolution bands, namely B02, B03, B04 and B08, of which B04 and B08 will be used to compute the

NDVI. The data volume of Sentinel-1 and Sentinel-2 in their twin constellations is approximately 3.6 TB/day and

1.6 TB/day, respectively (Soille et al. 2018). The synergistic use of Sentinel-1 and Sentinel-2 promises

access to a cloud-free global image database at high spatial resolution.
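
The NDVI computation from the red (B04) and near-infrared (B08) bands mentioned above can be sketched as follows. This is a minimal illustration in plain Python; the function name and the sample reflectance values are assumptions chosen for illustration and do not come from the study data.

```python
def ndvi(red, nir):
    """Return NDVI = (NIR - Red) / (NIR + Red), or None when it is undefined."""
    denom = nir + red
    if denom == 0:
        return None  # no signal in either band, NDVI cannot be computed
    return (nir - red) / denom


# Hypothetical surface reflectances for three pixels.
b04 = [0.05, 0.20, 0.00]  # red band
b08 = [0.45, 0.25, 0.00]  # near-infrared band
ndvi_values = [ndvi(r, n) for r, n in zip(b04, b08)]
```

Dense vegetation gives values close to 1, while built-up surfaces and bare soil give values near 0.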

Register Data

The Basisregistratie Adressen en Gebouwen

BAG (https://zakelijk.kadaster.nl/basisregistratie-adressen-en-gebouwen) is the main registry

containing all addresses and buildings in the Netherlands. The BAG furthermore contains additional

information such as the object type, the area of a building, the construction date, etc. The BAG is maintained

by local authorities, who are also responsible for the quality of the registry. By using the BAG as a filter

for the satellite images, geographical areas without buildings can be removed from consideration.

Soil Use Classification

The Soil Use File (https://www.pdok.nl/introductie/-/article/cbs-bestand-bodemgebruik) contains

digital geometry of land use in the Netherlands. Examples of land use are traffic areas, buildings,

recreational areas, and inland and open water. The boundaries are largely based on the Top10NL (BRT).

The current classification is mainly based on information from aerial photos.

CORINE Land cover

For validation and training purposes the CORINE Land Cover (CLC) for the year 2006 will be used. The

CLC inventory was initiated in 1985 (reference year 1990) and updated versions have been produced in

2000, 2006, and 2012. It consists of an inventory of land cover in 44 classes. CLC uses a Minimum

Mapping Unit (MMU) of 25 hectares (ha) for areal phenomena and a minimum width of 100 m for linear

phenomena. The time series are complemented by change layers, which highlight changes in land cover

with an MMU of 5 ha. The Eionet network National Reference Centres Land Cover (NRC/LC) produce

the national CLC databases, which are coordinated and integrated by the EEA. In the majority of

countries, CLC is produced by visual interpretation of high-resolution satellite imagery. In a few countries,

semi-automatic solutions are applied, using national in-situ data, satellite image processing, GIS

integration and generalization. The 2012 version of CLC is the first one embedding the CLC time series

in the Copernicus program, thus ensuring sustainable funding for the future.

GHS-BUILT

The Global Human Settlement Layer (GHSL) produces new global spatial information, evidence-based

analytics and knowledge describing the human presence on the planet (Pesaresi, Syrris, and Julea 2016;

Corbane et al. 2017). GHSL aims to provide scientific methods and a system for reliable and automatic

mapping of built-up areas from remote sensing data. GHSL operates in an open and free data and

methods access policy (open input, open method, open output). The GHS P2016 suite consists of

multi-temporal products that offer insight into the human presence in the past: 1975, 1990, 2000, and

2014. The European Settlement Maps (GHS-BUILT) are pan-European built-up layers derived from

higher-resolution imagery. Information layers on built-up presence are also derived from Sentinel-1 image

collections (S1A 2016); these comprise two experimental datasets, made with different sets of parameters:

ESM training (Europe only) and GHSL training (World).

5.3. Case study 6 - Combination of administrative and Earth Observation data to determine

the quality of housing

The aim of this case study is to combine remote sensing data with official statistics and administrative

data in order to investigate the quality of urban life. Remote sensing data can be used to measure

different aspects of quality of life such as air quality, urban heat islands and urban green. Through the

combination of official statistics, earth observation and geodata this topic can be addressed in a more

comprehensive way than with only one data source.

As of the reporting year 2018, the German Microcensus is geocoded so that information about the

surroundings can be linked to a household. The aim of this case study is to investigate the added value

that can be generated from geocoded survey statistics. The topic of urban quality of life was chosen to

demonstrate this, since it is assumed that the wellbeing in a residential environment can be influenced

through its surroundings. In a first step, this case study will identify quality of life aspects that can be

measured through geographic data so that they can be linked to the geocoded statistic. The focus of

the geographic data lies on remote sensing data. The aspects air quality, urban heat islands, urban green

and noise pollution were identified through reviewing quality of life initiatives, discussions with earth

observation experts and a literature review on determining the quality of life using remote sensing data.

Furthermore, a literature review of the identified aspects was conducted. After the determination of

the aspects that could be examined in this study, possible data sources and data sets, which could be

linked to a geocoded statistic, are listed.

The Microcensus contains information about the socio-economic situation of a household. In this case

study, it will be investigated if remote sensing data can be used to determine differences in quality of

life within cities. The data on quality of life on a regional scale and socio-economic characteristics can

be linked to investigate their connection.

5.3.1. Pre-works

State of the art

Firstly, the literature review will focus on the different initiatives of quality of life and secondly

specifically on those aspects of urban quality of life, which can be monitored with remote sensing data.

In this report, quality of life in urban regions refers to the wellbeing of the inhabitants of urban

areas based on different aspects. In a first step, these relevant facets will be identified.

Quality of life initiatives

Several actions deal with different aspects of quality of life. The initiative “Well-being in

Germany” (“Well-Being in Germany” n.d.) identified different objects of investigation within the

framework of national and international research projects and discussions. The OECD Better Life Index

(http://www.oecdbetterlifeindex.org) is an interactive tool that allows users to analyse

well-being according to their own preferences. These preferences can be saved and compared by region

or gender. Furthermore, a regional score exists, measuring the topics on a regional scale. There are

similarities between these two initiatives: Both include indicators on income, health, education,

environment and work-life-balance. These initiatives were used as a starting point to identify quality of

life aspects that are relevant for the urban environment and filter out issues that can be measured using

remote sensing data. Only the target of air quality in both initiatives is related to urban quality of life

and quantifiable through remote sensing. This aspect is also included in the Sustainable Development

Goals (SDGs), in the indicators 3.9.1 “Mortality attributed to household and ambient air pollution” and

11.6.2 “Fine particulate matter in cities”.

Additionally, the 2030 Agenda for Sustainable Development adopted by all United Nations Member

States in 2015, provides a shared blueprint for peace and prosperity for people and the planet, now and

into the future (https://sustainabledevelopment.un.org/sdgs). As a result, 17 Sustainable Development

Goals (SDGs) were declared, including indicators that relate to the quality of urban life and could be

derived from remote sensing data. The SDG indicators which are connected to urban quality of life are:

11.1.1 Urban population living in inadequate housing

11.2.1 Convenient access to public transport

11.3.1 Land consumption rate to population growth rate

11.7.1 Built-up area of cities that is open space for public use

The literature on quality of life covers the aforementioned aspects but also names further topics of

urban quality of life which can be assessed using remote sensing. These aspects are noise pollution,

urban green and urban heat islands. For these aspects, both geodata and remote sensing data can be used

to find regional differences which could indicate a difference in quality of life across one urban region.

Air quality

Air pollution and thus air quality is an important aspect of urban quality of life which can be seen by the

fact that it is mentioned in the initiatives. Remote sensing can be used to aid in air quality

measurements:

Sentinel-5P from the Copernicus Programme of the EU carries the spectrometer TROPOMI

(Tropospheric Monitoring Instrument), which measures ozone, nitrogen dioxide and carbon monoxide

with a resolution of 3.5 km to 7 km. The datasets are available from the Sentinel-5P pre-operations

data hub.

The Copernicus Atmosphere Monitoring Service (CAMS) is a component of Copernicus and

consists of two major forecast and analysis systems. First, the CAMS global near real time (NRT)

service, based on the European Centre for Medium-Range Weather Forecasts (ECMWF)

Integrated Forecast System, provides daily analyses and forecasts of reactive trace gases,

greenhouse gases and aerosol concentrations. Secondly, seven regional models in Europe

perform air quality forecasts and analyses on a daily basis. Based on these individual forecasts

and analyses, an ensemble forecast of air quality over Europe, called ENSEMBLE, is produced and

disseminated by Météo-France. Predictions of daily mean and maximum concentrations of

greenhouse-related gases as well as particulate matter in the air, computed using numerical

models, are available online. The data have a spatial resolution of 0.1 degree and are available

daily at 1-hour intervals.

NASA provides air quality products derived from the Moderate-resolution Imaging

Spectroradiometer (MODIS) instrument.

The project “SAUBER” simulates the air quality with machine learning methods to provide

comprehensive spatial information about the current and future air quality. To achieve this goal satellite

data will be linked with data from local pollution monitoring stations as well as traffic and weather data.

Through this combination a higher spatial resolution will be reached than through satellite data alone.

Urban Heat Islands (UHI)

The following summary is based on the report of the U.S. Environmental Protection Agency (2008): Land

cover influences the temperature of areas: Roofs and pavements have higher temperatures than

vegetated areas or wetlands. Air temperatures in cities are much higher than in the surrounding rural areas,

especially after sunset, when the difference can be up to 12 °C. This leads to UHIs, particularly in the

summer, which can result in problems with human health including heat-related mortality. The elderly

and infants are especially vulnerable. Van der Hoeven and Wandl (2014) studied UHI in Amsterdam

retrospectively during a heat wave using Landsat 5 with a resolution of 120 m. They combined their

findings of the location of heat islands with the energy labels of buildings and a quality of life index to

identify vulnerable inhabitants. In general, Landsat satellites are able to provide information about

surface temperatures at a spatial resolution of up to 30 m. The latest Landsat satellite in orbit is Landsat

8 with two thermal bands. Since the launch of the Sentinel-3 satellite, thermal information can also be

derived from this Copernicus satellite at medium spatial resolution (300 m). Due to global warming the

importance of addressing UHIs will grow with time.

Urban green

The presence of urban green improves the quality of life since it improves the quality of air, reduces

noise pollution, regulates the temperature and provides space for recreation.

Vegetation can be detected in satellite images by using the visible bands, the near infrared channel and

by calculating the NDVI. An interesting aspect of urban green is how access to it is distributed across a

city.

Possible data sources:

Satellite data from Sentinel-2 or Landsat can be used to identify green covered pixels/areas.

Data from the high-resolution layer Tree cover density can also be used to acquire data

regarding tree cover.

The German digital land cover model LBM-DE can be used to identify potentially green areas by

using land cover and land use information derived from remote sensing data.

The Copernicus Urban Atlas or High-resolution Layer “Imperviousness” can be used to identify

urban or built-up areas.

Noise pollution

The following data sources can be used to model noise pollution. Geodata can be used to get

information about traffic noise emission from railways or motorways. Information about street network

from OpenStreetMap, Points of Interest (like hospitals, schools etc.) and railway network from the

Deutsche Bahn (DB) Netz AG can be used.

StreckeDB (Streckennetz DB Netz AG)

Street data from TopPlus and OpenStreetMap

The data of the DB route network are supplied to the BKG in MapInfo format at the beginning of each

year, as of November of the previous year. The data set has a positioning accuracy of 10m and

corresponds to a scale of 1:25000. The Deutsche Bahn route network is available to federal institutions

to perform compulsory tasks, subject to approval by DB Netz AG.

Statistical product definition

The output of this case study will be an analysis. Through the combination of different data sources, a

statistical analysis of how urban quality of life differs between socio-economic groups can be conducted.

The collected geographic data is an output as well. It can be used as a basis for further research.

Data source & toolkit

The following describes all data sources identified as useful for this study, along with some useful

tools.

Geodata

In addition to the data sources already mentioned, the following useful geodata were identified:

From digital elevation and surface models, information about the height of buildings and

infrastructures can be derived.

Data sets such as special points of interest (POI) can be used to evaluate the distance to essential

infrastructure like hospitals, schools etc. The BKG offers georeferenced POIs with additional

information, including universities, kindergartens, hospitals, and schools.

A digital data set about the explicit geometry of houses is available via the HK-DE/HU-DE house

coordinates.

Urban Atlas

The Urban Atlas is a product derived from Copernicus data to create a harmonized land cover and land

use map for European cities. The information is derived from Earth Observation but backed by ancillary

data: Very high resolution satellite imagery such as SPOT 5 & 6 and Formosat-2 are used. Basic land

cover classes are determined through automatic segmentation and classification. Furthermore, a visual

and manual interpretation is done from both very high-resolution satellite imagery and navigation data

(OSM or commercial navigation data).

The Urban Atlas 2012 is available for 693 functional urban areas (FUAs) in EU28 and EFTA as well as 107

FUAs in Turkey and the Western Balkan countries. In urban areas, 17 urban classes exist, with a minimum

mapping unit of 0.25 ha, depending on the class. The classes which are relevant for this case study are

described below:

Urban fabric classes are distinguished by their degree of soil sealing independent of the type of

housing. They are separated into continuous urban fabric (>80% soil sealing), discontinuous

dense urban fabric (50-80%), discontinuous medium density urban fabric (30-50%),

discontinuous low density urban fabric (10-30%), discontinuous very low density urban fabric

(<10%) and isolated structures.

Industrial, commercial, public, military, private and transport units is a land use class where the

share of artificial surface exceeds 30 % and more than half of the surface has a non-residential use

such as industrial, commercial or transport.

Roads:

o Fast transit roads and associated land, which are defined as “motorways” in the

navigation data

o Other roads and associated land

o Railways and associated land

Land without current use is defined as areas in proximity to artificial surfaces which are still

waiting to be used or re-used.

Green urban areas are public green areas mainly for recreational use such as gardens, zoos and

parks. Furthermore, forests from surrounding rural areas which extend into urban areas are

also classified as green urban areas if at least two sides are adjacent “to urban areas and

structures and traces of recreational use are visible.”

Sports and leisure facilities can be publicly or commercially managed.
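
The soil sealing thresholds of the urban fabric classes listed above can be expressed as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, and the handling of values that fall exactly on a class boundary (lower bounds treated as inclusive) is an assumption, since the text does not specify it. Isolated structures are not defined by a sealing degree and are therefore left out.

```python
def urban_fabric_class(sealing_pct):
    """Map a degree of soil sealing (in percent) to an Urban Atlas urban fabric class.

    Assumption: lower class bounds are inclusive, e.g. exactly 50 % is 'dense'.
    """
    if not 0 <= sealing_pct <= 100:
        raise ValueError("soil sealing must be between 0 and 100 percent")
    if sealing_pct > 80:
        return "continuous urban fabric"
    if sealing_pct >= 50:
        return "discontinuous dense urban fabric"
    if sealing_pct >= 30:
        return "discontinuous medium density urban fabric"
    if sealing_pct >= 10:
        return "discontinuous low density urban fabric"
    return "discontinuous very low density urban fabric"


# Hypothetical sealing degrees for three polygons.
classes = [urban_fabric_class(p) for p in (92.0, 45.0, 4.0)]
```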

Earth observation data

Sentinel 2

General information on the Sentinel-2 satellite is given in chapter 3 - EO data sources. Sentinel data can

be used to derive green covered areas within a city. Thus, the urban green ratio will be calculated based

on optical Sentinel-2 data.
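
One possible way to compute such an urban green ratio is to take the share of valid pixels whose NDVI exceeds a vegetation threshold. The sketch below is a hedged illustration, not the method fixed by this case study: the threshold of 0.4, the function name and the sample values are all assumptions.

```python
def urban_green_ratio(ndvi_pixels, threshold=0.4):
    """Share of valid pixels classified as green (NDVI >= threshold).

    ndvi_pixels -- NDVI values for one area; None marks masked pixels (e.g. clouds)
    threshold   -- assumed NDVI cut-off for vegetation, to be tuned per study area
    Returns None when the area contains no valid pixel.
    """
    valid = [v for v in ndvi_pixels if v is not None]
    if not valid:
        return None
    green = sum(1 for v in valid if v >= threshold)
    return green / len(valid)


# Hypothetical NDVI values of the pixels falling into one grid cell.
ratio = urban_green_ratio([0.1, 0.5, 0.7, None, 0.2])
```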

Sentinel-5P

Sentinel-5P carries TROPOMI, the most advanced multispectral imaging spectrometer to date,

which measures air pollution. Five different aspects of the atmosphere can be measured, as shown

in Figure 5.4:

Ozone: Protection from the ultraviolet radiation is given through the stratospheric ozone.

Ozone can form in the lower atmosphere which can lead to respiratory problems and damage

vegetation.

Forest fires and wood processing release formaldehyde, which can irritate the eyes and the

lining of the nose and throat.

Sentinel-5P can measure nitrogen dioxide (NO2). NO2 can form naturally from biogenic

emissions such as microbiological processes in the soil. In cities, however, most NO2 emissions

stem from motor vehicle exhaust. Chronic exposure can cause respiratory effects. The total

atmospheric NO2 column between the surface and the top of the troposphere is measured with

Sentinel-5P.

Methane, which is a potent greenhouse gas, stems from the fossil fuel industry, landfill sites,

livestock farming, rice agriculture and permafrost thawing. Headaches and nausea can be a

consequence of exposure.

Volcanic activity, fires and burning of fossil fuels can lead to carbon monoxide pollution. The

amount of oxygen which can be transported in the blood stream can be affected through

breathing air polluted by carbon monoxide.

Figure 5.4 Sentinel-5P (Source: http://www.esa.int/spaceinimages/Images/2017/09/Sentinel-5P_infographic)

Landsat 8

Landsat 8 can be used as a freely available source of optical data. General information on the Landsat 8

satellite is given in chapter 3 - EO data sources.

Georeferenced official Data

The Federal Statistical Office of Germany (Destatis) is currently working on georeferencing all statistics

which have a geographic component. Some examples are listed below focusing on data sources which

might become useful during this ESSnet through the combination with the other data sources:

Census

Demographic data from the census 2011 is georeferenced on a grid of 100 by 100 meters. These results

are published, where data privacy allows it. The data which is available for most urban grid cells are the

number of inhabitants and demographic information such as age and gender.

Microcensus

The microcensus is a yearly survey covering 1% of the population. The survey gathers information on

the topics of demography, economic and social situation of the household, labor market, education and

housing. It is conducted as a panel, where every household is surveyed for four consecutive years. The

microcensus of 2018 will be geocoded for the first time.

However, it is not yet ready and will only become available in the autumn of 2019; it is therefore not yet

included in this interim report.

Map of traffic accidents

Data exist on all road accidents, including the modes of transport involved, their

severity and their location.

Map of hospital accessibility

The accessibility of hospitals exists as a data set on the driving distance to the closest hospital on a grid

of 100 by 100 meters.

Business register

The Statistical Business Register (German: Statistisches Unternehmensregister, URS) is geocoded and

includes among other things the branch of the company, revenue and number of employees.

Toolkit

GIS and R (R Core Team 2018) will be used as a toolkit, as well as Python scripting.

5.3.2. Stage 1

Test site definition

This case study is limited to urban areas. In a first step only the Frankfurt Rhine-Main area will be

considered. The surroundings of the households covered in the microcensus will be examined. The

surrounding characteristics of a household will be calculated on the INSPIRE 100 by 100 m grid so as to

match the grid of the georeferenced census.

Data collection

The data was collected and matched to the level of the INSPIRE grid cells, which have the Lambert

azimuthal equal-area (LAEA) projection, with a width of 100 m. The georeferenced census data is

published in this format. Furthermore, a database for this grid is being built up at Destatis; results of

this project can be used to fill the database.
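
Matching point or pixel values to the 100 m grid can be sketched as follows. The example assumes that coordinates are already given in the LAEA projection in metres; the cell identifier pattern (`100mN…E…`, built from the lower-left corner of the cell) is modelled on the style used for gridded census data and is an assumption here, as are the function names and the sample coordinates.

```python
import math
from collections import defaultdict

CELL_SIZE = 100  # grid cell width in metres


def grid_cell_id(x, y, size=CELL_SIZE):
    """Return an identifier for the grid cell containing the LAEA point (x, y).

    The identifier encodes the lower-left corner of the cell in units of the
    cell size (assumed pattern, not an official specification).
    """
    col = math.floor(x / size)
    row = math.floor(y / size)
    return f"{size}mN{row}E{col}"


def mean_by_cell(points):
    """Average the values of (x, y, value) records per grid cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, value in points:
        acc = sums[grid_cell_id(x, y)]
        acc[0] += value
        acc[1] += 1
    return {cell: total / n for cell, (total, n) in sums.items()}


# Hypothetical NDVI samples: the first two fall into the same cell.
samples = [
    (4334150.0, 2684050.0, 0.2),
    (4334199.0, 2684001.0, 0.4),
    (4334200.0, 2684050.0, 1.0),
]
cell_means = mean_by_cell(samples)
```

Keying all derived indicators on the same cell identifier makes it straightforward to join them with the georeferenced census results later on.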

Sentinel-2

The Sentinel-2 scenes have been downloaded through the Copernicus Open Access Hub. Figure 5.5

shows the satellite images matching the search parameters, which can be downloaded directly.

Figure 5.5 Results: Sentinel-2 images found for the period of interest, in the area of interest, as shown by the

green shapes.

Sentinel-5P

Sentinel-5P data can be collected from the Sentinel-5P Pre-Operations Data Hub

(https://s5phub.copernicus.eu/). An area and a time frame of interest can be defined. Figure 5.6 shows

the vertical column of NO2.

Figure 5.6 Sentinel-5P Nitrogen dioxide

Landsat

Land surface temperature (LST) derived from Landsat 8 will be collected in order to analyse the heat

development within the study area. The LST has to be calculated based on the published methodology

following Cook et al. (2014). For this, information from the thermal band and the NIR and red bands

is needed. Figure 5.7 illustrates the result of the calculation for the summer of 2016 for the study area.

Figure 5.7 Land surface temperature derived from Landsat 8 for the Frankfurt Rhine-Main area. Date: 23rd of August 2016.

5.3.3. Stage 2

Data pre-processing

Sentinel-5P

To process Sentinel-5P data in a way in which air quality can be inferred from it, chemical transport

models (CTM) have to be run due to the non-linearity of NO2 chemistry. A CTM simulates the

atmospheric chemistry. The satellite data has to be scaled to relate to ground-level measurements.

Satellite data of NO2 measurements is strongly linked to in-situ measurements (Bechle et al. 2013).

Landsat

In order to get land surface temperature (LST) from Landsat satellites, a couple of pre-processing steps

are needed as long as the USGS Landsat 8 LST ARD (analysis ready data) Level-2 product is not available,

as is the case for the study region. First, the digital numbers need to be converted to radiance and then to

at-sensor (top of atmosphere, TOA) brightness temperature. The land surface temperature (LST) is then

calculated by estimating the surface emissivity, for which land cover information and the

proportion of vegetation (via the Normalized Difference Vegetation Index, NDVI) are needed.
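
The chain of steps described above (digital number, radiance, brightness temperature, emissivity, LST) can be sketched as follows. This is a simplified illustration under stated assumptions and not necessarily the procedure of Cook et al. (2014): it uses a widely used NDVI-threshold approximation for emissivity and ignores atmospheric correction. The rescaling and thermal constants are typical values for Landsat 8 band 10 and should be read from the metadata file of each scene; the NDVI thresholds, emissivity coefficients, effective wavelength and function names are assumptions.

```python
import math

# Assumed Landsat 8 TIRS band 10 constants; read the real ones from the scene metadata.
ML = 3.342e-4            # radiance multiplicative rescaling factor
AL = 0.1                 # radiance additive rescaling factor
K1 = 774.8853            # thermal conversion constant K1
K2 = 1321.0789           # thermal conversion constant K2
WAVELENGTH_UM = 10.895   # assumed effective wavelength of band 10 (micrometres)
RHO_UM_K = 14388.0       # h * c / k_B expressed in micrometre kelvin


def toa_radiance(dn):
    """Convert a digital number to top-of-atmosphere spectral radiance."""
    return ML * dn + AL


def brightness_temperature(radiance):
    """Convert TOA radiance to at-sensor brightness temperature in kelvin."""
    return K2 / math.log(K1 / radiance + 1.0)


def vegetation_proportion(ndvi, ndvi_soil=0.2, ndvi_veg=0.5):
    """Proportion of vegetation from NDVI, clamped to [0, 1] (assumed thresholds)."""
    frac = (ndvi - ndvi_soil) / (ndvi_veg - ndvi_soil)
    frac = min(max(frac, 0.0), 1.0)
    return frac ** 2


def emissivity(pv):
    """Surface emissivity from vegetation proportion (assumed linear approximation)."""
    return 0.004 * pv + 0.986


def land_surface_temperature(dn, ndvi):
    """Emissivity-corrected land surface temperature in kelvin."""
    bt = brightness_temperature(toa_radiance(dn))
    eps = emissivity(vegetation_proportion(ndvi))
    return bt / (1.0 + (WAVELENGTH_UM * bt / RHO_UM_K) * math.log(eps))


# Hypothetical pixel: thermal digital number 25000 and NDVI 0.35.
lst_kelvin = land_surface_temperature(25000, 0.35)
```

Because the emissivity is below 1, the corrected LST is always slightly higher than the brightness temperature.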

5.3.4. Stage 3

Main data processing

Since the microcensus data is not yet available the main data processing will be included in the final

report.

6. Report on thematic task 3 - Land cover

6.1. Case study 7 - Comparing «in-situ» and «remote-sensing» collection mode for land cover

data

6.1.1. Pre-works

State of the art

Classification of satellite images

Satellite images are processed by a numerical classification technique that uses the information

contained in the values of one or more spectral bands to classify each pixel individually by assigning a

particular land cover class to it (i.e. water, forest, maize, buildings, etc.). The result of the classification

is a new image composed of a mosaic of pixels that each belongs to a particular class. This image is

essentially a thematic representation of the original image. There are several approaches to

classification, including supervised and unsupervised classification methods. The main difference is

that in supervised classification the user specifies the different pixel values or spectral signatures to

be associated with each class by selecting representative sampling sites of known cover type. The

computer algorithm uses these training zones to classify the entire image. On the other hand,

unsupervised classification does not use training data, and it is the algorithm which forms the spectral

classes on the basis of the numerical information contained in the data itself (pixel values for each band

or index; NRCan 2018). Several studies show that supervised classification methods perform better than

unsupervised methods (Khatami, Mountrakis, and Stehman 2016; Szuster, Chen, and Borger 2011).

Many algorithms are used in land use classification. The most well-known are Support Vector

Machine (SVM), Decision Trees and Random Forest (RF; Breiman 2001). Several studies show more

satisfactory results with RF compared to the first two classifiers (C Pelletier et al. 2016; Inglada et

al. 2015; Rodriguez-Galiano et al. 2012; Gislason, Benediktsson, and Sveinsson 2006). In order to

automate the classification procedure, to process large volumes of data over large French territories in

a short time, the Centre d'Etudes Spatiales de la BIOsphère (CESBIO) developed the iota2 processing

chain based on Random Forest. Land use maps for the whole of Metropolitan France have already been

produced by iota2.

The use of satellite images in the production of land cover maps is becoming increasingly frequent.

Multi-spectral and multi-temporal imaging is used to characterize phenological variations in the state

of vegetation cover (Rodriguez-Galiano et al. 2012) and to detect the different components of the

Earth’s surface. According to Inglada et al. (2017), images with high spatial (metric or decametric) and

temporal resolution are required to produce detailed land cover maps. Sentinel-2 images (Drusch et al.

2012), with their unique characteristics (290 km wide swath, 10-60 m spatial resolution, 5-day revisit

with 2 satellites, and 13 spectral bands), provide a powerful tool for mapping and monitoring large, rich,

complex and sensitive ecosystems (Yesou et al. 2016). Ma et al. (2017) also found a positive correlation

between the size of study areas and the spatial resolutions of the images used. Very high spatial

resolution (<2 m) images are the most used. Nevertheless, due to the ease of access and the high

availability, Sentinel-2 images are often used.

The main product of the iota2 chain so far is the land use map of metropolitan France, "OSO", produced

from Landsat-8 and Sentinel-2 images at 20 m and 10 m resolution, respectively.

Statistical product definition

The statistical products of the French TERUTI annual survey are complete breakdowns of the national

areas into land cover and land use classifications at NUTS2 level. The land cover classification has 7 main

categories: Artificial Land, Cropland, Woodland, Shrubland, Grassland, Bare land, Water. The statistics

derived from the TERUTI survey are a combination of about 68,000 direct observations made by surveyors

in the field every year and automatic data imputation for about 7.2 million points from administrative

and geographical databases.

The OSO map is a land cover map covering the whole French metropolitan territory with a land cover

classification of 17 categories. Both vector and raster maps are available for 2016, 2017 and 2018. The

vector map has a minimum collection unit of 0.1 ha, whereas the raster map is made of 10-metre pixels.

The automatic production process of these OSO maps is based on a supervised classification of time

series of Sentinel-2 images using existing databases as reference data for training and validation steps.

The statistical products of CS7 should be:

• a common land cover classification suitable for both the TERUTI survey and the OSO map;
• for the comparison at individual level (points and pixels): a kind of confusion matrix (and derived indices) crossing land cover classes from TERUTI with land cover classes from OSO, for the samples of «in situ» points and imputed points, and for different areas (regions) of the French territory;
• for the comparison at aggregated level (NUTS2): land cover area estimates from TERUTI and OSO;
• an adaptation of the remote-sensing process in order to improve the land cover classification on TERUTI points;
• an investigation of land cover change detection by analysing multi-annual Sentinel-2 image time series, in order to provide a list of points from the TERUTI sample where land cover has likely changed since 2017.
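The point-pixel comparison can be illustrated with a small cross-tabulation. This is only a sketch, assuming each TERUTI point has already been paired with the OSO class of the pixel containing it and that both labels use the common classification; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def cross_tabulate(teruti_labels, oso_labels, classes):
    """Build a contingency table: rows are TERUTI classes, columns are OSO classes."""
    counts = Counter(zip(teruti_labels, oso_labels))
    return [[counts[(t, o)] for o in classes] for t in classes]


def agreement_rate(table):
    """Share of points for which TERUTI and OSO give the same class."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(len(table)))
    return diagonal / total


# Hypothetical labels for four points (illustration only).
classes = ["Artificial land", "Cropland"]
teruti = ["Artificial land", "Artificial land", "Cropland", "Cropland"]
oso = ["Artificial land", "Cropland", "Cropland", "Cropland"]
table = cross_tabulate(teruti, oso, classes)
```

The same table can be built separately for «in situ» points, imputed points and each region by filtering the paired labels beforehand.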

Data source & toolkit

Sentinel-2 images and iota² processing chain

In France, the Theia Data and Services Center for continental surfaces is in charge of processing Sentinel-2 images to level 2A, in order to correct atmospheric effects and obtain surface reflectance (Olivier Hagolle, Huc, et al. 2015; Olivier Hagolle, Sylvander, et al. 2015; O. Hagolle et al. 2010; 2008). These corrections are made by the MUSCATE processing centre of the CNES with the MAJA processing chain (MACCS-ATCOR® Joint Algorithm, where MACCS stands for Multi-sensor Atmospheric Correction and Cloud Screening and ATCOR for Atmospheric and Topographic Correction). They also make it possible to detect clouds and their shadows.

The Sentinel-2 images are provided according to a precise and fixed tiling grid, with tiles of 110 km by 110 km in the UTM/WGS84 projection (Universal Transverse Mercator / World Geodetic System 1984) and a 10 km overlap between adjacent tiles (Figure 6.1). Time series of Sentinel-2 images are the primary data source for the remote sensing method of the OSO process.

Figure 6.1 Sentinel-2 tiles on France. Source CESBIO.

These high-resolution image time series (Sentinel-2) have opened up new opportunities in satellite image processing and land use/cover classification. The short revisit time (5 days) allows the evolution of land cover to be monitored very frequently, and therefore dynamic classes (e.g. agricultural classes) to be mapped over time. In this context, CESBIO has developed an image processing chain that can integrate all the satellite images of a given period to automatically produce large-scale land cover maps, such as the OSO map: the iota² processing chain.

The iota2 processing chain

There is an increasing number of software solutions (commercial and free) for performing supervised

classifications of satellite images (R packages, python modules, ENVI, etc.). For example, the Orfeo

Toolbox (OTB) library developed by the CNES is a solution widely used by the remote sensing

community. These solutions each have their advantages and disadvantages.

In this project, the operational solution chosen is the iota² processing chain (Infrastructure for Land Cover by Automatic Processing Incorporating Orfeo Toolbox Applications). Iota² is a processing chain developed by CESBIO to produce land cover classifications over large areas from satellite images acquired by one or more sensors. It can handle large volumes of data, from different tiles and different acquisition dates. Available as free software (https://framagit.org/iota2-project/iota2/), it is written in Python and relies on the Orfeo Toolbox (OTB) library dedicated to image processing (https://www.orfeo-toolbox.org/) and on the Random Forest algorithm with eco-climatic stratification (Inglada et al. 2017). It offers several features for large-scale image processing:

• management of multiple Sentinel-2 tiles (overlay);
• multi-sensor management (Sentinel-2, Spot 6, Sentinel-1, Landsat, etc.);
• gapfilling (filling voids): transition from irregular to regular time series / cloud management;
• stratification by large region (e.g. ecoclimatic zone, sylvo-eco-region, etc.).

Iota² was created to be run on POSIX (Portable Operating System Interface–UNIX) operating systems. Processing can be parallelised, and the chain can be used both on multi-core shared memory machines and on high-performance computing clusters with hundreds of nodes (C Pelletier et al. 2016). The main product resulting from the processing of satellite images with iota² is the land cover map of metropolitan France (« Occupation du Sol - OSO »), based on the analysis of Sentinel-2 time series (http://osr-cesbio.ups-tlse.fr/~oso/).

Operating iota2

The general classification methodology applied by iota2 is based on a conventional supervised classification procedure (figure 5.3), with the advantage of being able to process very large territories and large volumes of data completely automatically in a short time. Because of this automatic approach, iota2 was designed to be applicable independently of the land cover classes, and therefore no date selection in terms of season or vegetation phenology is applied. It is therefore advisable to use as many images as possible to better characterize the annual land cover cycle (Inglada et al. 2017).

The processing inputs shown in Figure 6.2 are:

1. the reference data: georeferenced samples labelled with a known land cover class;
2. the validity masks of the image time series: each image corresponding to a date is accompanied by a mask indicating the valid pixels (surface reflectance) and the invalid ones (cloud detection, cloud shadow, saturation);
3. the level 2A satellite image time series;
4. optional input: a Region of Interest (ROI) mask to exclude areas from classification.

The processing is divided into 6 main steps (green boxes in Figure 6.2):

Figure 6.2 Schematic of the OSO Map Production Procedure (Inglada et al. 2017)

Sample Selection

Iota2 randomly separates the reference data into training data and validation data. In order to limit the effect of spatial autocorrelation, which artificially increases the measured accuracy of classifications, the separation takes place at the polygon level and not at the pixel level. This prevents pixels from the same polygon, which have similar characteristics, from being used for both training and validation, which would bias the quality indicators of the classifications (Accuracy, F-Score and Kappa Index) towards optimistic results.
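The polygon-level separation can be sketched as follows. This is not the iota2 implementation, only a minimal illustration assuming each reference pixel carries the identifier of its polygon; the function name, the 70 % training fraction and the seed are hypothetical choices.

```python
import random
from collections import defaultdict


def split_by_polygon(samples, train_fraction=0.7, seed=0):
    """Split (polygon_id, pixel) samples so that every polygon falls entirely
    in either the training set or the validation set."""
    by_polygon = defaultdict(list)
    for polygon_id, pixel in samples:
        by_polygon[polygon_id].append(pixel)

    # Shuffle polygon identifiers, not pixels, then cut the list once.
    polygon_ids = sorted(by_polygon)
    random.Random(seed).shuffle(polygon_ids)
    n_train = int(round(train_fraction * len(polygon_ids)))

    train = [(pid, px) for pid in polygon_ids[:n_train] for px in by_polygon[pid]]
    valid = [(pid, px) for pid in polygon_ids[n_train:] for px in by_polygon[pid]]
    return train, valid
```

Because the shuffle operates on polygons, no polygon can contribute pixels to both sets, which is the property described above.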

Linear Interpolation

The image time series are preprocessed with temporal gapfilling and resampling to ensure spatial and temporal homogeneity. For dates affected by clouds, the approach consists of a linear interpolation of the invalid pixels using surface reflectance values from the previous and subsequent valid dates. For temporal resampling, linear interpolation is applied to the surface reflectance values of all dates (valid and invalid pixels), in order to have common dates for all pixels in the study area.
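The gapfilling step for a single pixel can be sketched as below. This is an illustration under stated assumptions, not the iota2 code: dates are expressed as day numbers, and at the ends of the series the nearest valid value is reused, a choice the text does not specify. Resampling to common dates is not shown.

```python
def gapfill(days, values, valid):
    """Linearly interpolate the invalid observations of one pixel's time series.

    days   : acquisition dates as increasing day numbers
    values : surface reflectance at each date
    valid  : True where the pixel is valid (no cloud, shadow or saturation)
    """
    good = [(d, v) for d, v, ok in zip(days, values, valid) if ok]
    if not good:
        raise ValueError("no valid observation in the series")

    filled = []
    for d, v, ok in zip(days, values, valid):
        if ok:
            filled.append(v)
            continue
        before = [(gd, gv) for gd, gv in good if gd < d]
        after = [(gd, gv) for gd, gv in good if gd > d]
        if before and after:
            (d0, v0), (d1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
        else:
            # Assumption: no extrapolation, reuse the nearest valid value.
            filled.append((before[-1] if before else after[0])[1])
    return filled
```

For example, a cloudy observation on day 5 between valid values 0.1 (day 0) and 0.3 (day 10) is replaced by approximately 0.2.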

Feature Extraction

The image time series obtained after interpolation are used to calculate spectral indices (NDVI: Normalized Difference Vegetation Index; NDWI: Normalized Difference Water Index; and Brightness) for each pixel on each acquisition date. These indices are added to the surface reflectance data of each pixel, which improves the results of the classifications, especially when the study areas are very large and have very variable landscapes. The indices highlight specific properties of the observed surfaces, such as the presence of vegetation with NDVI, or of water and wetlands with NDWI (C Pelletier et al. 2016).
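The report does not give the formulas used, so the following sketch relies on common definitions that should be read as assumptions: NDVI as the normalized difference of near-infrared and red, NDWI in its green/near-infrared form, and brightness as the Euclidean norm of the bands. The function name and the choice of three bands are illustrative.

```python
import math


def spectral_indices(green, red, nir):
    """Compute NDVI, NDWI and brightness for one pixel on one date."""

    def normalized_difference(a, b):
        # Return 0 when both bands are zero to avoid dividing by zero.
        return (a - b) / (a + b) if (a + b) != 0 else 0.0

    return {
        "ndvi": normalized_difference(nir, red),
        "ndwi": normalized_difference(green, nir),  # assumed green/NIR variant
        "brightness": math.sqrt(green ** 2 + red ** 2 + nir ** 2),
    }
```

Dense vegetation reflects strongly in the near infrared and weakly in the red, so its NDVI is close to 1, while open water tends to give a positive NDWI with this definition.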

Training

Training data are used to train the classifier to identify land cover classes. At this stage, the classification model is produced. Iota2 is based on the Random Forest classification algorithm of Breiman (2001), which has shown higher overall accuracy than traditional methods such as Decision Trees and SVM. In addition, it requires shorter processing times and simpler settings (C Pelletier et al. 2016; Inglada et al. 2015; Rodriguez-Galiano et al. 2012; Gislason, Benediktsson, and Sveinsson 2006). The Random Forest algorithm combines many decision trees: each tree is grown on a sample of the training data drawn randomly with replacement (bootstrap), and at each node the split is chosen among a randomly drawn subset of variables. After generating a large number of trees, the prediction is the result of a majority vote (ensemble learning). In other words, the class assigned to each pixel is the one most frequently predicted by the trees.
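Two ingredients of this procedure, bootstrap sampling and the majority vote, can be sketched in a few lines. Tree construction itself is omitted, and the tie-breaking rule (smallest label wins) is an assumption made only to keep the example deterministic.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Draw len(samples) items with replacement, as done for each tree."""
    return [rng.choice(samples) for _ in samples]


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    best = max(counts.values())
    # Assumption: ties are broken by taking the smallest label.
    return min(label for label, n in counts.items() if n == best)


# Hypothetical predictions of five trees for one pixel.
votes = ["Forest", "Cropland", "Forest", "Grassland", "Forest"]
predicted_class = majority_vote(votes)
```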

Classification

This step assigns a land cover class to each pixel of the image by applying the classification model to the time series of surface reflectance and spectral indices. The product of this step is a land cover map with the same classes as the training data.

Validation

The quality of the land cover maps produced by iota2 is assessed with a set of indices derived from a

confusion matrix where the values in the cells correspond to the count of the validation pixels. The rows

correspond to the reference class, called the "true class", and the columns to the class obtained by the

classification. The indices correspond to global statistics which give summarised information on

classification, calculated from the validation data used at the pixel level:

1. Overall Accuracy, calculated by the sum of the diagonal divided by the sum of all the elements

of the confusion matrix, indicates the proportion of pixels that have been well classified, all

classes combined;

2. the Recall, indicating the fraction of pixels correctly classified in relation to the ground truth;

3. the Precision, indicating the fraction of pixels correctly classified in relation to all pixels classified

in the class;

4. the F-Score, denoting the harmonic mean of the Precision and the Recall;

5. and the Kappa Index, which takes into account the part of the agreement between the output of the classifier and the reference data that may be due to chance. It expresses the difference between the observed agreement and the agreement expected by chance, relative to the maximum possible difference, that is, the agreement that would be expected if the classification were random (Inglada et al. 2017).
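The five indices can be computed directly from the confusion matrix. The sketch below follows the convention given above (rows are reference classes, columns are predicted classes) and uses the usual definition of Cohen's kappa; the function name and the example matrix are hypothetical.

```python
def classification_metrics(matrix):
    """Compute overall accuracy, per-class recall, precision, F-score and kappa
    from a square confusion matrix (rows: reference, columns: prediction)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(matrix[i]) for i in range(n)]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]

    overall_accuracy = sum(diagonal) / total
    recall = [diagonal[i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    precision = [diagonal[i] / col_sums[i] if col_sums[i] else 0.0 for i in range(n)]
    f_score = [
        2 * p * r / (p + r) if (p + r) else 0.0
        for p, r in zip(precision, recall)
    ]

    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (
        (overall_accuracy - expected) / (1 - expected)
        if expected < 1
        else float("nan")  # undefined when chance agreement is already 1
    )
    return {
        "overall_accuracy": overall_accuracy,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
        "kappa": kappa,
    }


# Hypothetical two-class matrix with 100 validation pixels.
metrics = classification_metrics([[40, 10], [5, 45]])
```

In this example the overall accuracy is 0.85 and the chance agreement is 0.5, which gives a kappa of 0.7.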

OSO product

Thanks to the iota² processing chain, it has been possible to produce the OSO land cover map for the whole of France by integrating several terabytes of Sentinel-2 satellite images. This land cover map has the following nomenclature (in brackets, the data source and corresponding type used for the training samples; CLC: Corine Land Cover; RPG: the agricultural Land Parcel Information System “Graphical Parcel Register”; BD Topo: topographic database of the French National Geographic Institute; Randolph: the Randolph Glacier Inventory):

• Artificial Areas
    o Continuous urban fabric (CLC 111)
    o Discontinuous urban fabric (CLC 112)
    o Industrial or commercial units (CLC 121)
    o Road surfaces (BD Topo)
• Agricultural Areas
    o Arable lands
        ▪ Annual summer crops (RPG)
        ▪ Annual winter crops (RPG)
        ▪ Intensive grassland (RPG)
    o Perennial crops
        ▪ Orchards (RPG)
        ▪ Vineyards (RPG)
• Forest and Semi-Natural Areas
    o Forests
        ▪ Broad-leaved forest (BD Topo)
        ▪ Coniferous forest (BD Topo)
    o Shrubs and herbaceous vegetation
        ▪ Natural grasslands (CLC 321)
        ▪ Woody moorlands (BD Topo)
    o Open spaces with little or no vegetation
        ▪ Beaches, dunes and sand plains (CLC 331)
        ▪ Bare rock (CLC 332)
        ▪ Glaciers and perpetual snow (Randolph)
• Water bodies (CLC 523 and BD Topo)

TERUTI data

TERUTI is an annual statistical area-frame survey on land cover and land use that has covered the French territory since 1982. The statistical sampling unit is a portion of territory, generally a circular plot 3 m in diameter. Since 2017, the TERUTI survey sample has been drawn from a 250 m by 250 m grid (in the EPSG:3035 coordinate system), which includes around 8.8 million points covering the whole of metropolitan France. Each of these points is classified into 11 land cover categories (the strata) on the basis of a geomatic intersection with administrative and topographical geo-databases (see below). Beyond the geographical characteristics of the point (i.e. its GPS coordinates and the corresponding NUTS3, NUTS2 and NUTS1 units and municipality), some specific information is added to each point, in particular the elevation, the distance to the nearest road, the population density in the surrounding 1 km², etc.
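The size of the grid can be checked with simple arithmetic: a 250 m spacing gives 16 points per km², so 8.8 million points correspond to roughly 550,000 km², which is the order of magnitude of the area of metropolitan France. The sketch below generates such a grid over a bounding box in projected metres; the function name and coordinates are hypothetical, and clipping to the national boundary is not shown.

```python
POINTS_PER_KM2 = (1000 // 250) ** 2  # 16 grid nodes per square kilometre


def grid_points(xmin, ymin, xmax, ymax, step=250):
    """Yield the (x, y) nodes of a regular grid covering a bounding box.

    Coordinates are integers in metres in a projected system such as EPSG:3035.
    """
    for y in range(ymin, ymax + 1, step):
        for x in range(xmin, xmax + 1, step):
            yield (x, y)


# A 1 km x 1 km box, including its border, holds 5 x 5 nodes.
nodes = list(grid_points(0, 0, 1000, 1000))
implied_area_km2 = 8_800_000 / POINTS_PER_KM2
```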

The 11 strata of the TERUTI master sample are:

• S1-Water areas
• S2-Artificial areas (built-up and non-built up)
• S3-Agricultural land presently registered in the RPG (French LPIS)
• S4-Land parcel previously registered but presently out of the RPG
• S51-Heart of forest (> 10 m)
• S52-Edge of forest (< 10 m)
• S6-Undescribed areas in urban areas with high population density (> 150 inhabitants/km²)
• S91-Other natural areas in urban and suburban agglomerations
• S92-Other natural areas in sparsely populated suburban or touristic rural areas
• S93-Other natural areas in remote rural areas
• S100-High elevation or hard-to-reach areas (to be photo-interpreted)

The master sample is then split into three parts:

• Strata S1, S2, S3 and S51 are not surveyed in the field. For these 7.3 million points (82 % of the grid), land cover and land use information is imputed from the available administrative and geographical databases. These databases may be incomplete or out of date, so this information can be updated during field surveys.
• Strata S4, S52, S6, S91, S92 and S93 are sampled and surveyed in the field. A sample of 200,000 points, out of 1.3 million, is randomly selected by stratum and allocated by department (NUTS3). Field data collection is carried out on a 3-year cycle: 68,000 points are visited by surveyors in each year of the cycle in order to determine land cover and land use at a detailed level. In the next 3-year cycle, the same points will be revisited in order to provide a good estimate of the rate of change of land cover and land use.
• Stratum S100 is sampled and photo-interpreted. A sample of 45,000 points, out of 0.2 million, is randomly selected and allocated by department (NUTS3). Photo-interpretation is also carried out on a 3-year cycle, with an annual sample of 5,000 points.

The questionnaires used by surveyors to collect data in the field are the following:

Land cover (on a 3 m diameter circle)

Artificial land
    C111-Building with one to three floors
    C112-Building with more than three floors
    C113-Artificial non built-up impervious (coated) area
    C121-Artificial non built-up pervious (stabilized, compacted) area
    C122-Heterogeneous and artificial coverage area
Bare land
    C211-Rock, cliff
    C212-Sand, stones
    C213-Other bare soil
Water land
    C221-Water area
    C222-Glaciers, permanent snow
Cropland
    C311-Annual crop
    C312-Fruit and vegetables (excl. fruit tree)
    C313-Fruit tree and small fruit
    C314-Vine
    C315-Other permanent crop: ornamental, aromatic, ...
Grassland
    C411-Temporary or artificial meadow
    C412-Natural or permanent pasture
    C413-Fallow
    C414-Agricultural grass strip
    C415-Other grassland (with no agricultural use)
Woodland
    C510-Woodland
Shrubland
    C521-Shrubland
    C522-Low shrub hedge (linear or organized)
    C523-Other woodland

Environment (features) of the point

Artificial land
    M110-Artificial linear feature
    M120-Artificial area feature
Water land
    M211-Inland water body
    M212-Inland running water
    M221-Intertidal area
    M222-Salines
    M223-Coastal water body
Agricultural land
    M310-Greenhouse
    M320-Field surrounded by a hedge
    M330-Open field (without border)
    M340-Agro-forestry
Woodland
    M410-Vegetation with sparse tree cover
    M420-Woody hedge and other line of trees
    M430-Grove
    M441-Open forest
    M442-Closed forest

Land use

Primary sector
    U11-Agriculture
    U13-Fishing
    U14-Forestry
    U15-Mining, quarrying and other primary production
Secondary sector
    U20-Industry, manufacturing, energy production and transport
Tertiary sector and residential
    U31-Nature preservation, recreation, leisure, sport
    U32-Commercial, service and other tertiary activities
    U33-Transport, communication networks, storage, protective works
    U34-Residential
    U91-Unused or abandoned area
    U99-Construction site or unknown use

The final statistical estimates are based on the weights derived from the master sample (imputation), the observations collected in the field (field sample) and photo-interpretation.

The TERUTI statistical process involves massive data imputation (for about 7.3 million points) from administrative and geographical databases.

Administrative database:

• The Graphical Parcel Register (RPG) is derived from the Integrated Administration and Control System (IACS) of the Common Agricultural Policy (CAP). The RPG is a geographic information system for the identification of agricultural parcels. It consists of about 9,000,000 graphic objects (parcels) covering the French territory, both metropolitan and overseas. It compiles data from the agricultural area declarations made by farmers to receive aid from the Common Agricultural Policy, in particular the identification of farms and agricultural parcels, with the type of crop reported for each parcel: grains, oilseeds, protein crops, vegetables, fruits, grasslands, etc. Introduced in France in 2002, the RPG is updated annually.
• The RPG is administered by the Service and Payment Agency (ASP), which is the only service authorised to distribute the full RPG to applicants. The ASP is a French public institution created in 2009 whose mission is to contribute to the implementation of public policies, both national and European. It pays almost all European aid to farmers who are recipients of the Common Agricultural Policy (CAP).

Geographical databases:

• The BD TOPO® from the National Geographical Institute (IGN) is a 3D vector description (structured in objects) of the elements of the territory and its infrastructure, with metric precision. It covers all the geographical and administrative entities of the national territory. The BD TOPO® objects are grouped by theme: road network, rail network, energy transport network, river system, buildings, wooded vegetation, etc. The 3D production process provides the altimetry of the objects, as well as the height of the buildings. Many themes are continuously updated: for instance, the road network is updated with a delay of less than 6 months and the railway network with a delay of 1 year, while updates of the energy transport system, buildings and river system follow the aerial photography update cycle (3-4 years). The BD TOPO® is published twice a year (semi-annual edition). It covers all French departments (NUTS3), including overseas departments, as well as the overseas collectivities of Saint-Pierre-et-Miquelon, Saint-Barthélemy and Saint-Martin.

• The BD FORET® is a reference vector database for forest and semi-natural environments (woodland, shrubland, grassland). It is the geographical reference framework for the description of forest species. The objects in the BD FORET® have an area greater than or equal to 5,000 m² (50 ares) and meet the following thresholds: land use that is not exclusively agricultural, a width of at least 20 metres and a vegetation cover rate of 10% or more. It is produced by photo-interpretation of colour infrared aerial images covering all the departments (NUTS3) of the metropolitan territory. The outer limits of forest surfaces are obtained by automatic segmentation of the image and are therefore based on the outer limit of the treetops. The positional error of the external contours of forest surfaces is less than 10 m. BD Forêt® version 2 has been available throughout metropolitan France since December 2018.

All IGN data are freely accessible to government services and their public administrative establishments, as well as for education and research activities. The private sector can access IGN data by purchasing user licences.

6.1.2. Stage 1

Test site definition

The comparison between «in situ» land cover data from TERUTI and remote-sensing land cover data from OSO will be carried out over the whole of metropolitan France.

Data collection

Sentinel-2 images

Level 2A products can be downloaded free of charge from the Theia thematic cluster platform

(http://www.theia-land.fr/en/produits/reflectance-sentinelle-2). For the OSO process, they are stored

in the CNES high performance computing cluster.

In-situ land cover data collection in TERUTI survey

Each year, 68,000 points of the TERUTI sample are observed in the field by about 600 surveyors recruited and trained by the regional statistical services of the Ministry of Agriculture. These points are optimally allocated by stratum and by French «department» (NUTS3) in order to maximize the accuracy of the statistical estimators (such as the artificialization rate) at the NUTS3 geographical scale. Each surveyor has to collect land cover and land use information on about 100 TERUTI points between June and September, and should be able to cover about 10 points per day.

Each surveyor has a paper questionnaire on which to record land cover and land use at the exact location of the TERUTI point. The conditions of access and observation of the point (distance, visibility, environment) must also be noted by the surveyor. Getting to the exact location of the plot is very important for the quality of the survey results. For this purpose, the surveyor has the GPS coordinates of the point to be surveyed, an aerial photograph at 1/25000 scale and a recent map at 1/25000 and 1/10000 scale. A Spot satellite image from the previous year is also provided to the surveyor in case the aerial photo is too old. Furthermore, surveyors can use a file of the points to be observed in KML format in order to automatically geolocate the plot with a GPS device or a smartphone.

Once the points have been visited and the questionnaires have been filled in, the surveyors have to

enter the collected data in a computer application for data entry and control. Once data entry is

validated by the application, the surveyor can upload the data to a centralized computer server. Thus,

the collection phase is continuously monitored and supervised by the regional statistical services and

by the central statistical service. A collection assessment is carried out by each regional statistical service

at the end of the operation.

6.1.3. Stage 2

Data pre-processing

Teruti-OSO comparison

An initial comparison between the TERUTI database and the OSO product was carried out in order to identify the differences in area by land cover class between the two methods, and thus highlight the classes that differ most. The comparison is based on the products of the year 2017, specifically on the 13 departments (NUTS3) of the Occitanie region (NUTS2) in the south of France. To do this, it was necessary to harmonize the two classifications by grouping classes until a nomenclature of 7 classes was obtained (Table 6.1).

Table 6.1 Harmonization of the Teruti and OSO classifications 2017.

TERUTI
    Artificial impervious land
    Artificial permeable land
    Cropland
    Agricultural permanent and temporary grassland
    Non-agricultural grassland
    Shrubland
    Forest
    Other wooded land
    Bareland
    Water areas

OSO
    Continuous urban fabric
    Discontinuous urban fabric
    Industrial and commercial units
    Road surfaces
    Annual summer crops
    Annual winter crops
    Orchards
    Vineyards
    Intensive grasslands
    Natural grasslands
    Broad-leaved forests
    Coniferous forests
    Woody moorlands
    Beaches, dunes and sand
    Bare rocks
    Water bodies
    Glaciers and perpetual snow

TERUTI-OSO
    Artificial land
    Cropland
    Agricultural grassland (e.g. meadows, pastures)
    Non-agricultural grassland (e.g. lawns)
    Shrubland
    Woodland
    Other natural land (bareland and water area)

Once the nomenclature of comparison was defined, the calculation of the areas by new classes by

department was carried out for each product. In the case of the Teruti database, information on areas

in ha by department is provided. As for the OSO map, pixel counts on the raster were performed and

the area per department was calculated (1 pixel = 0.01 ha).
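The conversion from pixel counts to areas can be sketched as follows, using the 0.01 ha area of a 10 m pixel. The mapping from OSO classes to harmonized classes shown here covers only three classes and is an illustrative assumption, since the full correspondence of Table 6.1 is not reproduced; the function name and pixel labels are hypothetical.

```python
from collections import Counter

PIXEL_AREA_HA = 0.01  # one 10 m x 10 m Sentinel-2 pixel

# Illustrative partial mapping (assumption, not the full Table 6.1).
OSO_TO_HARMONIZED = {
    "Broad-leaved forest": "Woodland",
    "Coniferous forest": "Woodland",
    "Vineyards": "Cropland",
}


def area_by_class(pixel_labels, class_mapping):
    """Count pixels per harmonized class and convert the counts to hectares."""
    counts = Counter(class_mapping[label] for label in pixel_labels)
    return {cls: n * PIXEL_AREA_HA for cls, n in counts.items()}


# Hypothetical pixels falling inside one department.
pixels = (
    ["Broad-leaved forest"] * 150
    + ["Coniferous forest"] * 50
    + ["Vineyards"] * 100
)
areas = area_by_class(pixels, OSO_TO_HARMONIZED)
```

Here 200 woodland pixels give about 2 ha and 100 cropland pixels about 1 ha; in practice the counts would be taken per department from the raster.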

The results presented in Figure 6.3 show that there is a moderate gap in artificial land area in Occitanie between TERUTI and OSO. It is around +2% in the OSO estimate for the whole NUTS2 region (+135,000 ha), and larger in the urbanized departments. This is due to the way OSO detects the urban fabric, which is tied to the spatial resolution of the Sentinel-2 images (10 m), whereas the TERUTI observation is based on a 3 m diameter circle. The average difference is small for cultivated areas in Occitanie (+1.2% or +91,000 ha), with differences that vary by department (+7.5% in 82-Tarn-et-Garonne and -1.8% in 12-Aveyron). The difference of -4% for OSO in shrubland area (-275,000 ha) is probably explained by a difference in the semantic precision of the classes of the two methods, as well as by confusion with other classes (meadow, lawn, other wooded soils) during remote-sensing classification (OSO) or field identification (TERUTI). For woodland areas, the difference is about -8% for OSO (-580,000 ha). It may be due to the use of national forest inventory data in the TERUTI process. Concerning agricultural grassland, the difference of -7% for OSO (-500,000 ha) may also be due to very frequent confusion with other grassland classes (heaths, lawns) in the remote-sensing classification. It is also possible that the administrative source used by TERUTI (RPG, CAP declarations) has a reporting bias. Finally, the largest gap between TERUTI and OSO occurs for non-agricultural grassland (lawns): +16% for OSO (+1,150,000 ha), which partially compensates for the differences in grassland and shrubland. On the other hand, if the TERUTI-OSO comparison were made with another class grouping and with more accurate class definitions, the gap would be smaller. It should also be recalled that these two products are based on completely different methods and spatial resolutions, which may be complementary when identifying a change in land cover between two dates.

Figure 6.3 Land cover by NUTS3 department in Occitanie.

TERUTI-oriented classification (work in progress)

The previous results suggest experimenting with an automatic classification of Sentinel-2 images to

automatically assign land cover information to TERUTI points. It is therefore necessary to use iota² to

establish a land cover classification based on the following nomenclature:

Artificial impervious land

Artificial permeable land

Bareland

Water areas

Cropland

Agricultural permanent and temporary grassland

Non-agricultural grassland

Forest

Shrubland

The assignment of TERUTI points takes place in the spring of the reference year (automatic imputation

and field campaign). It is therefore proposed to classify the Sentinel-2 images from the previous year

until March of the current year (from March n-1 to April n). Two different scenarios of classification are

planned:

classification model calibrated from TERUTI points (one point corresponding to one Sentinel-2

pixel),

classification model calibrated from filtered OSO training dataset (Inglada et al. 2017). The

filtering of OSO training polygons will be done using TERUTI points.

This experimental phase will be conducted in a test region, namely the Occitanie region (NUTS2) in the

south of France.

At the end of this phase, a new contingency analysis will be carried out between the new classifications obtained and the TERUTI points.

6.1.4. Stage 3

Main data processing

The main part of case study 7 has not yet begun. It will consist of a deeper analysis of the differences between the TERUTI and OSO products at the individual level of TERUTI points and OSO pixels. Some change detection methods will then be tested but, depending on their implementation, their performance and more broadly their relevance, some may ultimately not be retained.

As described above, the contingency between the OSO product and the TERUTI points is highly variable depending on the land use/cover classes. The discrepancies have several causes:

• automatic imputation from databases that are not always up to date: for example, an IGN forest database often describes a forest cover that existed a few years earlier;
• classification error due to the spatial resolution of the Sentinel-2 image, in which each pixel integrates several spectral responses, whereas the surveyor analyses within a radius of 3 m (except for vegetation).

However, higher accuracy is observed for high-level classes (which ones remains to be specified) at a higher scale (zonal statistics / indicator scale). The land cover/use changes identified from these high-level classes could allow inter-annual changes to be spatialized with greater thematic and spatial accuracy.

It is therefore important to understand at what spatial scale and with what thematic precision remote sensing from Sentinel-2 image time series is relevant for TERUTI purposes.

The methodological proposal to be conducted is based on the experimentation of different methods of

detecting land cover/use changes between 2017 and 2019 on the field-based TERUTI points (~70 000

points):

Spectral/Index method

Analyse automatically detected changes and qualify them by photo-interpretation

o urbanization / artificialization

o deforestation (clear cut)

o intensification

o abandonment

Post-classification detection (around TERUTI points)

o classification with high-level classes on 2 successive years (2016 – 2018)

direct changes

classes and changes probabilities

Changes classification

Validation with 2020 TERUTI survey on summer 2020 (changes from 2017 to 2020).
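The post-classification approach listed above amounts to comparing, point by point, the classes obtained for two different years. The following sketch illustrates the idea on TERUTI point identifiers; the function names, identifiers and class labels are hypothetical, and the classification itself is assumed to have been done beforehand.

```python
from collections import Counter


def detect_changes(labels_year1, labels_year2):
    """Return {point_id: (class_year1, class_year2)} for points whose class differs."""
    return {
        pid: (labels_year1[pid], labels_year2[pid])
        for pid in labels_year1
        if pid in labels_year2 and labels_year1[pid] != labels_year2[pid]
    }


def transition_counts(changes):
    """Count how many points follow each (from_class, to_class) transition."""
    return Counter(changes.values())


# Hypothetical high-level classes at three points for two years.
year1 = {1: "Woodland", 2: "Cropland", 3: "Cropland"}
year2 = {1: "Woodland", 2: "Artificial land", 3: "Cropland"}
changes = detect_changes(year1, year2)
```

The resulting list of changed points is the kind of output that could then be checked by photo-interpretation or against a later field survey.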

6.2. Case study 8 - Land cover maps at very detailed scale

6.2.1. Pre-works

State of the Art

Timely and frequently updated Land Cover (LC) information is of paramount importance to modern

National Statistical Institutes (NSI). Since most of – if not all – the facts and events surveyed by NSIs take

place somewhere in the national territory, LC information is structurally complementary to survey and

administrative data. High quality LC data and statistics can lead to a wider and deeper understanding of

many phenomena of interest.

As far as Europe is concerned, two flagship LC projects exist: CORINE (Bossard, Feranec, and Otahel

2000; Büttner 2014), currently run by the Copernicus Program, and LUCAS (Bettio et al. 2002; EUROSTAT

2003), managed by Eurostat. Although these projects address the study of land cover very differently –

CORINE in a cartography (i.e. full-coverage) perspective, LUCAS in a statistical estimation (i.e. sample

survey) perspective – they suffer common shortcomings. Both are very costly, have very complex

production pipelines, rely heavily on clerical work, and produce their outputs with a rather low time

frequency. Most of the shortcomings affecting CORINE and LUCAS depend on the huge amount of

human workload they require. It is, therefore, very tempting to try to overcome these shortcomings

through process automation. Given an input satellite image depicting a portion of territory, a fully

automatic system should ideally be able to (i) classify the territory according to some standard LC

taxonomy, and to (ii) quantify the area (or the proportion) of territory covered by each LC class, without

any human intervention.

The Italian National Institute of Statistics (Istat) is currently investigating whether Deep Learning

(Goodfellow, Bengio, and Courville 2016) methods could be used to derive automated Land Cover

estimates of satisfactory quality from Sentinel-2 satellite images. A prototype software system is being

developed within the scope of this research. The present case study about “land cover maps” focuses

on a very relevant, though quite specific, output artefact of the system.

Methodology

Istat's research goal is to design and develop an automatic LC estimation system. Such a system should

be able to take as input a satellite image depicting a portion of territory, and to return as output a table

of LC statistics.

Although LC estimation is a quantification problem rather than a classification one, we decided to

implement our system according to a ‘classify-and-count’ design. The main driver of this design choice

was to incorporate into our system a Convolutional Neural Network (CNN)1, so as to take advantage of

its tremendous performance in image classification tasks. Without going into technical details, our

classify-and-count design can be summarized as follows:

0. Train a CNN to predict the LC class of a satellite image ‘tile’ (i.e. a small, fixed-size sub-image).

1. Divide the satellite images covering a ‘target area’ (i.e. the territory for which LC statistics have

to be computed) into tiles.

2. Use the trained CNN to predict the LC class of all the generated tiles.

1 CNNs (LeCun et al. 1989; LeCun and Bengio 1995) are cutting edge Deep Learning architectures that have recently reached superhuman accuracy in many Computer Vision tasks and whose topology was originally inspired by the organization of the visual cortex of mammals.


3. Obtain LC statistics for the target area by simply computing the relative frequencies of predicted

LC classes.

It ought to be clear that phases (1), (2), (3) have to be repeated each time LC statistics are requested for

a new target area, whereas the CNN’s training phase is carried out only once (whence the (0) index in

the list).
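
The four phases can be sketched in a few lines of Python. This is only an illustration of the classify-and-count idea: the trained CNN of phase (0) is replaced by a hypothetical threshold rule (`toy_classifier`), the image is a toy 4 x 4 grid of numbers, and the tiles do not overlap. None of these names come from the Istat prototype.

```python
from collections import Counter


def split_into_tiles(image, tile_size):
    """Phase 1: cut a 2-D image (list of rows) into non-overlapping tiles."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tiles.append([row[left:left + tile_size]
                          for row in image[top:top + tile_size]])
    return tiles


def toy_classifier(tile):
    """Stand-in for the trained CNN of phase 0 (hypothetical rule)."""
    values = [v for row in tile for v in row]
    return "Sea & Lake" if sum(values) / len(values) < 0.5 else "Annual Crop"


def classify_and_count(image, tile_size, classifier):
    """Phases 1-3: tile, classify each tile, return relative frequencies."""
    labels = [classifier(tile) for tile in split_into_tiles(image, tile_size)]
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# 4 x 4 toy image: the left half is "dark", the right half "bright".
image = [[0.1, 0.2, 0.9, 0.8]] * 4
print(classify_and_count(image, tile_size=2, classifier=toy_classifier))
```

Only `classify_and_count` needs to be re-run for a new target area; the classifier is passed in as an argument precisely because it is trained once and then reused.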

6.2.2. Stage 1

Test site & data collection

We tested our automatic LC estimation system on two sample satellite images. Both test images have

been cropped from Sentinel-2 products downloaded from Copernicus Open Access Hub. These products

are TCI (True Color Image) objects (Ledley, Buas, and Golab 1990) encoded in JPEG2000 format

(Christopoulos 2000). They represent two quite different Italian territories:

The first territory is a portion of Apulia that includes the city of Lecce. The corresponding image

crop is shown in Figure 6.4. For conciseness, we will call this crop ‘Lecce image’. The Lecce image

has a size of 2,496 x 3,008 pixels, therefore depicting a surface area of approximately 751 km².

Note that this is just 3.8% of Apulia’s overall surface.

Figure 6.4 The ‘Lecce image’. This test image has been cropped from a Sentinel-2 TCI object taken on June 26th,

2016. The area of the depicted territory is about 751 km², i.e. about 3.8% of Apulia.

The second territory is a portion of Tuscany that includes (part of) the city of Pisa. The

corresponding image crop is shown in Figure 6.5. For conciseness, we will call this crop

‘Pisa image’. The Pisa image has a size of 3,008 x 1,472 pixels, therefore depicting a surface area

of approximately 443 km². Note that this is just 1.9% of Tuscany’s overall surface.


Figure 6.5 The ‘Pisa image’. This test image has been cropped from a Sentinel-2 TCI object taken on March 25th,

2019. The area of the depicted territory is about 443 km², i.e. about 1.9% of Tuscany.

6.2.3. Stage 2

Pre-processing

We decided to adopt the EuroSAT dataset (Helber et al. 2019) as training set for our CNN. EuroSAT

contains 27,000 manually labelled image patches of size 64 x 64 pixels. These patches have been

cropped from carefully selected Sentinel-2 satellite images covering 34 European countries. EuroSAT

images are multispectral (all 13 Sentinel-2 bands are provided) but we have so far restricted our interest

to Red, Green and Blue bands only (i.e. to RGB color images). Since the resolution of Sentinel-2 images

in the R, G and B bands is 10 meters per pixel, each 64 x 64 EuroSAT patch represents a ground area of

640² square meters, i.e. about 41 hectares. The LC classification according to which EuroSAT patches

have been manually labelled entails 10 classes: 1) ‘Annual Crop’, 2) ‘Forest’, 3) ‘Herbaceous Vegetation’,

4) ‘Highway’, 5) ‘Industrial’, 6) ‘Pasture’, 7) ‘Permanent Crop’, 8) ‘Residential’, 9) ‘River’, 10) ‘Sea &

Lake’. EuroSAT authors have defined this LC taxonomy following the principle that the patterns of each

class should be visible at the resolution of 10 meters per pixel. The dataset is roughly balanced with

respect to the 10 classes, as class cardinalities range from 2,000 to 3,000 patches.
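
For reference, the ground area of one patch follows directly from the patch size and the 10 m pixel resolution stated above. The conversion 1 ha = 10,000 m² is the only fact added here:

```latex
64\ \text{px} \times 10\ \tfrac{\text{m}}{\text{px}} = 640\ \text{m},
\qquad
640\ \text{m} \times 640\ \text{m} = 409{,}600\ \text{m}^2 = 40.96\ \text{ha} \approx 41\ \text{ha}
```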

Figure 6.6 below reports the labels of the 10 classes of EuroSAT’s land cover classification, along with a

convenience color palette that we will use in later sections for visualization purposes.

Figure 6.6 Labels of EuroSAT’s LC classification.

6.2.4. Stage 3

CNN Model Training and Accuracy

To implement the classification engine of the system, we are currently using a cutting-edge, highly

sophisticated CNN model named Inception-V3 (Szegedy et al. 2016), which we customized and trained

on the EuroSAT dataset. As far as the training stage is concerned, we randomly split the EuroSAT data

into training set and test set according to a 75/25 proportion. The generated training set and the test

set contain 20,250 and 6,750 image patches, respectively. Figure 6.7 below reports the Confusion Matrix

obtained contrasting the LC classes predicted by our best model and the true LC labels of the 6,750

image patches belonging to the test set.
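
A random 75/25 split of this kind can be sketched as follows. The function name, the fixed seed and the use of integer identifiers in place of image patches are assumptions made for illustration; the report does not describe how the split was implemented.

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Patch identifiers standing in for the 27,000 EuroSAT image patches.
patch_ids = range(27000)
train, test = train_test_split(patch_ids)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible, which matters when different models are to be compared on the same test set.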


Figure 6.7 The Confusion Matrix obtained contrasting the LC classes predicted by our Inception-V3 model and the true LC labels of the 6,750 image patches belonging to the test set. The trace of the matrix gives the overall

number of exact predictions (i.e. 6,644), which implies an accuracy of 6,644/6,750, i.e. 98.43%.
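
The relation between the trace of a confusion matrix and overall accuracy can be checked with a short sketch. The 3-class matrix below is hypothetical and only illustrates the computation; the last line reuses the two counts quoted in the caption.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy = trace of the confusion matrix / total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
toy_matrix = [
    [48, 1, 1],
    [2, 45, 3],
    [0, 4, 46],
]
print(f"{accuracy_from_confusion(toy_matrix):.2%}")

# Counts quoted for the EuroSAT test set: 6,644 exact predictions out of 6,750.
print(f"{6644 / 6750:.2%}")
```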

Land Cover Estimation Algorithm

Once the CNN has been trained on the EuroSAT dataset, our automatic LC estimation system can be fed

with a satellite image and return LC statistics for the corresponding territory. To do so, a classify-and-

count algorithm is used, whose main logical steps can be summarized as follows:

i. The input Sentinel-2 image is split into a set of (possibly overlapping) tiles of size 64 x 64 pixels.

These tiles are generated by cropping the input image along a regular spatial grid, through a

‘sliding window’ algorithm.

ii. The trained CNN classifies one tile at a time and logically links the predicted LC class to the

corresponding area of the original image. The output of the whole process is a ‘classification

matrix’: each element of this matrix corresponds to a tile of the original image and stores its

predicted LC class.

iii. The area share of each LC class for the whole territory depicted in the input satellite image is

estimated by the relative frequency of the corresponding label within the classification matrix.

iv. A moderate resolution land cover map of the territory depicted in the input satellite image is

obtained by rendering the classification matrix as a raster image.

The working mechanism of the sliding window algorithm mentioned in (i) is schematically illustrated in

Figure 6.8. Basically, a window of 64 x 64 pixels slides horizontally and vertically over the input image

with a stride (i.e. step length) of s pixels, starting from its upper-left corner. For each step of the window,

one tile is generated by cropping the area of the input image that is framed by the window. This way,

the algorithm actually produces a systematic spatial sample of tiles drawn from the input image. Note

that, since each generated tile corresponds to a specific area of the input image, the output sample has

an intrinsic geometrical structure. More specifically, the generated tiles are naturally arranged

according to a regular spatial grid (see Figure 6.8).
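
A minimal sketch of the sliding window mechanism is given below. It assumes that only windows lying entirely inside the image are kept, so no padding is applied at the borders; the report does not say how borders are handled. The function names are illustrative.

```python
def window_origins(length, window, stride):
    """Offsets at which a window of size `window` fits along one axis."""
    return list(range(0, length - window + 1, stride))


def sliding_window_grid(height, width, window=64, stride=64):
    """Return the (rows, cols) shape of the tile grid and the tile origins."""
    tops = window_origins(height, window, stride)
    lefts = window_origins(width, window, stride)
    origins = [(top, left) for top in tops for left in lefts]
    return (len(tops), len(lefts)), origins


# A 2D x 2D image with D = 64: stride D gives a 2 x 2 grid, stride D/2 a 3 x 3 grid.
shape_full, _ = sliding_window_grid(128, 128, window=64, stride=64)
shape_half, _ = sliding_window_grid(128, 128, window=64, stride=32)
print(shape_full, shape_half)
```

Under this no-padding assumption, the number of tiles along an axis of length L is floor((L - 64) / s) + 1, so the tile count grows roughly with the inverse square of the stride s.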


Figure 6.8 Illustration of the sliding window algorithm. A convenience image of size 2D x 2D is split into tiles of size D x D. In the upper panel, the window slides horizontally and vertically with a stride of length D, giving rise to

4 non-overlapping tiles arranged according to a 2 x 2 grid. In the lower panel, the stride is reduced to D/2: this generates 9 partially overlapping tiles arranged along a 3 x 3 grid. Note that reducing the stride from D to D/2

allows more image details to be resolved: for instance, the red ‘A’, which in the upper panel was not framed in any tile of the grid, now appears in the central tile of the lower panel grid.

In (ii) the trained CNN is used to predict the LC class of all the tiles generated in (i). Note, incidentally,

that different tiles can be processed independently, allowing our system to take advantage of

high-performance parallel computing architectures (GPUs).

In (iii) the system calculates output LC statistics from the classification matrix. In accordance with our

classify-and-count approach, this is accomplished by simply computing class frequencies. If we denote

by c a generic LC class, by f_c the proportion of class c within the classification matrix, and by W and H

the width and height in pixels of the input image, then the corresponding area and area share are

estimated by:

$$
\begin{cases}
Area_c = (f_c \cdot W \cdot H) \cdot 100\ \mathrm{m}^2 \\
AreaShare_c = f_c
\end{cases}
\qquad (6.1)
$$

Note that in the upper equation of (6.1) we took into account that the resolution of the satellite images

processed by our system is 10 meters per pixel.
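
Equation (6.1) translates almost literally into code. The sketch below assumes the classification matrix is a list of rows of class labels; the function name, the toy matrix and the conversion of the result to km² are illustrative choices.

```python
from collections import Counter

PIXEL_AREA_M2 = 100  # one 10 m x 10 m Sentinel-2 pixel


def land_cover_statistics(classification_matrix, width_px, height_px):
    """Apply Eq. (6.1): area share = f_c, area = f_c * W * H * 100 m^2."""
    labels = [label for row in classification_matrix for label in row]
    counts = Counter(labels)
    stats = {}
    for label, n in counts.items():
        share = n / len(labels)
        stats[label] = {
            "area_share": share,
            "area_km2": share * width_px * height_px * PIXEL_AREA_M2 / 1e6,
        }
    return stats


# Hypothetical 2 x 2 classification matrix for a 128 x 128 pixel image.
matrix = [["Forest", "Forest"], ["Forest", "River"]]
stats = land_cover_statistics(matrix, width_px=128, height_px=128)
print(stats)
```

By construction the area shares sum to 1, so the estimated areas always add up to the total area of the input image.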

While the LC statistics calculated in (iii) have to be regarded as the main output of our system, a further

interesting artefact can be distilled, as a by-product, from the classification matrix. Indeed, as mentioned

in (iv), a moderate resolution land cover map can be produced by simply rendering the classification

matrix as a raster image. It is worth stressing that this is only possible because of the geometric structure

of the systematic spatial sample of tiles generated by the sliding window algorithm. Clearly, the smaller

the stride, the larger will be the dimension of the classification matrix and, therefore, the resolution of

the obtained land cover map.
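
Rendering the classification matrix as a raster image amounts to a colour lookup per matrix element. The sketch below uses a hypothetical three-colour palette and writes the result as a plain-text PPM string so that no imaging library is needed. Neither the palette nor the output format is taken from the Istat system.

```python
# Hypothetical palette; the report's own colors are those of its class legend.
PALETTE = {
    "Forest": (34, 139, 34),
    "Residential": (139, 69, 19),
    "Sea & Lake": (30, 144, 255),
}


def render_map(classification_matrix, palette, cell=1):
    """Turn a matrix of LC labels into rows of RGB pixels.

    Each matrix element becomes a `cell` x `cell` block of identical pixels.
    """
    raster = []
    for row in classification_matrix:
        pixel_row = [palette[label] for label in row for _ in range(cell)]
        raster.extend([list(pixel_row) for _ in range(cell)])
    return raster


def to_ppm(raster):
    """Serialise the raster as a plain-text PPM (P3) image string."""
    height, width = len(raster), len(raster[0])
    body = "\n".join(" ".join(f"{r} {g} {b}" for r, g, b in row) for row in raster)
    return f"P3\n{width} {height}\n255\n{body}\n"


matrix = [["Forest", "Residential"], ["Sea & Lake", "Forest"]]
raster = render_map(matrix, PALETTE, cell=2)
print(to_ppm(raster))
```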

Automated Land Cover Maps

We briefly analyse here the automated LC maps that our system generated from the Lecce and Pisa

images as by-products of LC estimation. Recall that our system produces LC maps by simply rendering

as a raster image the classification matrix computed for LC estimation. The success of this approach

entirely rests on the inherent spatial structure of the sample of tiles determined by the sliding window

algorithm (Section Land Cover Estimation Algorithm). Since both the LC estimates and the LC maps

produced by our system improve as the stride of the sliding window decreases, we provide here results

obtained by setting the stride to its minimum value of 1 pixel. This setting generated (39 x 47) = 1,833

tiles for the Lecce image, and (47 x 23) = 1,081 tiles for the Pisa image.


Figure 6.9 shows the automated LC map obtained for the territory depicted in the Lecce image. The

adopted color legend is provided in Figure 6.6. Overall, the map exhibits a high degree of spatial

consistency, in that the main structures (urban centres, industrial areas, highways, crops and

vegetation) have been correctly detected and nicely reconstructed.

For instance, focusing on residential areas (brown pixels on the map) and comparing visually the map

with the original image, one can observe that the sizes, the shapes and the relative positions of cities

are all described fairly well by the map.

Figure 6.9 Automated LC map (right panel) of the territory depicted in the Lecce image (left panel). The color legend of the LC map is provided in Figure 6.6.

Figure 6.10 shows the automated LC map obtained for the territory depicted in the Pisa image. The

color legend is again provided in Figure 6.6. The detected LC classes exhibit a much more complex

topology in this map than in the previous one (Figure 6.9). This comes as no surprise, since

the Pisa image reveals at first sight a wider variety of structures that are arranged on the ground in a

much more intricate way. Nevertheless, most of these structures seem nicely reconstructed in the map

of Figure 6.10.

Figure 6.10 Automated LC map (right panel) of the territory depicted in the Pisa image (left panel). The color legend is provided in Figure 6.6.


7. Report on thematic task 4 - Settlements, Enumeration Areas and Forestry

7.1. Case study 9 - Update the INSPIRE Theme Statistical Units dataset and preventing forest fire

7.1.1. Pre-works

State of the art

The use of cartography has supported census data collection at Statistics Portugal since 1981. In 1995,

Statistics Portugal started the preparation of the 2001 census cartography, which was named

“Geographic Information Referencing Base” (BGRI 2001) and was based on Geographic Information

Systems. Since 2006, with the production of the BGRI 2011 to support the 2011 census, Statistics

Portugal has been developing a Spatial Data Infrastructure (SDI) and carrying out other statistical

activities in a permanent effort to introduce the spatial perspective across the different phases of

statistical production.

To support the 2021 Census, Statistics Portugal is conducting internal work to create an updated

version of the small statistical units.

The SDI is currently used across Statistics Portugal's activities, promoting the

integration of the spatial component in the statistical production process, in order to achieve efficiency

and accuracy, within several domains such as the sampling process, the data collection or the

dissemination of statistical information.

Statistics Portugal has only limited experience in using Copernicus data.

This context is the framework that supports the actions of Portugal in the present case study.

Literature review

The literature review focused on general literature about the use of remote sensing images and specific

literature related to our task, specifically Machine Learning and Deep Learning theory (Al-Obeidat et al.

2015; Carfagna and Gallego 2006; Chasmer et al. 2014; Crammer and Singer 2002; Ghosh, Mishra, and

Ghosh 2011; Han and Liu 2015; Hansen et al. 2000; Hawkins 2004; Huang and Zhang 2013; Jackson and

Landgrebe 2002; Lawrence, Giles, and Tsoi 1997; Y. Li et al. 2014; Piotrowski and Napiorkowski 2013;

Santaella 2019; Sun et al. 2019).

Tools/Data

CLC 2018: https://land.copernicus.eu/pan-european/corine-land-cover/clc2018

COS 2015, national land cover map:

http://www.dgterritorio.pt/cartografia_e_geodesia/cartografia/cartografia_tematica/cartografia_de_uso_e_ocupacao_do_solo__cos_clc_e_copernicus_/

DIAS (Data and Information Access Services): https://www.copernicus.eu/en/access-data/dias

ESA Science Toolbox Exploitation Platform: http://step.esa.int/main/

Getting started with Google Earth Engine: https://developers.google.com/earth-engine/getstarted

GHSL - Global Human Settlement Layer, https://ghsl.jrc.ec.europa.eu/

SDG Monitoring and Reporting Toolkit for UN Country Teams. Monitoring and Reporting the

SDGs | LAND CONSUMPTION; https://unhabitat.org/wp-content/uploads/2019/02/Indicator-11.3.1-Training-Module_Land-Consumption_Jan-2019.pdf

The Urban Mapper / Trends Earth (http://trends.earth/docs/en/): Open source tool based on

Google Earth Engine for tracking land use change.


Wekeo Platform: https://www.wekeo.eu/

Zonal statistics function, as described in the Python GDAL/OGR Cookbook:

https://pcjericks.github.io/py-gdalogr-cookbook/raster_layers.html#calculate-zonal-statistics

Statistical product definition

The output of this case study will be an updated version of the geography of the Settlements and

Enumeration Areas for Census 2021 and an analysis of the possibility of producing an indicator of the

total area of eucalyptus plantations.

Data source & toolkit

We investigated the possibilities of using the cloud-based platforms for data processing listed on the

DIAS website. We registered as a beta tester for one of these cloud-based platforms, WEkEO;

however, no tests have yet been executed on this platform. Its advantages relative to

Google Earth Engine remain to be evaluated.

After registering as a user of Google Earth Engine, we ran some small tests with some of the online

JavaScript samples. This platform looks promising, since it is used for several applications, for example the

“The Urban Mapper / Trends Earth” tools.

We also tested the SNAP Desktop software, using local data.

7.1.2. Stage 1

Test site definition

For the first action, updating the INSPIRE Theme Statistical Units dataset, we intend to cover the

mainland territory of Portugal. For the second action, the investigated area will be a NUTS III area.

Data collection

For the first action data from the following sources were used:

National reference thematic map for Land Cover Land Use in Portugal (2015)

CORINE Land Cover / CLC 2018

GHSL - Global Human Settlement Layer 2014

7.1.3. Stage 2

Data pre-processing

For the update of the INSPIRE Theme Statistical Units dataset the aim is to obtain the following results:

Potential urban areas within residual subsections

Degree of imperviousness of the 2011 enumeration areas using GHSL data

Treatment for COS2015 and CLC18 data

Download data

Selection of urban areas

Pre-processing of GHSL data

Download of the GHS_BUILT datasets at 38 m and 250 m resolution

Clip of the images for the territory of continental Portugal


For the analysis of the forest and the eucalyptus plantations, the aim is to study a georeferencing

methodology for eucalyptus areas in mainland Portugal by municipality, using Artificial Intelligence

algorithms, in particular Machine Learning and Deep Learning, on Copernicus satellite images, namely

Sentinel-2.

For testing and training the models, data from the COS will be used, together with some field work to validate the

eucalyptus areas. COS is a land use and soil occupation map with a minimum mapping unit of 1

hectare and a minimum line distance of 20 meters, published for the reference years 1995, 2007, 2010

and 2015. It is a map of polygons representing homogeneous land use / occupation units. The 2015

version will be used.

In ArcGIS the algorithms Maximum Likelihood, Random Trees, Support Vector Machine and Forest

Based Classification and regression are already implemented and can be used to extract eucalyptus

areas from earth observation images.

Deep Learning is one of the methods for feature extraction and classification. Deep Learning models are

capable of learning to focus on the right features by themselves and require little guidance from the

programmer. Deep Learning architectures are loosely inspired by the way the brain functions (Figure 7.1).

Figure 7.1 The deep learning workflow comprises three steps (Create training samples, Train model and Perform inference), all of which can be used in ArcGIS Pro to identify eucalyptus parcels.

7.1.4. Stage 3

Main data processing

Processing executed for update of the INSPIRE Theme Statistical Units dataset

Selection of residential enumeration areas

Selection of areas with urban classification from the CLC18 and COS2015 data

Overlay of the datasets and selection of urban areas within residential enumeration areas

The work to obtain the degree of imperviousness of the 2011 enumeration areas using GHSL data is still

in progress. So far, the following activities have been carried out:

Selection of study area

Development of a script in Python 2.7 making use of the capabilities of the GDAL, OGR and NumPy

libraries to obtain zonal statistics for image data for polygons. This script is based on the

examples listed in the Python GDAL/OGR cookbook.

Further work is needed to assess the value of this kind of data.
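
The idea behind zonal statistics can be shown without any GIS library. The sketch below is not the GDAL/OGR-based script mentioned above: it is a pure-Python simplification which assumes that the polygons have already been rasterised onto the same grid as the image, so that every cell carries a zone identifier. The function name, the nodata convention and the sample values are hypothetical.

```python
from collections import defaultdict


def zonal_statistics(values, zones, nodata=None):
    """Compute count, mean, min and max of `values` for each zone id.

    `values` and `zones` are equally shaped 2-D lists: one holds the raster
    cell values, the other the id of the polygon covering each cell
    (None where no polygon applies).
    """
    cells = defaultdict(list)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is not None and value != nodata:
                cells[zone].append(value)
    return {
        zone: {
            "count": len(v),
            "mean": sum(v) / len(v),
            "min": min(v),
            "max": max(v),
        }
        for zone, v in cells.items()
    }


# Hypothetical built-up percentages on a 3 x 3 grid and two enumeration areas.
built_up = [[80, 60, 0], [70, 50, 10], [-1, 20, 0]]
areas = [["A", "A", "B"], ["A", "A", "B"], ["A", "B", "B"]]
print(zonal_statistics(built_up, areas, nodata=-1))
```

In a GDAL-based workflow the `zones` grid would typically be obtained by rasterising each polygon onto the raster's grid before the per-zone aggregation.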

Results analysis

Only preliminary results have been obtained concerning the identification of potential urban areas using

COS2015 and CLC18 data. The following images (Figure 7.2, Figure 7.3) show some results:


Figure 7.2 New artificial surfaces outside localities using CLC18

Figure 7.3 New artificial surfaces outside localities using COS2015


8. Report on the meetings

Four meetings were held within WPH of ESSnet Big Data II. Three of them were held via the WebEx

platform. One internal face-to-face meeting took place at the Statistical Office in Olsztyn (Poland). The

meetings are listed in Table 8.1.

Table 8.1 List of meetings within WPH.

No  Date              Type of meeting
1   28 Feb 2019       WPH WebEx Meeting 1
2   09 May 2019       WPH WebEx Meeting 2
3   26-28 June 2019   Meeting 3 in Olsztyn
4   12 Sep 2019       WPH WebEx Meeting 4

An overview of the meetings at the level of WPH Earth Observation can be found on the Wiki

(https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/WPH_Meetings).

9. Bibliography

Al-Obeidat, Feras, Ahmad T. Al-Taani, Nabil Belacel, Leo Feltrin, and Neil Banerjee. 2015. “A Fuzzy Decision Tree for Processing Satellite Images and Landsat Data.” In Procedia Computer Science. https://doi.org/10.1016/j.procs.2015.05.157.

“AppEEARS.” n.d. Accessed September 19, 2019. https://lpdaacsvc.cr.usgs.gov/appeears/.

Atzberger, Clement. 2013. “Advances in Remote Sensing of Agriculture: Context Description, Existing Operational Monitoring Systems and Major Information Needs.” Remote Sensing. https://doi.org/10.3390/rs5020949.

Bargiel, Damian. 2017. “A New Method for Crop Classification Combining Time Series of Radar Images and Crop Phenology Information.” Remote Sensing of Environment. https://doi.org/10.1016/j.rse.2017.06.022.

Bargiel, Damian, Felix Neuendorf, Michael Schlund, and Uwe Soergel. 2014. “Classification of Crops in Different European Regions Based on TerraSAR-X Data.” In 10TH EUROPEAN CONFERENCE ON SYNTHETIC APERTURE RADAR (EUSAR 2014).

Belgiu, Mariana, and Lucian Drăguţ. 2016. “Random Forest in Remote Sensing: A Review of Applications and Future Directions.” ISPRS Journal of Photogrammetry and Remote Sensing 114: 24–31. https://doi.org/10.1016/j.isprsjprs.2016.01.011.

Bernardis, Caleb De, Fernando Vicente-Guijalba, Tomas Martinez-Marin, and Juan M. Lopez-Sanchez. 2016. “Contribution to Real-Time Estimation of Crop Phenological States in a Dynamical Framework Based on NDVI Time Series: Data Fusion with SAR and Temperature.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. https://doi.org/10.1109/JSTARS.2016.2539498.

Bettio, M, J Delincé, P Bruyas, W Croi, and G Eiden. 2002. “Area Frame Surveys: Aim, Principals and Operational Surveys. Building Agri-Environmental Indicators, Focussing on the European Area Frame Survey LUCAS,” 12–27.

Bossard, M, J Feranec, and J Otahel. 2000. “CORINE Land Cover Technical Guide: Addendum 2000.”

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1017/CBO9781107415324.004.

Büttner, G. 2014. “CORINE Land Cover and Land Cover Change Products. In Land Use and Land Cover Mapping in Europe,” 55–74.

Carfagna, Elisabetta, and F. Javier Gallego. 2006. “Using Remote Sensing for Agricultural Statistics.” International Statistical Review. https://doi.org/10.1111/j.1751-5823.2005.tb00155.x.

Chasmer, L., C. Hopkinson, T. Veness, W. Quinton, and J. Baltzer. 2014. “A Decision-Tree Classification for Low-Lying Complex Land Cover Types within the Zone of Discontinuous Permafrost.” Remote Sensing of Environment. https://doi.org/10.1016/j.rse.2013.12.016.


Christopoulos, Charilaos. 2000. “The Jpeg2000 Still Image Coding System: An Overview.” IEEE Transactions on Consumer Electronics 46 (4): 1103–27. https://doi.org/10.1109/30.920468.

Corbane, Christina, Martino Pesaresi, Panagiotis Politis, Vasileios Syrris, Aneta J. Florczyk, Pierre Soille, Luca Maffenini, et al. 2017. “Big Earth Data Analytics on Sentinel-1 and Landsat Imagery in Support to Global Human Settlements Mapping.” Big Earth Data 1 (1–2): 118–44. https://doi.org/10.1080/20964471.2017.1397899.

Cortes, Corinna, and Vladimir Vapnik. 1995. “Support-Vector Networks.” Machine Learning 20 (3): 273–79. https://doi.org/10.1023/A:1022627411411.

Crammer, and Singer. 2002. “On the Algorithmic Implementation of Multiclass Kernel-Based Vector Machines.” Journal of Machine Learning Research - JMLR.

Csillik, Ovidiu, Mariana Belgiu, Gregory P. Asner, and Maggi Kelly. 2019. “Object-Based Time-Constrained Dynamic Time Warping Classification of Crops Using Sentinel-2.” Remote Sensing. https://doi.org/10.3390/rs11101257.

Demarez, Valérie, Florian Helen, Claire Marais-Sicre, and Frédéric Baup. 2019. “In-Season Mapping of Irrigated Crops Using Landsat 8 and Sentinel-1 Time Series.” Remote Sensing. https://doi.org/10.3390/rs11020118.

Didan, Kamel. 2015. “MOD13Q1 - MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid.” Nasa Lp Daac. https://doi.org/10.5067/MODIS/MOD13Q1.006.

Dimitrov, Petar, Qinghan Dong, Herman Eerens, Alexander Gikov, Lachezar Filchev, Eugenia Roumenina, and Georgi Jelev. 2019. “Sub-Pixel Crop Type Classification Using PROBA-V 100 m NDVI Time Series and Reference Data from Sentinel-2 Classifications.” Remote Sensing. https://doi.org/10.3390/rs11111370.

Drusch, M., U. Del Bello, S. Carlier, O. Colin, V. Fernandez, F. Gascon, B. Hoersch, et al. 2012. “Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services.” Remote Sensing of Environment 120: 25–36. https://doi.org/10.1016/j.rse.2011.11.026.

Ehrlich, D., T. Kemper, M. Pesaresi, and C. Corbane. 2018. “Built-up Area and Population Density: Two Essential Societal Variables to Address Climate Hazard Impact.” Environmental Science and Policy 90: 73–82. https://doi.org/10.1016/j.envsci.2018.10.001.

EUROSTAT. 2003. “The Lucas Survey - European Statisticians Monitor Territory. Office for Official Publications of the European Communities.”

Feng, Siwen, Jianjun Zhao, Tingting Liu, Hongyan Zhang, Zhengxiang Zhang, and Xiaoyi Guo. 2019. “Crop Type Identification and Mapping Using Machine Learning Algorithms and Sentinel-2 Time Series Data.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. https://doi.org/10.1109/jstars.2019.2922469.

“Folium.” n.d. Accessed September 19, 2019. https://python-visualization.github.io/folium/.

Ghosh, Ashish, Niladri Shekhar Mishra, and Susmita Ghosh. 2011. “Fuzzy Clustering Algorithms for Unsupervised Change Detection in Remote Sensing Images.” Information Sciences. https://doi.org/10.1016/j.ins.2010.10.016.

Gislason, Pall Oskar, Jon Atli Benediktsson, and Johannes R. Sveinsson. 2006. “Random Forests for Land Cover Classification.” Pattern Recognition Letters 27: 294–300. https://doi.org/10.1016/j.patrec.2005.08.011.

Gómez, Salvador, Sanz, and Casanova. 2019. “Potato Yield Prediction Using Machine Learning Techniques and Sentinel 2 Data.” Remote Sensing. https://doi.org/10.3390/rs11151745.

Goodfellow, I, Y Bengio, and A Courville. 2016. “Deep Learning.” MIT Press.

GSARS. 2017. “Handbook on Remote Sensing for Agricultural Statistics.” http://gsars.org/wp-content/uploads/2017/09/GS-REMOTE-SENSING-HANDBOOK-FINAL-04.pdf.

Hagolle, O., G. Dedieu, B. Mougenot, V. Debaecker, B. Duchemin, and A. Meygret. 2008. “Correction of Aerosol Effects on Multi-Temporal Images Acquired with Constant Viewing Angles: Application to Formosat-2 Images.” Remote Sensing of Environment 112: 1689–1701. https://doi.org/10.1016/j.rse.2007.08.016.

Hagolle, O., M. Huc, D. Villa Pascual, and G. Dedieu. 2010. “A Multi-Temporal Method for Cloud Detection, Applied to FORMOSAT-2, VENμS, LANDSAT and SENTINEL-2 Images.” Remote Sensing of Environment 114: 1747–55. https://doi.org/10.1016/j.rse.2010.03.002.

Hagolle, Olivier, Mireille Huc, David Villa Pascual, and Gerard Dedieu. 2015. “A Multi-Temporal and Multi-Spectral Method to Estimate Aerosol Optical Thickness over Land, for the Atmospheric Correction of FormoSat-2, LandSat, VENμS and Sentinel-2 Images.” Remote Sensing 7: 2668–91. https://doi.org/10.3390/rs70302668.

Hagolle, Olivier, Sylvia Sylvander, Mireille Huc, Martin Claverie, Dominique Clesse, Cécile Dechoz, Vincent Lonjou, and Vincent Poulain. 2015. “SPOT-4 (Take 5): Simulation of Sentinel-2 Time Series on 45 Large Sites.” Remote Sensing 7: 12242–64. https://doi.org/10.3390/rs70912242.

Han, Min, and Ben Liu. 2015. “Ensemble of Extreme Learning Machine for Remote Sensing Image Classification.” Neurocomputing. https://doi.org/10.1016/j.neucom.2013.09.070.

Hansen, M. C., R. Sohlberg, R. S. Defries, and J. R.G. Townshend. 2000. “Global Land Cover Classification at 1 Km Spatial Resolution Using a Classification Tree Approach.” International Journal of Remote Sensing. https://doi.org/10.1080/014311600210209.

Hawkins, Douglas M. 2004. “The Problem of Overfitting.” Journal of Chemical Information and Computer Sciences. https://doi.org/10.1021/ci0342472.

Helber, P, B Bischke, A Dengel, and D Borth. 2019. “A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

Hennig, Ernest I., Christian Schwick, Tomáš Soukup, Erika Orlitová, Felix Kienast, and Jochen A.G. Jaeger. 2015. “Multi-Scale Analysis of Urban Sprawl in Europe: Towards a European de-Sprawling Strategy.” Land Use Policy, 483–98. https://doi.org/10.1016/j.landusepol.2015.08.001.

Hennig, Ernest I., Tomáš Soukup, Erika Orlitová, Christian Schwick, Felix Kienast, and Jochen A.G. Jaeger. 2016. “Annexes 1-5: Urban Sprawl in Europe. Joint EEA-FOEN Report.” Luxembourg. https://doi.org/10.2800/143470b.

Huang, Xin, and Liangpei Zhang. 2013. “An SVM Ensemble Approach Combining Spectral, Structural, and Semantic Features for the Classification of High-Resolution Remotely Sensed Imagery.” IEEE Transactions on Geoscience and Remote Sensing. https://doi.org/10.1109/TGRS.2012.2202912.

Hütt, Christoph, and Guido Waldhoff. 2018. “Multi-Data Approach for Crop Classification Using Multitemporal, Dual-Polarimetric TerraSAR-X Data, and Official Geodata.” European Journal of Remote Sensing. https://doi.org/10.1080/22797254.2017.1401909.

Ienco, Dino, Raffaele Gaetano, Claire Dupaquier, and Pierre Maurel. 2017. “Land Cover Classification via Multitemporal Spatial Data by Deep Recurrent Neural Networks.” IEEE Geoscience and Remote Sensing Letters. https://doi.org/10.1109/LGRS.2017.2728698.

Inglada, Jordi, Marcela Arias, Benjamin Tardy, Olivier Hagolle, Silvia Valero, David Morin, Gérard Dedieu, et al. 2015. “Assessment of an Operational System for Crop Type Map Production Using High Temporal and Spatial Resolution Satellite Optical Imagery.” Remote Sensing 7: 12356–79. https://doi.org/10.3390/rs70912356.

Inglada, Jordi, Arthur Vincent, Marcela Arias, Benjamin Tardy, David Morin, and Isabel Rodes. 2017. “Operational High Resolution Land Cover Map Production at the Country Scale Using Satellite Image Time Series.” Remote Sensing 9: 95. https://doi.org/10.3390/rs9010095.

Jackson, Qiong, and David A. Landgrebe. 2002. “Adaptive Bayesian Contextual Classification Based on Markov Random Fields.” IEEE Transactions on Geoscience and Remote Sensing. https://doi.org/10.1109/TGRS.2002.805087.

Jia, Kun, Qiangzi Li, Yichen Tian, Bingfang Wu, Feifei Zhang, and Jihua Meng. 2012. “Crop Classification Using Multi-Configuration SAR Data in the North China Plain.” International Journal of Remote Sensing. https://doi.org/10.1080/01431161.2011.587844.

Khatami, Reza, Giorgos Mountrakis, and Stephen V. Stehman. 2016. “A Meta-Analysis of Remote Sensing Research on Supervised Pixel-Based Land-Cover Image Classification Processes: General Guidelines for Practitioners and Future Research.” Remote Sensing of Environment 177: 89–100. https://doi.org/10.1016/j.rse.2016.02.028.

Lawrence, Steve, C. Lee Giles, and Ah Chung Tsoi. 1997. “Lessons in Neural Network Training: Overfitting May Be Harder than Expected.” In Proceedings of the National Conference on Artificial Intelligence.

LeCun, Y., B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. 1989. “Backpropagation Applied to Handwritten Zip Code Recognition.” Neural Computation 1 (4): 541–51. https://doi.org/10.1162/neco.1989.1.4.541.

LeCun, Y, and Y Bengio. 1995. “Convolutional Networks for Images, Speech, and Time Series.” In The Handbook of Brain Theory and Neural Networks 3361 (10): 1995.

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.

Ledley, RS, M Buas, and TJ Golab. 1990. “Fundamentals of True-Color Image Processing.” In 10th International Conference on Pattern Recognition, 791–95.

Li, Qingting, Cuizhen Wang, Bing Zhang, and Linlin Lu. 2015. “Object-Based Crop Classification with Landsat-MODIS Enhanced Time-Series Data.” Remote Sensing. https://doi.org/10.3390/rs71215820.

Li, Y., X. Zhu, Y. Pan, J. Gu, A. Zhao, and X Liu. 2014. “A Comparison of Model-Assisted Estimators to Infer Land Cover/Use Class Area Using Satellite Imagery.” Remote Sensing 6 (9): 9034–63.

Liu, Peng, Hui Zhang, and Kie B. Eom. 2017. “Active Deep Learning for Classification of Hyperspectral Images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10 (2): 712–24. https://doi.org/10.1109/JSTARS.2016.2598859.

Ma, Lei, Manchun Li, Xiaoxue Ma, Liang Cheng, Peijun Du, and Yongxue Liu. 2017. “A Review of Supervised Object-Based Land-Cover Image Classification.” ISPRS Journal of Photogrammetry and Remote Sensing 130: 277–93. https://doi.org/10.1016/j.isprsjprs.2017.06.001.

Ma, Lei, Yu Liu, Xueliang Zhang, Yuanxin Ye, Gaofei Yin, and Brian Alan Johnson. 2019. “Deep Learning in Remote Sensing Applications: A Meta-Analysis and Review.” ISPRS Journal of Photogrammetry and Remote Sensing 152: 166–77. https://doi.org/10.1016/j.isprsjprs.2019.04.015.

“Matplotlib.” n.d. Accessed September 19, 2019. https://matplotlib.org/.

Nabielek, K, D Hamers, and D Evers. 2016. “Cities in Europe.” PBL Publishers 521 (2470).

Navarro, Ana, João Rolim, Irina Miguel, João Catalão, Joel Silva, Marco Painho, Zoltán Vekerdy, et al. 2017. “Regional Scale Cropland Carbon Budgets: Evaluating a Geospatial Agricultural Modeling System Using Inventory Data.” International Geoscience and Remote Sensing Symposium (IGARSS). https://doi.org/10.1016/j.jag.2017.04.009.

NRCan. 2018. “Image Classification and Analysis. Natural Resources Canada Government of Canada.” 2018. https://www.nrcan.gc.ca/maps-tools-publications/satellite-imagery-air-photos/remote-sensing-tutorials/image-interpretation-analysis/image-classification-and-analysis/9361.

OECD. 2018. Rethinking Urban Sprawl. https://doi.org/10.1787/9789264189881-en.

Pedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12: 2825–30.

Pelletier, Charlotte, Silvia Valero, Jordi Inglada, Nicolas Champion, and Gérard Dedieu. 2016. “Assessing the Robustness of Random Forests to Map Land Cover with High Resolution Satellite Image Time Series over Large Areas.” Remote Sensing of Environment 187: 156–68. https://doi.org/10.1016/j.rse.2016.10.010.

Peña-Barragán, José M., Moffatt K. Ngugi, Richard E. Plant, and Johan Six. 2011. “Object-Based Crop Identification Using Multiple Vegetation Indices, Textural Features and Crop Phenology.” Remote Sensing of Environment. https://doi.org/10.1016/j.rse.2011.01.009.

Pesaresi, Martino, Christina Corbane, Andreea Julea, Aneta J. Florczyk, Vasileios Syrris, and Pierre Soille. 2016. “Assessment of the Added-Value of Sentinel-2 for Detecting Built-up Areas.” Remote Sensing 8 (4). https://doi.org/10.3390/rs8040299.

Pesaresi, Martino, Vasileios Syrris, and Andreea Julea. 2016. “A New Method for Earth Observation Data Analytics Based on Symbolic Machine Learning.” Remote Sensing 8 (5). https://doi.org/10.3390/rs8050399.

Piotrowski, Adam P., and Jarosław J. Napiorkowski. 2013. “A Comparison of Methods to Avoid Overfitting in Neural Networks Training in the Case of Catchment Runoff Modelling.” Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2012.10.019.

“Python.” n.d. Accessed September 19, 2019. https://www.python.org/about/gettingstarted/.

Rodriguez-Galiano, V. F., B. Ghimire, J. Rogan, M. Chica-Olmo, and J. P. Rigol-Sanchez. 2012. “An Assessment of the Effectiveness of a Random Forest Classifier for Land-Cover Classification.” ISPRS Journal of Photogrammetry and Remote Sensing 67: 93–104. https://doi.org/10.1016/j.isprsjprs.2011.11.002.

Rufin, Philippe, David Frantz, Stefan Ernst, Andreas Rabe, Patrick Griffiths, Mutlu Özdoğan, and Patrick Hostert. 2019. “Mapping Cropping Practices on a National Scale Using Intra-Annual Landsat Time Series Binning.” Remote Sensing. https://doi.org/10.3390/rs11030232.

Santaella, Julio. 2019. “In-Depth Review of Satellite Imagery / Earth Observation Technology in Official Statistics.” In Conference of European Statisticians, 67th Plenary Session, Paris, France.

“Sentinel-5P Pre-Operations Data Hub.” n.d. Accessed September 19, 2019. https://s5phub.copernicus.eu/dhus/#/home.

Soille, P., A. Burger, D. De Marchi, P. Kempeneers, D. Rodriguez, V. Syrris, and V. Vasilev. 2018. “A Versatile Data-Intensive Computing Platform for Information Retrieval from Big Geospatial Data.” Future Generation Computer Systems 81: 30–40. https://doi.org/10.1016/j.future.2017.11.007.

Sonobe, Rei. 2019. “Parcel-Based Crop Classification Using Multi-Temporal TerraSAR-X Dual Polarimetric Data.” Remote Sensing. https://doi.org/10.3390/rs11101148.

Sonobe, Rei, Hiroshi Tani, Xiufeng Wang, Nobuyuki Kobayashi, and Hideki Shimamura. 2015. “Discrimination of Crop Types with TerraSAR-X-Derived Information.” Physics and Chemistry of the Earth. https://doi.org/10.1016/j.pce.2014.11.001.

Sun, Zhongchang, Ru Xu, Wenjie Du, Lei Wang, and Lu Dengsheng. 2019. “High-Resolution Urban Land Mapping in China from Sentinel 1A/2 Imagery Based on Google Earth Engine.” Remote Sensing.

“Sustainable Development.” n.d. Accessed September 19, 2019. https://sustainabledevelopment.un.org/sdgs.

Szegedy, C, V Vanhoucke, S Ioffe, J Shlens, and Z Wojna. 2016. “Rethinking the Inception Architecture for Computer Vision.” In IEEE Conference on Computer Vision and Pattern Recognition, 2818–26.

Szuster, Brian W., Qi Chen, and Michael Borger. 2011. “A Comparison of Classification Techniques to Support Land Cover and Land Use Analysis in Tropical Coastal Zones.” Applied Geography 31: 525–32. https://doi.org/10.1016/j.apgeog.2010.11.007.

“The Basisregistratie Adressen En Gebouwen.” n.d. Accessed September 19, 2019. https://zakelijk.kadaster.nl/basisregistratie-adressen-en-gebouwen.

“The Copernicus Open Access Hub.” n.d. Accessed September 19, 2019. https://scihub.copernicus.eu/.

“The OECD Better Life Index.” n.d. Accessed September 19, 2019. http://www.oecdbetterlifeindex.org.

“The Public Services On the Map.” n.d. Accessed September 19, 2019. https://www.pdok.nl.

“The Soil Use File.” n.d. Accessed September 19, 2019. https://www.pdok.nl/introductie/-/article/cbs-bestand-bodemgebruik.

UNECE-CES. 2019. “In-Depth Review of Satellite Imagery / Earth Observation Technology in Official Statistics.” https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/2019/ECE_CES_2019_16-1906490E.pdf.

United Nations. 2017. “Earth Observations for Official Statistics.” https://unstats.un.org/bigdata/taskteams/satellite/UNGWG_Satellite_Task_Team_Report_WhiteCover.pdf.

Veloso, Amanda, Stéphane Mermoz, Alexandre Bouvet, Thuy Le Toan, Milena Planells, Jean François Dejoux, and Eric Ceschia. 2017. “Understanding the Temporal Behavior of Crops Using Sentinel-1 and Sentinel-2-like Data for Agricultural Applications.” Remote Sensing of Environment 199: 415–26. https://doi.org/10.1016/j.rse.2017.07.015.

“Well-Being in Germany.” n.d. Accessed September 19, 2019. https://www.gut-leben-in-deutschland.de/static/LB/en/.

Xie, Qinghua, Jinfei Wang, Chunhua Liao, Jiali Shang, Juan M. Lopez-Sanchez, Haiqiang Fu, and Xiuguo Liu. 2019. “On the Use of Neumann Decomposition for Crop Classification Using Multi-Temporal RADARSAT-2 Polarimetric SAR Data.” Remote Sensing. https://doi.org/10.3390/rs11070776.

Yesou, Herve, Eric Pottier, Gregoire Mercier, Manuel Grizonnet, Sadri Haouet, Alain Giros, Robin Faivre, Claire Huber, and Julien Michel. 2016. “Synergy of Sentinel-1 and Sentinel-2 Imagery for Wetland Monitoring Information Extraction from Continuous Flow of Sentinel Images Applied to Water Bodies and Vegetation Mapping and Monitoring.” In International Geoscience and Remote Sensing Symposium (IGARSS). https://doi.org/10.1109/IGARSS.2016.7729033.