
Page 1: Interactive Geospatial Data Analytics using JupyterLab · • EO Big Data challenges tackled in terms of: •Data variety (raster, vector, in-situ data) •On-the-fly analysis and

Interactive Geospatial Data Analytics

using JupyterLab

Pierre Soille, Davide De Marchi

European Commission, Joint Research Centre, Directorate I Competences, Unit I.3 Text and Data Mining

Joint Research Centre (JRC)

Data analytics workshop for official statistics (daWos): The geographical dimension of data analytics

Amsterdam, 12/09/2018

URL: https://cidportal.jrc.ec.europa.eu


Outline

• Context

• Geospatial data holdings

• Jupyter/JupyterLab

• Deferred processing

• Application gallery

• Takeaway messages


Big geospatial data for policy

[Diagram: big data (volume, velocity, variety) from the atmosphere, marine, land, climate, emergency, and security domains turned into policy relevant information, indicators, and decisions]

Exploit data volume, velocity, and variety to generate policy relevant information:

• Using FAIR data principles (findable, accessible, interoperable, reusable)

• With data mining competence in a shared and collaborative environment

• Relying on reproducible workflows

directives, legislation, communications, …

Earth Observation, in situ, crowdsourcing, social sensing, text data, OS data, web scraping, …


JRC Earth Observation Data and Processing Platform (JEODPP)

Versatile platform bringing the users to the data and allowing for:

• Running large-scale batch processing of existing scientific workflows thanks to lightweight virtualisation based on Docker

• Remote desktop capability for fast prototyping in legacy environments

• Interactive data visualisation and analysis with Jupyter


Datasets accessed from external sources

• Enabled by appropriate interfaces (a ‘virtualization’ layer) to retrieve data from external backends

• Basemaps through Web Map Tile Services:

   • OpenStreetMap, OpenTopoMap, OpenRailwayMap, …

   • NASA GIBS (MODIS at any date, EarthAtNight 2012)

   • Any provider of Web Map Tile Service (e.g., Strava)

• In situ data through appropriate APIs, e.g., Eurostat data through its JSON API, with proofs of concept for:

   • Population density map

   • Poverty map
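Tile services such as those listed above deliver pre-rendered images addressed by zoom level and column/row. A minimal sketch, assuming the XYZ ("slippy map") Web Mercator scheme used by OpenStreetMap-style servers, of how a map client works out which tile covers a given location; the function names are hypothetical and the OpenStreetMap URL template is only one example provider.

```python
import math
from typing import Tuple

# Example XYZ template (OpenStreetMap standard tile layer); other providers differ.
OSM_TEMPLATE = "https://tile.openstreetmap.org/{z}/{x}/{y}.png"


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) indices of the Web Mercator tile containing (lon, lat).

    At zoom z the world is split into 2**z by 2**z tiles; x grows eastwards
    from -180 degrees, y grows southwards from the northern Mercator limit.
    """
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases (lon = 180, latitudes beyond about +/-85.05 degrees).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(template: str, zoom: int, x: int, y: int) -> str:
    """Fill an XYZ URL template with concrete tile indices."""
    return template.format(z=zoom, x=x, y=y)


if __name__ == "__main__":
    # Tile covering Amsterdam (approx. 52.37 N, 4.90 E) at zoom 10.
    tx, ty = lonlat_to_tile(4.90, 52.37, 10)
    print(tile_url(OSM_TEMPLATE, 10, tx, ty))
```

A client like ipyleaflet performs this kind of lookup for every tile in the current viewport, which is why any provider exposing a URL template can be plugged in as a basemap.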


Hosted datasets

• Raster datasets

   • Sentinel images (Sentinel-1, -2, -3)

   • Landsat imagery

   • Elevation data

   • Urban Atlas

   • Corine Land Cover

   • Population, nightlight, etc.

• Vector datasets

   • GISCO NUTS regions

   • EFFIS forest fires

   • Natura 2000

   • Transport networks

   • …


Sentinel-2 L1C on JEODPP

• Daily data download for defined AOIs (Europe, tropical zones); status as of 2018-04-17

• Currently 540k tiles, 550 TB of data downloaded
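Keeping track of a daily download of this size can rely on the product names themselves. The sketch below assumes the compact Sentinel-2 product naming convention (MMM_MSIXXX_YYYYMMDDTHHMMSS_Nxxyy_ROOO_Txxxxx_<discriminator>.SAFE) and extracts the mission, processing level, sensing time, relative orbit, and MGRS tile ID so that downloads can be counted per tile. It is an illustration, not the JEODPP ingestion code; `parse_s2_name` and `count_per_tile` are hypothetical helpers.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, NamedTuple


class S2Product(NamedTuple):
    mission: str        # S2A, S2B, ...
    level: str          # L1C or L2A
    sensing: datetime   # datatake sensing start time
    baseline: str       # processing baseline, e.g. "0204"
    orbit: int          # relative orbit number
    tile: str           # MGRS tile ID, e.g. "53NMJ"


# Compact naming convention:
# MMM_MSIXXX_YYYYMMDDTHHMMSS_Nxxyy_ROOO_Txxxxx_<discriminator>.SAFE
_S2_NAME = re.compile(
    r"^(?P<mission>S2[A-Z])_MSI(?P<level>L1C|L2A)_(?P<sensing>\d{8}T\d{6})"
    r"_N(?P<baseline>\d{4})_R(?P<orbit>\d{3})_T(?P<tile>\d{2}[A-Z]{3})"
    r"_\d{8}T\d{6}(?:\.SAFE)?$"
)


def parse_s2_name(name: str) -> S2Product:
    """Split a compact Sentinel-2 product name into its fields."""
    m = _S2_NAME.match(name)
    if m is None:
        raise ValueError(f"not a compact Sentinel-2 product name: {name!r}")
    return S2Product(
        mission=m["mission"],
        level=m["level"],
        sensing=datetime.strptime(m["sensing"], "%Y%m%dT%H%M%S"),
        baseline=m["baseline"],
        orbit=int(m["orbit"]),
        tile=m["tile"],
    )


def count_per_tile(names: Iterable[str]) -> Dict[str, int]:
    """Number of downloaded products per MGRS tile."""
    return dict(Counter(parse_s2_name(n).tile for n in names))
```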


Sentinel-2 on JEODPP

~620k tiles out of 4.5M tiles on the Copernicus hub, 550 TB in total
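A quick check of the scale implied by these figures, taking 1 TB = 10^12 bytes and assuming the 550 TB total refers to the ~620k tiles:

```latex
\frac{620 \times 10^{3}}{4.5 \times 10^{6}} \approx 0.138 \;(\approx 13.8\,\%\text{ of the hub's tiles}),
\qquad
\frac{550 \times 10^{12}\,\text{B}}{620 \times 10^{3}\,\text{tiles}} \approx 8.9 \times 10^{8}\,\text{B/tile} \approx 0.89\,\text{GB/tile}
```

With the 540k count from the previous slide the average would instead be about 1.02 GB per tile; since the two slides report different tile counts for the same total, these averages are only indicative.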


Sentinel-2 quicklooks/cloud masks


Full-resolution cloud-free Sentinel-2 composite: example for China

Methodology: DOI:10.1080/20964471.2017.1407489
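The cited paper describes the compositing methodology actually used. As a generic illustration only, a common way to build a cloud-free composite is to take, for every pixel, the median of the observations that the cloud mask flags as clear. A minimal single-band sketch in plain Python, assuming co-registered scenes and cloudy observations marked with `None`; `median_composite` is a hypothetical name, and a production implementation would work on arrays, per band, and usually apply a more elaborate selection rule.

```python
from statistics import median
from typing import List, Optional, Sequence

# One band of one acquisition: rows of pixel values; None marks a cloudy pixel.
Scene = Sequence[Sequence[Optional[float]]]


def median_composite(stack: Sequence[Scene]) -> List[List[Optional[float]]]:
    """Per-pixel median over all cloud-free observations in a time stack.

    All scenes are assumed to share the same grid. Pixels that are cloudy in
    every scene stay None (no clear observation available).
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    composite: List[List[Optional[float]]] = []
    for r in range(rows):
        out_row: List[Optional[float]] = []
        for c in range(cols):
            clear = [scene[r][c] for scene in stack if scene[r][c] is not None]
            out_row.append(median(clear) if clear else None)
        composite.append(out_row)
    return composite
```

The median is robust to residual clouds and shadows that the mask missed, which is one reason it is a popular default; pixels with no clear observation remain gaps to be filled from other periods.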


Raster data management

Many raster datasets available:

• Sentinel-2

• Global Human Settlement Layer

• Corine Land Cover

• Copernicus Core003

• Global Surface Water

• Sentinel-1 Mosaic

• EU-DEM, SRTM, GEBCO

• Global NDVI 25 years


Jupyter ecosystem

http://jupyter.org/


ipyleaflet

https://github.com/ellisonbg/ipyleaflet


ipywidgets and bqplot

https://github.com/jupyter-widgets/ipywidgets https://github.com/bloomberg/bqplot


Interactive visualization and analysis with Jupyter

• Web interface to visualize and analyze big geospatial data

• Allows fast search and display of complex datasets

• Creates an agile test environment for raster and vector processing algorithms thanks to the immediate display of the output results

• Accessible to geospatial experts with some programming skills

• Allows easy creation of GUI applications for non-programmers


From data to interactive display

Source: DOI:10.1016/j.future.2017.11.007


Deferred processing

• No data are pre-computed, similar to the Google Earth Engine platform

• Processing steps, their input parameters, and their combination into processing chains are defined by the user in the client environment and saved in JSON format

• Processing chains are executed server side, in a highly parallel infrastructure, by a C/C++ image processing library with direct access to the data

• Results are sent via HTTP to the ipyleaflet client
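The pattern above can be sketched in a few lines: the client only records operations and their parameters, serialises the chain to JSON, and the server interprets that JSON against the data when a result is requested. This is a minimal sketch under stated assumptions: the `Chain` class, the operator names (`load`, `scale`, `threshold`), and `execute` are hypothetical, and a pure-Python evaluator stands in for the server-side C/C++ library.

```python
import json
from typing import Any, Callable, Dict, List


# --- client side (hypothetical API): record operations, compute nothing ---
class Chain:
    def __init__(self, dataset: str) -> None:
        self.steps: List[Dict[str, Any]] = [{"op": "load", "dataset": dataset}]

    def apply(self, op: str, **params: Any) -> "Chain":
        """Append a step and return self so calls can be chained."""
        self.steps.append({"op": op, **params})
        return self

    def to_json(self) -> str:
        """Serialise the whole chain; this is what would be sent to the server."""
        return json.dumps({"steps": self.steps})


# --- server side (toy stand-in for the C/C++ library): interpret the JSON ---
OPERATORS: Dict[str, Callable[..., List[float]]] = {
    "scale": lambda data, factor: [v * factor for v in data],
    "threshold": lambda data, value: [1.0 if v >= value else 0.0 for v in data],
}


def execute(chain_json: str, catalogue: Dict[str, List[float]]) -> List[float]:
    """Run a serialised chain on demand against data held server side."""
    steps = json.loads(chain_json)["steps"]
    if not steps or steps[0]["op"] != "load":
        raise ValueError("a chain must start with a load step")
    data = list(catalogue[steps[0]["dataset"]])
    for step in steps[1:]:
        params = {k: v for k, v in step.items() if k != "op"}
        data = OPERATORS[step["op"]](data, **params)
    return data
```

For example, `Chain("ndvi_tile").apply("scale", factor=0.0001).apply("threshold", value=0.5).to_json()` yields a three-step JSON document (the dataset name is hypothetical); nothing is computed until `execute` runs. In a tiled map client, the server would evaluate such a chain only for the tiles currently in view.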


Deferred processing in action


Statistical data visualisation

• Data accessed directly from Eurostat (API)

• Interactive rendering on JEODPP (e.g., poverty map)
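Eurostat's JSON API delivers data in the JSON-stat format, where observations sit in a flat `value` list or object and are addressed in row-major order over the dimensions named in `id` with lengths in `size`. Assuming a JSON-stat 2.0 dataset response that has already been downloaded and parsed with `json.loads`, a minimal sketch that turns it into (dimension codes, value) records ready to be joined to NUTS geometries; `iter_jsonstat` is a hypothetical name and the HTTP request is left out.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def _category_codes(category: Dict[str, Any]) -> List[str]:
    """Category codes in positional order.

    JSON-stat allows `index` as a list or as a {code: position} object, and
    allows omitting it when a dimension has a single category.
    """
    index = category.get("index")
    if index is None:
        return list(category["label"])
    if isinstance(index, list):
        return list(index)
    return sorted(index, key=lambda code: index[code])


def iter_jsonstat(dataset: Dict[str, Any]) -> Iterator[Tuple[Dict[str, str], Optional[float]]]:
    """Yield ({dimension: code}, value) for every cell of a JSON-stat 2.0 dataset."""
    ids: List[str] = dataset["id"]
    sizes: List[int] = dataset["size"]
    codes = [_category_codes(dataset["dimension"][d]["category"]) for d in ids]
    values = dataset["value"]

    total = 1
    for s in sizes:
        total *= s

    for flat in range(total):
        # Decompose the flat index in row-major order: the last dimension varies fastest.
        pos = [0] * len(sizes)
        rem = flat
        for i in reversed(range(len(sizes))):
            rem, pos[i] = divmod(rem, sizes[i])
        cell = {d: codes[i][pos[i]] for i, d in enumerate(ids)}
        if isinstance(values, list):
            value = values[flat]
        else:
            value = values.get(str(flat))  # sparse object: missing key = no observation
        yield cell, value
```

Each record can then be keyed by its `geo` code and matched to the corresponding NUTS polygon before rendering.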


Statistical data visualisation

• Another example: Eurostat population density map, interactively rendered in JEODPP
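Rendering such a map reduces to computing a density per region (population divided by area in km²) and mapping each value to a colour class. The slides do not say which classification was used; quantile classes are one common choice and are assumed here. The function names are hypothetical, and region keys would be NUTS codes in practice.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import Dict, List


def density(population: float, area_km2: float) -> float:
    """Inhabitants per km2."""
    return population / area_km2


def quantile_classes(values: Dict[str, float], k: int = 5) -> Dict[str, int]:
    """Assign each region a class 0..k-1 so that classes hold roughly equal counts."""
    breaks: List[float] = quantiles(values.values(), n=k)  # k - 1 cut points
    return {region: bisect_right(breaks, v) for region, v in values.items()}


def to_colour(classes: Dict[str, int], palette: List[str]) -> Dict[str, str]:
    """Look up a fill colour per region (the palette must have k entries)."""
    return {region: palette[c] for region, c in classes.items()}
```

Quantile classes keep every colour in use even for skewed distributions such as population density; equal-interval or logarithmic breaks are alternatives when absolute values matter more than ranks.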


JEODPP Sentinel-2 Explorer in JupyterLab


Global Human Settlement Layer with Global Surface Water Occurrence on top of Global S1 mosaic

GHSL-S1: doi:10.1080/01431161.2017.1392642

Global Surface Water: doi:10.1038/nature20584

Global S1-Mosaic: doi:10.1109/TBDATA.2018.2846265

See also https://cidportal.jrc.ec.europa.eu/services/webview/jeodpp/databrowser/
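Stacking layers "on top of" each other, as in this view, amounts to alpha compositing. A minimal per-pixel sketch of the standard Porter-Duff "over" operator applied bottom to top; using water occurrence (0 to 100 %) as the opacity of the water layer is an assumption for illustration, not necessarily how the JEODPP viewer renders it, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def over(fg: RGB, alpha: float, bg: RGB) -> RGB:
    """'Over' onto an opaque background: alpha * fg + (1 - alpha) * bg."""
    r, g, b = (alpha * f + (1.0 - alpha) * k for f, k in zip(fg, bg))
    return (r, g, b)


def composite(base: RGB, layers: Sequence[Tuple[RGB, float]]) -> RGB:
    """Blend (colour, alpha) layers onto an opaque base, bottom layer first."""
    out = base
    for colour, alpha in layers:
        out = over(colour, alpha, out)
    return out


def occurrence_to_alpha(occurrence_pct: float) -> float:
    """Hypothetical mapping: water occurrence 0-100 % -> opacity 0-1 (clamped)."""
    return max(0.0, min(occurrence_pct, 100.0)) / 100.0
```

For one pixel, the S1 backscatter grey would be the base, the GHSL colour the first layer, and the water colour the last layer with `occurrence_to_alpha(...)` as its opacity, so permanent water fully hides what lies below while seasonal water only tints it.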


Takeaway messages

• The Jupyter ecosystem enables interactive visualization and analysis on the JEODPP while fostering collaborative work and knowledge sharing

• Powerful data manipulation for data scientists

• Easy interface building for policy officers


Takeaway messages

• EO Big Data challenges tackled in terms of:

   • Data variety (raster, vector, in-situ data)

   • On-the-fly analysis and rendering of massive datasets

   • API completeness and ease of use (importance of documentation with functional examples)

• Need for open APIs to apply analysis workflows to data available on external backends and retrieve the results from them (H2020 openEO project, http://openeo.org/)


• Big Data from Space’19: Turning Data into Insights

• Co-organised by ESA, JRC, and SatCen

• Hosted by DLR, Munich, February 2019

• Paper submission deadline: 15/10/18

+ participation in ESA S2QWG and Network of Resources WG

2019 Big Data from Space Conference, Munich Congress Hall, 19-21/02/19

BiDS’17 proceedings: http://dx.doi.org/10.2760/383579


Thank you

Contact: Pierre.Soille at ec.europa.eu