Interactive Geospatial Data Analytics
using JupyterLab
Pierre Soille Davide De Marchi
European Commission, Joint Research Centre Directorate I Competences, Unit I.3 Text and Data Mining
Joint Research Centre (JRC)
Data analytics workshop for official statistics (daWos): The geographical dimension of data analytics
Amsterdam, 12/09/2018
URL: https://cidportal.jrc.ec.europa.eu
Outline
• Context
• Geospatial data holdings
• Jupyter/JupyterLab
• Deferred processing
• Application gallery
• Takeaway messages
Big geospatial data for policy
[Figure: data and big data (volume, velocity, variety) feeding policy relevant information, indicators, and decisions. Domains shown: atmosphere, marine, land, climate, emergency, security.]
Exploit data volume, velocity, and variety to generate policy relevant information
• Using FAIR data principles (findable, accessible, interoperable, reusable)
• With data mining competence in a shared and collaborative environment
• Relying on reproducible workflows
directives, legislations, communications, …
Earth Observation, in situ, crowd sourcing, social sensing, text data, OS data, web scraping, …
Versatile platform bringing the users to the data and allowing for:
• Running large scale batch processing of existing scientific workflows thanks to lightweight virtualisation based on Docker
• Remote desktop capability for fast prototyping in legacy environments
• Interactive data visualisation and analysis with Jupyter
JRC Earth Observation Data and Processing Platform
Datasets accessed from external sources
• Enabled by appropriate interfaces (‘virtualization’ layer) to retrieve data from external backends
• Basemaps through Web Map Tile Services:
• OpenStreetMap, OpenTopoMap, OpenRailwayMap, …
• NASAGIBS (MODIS at any date, EarthAtNight 2012)
• Any provider of Web Map Tile Service (e.g., Strava)
• In situ data through appropriate APIs. Example: Eurostat data retrieved through its JSON API, with proofs of concept for:
• Population density map
• Poverty map
• Raster datasets
• Sentinel images (Sentinel-1-2-3)
• Landsat imagery
• Elevation data
• Urban Atlas
• Corine Land Cover
• Population, nightlight, etc.
• Vector datasets
• GISCO NUTS regions
• EFFIS forest fires
• Natura2000
• Transport networks
• …
Hosted datasets
Sentinel-2 L1C on JEODPP
• Daily data download for defined AOIs (Europe, tropical zones), status 2018-04-17
• Currently 540k tiles, 550 TB of data downloaded
Sentinel-2 on JEODPP
~620k tiles out of 4.5M tiles on the Copernicus hub, 550 TB in total
Sentinel-2 quicklooks/cloud masks
Full resolution cloud free Sentinel-2 composite: example for China
Methodology DOI:10.1080/20964471.2017.1407489
Raster data management
Many raster datasets available:
• Sentinel-2
• Global Human Settlements Layer
• Corine Land Cover
• Copernicus Core003
• Global Surface Water
• Sentinel-1 Mosaic
• EU-DEM, SRTM, GEBCO
• Global NDVI 25 years
Jupyter ecosystem
http://jupyter.org/
ipyleaflet
https://github.com/ellisonbg/ipyleaflet
ipywidgets and bqplot
https://github.com/jupyter-widgets/ipywidgets https://github.com/bloomberg/bqplot
Interactive visualization and analysis with Jupyter
• Web interface to visualize and analyze big geospatial data
• Allows fast search and display of complex datasets
• Creates an agile test environment for raster and vector processing algorithms thanks to the immediate display of the output results
• Available to geospatial experts with some programming skills
• Allows easy creation of GUI applications for non-programmers
From data to interactive display
Source: DOI:10.1016/j.future.2017.11.007
Deferred processing
• No data is pre-calculated, similarly to the Google Earth Engine platform
• Processing steps, their input parameters and their combinations in processing chains are defined by the user in the client environment and saved in JSON format
• Processing chains are executed server side on a highly parallel infrastructure by a C/C++ image processing library with direct access to the data
• Results are sent via HTTP to the ipyleaflet client
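The deferred processing idea above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the JEODPP API: the class ProcessingChain, the operation names "scale" and "threshold", and the execute function are all hypothetical, and a plain Python list stands in for the pixels that the real C/C++ library would read on the server.

```python
import json


class ProcessingChain:
    """Client side: records processing steps without running them."""

    def __init__(self, collection):
        self.collection = collection  # hypothetical dataset identifier
        self.steps = []

    def add(self, op, **params):
        """Append a step (name plus parameters) and return self for chaining."""
        self.steps.append({"op": op, "params": params})
        return self

    def to_json(self):
        """Serialise the chain; this string is what would be sent to the server."""
        return json.dumps(
            {"collection": self.collection, "steps": self.steps}, sort_keys=True
        )


# Server side (toy stand-in): the operations the executor knows about.
OPERATIONS = {
    "scale": lambda pixels, factor: [v * factor for v in pixels],
    "threshold": lambda pixels, minimum: [1 if v >= minimum else 0 for v in pixels],
}


def execute(chain_json, pixels):
    """Run the chain only when a result is requested, e.g. for one map tile."""
    chain = json.loads(chain_json)
    result = list(pixels)
    for step in chain["steps"]:
        result = OPERATIONS[step["op"]](result, **step["params"])
    return result
```

Building a chain with ProcessingChain("S2").add("scale", factor=2).add("threshold", minimum=5) computes nothing by itself. Work happens only when execute is called with the JSON description and the pixels of the requested area, which is what keeps panning and zooming interactive.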
Deferred processing in action
Statistical data visualisation
• Data accessed directly from Eurostat (API)
• Interactive rendering on JEODPP (e.g., poverty map)
Statistical data visualisation
• Another example with Eurostat population density map interactively rendered in JEODPP
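As a rough sketch of what sits between an API response and a rendered map, the code below reads a JSON-stat style response (the format used by the Eurostat JSON API) into one value per region and assigns each region to an equal-interval colour class. It assumes every dimension other than "geo" has size 1. The function names and the sample record with its region codes AA, BB, CC and its values are placeholders made up for illustration, not Eurostat figures.

```python
def values_by_geo(dataset):
    """Map each 'geo' code to its value in a JSON-stat style dataset.

    Assumes all dimensions except 'geo' have size 1, so the flat value
    index equals the position of the region in the 'geo' dimension.
    """
    geo_pos = dataset["id"].index("geo")
    if any(s != 1 for i, s in enumerate(dataset["size"]) if i != geo_pos):
        raise ValueError("expected a single category in every non-geo dimension")
    index = dataset["dimension"]["geo"]["category"]["index"]
    if isinstance(index, list):  # JSON-stat allows a list or a mapping here
        index = {code: pos for pos, code in enumerate(index)}
    values = dataset["value"]
    result = {}
    for code, pos in index.items():
        # 'value' may be a dense list or a sparse mapping keyed by string index
        v = values.get(str(pos)) if isinstance(values, dict) else values[pos]
        if v is not None:
            result[code] = v
    return result


def equal_interval_classes(values, n_classes):
    """Assign each region a class number in 0 .. n_classes - 1."""
    lo, hi = min(values.values()), max(values.values())
    width = (hi - lo) / n_classes or 1.0  # avoid zero width when all values match
    return {k: min(int((v - lo) / width), n_classes - 1) for k, v in values.items()}


# Placeholder record: made-up codes and values, for illustration only.
SAMPLE = {
    "id": ["time", "geo"],
    "size": [1, 3],
    "dimension": {
        "time": {"category": {"index": {"2017": 0}}},
        "geo": {"category": {"index": {"AA": 0, "BB": 1, "CC": 2}}},
    },
    "value": {"0": 10.0, "1": 55.0, "2": 100.0},
}
```

The class numbers returned by equal_interval_classes can then be mapped to a colour ramp and joined to region geometries (for example the GISCO NUTS regions) to draw the choropleth.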
JEODPP Sentinel-2 Explorer in JupyterLab
Global Human Settlement Layer with Global Surface Water Occurrence on top of Global S1 mosaic
GHSL-S1: doi:10.1080/01431161.2017.1392642
Global Surface Water: doi:10.1038/nature20584
Global S1-Mosaic: doi:10.1109/TBDATA.2018.2846265
See also https://cidportal.jrc.ec.europa.eu/services/webview/jeodpp/databrowser/
Takeaway messages
• Jupyter ecosystem enables the interactive visualization and analysis on the JEODPP while fostering collaborative working and knowledge sharing
• Powerful data manipulation for data scientists
• Easy interface building for policy officers
Takeaway messages
• EO Big Data challenges tackled in terms of:
• Data variety (raster, vector, in-situ data)
• On-the-fly analysis and rendering of massive datasets
• API completeness and ease of use (importance of documentation with functional examples)
• Need for open APIs to apply analysis workflows to data available on external backends and retrieve the results from them (H2020 openEO project http://openeo.org/)
• Big Data from Space’19: Turning Data into Insights
• Co-organised by ESA, JRC, and SatCen
• Hosted by DLR, Munich, February 2019
• Paper submission deadline: 15/10/18
+ participation in ESA S2QWG and Network of Resources WG
2019 Big Data from Space Conference Munich Congress Hall, 19-21/02/19
BiDS’17 proceedings: http://dx.doi.org/10.2760/383579
Thank you
Contact: Pierre.Soille at ec.europa.eu