Open Data Cube in a Box - tassic

FRONTIERSI.COM.AU THE OPEN DATA CUBE IN A BOX Sam Amirebrahimi Research and Project Manager 24 October 2018

Post on 02-Nov-2020

TRANSCRIPT

  • FRONTIERSI.COM.AU

    THE OPEN DATA CUBE

    IN A BOX

    Sam Amirebrahimi

    Research and Project Manager

    24 October 2018

  • 2

    What is the Open Data Cube

    And why am I here talking to you

  • 3

    A definition of the project

    According to the OpenDataCube.org website:

    “The objective of the ODC is to increase the impact of satellite data by providing an open and freely accessible exploitation tool, and to foster a community to develop, sustain, and grow the breadth and depth of applications.”

    A Python library that facilitates working with raster data

    http://opendatacube.org/

  • 4

    WHICH DATA

  • 5

    • For a long time, earth observation data meant Landsat

    • Landsat has been running since Landsat 1, which launched in 1972

    • Currently Landsat 7 and 8 are operational

    • The Sentinel satellites are capturing data too

    • There are more satellites being launched every year

    • This is one of those exponential trends!

    Earth observation data

  • 6

    • More satellites

    • More data

    • More accessible

  • 7

    Cloud Optimised GeoTIFFs

  • 8

    What’s a COG

    A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need.

    https://www.cogeo.org/

    https://tools.ietf.org/html/rfc7233
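The range-request idea above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular COG reader is implemented: `range_header` and `fetch_range` are hypothetical helper names, and the URL passed to `fetch_range` is assumed to point at a server that supports byte ranges.

```python
import urllib.request


def range_header(offset: int, length: int) -> dict:
    """Build an HTTP Range header asking for `length` bytes from `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # Byte ranges are inclusive at both ends (RFC 7233), hence the -1.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_range(url: str, offset: int, length: int):
    """Fetch part of a remote file; returns (status_code, body_bytes).

    A server that honours the range answers 206 Partial Content.
    A server that ignores it answers 200 with the whole file.
    """
    request = urllib.request.Request(url, headers=range_header(offset, length))
    with urllib.request.urlopen(request) as response:
        return response.status, response.read()


# Header a client might send to read the first 16 KiB of a file,
# which is typically where a reader starts looking for TIFF metadata.
print(range_header(0, 16 * 1024))
```

A COG reader repeats this pattern: it reads the header region first, then requests only the byte ranges of the tiles or overviews it needs.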

  • 9

  • 10

    How do I use COGs?

  • 11

    https://medium.com/radiant-earth-insights/cloud-optimized-geotiff-advances-6b01750eb5ac

  • 12

    https://blog.mapbox.com/combining-the-power-of-aws-lambda-and-rasterio-8ffd3648c348
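As a rough sketch of what using a COG can look like from Python, the snippet below reads only a small window from a remote file with rasterio. The helper names `centered_window` and `read_center` are made up for this example, and the URL you pass in is assumed to be a publicly readable COG; rasterio is imported lazily so the pure helper works without it installed.

```python
def centered_window(raster_width: int, raster_height: int, size: int):
    """Return (col_off, row_off, width, height) for a window centred in the raster."""
    width = min(size, raster_width)
    height = min(size, raster_height)
    col_off = (raster_width - width) // 2
    row_off = (raster_height - height) // 2
    return col_off, row_off, width, height


def read_center(url: str, size: int = 512, band: int = 1):
    """Read a `size` x `size` pixel block from the middle of a remote raster."""
    # Lazy imports: only needed when a file is actually opened.
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(url) as src:
        window = Window(*centered_window(src.width, src.height, size))
        # Only the tiles overlapping the window need to be fetched.
        return src.read(band, window=window)
```

Calling `read_center("https://example.com/scene.tif")` (a placeholder URL) would return a 2D array for the requested band without downloading the whole file.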

  • 13

    COGs are supported natively

    in the ODC

  • 14

    What is the ODC?

  • 15

    • Australia used to be a ‘downlink’ for Landsat

    • Data was delivered back to the US

    • Raw data was stored on tapes

    • Data was eventually digitized in the Unlocking the Landsat Archive project (with CRCSI)

    • The Australian Geoscience Data Cube (AGDC) was developed

    • A rewrite was undertaken, making it more generic (AGDCv2)

    • Renamed to the Open Data Cube

    A brief history of the ODC

  • 16

    The technical components

    In a nutshell:

    • Data

    • An index

    • Software
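To make the role of the index concrete, here is a toy, in-memory sketch of the idea: the data stays where it is, and the index only records what each dataset covers and where it lives. This is an illustration only. The record fields, product name, and `s3://example-bucket/...` locations are invented, and it is not the ODC's actual schema or search API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetRecord:
    product: str    # which collection the dataset belongs to
    acquired: date  # acquisition date
    bbox: tuple     # (min_lon, min_lat, max_lon, max_lat)
    uri: str        # where the actual pixels are stored


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two (min_x, min_y, max_x, max_y) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(index, product, bbox, start, end):
    """Return the locations of datasets matching product, area and time."""
    return [
        record.uri
        for record in index
        if record.product == product
        and start <= record.acquired <= end
        and overlaps(record.bbox, bbox)
    ]


# Hypothetical records standing in for indexed scenes.
EXAMPLE_INDEX = [
    DatasetRecord("example_landsat", date(2018, 1, 5),
                  (146.0, -43.0, 148.0, -41.0), "s3://example-bucket/scene-a.tif"),
    DatasetRecord("example_landsat", date(2018, 6, 5),
                  (146.0, -43.0, 148.0, -41.0), "s3://example-bucket/scene-b.tif"),
    DatasetRecord("example_landsat", date(2018, 1, 9),
                  (115.0, -33.0, 117.0, -31.0), "s3://example-bucket/scene-c.tif"),
]
```

The software layer then takes the matching locations and reads only those files, which is why indexing can be fast even when the data itself is very large.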

  • 17

  • 18

    Using the Open Data Cube

  • 19

    Do you have a supercomputer?

    • Get access to one of the operational cubes

    • Deployments are often on national supercomputer infrastructure…

    • What about me? (it isn’t fair…)

  • 20

    Reference deployments

    Example deployments:

    • Easy: Someone else’s install!

    • Medium: Cube in a Box

    • Medium-Hard: Manual installation

    • Hard: JupyterHub (as a SysAdmin)

    • Very Hard: Supercomputer deployment

  • 21

    JupyterHub

    • Enables a multi-user Jupyter environment, with permanent storage

    • Runs on top of Kubernetes

    • Will be forming the base of the ‘sandbox’ environment for the ODC

    • Deployment is complex, but users can access a ‘sandbox’ very easily and start testing

  • 22

    Cube in a Box

    • Works locally, i.e., on a laptop (Docker Compose)

    • Works on AWS (CloudFormation and Docker Compose)

    • Enables several things:

    • Test environment

    • Development environment

    • Disposable workspace

    • Rapid evaluation

  • 23

    Cube in a Box: Enabling technology

    • Docker

    • Jupyter

    • Cloud Optimised GeoTIFFs (and relevant cloud data stores)

    • GDAL and RasterIO

    • PostgreSQL

    • The Open Data Cube

  • 24

    Running Cube in a Box Locally

    • Requirements: Docker and Docker Compose

    • Very fast to start

    • Easy to scrap workspace and start again

    • Configured with Landsat 8 on AWS for auto-indexing

    • Get running with Open Data Cube in minutes
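Once a cube is running and data has been indexed, a first query from a notebook might look roughly like the sketch below. Treat the specifics as assumptions: the product name `ls8_level1_usgs`, the measurement names, the area, and the output CRS and resolution all depend on how your cube was indexed, and `build_query` and `load_from_cube` are hypothetical helpers wrapped around the `datacube.Datacube` and `load` calls.

```python
def build_query(product, lon_range, lat_range, time_range, measurements,
                output_crs="EPSG:3577", resolution=(-30, 30)):
    """Assemble keyword arguments for a Datacube.load call."""
    return {
        "product": product,                  # assumed product name
        "x": tuple(lon_range),               # longitude range
        "y": tuple(lat_range),               # latitude range
        "time": tuple(time_range),           # start and end dates
        "measurements": list(measurements),  # assumed band names
        "output_crs": output_crs,            # assumed target projection
        "resolution": resolution,            # assumed pixel size in CRS units
    }


def load_from_cube(query, app="cube-in-a-box-example"):
    """Run the query against the cube configured in this environment."""
    # Imported lazily so the query can be built without datacube installed.
    import datacube

    dc = datacube.Datacube(app=app)
    return dc.load(**query)


# Hypothetical query: a small area, three months of data, two bands.
query = build_query(
    "ls8_level1_usgs",
    lon_range=(146.8, 147.2),
    lat_range=(-43.0, -42.7),
    time_range=("2018-01-01", "2018-03-31"),
    measurements=["red", "nir"],
)
```

Calling `load_from_cube(query)` inside the Cube in a Box Jupyter environment should return an xarray dataset, provided matching scenes have been indexed for that area and period.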

  • 25

    Running Cube in a Box on the cloud

    • Template based using AWS’ CloudFormation

    • Deploys in 3 minutes

    • Takes 5 minutes to index data

    • Ready to run with Landsat 8 data from AWS auto-indexed

    Satellite                      When will it be available
    Landsat (1987 – today)         Now
    Sentinel 2 ARD                 ~ 3 months
    Near-real time Sentinel 2      Now (beta)
    Near-real time Landsat         ~ 3 months
    Sentinel 1 RADAR               ~ 1 year
    Landsat Surface Temperature    ~ 1 year

  • 26

    Applications

    • Jupyter (data science)

    • WMS/WCS/WPS (web services)

    • Web UI (Python based library)

    • Mobile apps (NDVI in the field)

    https://github.com/opendatacube
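For readers unfamiliar with the NDVI mentioned above: it is the normalised difference of near-infrared and red reflectance, (NIR - Red) / (NIR + Red). The sketch below computes it in plain Python so it runs anywhere; in a real notebook the same arithmetic would normally be applied to whole arrays at once. The function names and sample values are illustrative only.

```python
def ndvi(nir: float, red: float):
    """Normalised Difference Vegetation Index for one pixel.

    Returns None where both bands are zero, since the ratio is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def ndvi_grid(nir_rows, red_rows):
    """Apply ndvi pixel by pixel to two equally shaped grids (lists of rows)."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_rows, red_rows)
    ]


# Made-up values: the first pixel is strongly vegetated, the second has no data.
print(ndvi_grid([[3, 0]], [[1, 0]]))
```

Values close to 1 suggest dense green vegetation, values near 0 suggest bare ground, and negative values are typical of water.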

  • 27

    Case Studies

  • 28

    Outcomes

  • 29

    What does this mean?

    • The learning curve has been flattened

    • Infrastructure as code documents the architecture

    • Users can worry about using, not deploying

    • Testing and evaluation are easy

    • You can try it now!

    https://github.com/crc-si/cube-in-a-box

  • 30

    What’s next from FrontierSI?

    • Industry engagement in Australia

    • More documentation!

    • ODC Sandbox

    • Global data already indexed

    • Example notebooks and training

    • Free to use

    • Learn about ODC

  • 31

    Conclusions

    • COGs make data easy

    • The ODC makes access easy

    • Cube in a Box makes ODC easy!

    • Join us!

    • slack.opendatacube.org

    • github.com/opendatacube

    More info: [email protected]
