Open Data Cube in a Box - TASSIC
-
FRONTIERSI.COM.AU
THE OPEN DATA CUBE
IN A BOX
Sam Amirebrahimi
Research and Project Manager
24 October 2018
-
2
What is the Open Data Cube?
And why am I here talking to you?
-
3
A definition of the project
According to the OpenDataCube.org website:
“The objective of the ODC is to increase the impact of satellite data by providing an open and freely accessible exploitation tool, and to foster a community to develop, sustain, and grow the breadth and depth of applications.”
A Python library that facilitates working with raster data
http://opendatacube.org/
-
4
WHICH DATA
-
5
Earth observation data
• For a long time, earth observation data meant Landsat
• Landsat has been running since Landsat 1, which launched in 1972
• Currently, Landsat 7 and 8 are operational
• Sentinel is also capturing data
• More satellites are being launched every year
• This is one of those exponential trends!
-
6
• More satellites
• More data
• More accessible
-
7
Cloud Optimised GeoTIFFs
-
8
What’s a COG?
A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need.
https://www.cogeo.org/
https://tools.ietf.org/html/rfc7233
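The range-request mechanism in that definition can be sketched without any GIS libraries. This minimal Python example does not parse a real GeoTIFF: the tile offsets and byte counts are hypothetical stand-ins for what a client would read from the file's TileOffsets and TileByteCounts tags, and the "server" is just a bytes object sliced the way RFC 7233 describes for a single `bytes=first-last` range.

```python
def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes starting at `offset`.

    RFC 7233 byte ranges are inclusive at both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    return f"bytes={offset}-{offset + length - 1}"


def serve_range(resource: bytes, header: str) -> bytes:
    """Simulate a server answering one 'bytes=first-last' range (no suffix or multi-range support)."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported in this sketch")
    first_s, _, last_s = spec.partition("-")
    first = int(first_s)
    # An open-ended range ("bytes=100-") runs to the end of the resource.
    last = int(last_s) if last_s else len(resource) - 1
    if first > last or first >= len(resource):
        raise ValueError("range not satisfiable")
    # Clamp the end to the resource size, as RFC 7233 allows.
    return resource[first:min(last, len(resource) - 1) + 1]


# Hypothetical file: 100 bytes of header, then four 50-byte tiles filled with 1, 2, 3, 4.
fake_cog = bytes(100) + b"".join(bytes([i]) * 50 for i in range(1, 5))
tile_offsets = [100, 150, 200, 250]   # stand-in for the TileOffsets tag
tile_byte_counts = [50, 50, 50, 50]   # stand-in for the TileByteCounts tag

# Fetch only the third tile instead of downloading the whole file.
hdr = range_header(tile_offsets[2], tile_byte_counts[2])
chunk = serve_range(fake_cog, hdr)
print(hdr, len(chunk), set(chunk))
```

A real client would first fetch the start of the file to read the TIFF header and tag directory, then issue one range request per tile it needs; the internal layout of a COG keeps that first request small.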
-
9
-
10
How do I use COGs?
-
11
https://medium.com/radiant-earth-insights/cloud-optimized-geotiff-advances-6b01750eb5ac
-
12
https://blog.mapbox.com/combining-the-power-of-aws-lambda-and-rasterio-8ffd3648c348
-
13
COGs are supported natively in the ODC
-
14
What is the ODC?
-
15
A brief history of the ODC
• Australia used to be a ‘downlink’ for Landsat
• Data was delivered back to the US
• Raw data was stored on tapes
• The data was eventually digitised in the Unlocking the Landsat Archive project (with CRCSI)
• The Australian Geoscience Data Cube (AGDC) was developed
• A rewrite was undertaken to make it more generic (AGDC v2)
• It was renamed the Open Data Cube
-
16
The technical components
In a nutshell:
• Data
• An index
• Software
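The split between data, an index and software can be illustrated with a toy example. This is not the ODC's actual database schema (the real index lives in PostgreSQL and is managed by the `datacube` library); it is a hypothetical in-memory SQLite table showing the general idea: the index stores only metadata and a pointer (URI) to each dataset, and the software answers a spatio-temporal query by consulting the index before touching any pixels. The product name, footprints and `s3://example-bucket` URIs are made up.

```python
import sqlite3

# Toy index: one row per dataset, metadata only. The schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dataset (
           product TEXT, acquired TEXT,          -- ISO 8601 date
           min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL,
           uri TEXT)"""
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Hypothetical footprints and URIs, for illustration only.
        ("ls8_example", "2018-01-05", 146.0, -43.5, 148.5, -41.0, "s3://example-bucket/a.tif"),
        ("ls8_example", "2018-03-10", 146.0, -43.5, 148.5, -41.0, "s3://example-bucket/b.tif"),
        ("ls8_example", "2018-02-01", 115.0, -33.0, 117.0, -31.0, "s3://example-bucket/c.tif"),
    ],
)


def find_datasets(product, lon_range, lat_range, time_range):
    """Return URIs of datasets whose footprint intersects the query box within the time range."""
    rows = conn.execute(
        """SELECT uri FROM dataset
           WHERE product = ?
             AND max_lon >= ? AND min_lon <= ?
             AND max_lat >= ? AND min_lat <= ?
             AND acquired BETWEEN ? AND ?
           ORDER BY acquired""",
        (product, lon_range[0], lon_range[1], lat_range[0], lat_range[1], *time_range),
    )
    return [uri for (uri,) in rows]


# Only the first dataset matches both the bounding box and the date range.
print(find_datasets("ls8_example", (147.0, 147.5), (-42.0, -41.5), ("2018-01-01", "2018-02-28")))
```

Once the index has returned the matching URIs, the software layer can read just those files (or, with COGs, just the relevant byte ranges of them).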
-
17
-
18
Using the Open Data Cube
-
19
Do you have a supercomputer?
• Get access to one of the operational cubes
• Deployments are often on national supercomputer infrastructure…
• What about me? (it isn’t fair…)
-
20
Reference deployments
Example deployments:
• Easy: Someone else’s install!
• Medium: Cube in a Box
• Medium-Hard: Manual installation
• Hard: JupyterHub (as a SysAdmin)
• Very Hard: Supercomputer deployment
-
21
JupyterHub
• Enables a multi-user Jupyter environment, with permanent storage
• Runs on top of Kubernetes
• Will form the base of the ‘sandbox’ environment for the ODC
• Deployment is complex, but users can access a ‘sandbox’ very easily and start testing
-
22
Cube in a Box
• Works locally, i.e., on a laptop (Docker Compose)
• Works on AWS (CloudFormation and Docker Compose)
• Enables several things:
  • Test environment
  • Development environment
  • Disposable workspace
  • Rapid evaluation
-
23
Cube in a Box: enabling technology
• Docker
• Jupyter
• Cloud Optimised GeoTIFFs (and relevant cloud data stores)
• GDAL and Rasterio
• PostgreSQL
• The Open Data Cube
-
24
Running Cube in a Box Locally
• Requirements: Docker and Docker Compose
• Very fast to start
• Easy to scrap the workspace and start again
• Configured to auto-index Landsat 8 data on AWS
• Get running with the Open Data Cube in minutes
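Since Docker and Docker Compose are the only local requirements, a quick preflight check can save some confusion before starting. This is a small hypothetical helper, not part of the Cube in a Box repository; it assumes the standalone `docker-compose` executable (the usual way to install Compose at the time of this talk) and only checks that the executables are on the PATH, not that the Docker daemon is running.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str] = ("docker", "docker-compose"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of required executables that cannot be found on PATH.

    `which` is injectable so the check can be tested without Docker installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("Docker and Docker Compose found; ready to start Cube in a Box.")
```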
-
25
Running Cube in a Box on the cloud
• Template-based, using AWS CloudFormation
• Deploys in 3 minutes
• Takes 5 minutes to index data
• Ready to run, with Landsat 8 data from AWS auto-indexed
Satellite                        When will it be available?
Landsat (1987 – today)           Now
Sentinel 2 ARD                   ~3 months
Near-real-time Sentinel 2        Now (beta)
Near-real-time Landsat           ~3 months
Sentinel 1 RADAR                 ~1 year
Landsat Surface Temperature      ~1 year
-
26
Applications
• Jupyter (data science)
• WMS/WCS/WPS (web services)
• Web UI (Python-based library)
• Mobile apps (NDVI in the field)
https://github.com/opendatacube
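The NDVI mentioned above is a simple band ratio, NDVI = (NIR - Red) / (NIR + Red), which ranges from -1 to 1; healthy vegetation scores high because it reflects strongly in the near infrared and absorbs red light. A minimal pure-Python sketch follows. In practice the ODC would hand you xarray arrays rather than lists, and for Landsat 8 OLI the red and near-infrared bands are bands 4 and 5. The reflectance values below are made up for illustration.

```python
from typing import List, Optional, Sequence


def ndvi(red: Sequence[float], nir: Sequence[float]) -> List[Optional[float]]:
    """Compute NDVI = (NIR - Red) / (NIR + Red) pixel by pixel.

    Returns None where NIR + Red is zero (e.g. no-data pixels) to avoid dividing by zero.
    """
    if len(red) != len(nir):
        raise ValueError("red and nir must have the same number of pixels")
    out: List[Optional[float]] = []
    for r, n in zip(red, nir):
        total = n + r
        out.append(None if total == 0 else (n - r) / total)
    return out


# Made-up surface reflectances: vegetation, bare soil, water, and a no-data pixel.
red = [0.05, 0.20, 0.04, 0.0]
nir = [0.45, 0.25, 0.02, 0.0]
print([None if v is None else round(v, 2) for v in ndvi(red, nir)])
```

The vegetated pixel comes out near 0.8, bare soil slightly above zero and water below zero, which is the contrast a field app would display.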
-
27
Case Studies
-
28
Outcomes
-
29
What does this mean?
• The learning curve has been flattened
• Infrastructure as code documents the architecture
• Users can worry about using, not deploying
• Testing and evaluation are easy
• You can try it now!
https://github.com/crc-si/cube-in-a-box
-
30
What’s next from FrontierSI?
• Industry engagement in Australia
• More documentation!
• ODC Sandbox
  • Global data already indexed
  • Example notebooks and training
  • Free to use
  • Learn about ODC
-
31
Conclusions
• COGs make data easy
• The ODC makes access easy
• Cube in a Box makes ODC easy!
• Join us!
  • slack.opendatacube.org
  • github.com/opendatacube
More info: [email protected]