The OPTIRAD Platform: Cloud-hosted IPython Notebooks for collaborative EO Data Analysis and Processing
ESA EO Open Science 2.0 Conference 12-14 October 2015
Philip Kershaw (CEDA), John Holt (Tessella plc), José Gómez-Dans, Philip Lewis (UCL),
Nicola Pounder, Jon Styles (Assimila Ltd.)
JASMIN (STFC/Stephen Kill)
Introduction
• OPTIRAD = OPTImisation environment for joint retrieval of multi-sensor RADiances
– Collaboration: CEDA, UCL, Assimila Ltd, FastOpt and VU Amsterdam
– Funded by ESA
• Overview of technical solution
– Introduction to IPython (Jupyter) Notebook
– Deployment on JASMIN-CEMS science cloud
• Make the case: IPython Notebook + Cloud = powerful combination for EO Open Science 2.0
OPTIRAD Goals
Address the challenge of producing consistent EO land surface information products from heterogeneous EO data input:
Collaboration: provide a collaborative research environment as a means to engender closer working between algorithm specialists, modellers and end users.
Computing resources: processing at high spatial and temporal resolutions with computationally expensive algorithms.
Usability and access: easy execution and development of existing Python code and the provision of interactive tutorials for new users
IPython Notebook
• Provides Python kernels accessible via a web browser
• Sessions can be saved and shared
• Trivial access to parallel processing capabilities
– IPython.parallel (ipyparallel)
• IPython → Jupyter Notebook
– Support for other languages such as R
• New JupyterHub allows multi-user management of notebooks
• Gained traction as a teaching and collaborative tool
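As an illustration of the parallel processing bullet above, here is a minimal sketch of how a per-tile computation might be handed to a set of engines from a notebook cell. It is not taken from the OPTIRAD code base. The `ndvi` function, the tile values and the `LocalView` stand-in are hypothetical and chosen for illustration only. On a system with a running ipyparallel cluster, `ipyparallel.Client()[:]` returns a view whose `map_sync` call is used in the same way.

```python
"""Sketch: spreading a per-tile computation over parallel engines."""


def ndvi(tile):
    """Return a toy vegetation index for one (red, nir) reflectance pair."""
    red, nir = tile
    return (nir - red) / (nir + red)


class LocalView:
    """Hypothetical stand-in offering the same map_sync() call as an ipyparallel view."""

    def map_sync(self, func, *sequences):
        # Runs serially in this process; a real view would use the engines.
        return list(map(func, *sequences))


def get_view(use_cluster=False):
    """Return an ipyparallel view if requested, otherwise the local stand-in."""
    if use_cluster:
        # Assumes ipyparallel is installed and engines have already been started.
        import ipyparallel as ipp

        return ipp.Client()[:]  # DirectView over all available engines
    return LocalView()


# Made-up (red, nir) reflectance pairs standing in for image tiles.
tiles = [(0.1, 0.5), (0.2, 0.6), (0.3, 0.3)]

view = get_view()  # pass use_cluster=True when a cluster is running
results = list(view.map_sync(ndvi, tiles))
print(results)
```

Because the calling code only depends on `map_sync`, the same cell can be tried locally first and then pointed at the engines without other changes.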
IPython Notebook + Cloud
• Cloud’s characteristics:
– Broad network access, resource pooling, elasticity, scale – compute and storage
– Good fit for Big Data science applications
• Cloud-hosted Notebook: a model already demonstrated with public cloud services, e.g.
– Wakari, Azure, Rackspace
• Central hosting allows central management of software packages
– No installation steps needed for the user
• Algorithm prototyping environment next to Big Data
– Acts as a precursor to operational processing services
Notebook: a user – application perspective
Support a spectrum of usage models
Different classes of user
Long-tail of science users
Design and development considerations
• Host on JASMIN-CEMS
– Data analysis facility and science cloud at Rutherford Appleton Lab, UK
– Advantage of proximity to locally hosted EO and climate science datasets
– Integration with environmental sciences community
• Lightweight development and deployment philosophy
– Build on Open Source and community efforts to use what’s already available
• How to meet multi-user support requirement?
– Buy off-the-shelf: run Wakari on JASMIN-CEMS platform, or
– Try JupyterHub: multi-user IPython Notebook solution, or
– Roll our own solution
• How to integrate parallel processing?
– IPython.parallel (ipyparallel) Python API accessed via the Notebook
Deployment Architecture
[Diagram: OPTIRAD JASMIN Cloud Tenancy. Labels recovered from the figure:]
• Browser access, via firewall
• JupyterHub: manage users and provision of notebooks
• Swarm: manages allocation of containers for notebooks
• VMs in the Swarm pool: notebooks and kernels in Docker containers (IPython Notebook, kernels), plus parallel controllers
• VM: slave 0: parallel engines (nodes for parallel processing)
• VM: shared services: NFS, LDAP
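The role Swarm plays in this architecture, deciding which pool VM hosts each user's notebook container, can be pictured with a toy allocator. This is a minimal sketch of a "least loaded VM first" policy and is not Swarm's actual scheduler or the project's configuration. The VM names, user names and the `allocate` function are hypothetical.

```python
"""Toy model: placing per-user notebook containers on a pool of VMs."""


def allocate(users, vms):
    """Assign each user's container to the VM currently hosting the fewest."""
    load = {vm: 0 for vm in vms}  # containers currently placed on each VM
    placement = {}
    for user in users:
        # min() returns the first VM with the lowest load, so ties follow list order.
        target = min(vms, key=lambda vm: load[vm])
        placement[user] = target
        load[target] += 1
    return placement


vms = ["swarm-pool-0", "swarm-pool-1", "swarm-pool-2"]  # hypothetical VM names
users = ["alice", "bob", "carol", "dave"]  # hypothetical users

placement = allocate(users, vms)
for user, vm in placement.items():
    print(f"{user}: notebook container on {vm}")
```

A policy of this kind keeps the number of containers per VM within one of each other, which is one simple way to share a fixed pool among notebook users.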
Conclusions + Next Steps
• Experiences from project delivery
– Off-the-shelf solution using JupyterHub paid off
– JupyterHub and Swarm were new, but installation was straightforward and operationally robust
• Challenges and future development
– Extend use of containers for parallel compute
– Challenge: managing cloud elasticity with both containers and host VMs
– Provide object storage: Ceph likely to be adopted
– Expand from OPTIRAD pilot to wider user community
– Deploy with toolboxes, e.g. Sentinels or CIS
Demo . . .
• A tutorial on EO data assimilation
– Notebook blurs the traditional separation between tutorial documentation and using the target system
– The two are one self-contained interactive unit
Further information
• OPTIRAD:
– Optimisation Environment For Joint Retrieval Of Multi-Sensor Radiances (OPTIRAD), Proceedings of the ESA 2014 Conference on Big Data from Space (BiDS’14), http://dx.doi.org/10.2788/1823
• JASMIN paper (Sept 2013)
– http://home.badc.rl.ac.uk/lawrence/static/2013/10/14/LawEA13_Jasmin.pdf
– Cloud paper to follow soon
• Cloud-hosted JupyterHub with Docker for teaching:
– https://developer.rackspace.com/blog/deploying-jupyterhub-for-education/
• JASMIN and CEDA:
– http://jasmin.ac.uk/
– http://www.ceda.ac.uk
• @PhilipJKershaw