Data analytics in the cloud with Jupyter Notebooks
Graham Dumpleton
http://jupyter.org/
Python Data Science Handbook / 04.12-Three-Dimensional-Plotting
Python Data Science Handbook / 04.13-Geographic-Data-With-Basemap
https://blog.data.gov.sg/how-we-caught-the-circle-line-rogue-train-with-data-79405c86ab6a
Who’s Using It?
Individuals
Collaborators
Teachers
Getting Started
pip3 install jupyter
jupyter notebook
Empty Workspace
Upload Notebooks
Local File System
$ ls notebooks/01*.ipynb
notebooks/01.00-IPython-Beyond-Normal-Python.ipynb
notebooks/01.01-Help-And-Documentation.ipynb
notebooks/01.02-Shell-Keyboard-Shortcuts.ipynb
notebooks/01.03-Magic-Commands.ipynb
notebooks/01.04-Input-Output-History.ipynb
notebooks/01.05-IPython-And-Shell-Commands.ipynb
notebooks/01.06-Errors-and-Debugging.ipynb
notebooks/01.07-Timing-and-Profiling.ipynb
notebooks/01.08-More-IPython-Resources.ipynb
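The same listing can be produced from Python, which is handy when checking what a notebook directory contains before uploading it. This is a minimal sketch, assuming notebooks are stored in the nbformat 4 JSON layout with a top-level "cells" list; the function names are illustrative and not part of Jupyter.

```python
import json
from pathlib import Path


def list_notebooks(directory, pattern="01*.ipynb"):
    """Return the sorted notebook paths in `directory` that match `pattern`."""
    return sorted(Path(directory).glob(pattern))


def count_cells(notebook_path):
    """Count the cells in a notebook file.

    Assumes the nbformat 4 layout, where cells sit in a top-level "cells" list.
    """
    with open(notebook_path, encoding="utf-8") as handle:
        document = json.load(handle)
    return len(document.get("cells", []))
```

Calling list_notebooks("notebooks") mirrors the shell glob above, and count_cells can then be applied to each returned path.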
Browsing Files
Interacting with a Notebook
Status of Notebooks
Installing Packages
Positives
• Save notebooks/data locally.
• Python virtual environments.
• Select Python version you want.
• Install required Python packages.
Negatives
• Operating system differences.
• Python distribution differences.
• Python version differences.
• Package index differences.
• PyPI (pip) vs Anaconda (conda).
• Effort to set up and maintain.
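The differences listed above are easier to diagnose when each local environment can describe itself. The following is a rough sketch using only the standard library; the function name and the report layout are assumptions made for illustration.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect details that commonly differ between local installations."""
    report = {
        "os": platform.system(),
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "executable": sys.executable,
        "packages": {},
    }
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            report["packages"][name] = None
    return report
```

Running environment_report(["jupyter", "numpy"]) inside a notebook cell on two machines and comparing the results shows quickly whether the operating system, Python version, or package versions have drifted apart.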
Running Docker Image
docker run -it --rm -p 8888:8888 \
    jupyter/minimal-notebook
Positives
• Pre-created images.
• Bundled operating system packages.
• Known Python distribution/vendor.
• Bundled Python packages.
• Docker images are read-only.
• Don’t need to maintain the image.
Negatives (1)
• More effort to customise experience.
• Build a custom Docker image to extend.
• Install extra packages each time you run it.
• Images can be very large.
• Multiple Python versions.
• Packages that you do not need.
Negatives (2)
• Access to and saving your notebooks/data.
• Need to mount persistent storage volumes.
• Ensuring access is done securely.
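One way to address the storage and access points above is to mount a local directory into the container and publish the port on the loopback interface only. The sketch below just assembles the docker run arguments and does not start anything. It assumes the image keeps user files under /home/jovyan/work, which is the convention in the Jupyter Docker Stacks images; the function name and its defaults are hypothetical.

```python
import shlex
from pathlib import Path


def docker_run_command(notebook_dir, image="jupyter/minimal-notebook",
                       port=8888, container_dir="/home/jovyan/work"):
    """Build a `docker run` argument list with a bind-mounted notebook directory."""
    host_dir = Path(notebook_dir).resolve()
    return [
        "docker", "run", "-it", "--rm",
        # Publish on 127.0.0.1 so the notebook is reachable from this machine only.
        "-p", f"127.0.0.1:{port}:8888",
        # Bind mount so notebooks and data outlive the container.
        "-v", f"{host_dir}:{container_dir}",
        image,
    ]


# Print the command so it can be reviewed before being run by hand.
print(shlex.join(docker_run_command("notebooks")))
```

The returned list can be passed to subprocess.run once the paths have been checked. On Windows the drive-letter colon in the host path needs separate handling, which this sketch does not attempt.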
Azure Notebooks
https://notebooks.azure.com/
Binder Service
http://mybinder.org/
Positives
• Somebody else looks after everything.
Negatives
• Shared resource.
• Outside of your control.
• Reliability.
• Customisation.
• Software versions.
• Information security.
Positives
• Can customise however you want.
• Modify code for service.
• Use custom images.
Negatives
• Dedicated infrastructure.
• Effort to understand and set it up.
• Effort to keep it running.
Many Options to Choose From
OpenShift
Deployments
Docker Image
Image Stream
Notebook Storage
Attaching Storage
Shared Storage
Positives
• Use existing features of OpenShift.
• No special storage backends required.
• No custom provisioning applications.
• Cluster can still be used for other applications.
• Simply set quotas and users do what they want.
Source-to-Image
Positives
• Easily build custom images.
• Pre-populated with required Python packages.
• Pre-populated with required Jupyter Notebooks.
• Pre-populated with required data files.
• Direct to application, or to create images.
Service Catalog
Templates (builder)
Templates (cluster)
Templates (notebook)
IPyParallel Cluster
Parallel Computing
Positives
• Templates enable complex deployments.
• Don’t need something like JupyterHub.
Challenges
• Custom base images and builders.
• Learning curve for writing templates.
Command Line
oc new-app stats101-notebook-template \
    --param STUDENT_NUMBER=1 \
    --param CLASS_NUMBER=1234
oc new-app stats101-notebook-template \
    --param STUDENT_NUMBER=2 \
    --param CLASS_NUMBER=1234
…
oc delete all --selector class=1234
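Repeating the command for a whole class is easy to script. The sketch below only generates the argument lists for the oc invocations shown above and leaves running them to the caller. The template name, parameter names, and the class label come from the commands above; the helper function names are hypothetical.

```python
def new_app_commands(template, class_number, student_numbers):
    """Build one `oc new-app` argument list per student."""
    commands = []
    for student in student_numbers:
        commands.append([
            "oc", "new-app", template,
            "--param", f"STUDENT_NUMBER={student}",
            "--param", f"CLASS_NUMBER={class_number}",
        ])
    return commands


def cleanup_command(class_number):
    """Build the `oc delete` argument list that removes everything labelled for a class."""
    return ["oc", "delete", "all", "--selector", f"class={class_number}"]


# Example: commands for students 1 to 30 of class 1234.
commands = new_app_commands("stats101-notebook-template", 1234, range(1, 31))
```

Each list can be handed to subprocess.run(command, check=True) on a machine where oc is installed and logged in to the cluster.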
REST API
import powershift.endpoints as endpoints

client = endpoints.Client()
projects = client.oapi.v1.projects.get()

def public_address(route):
    host = route.spec.host
    path = route.spec.path or '/'
    if route.spec.tls:
        return 'https://%s%s' % (host, path)
    return 'http://%s%s' % (host, path)

routes = client.oapi.v1.namespaces(namespace='stats101').routes.get()

for route in routes.items:
    print(' route=%r' % public_address(route))
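The URL-building logic in public_address can be tried without a cluster by substituting simple stand-in objects for the route resources. This is a sketch only: fake_route and the example.com hostnames are made up for illustration, and the stand-ins mimic just the spec.host, spec.path, and spec.tls attributes used above, not the real powershift objects.

```python
from types import SimpleNamespace


def public_address(route):
    """Return the public URL for a route, using https when TLS is configured."""
    host = route.spec.host
    path = route.spec.path or "/"
    scheme = "https" if route.spec.tls else "http"
    return f"{scheme}://{host}{path}"


def fake_route(host, path=None, tls=None):
    """Hypothetical stand-in exposing only the attributes public_address reads."""
    return SimpleNamespace(spec=SimpleNamespace(host=host, path=path, tls=tls))


plain = fake_route("notebook-1.example.com")
secure = fake_route("notebook-2.example.com", path="/tree",
                    tls=SimpleNamespace(termination="edge"))

for route in (plain, secure):
    print(" route=%r" % public_address(route))
```

A route with no TLS settings falls back to http, and a missing path falls back to "/", which matches the behaviour of the function above.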
Positives
• Easily trigger multiple deployments using CLI.
• REST API also available for custom front ends.
Resources
• S2I enabled Jupyter Notebook images
• https://github.com/getwarped/jupyter-notebooks
• OpenShift versions of Jupyter Project images
• https://github.com/getwarped/jupyter-stacks
• Python REST API client for OpenShift
• https://github.com/getwarped/powershift