Deploying Data Science with Docker and AWS
Audience: Cambridge AWS Meetup Group
Presenter: Matt McDonnell, Data Scientist at Metail
Date: 9th June 2016
Context
Lots of event stream data
Many AWS components
Outputs:
• Business Intelligence
• Bespoke Analysis
• Productionised Science
What?
Goal: Moving laptop analyses onto a server
Turn:
<types>run_analysis.sh<presses enter>
… analysis script retrieves data from DB, Looker, web, etc. …
… runs analysis …
… outputs results as csv, png, etc. to local hard disk …
<gets back command prompt>
Into:
Automated process running on a server
Why?
• Production scheduled task, e.g. Firm Wide Metrics daily processing
• Make use of more powerful Amazon Web Services (AWS) cloud resources for large scale analysis
• Ease of deployment for Data Science analysts
• Build consistent development environment
How?
• Containerize applications and runtime using Docker to produce images
• Store images on AWS Elastic Container Registry (ECR)
• Run images either locally or on Amazon Elastic Container Service (ECS)
• Use AWS Lambda functions to trigger scheduled tasks (or react to events)
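The build, store and run steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's tooling: the account ID, region and repository name are placeholders, and the helper names are hypothetical. The registry host pattern is the standard ECR form, and the functions only assemble the docker command lines without executing them.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the registry-qualified image name that ECR expects."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def build_and_push_commands(local_image, remote_uri, context_dir="."):
    """Return the docker CLI invocations as argument lists (nothing is run)."""
    return [
        ["docker", "build", "-t", local_image, context_dir],  # image from Dockerfile
        ["docker", "tag", local_image, remote_uri],           # add the ECR name
        ["docker", "push", remote_uri],                       # upload to ECR
    ]


if __name__ == "__main__":
    # Placeholder account ID and region, for illustration only.
    uri = ecr_image_uri("123456789012", "eu-west-1", "pyanalysis")
    for cmd in build_and_push_commands("pyanalysis", uri):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once Docker has been authenticated against the registry.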
What is Docker?
“Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries – anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.” -- https://www.docker.com/what-docker
Public code: store Dockerfile on GitHub, use Travis to automatically build image on DockerHub
Private code: private Dockerfile, build locally, push image to AWS Elastic Container Registry
Example application: retrieve market data
PyAnalysis: Application code built on PCR image
https://github.com/mattmcd/PyAnalysis
PCR: Python Component Runtime Base Docker image
https://github.com/mattmcd/PCR
Where? Amazon Web Services Cloud
• Elastic Container Service (ECS)
  • Defines the task that runs the container
  • Runs tasks on a cluster of EC2 nodes
• EC2 instance set up to act as node
  • Needs to be launched from an Amazon ECS-optimized AMI
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
• Needs an IAM Role that has:
  • AmazonEC2ContainerServiceforEC2Role policy attached
  • Policies to allow access to any AWS resources needed, e.g. S3
• Lambda function to trigger ECS task
  • cron equivalent by using CloudWatch scheduled events
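A Lambda function that triggers an ECS task might look roughly like the sketch below. It uses the boto3 ECS client's `run_task` call. The environment variable names `ECS_CLUSTER` and `ECS_TASK_DEFINITION`, and the default task definition name `pyanalysis`, are assumptions made for illustration. The optional `ecs_client` argument is also an addition here, so the handler can be exercised without AWS access.

```python
import os


def build_run_task_request(cluster, task_definition, count=1):
    """Keyword arguments for the ECS RunTask API call."""
    return {"cluster": cluster, "taskDefinition": task_definition, "count": count}


def handler(event, context, ecs_client=None):
    """Lambda entry point: start one run of the analysis task on ECS."""
    if ecs_client is None:
        import boto3  # provided by the Lambda Python runtime

        ecs_client = boto3.client("ecs")
    request = build_run_task_request(
        # Hypothetical configuration names, not taken from the talk.
        cluster=os.environ.get("ECS_CLUSTER", "default"),
        task_definition=os.environ.get("ECS_TASK_DEFINITION", "pyanalysis"),
    )
    response = ecs_client.run_task(**request)
    # Report what was started and anything ECS could not place.
    return {
        "started": [task["taskArn"] for task in response.get("tasks", [])],
        "failures": response.get("failures", []),
    }
```

A CloudWatch scheduled event with an expression such as `rate(1 day)` would then invoke `handler` on a timer, giving the cron-like behaviour described above.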
EC2 Instance Security Group
EC2 instance used by ECS can be locked down – no need to SSH in to it so no inbound ports needed
EC2 Instance AMI
Use latest available Amazon ECS Optimized AMI – it has Docker and ECS Container Agent already installed
EC2 Instance Details
Enable Auto-assign Public IP so the instance can communicate with ECS, and assign a custom IAM Role as a hook for access permissions
EC2 Instance IAM Role
Attach the AmazonEC2ContainerServiceforEC2Role Policy and any extra access Policies needed by containers on the instance
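As a rough sketch, the pieces of the instance role can be written out as plain Python dictionaries and serialised to the JSON that IAM expects. The trust policy and the managed policy ARN are standard AWS values. The S3 read-only policy and its bucket name are hypothetical examples of an "extra access" policy, not something taken from the talk.

```python
import json

# AWS-managed policy that lets the ECS agent on the instance talk to ECS.
ECS_INSTANCE_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
)


def ec2_trust_policy():
    """Trust policy that lets EC2 instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_read_policy(bucket):
    """Hypothetical extra policy: read-only access to a single S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ec2_trust_policy(), indent=2))
    print(json.dumps(s3_read_policy("example-analysis-bucket"), indent=2))
```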
Lambda function IAM role
AWS will create a default IAM Role for the Lambda function – the ecs:RunTask permission needs to be added so that it can run the container
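The extra permission could be expressed as a small inline policy along these lines. This is a sketch only: the function name is hypothetical, and the default `"*"` resource is a permissive placeholder that would normally be narrowed to the task definition ARN.

```python
import json


def run_task_policy(task_definition_arn="*"):
    """Policy document allowing the Lambda role to start an ECS task."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ecs:RunTask",
                # "*" is a placeholder; restrict to one task definition ARN.
                "Resource": task_definition_arn,
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(run_task_policy(), indent=2))
```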
Demo / Q&A
Blog posts
• ‘Scheduled Downloads using AWS EC2 and Docker — Medium’ http://bit.ly/1TO9a1h (me)
• ‘Better Together: Amazon ECS and AWS Lambda’ http://amzn.to/1UkitEF (not me)
Code samples
• https://github.com/mattmcd/PyAnalysis
• https://github.com/mattmcd/PCR
Docker images
• mattmcd/pyanalysis
• mattmcd/pcr
Me
• Twitter @mattmcd
• Email [email protected] or [email protected]