Cloud Academy & AWS: how we use Amazon Web Services for machine learning and data collection
cloudacademy.com
4/27/2016
About us
Alex Casalboni, Sr. Software Engineer (@alex_casalboni)
Roberto Turrin, Sr. Data Scientist (PhD) (@robytur)
Luca Baroffio, Data Scientist (PhD) (@lucabaroffio)
clda.co/webinar-ML
What is Machine Learning (ML)?
Back to 1959 (A. Samuel)
Decision problems that can be modeled from data
Machine Learning pipeline
Stages: Feature extraction (batch), Training (batch), Prediction (real-time)
Artifacts: data, features, ML models, information
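The three stages of the pipeline can be sketched in a few lines of Python. This is only an illustrative sketch: the record fields (`minutes_watched`, `quizzes_passed`, `subscribed`), the function names and the nearest-centroid model are assumptions made for the example, not something taken from the slides.

```python
from statistics import mean

def extract_features(record):
    """Feature extraction (batch): turn a raw record into a numeric vector."""
    return [float(record["minutes_watched"]), float(record["quizzes_passed"])]

def train(records):
    """Training (batch): compute one centroid per label (nearest-centroid model)."""
    by_label = {}
    for r in records:
        by_label.setdefault(r["subscribed"], []).append(extract_features(r))
    return {label: [mean(col) for col in zip(*rows)] for label, rows in by_label.items()}

def predict(model, record):
    """Prediction (real-time): return the label of the closest centroid."""
    x = extract_features(record)

    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(model, key=lambda label: dist2(model[label]))

# Made-up historical data: raw records with a known outcome.
history = [
    {"minutes_watched": 5, "quizzes_passed": 0, "subscribed": False},
    {"minutes_watched": 10, "quizzes_passed": 1, "subscribed": False},
    {"minutes_watched": 120, "quizzes_passed": 8, "subscribed": True},
    {"minutes_watched": 90, "quizzes_passed": 6, "subscribed": True},
]
model = train(history)
print(predict(model, {"minutes_watched": 100, "quizzes_passed": 7}))
```

Feature extraction and training run offline over the whole history, while `predict` only needs the trained model and a single new record, which is what makes it a candidate for real-time serving.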
Machine Learning taxonomy
Supervised Learning: classification, regression (e.g. 170cm)
Unsupervised Learning: clustering (group A, group B), rule extraction (A, B -> C)
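The difference between the two families is easiest to see side by side. The sketch below is an assumption-laden toy example (made-up numbers, hand-written helpers `fit_line` and `two_means`): regression learns from labeled pairs, while clustering only sees unlabeled values.

```python
def fit_line(xs, ys):
    """Supervised (regression): least-squares fit of y = a * x + b from labeled pairs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def two_means(values, iterations=10):
    """Unsupervised (clustering): split unlabeled 1-D values into two groups.

    Assumes the input contains at least two distinct values.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        group_a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(group_a) / len(group_a), sum(group_b) / len(group_b)
    return group_a, group_b

# Labeled data: age in years -> height in cm (made-up numbers).
ages = [10, 12, 14, 16]
heights = [140, 150, 160, 170]
a, b = fit_line(ages, heights)
print(a * 13 + b)  # predicted height for a 13-year-old

# Unlabeled data: there is no target, only structure to discover.
group_a, group_b = two_means([1, 2, 3, 20, 21, 22])
print(group_a, group_b)
```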
What problems can ML solve for you?
Supervised Learning
classification: fraud detection
regression: price of a stock over time
Unsupervised Learning
clustering: user segmentation
rule extraction: purchase likelihood
(Word cloud slide: Machine Learning, Big Data, Cloud, Data Science, Data Mining, Deep Learning, Statistics, Algorithms, IoT, Hadoop, NoSQL, SQL, PaaS, Python, R, Storage, Privacy, Petabytes, Exabytes)
Machine learning and Big data
“90% of the data in the world today has been created in the last two years alone” - IBM
“300+ hours worth of video content is being uploaded to the site every minute” - YouTube
Big data challenges
This much data can’t be manually inspected
Data-driven decisions
Distributed/parallel computing
The curse of dimensionality
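One common way to illustrate the curse of dimensionality is distance concentration: as the number of dimensions grows, the nearest and the farthest point from a query end up almost equally far away. The sketch below uses uniformly random points, which is an assumption chosen for simplicity; `relative_contrast` is a hypothetical helper name.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    # The contrast shrinks as the dimension grows.
    print(dim, round(relative_contrast(dim), 3))
```

When the contrast is close to zero, "nearest neighbor" carries little information, which is one reason feature selection and dimensionality reduction matter for large datasets.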
Why is deploying ML models a challenge?
1. Prototyping != Production-ready
2. We need Elasticity
3. Too many nice-to-have features
4. Avoid lack of ownership
Where is the lack of ownership?
Data Scientist != DevOps
Data Scientist: Machine Learning, Data mining, Statistical analysis
DevOps: System administration, (Cloud) Operations, Software engineering
Many options and tools offered by AWS
ELB
Auto Scaling
Elastic Beanstalk
Amazon ML
ECS
EMR
Lambda
EC2
API Gateway
Serverless computing to the rescue!
AWS Lambda
Transparent scalability, elasticity and availability
Developer-friendly maintenance (versioning + aliases)
Event-driven approach & never pay for idle
1 function = 1 model
A/B testing via composition
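The "1 function = 1 model" and "A/B testing via composition" ideas can be sketched as follows. Everything here is hypothetical: the two toy models, the `user_id` and `x` event fields, and the hash-based split are assumptions for illustration. In a real deployment each handler would be its own Lambda function and the router would invoke the chosen one; here they are plain Python functions so the sketch runs locally.

```python
import hashlib

def model_a_handler(event, context):
    """Hypothetical model A: one function wraps exactly one model."""
    return {"model": "A", "score": 0.5 * event["x"]}

def model_b_handler(event, context):
    """Hypothetical model B: a candidate replacement for model A."""
    return {"model": "B", "score": 0.4 * event["x"] + 1.0}

def pick_variant(user_id, share_b=0.5):
    """Deterministically assign a user to variant 'A' or 'B'."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "B" if bucket < share_b else "A"

def ab_handler(event, context):
    """Router function: composes the two model functions for an A/B test."""
    variant = pick_variant(event["user_id"])
    target = model_b_handler if variant == "B" else model_a_handler
    return target(event, context)

print(ab_handler({"user_id": "user-42", "x": 2.0}, None))
```

Hashing the user ID keeps each user on the same variant across requests, so the two models can be compared on stable populations.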
How is “Serverless” possible?
There is always a server somewhere, you just don't have to worry about it :)
AWS Lambda + Amazon API Gateway
RESTful & auth layer
Global CDN and caching (CloudFront)
Staging & versioning & mocking
API Decoupling
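A Lambda function sitting behind API Gateway might look like the sketch below. It assumes API Gateway's Lambda proxy integration, where the request body arrives as a string in `event["body"]` and the function returns a dict with `statusCode`, `headers` and `body`. The `features` payload field and the averaging "model" are placeholders, not the webinar's actual example.

```python
import json

def predict(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features)

def handler(event, context):
    """Lambda entry point: parse the request, run the model, build the response."""
    headers = {"Content-Type": "application/json"}
    try:
        payload = json.loads(event.get("body") or "{}")
        features = [float(v) for v in payload["features"]]
        if not features:
            raise ValueError("empty feature list")
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed input is reported as a client error instead of crashing.
        return {"statusCode": 400, "headers": headers,
                "body": json.dumps({"error": str(exc)})}
    return {"statusCode": 200, "headers": headers,
            "body": json.dumps({"prediction": predict(features)})}

# Local invocation with a fake event, no AWS account needed.
print(handler({"body": json.dumps({"features": [1, 2, 3]})}, None))
```

Because the handler only deals with a dict in and a dict out, the HTTP concerns (auth, caching, staging) stay in API Gateway, which is the decoupling the slide refers to.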
Quick Example
clda.co/webinar-ML-example
clda.co/webinar-ML-lambda
AWS Lambda limitations
No real-time models (only pseudo real-time)
Deployment package management: size limit and OS libraries
Not suitable for model training yet (5 min max execution time)
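One way to live with the execution time limit is to check the remaining time inside the handler and stop cleanly before the deadline. The sketch below relies on the `get_remaining_time_in_millis()` method of the Lambda context object; `FakeContext`, the `items` event field and the doubling "work" are assumptions that exist only so the example can run locally.

```python
import time

class FakeContext:
    """Local stand-in for the Lambda context object (for testing only)."""

    def __init__(self, budget_ms):
        self._deadline = time.monotonic() + budget_ms / 1000.0

    def get_remaining_time_in_millis(self):
        return max(0, int((self._deadline - time.monotonic()) * 1000))

def handler(event, context, safety_margin_ms=50):
    """Process items until done or until the time budget is almost used up."""
    items = event["items"]
    done = []
    for item in items:
        if context.get_remaining_time_in_millis() < safety_margin_ms:
            break  # stop early and report what is left
        done.append(item * 2)  # placeholder for real work
    return {"processed": done, "remaining": items[len(done):]}

print(handler({"items": [1, 2, 3]}, FakeContext(10_000)))
```

The returned `remaining` list could be handed to a follow-up invocation, which works for chunked batch jobs but not for a long, monolithic training run.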
What about Amazon Machine Learning?
Amazon ML
One of the first MLaaS solutions (1 year old)
Great service for classification and regression
Only linear models (linear & logistic regression + SGD)
No support for advanced scenarios yet (collaborative recommendation, multimedia, online learning, etc.)
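To make "logistic regression + SGD" concrete, here is a minimal pure-Python sketch of the technique. It is not Amazon ML's implementation: the learning rate, epoch count, toy dataset and function names are all assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_sgd(samples, epochs=500, lr=0.1, seed=0):
    """Fit weights w and bias b by stochastic gradient descent on the log loss."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    b = 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a new random order each epoch
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss w.r.t. the linear score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_proba(model, x):
    """Probability of the positive class for feature vector x."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy binary classification set: small values are class 0, large values class 1.
samples = [([0.0], 0), ([0.5], 0), ([1.0], 0), ([2.0], 1), ([2.5], 1), ([3.0], 1)]
model = train_logistic_sgd(samples)
print(predict_proba(model, [0.0]), predict_proba(model, [3.0]))
```

The model is linear in its inputs, so anything non-linear has to be encoded in the features beforehand.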
Key Takeaways
Data-driven decisions and user-centered ML will make your product smarter
Maximize ownership by removing obstacles between prototype and production
Eliminate tradeoffs between high scalability and nice-to-have features
Go Serverless and stop worrying about Ops
MLaaS makes your life even simpler, unless you need more control