Dato keynote
TRANSCRIPT
The ML pipeline circa 2013
Data ML
Algorithm
My curve is better than your curve
Write a paper
Retail
Movie Distribution
Music
Advertising
Networking
Search
Taxis
Dating
Legal Advice
Human Resources
Coupons
Campaigning
Real Estate
Wearables
CRM
Disruptive companies differentiated by
INTELLIGENT APPLICATIONS
using
Machine Learning
Dato’s mission is to accelerate the creation of
intelligent applications
by making sophisticated machine learning
as easy as “Hello world!”
Since last year…
• Released 3 products: GraphLab Create, Dato Distributed, Dato Predictive Services
• More than 10,000 downloads
Since last year…
Our customers…
Demo: Intelligent application (Gift for Julia)
Systems Elastic, scalable
People Data scientist
Challenge today: Path from inspiration to production
Production
Prototyping
Inspiration
Scale
Sophisticated ML Production
Sophisticated ML is impractical
• Hard to match algo to app
• Algos trapped in paper
Scaling is costly
• Rewrite algo from scratch
• Expensive infrastructure
Deployment: more costly infrastructure & time
• Build custom services & API
• Model quality deteriorates
Deploy Service
Slow & expensive process
Sophisticated ML is impractical
ML development today
Inspiration for Intelligent Application
Data
Top-down solution would be easiest
Read data
Extract text
Create features
Choose model
Tune parameter
Forced to go bottom-up
Try again
And again
but not possible:
Application is innovative →
no black box solution available
Fine approach if it’s 2013 & I’m obsessed with “my curve is better than your curve” (i.e., yet another solution for the same old problem), or not primarily focused on accelerating the creation of intelligent applications
Inspiration for Intelligent Application
Data
If all applications are to be intelligent in 5 years, ML needs:
Start from relevant, high-level, sophisticated ML building blocks
Don’t waste time on boring stuff, like parameter search or
worry about specialized ML knowledge, like SGD
Quickly write code: combine, blend,
understand, adapt, improve, optimize
ML done differently. Let’s see how…
Demo: Building an intelligent application with GraphLab Create (Restaurant recommender)
High-level ML toolkits: get started with 4 lines of code, then modify, blend, add yours…
Recommender
Image search
Sentiment analysis
Data matching
Auto tagging
Churn predictor
Object detector
Product sentiment
Click prediction
Fraud detection
User segmentation
Data completion
Anomaly detection
Document clustering
Forecasting
Search ranking
Summarization
…
import graphlab as gl
data = gl.SFrame.read_csv('my_data.csv')
model = gl.recommender.create(data, user_id='user', item_id='movie', target='rating')
recommendations = model.recommend(k=5)
Sophisticated machine learning made easy
Create Intelligence Accelerants
High-level ML toolkits
AutoML: tune params, model selection,… → so you can focus on creative parts
Reusable features: transferable feature engineering → accuracy with less data & less effort
Makes ML hard
Understand & scale complex models
Feature engineering
Need for lots of labeled data
Very hard! Usually: Simple models & lots of feature engineering
Krishna’s talk tomorrow @9:10am: auto feature engineering
Next: Transfer learning can provide complex models with less work & less data
Modeling challenge
Data challenge
Representation challenge
Example: Deep learning in computer vision
(or the deep devil is in the deep details)
Image features
• Features = local detectors
  o Combined to make prediction
  o (in reality, features are more low-level)
Face!
Eye
Eye
Nose
Mouth
Many hand-created features exist…
Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH
Slide credit: Honglak Lee
Standard image classification approach
Input → Extract features (SIFT, Spin image, HoG, RIFT, Textons, GLOH) → Use simple classifier, e.g., logistic regression, SVMs → Car?
Many hand-created features exist… but very painful to design
Deep neural networks implicitly learn features
Each layer learns features, at different levels of abstraction
Y. LeCun, M.A. Ranzato
Deep Learning = Learning Hierarchical Representations
It's deep if it has more than one stage of non-linear feature transformation
Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
Color & edge detectors
Geometric detectors
Car-specific detectors
Deep learning has yielded exciting accuracy, e.g., Krizhevsky et al. won 2012 ImageNet competition impressively
Huge gain
Challenges of deep learning
Deep learning workflow
Lots of labeled data
Training set (80%)
Validation set (20%)
Learn deep neural net model
Validate
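The 80%/20% split in this workflow can be sketched in a few lines of plain Python. This is only an illustration: the helper name `split_train_validation`, the fixed seed, and the toy `labeled_data` are assumptions, not part of GraphLab Create.

```python
import random

def split_train_validation(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy labeled data: (example id, label). Hypothetical, for illustration only.
labeled_data = [(i, i % 2) for i in range(100)]
train, validation = split_train_validation(labeled_data)
print(len(train), len(validation))
```

The model is then fit on `train` only, and `validation` is held back to check quality before deployment.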
Many tricks needed to work well…
Different types of layers, connections,… needed for high accuracy
Krizhevsky et al. ‘12
GraphLab Create adds deep features
Deep learning + Transfer learning
Change image classification approach?
Input → Extract features (SIFT, Spin image, HoG, RIFT, Textons, GLOH) → Use simple classifier, e.g., logistic regression, SVMs → Car?
Can we learn features from data, even when
we don’t have data or time?
Transfer learning: Use data from one domain to help learn on another
Lots of data: learn neural net → great accuracy on cat vs. dog
Some data: neural net as feature extractor + simple classifier → great accuracy on 101 categories
Old idea, explored for deep learning by Donahue et al. ’14
What’s learned in a neural net
Neural net trained for Task 1: cat vs. dog
Very specific to Task 1: should be ignored for other tasks
More generic: can be used as feature extractor
Transfer learning in more detail…
Neural net trained for Task 1: cat vs. dog
Very specific to Task 1: should be ignored for other tasks
More generic: can be used as feature extractor
Keep weights fixed!
For Task 2, predicting 101 categories, learn only end part
Use simple classifier e.g., logistic regression, SVMs
Class?
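A minimal sketch of the idea on this slide, in plain Python: the feature extractor's weights stay fixed and only a simple classifier is trained for Task 2. Everything here is an assumption made for illustration: `FROZEN_WEIGHTS` is a hand-set stand-in for the generic layers of a pretrained net, the toy task and the names `extract_features` and `train_classifier` are hypothetical, and none of it is GraphLab Create's implementation.

```python
import math

# Stand-in for the generic layers of a net trained on Task 1 (hand-set, hypothetical).
FROZEN_WEIGHTS = ((1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0))

def extract_features(x):
    """Frozen layer followed by a ReLU; these weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]

def predict_proba(weights, bias, feats):
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, bias, features, labels):
    total = 0.0
    for feats, y in zip(features, labels):
        p = predict_proba(weights, bias, feats)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def train_classifier(features, labels, epochs=500, lr=0.5):
    """Task 2: fit only a logistic regression on top of the frozen features."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(labels)
    for _ in range(epochs):
        # Full-batch gradient step; errors are computed before any update.
        errors = [predict_proba(weights, bias, f) - y for f, y in zip(features, labels)]
        for j in range(len(weights)):
            weights[j] -= lr * sum(e * f[j] for e, f in zip(errors, features)) / n
        bias -= lr * sum(errors) / n
    return weights, bias

# Toy Task 2: label is 1 when |x1| > 0.75, which is not linear in the raw input.
grid = (-1.0, -0.5, 0.5, 1.0)
points = [(x1, x2) for x1 in grid for x2 in grid]
labels = [1 if abs(x1) > 0.75 else 0 for x1, _ in points]

features = [extract_features(p) for p in points]
weights, bias = train_classifier(features, labels)
initial_loss = log_loss([0.0] * 4, 0.0, features, labels)
final_loss = log_loss(weights, bias, features, labels)
print(f"log loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Only `weights` and `bias` of the final classifier change during training, which is why this step needs far less labeled data and compute than learning the whole network.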
Transfer learning with deep features
Some labeled data
Extract features with neural net trained on different task
Training set (80%)
Validation set (20%)
Learn simple model
Validate
Deploy in production
Deep learning tutorial tomorrow, 4pm!
Demo: The power of deep features, a.k.a., transfer learning (Shoes, please)
How general are deep features?
Talk by founder, Jason Gates, tomorrow 9:40am
GraphLab Create includes easy-to-use deep learning on multiple GPUs
Deep learning tutorial tomorrow, 4pm!
graphlab.deeplearning.create(data, target='label')
Deep learning in 1 line of code
You can also open the box and add your own layers
Average Pooling Layer
Convolution Layer
Dropout Layer
Flatten Layer
Full Connection Layer
Max Pooling Layer
Rectified Linear Layer
Sigmoid Layer
SoftMax Layer
SoftPlus Layer
Sum Pooling Layer
Tanh Layer
Digit recognition benchmark
Chart: test error (0.60% to 0.85%) vs. hours (0 to 15)
H2O.ai: 10 machines/80 cores
GraphLab Create: 4 min on 4 GPUs
GraphLab Create for intelligent applications
High-level ML toolkits (4 lines of code gets you started)
deep learning, recommender, product reviews, data matching, sentiment, image search, churn,
click prediction, customer segmentation, fraud detection,…
Auto Feature Engineering (automate, achieve high accuracy)
• deep & reusable features
• data transformation pipelines
• kernels & hashing, encodings
AutoML (automate to focus on creativity)
• parameter search
• model selection
• algorithm selection
• distributed
Tables, graphs, text, images
Scalable viz for TBs of data
Including Matplotlib at scale
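To make the "parameter search" item above concrete, here is a minimal sketch of the simplest form of it: an exhaustive grid search scored on a validation set. It is an illustration only, not Dato's AutoML; the 1-D ridge model, the toy data, and the names `fit_ridge_slope` and `grid_search` are assumptions chosen to keep the example self-contained.

```python
def fit_ridge_slope(xs, ys, penalty):
    """Closed-form 1-D ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mean_squared_error(slope, xs, ys):
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, validation, penalties):
    """Fit one model per candidate penalty and keep the best on validation data."""
    scored = []
    for penalty in penalties:
        slope = fit_ridge_slope(*train, penalty)
        scored.append((mean_squared_error(slope, *validation), penalty))
    best_error, best_penalty = min(scored)
    return best_penalty, best_error

# Toy data (hypothetical): noisy training samples of y = 2x, clean validation samples.
train = ([1.0, 2.0, 3.0, 4.0], [2.5, 4.5, 6.0, 8.5])
validation = ([5.0, 6.0], [10.0, 12.0])
best_penalty, best_error = grid_search(train, validation, [0.0, 0.1, 1.0, 10.0])
print(best_penalty)  # → 1.0
```

The unregularized fit overshoots the true slope on this data, so a moderate penalty wins on the validation set; automating this loop is the kind of routine work the slide suggests taking off the user's hands.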
Anthony Goldbloom Founder & CEO
Debora Donato Sr. Director of Personalization & Principal Data Scientist
Native Advertising – The opportunity of making ads valuable
For the users
For the publishers
Bad advertising does not work for anybody
The data:
• 400k raw html pages containing:
  o text, images, links, and well, everything web pages have
The task:
• predict which pages are organic and which are sponsored advertising
When:
• starts August 1!
The Prize
• Fame!!!
• Knowledge!!!
• $10,000
A lot of effort in Kaggle competitions involves running many experiments…
…can get slow ☹
SFrame ❤ ️ all ML tools SGraph
Sophisticated machine learning made scalable
Data Structures to Create Intelligence
Data frames user movie rating
When you choose a data frame,
have your application in mind
SFrame is optimized for ML
ML has specific data access patterns,
we make them fast, really fast (Columnar transformations,
creating new features, iterations,…)
… Same code
user movie rating
SFrame: Scalable data frame optimized for ML
Never run out of memory
Sharded, compressed, out-of-core, columnar
Arbitrary lambda transformations, joins,… from Python
Talk tomorrow with details: Yucheng @11am
Large data on one machine?
Limited RAM → Must use disk (out-of-core computation)
Opportunity for Out-of-Core ML
Capacity / Throughput
1 TB / 0.5 GB/s
10 TB / 0.1 GB/s
0.1 TB / 1 GB/s
Fast, but significantly limits data size
Opportunity for big data on 1 machine
For sequential reads only! Random access very slow
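A quick back-of-the-envelope check of what these numbers mean for one sequential pass over the data. This sketch assumes the capacity and throughput figures pair up in the order they appear on the slide and that 1 TB = 1000 GB; the function name `full_scan_seconds` is hypothetical.

```python
# (capacity in TB, sequential throughput in GB/s), paired in slide order (assumption)
tiers = [(1.0, 0.5), (10.0, 0.1), (0.1, 1.0)]

GB_PER_TB = 1000  # assumption: decimal units

def full_scan_seconds(capacity_tb, throughput_gb_per_s):
    """Time for one sequential pass over a completely full storage tier."""
    return capacity_tb * GB_PER_TB / throughput_gb_per_s

for capacity_tb, throughput in tiers:
    seconds = full_scan_seconds(capacity_tb, throughput)
    print(f"{capacity_tb} TB at {throughput} GB/s: {seconds / 60:.1f} min per pass")
```

Under these assumptions a full pass over 1 TB takes about half an hour, which is workable for iterative ML only if the algorithm reads sequentially rather than seeking at random.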
Out-of-core ML opportunity is huge
Usual design → Lots of random access → Slow
Design to maximize sequential access for
ML algo patterns
GraphChi: early example
SFrame: data frame for ML
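As a rough illustration of designing for sequential access, the sketch below computes a column statistic in a single front-to-back pass over a file, keeping only one small chunk in memory at a time. It uses only the Python standard library and is not how SFrame is implemented; the function name `streaming_mean`, the chunk size, and the tiny demo file are assumptions.

```python
import csv
import os
import tempfile

def streaming_mean(path, column, chunk_rows=1000):
    """Compute a column mean in one sequential pass, holding one chunk in memory."""
    total, count = 0.0, 0
    with open(path, newline="") as handle:
        chunk = []
        for row in csv.DictReader(handle):
            chunk.append(float(row[column]))
            if len(chunk) == chunk_rows:      # fold the full chunk into the running sums
                total += sum(chunk)
                count += len(chunk)
                chunk = []
        total += sum(chunk)                   # fold in the final partial chunk
        count += len(chunk)
    return total / count

# Tiny demo file standing in for data that would not fit in RAM.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["user", "movie", "rating"])
    for i in range(2500):
        writer.writerow([i, i % 7, i % 5 + 1])
    demo_path = tmp.name

mean_rating = streaming_mean(demo_path, "rating")
os.remove(demo_path)
print(mean_rating)  # → 3.0
```

Memory use depends on `chunk_rows`, not on the file size, and the disk is only ever read in order, which is the access pattern the slide argues for.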
Demo: 10TBs of data on one machine!
SFrame ❤ ️ all ML
scikit-learn is awesome, but...
Airline Delay Dataset, SGDLinearClassifier
Chart: runtime (s), 0 to 4000, vs. millions of rows, 0 to 400
scikit-learn + Numpy: out of RAM (Numpy in memory only)
Demo: 10TBs of data on one machine redux
NumPy automatically backed by SFrames → Scale many Python packages (scikit-learn, scipy,…)
import graphlab.numpy
Scalable numpy activation successful
Airline Delay Dataset, SGDLinearClassifier
Chart: runtime (s), 0 to 4000, vs. millions of rows, 0 to 400
scikit-learn + Numpy: out of RAM
GraphLab Create + scikit-learn + Numpy
Caveats apply
- Scales most memory-bound sklearn algorithms
- Sequential access highly preferred for performance
ML is not just about tables
ML pipelines combine multiple data types
Raw Wikipedia (XML)
Hyperlinks, PageRank, Top 20 Pages
Table: Title, PR, Text
Title, Body
Topic Model (LDA), Word Topics
Word, Topic
Term-Doc Graph
SGraph
Graph processing & analytics
Out-of-core & scalable
Neighborhoods, paths, graph algos, community detection,
label propagation, ML on graphs, viz, …
Backed by SFrame
Performance of SGraph
Connected components in Twitter graph
Twitter: 41 million nodes, 1.4 billion edges
Chart: runtime in seconds (axis 0 to 2250) for GraphLab Create, GraphX, Giraph, Spark, and SGraph
Bar labels: 55, 70 sec, 251 sec, 200 sec, 2,128 sec
Legend: 16 machines, 1 machine
Source(s): Gonzalez et al. (OSDI 2014)
PageRank on Common Crawl Graph
3.5 billion nodes and 128 billion edges
Chart: minutes per iteration (axis 0 to 10), 1 machine
16 CPUs, 1 SSD
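For readers unfamiliar with what one "iteration" means in this benchmark, here is a minimal in-memory sketch of PageRank by power iteration. It is a textbook version on a three-node toy graph, not SGraph's out-of-core implementation; the damping factor of 0.85, the iteration count, and the `graph` dict are assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration over an adjacency dict {node: [targets]}.

    Assumes every node appears as a key, even if it has no outgoing links.
    """
    nodes = list(out_links)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Each iteration touches every edge once: the unit of work timed in the chart.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node, targets in out_links.items():
            if not targets:                    # dangling node: spread its rank evenly
                for other in nodes:
                    new_rank[other] += damping * rank[node] / len(nodes)
            else:
                for target in targets:
                    new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # hypothetical toy graph
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Because each iteration is a single sweep over the edge list, it maps naturally onto sequential disk reads when the graph does not fit in memory.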
We ❤ ️ open source
SFrame & SGraph
Optimized out-of-core computation for ML
High Performance
1 machine can handle:
• TBs of data
• 100s of billions of edges
Optimized for ML
• Columnar transformation
• Create features
• Iterators
• Filter, join, group-by, aggregate
• User-defined functions
• Easily extended through SDK
Tables, graphs, text, images
Open-source ❤ ️
BSD license
(August)
Distributed machine learning
Your big data infrastructure
(cloud, hadoop, spark,..)
Sophisticated machine learning made distributed
Create Intelligence on Huge Data
PageRank on Common Crawl Graph
3.5 billion nodes and 128 billion edges
Chart: minutes per iteration (axis 0 to 10), 1 machine vs. 16 machines
1 machine: 16 CPUs
16 machines: 256 CPUs
45 secs/iteration, 3B edges/sec
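The two headline figures on this slide are consistent with each other, as a one-line calculation shows. The only inputs are the edge count and the per-iteration time quoted on the slide; the variable names are illustrative.

```python
edges = 128e9                  # edges in the Common Crawl graph (from the slide)
seconds_per_iteration = 45     # per-iteration time (from the slide)

edges_per_second = edges / seconds_per_iteration
print(f"{edges_per_second / 1e9:.2f} billion edges/sec")  # → 2.84 billion edges/sec
```

That rounds to the roughly 3B edges/sec quoted on the slide.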
Criteo Terabyte Click Prediction
4.4 billion rows, 13 features, ½ TB of data
Chart: runtime vs. #machines (0 to 16)
Labeled runtimes: 3630s, 225s
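A short calculation of the scaling these two labels would imply. The transcript does not say which point each label belongs to; this sketch assumes 3630s is the single-machine runtime and 225s is the 16-machine runtime.

```python
single_machine_s = 3630    # assumed: runtime on 1 machine
sixteen_machine_s = 225    # assumed: runtime on 16 machines
machines = 16

speedup = single_machine_s / sixteen_machine_s
efficiency = speedup / machines
print(f"speedup {speedup:.1f}x, parallel efficiency {efficiency:.2f}")
```

Under that assumption the speedup is about 16.1x on 16 machines, that is, essentially linear scaling.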
Same code, distributed ML
Single machine ML code:
import graphlab as gl
data = gl.SFrame.read_csv('s3://…')
model = gl.classifier.create(data, target='click')
c = gl.deploy.ec2_cluster.load('s3://…')
gl.set_distributed_execution_environment(c)
c = gl.deploy.hadoop_cluster.load('hdfs://…')
c = gl.deploy.spark_cluster.load('hdfs://…')
…
Dato machine learning platform
Inspiration
Scale
Sophisticated ML
Optimized for ML performance, for any data size, on any infrastructure
AutoML
GraphLab Create
ML Toolkits
Canvas
Reusable Features
Job Mgmt
Distributed Engine
Distributed ML Dato Distributed
SGraph
Create Engine
SFrame GraphLab Create
Machine Learning In Production
Deployment
Easily serve live predictions
Deployment Engineers
Deploying ML models
Data Scientists
Exciting new deep learning model.
How long is this going to take?!
REST API! I will be done today.
It’s accurate!
Dato Predictive Services
Machine Learning in Production
Deployment: Easily serve live predictions
Evaluation: Measuring quality of deployed models
Monitoring: Tracking model operations
Management: Choosing between deployed models
Talk tomorrow with details: Alice & Rajat @1:45pm
Dato machine learning platform
Inspiration
Scale
Production Deploy Service
Optimized for ML performance, for any data size, on any infrastructure
AutoML
GraphLab Create
ML Toolkits
Canvas
Reusable Features REST Client Model Mgmt
Dato Predictive Services
Robust, Elastic
Direct
Job Mgmt
Distributed Engine
Distributed ML Dato Distributed
SGraph
Create Engine
SFrame GraphLab Create
Sophisticated ML
Creation of intelligent applications faster & cheaper
My curve is better than your curve
INTELLIGENT APPLICATIONS
are disrupting markets
Phase transition of machine learning
Accelerate this process
> pip install graphlab-create
@guestrin