TRANSCRIPT
Towards an Optimizer for MLbase
Ameet Talwalkar
Collaborators: Evan Sparks, Michael Franklin, Michael I. Jordan, Tim Kraska
UC Berkeley
Problem: Scalable implementations are difficult for ML Developers…
[Architecture diagram: a User submits a Declarative ML Task, and an ML Developer contributes ML Contract + Code, to a Master Server containing a Parser, Optimizer, Executor/Monitoring, ML Library, Meta-Data, and Statistics; the system returns a result (e.g., fn-model & summary). The Master drives Slaves, each running a DMX Runtime; LLP/PLP.]
[Excerpt from a MATLAB® product datasheet:]
The Language of Technical Computing
MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java™.
You can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. More than a million engineers and scientists in industry and academia use MATLAB, the language of technical computing.
CHALLENGE: Can we simplify distributed ML development?
Problem: ML is difficult for End Users…
Too many ways to preprocess…
Too many algorithms…
Too many knobs…
Difficult to debug…
Doesn’t scale…
CHALLENGE: Can we automate ML pipeline construction?
MLbase
[Stack diagram: MLOpt and MLI on top of MLlib, on top of Apache Spark]
Spark: Cluster computing system designed for iterative computation
MLlib: Spark’s core ML library
MLI: API to simplify ML development
MLOpt: Declarative layer that aims to automate ML pipeline construction via search over feature extractors and models
MLbase aims to simplify development and deployment of ML pipelines
MLOpt and MLI are experimental testbeds
[Outline: Vision / MLlib and MLI / MLOpt]
MLlib
+ Scalable and fast
+ Simple development environment
+ Part of Spark’s robust ecosystem
[Ecosystem diagram: Apache Spark, with SparkSQL, Spark Streaming, MLlib (machine learning), and GraphX (graph) on top]
Active Development
Initial Release
• Developed by MLbase team in AMPLab (11 contributors)
• Scala, Java
• Shipped with Spark v0.8 (Sep 2013)
11 months later…
• 55+ contributors from various organizations
• Scala, Java, Python
• Improved documentation / code examples, API stability
• Latest release part of Spark v1.0 (May 2014)
Algorithms in v0.8
• classification: logistic regression, linear support vector machines (SVM)
• regression: linear regression
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD)
Algorithms in v1.0
• classification: logistic regression, linear support vector machines (SVM), naive Bayes, decision trees
• regression: linear regression, regression trees
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
• dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
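As a single-machine illustration of one algorithm from the list above, here is a minimal k-means sketch (Lloyd's algorithm). This is not MLlib's implementation, which partitions the data across a cluster; it only shows the core iteration the distributed version parallelizes.

```python
# Minimal single-machine k-means sketch (Lloyd's algorithm).
# MLlib's distributed version runs the same assign/update iteration
# with the data partitioned across workers.
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random initial centers
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers
```

On two well-separated groups of points, the centers converge to the group means.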
MLlib, MLI and Roadmap
• MLI: Shield ML Developers from low-level details
  • Provide familiar mathematical operators in distributed setting (tables, matrices, optimization primitives)
  • Standard APIs defining ML algorithms and feature extractors
• Many of these ideas are (or soon will be) in MLlib
• Next release of Spark and MLlib being tested now
  • statistical toolbox, Python decision tree API, online logistic regression, …
• Longer term
  • Scalable implementations of standard ML algorithms and underlying optimization primitives
  • Support for ML pipeline development (related to MLOpt)
Feedback and Contributions Encouraged!
[Outline: Vision / MLlib and MLI / MLOpt]
Grand Vision
✦ User declaratively specifies a task
✦ Search through MLlib/MLI to find the best model/pipeline
[Analogy: SQL is to Result as ‘MQL’ is to Model]
A Standard ML Pipeline
Data → Feature Extraction → Model Training → Final Model
✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
Training A Model
✦ For each point in dataset
  ✦ compute gradient
  ✦ update model
  ✦ repeat until converged
✦ Requires multiple passes
✦ Common access pattern
  ✦ Naive Bayes, Trees, etc.
✦ Minutes to train an SVM on 200GB of data on a 16-node cluster
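The access pattern described above (per-point gradient, model update, repeated passes until convergence) can be sketched with logistic regression and SGD. This is a toy single-machine version, not the cluster implementation the talk benchmarks; the distributed version makes the same multiple passes over the data.

```python
# Sketch of the training loop above: for each point, compute a
# gradient and update the model; repeat passes until converged.
import math

def sgd_logistic(data, lr=0.1, passes=100):
    """data: list of (features, label) pairs with label in {0, 1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(passes):                      # multiple passes over the dataset
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))       # predicted probability
            for i in range(dim):                 # per-point gradient update
                w[i] -= lr * (p - y) * x[i]
    return w
```

On a tiny separable dataset (first feature acting as a bias term) the learned weights separate the two classes.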
The Tricky Part
✦ Algorithms
  ✦ Logistic Regression, SVM, Tree-based, etc.
✦ Algorithm hyper-parameters
  ✦ Learning Rate, Regularization, etc.
✦ Featurization
  ✦ Text: n-grams, TF-IDF
  ✦ Images: Gabor filters, random convolutions
  ✦ Random projection? Scaling?
A Standard ML Pipeline
Data → Feature Extraction → Model Training → Final Model
✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
✦ Start with one aspect of the pipeline: model selection
Automated Model Selection
One Approach
[Diagram: grid over Learning Rate × Regularization, with the “Best answer” marked]
✦ Try it all!
  ✦ Search over all hyperparameters, algorithms, features, etc.
✦ Drawbacks
  ✦ Expensive to compute models
  ✦ Hyperparameter space is large
✦ Some version of this still often done in practice!
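The "try it all" approach above can be sketched as plain grid search over the two hyperparameters in the diagram. `train_and_score` here is a stand-in for any trainer that returns validation error; the toy scoring function below exists only to make the sketch self-contained.

```python
# Sketch of exhaustive grid search: train one model per grid point
# over learning rate x regularization and keep the lowest error.
import itertools

def grid_search(train_and_score, learning_rates, regs):
    best = None
    for lr, reg in itertools.product(learning_rates, regs):
        err = train_and_score(lr, reg)          # one full training run per point
        if best is None or err < best[0]:
            best = (err, lr, reg)
    return best

# Toy validation-error surface with a known minimum at lr=0.1, reg=0.01.
toy = lambda lr, reg: (lr - 0.1) ** 2 + (reg - 0.01) ** 2
```

The cost is one full training run per grid point, which is exactly the drawback the slide names: the number of points grows exponentially with the number of hyperparameters.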
A Better Approach
✦ Better resource utilization
  ✦ through batching
✦ Algorithmic speedups
  ✦ via early stopping
✦ Improved search
  ✦ e.g., via randomization
A Tale Of 3 Optimizations
[Better Resource Utilization / Algorithmic Speedups / Improved Search]
Better Resource Utilization
✦ Modern memory is slower than processors
  ✦ Can read: 0.6 billion doubles/sec/core (4.8 GB/s)
  ✦ Can compute: 15 billion flops/sec/core
✦ We can do 25 flops per double read
What Does This Mean For Modeling?
✦ Typical model update requires 2-4 flops/double
  ✦ recall: 25 flops per double read
✦ Can do 7-10 model updates per double we read
  ✦ Assuming that models fit in cache
✦ Train multiple models simultaneously
[Illustration: Model 1 and Model 2 both trained from a single scan of the table below]
A | B | C
1 | a | Dog
1 | b | Cat
2 | c | Cat
2 | d | Cat
3 | e | Dog
3 | f | Horse
4 | g | Doge
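The batching idea above, amortizing each (slow) memory read across several (cheap) model updates, can be sketched as a single data scan that updates many models at once. The models here are logistic-regression weight vectors at different learning rates; this is an illustration of the access pattern, not MLbase's implementation.

```python
# Sketch of batching: one pass over the data updates several models,
# so each value read from memory feeds multiple floating-point ops.
import math

def batched_sgd_pass(data, models, learning_rates):
    for x, y in data:                                # one read of each point...
        for w, lr in zip(models, learning_rates):    # ...updates every model
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(len(w)):
                w[i] -= lr * (p - y) * x[i]
    return models
```

With a model update costing only 2-4 flops per double, scanning once for k models keeps the core busy instead of waiting on memory.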
What Do We See In Spark?
✦ 2x to 5x increase in models trained/sec with batching
✦ Overhead from virtualization, network, etc.
✦ These numbers are with vector-matrix multiplies
✦ Can do better when rewriting in terms of matrix-matrix multiplies
A Tale Of 3 Optimizations
[Better Resource Utilization / Algorithmic Speedups / Improved Search]
Algorithmic Speedups
[Diagram: grid over Learning Rate × Regularization, with the “Best answer” marked]
✦ Each point in hyper-parameter space represents a trained model
✦ Sometimes we see early on that a model is no good
✦ So we stop early: give up on models that are not progressing
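The early-stopping idea above can be sketched as a search that trains candidates in rounds and, after each round, drops the ones whose validation error lags the current best by more than a margin. `step` and `score` are stand-ins for one training epoch and a validation-error evaluation; the margin-based rule is one simple choice, not necessarily the rule MLbase uses.

```python
# Sketch of early stopping: interleave training of candidate models
# in rounds, and give up on models that are not progressing.
def search_with_early_stopping(candidates, step, score, rounds=10, slack=0.1):
    alive = list(candidates)
    for _ in range(rounds):
        for c in alive:
            step(c)                                  # advance each survivor
        errs = {id(c): score(c) for c in alive}
        best = min(errs.values())
        # Drop candidates clearly behind the current best.
        alive = [c for c in alive if errs[id(c)] <= best + slack]
    return min(alive, key=score)
```

Compute that would have been spent finishing hopeless models is reclaimed for the survivors, which is where the algorithmic speedup comes from.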
A Tale Of 3 Optimizations
[Better Resource Utilization / Algorithmic Speedups / Improved Search]
What Search Method?
✦ Various derivative-free optimization techniques
  ✦ Simple ones (Grid, Random)
  ✦ Classic derivative-free (Nelder-Mead, Powell’s method)
  ✦ Bayesian (SMAC, TPE, Spearmint)
✦ What should we do?
  ✦ Tried on 5 datasets, optimized over 4 hyperparameters!
[Figure: Comparison of Search Methods Across Learning Problems. One panel per method (GRID, NELDER_MEAD, POWELL, RANDOM, SMAC, SPEARMINT, TPE) and dataset (australian, breast, diabetes, fourclass, splice); y-axis: validation error (0.0–0.5); x-axis: maximum calls (16, 81, 256, 625)]
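Among the simple techniques compared above, random search is easy to sketch: instead of a fixed grid, sample each hyperparameter independently, here log-uniformly, which is a common choice for learning rates and regularization strengths (the ranges below are illustrative, not from the talk).

```python
# Sketch of random search over the two hyperparameters from the
# earlier diagrams: sample each one log-uniformly at random.
import random

def random_search(train_and_score, budget, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        lr = 10 ** rng.uniform(-3, 0)     # log-uniform learning rate in [1e-3, 1]
        reg = 10 ** rng.uniform(-4, 0)    # log-uniform regularization in [1e-4, 1]
        err = train_and_score(lr, reg)
        if best is None or err < best[0]:
            best = (err, lr, reg)
    return best
```

Unlike a grid, the budget is decoupled from the dimensionality: adding a hyperparameter does not multiply the number of trials.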
Putting It All Together
✦ First version of MLbase optimizer
✦ 30GB dense images (240K x 16K)
✦ 2 model families, 5 hyperparams
✦ Baseline: grid search
✦ Our method: combination of
  ✦ Batching
  ✦ Early stopping
  ✦ Random or TPE
[Figure: Model Convergence Over Time. Best validation error seen so far vs. time elapsed (0–800 minutes) for Grid (unoptimized), Random (optimized), and TPE (optimized)]
20x speedup compared to grid search: 15 minutes vs 5 hours!
Does It Scale?
✦ 1.5TB dataset (1.2M x 160K)
✦ 128 nodes, thousands of passes over data
✦ Tried 32 models in 15 hours
  ✦ Good answer after 11 hours
[Figure: Convergence of Model Accuracy on 1.5TB Dataset. Best validation error seen so far vs. time elapsed (hours)]
Future Work
Data → Feature Extraction → Model Training → Final Model
✦ Automated Model Selection
✦ Automated ML Pipeline Construction
A Real Pipeline for Image Classification
Inspired by Coates & Ng, 2012
[Pipeline diagram components: Data, Image Parser, Normalizer, Convolver, Pooler (sqrt, mean), Symmetric Rectifier (ident, abs / ident, mean), Global Pooling, Zipper; Patch Extractor, Patch Whitener, KMeans Clusterer; Feature Extractor, Label Extractor, Linear Solver, Linear Mapper → Model; Test Data, Label Extractor, Feature Extractor, Error Computer → Test Error]
Slide courtesy of Evan Sparks
[Same pipeline diagram, with components annotated by tuning burden: No Hyperparameters / A few Hyperparameters / Lotsa Hyperparameters]
Slide courtesy of Evan Sparks
Other Future Work
✦ Ensembling
✦ Leverage sampling
✦ Better parallelism for smaller datasets
✦ Multiple hypothesis testing issues
MLOpt: Declarative layer that aims to automate ML pipeline construction
MLlib: Spark’s core ML library
MLI: API to simplify ML development
Spark: Cluster computing system designed for iterative computation
THANKS! QUESTIONS?
www.mlbase.org