TRANSCRIPT
Towards an Optimizer for MLbase
Ameet Talwalkar
Collaborators: Evan Sparks, Michael Franklin, Michael I. Jordan, Tim Kraska
UC Berkeley
Problem: Scalable implementations are difficult for ML Developers…
[Architecture diagram: a User submits a Declarative ML Task, and an ML Developer contributes ML Contract + Code, to a Master Server containing a Parser, Optimizer, Executor/Monitoring, ML Library, Meta-Data, and Statistics; the system returns a result (e.g., fn-model & summary). The Master drives Slaves, each running a DMX Runtime; LLP/PLP.]
[Excerpt from a MATLAB® product datasheet:]
The Language of Technical Computing
MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java™.
You can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. More than a million engineers and scientists in industry and academia use MATLAB, the language of technical computing.
CHALLENGE: Can we simplify distributed ML development?
Problem: ML is difficult for End Users…
Too many ways to preprocess…
Too many algorithms…
Too many knobs…
Difficult to debug…
Doesn’t scale…
CHALLENGE: Can we automate ML pipeline construction?
MLbase
[Stack diagram: MLOpt and MLI on top of MLlib, on top of Apache Spark]
Spark: Cluster computing system designed for iterative computation
MLlib: Spark’s core ML library
MLI: API to simplify ML development
MLOpt: Declarative layer that aims to automate ML pipeline construction via search over feature extractors and models
MLbase aims to simplify development and deployment of ML pipelines
MLOpt and MLI are experimental testbeds
[Outline: Vision / MLlib and MLI / MLOpt]
MLlib
+ Scalable and fast
+ Simple development environment
+ Part of Spark’s robust ecosystem
[Ecosystem diagram: Apache Spark, with SparkSQL, Spark Streaming, MLlib (machine learning), and GraphX (graph) on top]
Active Development
Initial Release
• Developed by MLbase team in AMPLab (11 contributors)
• Scala, Java
• Shipped with Spark v0.8 (Sep 2013)
11 months later…
• 55+ contributors from various organizations
• Scala, Java, Python
• Improved documentation / code examples, API stability
• Latest release part of Spark v1.0 (May 2014)
Algorithms in v0.8
• classification: logistic regression, linear support vector machines (SVM)
• regression: linear regression
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD)
Algorithms in v1.0
• classification: logistic regression, linear support vector machines (SVM), naive Bayes, decision trees
• regression: linear regression, regression trees
• collaborative filtering: alternating least squares (ALS)
• clustering: k-means
• optimization: stochastic gradient descent (SGD), limited-memory BFGS (L-BFGS)
• dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
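As a single-machine illustration of one algorithm from the list above, here is a minimal k-means sketch (Lloyd's algorithm). This is not MLlib's implementation, which partitions the data across a cluster; it only shows the core iteration the distributed version parallelizes.

```python
# Minimal single-machine k-means sketch (Lloyd's algorithm).
# MLlib's distributed version runs the same assign/update iteration
# with the data partitioned across workers.
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random initial centers
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers
```

On two well-separated groups of points, the centers converge to the group means.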
MLlib, MLI and Roadmap
• MLI: Shield ML Developers from low-level details
  • Provide familiar mathematical operators in distributed setting (tables, matrices, optimization primitives)
  • Standard APIs defining ML algorithms and feature extractors
• Many of these ideas are (or soon will be) in MLlib
• Next release of Spark and MLlib being tested now
  • statistical toolbox, Python decision tree API, online logistic regression, …
• Longer term
  • Scalable implementations of standard ML algorithms and underlying optimization primitives
  • Support for ML pipeline development (related to MLOpt)
Feedback and Contributions Encouraged!
[Outline: Vision / MLlib and MLI / MLOpt]
Grand Vision
✦ User declaratively specifies a task
✦ Search through MLlib/MLI to find the best model/pipeline
[Analogy: SQL is to Result as ‘MQL’ is to Model]
A Standard ML Pipeline
Data → Feature Extraction → Model Training → Final Model
✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
Training A Model
✦ For each point in dataset
  ✦ compute gradient
  ✦ update model
  ✦ repeat until converged
✦ Requires multiple passes
✦ Common access pattern
  ✦ Naive Bayes, Trees, etc.
✦ Minutes to train an SVM on 200GB of data on a 16-node cluster
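The access pattern described above (per-point gradient, model update, repeated passes until convergence) can be sketched with logistic regression and SGD. This is a toy single-machine version, not the cluster implementation the talk benchmarks; the distributed version makes the same multiple passes over the data.

```python
# Sketch of the training loop above: for each point, compute a
# gradient and update the model; repeat passes until converged.
import math

def sgd_logistic(data, lr=0.1, passes=100):
    """data: list of (features, label) pairs with label in {0, 1}."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(passes):                      # multiple passes over the dataset
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))       # predicted probability
            for i in range(dim):                 # per-point gradient update
                w[i] -= lr * (p - y) * x[i]
    return w
```

On a tiny separable dataset (first feature acting as a bias term) the learned weights separate the two classes.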
The Tricky Part
✦ Algorithms
  ✦ Logistic Regression, SVM, Tree-based, etc.
✦ Algorithm hyper-parameters
  ✦ Learning Rate, Regularization, etc.
✦ Featurization
  ✦ Text: n-grams, TF-IDF
  ✦ Images: Gabor filters, random convolutions
  ✦ Random projection? Scaling?
A Standard ML Pipeline
Data → Feature Extraction → Model Training → Final Model
✦ In practice, model building is an iterative process of continuous refinement
✦ Our grand vision is to automate the construction of these pipelines
✦ Start with one aspect of the pipeline: model selection
Automated Model Selection
One Approach
[Diagram: grid over Learning Rate × Regularization, with the “Best answer” marked]
✦ Try it all!
  ✦ Search over all hyperparameters, algorithms, features, etc.
✦ Drawbacks
  ✦ Expensive to compute models
  ✦ Hyperparameter space is large
✦ Some version of this still often done in practice!
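The "try it all" approach above can be sketched as plain grid search over the two hyperparameters in the diagram. `train_and_score` here is a stand-in for any trainer that returns validation error; the toy scoring function below exists only to make the sketch self-contained.

```python
# Sketch of exhaustive grid search: train one model per grid point
# over learning rate x regularization and keep the lowest error.
import itertools

def grid_search(train_and_score, learning_rates, regs):
    best = None
    for lr, reg in itertools.product(learning_rates, regs):
        err = train_and_score(lr, reg)          # one full training run per point
        if best is None or err < best[0]:
            best = (err, lr, reg)
    return best

# Toy validation-error surface with a known minimum at lr=0.1, reg=0.01.
toy = lambda lr, reg: (lr - 0.1) ** 2 + (reg - 0.01) ** 2
```

The cost is one full training run per grid point, which is exactly the drawback the slide names: the number of points grows exponentially with the number of hyperparameters.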
A Better Approach
✦ Better resource utilization
  ✦ through batching
✦ Algorithmic speedups
  ✦ via early stopping
✦ Improved search
  ✦ e.g., via randomization
A Tale Of 3 Optimizations
[Better Resource Utilization / Algorithmic Speedups / Improved Search]
Better Resource Utilization
✦ Modern memory is slower than processors
  ✦ Can read: 0.6 billion doubles/sec/core (4.8 GB/s)
  ✦ Can compute: 15 billion flops/sec/core
✦ We can do 25 flops per double read
What Does This Mean For Modeling?
✦ Typical model update requires 2-4 flops/double
  ✦ recall: 25 flops per double read
✦ Can do 7-10 model updates per double we read
  ✦ Assuming that models fit in cache
✦ Train multiple models simultaneously
[Illustration: Model 1 and Model 2 both trained from a single scan of the table below]
A | B | C
1 | a | Dog
1 | b | Cat
2 | c | Cat
2 | d | Cat
3 | e | Dog
3 | f | Horse
4 | g | Doge
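The batching idea above, amortizing each (slow) memory read across several (cheap) model updates, can be sketched as a single data scan that updates many models at once. The models here are logistic-regression weight vectors at different learning rates; this is an illustration of the access pattern, not MLbase's implementation.

```python
# Sketch of batching: one pass over the data updates several models,
# so each value read from memory feeds multiple floating-point ops.
import math

def batched_sgd_pass(data, models, learning_rates):
    for x, y in data:                                # one read of each point...
        for w, lr in zip(models, learning_rates):    # ...updates every model
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(len(w)):
                w[i] -= lr * (p - y) * x[i]
    return models
```

With a model update costing only 2-4 flops per double, scanning once for k models keeps the core busy instead of waiting on memory.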
What Do We See In Spark?
✦ 2x to 5x increase in models trained/sec with batching
✦ Overhead from virtualization, network, etc.
✦ These numbers are with vector-matrix multiplies
✦ Can do better when rewriting in terms of matrix-matrix multiplies
A Tale Of 3 Optimizations
[Better Resource Utilization / Algorithmic Speedups / Improved Search]
Algorithmic Speedups
[Diagram: grid over Learning Rate × Regularization, with the “Best answer” marked]
✦ Each point in hyper-parameter space represents a trained model
✦ Sometimes we see early on that a model is no good
✦ So we stop early: give up on models that are not progressing
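The early-stopping idea above can be sketched as a search that trains candidates in rounds and, after each round, drops the ones whose validation error lags the current best by more than a margin. `step` and `score` are stand-ins for one training epoch and a validation-error evaluation; the margin-based rule is one simple choice, not necessarily the rule MLbase uses.

```python
# Sketch of early stopping: interleave training of candidate models
# in rounds, and give up on models that are not progressing.
def search_with_early_stopping(candidates, step, score, rounds=10, slack=0.1):
    alive = list(candidates)
    for _ in range(rounds):
        for c in alive:
            step(c)                                  # advance each survivor
        errs = {id(c): score(c) for c in alive}
        best = min(errs.values())
        # Drop candidates clearly behind the current best.
        alive = [c for c in alive if errs[id(c)] <= best + slack]
    return min(alive, key=score)
```

Compute that would have been spent finishing hopeless models is reclaimed for the survivors, which is where the algorithmic speedup comes from.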
A Tale Of 3 Optimizations
[Better Resource Utilization / Algorithmic Speedups / Improved Search]
What Search Method?
✦ Various derivative-free optimization techniques
  ✦ Simple ones (Grid, Random)
  ✦ Classic derivative-free (Nelder-Mead, Powell’s method)
  ✦ Bayesian (SMAC, TPE, Spearmint)
✦ What should we do?
  ✦ Tried on 5 datasets, optimized over 4 hyperparameters!
[Figure: Comparison of Search Methods Across Learning Problems. One panel per method (GRID, NELDER_MEAD, POWELL, RANDOM, SMAC, SPEARMINT, TPE) and dataset (australian, breast, diabetes, fourclass, splice); y-axis: validation error (0.0–0.5); x-axis: maximum calls (16, 81, 256, 625)]
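Among the simple techniques compared above, random search is easy to sketch: instead of a fixed grid, sample each hyperparameter independently, here log-uniformly, which is a common choice for learning rates and regularization strengths (the ranges below are illustrative, not from the talk).

```python
# Sketch of random search over the two hyperparameters from the
# earlier diagrams: sample each one log-uniformly at random.
import random

def random_search(train_and_score, budget, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        lr = 10 ** rng.uniform(-3, 0)     # log-uniform learning rate in [1e-3, 1]
        reg = 10 ** rng.uniform(-4, 0)    # log-uniform regularization in [1e-4, 1]
        err = train_and_score(lr, reg)
        if best is None or err < best[0]:
            best = (err, lr, reg)
    return best
```

Unlike a grid, the budget is decoupled from the dimensionality: adding a hyperparameter does not multiply the number of trials.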
Putting It All Together
✦ First version of MLbase optimizer
✦ 30GB dense images (240K x 16K)
✦ 2 model families, 5 hyperparams
✦ Baseline: grid search
✦ Our method: combination of
  ✦ Batching
  ✦ Early stopping
  ✦ Random or TPE
[Figure: Model Convergence Over Time. Best validation error seen so far vs. time elapsed (0–800 minutes) for Grid (unoptimized), Random (optimized), and TPE (optimized)]
20x speedup compared to grid search: 15 minutes vs 5 hours!
Does It Scale?
✦ 1.5TB dataset (1.2M x 160K)
✦ 128 nodes, thousands of passes over data
✦ Tried 32 models in 15 hours
  ✦ Good answer after 11 hours
[Figure: Convergence of Model Accuracy on 1.5TB Dataset. Best validation error seen so far vs. time elapsed (hours)]
Future Work
Data → Feature Extraction → Model Training → Final Model
✦ Automated Model Selection
✦ Automated ML Pipeline Construction
A Real Pipeline for Image Classification
Inspired by Coates & Ng, 2012
[Pipeline diagram components: Data, Image Parser, Normalizer, Convolver, Pooler (sqrt, mean), Symmetric Rectifier (ident, abs / ident, mean), Global Pooling, Zipper; Patch Extractor, Patch Whitener, KMeans Clusterer; Feature Extractor, Label Extractor, Linear Solver, Linear Mapper → Model; Test Data, Label Extractor, Feature Extractor, Error Computer → Test Error]
Slide courtesy of Evan Sparks
[Same pipeline diagram, with components annotated by tuning burden: No Hyperparameters / A few Hyperparameters / Lotsa Hyperparameters]
Slide courtesy of Evan Sparks
Other Future Work
✦ Ensembling
✦ Leverage sampling
✦ Better parallelism for smaller datasets
✦ Multiple hypothesis testing issues
MLOpt: Declarative layer that aims to automate ML pipeline construction
MLlib: Spark’s core ML library
MLI: API to simplify ML development
Spark: Cluster computing system designed for iterative computation
THANKS! QUESTIONS?
www.mlbase.org