Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
TRANSCRIPT
![Page 1: Scott Clark, CEO, SigOpt, at MLconf Seattle 2017](https://reader031.vdocuments.us/reader031/viewer/2022020214/5aacef0f7f8b9a003b8b457b/html5/thumbnails/1.jpg)
BAYESIAN GLOBAL OPTIMIZATION
Using Optimal Learning to Tune Deep Learning / AI Models
Scott Clark
OUTLINE
1. Why is Tuning Models Hard?
2. Comparison of Tuning Methods
3. Bayesian Global Optimization
4. Deep Learning Examples
Deep Learning / AI is extremely powerful
Tuning these systems is extremely non-intuitive
https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3
What is the most important unresolved problem in machine learning?
“...we still don't really know why some configurations of deep neural networks work in some cases and not others, let alone having a more or less automatic approach to determining the architectures and the hyperparameters.”
Xavier Amatriain, VP Engineering at Quora (former Director of Research at Netflix)
TUNABLE PARAMETERS IN DEEP LEARNING
STANDARD METHODS FOR HYPERPARAMETER SEARCH
STANDARD TUNING METHODS
Parameter configurations (weights, thresholds, window sizes, transformations) are chosen by manual search, grid search, or random search, then fed to the ML / AI model, which is trained on training data and evaluated by cross validation against testing data.
OPTIMIZATION FEEDBACK LOOP
New configurations flow from the REST API into the ML / AI model; the model is trained on training data, evaluated by cross validation against testing data, and the objective metric is reported back, yielding better results over time.
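As a sketch, the feedback loop amounts to a suggest / train / report cycle. In the illustrative snippet below, plain random search stands in for the optimizer behind the REST API, and the one-parameter toy objective is purely hypothetical:

```python
import random

def propose(bounds, rng):
    # The optimizer proposes a new configuration within the parameter bounds.
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

def train_and_validate(config):
    # Stand-in for training the ML / AI model and cross-validating it;
    # returns the objective metric reported back to the optimizer.
    return -(config["learning_rate"] - 0.1) ** 2

bounds = {"learning_rate": (0.001, 1.0)}
rng = random.Random(0)
best = None
for _ in range(20):
    config = propose(bounds, rng)          # new configuration
    metric = train_and_validate(config)    # objective metric
    if best is None or metric > best[1]:
        best = (config, metric)            # better results over time
```

The point of the loop is that only `propose` changes when moving from random search to a smarter optimizer; the model-training side is untouched.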
BAYESIAN GLOBAL OPTIMIZATION
OPTIMAL LEARNING

“... the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive.”
Prof. Warren Powell, Princeton

“What is the most efficient way to collect information?”
Prof. Peter Frazier, Cornell

“How do we make the most money, as fast as possible?”
Scott Clark, CEO, SigOpt
BAYESIAN GLOBAL OPTIMIZATION

● Optimize an objective function
  ○ Loss, accuracy, likelihood
● Given parameters
  ○ Hyperparameters, feature/architecture parameters
● Find the best hyperparameters
  ○ Sample the function as few times as possible
  ○ Training on big data is expensive
SMBO: Sequential Model-Based Optimization
HOW DOES IT WORK?
GP/EI SMBO

1. Build a Gaussian Process (GP) with the points sampled so far
2. Optimize the fit of the GP (covariance hyperparameters)
3. Find the point(s) of highest Expected Improvement within the parameter domain
4. Return the optimal next point(s) to sample
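The four GP/EI steps can be put together in a few dozen lines. The sketch below is illustrative only: the 1-D toy objective, the squared-exponential kernel, and the fixed length scale (standing in for step 2's covariance fit) are all assumptions, not SigOpt's actual implementation:

```python
import numpy as np
from math import erf, sqrt, pi

def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential covariance between two sets of 1-D points.
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Step 1 (with a fixed length scale standing in for step 2's fit):
    # condition a GP on the points sampled so far.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = K_s.T @ alpha                      # posterior mean
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v * v, axis=0), 1e-12, None)
    return mu, np.sqrt(var)                 # posterior mean and std dev

def expected_improvement(mu, sigma, best_y):
    # Step 3: closed-form EI for maximization.
    z = (mu - best_y) / sigma
    cdf = 0.5 * (1.0 + np.array([erf(zi / sqrt(2)) for zi in z]))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)
    return (mu - best_y) * cdf + sigma * pdf

f = lambda x: -(x - 0.7) ** 2               # toy "expensive" objective
x_train = np.array([0.1, 0.4, 0.9])
y_train = f(x_train)

grid = np.linspace(0.0, 1.0, 201)
mu, sigma = gp_posterior(x_train, y_train, grid)
ei = expected_improvement(mu, sigma, y_train.max())
x_next = grid[np.argmax(ei)]                # step 4: next point to sample
```

Repeating these steps, each time adding `(x_next, f(x_next))` to the sampled points, gives the sequential optimization loop.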
GAUSSIAN PROCESSES
(Figure: GP fits ranging from overfit to good fit to underfit)
EXPECTED IMPROVEMENT
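Under a GP posterior with mean $\mu(x)$ and standard deviation $\sigma(x)$, Expected Improvement has a well-known closed form (written here for maximization, with $f^{*}$ the best value observed so far):

```latex
\mathrm{EI}(x)
  = \mathbb{E}\!\left[\max\bigl(f(x) - f^{*},\, 0\bigr)\right]
  = \bigl(\mu(x) - f^{*}\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z),
\qquad z = \frac{\mu(x) - f^{*}}{\sigma(x)},
```

where $\Phi$ and $\varphi$ are the standard normal CDF and PDF. The first term rewards points whose predicted mean beats the incumbent; the second rewards points the GP is still uncertain about, which is how EI balances exploitation against exploration.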
DEEP LEARNING EXAMPLES
SIGOPT + MXNET

● Classify movie reviews using a CNN in MXNet
TEXT CLASSIFICATION PIPELINE
Hyperparameter configurations and feature transformations flow from the REST API into the ML / AI model (MXNet); the model is trained on training text, evaluated by validation accuracy on testing text, and the result is reported back, yielding better results.
TUNABLE PARAMETERS IN DEEP LEARNING
STOCHASTIC GRADIENT DESCENT

● Comparison of several RMSProp SGD parametrizations
ARCHITECTURE PARAMETERS
TUNING METHODS

Grid Search vs. Random Search
SPEED UP #2: RANDOM/GRID -> SIGOPT
CONSISTENTLY BETTER AND FASTER
SIGOPT + TENSORFLOW

● Classify house numbers in an image dataset (SVHN)
COMPUTER VISION PIPELINE
Hyperparameter configurations and feature transformations flow from the REST API into the ML / AI model (TensorFlow); the model is trained on training images, evaluated by cross-validation accuracy on testing images, and the result is reported back, yielding better results.
METRIC OPTIMIZATION
SIGOPT + NEON

● All-convolutional neural network
● Multiple convolutional and dropout layers
● Hyperparameter optimization: a mixture of domain expertise and grid search (brute force)

http://arxiv.org/pdf/1412.6806.pdf
COMPARATIVE PERFORMANCE
● Expert baseline: 0.8995 (using neon)
● SigOpt best: 0.9011
  ○ 1.6% reduction in error rate
  ○ No expert time wasted in tuning
SIGOPT + NEON
http://arxiv.org/pdf/1512.03385v1.pdf
● Explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
● Variable depth
● Hyperparameter optimization: a mixture of domain expertise and grid search (brute force)
COMPARATIVE PERFORMANCE
● Expert baseline: 0.9339 (from paper)
● SigOpt best: 0.9436
  ○ 15% relative error rate reduction
  ○ No expert time wasted in tuning
SIGOPT SERVICE
OPTIMIZATION FEEDBACK LOOP
New configurations flow from the REST API into the ML / AI model; the model is trained on training data, evaluated by cross validation against testing data, and the objective metric is reported back, yielding better results over time.
SIMPLIFIED OPTIMIZATION

Client libraries:
● Python
● Java
● R
● MATLAB
● And more...

Framework integrations:
● TensorFlow
● scikit-learn
● xgboost
● Keras
● Neon
● And more...
Live Demo
DISTRIBUTED TRAINING
● SigOpt serves as a distributed scheduler for training models across workers
● Workers access the SigOpt API for the latest parameters to try for each model
● Enables easy distributed training of non-distributed algorithms across any number of models
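A hypothetical sketch of that pattern: a thread-safe queue stands in for the SigOpt API as the source of suggested configurations, worker threads stand in for training machines, and the toy metric stands in for model training:

```python
import queue
import threading

suggestions = queue.Queue()
for lr in [0.01, 0.03, 0.1, 0.3]:
    suggestions.put({"learning_rate": lr})  # configurations to try

results = []
lock = threading.Lock()

def worker():
    # Each worker repeatedly asks the scheduler for the next configuration,
    # trains a (toy) model, and reports the observed metric back.
    while True:
        try:
            config = suggestions.get_nowait()
        except queue.Empty:
            return
        metric = -(config["learning_rate"] - 0.1) ** 2  # stand-in training
        with lock:
            results.append((config, metric))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
best = max(results, key=lambda r: r[1])
```

Because each worker only pulls parameters and pushes a metric, the training code itself needs no distributed-systems logic.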
https://sigopt.com/getstarted
Try it yourself!
https://sigopt.com
@SigOpt