Hyperparameter optimization strategies · 2018-04-01 · meetup.com/IASI-AI/ · facebook.com/AI.Iasi/ · iasiai.net


Page 1:

Hyperparameter optimization strategies

git clone https://github.com/IASIAI/hyperparameter-optimization-strategies.git

Page 2:

Gabriel Marchidan, Software architect · Bogdan Burlacu, AI researcher, PhD

Page 3:

“Algorithms are conceived in analytic purity in the high citadels of academic research, heuristics are midwifed by expediency in the dark corners of the practitioner’s lair”

Fred Glover, 1977

Page 4:

Contents

● Problem statement
● Disclaimer
● Grid search
● Random search
● Bayesian optimization
● Covariance Matrix Adaptation Evolution Strategy (CMA-ES)

Page 5:

Problem statement

● Hyperparameters are parameters whose values are set before the learning process begins.

● By contrast, the values of the other (model) parameters are learned during training.

● The problem of (hyper)parameter optimization is not specific to ML.
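The transcript does not capture the slides' formal statement; the usual formulation (symbols chosen here for illustration: Λ for the hyperparameter search space, A_λ for the learning algorithm configured with λ, and 𝓛 for the validation loss) is:

```latex
\lambda^{*} \;=\; \operatorname*{arg\,min}_{\lambda \in \Lambda}\;
\mathcal{L}\bigl(A_{\lambda}(D_{\mathrm{train}}),\, D_{\mathrm{valid}}\bigr)
```

That is, we search for the configuration λ* whose trained model performs best on held-out validation data.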

Page 6:

Problem statement

(formula slide; content not captured in the transcript)

Page 7:

Problem statement

(formula slide; content not captured in the transcript)

Page 8:

Disclaimer

Things that will improve a real-life algorithm more than in-depth parameter optimization:

● Having better data
● Having more data
● Changing the algorithm (or the weights with which multiple algorithms' results are combined)

So don't start with hyperparameter tuning!

Page 9:

Grid search

● Scans the parameter space in a grid pattern with a certain step size

● Hence the name “Grid search”

Page 10:

Grid search

● Grid search probes parameter configurations deterministically, by laying a grid of candidate configurations over the parameter space

● For each continuous dimension of the parameter space, a step size is chosen (this defines the resolution of the grid)

Page 11:

Grid search

● Requires a lot of function evaluations

● Is highly impractical for algorithms with more than 4 parameters

● The number of function evaluations grows exponentially with each additional parameter (curse of dimensionality)
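A rough sketch of the idea (not the deck's own code; the quadratic objective and the 0.1 step size are made up for illustration):

```python
import itertools

# Hypothetical objective to minimize; its optimum sits at (0.3, -0.5).
def objective(x, y):
    return (x - 0.3) ** 2 + (y + 0.5) ** 2

# One axis of candidate values per parameter; the grid is their Cartesian product.
x_values = [i / 10 for i in range(-10, 11)]   # step size 0.1 on [-1, 1]
y_values = [i / 10 for i in range(-10, 11)]

best_score, best_params = float("inf"), None
for x, y in itertools.product(x_values, y_values):
    score = objective(x, y)
    if score < best_score:
        best_score, best_params = score, (x, y)

# 21 points per axis -> 21**2 = 441 evaluations; a third parameter at the
# same resolution would already need 21**3 = 9261 (the curse of dimensionality).
print(best_params, best_score)
```

The exhaustive loop over `itertools.product` is exactly why the evaluation count explodes with each added dimension.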

Page 12:

Curse of dimensionality

(illustration slide; figure not captured in the transcript)

Page 13:

Random search

● Random points are chosen from the parameter space

● Each sampled configuration corresponds, in turn, to a point in the solution space

Page 14:

Random search

● Random search suggests configurations sampled randomly from your parameter space

● The best result is saved along with the corresponding parameters

● The next candidate is sampled either uniformly from the whole parameter space or from a neighborhood (e.g. a sphere) around the current best

● The process is repeated until a termination criterion is met (usually a number of iterations)
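The uniform-sampling variant of the loop above can be sketched as follows (objective and bounds are illustrative, not from the deck):

```python
import random

random.seed(42)  # reproducible run

# Hypothetical objective to minimize; its optimum sits at (0.3, -0.5).
def objective(x, y):
    return (x - 0.3) ** 2 + (y + 0.5) ** 2

bounds = [(-1.0, 1.0), (-1.0, 1.0)]
best_score, best_params = float("inf"), None

# Pure random search: every candidate is drawn uniformly from the whole space,
# and we keep the best result seen so far. Termination: a fixed iteration budget.
for _ in range(1000):
    candidate = [random.uniform(lo, hi) for lo, hi in bounds]
    score = objective(*candidate)
    if score < best_score:
        best_score, best_params = score, candidate

print(best_params, best_score)
```

Swapping the uniform draw for a draw from a sphere around `best_params` would give the local-neighborhood variant mentioned above.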

Page 15:

Random search

● Can be applied to functions that are not continuous or differentiable

● It makes no assumptions about the properties of the function

● Has multiple variants: fixed step, optimum step, adaptive step etc.

Page 16:

Bayesian optimization

● With each observation we refine a surrogate model of the objective function

● We next sample the points that have the highest chance of improving on the best value found so far

Page 17:

Bayesian optimization

(illustration slide; figure not captured in the transcript)

Page 18:

Bayesian optimization

● To use Bayesian optimization, we need a way to flexibly model distributions over objective functions

● For this problem, Gaussian Processes are a particularly elegant technique

● It is used for problems where each sample is costly in time or resources

● Historically Gaussian Processes were developed to help search for gold
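A minimal, self-contained sketch of the Gaussian-process-plus-expected-improvement loop. This is not the deck's code: the 1-D toy objective, the RBF kernel length scale, and the candidate grid are all illustrative assumptions; a real run would use a library such as skopt.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

def objective(x):
    # Toy 1-D objective standing in for an expensive evaluation;
    # its global minimum on [-2, 2] is roughly -0.88 near x ~ -0.47.
    return np.sin(3 * x) + 0.5 * x ** 2

def rbf(a, b, length=0.5):
    # Squared-exponential (RBF) kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def norm_pdf(z):
    return np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))

X = rng.uniform(-2, 2, 3)        # three initial random observations
y = objective(X)
grid = np.linspace(-2, 2, 200)   # candidate points for the acquisition step

for _ in range(15):
    K = rbf(X, X) + 1e-8 * np.eye(len(X))    # jitter for numerical stability
    K_s = rbf(grid, X)
    mu = K_s @ np.linalg.solve(K, y)         # GP posterior mean
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    sigma = np.sqrt(np.maximum(var, 1e-12))  # GP posterior std deviation
    # Expected improvement over the best observation (we are minimizing).
    z = (y.min() - mu) / sigma
    ei = (y.min() - mu) * norm_cdf(z) + sigma * norm_pdf(z)
    x_next = grid[np.argmax(ei)]             # most promising point
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

print(round(float(y.min()), 3))
```

Each iteration spends one "expensive" evaluation on the point where the surrogate predicts the best chance of improvement, which is the whole point of the method.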

Page 19:

Bayesian optimization

(illustration slide; figure not captured in the transcript)

Page 20:

CMA-ES

● CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy

● It is an evolutionary algorithm for difficult non-linear, non-convex black-box optimisation problems in the continuous domain

● CMA-ES is considered state of the art in evolutionary computation and has been adopted as one of the standard tools for continuous optimisation

Page 21:

CMA-ES

● It is an evolutionary algorithm

● Solutions are represented by real-valued parameter vectors

● Initial solutions are randomly generated

● Subsequent solutions are generated from the fittest solutions of the previous generation by recombination and mutation

Page 22:

CMA-ES

● The population size is the same in every generation (λ, with λ > 4, usually λ > 20)

● Each new candidate solution vector is sampled from a multivariate normal distribution

● The distribution is updated by covariance matrix adaptation (CMA) based on the best solutions found in the current generation (the evolution strategy, ES)
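The sample-select-adapt cycle can be sketched as below. Note this is a deliberately stripped-down evolution strategy with a rank-μ-style covariance update, not the full CMA-ES (no evolution paths, no principled learning rates, and a naive step-size decay); the sphere objective and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):
    # Toy objective to minimize; global minimum 0 at the origin.
    return float(np.sum(x ** 2))

dim, lam, mu = 2, 20, 5            # lambda offspring, mu selected parents
mean = np.array([5.0, 5.0])        # initial distribution mean
sigma = 1.0                        # global step size
C = np.eye(dim)                    # covariance matrix, adapted over time
best = float("inf")

for _ in range(60):
    # Sample lambda candidates from N(mean, sigma^2 * C).
    samples = rng.multivariate_normal(mean, sigma ** 2 * C, size=lam)
    scores = np.array([sphere(s) for s in samples])
    best = min(best, scores.min())
    # Select the mu fittest and recombine them into the new mean (centroid).
    elite = samples[np.argsort(scores)[:mu]]
    steps = (elite - mean) / sigma
    mean = elite.mean(axis=0)
    # Rank-mu style covariance update: blend in the outer products of the
    # successful steps, so future sampling stretches along good directions.
    C = 0.8 * C + 0.2 * (steps.T @ steps) / mu
    sigma *= 0.95                  # crude stand-in for CMA-ES step-size control

print(round(best, 4))
```

The real algorithm replaces the last two updates with the evolution-path machinery described on the following slides.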

Page 23:

CMA-ES

● The mean is updated each time to provide a new centroid for new solutions

● Two time-evolution paths of the distribution mean are recorded, called search or evolution paths

● The two paths contain information about the correlation between consecutive iterations

Page 24:

CMA-ES

● In each iteration the mean is adjusted

● The two evolution paths are updated

● A new step size is calculated

Page 25:

Running the simulations

● Windows: WinPython (https://winpython.github.io/)
● Linux and macOS: Python 3.5+, SciPy, scikit-learn, skopt (scikit-optimize)
● A Python virtualenv is recommended on Linux and macOS
● To test your setup, you should be able to run the examples at https://scikit-optimize.github.io/

git clone https://github.com/IASIAI/hyperparameter-optimization-strategies.git

Page 26:

Simulation

● Trying to find the maximum value of the Rastrigin function

● Will run, in turn:
○ Grid search
○ Random search
○ Bayesian optimization
○ CMA-ES
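For reference, the standard Rastrigin function the simulation searches (with the usual A = 10; the deck's repository presumably ships its own implementation):

```python
import math

def rastrigin(x):
    # n-dimensional Rastrigin function: highly multimodal, with a regular
    # lattice of local optima; global minimum f(0, ..., 0) = 0 on the
    # usual domain [-5.12, 5.12]^n.
    a = 10
    return a * len(x) + sum(xi ** 2 - a * math.cos(2 * math.pi * xi)
                            for xi in x)

print(rastrigin([0.0, 0.0]))   # → 0.0
```

Its many regularly spaced local optima are what make it a good stress test for the four strategies.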

Page 27:

Bibliography

● https://www.lri.fr/~hansen/cmaesintro.html
● https://blog.sigopt.com/posts/evaluating-hyperparameter-optimization-strategies
● https://cloud.google.com/blog/big-data/2017/08/hyperparameter-tuning-in-cloud-machine-learning-engine-using-bayesian-optimization
● https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)
● https://en.wikipedia.org/wiki/CMA-ES
● https://en.wikipedia.org/wiki/Rastrigin_function

Page 28:

Questions?