An Introduction to
Game-theoretic Online Learning
Tim van Erven
General Mathematics Colloquium Leiden, December 4, 2014
[Figure: a tongue-in-cheek map of Statistics, split into Bayesian Statistics and Frequentist Statistics, with "Rogue Factions" at the edges containing Adversarial Bandits and Game-theoretic Online Learning; "You are here" points at Game-theoretic Online Learning.]
Outline
● Introduction to Online Learning
– Game-theoretic Model
– Regression example: Electricity
– Classification example: Spam
● Three algorithms
● Tuning the Learning Rate
Game-Theoretic Online Learning
● Predict data that arrive one by one
● Model: repeated game against an adversary
● Applications:
– spam detection
– data compression
– online convex optimization
– predicting electricity consumption
– predicting air pollution levels
– ...
Repeated Game (Informally)
● Sequentially predict outcomes y_1, y_2, …
● Measure the quality of prediction p_t by its loss ℓ(p_t, y_t)
● Before predicting y_t, get predictions (= advice) f_{1,t}, …, f_{K,t} from K experts
● Goal: predict as well as the best expert over T rounds
● Data and advice can be adversarial
Repeated Game
● Every round t = 1, …, T:
1. Get expert predictions f_{1,t}, …, f_{K,t}
2. Predict p_t
3. Outcome y_t is revealed
4. Measure losses ℓ(p_t, y_t) and ℓ(f_{k,t}, y_t)
● Best expert: the k minimizing Σ_{t=1}^T ℓ(f_{k,t}, y_t)
● Goal: minimize regret R_T = Σ_{t=1}^T ℓ(p_t, y_t) − min_k Σ_{t=1}^T ℓ(f_{k,t}, y_t)
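The four numbered steps above translate directly into a loop; a minimal, illustrative sketch of the protocol and the regret it defines (function and variable names are not from the talk):

```python
def play_game(predict, experts, outcomes, loss):
    """Run the prediction-with-expert-advice protocol.

    Returns (cumulative loss of the learner, cumulative loss per expert)."""
    K = len(experts)
    learner_loss = 0.0
    expert_loss = [0.0] * K
    history = []  # everything revealed so far, available to the learner
    for t, y in enumerate(outcomes):
        advice = [f(t) for f in experts]   # 1. get expert predictions
        p = predict(advice, history)       # 2. predict
        learner_loss += loss(p, y)         # 3./4. outcome revealed, losses measured
        for k in range(K):
            expert_loss[k] += loss(advice[k], y)
        history.append((advice, y))
    return learner_loss, expert_loss

def regret(learner_loss, expert_loss):
    """Regret = our cumulative loss minus that of the best expert."""
    return learner_loss - min(expert_loss)
```

Any of the algorithms in this talk is just a particular choice of the `predict` function.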
Regression Example: Electricity
● Électricité de France: predict electricity demand one day ahead, every day
● Experts: complicated regression models
● Loss: how far the prediction is from the realized demand (e.g. squared error)
● Best model after one year: the expert with the smallest cumulative loss
● How much worse are we?
[Devaine, Gaillard, Goude, Stoltz, 2013]
Example: Spam Detection
[Figure: a stream of incoming email messages, each one labeled spam or ham.]
Classification Example: Spam
● Experts: spam detection algorithms
● Messages: outcomes y_t ∈ {spam, ham}
● Predictions: p_t ∈ {spam, ham}
● Loss: ℓ(p_t, y_t) = 1 if p_t ≠ y_t and 0 otherwise (counting mistakes)
● Regret: extra mistakes we make over the best algorithm on messages y_1, …, y_T
Outline
● Introduction to Online Learning
● Three algorithms:
1. Halving
2. Follow the Leader (FTL)
3. Follow the Regularized Leader (FTRL)
● Tuning the Learning Rate
A First Algorithm: Halving
● Suppose one of the spam detectors is perfect
● Keep track of the set V_t of experts without mistakes so far
● Halving algorithm: predict the majority vote of the experts in V_t
● Theorem: regret ≤ log₂ K
A First Algorithm: Halving
Theorem: regret ≤ log₂ K
● Does not grow with T!
Proof:
● Suppose Halving makes M mistakes; the perfect expert makes none, so regret = M
● Every mistake eliminates at least half of V_t
● Since V_t always contains the perfect expert, M is at most log₂ K mistakes
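A sketch of Halving in code, under the slide's assumption that at least one expert is perfect (names are illustrative):

```python
def halving(advice_stream, outcomes):
    """Predict by majority vote over the experts with no mistakes so far.

    advice_stream: per round, a list of 0/1 predictions (one per expert).
    Returns the number of mistakes, which equals the regret when some
    expert is perfect."""
    V = None  # experts still without mistakes
    mistakes = 0
    for advice, y in zip(advice_stream, outcomes):
        if V is None:
            V = set(range(len(advice)))
        votes = sum(advice[k] for k in V)
        p = 1 if 2 * votes >= len(V) else 0        # majority vote (tie -> 1)
        if p != y:
            mistakes += 1
        V = {k for k in V if advice[k] == y}       # every mistake halves V (at least)
    return mistakes
```

Because a mistake means at least half of V voted wrong, V at least halves on every mistake, which is exactly the argument in the proof above.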
Follow the Leader
● Want to remove the unrealistic assumption that one expert is perfect
● FTL: copy the leader's prediction
● The leader at time t is the expert k minimizing L_{k,t−1},
where L_{k,t−1} = Σ_{s<t} ℓ(f_{k,s}, y_s) is the cumulative loss of expert k
(break ties randomly)
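FTL itself is only a few lines; a sketch with random tie-breaking (illustrative names):

```python
import random

def follow_the_leader(advice_stream, outcomes, loss):
    """Each round, copy the prediction of an expert with minimum
    cumulative loss so far (ties broken randomly)."""
    cum = None           # cumulative loss per expert
    our_loss = 0.0
    for advice, y in zip(advice_stream, outcomes):
        if cum is None:
            cum = [0.0] * len(advice)
        best = min(cum)
        leader = random.choice([k for k, c in enumerate(cum) if c == best])
        our_loss += loss(advice[leader], y)
        for k, f in enumerate(advice):
            cum[k] += loss(f, y)
    return our_loss, cum
```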
FTL Works with Perfect Expert
Theorem: Suppose one of the spam detectors is perfect. Then the expected regret is at most ln K.
Proof:
● Expected regret = E[nr. of mistakes] − 0
● Worst case: the imperfect experts get one loss in turn
● E[nr. of mistakes] ≤ 1/K + 1/(K−1) + … + 1/2 ≤ ln K
FTL: More Good News
● No assumption of a perfect expert
Theorem: regret ≤ number of leader changes
● Proof sketch:
– No leader change: our loss = the leader's loss, so the regret stays the same
– Leader change: our regret increases by at most 1 (the range of the losses)
● Works well for i.i.d. losses, because the leader changes only finitely many times w.h.p.
FTL on IID Losses
● 4 experts with Bernoulli 0.1, 0.2, 0.3, 0.4 losses; regret = O(log K)
FTL Worst-case Losses
● Two experts with a tie/leader change every round:

Round     1    2    3    4    5    6    …
Expert 1  1    0    1    0    1    0
Expert 2  0    1    0    1    0    1
FTL       1/2  1    1/2  1    1/2  1

● Both experts have cumulative loss T/2
● FTL's cumulative loss is about 3T/4, so the regret is linear in T
● Problem: FTL is too sure of itself when there are no ties!
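This table is easy to check by simulation; a small sketch, counting a tie as expected loss 1/2:

```python
def ftl_on_alternating_losses(T):
    """FTL against the alternating adversary: expert 1 gets losses
    1,0,1,0,... and expert 2 gets 0,1,0,1,..., forcing a tie or a
    leader change every round.  Returns (FTL's expected cumulative
    loss, cumulative loss of the best expert)."""
    cum = [0.0, 0.0]
    ftl_loss = 0.0
    for t in range(T):
        losses = (1.0, 0.0) if t % 2 == 0 else (0.0, 1.0)
        if cum[0] == cum[1]:
            ftl_loss += 0.5                               # tie: expected loss 1/2
        else:
            ftl_loss += losses[0] if cum[0] < cum[1] else losses[1]
        cum = [cum[0] + losses[0], cum[1] + losses[1]]
    return ftl_loss, min(cum)
```

After T rounds FTL has suffered about 3T/4 while either expert has T/2, so the regret is T/4: linear in T.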
Solution: Be Less Sure!
● Pull FTL's choices towards the uniform distribution
● Follow the Leader:
– leader: the expert k minimizing L_{k,t−1}
– as a distribution: w_t = point mass on the leader
● Follow the Regularized Leader:
– w_t = argmin over distributions w of Σ_k w_k L_{k,t−1} + (1/η) KL(w ‖ uniform)
– add a penalty for being away from uniform; the Kullback-Leibler divergence in this talk
The Learning Rate
● Follow the Regularized Leader:
w_t = argmin over distributions w of Σ_k w_k L_{k,t−1} + (1/η) KL(w ‖ uniform)
● Very sensitive to the choice of the learning rate η:
– η → ∞: Follow the Leader
– η → 0: don't learn at all
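With the KL penalty, the FTRL minimization has a well-known closed-form solution: exponential weights, w_k proportional to exp(−η L_k) (the Hedge algorithm). A minimal sketch:

```python
from math import exp

def ftrl_kl_weights(cum_losses, eta):
    """FTRL with KL(w || uniform) regularization and learning rate eta:
    the minimizer puts weight proportional to exp(-eta * L_k) on expert k."""
    m = min(cum_losses)                        # shift for numerical stability
    w = [exp(-eta * (L - m)) for L in cum_losses]
    s = sum(w)
    return [x / s for x in w]
```

The two extremes of the learning rate are visible directly: a huge η puts essentially all weight on the leader (FTL), while η = 0 keeps the uniform distribution forever (no learning at all).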
Outline
● Introduction to Online Learning
● Three algorithms
● Tuning the Learning Rate:
– Safe tuning
– The Price of Robustness
– Learning the Learning Rate
The Worst-case Safe Learning Rate
Theorem: For FTRL with learning rate η = √(8 ln K / T):
regret ≤ √((T/2) ln K)
● No (probabilistic) assumptions about the data!
● Optimal: no algorithm can guarantee better in the worst case
● The √T growth is standard in online learning
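As a sanity check, Hedge with the safe rate can be run against the alternating losses that broke FTL; a sketch (the bound in the test is the standard worst-case guarantee stated above):

```python
from math import exp, log, sqrt

def hedge_regret(loss_matrix, eta):
    """Expected regret of exponential weights on a T x K matrix of
    losses in [0, 1]."""
    K = len(loss_matrix[0])
    cum = [0.0] * K
    our = 0.0
    for losses in loss_matrix:
        m = min(cum)                              # shift for numerical stability
        w = [exp(-eta * (c - m)) for c in cum]
        s = sum(w)
        our += sum(wi / s * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return our - min(cum)

def safe_eta(T, K):
    """The worst-case safe learning rate sqrt(8 ln K / T)."""
    return sqrt(8.0 * log(K) / T)
```

Because the safe rate keeps the weights close to uniform, the adversary that forced linear regret on FTL only extracts O(√T) regret here.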
The Price of Robustness
● Safe tuning does much worse than FTL on i.i.d. losses
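The gap shows up in a small experiment reusing the i.i.d. Bernoulli setup from the FTL slide; a sketch comparing FTL with safely tuned Hedge (an illustration, not the talk's exact experiment):

```python
import random
from math import exp, log, sqrt

def run_policy(weights_of, loss_matrix):
    """Regret of a policy mapping cumulative expert losses to weights."""
    cum = [0.0] * len(loss_matrix[0])
    our = 0.0
    for losses in loss_matrix:
        w = weights_of(cum)
        our += sum(wi * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return our - min(cum)

def ftl_weights(cum):
    """Uniform over the current leaders (random tie-breaking in expectation)."""
    best = min(cum)
    n = sum(1 for c in cum if c == best)
    return [1.0 / n if c == best else 0.0 for c in cum]

def safe_hedge_weights(eta):
    """Exponential weights with a fixed learning rate eta."""
    def weights(cum):
        m = min(cum)
        w = [exp(-eta * (c - m)) for c in cum]
        s = sum(w)
        return [x / s for x in w]
    return weights

# iid Bernoulli losses with means 0.1, 0.2, 0.3, 0.4, as on the FTL slide
random.seed(42)
T, means = 5000, [0.1, 0.2, 0.3, 0.4]
L = [[float(random.random() < m) for m in means] for _ in range(T)]
regret_ftl = run_policy(ftl_weights, L)
regret_safe = run_policy(safe_hedge_weights(sqrt(8 * log(len(means)) / T)), L)
```

On data this easy the leader stops changing early, so FTL's regret stays bounded, while the safe rate keeps mixing in bad experts for hundreds of rounds.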
The Price of Robustness
Method                             Special case of FTRL       Perfect expert data   IID data        Worst-case data
Halving                            no                         ≤ log₂ K              undefined       undefined
Follow the Leader                  η = ∞ (very large)         O(log K)              O(log K)        Ω(T)
FTRL with worst-case safe tuning   η ≈ √(ln K / T) (small)    O(√(T ln K))          O(√(T ln K))    O(√(T ln K))

Can we adapt to the optimal η automatically?
A Failed Approach: Being Meta
● We want the best of both: FTL's performance on easy data and safe FTRL's worst-case guarantee
● Idea: a meta-problem with two experts:
– Expert 1: FTL
– Expert 2: FTRL with Safe Tuning
● Regret ≤ meta-regret + min(regret of FTL, regret of safe FTRL)
● Best of both worlds if the meta-regret is small!
● But if the regret of FTL is O(1) and the meta-regret is of order √T,
then the √T meta-regret is too big!
Partial Progress
● Safe tuning: Regret = O(√(T ln K))
● Improvement for small losses:
Regret = O(√(L*_T ln K)), where L*_T is the cumulative loss of the best expert
● Variance Bounds:
Regret = O(√(V_T ln K)), where V_T measures the variance of the losses
● Cesa-Bianchi, Mansour, Stoltz, 2007
● vE, Grünwald, Koolen, De Rooij, 2011
● De Rooij, vE, Grünwald, Koolen, 2014
Regret is a Difficult Function 1/2
● All the previous solutions could potentially end up in the wrong local minimum
Regret is a Difficult Function 2/2
● The regret as a function of T for fixed η is non-monotonic
● This means some η may look very bad for a while, but end up being very good after all
● How do we see which η's are good?
● Use the (best-possible) monotonic lower bound per η
Learning the Learning Rate
● Koolen, vE, Grünwald, 2014
Learning the Learning Rate
● Track performance for a grid of learning rates η
● Switch between them
– Pay for switching, but not too much
● Running time as fast as for a single fixed η
– does not depend on the size of the grid
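The grid itself is cheap: exponentially spaced rates from the safe rate up to (effectively) FTL need only O(log T) points. A sketch of just the grid construction; the actual switching scheme of LLR is more involved and not reproduced here:

```python
from math import log, sqrt

def learning_rate_grid(T, K, factor=2.0, eta_max=1e6):
    """Exponentially spaced learning rates, from the worst-case safe
    rate up to a huge rate that effectively behaves like FTL."""
    grid = []
    eta = sqrt(log(K) / T)       # the safe end of the grid
    while eta < eta_max:
        grid.append(eta)
        eta *= factor            # geometric spacing: O(log T) points in total
    grid.append(eta_max)
    return grid
```

Geometric spacing is what makes the claim above work: every possible η is within a constant factor of some grid point, yet the grid has only logarithmically many entries.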
Learning the Learning Rate
Theorems:
● For all interesting η: the regret of LLR is within a polylog(K, T) factor of the regret of the best fixed η in hindsight
● In particular, as good as all previous methods
The Price of Robustness
Method                             Special case of FTRL       Perfect expert data   IID data        Worst-case data   Compared to optimal η
Follow the Leader                  η = ∞ (very large)         O(log K)              O(log K)        Ω(T)              no useful guarantees
FTRL with worst-case safe tuning   η ≈ √(ln K / T) (small)    O(√(T ln K))          O(√(T ln K))    O(√(T ln K))      no useful guarantees
LLR                                switches within a grid     —                     —               —                 within a polylog(K, T) factor

Can we adapt to the optimal η automatically? Yes!
Summary
● Game-theoretic Online Learning
– e.g. electricity forecasting, spam detection
● Three algorithms:
– Halving, Follow the (Regularized) Leader
● Tuning the Learning Rate:
– Safe tuning pays the Price of Robustness
– Learning the learning rate adapts to the optimal learning rate automatically
[Figure: the map of Statistics again, with "You are here" pointing at Game-theoretic Online Learning among the Rogue Factions.]
Future Work
● So far: compete with the best expert
● Online Convex Optimization: compete with the best convex combination of experts
● Future work: extend LLR to online convex optimization
References
● Cesa-Bianchi and Lugosi. Prediction, Learning, and Games. 2006.
● Cesa-Bianchi, Mansour, Stoltz. Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2/3):321–352, 2007.
● Devaine, Gaillard, Goude, Stoltz. Forecasting electricity consumption by aggregating specialized experts. Machine Learning, 90(2):231–260, 2013.
● Van Erven, Grünwald, Koolen and De Rooij. Adaptive Hedge. NIPS, 2011.
● De Rooij, Van Erven, Grünwald, Koolen. Follow the Leader If You Can, Hedge If You Must. Journal of Machine Learning Research, 2014.
● Koolen, Van Erven, Grünwald. Learning the Learning Rate for Prediction with Expert Advice. NIPS, 2014.