Foundations of Robotics and Autonomous Learning Summer School (RALSS'17)
Berlin, Sep 4-8, 2017

Active Learning & Bayesian Optimization

Marc Toussaint, University of Stuttgart, Summer 2017
• Detailed reference: http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/14-BanditsOptimizationActiveLearningBayesianRL.pdf
• or http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/Lecture-Optimization.pdf
  Chapter 5 “Global & Bayesian Optimization”
Active Learning & Bayesian Optimization – – 2/14
4 Sessions
• Bandits
• Bayesian Optimization
(1) Bandits
• Problem: You have B binary bandits; when choosing bandit i at time t, it returns y_t ∼ Bern(θ_i); you want to maximize ⟨∑_{t=1}^T y_t⟩. How do you choose bandits?
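As a concrete testbed, the setup above can be simulated in a few lines. This is an illustrative sketch, not the course code; the names `run_bandits` and `random_policy` are made up here:

```python
import random

def run_bandits(thetas, policy, T, seed=0):
    """Simulate B Bernoulli bandits for T rounds under a given policy.

    thetas: success probabilities theta_i; policy(counts, sums) -> arm index.
    Returns the accumulated reward sum_t y_t."""
    rng = random.Random(seed)
    B = len(thetas)
    counts = [0] * B  # n_i: how often arm i was pulled
    sums = [0] * B    # accumulated reward of arm i
    total = 0
    for _ in range(T):
        i = policy(counts, sums)
        y = 1 if rng.random() < thetas[i] else 0  # y_t ~ Bern(theta_i)
        counts[i] += 1
        sums[i] += y
        total += y
    return total

# baseline: pick an arm uniformly at random each round
def random_policy(counts, sums):
    return random.randrange(len(counts))
```

Any decision policy is then just a function of the sufficient statistics (counts, sums), which is exactly the “state of knowledge” discussed next.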
• Representing the state of knowledge: statistics & beliefs
• Exploration vs exploitation
  – Exploration: choose the next decision to min ⟨H(b_t)⟩
  – Exploitation: choose the next decision to max ⟨y_t⟩
• Belief planning
  – What would be an optimal strategy?
  – How large is V(b_t) for T = 10 and 3 bandits?
• UCB: α(i) = ȳ_i + β √(2 ln n / n_i), where ȳ_i is the empirical mean reward of arm i, n_i its pull count, and n = ∑_i n_i
• Optimism in the face of uncertainty
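The UCB rule above translates directly into code, tracking per-arm pull counts and reward sums. A minimal sketch (the function name `ucb1` is ours, not from the course material):

```python
import math

def ucb1(counts, sums, beta=1.0):
    """UCB1: pull each arm once, then pick argmax_i  ybar_i + beta*sqrt(2 ln n / n_i)."""
    for i, c in enumerate(counts):
        if c == 0:               # initialization: try every arm at least once
            return i
    n = sum(counts)
    def score(i):
        return sums[i] / counts[i] + beta * math.sqrt(2.0 * math.log(n) / counts[i])
    return max(range(len(counts)), key=score)
```

The bonus term √(2 ln n / n_i) implements “optimism in the face of uncertainty”: rarely pulled arms get a large confidence bonus and are tried until their uncertainty shrinks.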
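The belief-planning question (“How large is V(b_t) for T = 10 and 3 bandits?”) can be answered exactly by dynamic programming over Beta beliefs: with binary outcomes and a finite horizon, the set of reachable belief states is finite. A sketch (the function `V` is our own construction, not from the course material):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def V(belief, steps):
    """Bayes-optimal value of a belief state for binary bandits.

    belief: one (a_i, b_i) Beta-posterior pair per bandit; steps: remaining pulls.
    Bellman recursion over beliefs: pulling arm i yields reward 1 w.p. its
    posterior mean and updates that arm's Beta parameters accordingly."""
    if steps == 0:
        return 0.0
    best = 0.0
    for i, (a, b) in enumerate(belief):
        p = a / (a + b)  # posterior mean of theta_i
        win  = belief[:i] + ((a + 1, b),) + belief[i + 1:]
        lose = belief[:i] + ((a, b + 1),) + belief[i + 1:]
        best = max(best, p * (1.0 + V(win, steps - 1))
                         + (1.0 - p) * V(lose, steps - 1))
    return best

# uniform Beta(1,1) prior on each of 3 bandits, horizon T = 10
v = V(((1, 1),) * 3, 10)
```

Note that always pulling a single arm under a Beta(1,1) prior earns 0.5 per step in expectation, so V must exceed T/2 = 5; the gap quantifies the value of exploration.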
Practical: UCB
  git checkout master
  git pull
  cd teaching/RoboticsCourse/12-bandits
  make cleanAll; make
  ./x.exe
• What you see are returns for t = 1, .., T averaged over K runs
• Implement a better decision policy
• Play around with different bandits, e.g. θ = (.5, .6) (harder), or Gaussian bandits
The Robotics Active Learning Challenge
Autonomously explore the environment to learn what is manipulable and how
(bird research by Alex Kacelnik et al., U Oxford)
• Active Learning for ‘kinematic beliefs’
  Otte, Kulick, Toussaint & Brock: Entropy-Based Strategies for Physical Exploration of the Environment's Degrees of Freedom. IROS'14
Kulick, Otte & Toussaint: Active Exploration of Joint Dependency Structures. ICRA’15
• Application
  Bernstein, Hofer, Kulick, Martin-Martin, Baum, Toussaint & Brock: ICRA submission
More work on Active Learning
• More efficient active learning of hyperparameters:
  Kulick, Lieck & Toussaint: Cross-Entropy as a Criterion for Robust Interactive Learning of Latent Properties. NIPS workshop FILM'16
• Safe Active Learning:
  Schreiter et al.: Safe Exploration for Active Learning with Gaussian Processes. ECML'15
• In relational RL:
  Lang, Toussaint & Kersting: Exploration in Relational Domains for Model-based Reinforcement Learning. JMLR 2012
Lang, Toussaint & Kersting: Exploration in Relational Worlds. ECML’10
• Non-information-gain-based:
  Lopes, Lang & Toussaint: Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress. NIPS'12
• Application symbol learning:
  Kulick, Toussaint, Lang & Lopes: Active Learning for Teaching a Robot Grounded Relational Symbols. IJCAI'13
(2) Global Optimization
• Problem: Let x ∈ R^n, f : R^n → R; find
      min_x f(x)
  only by sampling values y_t = f(x_t). No access to ∇f or ∇²f. Observations may be noisy: y ∼ N(y | f(x_t), σ).
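Because only function values are available, even the simplest baseline is instructive: pure random search. An illustrative sketch (`random_search` is our name, not part of the course code):

```python
import random

def random_search(f, dim, T, lo=-1.0, hi=1.0, seed=0):
    """Derivative-free baseline: evaluate f at T random points, keep the best.

    Uses only sampled values y_t = f(x_t) -- no gradients."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(T):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# sphere function, global minimum f = 0 at the origin
best_x, best_y = random_search(lambda x: sum(v * v for v in x), dim=2, T=500)
```

Bayesian Optimization improves on this by spending each evaluation where the posterior belief over f says it is most informative, rather than uniformly at random.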
• Global Optimization = Infinite Bandits
• Gaussian Processes as belief: b_t = GP(f | D_t)
• Optimal Global Optimization? V(b_t)
• Acquisition functions:
  x_t^MPI = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) dy
  x_t^EI  = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) (y* − y) dy
  x_t^UCB = argmin_x f(x) − β_t σ(x)
  where f(x) and σ(x) denote the GP posterior mean and standard deviation, and y* is the best value observed so far
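A full GP-UCB loop on a toy 1-D problem might look as follows. This is a sketch under assumptions: NumPy, a squared-exponential kernel with hand-picked hyperparameters, and a fixed query grid; `gp_posterior` and `gp_ucb_step` are our names, not the course's C++ code:

```python
import numpy as np

def gp_posterior(X, y, Xs, ell=0.3, sf=1.0, noise=1e-4):
    """GP posterior mean/std at query points Xs (1-D inputs, SE kernel)."""
    def k(A, B):
        return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ell) ** 2)
    K = k(X, X) + noise * np.eye(len(X))   # jitter keeps K positive definite
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = k(X, Xs)
    mu = Ks.T @ alpha                      # posterior mean
    v = np.linalg.solve(L, Ks)
    var = np.maximum(sf**2 - np.sum(v**2, axis=0), 1e-12)
    return mu, np.sqrt(var)

def gp_ucb_step(f, X, y, grid, beta=2.0):
    """One GP-UCB step: query the grid point minimizing mu(x) - beta*sigma(x)."""
    mu, sigma = gp_posterior(np.array(X), np.array(y), grid)
    xn = float(grid[np.argmin(mu - beta * sigma)])
    return xn, f(xn)

# toy 1-D objective with minimum at x = 0.3
f = lambda x: (x - 0.3) ** 2
grid = np.linspace(-1.0, 1.0, 201)
X, y = [-0.8, 0.9], [f(-0.8), f(0.9)]
for _ in range(10):
    xn, yn = gp_ucb_step(f, X, y, grid)
    X.append(xn)
    y.append(yn)
```

Early iterations query far from the data, where σ(x) is large (exploration); as uncertainty shrinks, the rule concentrates queries where the posterior mean is low (exploitation).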
Practical
  git checkout master
  git pull
  cd teaching/RoboticsCourse/13-bayesOpt
  make
  ./x.exe
• This implements GP updates with random querying
• Implement an active learning strategy... discuss!
• Implement a GP-UCB strategy
Issues
• Hyperparameter choice
  – Standard proofs do not apply anymore!
  – Fully Bayesian, or Entropy Search → where to get the prior from?
  – Should we really assume a squared exponential kernel? If not, what else?
No Free Lunch