Foundations of Robotics and Autonomous Learning Summer School (RALSS'17)
Berlin, Sep 4-8, 2017

Active Learning & Bayesian Optimization

Marc Toussaint, University of Stuttgart, Summer 2017
• Detailed reference: http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/14-BanditsOptimizationActiveLearningBayesianRL.pdf
• or http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/Lecture-Optimization.pdf
  Chapter 5 “Global & Bayesian Optimization”
Active Learning & Bayesian Optimization – – 2/14
4 Sessions
• Bandits
• Bayesian Optimization
(1) Bandits
• Problem: You have B binary bandits; when choosing bandit i at time t, it returns y_t ∼ Bern(θ_i); you want to maximize ⟨∑_{t=1}^T y_t⟩. How do you choose bandits?
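As a concrete testbed, the setup above can be simulated in a few lines. This is an illustrative sketch, not the course code; the names `run_bandits` and `random_policy` are made up here:

```python
import random

def run_bandits(thetas, policy, T, seed=0):
    """Simulate B Bernoulli bandits for T rounds under a given policy.

    thetas: success probabilities theta_i; policy(counts, sums) -> arm index.
    Returns the accumulated reward sum_t y_t."""
    rng = random.Random(seed)
    B = len(thetas)
    counts = [0] * B  # n_i: how often arm i was pulled
    sums = [0] * B    # accumulated reward of arm i
    total = 0
    for _ in range(T):
        i = policy(counts, sums)
        y = 1 if rng.random() < thetas[i] else 0  # y_t ~ Bern(theta_i)
        counts[i] += 1
        sums[i] += y
        total += y
    return total

# baseline: pick an arm uniformly at random each round
def random_policy(counts, sums):
    return random.randrange(len(counts))
```

Any decision policy is then just a function of the sufficient statistics (counts, sums), which is exactly the “state of knowledge” discussed next.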
• Representing the state of knowledge: statistics & beliefs
• Exploration vs exploitation
  – Exploration: choose the next decision to min ⟨H(b_t)⟩
  – Exploitation: choose the next decision to max ⟨y_t⟩
• Belief planning
  – What would be an optimal strategy?
  – How large is V(b_t) for T = 10 and 3 bandits?
• UCB: α(i) = ȳ_i + β √(2 ln n / n_i), where ȳ_i is the empirical mean reward of arm i, n_i its pull count, and n = ∑_i n_i
• Optimism in the face of uncertainty
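The UCB rule above translates directly into code, tracking per-arm pull counts and reward sums. A minimal sketch (the function name `ucb1` is ours, not from the course material):

```python
import math

def ucb1(counts, sums, beta=1.0):
    """UCB1: pull each arm once, then pick argmax_i  ybar_i + beta*sqrt(2 ln n / n_i)."""
    for i, c in enumerate(counts):
        if c == 0:               # initialization: try every arm at least once
            return i
    n = sum(counts)
    def score(i):
        return sums[i] / counts[i] + beta * math.sqrt(2.0 * math.log(n) / counts[i])
    return max(range(len(counts)), key=score)
```

The bonus term √(2 ln n / n_i) implements “optimism in the face of uncertainty”: rarely pulled arms get a large confidence bonus and are tried until their uncertainty shrinks.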
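The belief-planning question (“How large is V(b_t) for T = 10 and 3 bandits?”) can be answered exactly by dynamic programming over Beta beliefs: with binary outcomes and a finite horizon, the set of reachable belief states is finite. A sketch (the function `V` is our own construction, not from the course material):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def V(belief, steps):
    """Bayes-optimal value of a belief state for binary bandits.

    belief: one (a_i, b_i) Beta-posterior pair per bandit; steps: remaining pulls.
    Bellman recursion over beliefs: pulling arm i yields reward 1 w.p. its
    posterior mean and updates that arm's Beta parameters accordingly."""
    if steps == 0:
        return 0.0
    best = 0.0
    for i, (a, b) in enumerate(belief):
        p = a / (a + b)  # posterior mean of theta_i
        win  = belief[:i] + ((a + 1, b),) + belief[i + 1:]
        lose = belief[:i] + ((a, b + 1),) + belief[i + 1:]
        best = max(best, p * (1.0 + V(win, steps - 1))
                         + (1.0 - p) * V(lose, steps - 1))
    return best

# uniform Beta(1,1) prior on each of 3 bandits, horizon T = 10
v = V(((1, 1),) * 3, 10)
```

Note that always pulling a single arm under a Beta(1,1) prior earns 0.5 per step in expectation, so V must exceed T/2 = 5; the gap quantifies the value of exploration.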
Practical: UCB
  git checkout master
  git pull
  cd teaching/RoboticsCourse/12-bandits
  make cleanAll; make
  ./x.exe
• What you see are returns for t = 1, .., T averaged over K runs
• Implement a better decision policy
• Play around with different bandits, e.g. θ = (.5, .6) (harder), or Gaussian bandits
The Robotics Active Learning Challenge
Autonomously explore the environment to learn what is manipulable and how
(bird research by Alex Kacelnik et al., U Oxford)
• Active Learning for ‘kinematic beliefs’
  Otte, Kulick, Toussaint & Brock: Entropy-Based Strategies for Physical Exploration of the Environment's Degrees of Freedom. IROS'14
Kulick, Otte & Toussaint: Active Exploration of Joint Dependency Structures. ICRA’15
• Application
  Bernstein, Hofer, Kulick, Martin-Martin, Baum, Toussaint & Brock: ICRA submission
More work on Active Learning
• More efficient active learning of hyperparameters:
  Kulick, Lieck & Toussaint: Cross-Entropy as a Criterion for Robust Interactive Learning of Latent Properties. NIPS workshop FILM'16
• Safe Active Learning:
  Schreiter et al.: Safe Exploration for Active Learning with Gaussian Processes. ECML'15
• In relational RL:
  Lang, Toussaint & Kersting: Exploration in Relational Domains for Model-based Reinforcement Learning. JMLR 2012
Lang, Toussaint & Kersting: Exploration in Relational Worlds. ECML’10
• Non-information-gain-based:
  Lopes, Lang & Toussaint: Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress. NIPS'12
• Application symbol learning:
  Kulick, Toussaint, Lang & Lopes: Active Learning for Teaching a Robot Grounded Relational Symbols. IJCAI'13
(2) Global Optimization
• Problem: Let x ∈ R^n, f : R^n → R; find
      min_x f(x)
  only by sampling values y_t = f(x_t). No access to ∇f or ∇²f. Observations may be noisy: y ∼ N(y | f(x_t), σ).
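Because only function values are available, even the simplest baseline is instructive: pure random search. An illustrative sketch (`random_search` is our name, not part of the course code):

```python
import random

def random_search(f, dim, T, lo=-1.0, hi=1.0, seed=0):
    """Derivative-free baseline: evaluate f at T random points, keep the best.

    Uses only sampled values y_t = f(x_t) -- no gradients."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(T):
        x = [rng.uniform(lo, hi) for _ in range(dim)]
        y = f(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# sphere function, global minimum f = 0 at the origin
best_x, best_y = random_search(lambda x: sum(v * v for v in x), dim=2, T=500)
```

Bayesian Optimization improves on this by spending each evaluation where the posterior belief over f says it is most informative, rather than uniformly at random.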
• Global Optimization = Infinite Bandits
• Gaussian Processes as belief: b_t = GP(f | D_t)
• Optimal Global Optimization? V(b_t)
• Acquisition functions:
  x_t^MPI = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) dy
  x_t^EI  = argmax_x ∫_{−∞}^{y*} N(y | f(x), σ(x)) (y* − y) dy
  x_t^UCB = argmin_x f(x) − β_t σ(x)
  where f(x) and σ(x) denote the GP posterior mean and standard deviation, and y* is the best value observed so far
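A full GP-UCB loop on a toy 1-D problem might look as follows. This is a sketch under assumptions: NumPy, a squared-exponential kernel with hand-picked hyperparameters, and a fixed query grid; `gp_posterior` and `gp_ucb_step` are our names, not the course's C++ code:

```python
import numpy as np

def gp_posterior(X, y, Xs, ell=0.3, sf=1.0, noise=1e-4):
    """GP posterior mean/std at query points Xs (1-D inputs, SE kernel)."""
    def k(A, B):
        return sf**2 * np.exp(-0.5 * ((A[:, None] - B[None, :]) / ell) ** 2)
    K = k(X, X) + noise * np.eye(len(X))   # jitter keeps K positive definite
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = k(X, Xs)
    mu = Ks.T @ alpha                      # posterior mean
    v = np.linalg.solve(L, Ks)
    var = np.maximum(sf**2 - np.sum(v**2, axis=0), 1e-12)
    return mu, np.sqrt(var)

def gp_ucb_step(f, X, y, grid, beta=2.0):
    """One GP-UCB step: query the grid point minimizing mu(x) - beta*sigma(x)."""
    mu, sigma = gp_posterior(np.array(X), np.array(y), grid)
    xn = float(grid[np.argmin(mu - beta * sigma)])
    return xn, f(xn)

# toy 1-D objective with minimum at x = 0.3
f = lambda x: (x - 0.3) ** 2
grid = np.linspace(-1.0, 1.0, 201)
X, y = [-0.8, 0.9], [f(-0.8), f(0.9)]
for _ in range(10):
    xn, yn = gp_ucb_step(f, X, y, grid)
    X.append(xn)
    y.append(yn)
```

Early iterations query far from the data, where σ(x) is large (exploration); as uncertainty shrinks, the rule concentrates queries where the posterior mean is low (exploitation).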
Practical
  git checkout master
  git pull
  cd teaching/RoboticsCourse/13-bayesOpt
  make
  ./x.exe
• This implements GP updates with random querying
• Implement an active learning strategy... discuss!
• Implement a GP-UCB strategy
Issues
• Hyperparameter choice
  – Standard proofs do not apply anymore!
  – Fully Bayesian, or Entropy Search → where to get the prior from?
  – Should we really assume a squared exponential kernel? If not, what else?
No Free Lunch