Transcript
Page 1: CPSC540 - University of British Columbia

CPSC540

Bayesian optimization,

Nando de FreitasFebruary 2013

Bayesian optimization,

bandits and

Thompson sampling

Page 2: CPSC540 - University of British Columbia

Multi-armed bandit problem

money!

Page 3: CPSC540 - University of British Columbia

Multi-armed bandit problem

Actions

Reward(s)

Sequence of trials

• Trade-off between Exploration and Exploitation• Regret = Player reward – Reward of best action

Page 4: CPSC540 - University of British Columbia
Page 5: CPSC540 - University of British Columbia

CPSC 340 5

Acq

uisi

tion

func

tion

Parameter

Page 6: CPSC540 - University of British Columbia

Exploration-exploitation tradeoff

Recall the expressions for GP prediction:

We should choose the next point x where the mean is high (exploitation) and the variance is high (exploration).

We could balance this tradeoff with an acquisition function as follows:

Page 7: CPSC540 - University of British Columbia

Acquisition functions

Page 8: CPSC540 - University of British Columbia

An acquisition function: Probability of Improvement

Page 9: CPSC540 - University of British Columbia

People as Bayesian reasoners

Page 10: CPSC540 - University of British Columbia

Utilitarian view: We need models to make the right decisions under uncertainty. Inference and decision making are intertwined.

Learned posterior Cost/Reward model u(x,a)

Bayes and decision theory

P(x=healthy|data) = 0.9

P(x=cancer|data) = 0.1

We choose the action that maximizes the expected utility, or equivalently, which minimizes the expected cost.

EU(a=no treatment) =

EU(a=treatment) =

EU(a) = u(x,a) P(x|data)xΣΣΣΣ

Page 11: CPSC540 - University of British Columbia

An expected utility criterionAt iteration n+1, choose the point that minimizes the distance to the objective evaluated at the maximum x*:

We don’t know the true objective at the maximum. To overcome this, Mockus proposed the following acquisition function:

Page 12: CPSC540 - University of British Columbia

Expected improvement

For this acquisition, we can obtain an analytical expression:

Page 13: CPSC540 - University of British Columbia

A third criterion: GP-UCB

Define the regret and cumulative regret as follows:

The GP-UCB criterion is as follows:The GP-UCB criterion is as follows:

Beta is set using a simple concentration bound:

[Srinivas et al, 2010]

Page 14: CPSC540 - University of British Columbia

A fourth criterion: Thompson sampling

Page 15: CPSC540 - University of British Columbia

Acquisition functions

Page 16: CPSC540 - University of British Columbia
Page 17: CPSC540 - University of British Columbia

Portfolios of acquisition functions help

Page 18: CPSC540 - University of British Columbia

Why Bayesian Optimization works

Page 19: CPSC540 - University of British Columbia

Intelligent user interfaces

Page 20: CPSC540 - University of British Columbia

Example: Tuning NP hard problem solvers

Page 21: CPSC540 - University of British Columbia

Why random tuning works sometimes

Page 22: CPSC540 - University of British Columbia

Example: Tuning random forests

Page 23: CPSC540 - University of British Columbia

Example: Tuning hybrid Monte Carlo

Page 24: CPSC540 - University of British Columbia

24

Page 25: CPSC540 - University of British Columbia

The games industry, rich in sophisticated large-scale simulators, is a great environment for the design and

study of automatic decision making systems.

Page 26: CPSC540 - University of British Columbia

Hierarchical policy example

– High-levelmodel-based learning for deciding when to navigate, park, pickup and dropoff passengers.

– Mid-levelactive path learning for navigating a topological map.

– Low-level active policy optimizer to learn control of continuous non-linear vehicle dynamics.

Page 27: CPSC540 - University of British Columbia

Active Path Finding in Middle Level• Mid-level Navigate policy generates sequence of waypoints on

a topological map to navigate from a location to a destination. V(θ) value function represents the path length from the current node, to the target.

Page 28: CPSC540 - University of British Columbia

Low-Level: Trajectory following

Vx

VyYerr

Ωerr

trajectory

TORCS: 3D game engine that implements complex vehicle dynamics complete with manual and automatic transmission, engine, clutch, tire, and suspension models.

Page 29: CPSC540 - University of British Columbia

Hierarchical systems apply to many robot tasks – key to build large systems

We used TORCS: A 3D game engine that implements complex vehicle dynamics complete with manual and automatic transmission, engine, clutch, tire, and suspension models.

Page 30: CPSC540 - University of British Columbia

Gaze planning

Digits Experiment:

Face Experiment:

30

Page 31: CPSC540 - University of British Columbia

Next lecture

In the next lecture, we embark on our quest to learn all about random forests. We will begin by learning about decision trees.


Top Related