

Page 1: Joelle Pineau Robotics Institute Carnegie Mellon University Stanford University May 3, 2004

Tractable Planning for Real-World Robotics:

The promises and challenges of dealing with uncertainty

Joelle Pineau, Robotics Institute

Carnegie Mellon University

Stanford University

May 3, 2004

Page 2

Robots in unstructured environments

Page 3

A vision for robotic-assisted health-care

• Moving things around
• Supporting inter-personal communication
• Calling for help in emergencies
• Monitoring Rx adherence & safety
• Providing information
• Reminding to eat, drink, & take meds
• Providing physical assistance

Page 4

What is hard about planning for real-world robotics?

• Generating a plan that has high expected utility.

• Switching between different representations of the world.

• Resolving conflicts between interfering jobs.

• Accomplishing jobs in changing, partly unknown environments.

• Handling percepts that are incomplete, ambiguous, outdated, or incorrect.

* Highlights from a proposed robot grand challenge

Page 5

Talk outline

• Uncertainty in plan-based robotics

• Partially Observable Markov Decision Processes (POMDPs)

• POMDP solver #1: Point-based value iteration (PBVI)

• POMDP solver #2: Policy-contingent abstraction (PolCA+)

Page 6

Why use a POMDP?

• POMDPs provide a rich framework for sequential decision-making, which can model:

– Effect uncertainty

– State uncertainty

– Varying rewards across actions and goals

Page 7

POMDP applications in the last decade

Robotics:

• Robust mobile robot navigation [Simmons & Koenig, 1995; + many more]
• Autonomous helicopter control [Bagnell & Schneider, 2001; Ng et al., 2003]
• Machine vision [Bandera et al., 1996; Darrell & Pentland, 1996]
• High-level robot control [Pineau et al., 2003]
• Robust dialogue management [Roy, Pineau & Thrun, 2000; Paek & Horvitz, 2000]

Others:

• Machine maintenance [Puterman, 1994]
• Network troubleshooting [Thiébaux et al., 1996]
• Circuits testing [correspondence, 2004]
• Preference elicitation [Boutilier, 2002]
• Medical diagnosis [Hauskrecht, 1997]

Page 8

POMDP model

A POMDP is a tuple { S, A, Z, T, O, R }:

S = state set
A = action set
Z = observation set

What goes on: states $s_{t-1}, s_t$; actions $a_{t-1}, a_t$
What we see: observations $z_{t-1}, z_t$

Page 9

POMDP model

A POMDP is a tuple { S, A, Z, T, O, R }:

S = state set
A = action set
Z = observation set
T: Pr(s'|s,a) = state-to-state transition probabilities

What goes on: states $s_{t-1}, s_t$; actions $a_{t-1}, a_t$
What we see: observations $z_{t-1}, z_t$

Page 10

POMDP model

A POMDP is a tuple { S, A, Z, T, O, R }:

S = state set
A = action set
Z = observation set
T: Pr(s'|s,a) = state-to-state transition probabilities
O: Pr(z|s,a) = observation generation probabilities

What goes on: states $s_{t-1}, s_t$; actions $a_{t-1}, a_t$
What we see: observations $z_{t-1}, z_t$

Page 11

POMDP model

A POMDP is a tuple { S, A, Z, T, O, R }:

S = state set
A = action set
Z = observation set
T: Pr(s'|s,a) = state-to-state transition probabilities
O: Pr(z|s,a) = observation generation probabilities
R(s,a) = reward function

What goes on: states $s_{t-1}, s_t$; actions $a_{t-1}, a_t$
What we see: observations $z_{t-1}, z_t$
Rewards: $r_{t-1}, r_t$

Page 12

POMDP model

A POMDP is a tuple { S, A, Z, T, O, R }:

S = state set
A = action set
Z = observation set
T: Pr(s'|s,a) = state-to-state transition probabilities
O: Pr(z|s,a) = observation generation probabilities
R(s,a) = reward function

What goes on: states $s_{t-1}, s_t$; actions $a_{t-1}, a_t$
What we see: observations $z_{t-1}, z_t$
Rewards: $r_{t-1}, r_t$
What we infer: beliefs $b_{t-1}, b_t$
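The "what we infer" step is the standard Bayes-filter belief update, $b'(s') = \eta\, \Pr(z \mid s', a) \sum_{s} \Pr(s' \mid s, a)\, b(s)$, where $\eta$ normalizes the result. A minimal numpy sketch follows; the array layouts `T[a, s, s']` and `O[a, s', z]` are assumptions, and the observation is conditioned on the resulting state s', a common convention that the slide's Pr(z|s,a) notation leaves implicit.

```python
import numpy as np


def belief_update(b, a, z, T, O):
    """Bayes-filter update of belief b after taking action a and seeing z.

    b: belief over states, shape (|S|,)
    T: T[a, s, s2] = Pr(s2 | s, a)
    O: O[a, s2, z] = Pr(z | s2, a)  (indexed by the resulting state s2;
       an assumed convention)
    """
    predicted = b @ T[a]                 # Pr(s2 | b, a) = sum_s b(s) T(s, a, s2)
    unnormalized = O[a, :, z] * predicted
    evidence = unnormalized.sum()        # Pr(z | b, a), the normalizer
    if evidence == 0.0:
        raise ValueError("observation z is impossible under belief b and action a")
    return unnormalized / evidence


# Hypothetical 2-state example: a sensing action that leaves the state
# unchanged and reports the true state with probability 0.85.
T = np.array([np.eye(2)])
O = np.array([[[0.85, 0.15], [0.15, 0.85]]])
print(belief_update(np.array([0.5, 0.5]), a=0, z=0, T=T, O=O))  # [0.85 0.15]
```

Starting from a uniform belief, one observation z = 0 shifts the belief to match the sensor's accuracy; repeated calls track the belief over time.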

Page 13

Examples of robot beliefs

[Figure: robot particles illustrating a uniform belief and a bi-modal belief]

Page 14

Understanding the belief state

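• A belief is a probability distribution over states, where Dim(B) = |S|-1: the probabilities must sum to 1, so one of them is always determined by the others.

– E.g. Let S = {s1, s2}: a belief is fully described by P(s1).

[Figure: the belief space as the interval from 0 to 1 on the P(s1) axis]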

Page 15

Understanding the belief state

• A belief is a probability distribution over states, where Dim(B) = |S|-1

– E.g. Let S = {s1, s2, s3}

[Figure: the belief space as a triangle in the (P(s1), P(s2)) plane, with each axis running from 0 to 1]

Page 16

Understanding the belief state

• A belief is a probability distribution over states, where Dim(B) = |S|-1

– E.g. Let S = {s1, s2, s3, s4}

[Figure: the belief space as a tetrahedron with axes P(s1), P(s2), P(s3), each running from 0 to 1]

Page 17

POMDP solving

Objective: Find the sequence of actions that maximizes the expected sum of rewards.

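$$V(b) = \max_{a \in A} \Big[ R(b,a) + \sum_{b' \in B} T(b,a,b')\, V(b') \Big]$$

where V(b) is the value function, R(b,a) is the immediate reward, and the sum over successor beliefs b' is the future reward.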

Page 18

POMDP value function

• Represent V(b) as the upper surface of a set of vectors.
– Each vector is a piece of the control policy (= action sequences).
– Dim(vector) = number of states.
• Modify / add vectors to update the value function (i.e. refine the policy).

[Figure: V(b) plotted against b = P(s1) for a problem with 2 states]
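With the value function stored as a set of vectors, evaluating it at a belief b takes one dot product per vector followed by a max, and the maximizing vector identifies which piece of the policy (and hence which action) applies at b. The sketch below assumes the vectors are rows of a numpy array; the three vectors over the states [no break-in, break-in] are made-up numbers that loosely echo the break-in example on the next slides and are not taken from the talk.

```python
import numpy as np


def evaluate_value(b, alphas):
    """Return (V(b), index of the maximizing vector) for V(b) = max_i alphas[i] . b."""
    scores = alphas @ b            # one dot product per vector
    best = int(np.argmax(scores))
    return float(scores[best]), best


# Hypothetical vectors over states [no break-in, break-in] (illustrative only).
actions = ["Go-to-bed", "Call-911", "Investigate"]
alphas = np.array([[10.0, -50.0],
                   [-5.0, 20.0],
                   [3.0, 3.0]])
for p_break_in in (0.1, 0.3, 0.8):
    value, i = evaluate_value(np.array([1.0 - p_break_in, p_break_in]), alphas)
    print(p_break_in, actions[i], round(value, 2))
```

With these numbers the upper surface is made of three pieces: Go-to-bed wins at low P(break-in), Investigate in the middle, and Call-911 at high P(break-in).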

Page 19

Optimal POMDP solving

• Simple problem: 2 states, 3 actions, 3 observations

Plan length   # vectors
0             1

[Figure: V0(b) plotted against b = P(break-in)]

Page 20

Optimal POMDP solving

• Simple problem: 2 states, 3 actions, 3 observations

Plan length   # vectors
0             1
1             3

[Figure: V1(b) plotted against b = P(break-in), with vectors for Call-911, Investigate, and Go-to-bed]

Page 21

Optimal POMDP solving

• Simple problem: 2 states, 3 actions, 3 observations

Plan length   # vectors
0             1
1             3
2             27

[Figure: V2(b) plotted against b = P(break-in), with vectors for Call-911, Investigate, and Go-to-bed]

Page 22

Optimal POMDP solving

• Simple problem: 2 states, 3 actions, 3 observations

Plan length   # vectors
0             1
1             3
2             27
3             2187

[Figure: V3(b) plotted against b = P(break-in), with vectors for Call-911, Investigate, and Go-to-bed]

Page 23

Optimal POMDP solving

• Simple problem: 2 states, 3 actions, 3 observations

Plan length   # vectors
0             1
1             3
2             27
3             2187
4             14,348,907

[Figure: V4(b) plotted against b = P(break-in), with vectors for Call-911, Investigate, and Go-to-bed]

Page 24

The curse of history

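Policy size grows exponentially with the planning horizon (in fact doubly exponentially, since n sits in the exponent of an exponent):

$$\Gamma_n = O\left(|A|^{|Z|^{n-1}}\right)$$

where Γ_n = policy size (# vectors), n = planning horizon, |A| = # actions, |Z| = # observations.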
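The growth can be reproduced with the worst-case recurrence for an unpruned exact update, $|\Gamma_n| = |A|\,|\Gamma_{n-1}|^{|Z|}$ with $|\Gamma_0| = 1$. As a matter of arithmetic, the counts tabulated on the preceding slides (1, 3, 27, 2187, 14,348,907) follow this recurrence with |A| = 3 and an exponent of 2; with the exponent set to |Z| = 3 the unpruned count grows faster still (1, 3, 81, ...). The function name and the separate `exponent` parameter are our own choices for illustration; in practice, pruning dominated vectors can reduce these worst-case counts.

```python
def exact_vector_counts(num_actions, exponent, horizon):
    """Worst-case number of vectors after each exact value update.

    Uses |Gamma_n| = |A| * |Gamma_{n-1}| ** exponent with |Gamma_0| = 1;
    for an unpruned exact backup the exponent is |Z|.
    """
    counts = [1]
    for _ in range(horizon):
        counts.append(num_actions * counts[-1] ** exponent)
    return counts


print(exact_vector_counts(3, 2, 4))  # [1, 3, 27, 2187, 14348907]
print(exact_vector_counts(3, 3, 3))  # [1, 3, 81, 1594323]
```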

Page 25

How many vectors for this problem?

$10^4$ (navigation) × $10^3$ (dialogue) states, 1000+ observations, 100+ actions

Page 26

Talk outline

• Uncertainty in plan-based robotics

• Partially Observable Markov Decision Processes (POMDPs)

• POMDP solver #1: Point-based value iteration (PBVI)

• POMDP solver #2: Policy-contingent abstraction (PolCA+)

Page 27

Exact solving assumes all beliefs are equally likely

[Figure: robot particles illustrating a uniform belief, a bi-modal belief, and an N-modal belief]

INSIGHT: No sequence of actions and observations can produce this N-modal belief.

Page 28

A new algorithm: Point-based value iteration

[Figure: V(b) plotted against P(s1), with belief points b1, b0, b2]

Approach:

• Select a small set of belief points
• Plan for those belief points only

Page 29

A new algorithm: Point-based value iteration

[Figure: V(b) plotted against P(s1); belief points b1 and b2 are reached from b0 via action-observation pairs (a, z)]

Approach:

• Select a small set of belief points → Use well-separated, reachable beliefs
• Plan for those belief points only

Page 30

A new algorithm: Point-based value iteration

[Figure: V(b) plotted against P(s1); belief points b1 and b2 are reached from b0 via action-observation pairs (a, z)]

Approach:

• Select a small set of belief points → Use well-separated, reachable beliefs
• Plan for those belief points only → Learn value and its gradient

Page 31

A new algorithm: Point-based value iteration

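Approach:

• Select a small set of belief points → Use well-separated, reachable beliefs
• Plan for those belief points only → Learn value and its gradient
• Pick action that maximizes value: $V(b) = \max_{\alpha} \alpha \cdot b$, where α ranges over the learned vectors

[Figure: V(b) plotted against P(s1), evaluated at a new belief b lying among the belief points b1, b0, b2]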
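"Plan for those belief points only" can be sketched as a point-based backup: for each belief point b and each action a, every current vector is projected through each (a, z) pair, only the projection that scores best at b is kept, and the resulting vector for the best action becomes b's new vector. This follows the backup described in the PBVI papers cited in this talk, but it is a minimal numpy sketch rather than the authors' code; the array layouts and the discount factor `gamma` (which the value equation on the earlier slide does not show) are assumptions.

```python
import numpy as np


def pbvi_backup(B, Gamma, T, O, R, gamma):
    """One point-based backup: one new vector (and its action) per belief point.

    B:     (num_beliefs, |S|) belief points
    Gamma: (num_vectors, |S|) current vectors
    T:     T[a, s, s2] = Pr(s2 | s, a)
    O:     O[a, s2, z] = Pr(z | s2, a)
    R:     R[s, a] immediate reward
    gamma: discount factor (an assumed parameter)
    """
    num_actions, num_states, _ = T.shape
    num_obs = O.shape[2]
    # proj[a, z, i, s] = sum_s2 T(s, a, s2) * O(s2, a, z) * Gamma[i, s2]
    proj = np.empty((num_actions, num_obs, Gamma.shape[0], num_states))
    for a in range(num_actions):
        for z in range(num_obs):
            proj[a, z] = Gamma @ (T[a] * O[a, :, z]).T

    new_vectors, best_actions = [], []
    for b in B:
        best_value, best_vec, best_a = -np.inf, None, None
        for a in range(num_actions):
            vec = R[:, a].astype(float)
            for z in range(num_obs):
                # keep only the projected vector that is best at this belief
                i = int(np.argmax(proj[a, z] @ b))
                vec = vec + gamma * proj[a, z, i]
            value = float(vec @ b)
            if value > best_value:
                best_value, best_vec, best_a = value, vec, a
        new_vectors.append(best_vec)
        best_actions.append(best_a)
    return np.array(new_vectors), best_actions
```

Calling this repeatedly on a fixed belief set, starting from an initial vector set (for example a lower bound such as a vector filled with R_min / (1 − γ)), performs point-based value iteration; the cost per call grows with |B| rather than with the full exact vector count.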

Page 32

The anytime PBVI algorithm

• Alternate between:

1. Growing the set of belief points

2. Planning for those belief points

• Terminate when you run out of time or have a good policy.

Page 33

Complexity of value update

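                      Exact update                     PBVI
Time: projection      $|S|^2 |A| |Z| |\Gamma_n|$       $|S|^2 |A| |Z| |\Gamma_n|$
Time: sum             $|S| |A| |\Gamma_n|^{|Z|}$       $|S| |A| |Z| |\Gamma_n| |B|$
Size (# vectors)      $|A| |\Gamma_n|^{|Z|}$           $|B|$

where |S| = # states, $|\Gamma_n|$ = # vectors at iteration n, |A| = # actions, |B| = # belief points, |Z| = # observations.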
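To get a feel for the gap, the complexity expressions on this slide can be evaluated for concrete sizes. The sketch below simply multiplies them out, ignoring constant factors; the function name and the sizes passed in (including B = 100 belief points) are hypothetical and chosen only for illustration.

```python
def backup_costs(S, A, Z, G, B):
    """Per-iteration costs from the complexity expressions (no constant factors).

    S, A, Z: numbers of states, actions and observations
    G: number of vectors at iteration n; B: number of belief points
    """
    projection = S**2 * A * Z * G            # identical for both methods
    return {
        "exact": {"projection": projection, "sum": S * A * G**Z, "size": A * G**Z},
        "pbvi": {"projection": projection, "sum": S * A * Z * G * B, "size": B},
    }


costs = backup_costs(S=2, A=3, Z=3, G=27, B=100)
print(costs["exact"]["sum"], costs["pbvi"]["sum"])  # 118098 48600
```

The key difference is the exponent: the exact cross-sum term scales as $|\Gamma_n|^{|Z|}$, while PBVI's is linear in both $|\Gamma_n|$ and |B|.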

Page 34

Theoretical properties of PBVI

Theorem: For any set of belief points B and planning horizon n, the error of the PBVI algorithm is bounded by:

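$$\|V_n^B - V_n^*\|_\infty \le \frac{(R_{\max} - R_{\min})\,\epsilon_B}{(1-\gamma)^2}, \qquad \epsilon_B = \max_{b' \in \bar{\Delta}} \min_{b \in B} \|b - b'\|_1$$

where $\bar{\Delta}$ is the set of reachable beliefs, B is the set of belief points, and γ is the discount factor.

[Figure: V(b) plotted against P(s1), with belief points b1, b0, b2]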
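Both quantities in the bound are straightforward to compute once a finite set of beliefs is available. Because the reachable set $\bar{\Delta}$ is generally infinite, the sketch below assumes it is approximated by a finite sample of reachable beliefs (for example, collected by simulation), so the resulting $\epsilon_B$ is only an estimate and can underestimate the true value. Function names are hypothetical.

```python
import numpy as np


def belief_density(B, reachable):
    """epsilon_B: worst-case L1 distance from a reachable belief to its nearest point in B."""
    B = np.asarray(B, dtype=float)
    return max(float(np.abs(B - np.asarray(bp, dtype=float)).sum(axis=1).min())
               for bp in reachable)


def pbvi_error_bound(B, reachable, r_max, r_min, gamma):
    """Right-hand side of the bound: (R_max - R_min) * epsilon_B / (1 - gamma)^2."""
    return (r_max - r_min) * belief_density(B, reachable) / (1.0 - gamma) ** 2


corners = [[1.0, 0.0], [0.0, 1.0]]
print(pbvi_error_bound(corners, [[0.5, 0.5]], r_max=1.0, r_min=0.0, gamma=0.5))  # 4.0
```

The bound shrinks as the belief points cover the reachable beliefs more densely, which motivates selecting well-separated, reachable beliefs.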

Page 35

The anytime PBVI algorithm

• Alternate between:

1. Growing the set of belief points

2. Planning for those belief points

• Terminate when you run out of time or have a good policy.

Page 36

PBVI’s belief selection heuristic

1. Leverage insight from policy search methods:

– Focus on reachable beliefs.

[Figure: belief b on the P(s1) axis and its successor beliefs $b^{a_1,z_1}$, $b^{a_1,z_2}$, $b^{a_2,z_1}$, $b^{a_2,z_2}$, one per action-observation pair]

Page 37

PBVI’s belief selection heuristic

1. Leverage insight from policy search methods:

– Focus on reachable beliefs.

2. Focus on high probability beliefs:

– Consider all actions, but stochastic observation choice.

[Figure: belief b on the P(s1) axis with one sampled successor per action, $b^{a_1,z_2}$ and $b^{a_2,z_1}$]

Page 38

PBVI’s belief selection heuristic

1. Leverage insight from policy search methods:

– Focus on reachable beliefs.

2. Focus on high probability beliefs:

– Consider all actions, but stochastic observation choice.

3. Use the error bound on point-based value updates:

– Select well-separated beliefs, rather than near-by beliefs.

[Figure: belief b on the P(s1) axis with one sampled successor per action, $b^{a_1,z_2}$ and $b^{a_2,z_1}$]
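The three rules above can be combined into one expansion step: from each current belief, try every action, sample a single observation by simulation, and keep the successor belief that lies farthest (in L1 distance) from the beliefs already selected. A minimal numpy sketch under assumed array layouts (`T[a, s, s']`, `O[a, s', z]`); it illustrates the heuristic described on these slides and is not the authors' implementation.

```python
import numpy as np


def expand_beliefs(B, T, O, rng):
    """Return B plus at most one new belief per existing point.

    T[a, s, s2] = Pr(s2 | s, a); O[a, s2, z] = Pr(z | s2, a).
    For each b in B and each action a, simulate s ~ b, s2 ~ T(s, a, .),
    z ~ O(s2, a, .) and form b^{a,z}; keep the candidate farthest (L1)
    from every belief selected so far.
    """
    B = [np.asarray(b, dtype=float) for b in B]
    new_points = []
    for b in B:
        best, best_dist = None, -1.0
        for a in range(T.shape[0]):                     # consider all actions
            s = rng.choice(len(b), p=b)                 # sample a state from b
            s2 = rng.choice(T.shape[2], p=T[a, s])      # simulate the action
            z = rng.choice(O.shape[2], p=O[a, s2])      # sample one observation
            cand = O[a, :, z] * (b @ T[a])              # Bayes update b -> b^{a,z}
            cand = cand / cand.sum()
            dist = min(np.abs(cand - x).sum() for x in B + new_points)
            if dist > best_dist:                        # prefer well-separated beliefs
                best, best_dist = cand, dist
        if best_dist > 0:                               # skip beliefs already in the set
            new_points.append(best)
    return B + new_points
```

Each call costs |B||A| simulations and at most doubles the size of the belief set; alternating it with a few rounds of point-based backups gives an anytime loop.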

Page 39

Classes of value function approximations

1. No belief [Littman et al., 1995]
2. Grid over belief [Lovejoy, 1991; Brafman, 1997; Hauskrecht, 2000; Zhou & Hansen, 2001]
3. Compressed belief [Poupart & Boutilier, 2002; Roy & Gordon, 2002]
4. Sample belief points [Poon, 2001; Pineau et al., 2003]

[Figure: one illustration per class; labeled points x0, x1, x2]

Page 40

Performance on well-known POMDPs

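Reward
Method                               Maze1   Maze2   Maze3
No belief [Littman et al., 1995]     0.20    0.11    0.26
Grid [Brafman, 1997]                 0.94    -       -
Compressed [Poupart et al., 2003]    0.00    0.07    0.11
Sample [Poon, 2001]                  2.30    0.35    0.53
PBVI [Pineau et al., 2003]           2.25    0.34    0.53

Time
Method                               Maze1   Maze2   Maze3
No belief [Littman et al., 1995]     0.19    1.44    0.51
Grid [Brafman, 1997]                 -       -       -
Compressed [Poupart et al., 2003]    24hrs   24hrs   24hrs
Sample [Poon, 2001]                  12166   27898   450
PBVI [Pineau et al., 2003]           3448    360     288

# Belief points
Method                               Maze1   Maze2   Maze3
No belief [Littman et al., 1995]     -       -       -
Grid [Brafman, 1997]                 174     337     -
Compressed [Poupart et al., 2003]    -       -       -
Sample [Poon, 2001]                  660     1840    300
PBVI [Pineau et al., 2003]           470     95      86

Maze1: 36 states; Maze2: 92 states; Maze3: 60 states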

Page 41

PBVI in the Nursebot domain

Objective: Find the patient!

State space = RobotPosition × PatientPosition

Observation space = RobotPosition + PatientFound

Action space = {North, South, East, West, Declare}

Page 42

PBVI performance on find-the-patient domain

No Belief: patient found in 17% of trials
PBVI: patient found in 90% of trials

Page 43

Validation of PBVI’s belief expansion heuristic

[Figure: belief expansion heuristics compared: PBVI, Greedy, Random, and No Belief]

Find-the-patient domain: 870 states, 5 actions, 30 observations

Page 44

Policy assuming full observability

You find some (25%)

You lose some (75%)

Page 45

PBVI Policy with 3141 belief points

You find some (81%)

You lose some (19%)

Page 46

PBVI Policy with 643 belief points

You find some (22%)

You lose some (78%)

Page 47

Highlights of the PBVI algorithm

• Algorithmic:

– New belief sampling algorithm.

– Efficient heuristic for belief point selection.

– Anytime performance.

• Experimental:

– Outperforms previous value approximation algorithms on known problems.

– Solves a new, larger problem (one order of magnitude increase in problem size).

• Theoretical:

– Bounded approximation error.

[ Pineau, Gordon & Thrun, IJCAI 2003. Pineau, Gordon & Thrun, NIPS 2003. ]

Page 48

Back to the grand challenge

How can we go from $10^3$ states to real-world problems?

Page 49

Talk outline

• Uncertainty in plan-based robotics

• Partially Observable Markov Decision Processes (POMDPs)

• POMDP solver #1: Point-based value iteration (PBVI)

• POMDP solver #2: Policy-contingent abstraction (PolCA+)

Page 50

Structured POMDPs

Many real-world decision-making problems exhibit structure inherent to the problem domain.

[Figure: a structured controller with nodes High-level controller; Navigation, Cognitive support, Social interaction; Move, AskWhere; Left, Right, Forward, Backward]

Page 51

Structured POMDP approaches

• Factored models [Boutilier & Poole, 1996; Hansen & Feng, 2000; Guestrin et al., 2001]
– Idea: Represent the state space with multi-valued state features.
– Insight: Independencies between state features can be leveraged to overcome the curse of dimensionality.

• Hierarchical POMDPs [Wiering & Schmidhuber, 1997; Theocharous et al., 2000; Hernandez-Gardiol & Mahadevan, 2000; Pineau & Thrun, 2000]
– Idea: Exploit domain knowledge to divide one POMDP into many smaller ones.
– Insight: Smaller action sets further help overcome the curse of history.

Page 52

A hierarchy of POMDPs

Act
    ExamineHealth
        VerifyFluids
        VerifyMeds
    Navigate
        Move
            North, South, East, West
        ClarifyGoal

[Node types: subtask, abstract action, primitive action]
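One way to picture how such a hierarchy drives a robot: each subtask's action set is its list of children, and at run time control descends from the root, with each subtask's policy picking one child given the current belief, until a primitive action is reached. The sketch assumes this top-down selection scheme; the `Task` class and the hard-coded `policy` functions are hypothetical stand-ins for the per-subtask POMDP policies a planner such as PolCA+ would compute.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Belief = Sequence[float]


@dataclass
class Task:
    """A node in a task hierarchy: a subtask (with children) or a primitive action."""
    name: str
    children: List["Task"] = field(default_factory=list)
    # Hypothetical per-subtask policy mapping a belief to the name of one child.
    policy: Optional[Callable[[Belief], str]] = None

    def action_set(self) -> List[str]:
        """A subtask's actions are its children (abstract or primitive)."""
        return [child.name for child in self.children]


def select_primitive(root: Task, belief: Belief) -> List[str]:
    """Descend from the root, letting each subtask's policy choose a child,
    until a primitive action (a leaf) is reached. Returns the path taken."""
    task, path = root, [root.name]
    while task.children:
        choice = task.policy(belief)
        task = next(child for child in task.children if child.name == choice)
        path.append(task.name)
    return path


# Hypothetical fixed policies standing in for planned ones.
move = Task("Move", [Task(n) for n in ("North", "South", "East", "West")],
            policy=lambda b: "North")
navigate = Task("Navigate", [move, Task("ClarifyGoal")], policy=lambda b: "Move")
examine = Task("ExamineHealth", [Task("VerifyFluids"), Task("VerifyMeds")],
               policy=lambda b: "VerifyMeds")
act = Task("Act", [examine, navigate], policy=lambda b: "Navigate")
print(select_primitive(act, belief=[1.0]))  # ['Act', 'Navigate', 'Move', 'North']
```

Because each subtask only chooses among its own children, each policy is planned over a small action set, which is the point of the hierarchy.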

Page 53

PolCA+: Planning with a hierarchy of POMDPs

[Figure: Navigate → {Move, ClarifyGoal}; Move → {North, South, East, West}]

ACTIONS: North, South, East, West, ClarifyGoal, VerifyFluids, VerifyMeds

Step 1: Select the action set: $A_{Move}$ = {N, S, E, W}

Page 54

PolCA+: Planning with a hierarchy of POMDPs

[Figure: Navigate → {Move, ClarifyGoal}; Move → {North, South, East, West}]

ACTIONS: North, South, East, West, ClarifyGoal, VerifyFluids, VerifyMeds
STATE FEATURES: X-position, Y-position, X-goal, Y-goal, HealthStatus

Step 1: Select the action set: $A_{Move}$ = {N, S, E, W}
Step 2: Minimize the state set: $S_{Move}$ = {s1, s2}
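Step 2 shrinks each subtask's state set (here to $S_{Move}$ = {s1, s2}). As an illustration of the kind of reduction involved, the sketch below performs iterative partition refinement in the style of MDP model minimization, restricted to one subtask's action set: start from groups of states with equal rewards, then split any group whose members send different probability mass into the current groups. This is a swapped-in, simplified technique: it ignores observations, which PolCA+'s POMDP-specific abstraction also accounts for, and the list-of-lists layouts `R[s][a]` and `T[a][s][s']` and all function names are assumptions.

```python
def _signature(s, block, num_blocks, actions, T, decimals):
    """Current block of s plus, per action, the probability mass s sends into each block."""
    per_action = []
    for a in actions:
        mass = [0.0] * num_blocks
        for s2, p in enumerate(T[a][s]):
            mass[block[s2]] += p
        per_action.append(tuple(round(m, decimals) for m in mass))
    return block[s], tuple(per_action)


def _relabel(keys):
    """Map each distinct key to a block id in order of first appearance."""
    ids = {}
    return [ids.setdefault(k, len(ids)) for k in keys]


def minimize_states(R, T, actions, decimals=9):
    """Partition states that the given actions cannot distinguish.

    R[s][a]: immediate reward; T[a][s][s2]: transition probability.
    Returns one block id per state.
    """
    num_states = len(R)
    # Initial partition: states with identical rewards under `actions`.
    block = _relabel(tuple(round(R[s][a], decimals) for a in actions)
                     for s in range(num_states))
    while True:
        num_blocks = len(set(block))
        refined = _relabel(_signature(s, block, num_blocks, actions, T, decimals)
                           for s in range(num_states))
        if len(set(refined)) == num_blocks:   # no block was split: stable
            return refined
        block = refined


# Hypothetical 4-state example with a single action.
R = [[0.0], [0.0], [1.0], [1.0]]
T = [[[0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]]
print(minimize_states(R, T, actions=[0]))  # [0, 0, 1, 1]
```

In the example, states 0 and 1 earn the same reward and both move into the rewarding group, so they collapse into one abstract state, much as $S_{Move}$ collapses many concrete states into s1 and s2.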

Page 55

PolCA+: Planning with a hierarchy of POMDPs

[Figure: Navigate → {Move, ClarifyGoal}; Move → {North, South, East, West}]

ACTIONS: North, South, East, West, ClarifyGoal, VerifyFluids, VerifyMeds
STATE FEATURES: X-position, Y-position, X-goal, Y-goal, HealthStatus
PARAMETERS: $\{b_h, T_h, O_h, R_h\}$

Step 1: Select the action set: $A_{Move}$ = {N, S, E, W}
Step 2: Minimize the state set: $S_{Move}$ = {s1, s2}
Step 3: Choose parameters

Page 56

PolCA+: Planning with a hierarchy of POMDPs

[Figure: Navigate → {Move, ClarifyGoal}; Move → {North, South, East, West}]

ACTIONS: North, South, East, West, ClarifyGoal, VerifyFluids, VerifyMeds
STATE FEATURES: X-position, Y-position, X-goal, Y-goal, HealthStatus
PARAMETERS: $\{b_h, T_h, O_h, R_h\}$
PLAN: $\pi_h$

Step 1: Select the action set: $A_{Move}$ = {N, S, E, W}
Step 2: Minimize the state set: $S_{Move}$ = {s1, s2}
Step 3: Choose parameters
Step 4: Plan task h

Page 57

PolCA+ in the Nursebot domain

• Goal: A robot is deployed in a nursing home, where it provides reminders to elderly users and accompanies them to appointments.

Page 58

Performance measure

[Figure: cumulative reward (from -2000 to 14000) vs. execution time steps (0 to 1200) for PolCA+ (hierarchy + belief), PolCA, and QMDP]

Page 59

Comparing user performance

[Figure: user performance under the POMDP policy compared with the No Belief policy]

Page 60

Visit to the nursing home

Page 61

Highlights of the PolCA+ algorithm

• Algorithmic:
– New hierarchical approach for the POMDP framework.
– POMDP-specific state and observation abstraction methods.

• Experimental:
– First instance of a POMDP-based high-level robot controller.
– Novel application of POMDPs to robust dialogue management.

• Theoretical:
– For the special case (fully observable), guarantees recursive optimality.

[ Pineau, Gordon & Thrun, UAI 2003. Pineau et al., RAS 2003. Roy, Pineau & Thrun, ACL 2001]

Page 62

Future work

PBVI:

• How can we handle domains with multi-valued state features?

• Can we leverage dimensionality reduction?

• Can we find better ways to pick belief points?

PolCA+:

• Can we automatically learn hierarchies?

• How can we learn (or do without) pseudo-reward functions?

Page 63

Questions?

Project information: www.cs.cmu.edu/~nursebot

Navigation software: www.cs.cmu.edu/~carmen

Papers and more: www.cs.cmu.edu/~jpineau

Collaborators: Geoffrey Gordon, Judith Matthews, Michael Montemerlo, Martha Pollack, Nicholas Roy, Sebastian Thrun

Page 64

Two types of uncertainty

Effect → Stochastic action effects

State → Partial and noisy sensor information

Page 65

Example: Effect uncertainty

[Figure: a motion action taken from a start position yields a distribution over possible next-step positions]

Page 66

Two types of uncertainty

Effect → Stochastic action effects

State → Partial and noisy sensor information

Page 67

Example: State uncertainty

Effect → Stochastic action effects

State → Partial and noisy sensor information

Model → Inaccurate parameterization of the environment

Agent → Unknown behaviour of other agents

[Figure: start position and distribution over possible next-step positions]

Page 68

Validation of PBVI’s belief expansion heuristic

[Figure: belief expansion heuristics compared, including No Belief]

Hallway domain: 60 states, 5 actions, 20 observations

Page 69

State space reduction

[Figure: number of states (0 to 4500) per subtask (act, subInform, subMove, subContact, subRest, subAssist, subRemind) under NoAbs, PolCA, and PolCA+; annotated "No hierarchy" vs. "PolCA+"]

Page 70

Future directions

• Improving POMDP planning

– sparser belief space sampling, ordered value updating, dimensionality reduction, continuous / hybrid domains

• Addressing two more types of uncertainty:
1. Effect
2. State
3. Model
4. Agent

• Exploring new applications