Confidence Based Autonomy: Policy Learning by Demonstration Manuela M. Veloso Thanks to Sonia Chernova Computer Science Department Carnegie Mellon University Grad AI – Spring 2013


Page 1:

Confidence Based Autonomy: Policy Learning by Demonstration

Manuela M. Veloso

Thanks to Sonia Chernova

Computer Science Department, Carnegie Mellon University

Grad AI – Spring 2013

Page 2:

Task Representation

• Robot state: $s = (f_1, \ldots, f_n)$

• Robot actions: $A = \{a_1, \ldots, a_k\}$

• Training dataset: $D = \{(s_i, a_i) : a_i \in A,\ i = 1, \ldots, n\}$

• Policy as classifier (e.g., Gaussian Mixture Model, Support Vector Machine): $C : s \to (a_p, db, c_{db})$
– $a_p$: policy action
– $db$: decision boundary with greatest confidence for the query
– $c_{db}$: classification confidence w.r.t. decision boundary

[Figure: sensor data in feature space, axes $f_1$ and $f_2$.]
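
To make the classifier interface concrete, here is a minimal sketch of a policy that returns the triple $(a_p, db, c_{db})$. It swaps in a nearest-centroid classifier for the GMM or SVM named on the slide, purely to stay dependency-free. The names `train_centroids` and `classify`, the choice of the (nearest, runner-up) action pair as the decision boundary, and the margin-style confidence are all assumptions made for illustration, not definitions from the slides.

```python
import math
from collections import defaultdict

def train_centroids(dataset):
    """Average the demonstrated states of each action (one centroid per action)."""
    groups = defaultdict(list)
    for state, action in dataset:
        groups[action].append(state)
    return {a: tuple(sum(col) / len(states) for col in zip(*states))
            for a, states in groups.items()}

def classify(centroids, s):
    """Return (a_p, db, c_db) for query state s. Needs at least two actions.

    db is the (nearest, runner-up) action pair and c_db is a margin in [0, 1];
    both are illustrative stand-ins for a real classifier's outputs.
    """
    ranked = sorted(centroids, key=lambda a: math.dist(s, centroids[a]))
    a_p, runner_up = ranked[0], ranked[1]
    d1 = math.dist(s, centroids[a_p])
    d2 = math.dist(s, centroids[runner_up])
    c_db = (d2 - d1) / (d2 + d1) if d2 + d1 > 0 else 0.0
    return a_p, (a_p, runner_up), c_db

# Hypothetical two-feature dataset D = {(s_i, a_i)}
D = [((0.0, 0.0), "left"), ((0.0, 1.0), "left"),
     ((4.0, 0.0), "right"), ((4.0, 1.0), "right")]
centroids = train_centroids(D)
a_p, db, c_db = classify(centroids, (1.0, 0.5))
print(a_p, db, c_db)
```

The confidence shrinks toward 0 as the query approaches the midpoint between the two centroids, which is the kind of signal the later slides threshold on.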

Page 3:

Confidence-Based Autonomy Assumptions

• Teacher understands and can demonstrate the task

• High-level task learning
– Discrete actions
– Non-negligible action duration

• State space contains all information necessary to learn the task policy

• Robot is able to stop to request demonstration
– … however, the environment may continue to change

Page 4:

Confident Execution

[Figure: flowchart of Confident Execution over a timeline of states $s_1, s_2, s_3, s_4, \ldots, s_i, \ldots, s_t$.]

• The current state $s_i$ is passed to the policy, which returns $(a_p, db, c_{db})$.

• Request demonstration?
– No: execute action $a_p$.
– Yes: request demonstration $a_d$, add training point $(s_i, a_d)$, relearn classifier, execute action $a_d$.
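
The flowchart can be read as a single timestep of a loop. The sketch below is one possible rendering, assuming hypothetical callables `train`, `is_confident`, and `ask_teacher`. The toy 1-D nearest-neighbor policy and the `max_gap` test are placeholders for illustration only, not the selection rule used in the slides.

```python
def confident_execution_step(s_i, policy, dataset, train, is_confident, ask_teacher):
    """One timestep. Returns (action to execute, current policy)."""
    a_p, db, c_db = policy(s_i)
    if is_confident(s_i, db, c_db, dataset):
        return a_p, policy                  # autonomous: execute policy action a_p
    a_d = ask_teacher(s_i)                  # request demonstration
    dataset.append((s_i, a_d))              # add training point (s_i, a_d)
    policy = train(dataset)                 # relearn classifier
    return a_d, policy                      # execute demonstrated action a_d

# --- Toy setup, all hypothetical ---
def train(dataset):
    points = list(dataset)                  # snapshot of the training data
    def policy(s):
        if not points:
            return None, None, 0.0
        _, action = min(points, key=lambda p: abs(p[0] - s))
        return action, None, 1.0
    return policy

def is_confident(s, db, c_db, dataset, max_gap=1.0):
    # Placeholder test: confident only near a previously demonstrated state.
    return bool(dataset) and min(abs(p[0] - s) for p in dataset) <= max_gap

def teacher(s):
    return "left" if s < 0 else "right"

dataset, policy, log = [], train([]), []
for s in [-2.0, 2.0, -1.5, 2.5, 0.2]:
    action, policy = confident_execution_step(s, policy, dataset, train, is_confident, teacher)
    log.append(action)
print(log, len(dataset))
```

In this run the robot asks for help on three of the five states and acts autonomously on the two that fall near earlier demonstrations.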

Page 5:

Demonstration Selection

• When should the robot request a demonstration?
– To obtain useful training data
– To restrict autonomy in areas of uncertainty

Page 6:

Fixed Confidence Threshold

• Why not apply a fixed classification confidence threshold?

– Example: $\tau_{conf} = 0.5$

– Simple

– How to select a good threshold value?

Page 7:

Confident Execution Demonstration Selection

• Distance parameter $\tau_{dist}$
– Used to identify outliers and unexplored regions of state space

• Set of confidence parameters $\tau_{conf}$
– Used to identify ambiguous state regions in which more than one action is applicable

Page 8:

Confident Execution Distance Parameter

• Distance parameter $\tau_{dist}$

Given $D = \{(s_i, a_i) : a_i \in A,\ i = 1, \ldots, n\}$,

$$\tau_{dist} = \frac{\sum_{i=1}^{n} NND(p_i, D)}{n}$$

where

$$NND(p, D) = \min_{1 \le j \le n} dist(\hat{p}, \hat{s}_j)$$

Given state query $s$, request demonstration if $NND(s, D) > \tau_{dist}$.

[Figure: query state $s$ with its nearest-neighbor distance $NND(s, D)$ compared against $\tau_{dist}$.]
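
A small sketch of the distance test, under two labeled assumptions: `dist` is plain Euclidean distance on raw features, and when averaging over the dataset each point is compared against the other points only (otherwise every nearest-neighbor distance would be zero). The function names are hypothetical.

```python
import math

def nnd(p, states):
    """Nearest-neighbor distance from point p to a collection of states."""
    return min(math.dist(p, s) for s in states)

def distance_threshold(dataset):
    """tau_dist: mean nearest-neighbor distance across the dataset.

    Assumption: a point is not counted as its own neighbor.
    """
    states = [s for s, _ in dataset]
    total = 0.0
    for i, p in enumerate(states):
        others = states[:i] + states[i + 1:]
        total += nnd(p, others)
    return total / len(states)

def request_by_distance(s, dataset, tau_dist):
    """True when the query lies farther from the data than tau_dist."""
    return nnd(s, [p for p, _ in dataset]) > tau_dist

# Hypothetical dataset: four demonstrated states on a unit square
D = [((0.0, 0.0), "stay"), ((1.0, 0.0), "stay"),
     ((0.0, 1.0), "left"), ((1.0, 1.0), "left")]
tau_dist = distance_threshold(D)
print(tau_dist)
print(request_by_distance((0.5, 0.5), D, tau_dist))   # inside the explored region
print(request_by_distance((3.0, 3.0), D, tau_dist))   # far from all demonstrations
```

The query at the center of the square is closer to the data than the typical spacing between demonstrations, so no request is made. The far-away query triggers one.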

Page 9:

Confident Execution Confidence Parameters

• Set of confidence parameters $\tau_{conf}$
– One for each decision boundary $db$

Given $D = \{(s_i, a_i) : a_i \in A,\ i = 1, \ldots, n\}$ and classifier $C : s \to (a_p, db, c_{db})$,

$$\tau_{conf_{db}} = \frac{\sum_{i=1}^{|M_{db}|} conf_{db}(s_i)}{|M_{db}|}$$

where

$$M_{db} = \{(s_i, a_i, a_{p_i}, conf_{db}(s_i)) : a_{p_i} \neq a_i\}$$

Given state query $s$, request demonstration if $conf_{db}(s) < \tau_{conf_{db}}$.

[Figure: query state $s$ near decision boundary $db$.]
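
The per-boundary threshold can be sketched as the mean confidence over the training points that the current classifier gets wrong. The code below assumes a hypothetical `classify(s)` returning `(a_p, db, c_db)`. Falling back to a threshold of 0.0 for a boundary with no misclassified points is an assumption of this sketch, as is the toy 1-D classifier.

```python
from collections import defaultdict

def confidence_thresholds(dataset, classify):
    """Map each decision boundary db to the mean confidence of the points in M_db."""
    mistakes = defaultdict(list)            # db -> confidences of misclassified points
    for s_i, a_i in dataset:
        a_p, db, c = classify(s_i)
        if a_p != a_i:                      # the point belongs to M_db
            mistakes[db].append(c)
    return {db: sum(cs) / len(cs) for db, cs in mistakes.items()}

def request_by_confidence(s, classify, tau_conf):
    """True when the confidence for s falls below its boundary's threshold."""
    a_p, db, c = classify(s)
    # Assumption: a boundary with no recorded mistakes never triggers a request.
    return c < tau_conf.get(db, 0.0)

# Toy 1-D classifier with a single boundary at s = 0 (hypothetical)
def classify(s):
    a_p = "right" if s >= 0 else "left"
    return a_p, "left|right", min(1.0, abs(s))

D = [(-2.0, "left"), (-0.4, "right"), (0.2, "left"), (3.0, "right")]
tau_conf = confidence_thresholds(D, classify)
print(tau_conf)
print(request_by_confidence(0.1, classify, tau_conf))
print(request_by_confidence(1.5, classify, tau_conf))
```

Because the threshold is derived from where the classifier actually errs, each boundary gets its own value and none has to be tuned by hand.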

Page 10:

Confident Execution

[Figure: Confident Execution flowchart with the demonstration-request test filled in.]

• The current state $s_i$ is passed to the policy, which returns $(a_p, db, c_{db})$.

• Request demonstration if $conf_{db}(s_i) < \tau_{conf_{db}}$ or $NND(s_i, D) > \tau_{dist}$.
– No: execute action $a_p$.
– Yes: request demonstration $a_d$, add training point $(s_i, a_d)$, relearn classifier, execute action $a_d$.
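
The request test on this slide is a disjunction of the confidence check and the distance check. As a minimal sketch, with a hypothetical function name and made-up numbers:

```python
def needs_demonstration(conf_s, tau_conf_db, nnd_s, tau_dist):
    """True when the robot should stop and ask the teacher."""
    ambiguous = conf_s < tau_conf_db        # low confidence w.r.t. the decision boundary
    unfamiliar = nnd_s > tau_dist           # far from every demonstrated state
    return ambiguous or unfamiliar

# Illustrative values only
cases = {
    "confident and familiar": needs_demonstration(0.9, 0.3, 0.5, 1.0),
    "ambiguous":              needs_demonstration(0.1, 0.3, 0.5, 1.0),
    "unfamiliar":             needs_demonstration(0.9, 0.3, 2.0, 1.0),
    "both":                   needs_demonstration(0.1, 0.3, 2.0, 1.0),
}
print(cases)
```

Either condition alone is enough to hand control to the teacher, so a high classifier confidence does not excuse a query far outside the explored state space.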

Page 11:

Confidence-Based Autonomy

[Figure: flowchart combining Confident Execution with Corrective Demonstration.]

• Confident Execution: the current state $s_i$ is passed to the policy, which returns $(a_p, db, c_{db})$. Request demonstration?
– No: execute action $a_p$.
– Yes: request demonstration $a_d$, add training point $(s_i, a_d)$, relearn classifier, execute action $a_d$.

• Corrective Demonstration: the teacher may supply a corrective action $a_c$; add training point $(s_i, a_c)$ and relearn classifier.
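
To show how the two components fit together, here is a hedged sketch of one CBA timestep. The callables `needs_demo`, `demonstrate`, and `correct` are hypothetical, with `correct` returning `None` when the teacher accepts the executed action. The toy learner is deliberately overconfident so that the corrective path is exercised.

```python
def cba_step(s_i, policy, dataset, train, needs_demo, demonstrate, correct):
    """One CBA timestep. Returns (executed action, current policy)."""
    a_p, db, c_db = policy(s_i)
    if needs_demo(s_i, db, c_db, dataset):
        a_d = demonstrate(s_i)              # Confident Execution: ask before acting
        dataset.append((s_i, a_d))
        return a_d, train(dataset)
    a_c = correct(s_i, a_p)                 # Corrective Demonstration: teacher reviews a_p
    if a_c is not None:
        dataset.append((s_i, a_c))          # add training point (s_i, a_c)
        policy = train(dataset)             # relearn classifier
    return a_p, policy

# --- Toy setup, all hypothetical ---
def train(dataset):
    points = list(dataset)                  # snapshot of the training data
    def policy(s):
        if not points:
            return None, None, 0.0
        _, action = min(points, key=lambda p: abs(p[0] - s))
        return action, None, 1.0
    return policy

def teacher(s):
    return "left" if s < 0 else "right"

def needs_demo(s, db, c_db, dataset):
    return not dataset                      # overconfident: asks only when it has no data

def correct(s, a_p):
    wanted = teacher(s)
    return None if wanted == a_p else wanted

dataset, policy, executed = [], train([]), []
for s in [-2.0, 3.0, 2.5]:
    action, policy = cba_step(s, policy, dataset, train, needs_demo, teacher, correct)
    executed.append(action)
print(executed, dataset)
```

At the second state the robot wrongly executes "left" without asking. The correction adds $(3.0, \text{right})$ to the dataset, and the third state is then handled correctly without further help.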

Page 12:

Evaluation in Driving Domain

Introduced by Abbeel and Ng, 2004

Task: Teach the agent to drive on the highway
– Fixed driving speed
– Pass slower cars and avoid collisions

State: current lane, nearest car lane 1, nearest car lane 2, nearest car lane 3

Actions: merge left, merge right, stay in lane

Page 13:

Evaluation in Driving Domain

| Demonstration Selection Method | # Demonstrations | Collision Timesteps |
|---|---|---|
| “Teacher knows best” | 1300 | 2.7% |
| Confident Execution, fixed conf | 1016 | 3.8% |
| Confident Execution, dist & mult. conf | 504 | 1.9% |
| CBA | 703 | 0% |

[Figure: CBA Final Policy]

Page 14:

Demonstrations Over Time

[Figure: demonstrations over time; series: Total Demonstrations, Confident Execution, Corrective Demonstration.]

Page 16:

Summary

Confidence-Based Autonomy algorithm
– Confident Execution demonstration selection
– Corrective Demonstration

Page 17:

What did we do today?

• (PO)MDPs: need to generate a good policy
– Assumes the agent has some method for estimating its state (given current belief state and action, observation, where do I think I am now?)
– How do we estimate this?

• Discrete latent states → HMMs (simplest DBNs)

• Continuous latent states, observed states drawn from Gaussian, linear dynamical system → Kalman filters
– (Assumptions relaxed by Extended Kalman Filter, etc.)

• Not analytic → particle filters
– Take weighted samples (“particles”) of an underlying distribution

• We’ve mainly looked at policies for discrete state spaces

• For continuous state spaces, can use LfD:
– ML gives us a good-guess action based on past actions
– If we’re not confident enough, ask for help!