Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Basia Korel, Brown University, cs2950-z, February 15, 2010

TRANSCRIPT

Page 1: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Basia Korel
Brown University cs2950-z

February 15, 2010

Page 2: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Outline
Overview
Demonstration Learning Algorithm
Confident Execution
Corrective Demonstration
Limitations
Option Class Algorithm
Experiments and Results
Conclusion

Page 3: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Addressing: equivalent action choices
The context: learning from demonstration
In the real world: equivalent actions are demonstrated arbitrarily and inconsistently

Page 4: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Resulting problem: labeled training data lacks consistency
Contribution: identify, represent and enact equivalent action choices
  Identify conflicting demonstrations
  Represent choice of multiple actions in the policy
Common assumption of previous approaches: each state maps to one best action

Page 5: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Learning equivalent actions is built upon:
  Confident Execution: to obtain teacher demonstrations and learn the action policy
  Corrective Demonstration: to correct execution mistakes by additional demonstrations

Page 6: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

An interactive learning algorithm. Given the current world state, the robot:
  Determines the need for a demonstration based on a confidence measure
  May request demonstrations to improve its policy
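The decision step above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `confident_execution_step`, the fixed `threshold` value, and the `classify` and `ask_teacher` callables are all hypothetical, and retraining the classifier on the grown dataset is left out.

```python
from typing import Callable, Hashable, List, Tuple

State = Tuple[float, ...]
Action = Hashable


def confident_execution_step(
    state: State,
    classify: Callable[[State], Tuple[Action, float]],
    ask_teacher: Callable[[State], Action],
    dataset: List[Tuple[State, Action]],
    threshold: float = 0.8,  # hypothetical confidence threshold
) -> Action:
    """Act autonomously when confident; otherwise request a demonstration."""
    action, confidence = classify(state)
    if confidence >= threshold:
        # Confident enough: execute the action chosen by the policy.
        return action
    # Low confidence: request a demonstration and keep it as training data.
    demonstrated = ask_teacher(state)
    dataset.append((state, demonstrated))
    return demonstrated
```

With a classifier that reports low confidence, the call returns the teacher's action and appends one labeled point to `dataset`; with high confidence, the dataset is left unchanged.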

Page 7: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Robot’s policy represented by classifier C : s → (a, c, db)
  Trained using states as inputs and actions as labels
  Measure of action selection confidence
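As an illustration of a policy that maps a state to an action together with a confidence value, here is a small sketch. It assumes a k-nearest-neighbour vote as a stand-in classifier, which the slides do not specify; the name `knn_policy` and the parameter `k` are hypothetical, and the `db` output is omitted because the slide does not define it. The demonstration list must be non-empty.

```python
from collections import Counter
from math import dist
from typing import List, Tuple

Demo = Tuple[Tuple[float, ...], str]  # (state, action label)


def knn_policy(state: Tuple[float, ...], demos: List[Demo], k: int = 3) -> Tuple[str, float]:
    """Return (action, confidence) from a k-nearest-neighbour vote."""
    # Take the k demonstrations whose states are closest to the query state.
    nearest = sorted(demos, key=lambda d: dist(state, d[0]))[:k]
    votes = Counter(action for _, action in nearest)
    action, count = votes.most_common(1)[0]
    # Confidence is the share of neighbours that agree with the winning action.
    return action, count / len(nearest)
```

In this sketch, conflicting labels among nearby demonstrations lower the confidence, which is the situation the later slides on equivalent actions address.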

Page 8: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

An algorithm that lets the teacher correct unwanted actions by providing supplementary corrective demonstrations
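One way to picture a corrective demonstration is as an extra labeled point added after the robot has acted. The sketch below is only an assumption about how such a step could look; `apply_correction` and its parameters are hypothetical names, and retraining the policy afterwards is not shown.

```python
from typing import List, Optional, Tuple

State = Tuple[float, ...]


def apply_correction(
    state: State,
    executed_action: str,
    teacher_correction: Optional[str],
    dataset: List[Tuple[State, str]],
) -> bool:
    """Record a corrective demonstration; return True if one was added."""
    # No feedback, or the teacher agrees with the executed action: nothing to add.
    if teacher_correction is None or teacher_correction == executed_action:
        return False
    # Store the state with the action the teacher says should have been taken.
    dataset.append((state, teacher_correction))
    return True
```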

Page 9: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Assumptions made:
  One-to-one state-action mapping
  Consistent demonstrations
  A complete policy given enough demonstrations
Assumptions may fail in the real world!
  Multiple equivalent actions cause ambiguity
  Robot sensor noise may cause inconsistency

Page 10: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Option class: a cluster of data points that have been labeled with at least two different actions

Algorithm: extracts and explicitly models option classes in the robot’s policy

Page 11: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

given demonstration dataset D
M ← PointsInLowConfidenceRegion(D)
d ← MeanNearestNeighborDist(D)
C ← ConnectedComponents(M, d)
for c ∈ C do
  A ← ActionClasses(c)
  if Size(c) > 3 and Size(A) > 1 then
    CreateClass(D, c, Option-A)
UpdateClassifier(D)
ResetClass(D)
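The clustering part of the pseudocode can be sketched in Python as shown below. This is a rough illustration under several assumptions rather than the authors' code: the low-confidence points M are passed in directly instead of being derived from a classifier, two points are linked when they lie within the mean nearest-neighbour distance d of each other, the function names and the "Option-left/right" label format are hypothetical, and the UpdateClassifier and ResetClass steps are omitted. The dataset needs at least two points.

```python
from math import dist
from typing import Dict, List, Tuple

Demo = Tuple[Tuple[float, ...], str]  # (state, action label)


def mean_nearest_neighbor_dist(points: List[Demo]) -> float:
    """Average distance from each point to its closest other point."""
    nearest = [
        min(dist(p[0], q[0]) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)


def connected_components(points: List[Demo], d: float) -> List[List[int]]:
    """Group point indices, linking any two points at most d apart."""
    unseen = set(range(len(points)))
    components = []
    while unseen:
        stack = [unseen.pop()]
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            linked = {j for j in unseen if dist(points[i][0], points[j][0]) <= d}
            unseen -= linked
            stack.extend(linked)
        components.append(sorted(component))
    return components


def find_option_classes(dataset: List[Demo], low_confidence: List[Demo]) -> Dict[int, str]:
    """Map indices into low_confidence to an option label for ambiguous clusters."""
    d = mean_nearest_neighbor_dist(dataset)
    relabeled: Dict[int, str] = {}
    for component in connected_components(low_confidence, d):
        actions = sorted({low_confidence[i][1] for i in component})
        # Same test as the pseudocode: more than 3 points, more than 1 action.
        if len(component) > 3 and len(actions) > 1:
            label = "Option-" + "/".join(actions)
            for i in component:
                relabeled[i] = label
    return relabeled
```

A cluster of four neighbouring points labeled with both "left" and "right" would be relabeled as a single option class here, while a small or single-action cluster is left alone.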

Page 12: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Obstacle avoidance domain:

Gathered data:

Page 13: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Evaluation: Confident Execution with and without option classes
Metrics:
  % of complete policies
  # of demonstrations
  NOT classification accuracy
Results with option classes:
  Converges to a complete policy with much higher frequency
  Far fewer demonstrations required

Page 14: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)
Page 15: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Multiple equivalent actions exist in the real world

Model action choices explicitly in the policy

Domain limitations: discrete action labels

Page 16: Learning Equivalent Action Choices from Demonstration (S. Chernova and M. Veloso)

Chad Jenkins, Brown RLAB and cs2950-z course staff/leaders