Robert Platt, Colin Kohler, Marcus Gualtieri … · Background: Grasp Pose Detection (GPD) Our work:...

Deictic Image Mapping. Robert Platt, Colin Kohler, Marcus Gualtieri. Computer Science, Northeastern University.


TRANSCRIPT

Page 1

Deictic Image Mapping

Robert Platt
Colin Kohler
Marcus Gualtieri

Computer Science
Northeastern University

Page 2

Goal: open world manipulation

Task: pick up stuff from floor and throw it out
– novel objects
– unstructured environments
– novel starting configurations

Page 3

Background: Grasp Pose Detection (GPD)

Our work: Gualtieri et al., ICRA 2016; ten Pas et al., IJRR 2017
Work by others: Mahler et al., RSS 2017; Levine et al., ISER 2016

ROS Package: https://github.com/atenpas/gpd

Page 4

This talk: open world manipulation

Evaluation: define task with built-in variation
– e.g. make coffee w/ single cup maker
– e.g. put dishes in dish washer

Axes of variation:
– variation over machines/mugs
– variation over object pose

Page 5

This talk: open world manipulation

Challenge: interact with objects w/o full models

If object pose is known:

Page 6

This talk: open world manipulation

Challenge: interact with objects w/o full models

If object pose is unknown:

Page 7

This talk: open world manipulation

Reinforcement learning?

Page 8

Deep RL is not pose invariant

Page 9

Deep reinforcement learning

[Figure: network diagram. Labels: Image, Convolutional Layers, Features, Fully Connected Layers, Value function; outputs: Run?, Jump?]

Page 10

Deep RL for robotic manipulation

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

Page 11

Suppose agent learns this policy...

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

Policy A

Page 12

It does not generalize

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

Policy B

Page 13

Agent must see both poses during training

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

Page 14

Training set must span SE(2)

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

Page 15

Training set must span camera poses

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

Page 16

The pose invariant policy learning problem

Don’t want to learn a separate policy for each mug.

Page 17

The pose invariant policy learning problem

Don’t want to learn to make coffee twice!

Page 18

Question

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

How do we encode things so that we generalize from this

Page 19

Question

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs: Move left?, Rotate?]

How do we encode things so that we generalize from this to this?

Page 20

Deictic Image Mapping

Page 21

Assume actions are collision-free motions

[Figure labels: Pose B; Move to Pose B]

Each action performs an end-to-end collision-free motion

Page 22

Assume actions are collision-free motions

Pose A

Pose B

Pose C

Pose D

Pose E

Each action performs an end-to-end collision-free motion
– action set spans SE(2) or SE(3)

Page 23

Standard Encoding of Action

[Figure: network diagram. Labels: Image, Convolutional Layers, Fully Connected Layers; outputs: Move pose A, Move pose B, Move pose C. Scene label: Move to Pose B]

Encode actions as coordinates in image

Page 24

Deictic Image Encoding of Action

Encode this action as that image patch

[Figure label: Move to Pose B]
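The idea on this slide can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a top-down 2-D image, handles translation only (the talk's action set also covers rotation), and the function name `deictic_patch` and the patch size are hypothetical.

```python
import numpy as np

def deictic_patch(image, row, col, half=2):
    """Return the (2*half+1)-square crop of `image` centered on (row, col).

    Hypothetical helper: the crop stands in for the action "move to (row, col)".
    """
    padded = np.pad(image, half, mode="constant", constant_values=0)
    # After padding, pixel (row, col) sits at (row + half, col + half),
    # so the crop starts at (row, col) in padded coordinates.
    return padded[row:row + 2 * half + 1, col:col + 2 * half + 1]

# Scene A: an object (a 2x2 block of ones) near the top-left corner.
scene_a = np.zeros((12, 12), dtype=np.uint8)
scene_a[2:4, 2:4] = 1

# Scene B: the same object translated by (5, 6).
scene_b = np.zeros((12, 12), dtype=np.uint8)
scene_b[7:9, 8:10] = 1

# "Move to the object" targets different coordinates in each scene ...
patch_a = deictic_patch(scene_a, 2, 2)
patch_b = deictic_patch(scene_b, 7, 8)

# ... but both actions are encoded by the same image patch.
same_encoding = np.array_equal(patch_a, patch_b)
```

Under a coordinate encoding the two actions would be (2, 2) and (7, 8); under the patch encoding they are indistinguishable, so whatever is learned for one applies to the other.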

Page 25

DQN with Deictic Image Mapping

[Figure: network diagram. Inputs: action, state (reduced state representation); labels: Convolutional Layers, Fully Connected Layers; output: Q(s,a)]

Action represented as image patch

Page 26

DQN with Deictic Image Mapping

[Figure: network diagram. Inputs: action, state (reduced state representation); labels: Convolutional Layers, Fully Connected Layers; output: Q(s,a)]

These two actions have same encoding

Page 27

DQN with Deictic Image Mapping

[Figure: network diagram. Inputs: action, state (reduced state representation); labels: Convolutional Layers, Fully Connected Layers; output: Q(s,a)]

These two actions have same encoding

Page 28

Illustration: Pick/Place on Grid

Actions: pick or place in any cell (32 actions total)

State: set of grid configurations

Transitions: deterministic

Reward: two-in-a-row = 1; else 0.

[Figure labels: Pick cell 15; Place cell 3]
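The task definition on this slide is small enough to write down as code. The sketch below is an illustration under stated assumptions, none of which appear on the slide: a 4x4 grid (chosen because 16 cells x {pick, place} gives the 32 actions mentioned), row-major cell numbering starting at 0, a single gripper that holds at most one disk, and "two-in-a-row" read as two disks in horizontally or vertically adjacent cells.

```python
import numpy as np

SIZE = 4                      # assumed: 4x4 grid, so 16 cells
N_ACTIONS = 2 * SIZE * SIZE   # pick or place in any cell -> 32 actions

def two_in_a_row(grid):
    """True if two disks occupy horizontally or vertically adjacent cells."""
    horizontal = np.any(grid[:, :-1] & grid[:, 1:])
    vertical = np.any(grid[:-1, :] & grid[1:, :])
    return bool(horizontal or vertical)

def step(grid, holding, action):
    """Apply one deterministic action; return (grid, holding, reward).

    Actions 0..15 pick from a cell, actions 16..31 place into a cell.
    """
    grid = grid.copy()
    is_place, cell = divmod(action, SIZE * SIZE)
    r, c = divmod(cell, SIZE)
    if not is_place and not holding and grid[r, c]:
        grid[r, c] = 0          # pick: remove the disk from the cell
        holding = True
    elif is_place and holding and not grid[r, c]:
        grid[r, c] = 1          # place: put the held disk into the empty cell
        holding = False
    # Any other action is treated as invalid and leaves the state unchanged.
    reward = 1 if not holding and two_in_a_row(grid) else 0
    return grid, holding, reward

# The transition named on the slide: pick cell 15, then place cell 3.
grid = np.zeros((SIZE, SIZE), dtype=int)
grid[0, 2] = 1                # a disk in cell 2
grid[3, 3] = 1                # a disk in cell 15
grid, holding, r_pick = step(grid, False, 15)
grid, holding, r_place = step(grid, holding, SIZE * SIZE + 3)
```

With these assumptions the disk moved to cell 3 ends up next to the disk in cell 2, so the place action earns the reward.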

Page 29

Standard DQN Encoding

What’s the value of this transition: ?

[Figure: network diagram. Labels: Convolutional Layers, Fully Connected Layers; outputs (32 actions): Pick cell 1?, Place cell 3?]

Page 30

Deictic Action Encoding

[Figure: network diagram. Inputs: action, state; labels: Convolutional Layers, Fully Connected Layers; output: Q(s,a)]

What’s the value of this transition: ?

Image patch

Page 31

Deictic Action Encoding

We want these two actions to have the same encoding:

[Figure: network diagram. Inputs: action, state; labels: Convolutional Layers, Fully Connected Layers; output: Q(s,a)]

That’s what Deictic Image Map accomplishes:

Page 32

Comparison

– 5x5 disk grid world
– avg over 10 trials

DQN: standard encoding

DQN: deictic encoding

Platt, et al., Submitted 2018

Page 33

Let’s think about the encoding as a mapping

[Figure: network diagram. Inputs: action, state; labels: Convolutional Layers, Fully Connected Layers; output: Q(s,a). Panels: Underlying State/Action (State; Action: Pick cell (1,3)) and Abstract State/Action]

Page 34

Deictic Image Mapping

[Figure: Underlying Space and Abstract Space; step label: Abstract]

Page 35

Deictic Image Mapping

[Figure: Underlying Space and Abstract Space; step labels: Abstract, Solve]

Page 36

Deictic Image Mapping

[Figure: Underlying Space and Abstract Space; step labels: Abstract, Solve, Project]

Use neural network to estimate:

Get underlying greedy action:
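The equations for these two steps did not survive in the transcript. As a rough illustration of the abstract / solve / project loop, the sketch below selects a greedy underlying action by scoring the patch encoding of every candidate action. It is written under assumptions: `q_abstract` is a hand-written stand-in for the learned network, the reduced state input is omitted, actions are translations only, and all names are hypothetical.

```python
import numpy as np

def deictic_patch(image, row, col, half=1):
    """Crop of `image` centered on (row, col), zero-padded at the borders."""
    padded = np.pad(image, half)
    return padded[row:row + 2 * half + 1, col:col + 2 * half + 1]

def q_abstract(patches):
    """Stand-in for a learned Q-function over abstract (patch) actions.

    Hypothetical scoring rule: favors patches with an occupied center
    and many occupied pixels. A trained network would replace this.
    """
    center = patches.shape[1] // 2
    return patches[:, center, center].astype(float) + 0.1 * patches.mean(axis=(1, 2))

def greedy_action(image, half=1):
    rows, cols = image.shape
    candidates = [(r, c) for r in range(rows) for c in range(cols)]
    # Abstract: map every underlying action to its patch encoding.
    patches = np.stack([deictic_patch(image, r, c, half) for r, c in candidates])
    # Evaluate the abstract Q-function on all encodings in one batch.
    values = q_abstract(patches)
    # Project: the best-scoring encoding identifies an underlying action.
    return candidates[int(np.argmax(values))]

image = np.zeros((6, 6), dtype=np.uint8)
image[2:5, 1:4] = 1            # a 3x3 object whose center is at (3, 2)
best = greedy_action(image)
```

Because the Q-function only ever sees patches, moving the object elsewhere in the image changes which underlying action is returned but not the values being compared.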

Page 37

Deictic Image Mapping

[Figure: Underlying Space and Abstract Space; step labels: Abstract, Solve, Project]

Use neural network to estimate:

Get underlying greedy action:

Does this approach learn suboptimal policies?

Page 38

Deictic Image Mapping

[Figure: Underlying Space and Abstract Space; step labels: Abstract, Solve, Project]

Use neural network to estimate:

Get underlying greedy action:

Does this approach learn suboptimal policies?
No: optimality can be proved using the theory of MDP homomorphisms.
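For readers unfamiliar with the term, the standard definition of an MDP homomorphism is reproduced below. The notation is an assumption for illustration (the slide does not show the authors' own formulation): M = (S, A, T, R) is the underlying MDP and the barred symbols denote the abstract one.

```latex
% h = (f, \{g_s\}) with surjections f : S \to \bar{S} and g_s : A \to \bar{A}
% is an MDP homomorphism from M to \bar{M} if, for all s, s' \in S and a \in A,
\begin{align}
  \bar{R}\bigl(f(s), g_s(a)\bigr) &= R(s, a), \\
  \bar{T}\bigl(f(s), g_s(a), f(s')\bigr) &= \sum_{s'' \in f^{-1}(f(s'))} T(s, a, s'').
\end{align}
% Optimal action values are then preserved under the mapping:
\begin{equation}
  Q^*(s, a) = \bar{Q}^*\bigl(f(s), g_s(a)\bigr).
\end{equation}
```

The last identity is what justifies solving in the abstract space and projecting back: an action that is greedy with respect to the abstract optimal values is also greedy in the underlying problem.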

Page 39

Key challenge: large action space

e.g., this task has 13.5k pick actions + 13.5k place actions

Several ways to handle this:

– pass all actions in a single batch

– use fully convolutional network

– use hierarchical value function

– curriculum learning

Page 40

Key challenge: large action space

e.g., this task has 13.5k pick actions + 13.5k place actions

Several ways to handle this:

– pass all actions in a single batch

– use fully convolutional network

– use hierarchical value function

– curriculum learning

Level 1

Level 2

Level 3
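What the three levels on this slide denote cannot be recovered from the transcript. One possible reading of the hierarchical bullet is a coarse-to-fine search over positions, sketched below as an assumption rather than as the authors' method. `q_values_fn` is a hypothetical callable that scores a batch of candidate positions; the search is greedy and can miss the best position when the scores are not smooth.

```python
import numpy as np

def coarse_to_fine_argmax(q_values_fn, size, levels=3):
    """Search a size x size grid of positions for a high-scoring one.

    Each level scores a 3x3 lattice around the current best position and
    then halves the lattice spacing, so at most 9 positions are scored per
    level instead of size * size in total.
    """
    best = (size // 2, size // 2)
    stride = size // 4
    for _ in range(levels):
        offsets = np.array([-stride, 0, stride])
        rows = np.clip(best[0] + offsets, 0, size - 1)
        cols = np.clip(best[1] + offsets, 0, size - 1)
        rr, cc = np.meshgrid(rows, cols, indexing="ij")
        scores = q_values_fn(rr.ravel(), cc.ravel())
        k = int(np.argmax(scores))
        best = (int(rr.ravel()[k]), int(cc.ravel()[k]))
        stride = max(1, stride // 2)
    return best

def toy_q(rows, cols):
    """Hypothetical smooth score with its peak at position (10, 3)."""
    return -((rows - 10) ** 2 + (cols - 3) ** 2)

best = coarse_to_fine_argmax(toy_q, size=16)
```

On the 16x16 toy grid this scores 27 positions rather than 256; the saving grows with the size of the action space, which is the motivation for the bullet.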

Page 41

Key challenge: large action space

Disks: 100 actions

Blocks: 512 actions

Curriculum:
1. Disks, 100 actions
2. Disks, 400 actions
3. Blocks, 100 actions
4. Blocks, 200 actions
5. Blocks, 1.3k actions
6. Blocks, 7k actions
7. Blocks, 13.5k actions
8. Blocks, 26.9k actions

Total train time: ~1.5 hrs on one NVIDIA 1080

Pick Place

Blocks: 26.9k actions
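The curriculum above can be written as a plain schedule. In the sketch below the stage list is copied from the slide (with the "k" figures expanded to round numbers), while `train_stage` is a hypothetical placeholder and the warm start between stages is an assumption about how such a curriculum is typically run.

```python
# Stages copied from the slide: (object type, approximate number of actions).
CURRICULUM = [
    ("disks", 100), ("disks", 400),
    ("blocks", 100), ("blocks", 200), ("blocks", 1_300),
    ("blocks", 7_000), ("blocks", 13_500), ("blocks", 26_900),
]

def train_stage(weights, task, n_actions):
    """Hypothetical placeholder for one stage of Q-learning.

    It only records the stage so that the control flow can be run;
    a real implementation would update network weights here.
    """
    return weights + [(task, n_actions)]

def run_curriculum(stages, weights=None):
    weights = [] if weights is None else weights
    for task, n_actions in stages:
        # Assumed warm start: each stage continues from the previous result.
        weights = train_stage(weights, task, n_actions)
    return weights

history = run_curriculum(CURRICULUM)
```

Carrying one set of weights across stages with different action counts is plausible here because the network scores one patch-encoded action at a time, so its input and output shapes need not change with the size of the action set.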

Page 42

Demonstration: Novel Object Pick/Place

Gualtieri, et al., ICRA 2018

Page 43

Demonstration: Novel Object Regrasping

Gualtieri, et al., ICRA 2018

Page 44

Toward a system for learning arbitrary tasks

Gualtieri et al., CoRL 2018; Platt et al., AAAI 2019

Page 45

Summary

1. We tackle the pose invariant policy learning problem

2. Assume a large space of reach actions
– must be careful about how Q-function is encoded

3. Provably correct

4. Can work well in practice

Marcus Gualtieri, Colin Kohler