
Deictic Image Mapping

Robert Platt, Colin Kohler, Marcus Gualtieri

Computer Science, Northeastern University

Goal: open world manipulation

Task: pick up stuff from the floor and throw it out
– novel objects
– unstructured environments
– novel starting configurations

Background: Grasp Pose Detection (GPD)

Our work: Gualtieri et al., ICRA 2016; ten Pas et al., IJRR 2017
Work by others: Mahler et al., RSS 2017; Levine et al., ISER 2016

ROS Package: https://github.com/atenpas/gpd

This talk: open world manipulation

Evaluation: define tasks with built-in variation
– e.g., make coffee with a single-cup maker
– e.g., put dishes in the dishwasher

Axes of variation:
– variation over machines/mugs
– variation over object pose

This talk: open world manipulation

Challenge: interact with objects w/o full models

If object pose is known:

This talk: open world manipulation

Challenge: interact with objects w/o full models

If object pose is unknown:

This talk: open world manipulation

Reinforcement learning?

Deep RL is not pose invariant

Deep reinforcement learning

[Figure: image features → convolutional layers → fully connected layers → value function (Run?, Jump?)]

Deep RL for robotic manipulation

[Figure: convolutional layers → fully connected layers → action values (Move left?, Rotate?)]

Suppose the agent learns this policy...

[Figure: Policy A]

It does not generalize

[Figure: Policy B]

Agent must see both poses during training


Training set must span SE(2)


Training set must span camera poses


The pose invariant policy learning problem

Don’t want to learn a separate policy for each mug.

The pose invariant policy learning problem

Don’t want to learn to make coffee twice!

Question

[Figure: the object in its training pose vs. a novel pose]

How do we encode things so that we generalize from this to this?

Deictic Image Mapping

Assume actions are collision-free motions

[Figure: the robot arm executes "Move to Pose B"]

Each action performs an end-to-end collision-free motion

Assume actions are collision-free motions

[Figure: candidate target poses A through E]

Each action performs an end-to-end collision-free motion
– action set spans SE(2) or SE(3)

Standard Encoding of Action

[Figure: image → convolutional layers → fully connected layers → one output per action (Move to pose A, Move to pose B, Move to pose C)]

Encode actions as coordinates in the image

Deictic Image Encoding of Action

Encode this action as that image patch

[Figure: the "Move to Pose B" action encoded as the image patch at pose B]
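As a concrete sketch of this encoding, one can crop the observation around the action's target pose and rotate the crop into the gripper frame, so the patch itself identifies the action. The heightmap observation, function name, and conventions below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import rotate

def deictic_patch(heightmap, x, y, theta_deg, size=32):
    """Encode the action "move to pose (x, y, theta)" as the image patch
    centered at (x, y), rotated so it is expressed in the target gripper
    frame (illustrative conventions, not the authors' code)."""
    pad = size  # pad so crops near the border stay in bounds
    padded = np.pad(heightmap, pad, mode='constant')
    cy, cx = y + pad, x + pad
    # Crop a window twice the patch size so rotation leaves no empty corners.
    window = padded[cy - size:cy + size, cx - size:cx + size]
    window = rotate(window, -theta_deg, reshape=False, order=1)
    # Center-crop to the final patch size.
    s = size // 2
    return window[s:s + size, s:s + size]
```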

DQN with Deictic Image Mapping

[Figure: reduced state representation (state) and deictic image patch (action) → convolutional layers → fully connected layers → Q(s,a)]

Action represented as image patch
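To make the architecture concrete, here is a minimal PyTorch sketch of such a Q-network. The slide only fixes the interface (state image and action patch in, scalar Q(s,a) out); the two-encoder design and layer sizes below are my illustrative choices:

```python
import torch
import torch.nn as nn

class DeicticQNetwork(nn.Module):
    """Q(s, a): encode the state image and the action's deictic patch
    separately, concatenate, and regress a single Q-value."""
    def __init__(self):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.state_enc = encoder()   # reduced state representation
        self.action_enc = encoder()  # action represented as image patch
        self.head = nn.Sequential(
            nn.Linear(2 * 32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 1))       # scalar Q(s, a)

    def forward(self, state_img, action_patch):
        z = torch.cat([self.state_enc(state_img),
                       self.action_enc(action_patch)], dim=1)
        return self.head(z).squeeze(-1)
```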

DQN with Deictic Image Mapping


These two actions have the same encoding


Illustration: Pick/Place on Grid

Actions: pick or place in any cell (32 actions total)

State: set of grid configurations

Transitions: deterministic

Reward: two-in-a-row = 1; else 0.

[Figure: example transition: pick from cell 15, place in cell 3]
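A minimal Python sketch of this MDP, under my reading of the slides (a 4x4 grid gives 16 pick + 16 place = 32 actions; reward 1 when two disks sit in adjacent cells):

```python
import numpy as np

class GridPickPlace:
    """Illustrative pick/place grid MDP: deterministic transitions,
    reward 1 for two disks in a row, 0 otherwise."""
    def __init__(self, n=4):
        self.n = n  # n*n cells -> 2*n*n actions (16 cells -> 32 actions)

    def reset(self):
        self.grid = np.zeros((self.n, self.n), dtype=int)
        for c in np.random.choice(self.n * self.n, size=2, replace=False):
            self.grid[divmod(c, self.n)] = 1  # drop two disks at random
        self.holding = False
        return self.grid.copy(), self.holding

    def step(self, action):
        cell = divmod(action % (self.n * self.n), self.n)
        if action < self.n * self.n:                # pick from a cell
            if self.grid[cell] and not self.holding:
                self.grid[cell], self.holding = 0, True
        elif self.holding and not self.grid[cell]:  # place into a cell
            self.grid[cell], self.holding = 1, False
        return (self.grid.copy(), self.holding), float(self._two_in_a_row())

    def _two_in_a_row(self):
        return bool((self.grid[:, :-1] & self.grid[:, 1:]).any()
                    or (self.grid[:-1, :] & self.grid[1:, :]).any())
```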

Standard DQN Encoding

What’s the value of this transition?

[Figure: grid image → convolutional layers → fully connected layers → 32 action outputs (Pick cell 1?, Place cell 3?, ...)]

Deictic Action Encoding

What’s the value of this transition?

[Figure: grid image (state) and image patch (action) → convolutional layers → fully connected layers → Q(s,a)]

Deictic Action Encoding

We want these two actions to have the same encoding:

[Figure: both actions map to the same image-patch input to the Q-network]

That’s what Deictic Image Mapping accomplishes.

Comparison

– 5x5 disk grid world
– avg over 10 trials

[Plot: learning curves for DQN with standard encoding vs. DQN with deictic encoding]

Platt et al., submitted 2018

Let’s think about the encoding as a mapping

[Figure: an underlying state/action (e.g., "Pick cell (1,3)") maps to an abstract state/action (image, image patch) that feeds the Q-network]

Deictic Image Mapping

[Diagram: abstract the underlying state/action into the abstract space, solve in the abstract space, project the greedy abstract action back to the underlying space]

Use a neural network to estimate the abstract Q-function Q̄(s̄, ā)

Get the underlying greedy action by maximizing over deictic encodings: a* = argmax_a Q̄(φ(s, a)), where φ maps an underlying state-action pair to its abstract (image, image patch) encoding
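In code, the solve/project step can be a single batched argmax: encode every candidate underlying action as its deictic patch, score all of them with the learned abstract Q-network, and execute the best one. A sketch reusing the hypothetical deictic_patch and DeicticQNetwork pieces from above:

```python
import torch

def greedy_action(qnet, state_img, candidate_poses, patch_fn):
    """Return the underlying action whose deictic encoding maximizes the
    abstract Q-function (all names here are illustrative)."""
    # One deictic patch per candidate underlying action: (N, 1, h, w).
    patches = torch.stack([patch_fn(state_img, p) for p in candidate_poses])
    # Pair each patch with the current observation: (N, 1, H, W).
    states = state_img.unsqueeze(0).expand(len(candidate_poses), -1, -1, -1)
    with torch.no_grad():
        q = qnet(states, patches)  # abstract Q-value per candidate
    return candidate_poses[q.argmax().item()]
```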


Does this approach learn suboptimal policies?


No! Optimality can be proven using the theory of MDP homomorphisms.

Key challenge: large action space

e.g., this task has 13.5k pick actions + 13.5k place actions

Several ways to handle this:

– pass all actions in a single batch

– use a fully convolutional network (see the sketch below)

– use hierarchical value function

– curriculum learning


[Figure: hierarchical value function with three levels (Level 1, Level 2, Level 3)]
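One of the strategies listed above, the fully convolutional network, amortizes a large positional action set: the network emits a Q-map with one value per pixel, so thousands of candidate positions are scored in a single forward pass. A minimal sketch (the architecture is illustrative, not the authors' network):

```python
import torch
import torch.nn as nn

class FCQNet(nn.Module):
    """Fully convolutional Q-function: one Q-value per pixel, so every
    positional action is scored in a single forward pass."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1))  # 1x1 conv: per-pixel Q-value

    def forward(self, state_img):               # (B, 1, H, W)
        return self.net(state_img).squeeze(1)   # (B, H, W) Q-map

# Greedy positional action = flat argmax over the Q-map:
# qmap = FCQNet()(obs); a = qmap.view(qmap.size(0), -1).argmax(dim=1)
```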

Key challenge: large action space

[Figure: Disks environment (100 actions); Blocks environment (512 actions)]

Curriculum:
1. Disks, 100 actions
2. Disks, 400 actions
3. Blocks, 100 actions
4. Blocks, 200 actions
5. Blocks, 1.3k actions
6. Blocks, 7k actions
7. Blocks, 13.5k actions
8. Blocks, 26.9k actions
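A curriculum like this can be driven by a simple loop that trains on each stage until performance clears a threshold and then advances. The stage list comes from the slide; the threshold and the helper callables are illustrative stand-ins:

```python
# Stage list from the slide; threshold and helpers are illustrative.
CURRICULUM = [
    ("disks", 100), ("disks", 400),
    ("blocks", 100), ("blocks", 200), ("blocks", 1300),
    ("blocks", 7000), ("blocks", 13500), ("blocks", 26900),
]

def run_curriculum(agent, make_env, train_epoch, evaluate, threshold=0.9):
    """Train on each stage until the agent's success rate clears
    `threshold`, then move to the next (hypothetical callables)."""
    for objects, num_actions in CURRICULUM:
        env = make_env(objects, num_actions)
        while evaluate(agent, env) < threshold:
            train_epoch(agent, env)
```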

Total train time: ~1.5 hrs on one NVIDIA 1080

[Figure: pick and place on Blocks, 26.9k actions]

Demonstration: Novel Object Pick/Place

Gualtieri et al., ICRA 2018

Demonstration: Novel Object Regrasping

Gualtieri et al., ICRA 2018

Toward a system for learning arbitrary tasks

Gualtieri et al., CoRL 2018; Platt et al., AAAI 2019

Summary

1. We tackle the pose invariant policy learning problem

2. Assume a large space of reach actions
– must be careful about how the Q-function is encoded

3. Provably correct

4. Can work well in practice

Marcus Gualtieri Colin Kohler
