raia hadsellraiahadsell.com/uploads/3/6/4/2/36428762/erf2017_keynote_talk.pdf · slide from v....

Deep Learning for Robots

Raia Hadsellwww.raiahadsell.com

The story:

1. Deep Learning is the future of robotics

2. There are very significant challenges

3. But some solutions emerging, as well.

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

2012: Computer Vision

Pixels → Key Points → SIFT features → Deformable Part Model → Labels

2014: Machine Translation

Text → Reordering → Phrase Table/Dictionary → Language Model → Text

2017: Robotics?

Sensors → Perception → World Model → Planning → Control → Action

End-to-end Deep Learning for robots?

slide from V. Vanhoucke

2010: Speech Recognition

Audio → Acoustic Model → Phonetic Model → Language Model → Text

2012: Computer Vision

Pixels → Key Points → SIFT features → Deformable Part Model → Labels

2014: Machine Translation

Text → Reordering → Phrase Table/Dictionary → Language Model → Text

2017: Robotics?

Sensors → Perception → World Model → Planning → Control → Action

Deep Net

End-to-end Deep Learning for robots?

slide from V. Vanhoucke

Deep Net

Deep Net

Deep Net

General Artificial Intelligence

Robotics is different

LABELS


Robotics is different

ACTIONSSENSORS


EnvironmentAgent

Reinforcement Learning

GOALOBSERVATIONS

ACTIONS

REWARD?


EnvironmentAgent

Deep Reinforcement Learning

GOALOBSERVATIONS

ACTIONS

REWARD?

neural network


● Sensorimotor control ?

Could deep RL allow robots to learn end-to-end?

Space Invaders

[Mnih et al, Playing Atari with Deep Reinforcement Learning, 2014]

https://www.youtube.com/watch?v=wHDxF5N700Q

http://www.youtube.com/watch?v=wHDxF5N700Q

General Atari Player

[Mnih et al, Playing Atari with Deep Reinforcement Learning, 2014]

https://www.youtube.com/watch?v=Erkt7HelEco

http://www.youtube.com/watch?v=Erkt7HelEco

9DOF Random reacher

https://youtu.be/u0M3PvTgTcE

http://www.youtube.com/watch?v=u0M3PvTgTcE


● Sensorimotor control




● Exploration of complex spaces ?


Maze navigation

https://youtu.be/zHhbypmKaj0

http://www.youtube.com/watch?v=zHhbypmKaj0



● Exploration of complex spaces





● Strategy and decision making ?



Policy Network Value Network


Lesson: use supervised learning when possible

Human expertpositions

Supervised Learningpolicy network

Reinforcement Learningpolicy network

Generates New Data(30 mil. Positions)

Value network

Self Play Self Play




● Strategy and decision making



Not so fast …

● Deep RL is very data inefficient -

how can it learn on real robots?

So, where are the superhuman robots?

24 hours in simulation with 16 threads …… 55 days on the real Jaco arm


1. Train in simulation, then transfer to real robot

● Benefit is obvious

● Hard to do in practice

Two methods to speed up Deep RL for robots


Progressive Neural Networks


11

22

a

a



11

22

33

a

a

a

a

a

a


Sim-to-Real

Sim-to-Real1

12

23

3

Simulation Robot

Task A Task A Task B


Sim-to-Real: 3d reacher

https://www.youtube.com/watch?v=YZz5Io_ipi8

11

128

22

a

a

a

16

http://www.youtube.com/watch?v=YZz5Io_ipi8




Sim-to-Real: 2d reacher with moving target 1

12

23

3

128 16 16

www.youtube.com/watch?v=e78J1K5LKCI

http://www.youtube.com/watch?v=e78J1K5LKCI

arxiv.org/abs/1606.04671 arxiv.org/abs/1610.04286v1

Andrei Rusu Neil C. Rabinowitz

Guillaume Desjardins

Hubert Soyer

James Kirkpatrick

Koray Kavukcuoglu

Razvan Pascanu


Sim-to-Real Robot Learning from Pixels

NicolasHeess

Raia Hadsell


2. Learn with auxiliary tasks● Accelerate and stabilise reinforcement learning

Two methods to speed up Deep RL for robots


Navigation mazes Game episode:

1. Random start2. Find the goal (+10)3. Teleport randomly4. Re-find the goal (+10)5. Repeat (limited time)

+10 +1


Nav agent ingredients:

1. Convolutional encoder and RGB inputs

2. Stacked LSTM

3. Additional inputs (reward, action, and velocity)

4. RL: Asynchronous advantage actor critic (A3C)

5. Auxiliary task 1: Depth predictor

6. Auxiliary task 2: Loop closure predictor enc

Loop (L)

Depth (D1 )

xt rt-1 {vt, at-1}

Depth (D2 )

[Mnih et al, Asynchonous Methods for Deep Reinforcement Learning, 2016]

Variations in architecture

xt rt-1 {vt, at-1}

enc

xt

enc enc

Loop (L)

Depth (D1 )

a. FF A3C c. Nav A3C d. Nav A3C +D1D2L

xt rt-1 {vt, at-1}

enc

xt

b. LSTM A3C

Depth (D2 )

Results: Results: Auxiliary tasks speed up RL ten-fold!

https://youtu.be/lNoaTyMZsWI

http://www.youtube.com/watch?v=lNoaTyMZsWI&t=77

Piotr Mirowski, Razvan Pascanu, Raia Hadsell

Fabio Ross Andy Hubert Laurent Koray Dharsh Misha Andrea

Learning to navigate in complex environments

arxiv.org/abs/1611.03673

The story:

1. Deep Learning is the future of robotics

2. There are very significant challenges

3. But some solutions emerging, as well.

Thank you!We are hiring! [email protected], [email protected]

raia hadsellraiahadsell.com/uploads/3/6/4/2/36428762/erf2017_keynote_talk.pdf · slide from v....

Documents