![Page 1](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/1.jpg)
Robot Navigation: From Abilities to Capabilities
Machine Learning in Robot Motion Planning Workshop @ IROS 2018
Aleksandra Faust, Ph.D.
Google Brain Robotics
October 2018
![Page 2](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/2.jpg)
![Page 3](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/3.jpg)
Go down the hallway and take the second right.
Navigation
![Page 4](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/4.jpg)
Go down the hallway and take the second right.
Navigation
Perception
Planning
Controls
SLAM
![Page 5](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/5.jpg)
Go down the hallway and take the second right.
Navigation
Sight
Hearing
Sensors, hardware, and geometry determine how the robot perceives the world.
![Page 6](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/6.jpg)
Go down the hallway and take the second right.
Navigation
Sight
Hearing
They also determine what it can do: how it can communicate, move, and interact.
![Page 7](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/7.jpg)
Abilities
Sight
Hearing
Let’s focus on the robot’s abilities and learn the foundational motion behaviors.
![Page 8](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/8.jpg)
Essential navigation behavior
Moving obstacle avoidance
Sight
Hearing
![Page 9](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/9.jpg)
Navigation behaviors based on robot’s abilities
● With primitive sensors
● Robust to noise
● Dynamically feasible
● Transfers between environments
Moving obstacle avoidance for real robots
![Page 10](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/10.jpg)
Navigation behaviors based on robot’s abilities
● With primitive sensors
● Robust to noise
● Dynamically feasible
● Transfers between environments
Behaviors to learn:
● Point to point navigation
● Path following
Moving obstacle avoidance for real robots
![Page 11](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/11.jpg)
Confidential + Proprietary
Learning navigation behaviors end to end
Under submission, https://arxiv.org/abs/1809.10124
Hao-Tien Lewis Chiang*, Aleksandra Faust*, Marek Fiser, Anthony Francis
*Equal contributions
Aleksandra Faust, Anthony Francis, Hao-Tien Chiang, Marek Fiser
![Page 12](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/12.jpg)
Learning navigation task
Diagram: a DDPG agent [Lillicrap et al. 2015], with an actor and a critic, in the 22x18 m training world. The agent receives observation o and acts through the policy π_θ(o, a) = P(a|o), with parameters θ.
True objective: reach goal
[Chiang et al., under submission]
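The slide names DDPG [Lillicrap et al. 2015] but does not show its update. As a minimal sketch, not the authors' implementation: the critic is regressed toward a bootstrapped target computed with slowly tracking target networks, and because DDPG's actor is deterministic, the target uses the target actor's action directly. The helper names, the toy linear "networks", and the values of `gamma` and `tau` are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Obs = Sequence[float]


def critic_target(reward: float, next_obs: Obs, done: bool,
                  target_actor: Callable[[Obs], float],
                  target_critic: Callable[[Obs, float], float],
                  gamma: float = 0.99) -> float:
    """DDPG critic regression target: y = r + gamma * Q'(o', mu'(o'))."""
    if done:  # no bootstrapping past the end of an episode
        return reward
    next_action = target_actor(next_obs)  # deterministic target policy mu'
    return reward + gamma * target_critic(next_obs, next_action)


def soft_update(target: List[float], online: List[float],
                tau: float = 0.005) -> List[float]:
    """Polyak averaging of target weights: w' <- tau * w + (1 - tau) * w'."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# Toy linear stand-ins for the target networks (hypothetical).
def toy_actor(obs: Obs) -> float:
    return 0.5 * sum(obs)


def toy_critic(obs: Obs, action: float) -> float:
    return sum(obs) + action


y = critic_target(1.0, [0.2, 0.4], False, toy_actor, toy_critic, gamma=0.9)
print(round(y, 2))  # 1.81
```

In a full agent both networks are neural nets and `soft_update` is applied to every weight after each gradient step, so the targets change slowly and training stays stable.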
![Page 13](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/13.jpg)
Learning navigation task
Diagram: a DDPG agent [Lillicrap et al. 2015], with an actor and a critic, in the 22x18 m training world. The agent receives observation o and acts through the policy π_θ(o, a) = P(a|o), with parameters θ.
True objective: reach goal
Related work: [Mülling et al., ‘11], [Faust et al., ‘14], [Bagnell and Schneider, ‘01], [Yahya et al., ‘16], [Levine et al., ‘16]
● Handles sensor input
● Handles task and robot dynamics
Helicopter image from [Kober et al., ‘13]
![Page 14](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/14.jpg)
Learning Navigation Task
Diagram: a DDPG agent [Lillicrap et al. 2015], with an actor and a critic, in the 22x18 m training world.
● Observations, o: noisy 1D lidar + goal + orientation
● Action: velocity and orientation @ 5 Hz
● Policy: π_θ(o, a) = P(a|o), with parameters θ
● True objective: reach goal
[Chiang et al., under submission]
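The observation on this slide combines a noisy 1D lidar scan with goal and orientation information. A sketch of one way to assemble such a vector, assuming Gaussian range noise, a goal encoded as distance and bearing in the robot frame, and orientation encoded as sin/cos; none of these encodings, nor the function name `make_observation` or the 5 m `max_range`, come from the slides.

```python
import math
import random
from typing import List, Tuple


def make_observation(lidar: List[float], robot_xy: Tuple[float, float],
                     robot_yaw: float, goal_xy: Tuple[float, float],
                     noise_std: float, rng: random.Random,
                     max_range: float = 5.0) -> List[float]:
    """Noisy 1D lidar + goal (distance, bearing in robot frame) + orientation."""
    # Corrupt each lidar range with Gaussian noise and clip to the sensor limits.
    noisy = [min(max(r + rng.gauss(0.0, noise_std), 0.0), max_range)
             for r in lidar]
    dx, dy = goal_xy[0] - robot_xy[0], goal_xy[1] - robot_xy[1]
    bearing = math.atan2(dy, dx) - robot_yaw
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
    return noisy + [math.hypot(dx, dy), bearing,
                    math.sin(robot_yaw), math.cos(robot_yaw)]


obs = make_observation([1.0, 2.0, 4.5], (0.0, 0.0), 0.0, (3.0, 4.0),
                       noise_std=0.1, rng=random.Random(0))
print(len(obs))  # 7
```

Expressing the goal relative to the robot, rather than in world coordinates, is one common way to help a policy transfer between environments.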
![Page 15](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/15.jpg)
RL Setup is Hard
Diagram: a DDPG agent [Lillicrap et al. 2015], with an actor and a critic, in the 22x18 m training world.
● Observations, o: noisy 1D lidar + goal + orientation
● Action: velocity and orientation @ 5 Hz
● Policy: π_θ(o, a) = P(a|o), with parameters θ
● True objective: reach goal
Reward: r(s | θ_r) = R(s, θ_r). Selecting the reward is hard.
[Chiang et al., under submission]
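The slide writes the shaped reward as r(s | θ_r) = R(s, θ_r): a family of reward functions indexed by parameters θ_r. A minimal sketch, assuming R is a weighted sum of hand-designed features; the feature list and the `StepInfo` fields are hypothetical and are not the reward used in [Chiang et al.].

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepInfo:
    """Hypothetical per-step quantities reported by the simulator."""
    reached_goal: bool
    dist_to_goal: float       # meters
    collided: bool
    min_obstacle_dist: float  # meters, e.g. the smallest lidar range
    angular_speed: float      # rad/s


def shaped_reward(info: StepInfo, theta_r: Sequence[float]) -> float:
    """R(s, theta_r) as a weighted sum of reward features (illustrative)."""
    features = [
        1.0 if info.reached_goal else 0.0,  # goal bonus
        -info.dist_to_goal,                 # progress toward the goal
        -1.0 if info.collided else 0.0,     # collision penalty
        info.min_obstacle_dist,             # clearance from obstacles
        -1.0,                               # constant per-step cost
        -abs(info.angular_speed),           # discourage spinning
    ]
    if len(theta_r) != len(features):
        raise ValueError("theta_r needs one weight per feature")
    return sum(w * f for w, f in zip(theta_r, features))


info = StepInfo(reached_goal=False, dist_to_goal=2.0, collided=False,
                min_obstacle_dist=0.5, angular_speed=0.1)
print(round(shaped_reward(info, [10.0, 1.0, 5.0, 0.2, 0.01, 0.1]), 2))  # -1.92
```

Picking `theta_r` by hand is exactly the hard part the slide points to: small changes in the weights can produce very different learned behavior.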
![Page 16](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/16.jpg)
RL Setup is Hard
Diagram: a DDPG agent [Lillicrap et al. 2015], with an actor and a critic (Q), in the 22x18 m training world. Both actor and critic are stacks of feed-forward (FF) layers.
● Observations, o: noisy 1D lidar + goal + orientation
● Action: velocity and orientation @ 5 Hz
● Policy: π_θ(o, a) = P(a|o), with parameters θ
● True objective: reach goal
Reward: r(s | θ_r) = R(s, θ_r). Selecting the reward is hard.
Network architecture selection is hard.
[Chiang et al., under submission]
![Page 17](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/17.jpg)
Shaped-DDPG
Diagram: DDPG agents train in parallel; each trained agent is evaluated, new weights are selected, and a new training agent is spawned. The best policy is selected at the end.
Solution: large-scale gradient-free hyper-parameter optimization.
![Page 18](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/18.jpg)
Shaped-DDPG
Shape the reward, fixed network
Diagram: DDPG agents train in parallel; each trained agent is evaluated, new weights are selected, and a new training agent is spawned. The best policy is selected at the end.
Find the best reward function, r(s | θ_r) = R(s, θ_r), that maximizes the true objective.
Solution: large-scale gradient-free optimization.
[Chiang et al., under submission]
![Page 19](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/19.jpg)
Shaped-DDPG
Shape the reward, fixed network
Diagram: DDPG agents train in parallel; each trained agent is evaluated, new weights are selected, and a new training agent is spawned. The best policy is selected at the end.
Each agent uses a different reward function.
Find the best reward function, r(s | θ_r) = R(s, θ_r), that maximizes the true objective.
Solution: large-scale gradient-free optimization.
[Chiang et al., under submission]
![Page 20](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/20.jpg)
Shaped-DDPG
Diagram: two stages, each a loop in which DDPG agents train in parallel, trained agents are evaluated, new weights are selected, and new training agents are spawned.
● Stage 1, shape the reward with a fixed network: find the best reward function, r(s | θ_r) = R(s, θ_r), that maximizes the true objective, and select the best policy.
● Stage 2, shape the actor and critic with the reward r(s | θ_r) fixed: vary the number of neurons in each actor and critic layer to find the NN architecture that maximizes the reward. The result is the best policy.
Solution: large-scale gradient-free optimization.
[Chiang et al., under submission]
![Page 21](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/21.jpg)
Shaped-DDPG
Diagram: two stages, each a loop in which DDPG agents train in parallel, trained agents are evaluated, new weights are selected, and new training agents are spawned.
● Stage 1, shape the reward with a fixed network: find the best reward function, r(s | θ_r) = R(s, θ_r), that maximizes the true objective, and select the best policy.
● Stage 2, shape the actor and critic with the reward r(s | θ_r) fixed: vary the number of neurons in each actor and critic layer to find the NN architecture that maximizes the reward. The result is the best policy.
In stage 1, each agent uses a different reward function; in stage 2, each agent uses a different neural network architecture.
Solution: large-scale gradient-free optimization.
[Chiang et al., under submission]
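The two-stage search can be sketched as follows. This is an illustration under stated assumptions, not the optimizer used in [Chiang et al.]: a simple (1 + λ) evolutionary strategy stands in for the large-scale gradient-free optimizer, and the hypothetical `toy_train_and_score` replaces a full DDPG training run with a cheap synthetic score. On the slides, stage 1 scores candidates by the true objective and stage 2 by the fixed shaped reward; the toy uses a single score for both.

```python
import random
from typing import Callable, List, Tuple

Params = List[float]


def evolve(objective: Callable[[Params], float], init: Params, step: float,
           generations: int = 30, population: int = 8,
           seed: int = 0) -> Tuple[Params, float]:
    """Gradient-free (1 + lambda) evolutionary search that maximizes `objective`."""
    rng = random.Random(seed)
    best, best_score = list(init), objective(init)
    for _ in range(generations):
        # Spawn a population of new candidates around the current best.
        candidates = [[p + rng.gauss(0.0, step) for p in best]
                      for _ in range(population)]
        scores = [objective(c) for c in candidates]  # parallel in practice
        i = max(range(population), key=lambda k: scores[k])
        if scores[i] > best_score:  # keep the best candidate seen so far
            best, best_score = candidates[i], scores[i]
    return best, best_score


def toy_train_and_score(reward_w: Params, layers: Params) -> float:
    """Hypothetical stand-in for 'train a DDPG agent, then score it'."""
    sizes = [max(1, round(x)) for x in layers]  # layer widths are integers
    w_err = (reward_w[0] - 1.0) ** 2 + (reward_w[1] - 0.5) ** 2
    l_err = ((sizes[0] - 300) / 100) ** 2 + ((sizes[1] - 50) / 100) ** 2
    return -(w_err + l_err)


# Stage 1: shape the reward with a fixed network.
fixed_layers = [64.0, 64.0]
reward_w, score1 = evolve(lambda w: toy_train_and_score(w, fixed_layers),
                          init=[0.0, 0.0], step=0.2)

# Stage 2: fix the best reward, then shape the actor/critic layer widths.
layers, score2 = evolve(lambda sizes: toy_train_and_score(reward_w, sizes),
                        init=fixed_layers, step=30.0, seed=1)
print([round(w, 2) for w in reward_w], [max(1, round(x)) for x in layers])
```

Because the optimizer only needs one scalar score per trial, trials are independent and can run as parallel training agents, which matches the diagram's parallel DDPG agents.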
![Page 22](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/22.jpg)
Shaped-DDPG Learning Results for Path Following
Reward only shaping
1000 trials, 5 million steps each @ 5 Hz; trains in a week
[Chiang et al., under submission]
![Page 23](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/23.jpg)
Shaped-DDPG Learning Results for Path Following
Panels: reward-only shaping; reward and NN shaping
Stable learning, consistent trials
1000 trials, 5 million steps each @ 5 Hz; trains in a week
[Chiang et al., under submission]
![Page 24](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/24.jpg)
Shaped-DDPG Learning Results for Path Following
Panels: reward-only shaping; reward and NN shaping
Stable learning, consistent trials
1000 trials, 5 million steps each @ 5 Hz; trains in a week
Equivalent of 32 years of collective experience: 12 days per trial, learning from previous generations
[Chiang et al., under submission]
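The headline numbers follow from the slide's own figures: 1000 trials, each 5 million control steps at 5 Hz. A quick check, assuming 365.25-day years (an assumption, since the slide does not say how it rounded):

```python
SECONDS_PER_DAY = 24 * 60 * 60

trials = 1000                # from the slide
steps_per_trial = 5_000_000  # from the slide
control_rate_hz = 5          # from the slide

days_per_trial = steps_per_trial / control_rate_hz / SECONDS_PER_DAY
total_years = trials * days_per_trial / 365.25

print(f"{days_per_trial:.1f} days per trial")  # 11.6 days per trial
print(f"{total_years:.1f} years in total")     # 31.7 years in total
```

Both values round to the slide's figures of 12 days per trial and 32 years of collective experience.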
![Page 25](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/25.jpg)
Shaped-DDPG Learning Results for Path Following
Diagram: the learned actor network, with layers of 323, 47, and 560 neurons.
[Chiang et al., under submission]
![Page 26](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/26.jpg)
Shaped-DDPG Evaluation
● Two baselines
○ Learned: vanilla DDPG
○ Classic: artificial potential fields
● Evaluation environments
○ 3 large buildings
○ With moving obstacles
Training: 23 by 18 m
Building 1: 183 by 66 m
Building 2: 60 by 47 m
Building 3: 134 by 93 m
[Chiang et al., under submission]
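The classic baseline is an artificial potential field (APF) planner. The slides do not give its exact form, so here is a minimal sketch of the textbook formulation: a linear attractive force toward the goal plus a repulsive force from each obstacle inside an influence radius. The gains `k_att` and `k_rep` and the `influence` radius are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def apf_force(pos: Vec, goal: Vec, obstacles: List[Vec],
              k_att: float = 1.0, k_rep: float = 1.0,
              influence: float = 2.0) -> Vec:
    """Net 2D artificial-potential-field force at `pos`."""
    # Attractive term: negative gradient of 0.5 * k_att * |goal - pos|^2.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            # Repulsive term: negative gradient of 0.5 * k_rep * (1/d - 1/influence)^2.
            mag = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy


print(apf_force((0.0, 0.0), (5.0, 0.0), [(0.0, 1.0)]))  # (5.0, -0.5)
```

A robot that follows this force descends the potential greedily. This makes APF well behaved in open space, but it can stall in local minima where attraction and repulsion cancel.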
![Page 27](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/27.jpg)
Shaped-DDPG Evaluation: Success Rate
Success rate: shaped-DDPG vs. vanilla DDPG vs. classic APF
Shaped-DDPG has a higher success rate across all buildings.
Panels: path following; point to point
[Chiang et al., under submission]
![Page 28](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/28.jpg)
Shaped-DDPG vs. vanilla DDPG
Shaped-DDPG: smooth trajectories
Vanilla DDPG: suboptimal behavior
[Chiang et al., under submission]
![Page 29](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/29.jpg)
Shaped-DDPG Evaluation: Impact of Noise
Point to Point
Path Following
Shaped-DDPG policy is more robust to noise
[Chiang et al., under submission]
![Page 30](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/30.jpg)
Shaped-DDPG: On-robot experiments
Panels: path following; point to point
[Chiang et al., under submission]
![Page 31](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/31.jpg)
![Page 32](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/32.jpg)
Impact of Number of Obstacles
Panels: no moving obstacles; 30 moving obstacles
[Chiang et al., under submission]
![Page 33](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/33.jpg)
![Page 34](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/34.jpg)
Navigation behaviors based on robot’s abilities
Handles: sensors to controls, dynamics, noise, and obstacle avoidance.
Transferable to new environments; easy sim2real.
Learned end-to-end methods handle noise.
Shaping optimizes the trajectories.
Traditional methods: well behaved, brittle.
![Page 35](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/35.jpg)
Navigation capabilities: learn to navigate by looking at a map
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
ICRA 2018, https://arxiv.org/abs/1710.03937, Best paper in Service Robotics
Aleksandra Faust, Oscar Ramirez, Marek Fiser, Kenneth Oslund, Anthony Francis, James Davidson, Lydia Tapia
Aleksandra Faust, Anthony Francis, Marek Fiser, James Davidson, Ken Oslund, Oscar Ramirez, Lydia Tapia
![Page 36](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/36.jpg)
How to navigate to a location on a map?
Transferable to new environments; easy sim2real.
Lacks context needed for navigation.
Handles: sensors to controls, dynamics, noise, and obstacle avoidance.
![Page 37](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/37.jpg)
Related work: Sampling-based planners
● Long-distance collision-free navigation
○ Approximate all possible robot motions
○ Sample and connect robot poses
○ Connect them with small, feasible motion transitions
● Checking validity of pose transitions is expensive
● Sampling-based planners that create reusable roadmaps
○ Often consider geometry only
[Kavraki et al. ‘96], [LaValle & Kuffner, ‘01], [Hsu et al., ‘02], [Hauser et al., ‘06]
![Page 38](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/38.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
[Kavraki et al. 1996]
![Page 39](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/39.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
[Kavraki et al. 1996]
![Page 40](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/40.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
○ Reject in-collision samples
[Kavraki et al. 1996]
![Page 41](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/41.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
○ Reject in-collision samples
○ Connect samples only if a local planner finds a collision-free path
[Kavraki et al. 1996]
![Page 42](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/42.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
○ Reject in-collision samples
○ Connect samples only if a local planner finds a collision-free path
[Kavraki et al. 1996]
![Page 43](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/43.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
○ Reject in-collision samples
○ Connect samples only if a local planner finds a collision-free path
● Querying
[Kavraki et al. 1996]
![Page 44](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/44.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
○ Reject in-collision samples
○ Connect samples only if a local planner finds a collision-free path
● Querying
○ Add start and goal to the roadmap
[Kavraki et al. 1996]
![Page 45](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/45.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
○ Reject in-collision samples
○ Connect samples only if a local planner finds a collision-free path
● Querying
○ Add start and goal to the roadmap
○ Find the shortest path in the graph
[Kavraki et al. 1996]
![Page 46](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/46.jpg)
Related Work: Probabilistic Roadmaps (PRMs)
● Building
○ Sample configuration space
○ Reject in-collision samples
○ Connect samples only if a local planner finds a collision-free path
● Querying
○ Add start and goal to the roadmap
○ Find the shortest path in the graph
● Path following
○ Path-guided artificial potential fields [Chiang et al. 2015]
○ Reinforcement learning [Faust et al. 2017]
[Kavraki et al. 1996]
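The build and query phases can be sketched for a point robot in the unit square with circular obstacles. This is a textbook-style PRM under simplifying assumptions: uniform sampling, a fixed connection radius, a straight-line local planner, and Dijkstra for the query. The helper names (`build_prm`, `query`, `local_planner`, and so on) are hypothetical.

```python
import heapq
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]         # obstacle: (x, y, radius)
Graph = Dict[int, List[Tuple[int, float]]]  # node -> [(neighbor, length)]


def free(p: Point, obstacles: List[Circle]) -> bool:
    return all(math.hypot(p[0] - x, p[1] - y) > r for x, y, r in obstacles)


def local_planner(a: Point, b: Point, obstacles: List[Circle],
                  res: float = 0.01) -> bool:
    """Straight-line local planner: collision-check points spaced ~res apart."""
    n = max(1, int(math.dist(a, b) / res))
    return all(free((a[0] + (b[0] - a[0]) * t / n,
                     a[1] + (b[1] - a[1]) * t / n), obstacles)
               for t in range(n + 1))


def connect(nodes: List[Point], graph: Graph, i: int, radius: float,
            obstacles: List[Circle], candidates: range) -> None:
    """Add edges from node i to nearby candidates the local planner can reach."""
    for j in candidates:
        d = math.dist(nodes[i], nodes[j])
        if j != i and d <= radius and local_planner(nodes[i], nodes[j], obstacles):
            graph[i].append((j, d))
            graph[j].append((i, d))


def build_prm(n: int, obstacles: List[Circle], radius: float,
              seed: int = 0) -> Tuple[List[Point], Graph]:
    rng = random.Random(seed)
    nodes: List[Point] = []
    while len(nodes) < n:
        p = (rng.random(), rng.random())  # sample the configuration space
        if free(p, obstacles):            # reject in-collision samples
            nodes.append(p)
    graph: Graph = {i: [] for i in range(n)}
    for i in range(n):
        connect(nodes, graph, i, radius, obstacles, range(i + 1, n))
    return nodes, graph


def query(nodes: List[Point], graph: Graph, start: Point, goal: Point,
          obstacles: List[Circle], radius: float) -> Optional[List[Point]]:
    """Add start and goal to a copy of the roadmap, then run Dijkstra."""
    nodes = nodes + [start, goal]
    graph = {k: list(v) for k, v in graph.items()}
    s, g = len(nodes) - 2, len(nodes) - 1
    graph[s], graph[g] = [], []
    for i in (s, g):
        connect(nodes, graph, i, radius, obstacles, range(s))
    dist, prev, heap = {s: 0.0}, {}, [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == g:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if g not in dist:
        return None
    path = [g]
    while path[-1] != s:
        path.append(prev[path[-1]])
    return [nodes[i] for i in reversed(path)]


obstacles = [(0.5, 0.5, 0.2)]
nodes, graph = build_prm(300, obstacles, radius=0.2)
path = query(nodes, graph, (0.05, 0.05), (0.95, 0.95), obstacles, radius=0.2)
```

The roadmap is built once and reused for many queries. PRM-RL, on the following slides, keeps this structure but swaps the geometric local planner for rollouts of the trained RL agent.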
![Page 47](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/47.jpg)
PRM-RL Algorithm
RL agent: a trained point-to-point agent provides the basic navigation behavior.
[Faust et al., ICRA 2018]
![Page 48](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/48.jpg)
PRM-RL Algorithm
One-time setup
[Faust et al., ICRA 2018]
RL agent: a trained point-to-point agent provides the basic navigation behavior.
![Page 49](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/49.jpg)
PRM-RL Algorithm
One-time setup: add an edge only if the RL agent can consistently navigate between the two nodes.
[Faust et al., ICRA 2018]
RL agent: a trained point-to-point agent provides the basic navigation behavior.
![Page 50](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/50.jpg)
PRM-RL Algorithm
One-time setup: add an edge only if the RL agent can consistently navigate between the two nodes.
Execute long trajectories.
[Faust et al., ICRA 2018]
RL agent: a trained point-to-point agent provides the basic navigation behavior.
![Page 51](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/51.jpg)
PRM-RL: Indoor Navigation Building PRMs
Buildings: 180x65 m, 60x47 m, and 134x92 m; 60x larger than the training environment
20 trials with 85% confidence
[Faust et al., ICRA 2018]
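PRM-RL replaces the geometric local planner with the RL agent. One reading of "20 trials with 85% confidence" is: roll the agent out 20 times between two nodes and keep the edge only if at least 85% of the rollouts succeed. A sketch under that assumption; `rollout` is a hypothetical callable that runs one noisy simulated episode and reports success.

```python
import math
from typing import Callable


def rl_edge_ok(rollout: Callable[[], bool], trials: int = 20,
               success_threshold: float = 0.85) -> bool:
    """Keep a roadmap edge only if the RL agent reliably traverses it."""
    # Successes required; the epsilon guards against float error, e.g. 0.85 * 20.
    needed = math.ceil(success_threshold * trials - 1e-9)
    successes = 0
    for k in range(trials):
        if rollout():
            successes += 1
        # Stop early once the threshold can no longer be reached.
        if successes + (trials - k - 1) < needed:
            return False
    return successes >= needed
```

With the defaults, an edge needs at least 17 successful rollouts out of 20. The early exit saves simulation time on clearly bad edges, which matters when a roadmap has tens of thousands of candidate edges.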
![Page 52](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/52.jpg)
PRM-RL: Indoor Navigation Building PRMs
One-time setup: 2 hours to build
Largest roadmap: 1700 nodes, 60,000 edges, 23 million collision checks
[Faust et al., ICRA 2018]
![Page 53](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/53.jpg)
PRM-RL: Results for Indoor Navigation
Longest trajectory 215 meters
45 waypoints
[Faust et al., ICRA 2018]
![Page 54](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/54.jpg)
PRM-RL Experimental Results
Four noisy trials
All successful, because the map is tuned to the robot’s abilities
[Faust et al., ICRA 2018]
![Page 55](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/55.jpg)
How to navigate to a location on a map?
Transferable to new environments; easy sim2real.
Navigates over long distances.
Requires one-time setup.
Handles: sensors to controls, dynamics, noise, and obstacle avoidance.
![Page 56](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/56.jpg)
Navigation capabilities: Following Directions
Following Natural Language Navigation Instructions with Deep Reinforcement Learning
Under submission
Aleksandra Faust, Chase Kew, Dilek Hakkani-Tur, Marek Fiser, Pararth Shah
FollowNet: Towards Robot Navigation by Following Natural Language Directions with Deep Reinforcement Learning
MLPC @ ICRA 2018, https://arxiv.org/abs/1805.06150
Pararth Shah, Marek Fiser, Aleksandra Faust, Chase Kew, Dilek Hakkani-Tur
Aleksandra Faust, J. Chase Kew, Marek Fiser, Dilek Hakkani-Tur, Pararth Shah
![Page 57](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/57.jpg)
How to follow directions?
Handles: sensors to controls, dynamics, noise, and obstacle avoidance.
Transferable to new environments; easy sim2real.
Go down the hallway and take the second right.
One-time building / robot setup.
Hearing
![Page 58](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/58.jpg)
Following instructions
Go down the hallway and take the second right.
Start
Goal
[Shah et al., 2018]
![Page 59](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/59.jpg)
Following instructions
Go down the hallway and take the second right.
Start
Goal
[Shah et al., 2018]
Dataset: 150 instructions, 2 buildings
![Page 60](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/60.jpg)
Following instructions: Related work
Diagram: related work placed on two axes. Language complexity ranges from processed language tokens to unfiltered, natural language; environment complexity ranges from full environment observability to partial observations. FollowNet is placed among [Mei et al., 2016], [Chaplot et al., 2018], [Das et al., 2018], [Misra et al., 2017], [Yu et al., 2018], [Thomason et al., 2015], [Anderson et al., 2017], [Arumugam et al., 2017], and [Thomason et al., 2017].
[Shah et al., 2018]
![Page 61](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/61.jpg)
Following instructions: RL Training
Diagram: a DQN agent [Mnih et al. 2015] using the FollowNet architecture, trained in the 22x18 m world.
● Observations, o: images and the natural-language (NL) instruction, e.g. "Go down the hallway and take the second right."
● Actions: turn left, turn right, go straight
● Policy: π_θ(o, a) = P(a|o), with parameters θ
● Reward: waypoint reached
[Shah et al., 2018]
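FollowNet is trained with DQN [Mnih et al. 2015] over three discrete actions. The Q-network itself is not reproduced here. Below is a sketch of the two DQN pieces that surround it, ε-greedy exploration and the bootstrapped target, assuming the network outputs one Q-value per action; the function names, action identifiers, and `gamma` are illustrative.

```python
import random
from typing import Sequence

ACTIONS = ("turn_left", "turn_right", "go_straight")


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: random.Random) -> int:
    """Random action with probability epsilon, otherwise the argmax of Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_target(reward: float, next_q_target: Sequence[float], done: bool,
               gamma: float = 0.99) -> float:
    """y = r + gamma * max_a' Q_target(o', a'); just r at the end of an episode."""
    return reward if done else reward + gamma * max(next_q_target)


rng = random.Random(0)
a = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng)
print(ACTIONS[a])  # turn_right
```

With a waypoint-reached reward, `reward` would typically be nonzero only on steps where the agent reaches the next waypoint on the instructed path, so the bootstrapped target is what propagates that sparse signal back to earlier steps.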
![Page 62](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/62.jpg)
FollowNet: Architecture
Convolutional stack: 3, 8, and 16 outputs; [1, 1], [4, 4], [3, 3] kernels; strides 1, 2, 1
Layer with 32 outputs
Hidden layers of 16 and 8 units
Recurrent layer: 16 hidden states
Convolutional stack: 8 and 16 outputs; [4, 4], [3, 3] kernels; strides 2, 1
[Shah et al., 2018]
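The slide lists kernel sizes and strides but not the input resolution or padding, so the feature-map sizes are not stated. The sketch below computes them for both convolutional stacks under stated assumptions: square inputs, TensorFlow-style `"valid"`/`"same"` padding, and a hypothetical 64x64 input. Which observation each stack consumes is also not given on the slide.

```python
import math

# (output channels, kernel size, stride) per layer, as listed on the slide.
STACK_A = [(3, 1, 1), (8, 4, 2), (16, 3, 1)]
STACK_B = [(8, 4, 2), (16, 3, 1)]

def conv_out(size, kernel, stride, padding="valid"):
    """Spatial output size of one square convolution (TensorFlow padding semantics)."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

def stack_shapes(size, stack, padding="valid"):
    """Return (height, width, channels) after each layer for a square input of `size`."""
    shapes = []
    for channels, kernel, stride in stack:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((size, size, channels))
    return shapes
```

With the hypothetical 64x64 input and `"valid"` padding, the first stack yields 64x64x3, 31x31x8, then 29x29x16; with `"same"` padding it yields 64x64x3, 32x32x8, then 32x32x16. The stride-2 [4, 4] layer is the only one that roughly halves the resolution.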
![Page 63](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/63.jpg)
FollowNet: Results
Better learning curve than the baseline
Baseline: the same model without attention
Attention over steps: the model learns what is important and what to ignore
[Faust et al., under submission]
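The results slide credits attention with learning what to attend to and what to ignore. The slides do not give FollowNet's attention equations, so the following is a generic dot-product attention sketch, not the model's exact formulation: a query vector (hypothetically, the agent's current state) scores each instruction token's key, a softmax turns the scores into weights, and the weighted sum of values forms the context. All vectors here are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Dot-product attention over tokens.

    Returns (weights, context): one weight per token, and the
    weight-averaged value vector.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context
```

Tokens whose keys align poorly with the query receive small weights, so their values barely contribute to the context. This is one concrete mechanism by which an attention model can learn to ignore filler words such as "the".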
![Page 64](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/64.jpg)
Instruction complexity
Instruction complexity is measured by the number of waypoints, not by the number of words or the path length.
[Faust et al., under submission]
![Page 65](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/65.jpg)
FollowNet on New Instructions
52% success on new instructions
67% at least partial success
30% increase over the baseline
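The slides report full and partial success rates and measure instruction complexity by waypoint count, but they do not define partial success. The sketch below computes both rates, plus a per-complexity breakdown, under hypothetical definitions: an episode is a full success when every waypoint is reached and an at-least-partial success when at least one waypoint is reached. The episode data in the usage note is made up for illustration.

```python
from collections import defaultdict

def success_rates(episodes):
    """episodes: list of (num_waypoints, waypoints_reached) pairs.

    Hypothetical definitions (not taken from the slides):
      full success            = every waypoint reached
      at least partial success = at least one waypoint reached
    Returns (full_rate, at_least_partial_rate).
    """
    n = len(episodes)
    full = sum(1 for total, reached in episodes if reached == total)
    partial = sum(1 for _, reached in episodes if reached >= 1)
    return full / n, partial / n

def success_by_complexity(episodes):
    """Full-success rate grouped by number of waypoints (the slides' complexity measure)."""
    groups = defaultdict(list)
    for total, reached in episodes:
        groups[total].append(reached == total)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}
```

For the made-up episodes `[(1, 1), (2, 1), (2, 2), (3, 0)]`, `success_rates` gives `(0.5, 0.75)` and `success_by_complexity` gives `{1: 1.0, 2: 0.5, 3: 0.0}`. Note that the slide's "30% increase" could be read as relative or as percentage points; the slides do not say which.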
![Page 66](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/66.jpg)
FollowNet performance per word
Instructions containing words with spatial semantics are more likely to succeed
Instructions containing landmarks, less so
[Faust et al., under submission]
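A per-word breakdown like the one on this slide can be computed by asking, for each word, how often instructions containing it succeed. The slides do not describe the exact analysis, so this is a hypothetical sketch: lowercase whitespace tokenization with trailing punctuation stripped, and each word counted at most once per instruction.

```python
from collections import Counter

def per_word_success(results):
    """results: list of (instruction_text, succeeded) pairs.

    Returns {word: success rate over the instructions that contain the word}.
    Tokenization (lowercase, split on whitespace, strip .,!?) is a
    hypothetical preprocessing choice.
    """
    seen = Counter()
    wins = Counter()
    for text, ok in results:
        # A set, so repeated words in one instruction count once.
        words = {t for t in (w.strip(".,!?").lower() for w in text.split()) if t}
        for w in words:
            seen[w] += 1
            wins[w] += int(ok)
    return {w: wins[w] / seen[w] for w in seen}
```

Comparing the rates for spatial words (e.g. "right", "left") against landmark words (e.g. "kitchen") is one way to surface the pattern the slide describes, though with few instructions per word these rates are noisy.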
![Page 67](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/67.jpg)
![Page 68](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/68.jpg)
How to follow directions?
Sensor to controls, dynamics, noise
Obstacle avoidance
Transferable to new environments
Easy sim2real
Go down the hallway and take the second right.
One-time building / robot setup.
Does not require building setup.
Promising results.
![Page 69](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/69.jpg)
From robot abilities to navigation capabilities
Go down the hallway and take the second right.
End-to-end basic navigation behaviors
End-to-end complex navigation tasks
Hearing
![Page 70](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/70.jpg)
Go down the hallway and take the second right.
Navigation
Perception
Planning
Controls
SLAM
![Page 71](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/71.jpg)
Go down the hallway and take the second right.
Navigation
Basic behaviors
Simple tasks
Complex tasks
Sight
Hearing
![Page 72](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/72.jpg)
Navigation
Basic behaviors
Simple tasks
Complex tasks
Sight
Hearing
Thank you!
Questions?
![Page 73](https://reader036.vdocuments.us/reader036/viewer/2022071009/5fc6a3b67b22dd142e4bb89f/html5/thumbnails/73.jpg)
Thank you
Anthony Francis, J. Chase Kew, Hao-Tien Chiang, Marek Fiser, Dilek Hakkani-Tur, Pararth Shah, James Davidson, Ken Oslund, Oscar Ramirez, Lydia Tapia