![Page 1: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/1.jpg)
Robot Control in Situated Instruction Following
Yoav Artzi
Advances of Language & Vision Research Workshop ACL 2020
Mapping Instructions to Actions
Goal: model and learn a mapping
f(instruction, context) = actions
Today: Mapping instructions to continuous control
• Generating and executing interpretable plans
• Jointly learning in real life and a simulation
• Training for test-time exploration
Task
• Navigation between landmarks
• Agent: quadcopter drone
• Context: pose and RGB camera image
Task
Example instruction: after the blue bale take a right towards the small white bush before the white bush take a right and head towards the right side of the banana
Mapping Instructions to Control
• The drone maintains a configuration of target velocities (v, ω): linear forward velocity v and angular yaw rate ω
• Each action updates the configuration to (v_t, ω_t) or stops (STOP)
• Goal: learn a mapping from inputs to configuration updates
f(instruction, image, pose) = (v_t, ω_t) or STOP
Example instruction: after the blue bale take a right towards the small white bush before the white bush …
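A minimal sketch of the action interface described on this slide, assuming an action is either a new (v, ω) target or STOP. The names `VelocityConfig` and `apply_action` are hypothetical and only illustrate the idea; they are not taken from the released code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VelocityConfig:
    v: float = 0.0      # linear forward velocity target
    omega: float = 0.0  # angular yaw rate target


# Assumption: None stands in for the STOP action.
Action = Optional[Tuple[float, float]]


def apply_action(config: VelocityConfig, action: Action) -> Tuple[VelocityConfig, bool]:
    """Return the updated configuration and whether execution has stopped."""
    if action is None:
        # STOP: zero the targets and end the episode.
        return VelocityConfig(0.0, 0.0), True
    v, omega = action
    return VelocityConfig(v, omega), False


config = VelocityConfig()
config, done = apply_action(config, (0.5, -0.2))
```

The learned mapping f would produce the `action` argument at every step; here it is supplied by hand.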
Modular Approach
Components: Language Understanding, Perception, Mapping, Planning, Control
Input: Instruction
• Build/train separate components
• Symbolic meaning representation
• Complex integration
Single-model Approach
Instruction → Stage I: Planning with Position Visitation Prediction → Visitation Distributions → Stage II: Action Generation → Action
How to think of modularity and interpretability when packing everything in a single model?
Single-model Approach
Instruction → Stage I: Planning with Position Visitation Prediction → Visitation Distributions, Observation Mask → Stage II: Action Generation → Action
1. Predict states likely to visit and track accumulated observability
2. Generate actions to visit high-probability states and explore
Visitation Distribution
• The state-visitation distribution d(s; π, s0) is the probability of visiting state s following policy π from start state s0
• Predicting d(s; π*, s0) for an expert policy π* tells us the states to visit to complete the task
• We compute two distributions: trajectory-visitation and goal-visitation
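One simple way to picture these two distributions is to estimate them from demonstration trajectories by counting. The sketch below is an illustration under that assumption (normalized visit counts over discrete states); the function names are hypothetical and this is not necessarily how the papers compute their training targets.

```python
from collections import Counter


def trajectory_visitation(trajectories):
    """Estimate a visitation distribution as normalized state-visit counts."""
    counts = Counter(state for traj in trajectories for state in traj)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


def goal_visitation(trajectories):
    """Estimate the goal distribution from the final state of each trajectory."""
    counts = Counter(traj[-1] for traj in trajectories)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


# Two toy demonstrations on a 2x2 grid, both starting at (0, 0).
demos = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 0), (1, 0), (1, 1)],
]
traj_dist = trajectory_visitation(demos)
goal_dist = goal_visitation(demos)
```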
Visitation Distribution for Navigation
• Distributions reflect the agent plan
• Model goal observability
• Refined as the agent observes more of the environment
Figure panels: Trajectory distribution, Goal distribution
Stage I: Planning with Position Visitation Prediction
Diagram labels: Instruction, CNN, Mapping, Learned Maps, Observation Mask, Trajectory distribution
Differentiable Mapping Step 1: Feature Extraction
• Extract features with a ResNet
• Recover a low resolution semantic view
ResNet
Differentiable Mapping Step 2: Projection
• Deterministic projection from camera image plane to environment ground with pinhole camera model
• Transform from first-person to third-person
Figure: Feature Map → Projected Features
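To make the projection step concrete, here is a sketch of the pinhole geometry for a single pixel, assuming a flat ground plane, a camera at a known height, and a known downward pitch. The function name, the coordinate conventions, and all parameter values are assumptions for illustration; the actual model projects whole feature maps.

```python
import math


def pixel_to_ground(u, v, f, cx, cy, height, pitch):
    """Intersect the ray through pixel (u, v) with the ground plane z = 0.

    The camera sits at (0, 0, height) and is pitched down by `pitch` radians.
    Returns (forward, right) ground coordinates, or None when the ray never
    reaches the ground (pixel at or above the horizon).
    """
    x = (u - cx) / f  # normalized image coordinate, positive to the right
    y = (v - cy) / f  # normalized image coordinate, positive downwards
    # Downward component of the ray direction in the world frame.
    down = math.sin(pitch) + y * math.cos(pitch)
    if down <= 0:
        return None
    t = height / down  # ray scale at which it meets the ground
    forward = t * (math.cos(pitch) - y * math.sin(pitch))
    return forward, t * x


# Centre pixel, camera 1 m high and pitched down 45 degrees.
centre = pixel_to_ground(64, 64, 100.0, 64, 64, 1.0, math.pi / 4)
# A pixel far above the image centre looks above the horizon.
sky = pixel_to_ground(64, 0, 32.0, 64, 64, 1.0, math.pi / 4)
```

With a 45 degree pitch and 1 m height, the centre pixel lands about 1 m ahead of the camera, which matches the geometry of an isosceles right triangle.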
Differentiable Mapping Step 3: Accumulation
• A leaky integrator (∑) combines the Semantic Map (time t-1) with the Projected Features (time t) to produce the Semantic Map (time t)
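A sketch of a leaky-integrator map update, assuming the common form S_t = (1 - λ) S_{t-1} + λ F_t applied only at positions observed in the current frame. The leak rate `leak`, the use of a boolean observation mask, and the function name are assumptions; the slide does not give the exact update rule.

```python
def leaky_integrate(prev_map, projected, observed, leak=0.5):
    """Blend new projected features into the map where currently observed.

    prev_map, projected: 2-D lists of floats with the same shape.
    observed: 2-D list of booleans marking positions visible at time t.
    """
    return [
        [(1 - leak) * p + leak * f if o else p
         for p, f, o in zip(prev_row, feat_row, obs_row)]
        for prev_row, feat_row, obs_row in zip(prev_map, projected, observed)
    ]


semantic_prev = [[0.0, 1.0]]
projected_now = [[1.0, 0.0]]
visible_now = [[True, False]]
semantic_now = leaky_integrate(semantic_prev, projected_now, visible_now)
```

Unobserved positions keep their previous value, so information gathered earlier in the flight is retained.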
Differentiable Mapping Step 4: Grounding
• A 1x1 filter, conditioned on the instruction, maps the Semantic Map to the Grounding Map
Example instruction: after the blue bale take a right towards the small white bush before the white bush …
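A 1x1 filter is a per-position linear map across channels. The sketch below shows that operation on nested lists; in the model the kernel would be produced from the instruction representation, whereas here it is a hand-written placeholder. The function name and shapes are assumptions.

```python
def apply_1x1_filter(semantic_map, kernel):
    """Apply a 1x1 convolution.

    semantic_map: H x W x C_in nested lists.
    kernel: C_out x C_in weights (assumed to come from the instruction).
    Returns an H x W x C_out nested list.
    """
    return [
        [[sum(w * x for w, x in zip(weights, cell)) for weights in kernel]
         for cell in row]
        for row in semantic_map
    ]


# 1 x 2 map with two input channels and a single output channel.
semantic = [[[1.0, 0.0], [0.0, 1.0]]]
placeholder_kernel = [[2.0, -1.0]]
grounding = apply_1x1_filter(semantic, placeholder_kernel)
```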
Observability Mask
• A leaky integrator (∑) combines the Observation Mask (time t-1) with the Current Observation Mask to produce the Observation Mask (time t)
Stage I: Planning with Position Visitation Prediction
✓ Extract visual features and construct maps
• Compute visitation distributions over the maps
Diagram: Differentiable Mapping → LingUNet → Trajectory distribution, Goal distribution
Predicting Visitation Distributions
• We compute two distributions: trajectory-visitation and goal-visitation
• Cast distribution prediction as image generation
LingUNet
• Image-to-image encoder-decoder
• Visual reasoning at multiple image scales
• Conditioned on language input at all levels of reasoning using text-based convolutions
LingUNet
Diagram: Semantic Maps → Convolutions → Text Convolutions → Deconvolutions → SoftMax → Visitation Distributions
Instruction → RNN → Text Kernels (used by the Text Convolutions)
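The final SoftMax turns a map of scores into a distribution over positions. A minimal sketch of that normalization, assuming a single 2-D score map (the function name is hypothetical, and the real network handles batches and separate trajectory and goal outputs):

```python
import math


def spatial_softmax(scores):
    """Normalize a 2-D score map into a probability distribution over cells."""
    peak = max(s for row in scores for s in row)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(e for row in exps for e in row)
    return [[e / total for e in row] for row in exps]


uniform = spatial_softmax([[0.0, 0.0], [0.0, 0.0]])
peaked = spatial_softmax([[0.0, 5.0], [0.0, 0.0]])
```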
Stage I: Planning with Position Visitation Prediction
✓ Extract visual features and construct maps
✓ Compute visitation distributions over the maps
Diagram: Differentiable Mapping → LingUNet → Trajectory distribution, Goal distribution
Stage II: Action Generation
• Relatively simple control problem without language
• Transform and crop to agent perspective and generate configuration update
Diagram: Trajectory distribution, Goal distribution → Egocentric Transform → Control Network
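The egocentric transform amounts to a translation followed by a rotation by the negative of the agent heading. The sketch below applies it to a single world-frame point; the frame convention (x forward, y left) and the function name are assumptions, and the model applies the equivalent transform and crop to whole distribution maps.

```python
import math


def to_egocentric(point, agent_pos, agent_yaw):
    """Express a world-frame (x, y) point in the agent frame (x forward, y left)."""
    dx = point[0] - agent_pos[0]
    dy = point[1] - agent_pos[1]
    c, s = math.cos(agent_yaw), math.sin(agent_yaw)
    # Rotate by -yaw, the inverse of the agent's heading rotation.
    return (c * dx + s * dy, -s * dx + c * dy)


# Agent at (1, 1) facing the +y direction.
ahead = to_egocentric((1.0, 3.0), (1.0, 1.0), math.pi / 2)
left = to_egocentric((0.0, 1.0), (1.0, 1.0), math.pi / 2)
```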
Learning vs. Engineering
Learned 👨🎓
• Visual features (ResNet)
• Text representation (RNN)
• Image generation (LingUNet)
• Control (control network)
Engineered 👷
• Feature projections (pinhole camera model)
• Map accumulation (leaky integrator)
• Egocentric transformation (matrix rotations)
Complete network remains fully differentiable
Simulation-Reality Joint Learning
after the blue bale take a right towards the small white bush before the white bush take a right and head towards the right side of the banana
Go between the mushroom and flower chair the tree all the way up to the phone booth
Training Data
• Simulator: demonstrations and simulator
• Physical Environment: demonstrations only
Learning Architecture
Diagram: Instruction, CNN and CNN-Sim → Features → Map+Plan → Mask, Visitations → Control
• Supervised Learning: CNN, CNN-Sim, and Map+Plan
• Reinforcement Learning: Control
Supervised Plan Learning
Diagram: Instruction, CNN → Real Features, CNN-Sim → Simulator Features, Map+Plan → Visitations
Two objectives:
(a) Generate visitation distributions
(b) Invariance to input environment
Supervised Plan Learning
Diagram: Instruction, CNN → Real Features, CNN-Sim → Simulator Features, Map+Plan → Visitations
• Cross-entropy loss between Visitations and Demonstration Visitations
• Data: real and simulated states paired with visitation predictions
Supervised Plan Learning
Diagram: Instruction, CNN → Real Features, CNN-Sim → Simulator Features, Map+Plan → Visitations
• Cross-entropy loss between Visitations and Demonstration Visitations
• Source Discriminator on Real Features and Simulator Features: empirical Wasserstein distance
• The adversarial discriminator forces feature invariance after the CNNs; no data alignment is needed
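A sketch of the quantity a Wasserstein-style source discriminator (critic) works with: the gap between its mean output on real features and on simulator features. This follows the standard critic formulation and is an assumption about the details; the critic here is a hand-written linear function, the names are hypothetical, and the Lipschitz constraint that the formulation requires is not enforced in this toy example. The two feature sets are unpaired, which is why no data alignment is needed.

```python
def critic_gap(critic, real_features, sim_features):
    """Empirical Wasserstein estimate: mean critic score gap between domains."""
    real_mean = sum(critic(f) for f in real_features) / len(real_features)
    sim_mean = sum(critic(f) for f in sim_features) / len(sim_features)
    return real_mean - sim_mean


def toy_critic(feature):
    # Placeholder critic: in practice this is a learned network.
    return sum(feature)


real_batch = [[1.0, 1.0], [3.0, 1.0]]
sim_batch = [[0.0, 1.0], [1.0, 1.0]]
gap = critic_gap(toy_critic, real_batch, sim_batch)
```

The discriminator is trained to enlarge this gap, while the two CNNs are trained to shrink it, pushing real and simulated features toward the same distribution.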
RL for Control
Diagram: Instruction, CNN and CNN-Sim → Features → Map+Plan → Mask, Visitations → Control
• Reinforcement Learning trains Control with an intrinsic reward
SuReAL: Supervised and Reinforcement Asynchronous Learning
Diagram: Instruction, CNN and CNN-Sim → Map+Plan → Control
• Supervised Learning → Reinforcement Learning: periodic parameter updates
• Reinforcement Learning → Supervised Learning: sampled action sequences
SuReAL: Supervised and Reinforcement Asynchronous Learning
• Stage I: learn to predict visitation distributions based on noisy predicted execution trajectories
• Stage II: learn to predict actions using predicted visitation distributions
• Periodic parameter updates
• Replace gold action sequences with sampled ones
Experimental Setup
• Intel Aero quadcopter
• Vicon motion capture for pose estimates
• Simulation with Microsoft AirSim
• Drone cage is 4.7 x 4.7 m
• Roughly 1.5% of training data in physical environment (402 vs. 23k examples)
Human Evaluation
• Score path and goal on a 5-point Likert scale for 73 examples
• Our model receives five-point path scores 37.8% of the time, a 24.8% improvement over PVN2-BC
• Improvements over PVN2-BC illustrate the benefit of SuReAL and the exploration reward
Cool Example
once near the rear of the gorilla turn right and head towards the rock stopping once near it
Failure
head towards the area just to the left of the mushroom and then loop around it
The Papers
• Learning to Map Natural Language Instructions to Physical Quadcopter Control Using Simulated Flight. Valts Blukis, Yannick Terme, Eyvind Niklasson, Ross A. Knepper, and Yoav Artzi. CoRL, 2019.
• Mapping Navigation Instructions to Continuous Control Actions with Position Visitation Prediction. Valts Blukis, Dipendra Misra, Ross A. Knepper, and Yoav Artzi. CoRL, 2018.
• Following High-level Navigation Instructions on a Simulated Quadcopter with Imitation Learning. Valts Blukis, Nataly Brukhim, Andrew Bennett, Ross A. Knepper, and Yoav Artzi. RSS, 2018.
Today: Mapping instructions to continuous control
• Generating and executing interpretable plans: casting planning as visitation distribution prediction and decomposing the policy to tie action generation to spatial reasoning
• Jointly learning in real life and a simulation: learning with an adversarial discriminator to train domain-invariant perception
• Training for test-time exploration: SuReAL training to combine the benefits of supervised and reinforcement learning
Valts Blukis
And collaborators: Dipendra Misra, Eyvind Niklasson, Nataly Brukhim, Andrew Bennett, and Ross Knepper
Thank you! Questions?
https://github.com/lil-lab/drif
[fin]
Visitation Distributions
Visitation Distribution
• Given a Markov Decision Process (MDP) with states S, actions A, reward R, and horizon H
• The state-visitation distribution d(s; π, s0) is the probability of visiting state s following policy π from start state s0
• Predicting d(s; π*, s0) for an expert policy π* tells us the states to visit to complete the task
• Can learn from demonstrations, but prediction is generally impossible: S is very large!
Approximating Visitation Distributions
• Solution: approximate the state space
• Use an approximate state space S̃ and a mapping between the state spaces ϕ : S → S̃
• For a well chosen ϕ, a policy π with a state-visitation distribution close to d(s̃; π*, s̃0) has bounded sub-optimality
MDP: States S, Actions A, Reward R, Horizon H
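As an illustration of such a mapping ϕ for navigation, the sketch below discretizes a continuous position into a grid cell. The 4.7 m extent is the drone cage size reported in the experimental setup; the 32 x 32 resolution, the function name, and the choice to ignore everything in the state except position are assumptions made for the example.

```python
def phi(x, y, world_size=4.7, grid=32):
    """Map a continuous position in metres to a discrete grid cell (row, col)."""
    cell = world_size / grid  # side length of one cell in metres
    # Clamp so positions on or beyond the boundary fall in the edge cells.
    col = min(grid - 1, max(0, int(x // cell)))
    row = min(grid - 1, max(0, int(y // cell)))
    return row, col


origin_cell = phi(0.0, 0.0)
interior_cell = phi(1.0, 3.0)
corner_cell = phi(4.7, 4.7)
```

Many distinct full states (pose, velocity, camera view) collapse to the same cell, which is what makes predicting a distribution over S̃ tractable.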
Visitation Distribution for Navigation
• S̃ is a set of discrete positions in the world
• We compute two distributions: trajectory-visitation and goal-visitation
MDP: States S, Actions A, Reward R, Horizon H
Figure panels: Trajectory Probability, Goal Probability
Drone Model
Drone Learning
Learning Architecture
Pipeline: CNN → Mapping → LingUNet → Control
Intermediate outputs: Features, Semantic, Grounding, Mask, Visitations
Input: Instruction
Learning Architecture
Pipeline: CNN and CNN-Sim → Mapping → LingUNet → Control
Intermediate outputs: Features, Semantic, Grounding, Mask, Visitations
Input: Instruction
Learning Architecture
Pipeline: CNN and CNN-Sim → Map+Plan → Control
Intermediate outputs: Features, Semantic, Grounding, Mask, Visitations
Input: Instruction
Training: Supervised Learning, Reinforcement Learning
RL for Control
• Control is trained with Proximal Policy Optimization, together with a Value Function
• Control inputs: Mask and Visitations produced by Stage 1 from the Instruction
• Simulation data only
Reward Goals
• We want the agent to:
• Follow the plan
• Explore to find the goal if not observed
• Only select feasible actions
• Be efficient
Reward
• Visitation: reduction in earth mover’s distance between the predicted trajectory distribution and what has been done so far
• Stop: if stopping, the earth mover’s distance between the stop location and the predicted goal distribution
• Exploration: reward reducing the probability that the goal has not been observed, and penalize stopping when the goal has not been observed
• Action: penalize actions outside of controller range
• Step: constant per-step term to encourage efficient execution
Reward = Visitation + Stop + Exploration + Action + Step
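A sketch of how the visitation term and the overall sum could be computed, using a 1-D earth mover's distance as a stand-in. The real distributions are over 2-D positions, so the 1-D simplification, the function names, and the example numbers are all assumptions for illustration.

```python
def emd_1d(p, q):
    """Earth mover's distance between two distributions on the same 1-D grid."""
    total, carry = 0.0, 0.0
    for p_i, q_i in zip(p, q):
        carry += p_i - q_i   # mass that still has to move past this cell
        total += abs(carry)
    return total


def visitation_reward(predicted, visited_before, visited_after):
    """Reward the reduction in distance to the predicted trajectory distribution."""
    return emd_1d(predicted, visited_before) - emd_1d(predicted, visited_after)


def total_reward(visitation, stop, exploration, action, step):
    """Sum the five reward terms listed on the slide."""
    return visitation + stop + exploration + action + step


predicted = [0.0, 0.5, 0.5]
before = [1.0, 0.0, 0.0]   # agent has only been in the first cell
after = [0.5, 0.5, 0.0]    # agent has moved into the second cell
r_visit = visitation_reward(predicted, before, after)
r_total = total_reward(r_visit, 0.0, 0.1, 0.0, -0.04)
```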
Drone Related Work
Related Work: Task
• Mapping instructions to actions with robotic agents: Tellex et al. 2011; Matuszek et al. 2012; Duvallet et al. 2013; Walter et al. 2013; Misra et al. 2014; Hemachandra et al. 2015; Lignos et al. 2015
• Mapping instructions to actions in software and simulated environments: MacMahon et al. 2006; Branavan et al. 2010; Matuszek et al. 2010, 2012; Artzi et al. 2013, 2014; Misra et al. 2017, 2018; Anderson et al. 2017; Suhr and Artzi 2018
• Learning visuomotor policies for robotic agents: Lenz et al. 2015; Levine et al. 2016; Bhatti et al. 2016; Nair et al. 2017; Tobin et al. 2017; Quillen et al. 2018; Sadeghi et al. 2017
Related Work: Method
• Mapping and planning in neural networks: Bhatti et al. 2016; Gupta et al. 2017; Khan et al. 2018; Savinov et al. 2018; Srinivas et al. 2018
• Model and learning decomposition: Pastor et al. 2009, 2011; Konidaris et al. 2012; Paraschos et al. 2013; Maeda et al. 2017
• Learning to explore: Knepper et al. 2015; Nyga et al. 2018
Drone Data Collection
Data
• Crowdsourced with a simplified environment and agent
• Two-step data collection: writing and validation/segmentation
Example: Go towards the pink flowers and pass them on your left, between them and the ladder. Go left around the flower until you're pointed towards the bush, going between the gorilla and the traffic cone. Go around the bush, and go in between it and the apple, with the apple on your right. Turn right and go around the apple.
![Page 64: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/64.jpg)
CoRL 2018 Experiments
![Page 65: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/65.jpg)
Experimental Setup
• Crowdsourced instructions and demonstrations
• 19,758/4,135/4,072 train/dev/test examples
• Each environment includes 6-13 landmarks
• Quadcopter simulation with AirSim
• Metric: task-completion accuracy
![Page 66: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/66.jpg)
Test Results
Success rate (%): STOP 5.72; Average 16.43; Chaplot et al. 2018 21.34; Blukis et al. 2018 24.36; Our Approach 41.21
• Explicit mapping helps performance
• Explicit planning further improves performance
![Page 67: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/67.jpg)
Synthetic vs. Natural Language
• Synthetically generated instructions with templates
• Evaluated with explicit mapping (Blukis et al. 2018)
• Using natural language is significantly more challenging
• Not only a language problem: trajectories also become more complex
Success rate (%): Synthetic Language 79.2; Natural Language 24.36
![Page 68: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/68.jpg)
Ablations: Development Results
• The language is being used effectively
• Auxiliary objectives help with credit assignment
Success rate (%): Our Approach 40.44; w/o imitation learning 38.87; w/o goal distribution 35.98; w/o auxiliary objectives 30.77; w/o language 23.07
![Page 69: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/69.jpg)
Analysis: Development Results
• Better control can improve performance
• Observing the environment, potentially through exploration, remains a challenge
Success rate (%): Our Approach 40.44; Ideal Actions 45.7; Fully Observable 60.59
![Page 70: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/70.jpg)
Drone Experiments
![Page 71: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/71.jpg)
Environment
• Drone cage is 4.7 × 4.7 m
• Created in reality and simulation
• 15 possible landmarks, 5-8 in each environment
• Also: a larger 50 × 50 m simulation-only environment with 6-13 landmarks out of 63 possible
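The environment setup above can be sketched as a small random sampler. This is only an illustration: the cage size, landmark pool size, and landmark counts come from the slide, while the landmark names, the uniform placement, and the function `sample_environment` are assumptions made for the example and are not the authors' generation procedure.

```python
import random

CAGE_SIZE_M = 4.7            # from the slide: drone cage is 4.7 x 4.7 m
NUM_POSSIBLE_LANDMARKS = 15  # from the slide: 15 possible landmarks


def sample_environment(rng, size=CAGE_SIZE_M, pool=NUM_POSSIBLE_LANDMARKS,
                       lo=5, hi=8):
    """Pick between lo and hi landmarks from the pool and place each one.

    Landmark names and uniform placement are hypothetical; the slides do
    not say how positions are chosen.
    """
    names = [f"landmark_{i}" for i in range(pool)]
    chosen = rng.sample(names, rng.randint(lo, hi))  # randint is inclusive
    return {name: (rng.uniform(0, size), rng.uniform(0, size))
            for name in chosen}


env = sample_environment(random.Random(0))
```

Under the same assumptions, the larger simulation-only environment would correspond to `sample_environment(rng, size=50.0, pool=63, lo=6, hi=13)`.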
![Page 72: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/72.jpg)
Data
• Real environment training data includes 100 instruction paragraphs, segmented into 402 instructions
• Evaluation with 20 paragraphs
• Evaluate on concatenated consecutive segments
• Oracle trajectories from a simple carrot planner
• Much more data in simulation, including for a larger 50x50m environment
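The slides do not describe the carrot planner used for oracle trajectories. As a rough sketch of the general carrot-chasing idea, the code below projects the current position onto a reference path, places a target (the "carrot") a fixed lookahead distance further along the path, and steps toward it. The function names, the 2D point-mass motion, and the `lookahead`, `step`, and `tol` values are all assumptions for illustration.

```python
import math


def project(path, pos):
    """Arc length along the polyline of the point closest to pos."""
    best_s, best_d, s0 = 0.0, math.inf, 0.0
    for a, b in zip(path, path[1:]):
        seg = math.dist(a, b)
        if seg == 0:
            continue
        t = ((pos[0] - a[0]) * (b[0] - a[0])
             + (pos[1] - a[1]) * (b[1] - a[1])) / seg ** 2
        t = max(0.0, min(1.0, t))  # clamp the projection to the segment
        q = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        d = math.dist(pos, q)
        if d < best_d:
            best_d, best_s = d, s0 + t * seg
        s0 += seg
    return best_s


def point_at(path, s):
    """Point at arc length s along the polyline, clamped to its ends."""
    for a, b in zip(path, path[1:]):
        seg = math.dist(a, b)
        if seg > 0 and s <= seg:
            t = max(0.0, s) / seg
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        s -= seg
    return path[-1]


def carrot_point(path, pos, lookahead):
    """Target placed `lookahead` further along the path than pos."""
    return point_at(path, project(path, pos) + lookahead)


def follow(path, start, lookahead=0.5, step=0.1, tol=0.1, max_steps=10_000):
    """Chase the carrot with fixed-size steps until near the path's end."""
    pos, trajectory = start, [start]
    for _ in range(max_steps):
        if math.dist(pos, path[-1]) <= tol:
            break
        carrot = carrot_point(path, pos, lookahead)
        d = math.dist(pos, carrot)
        if d == 0:
            break
        move = min(step, d)
        pos = (pos[0] + move * (carrot[0] - pos[0]) / d,
               pos[1] + move * (carrot[1] - pos[1]) / d)
        trajectory.append(pos)
    return trajectory
```

A larger lookahead cuts corners more aggressively, while a smaller one tracks the reference path more closely. A real quadcopter controller would output velocities rather than positions.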
![Page 73: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/73.jpg)
Evaluation
• Two automated metrics
• SR: success rate
• EMD: path earth mover's distance
• Human evaluation: score path and goal on a 5-point Likert scale
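The two automated metrics can be sketched as follows. This is a simplified illustration, not the evaluation code behind the reported numbers: the success threshold is a free parameter here (the slides do not give it), paths are resampled to a small number of points, and the earth mover's distance is computed by exhaustive matching, which is only practical for a handful of points. A real implementation would use an optimal-transport or assignment solver.

```python
import itertools
import math


def success(stop_pos, goal_pos, threshold):
    """An execution succeeds if it stops within `threshold` of the goal."""
    return math.dist(stop_pos, goal_pos) <= threshold


def success_rate(results, threshold):
    """Percentage of (stop, goal) pairs that count as successes."""
    hits = sum(success(stop, goal, threshold) for stop, goal in results)
    return 100.0 * hits / len(results)


def resample(path, n):
    """Return n points (n >= 2) evenly spaced by arc length along a polyline."""
    lengths = [math.dist(a, b) for a, b in zip(path, path[1:])]
    total = sum(lengths)
    points = []
    for k in range(n):
        s = total * k / (n - 1)
        for (a, b), seg in zip(zip(path, path[1:]), lengths):
            if seg > 0 and s <= seg:
                t = s / seg
                points.append((a[0] + t * (b[0] - a[0]),
                               a[1] + t * (b[1] - a[1])))
                break
            s -= seg
        else:
            points.append(path[-1])  # ran past the end: clamp to last point
    return points


def path_emd(path_a, path_b, n=5):
    """EMD between two paths treated as n equally weighted points each.

    With equal point counts and uniform weights, the EMD equals the
    cheapest one-to-one matching cost divided by n. The matching is
    found by brute force over all n! permutations.
    """
    pa, pb = resample(path_a, n), resample(path_b, n)
    best = min(
        sum(math.dist(p, pb[j]) for p, j in zip(pa, perm))
        for perm in itertools.permutations(range(n))
    )
    return best / n
```

Treating a path as a point set ignores the order in which points are visited, which is one reason a path-level score like this can miss whether the instruction was followed correctly.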
![Page 74: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/74.jpg)
Observability
• Big benefit when goal is not immediately observed
• However, complexity comes at small performance cost on easier examples
![Page 75: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/75.jpg)
Test Results
Success rate (%): Average 16.7; PVN-BC 20.8; PVN2-BC 29.2; Our Approach 30.6
EMD: Average 0.71; PVN-BC 0.61; PVN2-BC 0.59; Our Approach 0.52
• SR is often too strict: 30.6% SR compared to 39.7% of examples scoring five points on goal in the human evaluation
• EMD is generally more reliable, but still fails to account for semantic correctness
![Page 76: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/76.jpg)
Simple vs. Complex Instructions
• Performance on easier single-segment instructions is much higher
• Instructions are shorter and trajectories simpler
Success rate (%): 1-segment Instructions 56.5; 2-segment Instructions 30.6
EMD: 1-segment Instructions 0.34; 2-segment Instructions 0.52
![Page 77: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/77.jpg)
Transfer Effects
• Visual and flight dynamics transfer challenges remain
• Even the oracle shows a drop in performance, from 0.17 EMD in the simulation to 0.23 in the real environment
Success rate (%): Simulator 33.3; Real 30.6
EMD: Simulator 0.42; Real 0.52
![Page 78: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/78.jpg)
Sim-real Shift Examples
![Page 79: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/79.jpg)
Sim-real Control Shift
when you reach the right of the palm tree take a sharp right when you see a blue box head toward it
![Page 80: Robot Control in Situated Instruction Following · 2020-07-12 · STOP Mapping Instructions to Control • The drone maintains a configuration of target velocities • Each action](https://reader033.vdocuments.us/reader033/viewer/2022050204/5f57d0edadcac0199a7e05a3/html5/thumbnails/80.jpg)
Sim-real Control Shift
make a right at the rock and head towards the banana