Query-Efficient Imitation Learning for End-to-End Simulated Driving
Jiakai Zhang, Kyunghyun Cho
New York University
Overview
Introduction
• End-to-end learning for self-driving
• Related work
Learning method
• Convolutional neural network
• Imitation learning using SafeDAgger
Experiment
• Setup
• Results
Conclusion and future work
Introduction
End-to-end learning for self-driving
• Sensory input from front-facing camera
• Control signals: steering, brake
Introduction
Related work
• Supervised learning
  • ALVINN net [Pomerleau 1989]
  • DeepDriving [Chen et al. 2015]
  • End-to-end learning for self-driving cars [Bojarski et al. 2016]
• Imitation learning
  • DAgger [Ross, Gordon, and Bagnell 2010]
  • SafeDAgger [Zhang and Cho 2017]
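DAgger algorithm
• Initialize: dataset 𝐷0, policy 𝜋1
• Iteration:
  • Mix the reference policy 𝜋∗ with the current policy: 𝜋𝑖 = 𝛽𝑖𝜋∗ + (1 − 𝛽𝑖) 𝜋𝑖
  • Collect dataset 𝐷′ with the mixed policy
  • Aggregate: dataset 𝐷𝑖 = 𝐷′ ∪ 𝐷𝑖−1
  • Train policy 𝜋𝑖 on 𝐷𝑖
• Return: best policy 𝜋𝑖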
Disadvantages:
• Queries the reference policy constantly
• Safety issues for the environment
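The DAgger loop above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the 1-D environment, the linear learner, the `reference_policy` expert, and the `beta0 ** i` decay schedule are all assumptions made for the example. The `queries` counter shows the first disadvantage: every visited state is labelled by the reference policy.

```python
import random

def reference_policy(s):
    """Hypothetical expert: steer back toward the lane centre (s = 0)."""
    return -0.5 * s

def fit_linear(dataset):
    """Least-squares fit of a = w * s on (state, expert_action) pairs."""
    num = sum(s * a for s, a in dataset)
    den = sum(s * s for s, _ in dataset)
    return num / den if den > 0 else 0.0

def rollout(act, steps, rng):
    """Run a toy 1-D environment and return the visited states."""
    s, states = rng.uniform(-1.0, 1.0), []
    for _ in range(steps):
        states.append(s)
        s = s + act(s) + rng.gauss(0.0, 0.05)
    return states

def dagger(iterations=5, steps=50, beta0=0.5, seed=0):
    rng = random.Random(seed)
    # Initialize: D_0 is collected by the reference policy alone.
    data = [(s, reference_policy(s)) for s in rollout(reference_policy, steps, rng)]
    w = fit_linear(data)
    queries = len(data)
    for i in range(1, iterations + 1):
        beta = beta0 ** i  # assumed decay schedule for the mixing coefficient

        def mixed(s, w=w, beta=beta):
            # With probability beta the expert acts, otherwise the learner.
            return reference_policy(s) if rng.random() < beta else w * s

        visited = rollout(mixed, steps, rng)
        # Every visited state is labelled by the reference policy.
        data += [(s, reference_policy(s)) for s in visited]
        queries += len(visited)
        w = fit_linear(data)  # retrain on the aggregated dataset
    return w, queries
```

Calling `dagger()` returns the fitted weight and the total number of expert queries, which grows by `steps` at every iteration regardless of how well the learner already drives.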
SafeDAgger algorithm
• Initialize: dataset 𝐷0, policy 𝜋1, safety classifier 𝑐1
• Iteration:
  • Drive with policy 𝜋𝑖 = 𝛽𝑖𝜋∗ + (1 − 𝛽𝑖) 𝜋𝑖 and safety classifier 𝑐𝑖
  • Collect dataset 𝐷′ from the frames classified as not safe
  • Aggregate: dataset 𝐷𝑖 = 𝐷′ ∪ 𝐷𝑖−1
  • Train policy 𝜋𝑖 and safety classifier 𝑐𝑖
• Return: best policy 𝜋𝑖
Advantages:
• Query-efficient
• Safety feature
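A rough sketch of how a safety classifier can cut down expert queries, under loud assumptions: the 1-D environment, the saturating `reference_policy`, the linear learner, the threshold `TAU`, and the radius-based stand-in for a learned classifier are all hypothetical and are not taken from the slides. The primary policy drives while the classifier predicts "safe"; otherwise the reference policy takes over and only those frames are added to the dataset.

```python
import random

TAU = 0.01  # assumed threshold on the squared deviation from the expert

def reference_policy(s):
    """Hypothetical expert with saturated steering."""
    return max(-0.3, min(0.3, -0.5 * s))

def fit_policy(data):
    """Least-squares fit of a = w * s on (state, expert_action) pairs."""
    num = sum(s * a for s, a in data)
    den = sum(s * s for s, _ in data)
    return num / den if den > 0 else 0.0

def fit_safety_radius(w, data):
    """Toy stand-in for a learned safety classifier: states with |s| below the
    returned radius are predicted safe. The radius is the smallest |s| in the
    data whose squared deviation from the expert exceeds TAU."""
    unsafe = [abs(s) for s, a in data if (w * s - a) ** 2 > TAU]
    return min(unsafe) if unsafe else float("inf")

def safe_dagger(iterations=5, steps=200, seed=0):
    rng = random.Random(seed)
    # Initialize: D_0 labelled by the expert, then fit the policy and classifier.
    data = [(s, reference_policy(s))
            for s in (rng.uniform(-2, 2) for _ in range(steps))]
    w = fit_policy(data)
    radius = fit_safety_radius(w, data)
    query_fraction = []
    for _ in range(iterations):
        s, new_data = rng.uniform(-2, 2), []
        for _ in range(steps):
            if abs(s) < radius:
                a = w * s                    # predicted safe: primary policy drives
            else:
                a = reference_policy(s)      # predicted unsafe: expert drives
                new_data.append((s, a))      # only these frames are collected
            s = s + a + rng.gauss(0.0, 0.3)
        query_fraction.append(len(new_data) / steps)
        data += new_data                     # aggregate
        w = fit_policy(data)                 # retrain policy
        radius = fit_safety_radius(w, data)  # retrain classifier
    return w, radius, query_fraction
```

`query_fraction` records, per iteration, the share of frames on which the expert was consulted, which is the quantity the "query-efficient" claim is about.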
Safety classifier
• Deviation of a primary policy from a reference policy defined as
• Optimal safety classifier defined as
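The formulas for these two definitions are missing from this transcript. One plausible form, assuming a squared L2 deviation and a fixed threshold \tau (both assumptions, as are the symbol names), with the label 0 marking an unsafe frame:

```latex
\epsilon(\pi, s) = \bigl\lVert \pi(s) - \pi^{*}(s) \bigr\rVert^{2}

c^{*}_{\text{safe}}(\pi, s) =
\begin{cases}
  0, & \text{if } \epsilon(\pi, s) > \tau \\
  1, & \text{otherwise}
\end{cases}
```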
Learning safety classifier
• Minimize a binary cross-entropy loss
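As a small illustration of the training signal, the sketch below builds 0/1 safety labels from the deviation between primary and reference actions and evaluates a mean binary cross-entropy. The function names, the scalar actions, and the threshold `tau` are assumptions made for the example, not details from the slides.

```python
import math

def safety_labels(primary_actions, reference_actions, tau):
    """Label 1 (safe) when the squared deviation is at most tau, else 0."""
    return [1 if (p - r) ** 2 <= tau else 0
            for p, r in zip(primary_actions, reference_actions)]

def binary_cross_entropy(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted safe-probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

For example, `safety_labels([0.1, 0.5], [0.12, 0.0], tau=0.01)` marks the first frame safe and the second unsafe, and a classifier that predicts those labels with higher confidence gets a lower loss.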
Experiment – Setup
TORCS – open-source racing game
Training tracks
Test tracks
Experiment – Model
Primary policy
• Input image – 3x160x72
• Convolutional layer – 64x3x3
• Max Pooling – 2x2
• Convolutional layer – 128x5x5
• Fully connected layer
• Outputs – control signals, environment variables
• Repeat labels in the diagram – x 4, x 2
Safety classifier
• Input – feature map
• Fully connected layer (x 2)
• Output – safety value
Optimization algorithm: stochastic gradient descent
Results
Safe Frames
Unsafe Frames
Results
Evaluation on test tracks
1. Mean squared error of steering angle
2. Damage per lap
3. Number of laps
4. Portion of time driven by a reference policy
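The four evaluation measures can be computed from a per-frame driving log roughly as follows. This is only a sketch: the log format and its keys (`steer`, `ref_steer`, `damage`, `lap_done`, `ref_driving`) are hypothetical, and the exact definitions used in the experiments may differ.

```python
def evaluate(log):
    """Summarise a driving log given as a list of per-frame dicts with the
    hypothetical keys steer, ref_steer, damage, lap_done and ref_driving."""
    n = len(log)
    laps = sum(1 for f in log if f["lap_done"])
    total_damage = sum(f["damage"] for f in log)
    return {
        # 1. Mean squared error between primary and reference steering.
        "steering_mse": sum((f["steer"] - f["ref_steer"]) ** 2 for f in log) / n,
        # 2. Damage per completed lap (None when no lap was completed).
        "damage_per_lap": total_damage / laps if laps else None,
        # 3. Number of completed laps.
        "laps": laps,
        # 4. Share of frames on which the reference policy was driving.
        "reference_fraction": sum(1 for f in log if f["ref_driving"]) / n,
    }
```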
Results
Mean squared error of steering angle
[Plot: MSE (steering angle) vs. # of DAgger iterations. Dashed curve – with traffic; solid curve – without traffic]
Results
Damage per lap
[Plot: damage per lap vs. # of DAgger iterations. Dashed curve – with traffic; solid curve – without traffic]
Results
Number of laps
[Plot: avg. # of laps vs. # of DAgger iterations. Dashed curve – with traffic; solid curve – without traffic]
Results
Portion of time driven by a reference policy
[Plot: % of 𝑐safe = 0 vs. # of DAgger iterations. Dashed curve – with traffic; solid curve – without traffic]
Demo
Conclusion
Proposed SafeDAgger algorithm
• Query efficient
• Safety feature
End-to-end simulated driving
• Trained a convolutional neural network to drive in TORCS with traffic
Future work
• Evaluate SafeDAgger in the real world
• Learn to use temporal information