deeptraffic: driving fast through dense traffic with deep ... · hope for deep learning +...
TRANSCRIPT
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
DeepTraffic: Driving Fast through Dense Traffic
with Deep Reinforcement LearningLex Fridman
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Americans spend 8 billion hours stuck in traffic every year.
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Goal:
Deep Learning for Everyoneaccessible and fun: seconds to start, eternity* to master
http://cars.mit.eduor search for:
“DeepTraffic”
* estimated time to discover globally optimal solution
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
To Play: To Win:
Goal:
Deep Learning for Everyone
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Machine Learning from Human and Machine
Memorization
Understanding
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
http://cars.mit.edu/deeptesla
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Naturalistic Driving Data
Teslas instrumented: 18
Hours of data: 6,000+ hours
Distance traveled: 140,000+ miles
Video frames: 2+ billion
Autopilot: ~12%
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Naturalistic Driving Data
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
http://cars.mit.edu/deeptesla
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
• Localization and Mapping:Where am I?
• Scene Understanding:Where/who/what/why of everyone else?
• Movement Planning:How do I get from A to B?
• Driver State:What’s the driver up to?
• Communicate:How to I convey intent to the driver and to the world?
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Autonomous Driving: A Hierarchical View
Paden B, Čáp M, Yong SZ, Yershov D, Frazzoli E. "A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles." IEEE Transactions on Intelligent Vehicles 1.1 (2016): 33-55.
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Applying Deep Reinforcment Learningto Micro-Traffic Simulation
Reference: http://www.traffic-simulation.de
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Formulate Driving as Reinforcement Learning Problem
How to formalize and learn driving?
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Philosophical Motivation for Reinforcement Learning
Takeaway from Supervised Learning:
Neural networks are great at memorization and not (yet) great at reasoning.
Hope for Reinforcement Learning:
Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
(Deep) Reinforcement Learning
• Pros:• Cheap: Very little human annotation is needed.
• Robust: Can learn to act under uncertainty.
• General: Can (seemingly) deal with (huge) raw sensory input.
• Promising: Our current best framework for achieving “intelligence”.
• Cons• Constrained by Formalism: Have to formally define the
state space, the action space, the reward, and the simulated environment.
• Huge Data: Have to be able to simulate (in software or hardware) or have a lot of real-world examples.
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Agent and Environment
• At each step the agent:• Executes action
• Receives observation (new state)
• Receives reward
• The environment:• Receives action
• Emits observation (new state)
• Emits reward
References: [80]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Markov Decision Process
𝑠0, 𝑎0, 𝑟1, 𝑠1, 𝑎1, 𝑟2, … , 𝑠𝑛−1, 𝑎𝑛−1, 𝑟𝑛, 𝑠𝑛
state
action
reward
Terminal state
References: [84]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Major Components of an RL Agent
An RL agent may include one or more of these components:
• Policy: agent’s behavior function
• Value function: how good is each state and/or action
• Model: agent’s representation of the environment
𝑠0, 𝑎0, 𝑟1, 𝑠1, 𝑎1, 𝑟2, … , 𝑠𝑛−1, 𝑎𝑛−1, 𝑟𝑛, 𝑠𝑛
state
action
reward
Terminal state
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Robot in a Room
+1
-1
START
actions: UP, DOWN, LEFT, RIGHT
UP
80% move UP
10% move LEFT
10% move RIGHT
• reward +1 at [4,3], -1 at [4,2]
• reward -0.04 for each step
• what’s the strategy to achieve max reward?
• what if the actions were deterministic?
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Is this a solution?
+1
-1
• only if actions deterministic• not in this case (actions are stochastic)
• solution/policy• mapping from each state to an action
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Optimal policy
+1
-1
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Reward for each step -2
+1
-1
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Reward for each step: -0.1
+1
-1
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Reward for each step: -0.04
+1
-1
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Reward for each step: -0.01
+1
-1
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Reward for each step: +0.01
+1
-1
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Value Function
• Future reward 𝑅 = 𝑟1 + 𝑟2 + 𝑟3 + ⋯ + 𝑟𝑛
𝑅𝑡 = 𝑟𝑡 + 𝑟𝑡+1 + 𝑟𝑡+2 + ⋯ + 𝑟𝑛
• Discounted future reward (environment is stochastic)
𝑅𝑡 = 𝑟𝑡 + 𝛾𝑟𝑡+1 + 𝛾2𝑟𝑡+2 + ⋯ + 𝛾𝑛−𝑡𝑟𝑛= 𝑟𝑡 + 𝛾(𝑟𝑡+1 + 𝛾(𝑟𝑡+2 + ⋯))= 𝑟𝑡 + 𝛾𝑅𝑡+1
• A good strategy for an agent would be to always choose an action that maximizes the (discounted) future reward
References: [84]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Q-Learning
• State-action value function: Q(s,a)• Expected return when starting in s,
performing a, and following
• Q-Learning: Use any policy to estimate Q that maximizes future reward:• Q directly approximates Q* (Bellman optimality equation)
• Independent of the policy being followed
• Only requirement: keep updating each (s,a) pair
s
a
s’
r
New State Old State Reward
Learning Rate Discount Factor
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Exploration vs Exploitation
• Key ingredient of Reinforcement Learning
• Deterministic/greedy policy won’t explore all actions• Don’t know anything about the environment at the beginning
• Need to try all actions to find the optimal one
• Maintain exploration
• Use soft policies instead: (s,a)>0 (for all s,a)
• ε-greedy policy• With probability 1-ε perform the optimal/greedy action
• With probability ε perform a random action
• Will keep exploring the environment
• Slowly move it towards greedy policy: ε -> 0
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Q-Learning: Value Iteration
References: [84]
A1 A2 A3 A4
S1 +1 +2 -1 0
S2 +2 0 +1 -2
S3 -1 +1 0 -2
S4 -2 0 +1 +1
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Q-Learning: Representation Matters
• In practice, Value Iteration is impractical• Very limited states/actions
• Cannot generalize to unobserved states
• Think about the Breakout game• State: screen pixels
• Image size: 𝟖𝟒 × 𝟖𝟒 (resized)
• Consecutive 4 images
• Grayscale with 256 gray levels
𝟐𝟓𝟔𝟖𝟒×𝟖𝟒×𝟒 rows in the Q-table!
References: [83, 84]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Philosophical Motivation for Deep Reinforcement Learning
Takeaway from Supervised Learning:
Neural networks are great at memorization and not (yet) great at reasoning.
Hope for Reinforcement Learning:
Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
Hope for Deep Learning + Reinforcement Learning:
General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge).
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Deep Q-Learning
Use a function (with parameters) to approximate the Q-function
• Linear• Non-linear: Q-Network
References: [83]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Deep Q-Network: Atari
Mnih et al. "Playing atari with deep reinforcement learning." 2013.
References: [83]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Atari Breakout
References: [85]
After120 Minutes
of Training
After10 Minutesof Training
After240 Minutes
of Training
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
DQN Results in Atari
References: [83]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Deep Q-Network: DeepTraffic
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Deep Q-Network Training
Given a transition < s, a, r, s’ >, the Q-table update rule in the previous algorithm must be replaced with the following:
• Do a feedforward pass for the current state s to get predicted Q-values for all actions
• Do a feedforward pass for the next state s’ and calculate maximum overall network outputs max a’ Q(s’, a’)
• Set Q-value target for action to r + γmax a’ Q(s’, a’) (use the max calculated in step 2).
• For all other actions, set the Q-value target to the same as originally returned from step 1, making the error 0 for those outputs.
• Update the weights using backpropagation.
References: [83]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Philosophical Motivation for Deep Reinforcement Learning
Takeaway from Supervised Learning:
Neural networks are great at memorization and not (yet) great at reasoning.
Hope for Reinforcement Learning:
Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.
Hope for Deep Learning + Reinforcement Learning:
General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge in size).
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Driving may need more than SLAM, Perception, and Control
References: (Karaman RRT*)
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Moravec’s Paradox: The “Easy” Problems are Hard
Soccer is harder than Chess
References: [8, 9]
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Formulate Driving as a Reinforcement Learning Problem
http://cars.mit.edu/deeptrafficjs
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
The Road, The Car, The Speed
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
The Road, The Car, The Speed
• Solo motion planning subtasks• Longitudinal: speed
• Lateral: lane choice
• Vehicular interaction subtasks• Longitudinal: car-following
• Lateral: lane changing
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
The Road, The Car, The Speed
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
“Safety System”: Motion and Control are Given
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Learning the “Behavioral Layer” Task
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Learning the “Behavioral Layer” Task
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Action Space
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Driving / Learning
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Learning Input
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Deep RL: Q-Function Learning Parameters
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Deep RL: Layers
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Deep RL: Output (Actions)
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
ConvNetJS: Options
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Formulate Driving as a Reinforcement Learning Problem
http://cars.mit.edu/deeptrafficjs
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Slides available at http://cars.mit.edu/gtc
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
OpenAI Gym: From JS to TensorFlow
1. Formulate DeepTraffic as a reinforcement learning task.
2. Use TensorFlow/Keras/PyTorch to train an agent
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Formulate DeepTraffic as a Reinforcement Learning Task
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Adding a Deep Q-Network (with Keras)
Example: https://github.com/matthiasplappert/keras-rl
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Adding a Deep Q-Network (with Keras)
Example: https://github.com/matthiasplappert/keras-rl
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
http://cars.mit.edu
DeepTraffic
v1.0: In MIT v1.1: Outside MIT
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
http://cars.mit.edu
DeepTraffic 2.0
1st place: Titan XP2nd place: GeForce GTX 1080 Ti3rd place: Jetson TX2
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
http://cars.mit.edu
DeepTraffic
Challenge to GTC Attendees:• Create account on the site and put “GTC” as
how you heard about us.• Make a neural network that travels 70+ mph.
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Have fun with Deep RL and DeepTraffic!
GTC 2017May 11
DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning
Have fun with Deep RL and DeepTraffic!
But not too much fun...
Slides available at http://cars.mit.edu/gtc