deeptraffic: driving fast through dense traffic with deep ... · hope for deep learning +...

67
Lex Fridman [email protected] GTC 2017 May 11 DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning DeepTraffic: Driving Fast through Dense Traffic with Deep Reinforcement Learning Lex Fridman

Upload: others

Post on 01-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

DeepTraffic: Driving Fast through Dense Traffic

with Deep Reinforcement LearningLex Fridman

Page 2: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Page 3: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Americans spend 8 billion hours stuck in traffic every year.

Page 4: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Goal:

Deep Learning for Everyoneaccessible and fun: seconds to start, eternity* to master

http://cars.mit.eduor search for:

“DeepTraffic”

* estimated time to discover globally optimal solution

Page 5: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

To Play: To Win:

Goal:

Deep Learning for Everyone

Page 6: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Machine Learning from Human and Machine

Memorization

Understanding

Page 7: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

http://cars.mit.edu/deeptesla

Page 8: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Naturalistic Driving Data

Teslas instrumented: 18

Hours of data: 6,000+ hours

Distance traveled: 140,000+ miles

Video frames: 2+ billion

Autopilot: ~12%

Page 9: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Naturalistic Driving Data

Page 10: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

http://cars.mit.edu/deeptesla

Page 11: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

• Localization and Mapping:Where am I?

• Scene Understanding:Where/who/what/why of everyone else?

• Movement Planning:How do I get from A to B?

• Driver State:What’s the driver up to?

• Communicate:How to I convey intent to the driver and to the world?

Page 12: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Autonomous Driving: A Hierarchical View

Paden B, Čáp M, Yong SZ, Yershov D, Frazzoli E. "A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles." IEEE Transactions on Intelligent Vehicles 1.1 (2016): 33-55.

Page 13: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Applying Deep Reinforcment Learningto Micro-Traffic Simulation

Reference: http://www.traffic-simulation.de

Page 14: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Formulate Driving as Reinforcement Learning Problem

How to formalize and learn driving?

Page 15: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Philosophical Motivation for Reinforcement Learning

Takeaway from Supervised Learning:

Neural networks are great at memorization and not (yet) great at reasoning.

Hope for Reinforcement Learning:

Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.

Page 16: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

(Deep) Reinforcement Learning

• Pros:• Cheap: Very little human annotation is needed.

• Robust: Can learn to act under uncertainty.

• General: Can (seemingly) deal with (huge) raw sensory input.

• Promising: Our current best framework for achieving “intelligence”.

• Cons• Constrained by Formalism: Have to formally define the

state space, the action space, the reward, and the simulated environment.

• Huge Data: Have to be able to simulate (in software or hardware) or have a lot of real-world examples.

Page 17: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Agent and Environment

• At each step the agent:• Executes action

• Receives observation (new state)

• Receives reward

• The environment:• Receives action

• Emits observation (new state)

• Emits reward

References: [80]

Page 18: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Markov Decision Process

𝑠0, 𝑎0, 𝑟1, 𝑠1, 𝑎1, 𝑟2, … , 𝑠𝑛−1, 𝑎𝑛−1, 𝑟𝑛, 𝑠𝑛

state

action

reward

Terminal state

References: [84]

Page 19: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Major Components of an RL Agent

An RL agent may include one or more of these components:

• Policy: agent’s behavior function

• Value function: how good is each state and/or action

• Model: agent’s representation of the environment

𝑠0, 𝑎0, 𝑟1, 𝑠1, 𝑎1, 𝑟2, … , 𝑠𝑛−1, 𝑎𝑛−1, 𝑟𝑛, 𝑠𝑛

state

action

reward

Terminal state

Page 20: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Robot in a Room

+1

-1

START

actions: UP, DOWN, LEFT, RIGHT

UP

80% move UP

10% move LEFT

10% move RIGHT

• reward +1 at [4,3], -1 at [4,2]

• reward -0.04 for each step

• what’s the strategy to achieve max reward?

• what if the actions were deterministic?

Page 21: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Is this a solution?

+1

-1

• only if actions deterministic• not in this case (actions are stochastic)

• solution/policy• mapping from each state to an action

Page 22: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Optimal policy

+1

-1

Page 23: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Reward for each step -2

+1

-1

Page 24: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Reward for each step: -0.1

+1

-1

Page 25: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Reward for each step: -0.04

+1

-1

Page 26: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Reward for each step: -0.01

+1

-1

Page 27: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Reward for each step: +0.01

+1

-1

Page 28: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Value Function

• Future reward 𝑅 = 𝑟1 + 𝑟2 + 𝑟3 + ⋯ + 𝑟𝑛

𝑅𝑡 = 𝑟𝑡 + 𝑟𝑡+1 + 𝑟𝑡+2 + ⋯ + 𝑟𝑛

• Discounted future reward (environment is stochastic)

𝑅𝑡 = 𝑟𝑡 + 𝛾𝑟𝑡+1 + 𝛾2𝑟𝑡+2 + ⋯ + 𝛾𝑛−𝑡𝑟𝑛= 𝑟𝑡 + 𝛾(𝑟𝑡+1 + 𝛾(𝑟𝑡+2 + ⋯))= 𝑟𝑡 + 𝛾𝑅𝑡+1

• A good strategy for an agent would be to always choose an action that maximizes the (discounted) future reward

References: [84]

Page 29: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Q-Learning

• State-action value function: Q(s,a)• Expected return when starting in s,

performing a, and following

• Q-Learning: Use any policy to estimate Q that maximizes future reward:• Q directly approximates Q* (Bellman optimality equation)

• Independent of the policy being followed

• Only requirement: keep updating each (s,a) pair

s

a

s’

r

New State Old State Reward

Learning Rate Discount Factor

Page 30: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Exploration vs Exploitation

• Key ingredient of Reinforcement Learning

• Deterministic/greedy policy won’t explore all actions• Don’t know anything about the environment at the beginning

• Need to try all actions to find the optimal one

• Maintain exploration

• Use soft policies instead: (s,a)>0 (for all s,a)

• ε-greedy policy• With probability 1-ε perform the optimal/greedy action

• With probability ε perform a random action

• Will keep exploring the environment

• Slowly move it towards greedy policy: ε -> 0

Page 31: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Q-Learning: Value Iteration

References: [84]

A1 A2 A3 A4

S1 +1 +2 -1 0

S2 +2 0 +1 -2

S3 -1 +1 0 -2

S4 -2 0 +1 +1

Page 32: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Q-Learning: Representation Matters

• In practice, Value Iteration is impractical• Very limited states/actions

• Cannot generalize to unobserved states

• Think about the Breakout game• State: screen pixels

• Image size: 𝟖𝟒 × 𝟖𝟒 (resized)

• Consecutive 4 images

• Grayscale with 256 gray levels

𝟐𝟓𝟔𝟖𝟒×𝟖𝟒×𝟒 rows in the Q-table!

References: [83, 84]

Page 33: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Philosophical Motivation for Deep Reinforcement Learning

Takeaway from Supervised Learning:

Neural networks are great at memorization and not (yet) great at reasoning.

Hope for Reinforcement Learning:

Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.

Hope for Deep Learning + Reinforcement Learning:

General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge).

Page 34: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Deep Q-Learning

Use a function (with parameters) to approximate the Q-function

• Linear• Non-linear: Q-Network

References: [83]

Page 35: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Deep Q-Network: Atari

Mnih et al. "Playing atari with deep reinforcement learning." 2013.

References: [83]

Page 36: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Atari Breakout

References: [85]

After120 Minutes

of Training

After10 Minutesof Training

After240 Minutes

of Training

Page 37: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

DQN Results in Atari

References: [83]

Page 38: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Deep Q-Network: DeepTraffic

Page 39: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Deep Q-Network Training

Given a transition < s, a, r, s’ >, the Q-table update rule in the previous algorithm must be replaced with the following:

• Do a feedforward pass for the current state s to get predicted Q-values for all actions

• Do a feedforward pass for the next state s’ and calculate maximum overall network outputs max a’ Q(s’, a’)

• Set Q-value target for action to r + γmax a’ Q(s’, a’) (use the max calculated in step 2).

• For all other actions, set the Q-value target to the same as originally returned from step 1, making the error 0 for those outputs.

• Update the weights using backpropagation.

References: [83]

Page 40: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Philosophical Motivation for Deep Reinforcement Learning

Takeaway from Supervised Learning:

Neural networks are great at memorization and not (yet) great at reasoning.

Hope for Reinforcement Learning:

Brute-force propagation of outcomes to knowledge about states and actions. This is a kind of brute-force “reasoning”.

Hope for Deep Learning + Reinforcement Learning:

General purpose artificial intelligence through efficient generalizable learning of the optimal thing to do given a formalized set of actions and states (possibly huge in size).

Page 41: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Driving may need more than SLAM, Perception, and Control

References: (Karaman RRT*)

Page 42: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Moravec’s Paradox: The “Easy” Problems are Hard

Soccer is harder than Chess

References: [8, 9]

Page 43: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Formulate Driving as a Reinforcement Learning Problem

http://cars.mit.edu/deeptrafficjs

Page 44: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

The Road, The Car, The Speed

Page 45: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

The Road, The Car, The Speed

• Solo motion planning subtasks• Longitudinal: speed

• Lateral: lane choice

• Vehicular interaction subtasks• Longitudinal: car-following

• Lateral: lane changing

Page 46: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

The Road, The Car, The Speed

Page 47: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

“Safety System”: Motion and Control are Given

Page 48: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Learning the “Behavioral Layer” Task

Page 49: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Learning the “Behavioral Layer” Task

Page 50: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Action Space

Page 51: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Driving / Learning

Page 52: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Learning Input

Page 53: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Deep RL: Q-Function Learning Parameters

Page 54: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Deep RL: Layers

Page 55: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Deep RL: Output (Actions)

Page 56: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

ConvNetJS: Options

Page 57: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Formulate Driving as a Reinforcement Learning Problem

http://cars.mit.edu/deeptrafficjs

Page 58: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Slides available at http://cars.mit.edu/gtc

Page 59: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

OpenAI Gym: From JS to TensorFlow

1. Formulate DeepTraffic as a reinforcement learning task.

2. Use TensorFlow/Keras/PyTorch to train an agent

Page 60: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Formulate DeepTraffic as a Reinforcement Learning Task

Page 61: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Adding a Deep Q-Network (with Keras)

Example: https://github.com/matthiasplappert/keras-rl

Page 62: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Adding a Deep Q-Network (with Keras)

Example: https://github.com/matthiasplappert/keras-rl

Page 63: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

http://cars.mit.edu

DeepTraffic

v1.0: In MIT v1.1: Outside MIT

Page 64: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

http://cars.mit.edu

DeepTraffic 2.0

1st place: Titan XP2nd place: GeForce GTX 1080 Ti3rd place: Jetson TX2

Page 65: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

http://cars.mit.edu

DeepTraffic

Challenge to GTC Attendees:• Create account on the site and put “GTC” as

how you heard about us.• Make a neural network that travels 70+ mph.

Page 66: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Have fun with Deep RL and DeepTraffic!

Page 67: DeepTraffic: Driving Fast through Dense Traffic with Deep ... · Hope for Deep Learning + Reinforcement Learning: General purpose artificial intelligence through efficient generalizable

Lex [email protected]

GTC 2017May 11

DeepTraffic: Driving Fast through Dense Trafficwith Deep Reinforcement Learning

Have fun with Deep RL and DeepTraffic!

But not too much fun...

Slides available at http://cars.mit.edu/gtc