Deep Reinforcement Learning Aaron Schumacher Data Science DC 2017-11-14


Post on 16-Jul-2020


Page 1: Deep Reinforcement Learning - space · 11/14/2017  · further resources •planspace.org has these slides and links for all resources •A Brief Survey of Deep Reinforcement Learning

Deep Reinforcement Learning

Aaron Schumacher

Data Science DC

2017-11-14


Aaron Schumacher
• planspace.org has these slides


Plan
• applications: what
• theory
• applications: how
• onward


applications: what


Backgammon


Atari


Tetris


Go


theory


Yann LeCun’s cake
• cake: unsupervised learning
• icing: supervised learning
• cherry: reinforcement learning


unsupervised learning
• 𝑠


state 𝑠

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras quis cursus purus. Aliquam nisl purus, posuere id venenatis et, posuere non leo. Proin commodo, arcu vel commodo pellentesque, erat sapien mattis justo, nec semper urna lacus vel nisi. Nullam nisi odio, interdum id dictum vel, ultricies quis leo. Aliquam sed porttitor lacus. Suspendisse vulputate, leo vitae tempus accumsan, justo ex hendrerit magna, vel suscipit eros purus non magna. Vestibulum ac interdum ipsum. Sed nibh sem, pharetra euismod purus non, finibus mattis sapien.

NUMBERS


unsupervised (?) learning
• given 𝑠
• learn 𝑠 → cluster_id
• learn 𝑠 → 𝑠


unsupervised learning
• 𝑠

deep

with deep neural nets


supervised learning
• 𝑠 → 𝑎


action 𝑎

NUMBERS

“cat”/“dog”

“left”/“right”

17.3

[2.0, 11.7, 5]

4.2V


supervised learning
• given 𝑠, 𝑎
• learn 𝑠 → 𝑎


supervised learning
• 𝑠 → 𝑎

deep

with deep neural nets


reinforcement learning
• 𝑟, 𝑠 → 𝑎


reward 𝑟

NUMBER

-30

7.41


optimal control / reinforcement learning


𝑟, 𝑠 → 𝑎


𝑟′, 𝑠′ → 𝑎′


reinforcement learning
• “given” 𝑟, 𝑠, 𝑎, 𝑟′, 𝑠′, 𝑎′, …
• learn “good” 𝑠 → 𝑎


Question
• Can you formulate supervised learning as reinforcement learning?


supervised as reinforcement?
• reward 0 for incorrect, reward 1 for correct
  ◦ beyond binary classification?
• reward deterministic
• next state random


Question
• Why is the Sarsa algorithm called Sarsa?


reinforcement learning
• 𝑟, 𝑠 → 𝑎


reinforcement learning
• 𝑟, 𝑠 → 𝑎

deep

with deep neural nets


Markov property
• 𝑠 is enough


                         Make choices?
Completely observable?   no                          yes
yes                      Markov Chain                MDP (Markov Decision Process)
no                       HMM (Hidden Markov Model)   POMDP (Partially Observable MDP)


usual RL problem elaboration


policy 𝜋
• 𝜋: 𝑠 → 𝑎


return
• ∑𝑟
  ◦ into the future (episodic or ongoing)


start goal


0 0 0 0

0 0 0 0

0 0 0 1

0 0 0 0

reward


1 1 1 1

1 1 1 1

1 1 1 1

1 1 1 1

return


Question
• How do we incentivize efficiency?

start goal


negative reward at each time step

-1 -1 -1 -1

-1 -1 -1 -1

-1 -1 -1 0

-1 -1 -1 -1


discounted rewards
• $\sum_t \gamma^t r_t$

0 0 0 0

0 0 0 0

0 0 0 1

0 0 0 0
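
A minimal sketch of how discounting rewards efficiency on a grid like the one above, where the only reward is 1 at the goal. The function name `discounted_return`, the value `gamma=0.9`, and the two reward sequences are illustrative assumptions, not values from the talk.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t over a list of rewards, accumulated right to left."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical episodes: the goal reward of 1 arrives on step 3 or on step 5.
fast = discounted_return([0, 0, 1])
slow = discounted_return([0, 0, 0, 0, 1])
print(fast, slow)
```

With any discount below 1, the shorter path gets the larger return, even though both paths collect the same undiscounted reward.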


average rate of reward
• $\frac{\sum r}{t}$

0 0 0 0

0 0 0 0

0 0 0 1

0 0 0 0


value functions
• 𝑣: 𝑠 → ∑𝑟
• 𝑞: 𝑠, 𝑎 → ∑𝑟


Question
• Does a value function specify a policy (if you want to maximize return)?

• 𝑣: 𝑠 → ∑𝑟
• 𝑞: 𝑠, 𝑎 → ∑𝑟


trick question?
• $v_\pi$
• $q_\pi$


Question
• Does a value function specify a policy (if you want to maximize return)?

• 𝑣: 𝑠 → ∑𝑟
• 𝑞: 𝑠, 𝑎 → ∑𝑟


environment dynamics
• 𝑠, 𝑎 → 𝑟′, 𝑠′


start goal

Going “right” from “start,” where will you be next?


model
• 𝑠, 𝑎 → 𝑟′, 𝑠′
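
One way to read the earlier question about whether a value function specifies a policy: 𝑞 can be maximized over actions directly, while 𝑣 needs a model for a one-step lookahead. The sketch below assumes a hypothetical four-cell corridor with the goal at cell 3 and a cost of -1 per step. `step_model`, `greedy_from_q`, and `greedy_from_v` are illustrative names.

```python
ACTIONS = [-1, +1]  # move left or right along a hypothetical corridor of cells 0..3

def step_model(state, action):
    """Assumed deterministic model s, a -> r', s': every step costs -1."""
    next_state = min(max(state + action, 0), 3)
    return -1.0, next_state

# State values for this corridor under the best policy (goal at cell 3).
v = {0: -3.0, 1: -2.0, 2: -1.0, 3: 0.0}
# Matching action values, built here from the model only to keep the example short.
q = {(s, a): step_model(s, a)[0] + v[step_model(s, a)[1]]
     for s in range(3) for a in ACTIONS}

def greedy_from_q(q, state):
    """q is enough on its own: pick the action with the highest value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

def greedy_from_v(v, model, state):
    """v needs the model: look one step ahead, then compare r' + v(s')."""
    def lookahead(action):
        reward, next_state = model(state, action)
        return reward + v[next_state]
    return max(ACTIONS, key=lookahead)

print([greedy_from_q(q, s) for s in range(3)])
print([greedy_from_v(v, step_model, s) for s in range(3)])
```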


learning and acting


learn model of dynamics
• from experience
• difficulty varies


learn value function(s) with planning
• assuming we have the environment dynamics or a good model already


planning

[diagram: state $s_1$ with actions $a_1$ and $a_2$]


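planning

[diagram: search tree with states $s_1$ to $s_3$; actions $a_1$, $a_2$ with rewards $r_1$, $r_2$; actions $a_3$ to $a_5$ shown without rewards]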


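planning

[diagram: search tree with states $s_1$ to $s_3$; actions $a_1$, $a_2$ with rewards $r_1$, $r_2$; actions $a_3$ to $a_5$ shown without rewards]

Start estimating 𝑞(𝑠, 𝑎)!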


planning

[diagram: search tree with states $s_1$ to $s_6$; actions $a_1$ to $a_5$ with rewards $r_1$ to $r_5$; actions $a_6$ to $a_8$ shown without rewards]


planning

[diagram: search tree with states $s_1$ to $s_6$; actions $a_1$ to $a_5$ with rewards $r_1$ to $r_5$; actions $a_6$ to $a_8$ shown without rewards]

Start estimating 𝑣(𝑠)!


planning

[diagram: search tree with states $s_1$ to $s_9$; actions $a_1$ to $a_8$ with rewards $r_1$ to $r_8$; actions $a_9$ to $a_{12}$ shown without rewards]


planning

[diagram: search tree with states $s_1$ to $s_{13}$; actions $a_1$ to $a_{12}$ with rewards $r_1$ to $r_{12}$; actions $a_{13}$ to $a_{16}$ shown without rewards]


connections
• Dijkstra’s algorithm
• A* algorithm
• minimax search
  ◦ two-player games not a problem
• Monte Carlo tree search


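roll-outs

[diagram: a single path through the search tree, with states $s_1$, $s_2$, $s_4$, $s_8$, $s_{11}$ and rewards $r_1$, $r_5$, $r_7$, $r_{10}$; other actions shown without rewards]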


roll-outs


everything can be stochastic


stochastic 𝑠′

-1 -1 -1 -1

-1 -1 -1 -1

start -1 -1 goal

-10 -10 -10 -10

“icy pond”: 20% chance you “slip” and go perpendicular to your intent
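
A rough sketch of the “icy pond” as a stochastic transition. It assumes a 4×4 grid with "start" at row 2, column 0, that a slip picks one of the two perpendicular directions with equal chance, and that moves into a wall leave you in place. None of those details are stated on the slide, and `icy_step` is an illustrative name.

```python
import random

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}

def icy_step(position, intent, rng, slip_chance=0.2, size=4):
    """Sample the next cell: usually the intended move, sometimes a sideways slip."""
    direction = intent
    if rng.random() < slip_chance:
        direction = rng.choice(SIDEWAYS[intent])
    d_row, d_col = MOVES[direction]
    # Assumption: the grid edge blocks movement instead of ending the episode.
    row = min(max(position[0] + d_row, 0), size - 1)
    col = min(max(position[1] + d_col, 0), size - 1)
    return (row, col)

rng = random.Random(0)
outcomes = [icy_step((2, 0), "right", rng) for _ in range(10000)]
share_intended = outcomes.count((2, 1)) / len(outcomes)
print(share_intended)
```

Under these assumptions, going "right" from "start" lands one cell to the right about 80% of the time. A downward slip lands in the -10 row.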


stochastic 𝑟′

multi-armed bandit


exploration vs. exploitation
• 𝜀-greedy policy
  ◦ usually do what looks best
  ◦ 𝜀 of the time, choose random action
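
A small sketch of an 𝜀-greedy policy on a multi-armed bandit with stochastic rewards. The three arm means, the Gaussian noise, and the names `epsilon_greedy` and `run_bandit` are made up for illustration.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """Usually pick the best-looking arm; epsilon of the time pick at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda arm: estimates[arm])

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # assumed noisy reward
        counts[arm] += 1
        # Running average of the rewards seen for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 1.0])
print(estimates, counts)
```

Without the random 𝜀 fraction, the agent could settle on the first arm that happens to pay off and never find the better one.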


Question
• How does randomness affect 𝜋, 𝑣, and 𝑞?
  ◦ 𝜋: 𝑠 → 𝑎
  ◦ 𝑣: 𝑠 → ∑𝑟
  ◦ 𝑞: 𝑠, 𝑎 → ∑𝑟


Question
• How does randomness affect 𝜋, 𝑣, and 𝑞?
  ◦ 𝜋: ℙ(𝑎|𝑠)
  ◦ 𝑣: 𝑠 → 𝔼∑𝑟
  ◦ 𝑞: 𝑠, 𝑎 → 𝔼∑𝑟


model-free: Monte Carlo returns
• 𝑣(𝑠) = ?
• keep track and average
• like “planning” from experienced “roll-outs”
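
A minimal sketch of “keep track and average”: estimate 𝑣(𝑠) as the mean of the returns observed after each visit to 𝑠. The episode format (a list of state and reward-on-leaving pairs), the two sample episodes, and the name `monte_carlo_values` are assumptions for illustration.

```python
from collections import defaultdict

def monte_carlo_values(episodes):
    """Average the undiscounted return that followed every visit to each state."""
    totals = defaultdict(float)
    visits = defaultdict(int)
    for episode in episodes:
        ret = 0.0
        # Walk backwards so the return from each state accumulates in one pass.
        for state, reward in reversed(episode):
            ret += reward
            totals[state] += ret
            visits[state] += 1
    return {state: totals[state] / visits[state] for state in totals}

# Hypothetical experienced roll-outs: (state, reward received on leaving it).
episodes = [
    [("A", -1), ("B", -1), ("C", 0)],
    [("A", -1), ("C", 0)],
]
values = monte_carlo_values(episodes)
print(values)
```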


non-stationarity
• 𝑣(𝑠) changes over time!


moving average
• new mean = old mean + 𝛼(new sample - old mean)


moving average
• new mean = old mean + 𝛼(new sample - old mean)

$\nabla \tfrac{1}{2}(\text{sample} - \text{mean})^2$
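
The gradient expression ties the moving average to gradient descent. A short derivation, taking the gradient with respect to the mean and treating 𝛼 as the step size (both are the natural reading, not spelled out on the slide):

```latex
\begin{align*}
L(\text{mean}) &= \tfrac{1}{2}\,(\text{sample} - \text{mean})^2 \\
\nabla_{\text{mean}} L &= -(\text{sample} - \text{mean}) \\
\text{new mean} &= \text{old mean} - \alpha\,\nabla_{\text{mean}} L
  = \text{old mean} + \alpha\,(\text{new sample} - \text{old mean})
\end{align*}
```

So one moving-average update is one gradient-descent step on the squared error between the sample and the current mean.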


Question
• What’s the difference between 𝑣(𝑠) and 𝑣(𝑠′)?

[diagram: … → 𝑠, then action 𝑎 and reward 𝑟′, then 𝑠′ → …]


Question
• What’s the difference between 𝑣(𝑠) and 𝑣(𝑠′)?

[diagram: … → 𝑠, then action 𝑎 and reward 𝑟′, then 𝑠′ → …]


Bellman equation
• $v(s) = r' + v(s')$


Bellman equation
• $v(s) = r' + v(s')$


Bellman equation
• $v(s) = r' + v(s')$

dynamic programming


Bellman equation
• $v(s) = r' + v(s')$

dynamic programming

curse of dimensionality
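
A sketch of dynamic programming with the Bellman equation on a small grid world. It uses the optimality form (a max over actions), which is an assumption beyond the plain equation on the slide. The 4×4 layout, the goal at row 2, column 3, and the rule that arriving in a cell pays that cell's number (-1 everywhere, 0 at the goal) are read off the earlier reward grid as an interpretation.

```python
SIZE = 4
GOAL = (2, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(state, move):
    """Deterministic dynamics s, a -> r', s'; walls keep you in place."""
    row = min(max(state[0] + move[0], 0), SIZE - 1)
    col = min(max(state[1] + move[1], 0), SIZE - 1)
    next_state = (row, col)
    reward = 0.0 if next_state == GOAL else -1.0
    return reward, next_state

def value_iteration(sweeps=20):
    v = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(sweeps):
        for state in v:
            if state == GOAL:
                continue  # episode ends at the goal, so its value stays 0
            # Bellman backup: best of r' + v(s') over the four moves.
            v[state] = max(r + v[s2] for r, s2 in (step(state, m) for m in MOVES))
    return v

v = value_iteration()
for row in range(SIZE):
    print([v[(row, col)] for col in range(SIZE)])
```

The table holds one entry per state, which is where the curse of dimensionality bites: the number of states grows exponentially with the number of state variables.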


temporal difference (TD)
• $v(s) = r' + v(s')$
• $0 \stackrel{?}{=} r' + v(s') - v(s)$
  ◦ “new sample”: $r' + v(s')$
  ◦ “old mean”: $v(s)$
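
A minimal sketch of a TD(0) update, which plugs the “new sample” and “old mean” into the moving-average rule. The three-state chain, the step size `alpha=0.1`, and the name `td0_update` are illustrative assumptions.

```python
def td0_update(v, state, reward, next_state, alpha=0.1):
    """Move v[state] toward the sample r' + v(s'); return the TD error."""
    td_error = reward + v[next_state] - v[state]
    v[state] += alpha * td_error
    return td_error

# Hypothetical chain: A -> B gives reward 0, B -> end gives reward 1.
v = {"A": 0.0, "B": 0.0, "end": 0.0}
for _ in range(200):
    td0_update(v, "B", 1.0, "end")
    td0_update(v, "A", 0.0, "B")
print(v)
```

The value of "A" is learned from the current estimate for "B", without waiting for a full return.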


Q-learning
• $\text{new } q(s, a) = q(s, a) + \alpha \left( r' + \gamma \max_{a} q(s', a) - q(s, a) \right)$
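
A sketch of tabular Q-learning using the update above with 𝜀-greedy exploration. The five-cell corridor, its reward of 1 at the right end, and all hyperparameters are assumptions chosen to keep the example small.

```python
import random

N_STATES = 5        # hypothetical corridor, cells 0..4, goal at cell 4
ACTIONS = [-1, +1]  # move left or right

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return reward, next_state

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # Break ties at random so early episodes still wander.
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            reward, next_state = step(state, action)
            target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```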


on-policy / off-policy
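
One common way to illustrate the distinction is to compare update targets. Sarsa (on-policy) uses the action the agent actually takes next, so each update consumes 𝑠, 𝑎, 𝑟′, 𝑠′, 𝑎′. Q-learning (off-policy) uses the best next action regardless of what is taken. The function names and the toy 𝑞 table below are illustrative.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    """On-policy: bootstrap from the action the behavior policy really chose."""
    return reward + gamma * q[(next_state, next_action)]

def q_learning_target(q, reward, next_state, actions, gamma=0.9):
    """Off-policy: bootstrap from the greedy action, whatever is taken next."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)

# Hypothetical table where the agent is about to explore with "left".
q = {("s2", "left"): 1.0, ("s2", "right"): 3.0}
on_policy = sarsa_target(q, 0.0, "s2", "left")
off_policy = q_learning_target(q, 0.0, "s2", ["left", "right"])
print(on_policy, off_policy)
```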


estimate 𝑣, 𝑞
• with a deep neural network


back to the 𝜋
• parameterize 𝜋 directly
• update based on how well it works
• REINFORCE
  ◦ REward Increment = Nonnegative Factor times Offset Reinforcement times Characteristic Eligibility


policy gradient
• $\nabla \log \pi(a \mid s)$
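
A sketch of a REINFORCE-style update on a two-armed bandit, assuming a softmax policy over per-arm preferences. For that parameterization the gradient of log 𝜋(𝑎) with respect to preference 𝑖 is (1 if 𝑖 = 𝑎 else 0) − 𝜋(𝑖). The arm rewards, learning rate, and names are made up, and no baseline is used.

```python
import math
import random

def softmax(prefs):
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(arm_rewards, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = arm_rewards[action]
        for i in range(len(prefs)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            # Step in the direction reward * grad log pi(action).
            prefs[i] += lr * reward * grad_log_pi
    return softmax(prefs)

probs = reinforce_bandit([0.0, 1.0])  # assumed deterministic arm rewards
print(probs)
```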


actor-critic
• 𝜋 is the actor
• 𝑣 is the critic
• train 𝜋 by policy gradient to encourage actions that work out better than 𝑣 expected


applications: how


Backgammon


TD-Gammon (1992)
• 𝑠 is custom features
• 𝑣 with shallow neural net
• 𝑟 is 1 for a win, 0 otherwise
• TD(𝜆)
  ◦ eligibility traces
• self-play
• shallow forward search


Atari


DQN (2015)
• 𝑠 is four frames of video
• 𝑞 with deep convolutional neural net
• 𝑟 is 1 if score increases, -1 if decreases, 0 otherwise
• Q-learning
  ◦ usually-frozen target network
  ◦ clipping update size etc.
• 𝜀-greedy
• experience replay
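
A rough sketch of two DQN ingredients from the list above: an experience replay buffer and the reward that depends only on whether the score went up or down. This is an illustration under assumed names (`ReplayBuffer`, `clip_reward`) and is not the published implementation. The network, target network, and training loop are left out.

```python
import random
from collections import deque

class ReplayBuffer:
    """Keep recent transitions and hand back random minibatches of them."""

    def __init__(self, capacity, seed=0):
        self.transitions = deque(maxlen=capacity)  # oldest entries fall off
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.transitions.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling at random breaks up the correlation between consecutive frames.
        return self.rng.sample(list(self.transitions), batch_size)

def clip_reward(score_change):
    """1 if the score increased, -1 if it decreased, 0 otherwise."""
    return (score_change > 0) - (score_change < 0)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=clip_reward(t - 2), next_state=t + 1, done=False)
batch = buffer.sample(2)
print(batch)
```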


Tetris


evolution (2006)
• 𝑠 is custom features
• 𝜋 has a simple parameterization
• evaluate by final score
• cross-entropy method
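
A sketch of the basic cross-entropy method: sample policy parameters from a Gaussian, keep the best-scoring samples, and refit the Gaussian to them. The `toy_score` function stands in for "final score" and is purely hypothetical, as are the population sizes. The Tetris work itself is not reproduced here.

```python
import random
import statistics

def cross_entropy_method(score, dim, iterations=30, population=50,
                         elite=10, initial_std=2.0, seed=0):
    rng = random.Random(seed)
    means = [0.0] * dim
    stds = [initial_std] * dim
    for _ in range(iterations):
        samples = [[rng.gauss(m, s) for m, s in zip(means, stds)]
                   for _ in range(population)]
        samples.sort(key=score, reverse=True)  # best scores first
        best = samples[:elite]
        for i in range(dim):
            column = [weights[i] for weights in best]
            means[i] = statistics.mean(column)
            stds[i] = statistics.pstdev(column) + 1e-3  # small floor on the spread
    return means

def toy_score(weights):
    """Hypothetical stand-in for a game score, highest at weights (1, -2)."""
    target = [1.0, -2.0]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

best_weights = cross_entropy_method(toy_score, dim=2)
print(best_weights)
```

Only the final score of each sample is used, so no value function or gradient is needed.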


Go


AlphaGo (2016)
• 𝑠 is custom features over geometry
  ◦ big and small variants
• $\pi_{SL}$ with deep convolutional neural net, supervised training
• $\pi_{rollout}$ with smaller convolutional neural net, supervised training
• 𝑟 is 1 for a win, -1 for a loss, 0 otherwise
• $\pi_{RL}$ is $\pi_{SL}$ refined with policy gradient peer-play reinforcement learning
• 𝑣 with deep convolutional neural net, trained based on $\pi_{RL}$ games
• asynchronous policy and value Monte Carlo tree search
  ◦ expand tree with $\pi_{SL}$
  ◦ evaluate positions with blend of 𝑣 and $\pi_{rollout}$ rollouts

4 nets


AlphaGo Zero (2017)
• 𝑠 is simple features over time and geometry
• 𝜋, 𝑣 with deep residual convolutional neural net
• 𝑟 is 1 for a win, -1 for a loss, 0 otherwise
• Monte Carlo tree search for self-play training and play

1 net


onward


Neural Architecture Search (NAS)
• policy gradient where the actions design a neural net
• reward is designed net’s validation set performance


language


robot control


self-driving cars
• interest
• results?


DotA 2


conclusion


reinforcement learning
• 𝑟, 𝑠 → 𝑎
• 𝜋: 𝑠 → 𝑎
• 𝑣: 𝑠 → ∑𝑟
• 𝑞: 𝑠, 𝑎 → ∑𝑟


network architectures for deep RL
• feature engineering can still matter
• if using pixels
  ◦ often simpler than state-of-the-art for supervised
  ◦ don’t pool away location information if you need it
• consider using multiple/auxiliary outputs
• consider phrasing regression as classification
• room for big advancements


the lure and limits of RL
• seems like AI (?)
• needs so much data


Question
• Should you use reinforcement learning?


Thank you!


further resources
• planspace.org has these slides and links for all resources

• A Brief Survey of Deep Reinforcement Learning (paper)

• Karpathy’s Pong from Pixels (blog post)

• Reinforcement Learning: An Introduction (textbook)

• David Silver’s course (videos and slides)

• Deep Reinforcement Learning Bootcamp (videos, slides, and labs)

• OpenAI gym / baselines (software)

• National Go Center (physical place)

• Hack and Tell (fun meetup)