lab 6-1: q network - github pages · 2017-10-02 · lab 6-1: q network reinforcement learning with...

Lab 6-1: Q Network

Reinforcement Learning with TensorFlow & OpenAI Gym

Sung Kim <[email protected]>

State (0~15) as input

[Diagram: (1) input s, the one-hot encoding of the state (e.g. state 7) → (2) output Ws]

np.identity
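
As a minimal sketch of the one-hot input, `np.identity` can be sliced so that the selected row keeps its leading dimension. The helper name `one_hot` and the constant `NUM_STATES` are illustrative names, not taken from the slides.

```python
import numpy as np

NUM_STATES = 16  # states 0~15, as on the slide


def one_hot(state, num_states=NUM_STATES):
    """Return the one-hot row vector for `state`, shape (1, num_states)."""
    # Slicing with state:state + 1 (instead of indexing with state)
    # keeps the result two-dimensional.
    return np.identity(num_states)[state:state + 1]


x = one_hot(7)
print(x.shape)  # (1, 16)
print(x)
```

Only column 7 of `x` is 1; every other entry is 0.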

Q-Network training (Network construction)

[Diagram: (1) input s → (2) output Ws]

Q-Network training (linear regression)

[Diagram: (1) input s → (2) output Ws]

$y = r + \gamma \max Q(s')$

$\mathrm{cost}(W) = (Ws - y)^2$
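
The target and cost can be checked with small numbers. The following is a NumPy sketch, not the TensorFlow code from the lab. The discount factor, learning rate, action, reward, and next-state Q-values are made-up values for illustration, and the state is written as a row vector, so the prediction is computed as `s @ W`.

```python
import numpy as np

gamma = 0.99                                # discount factor (assumed value)
lr = 0.1                                    # learning rate (assumed value)

s = np.identity(16)[7:8]                    # one-hot state 7, shape (1, 16)
W = np.zeros((16, 4))                       # linear Q-network weights
action = 2                                  # action taken in s (made up)
r = 1.0                                     # reward received (made up)
q_next = np.array([[0.0, 0.5, 0.2, 0.1]])   # Q(s') for each action (made up)

# Label: copy the prediction and overwrite only the entry of the taken action.
y = (s @ W).copy()
y[0, action] = r + gamma * np.max(q_next)


def cost(W):
    """Squared error between the prediction s @ W and the label y."""
    return float(np.sum((s @ W - y) ** 2))


cost_before = cost(W)
grad = 2.0 * s.T @ (s @ W - y)              # gradient of the cost w.r.t. W
W = W - lr * grad                           # one gradient descent step
cost_after = cost(W)

print(cost_before, cost_after)
```

Because `s` is one-hot, the gradient is nonzero only in row 7 of `W`, so a single step changes only the Q-values of the visited state.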

Algorithm

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.

Y label and loss function

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
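
In Mnih et al., the label is the reward alone when the next state is terminal, and the reward plus the discounted maximum next-state Q-value otherwise. The loss is the squared difference between the label and the predicted Q-value of the taken action. A minimal sketch, with hypothetical function names and an assumed default discount factor:

```python
import numpy as np


def q_target(reward, q_next, done, gamma=0.99):
    """Y label for one transition (gamma is an assumed value)."""
    if done:
        # Terminal transition: no future reward to bootstrap from.
        return float(reward)
    return float(reward + gamma * np.max(q_next))


def squared_loss(y, q_pred_for_action):
    """Squared error between the label and the predicted Q-value."""
    return (y - q_pred_for_action) ** 2


q_next = np.array([[0.0, 0.5, 0.2, 0.1]])   # made-up next-state Q-values
y_terminal = q_target(1.0, q_next, done=True)
y_running = q_target(0.0, q_next, done=False)
print(y_terminal, y_running, squared_loss(y_running, 0.0))
```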

Code: Network and setup
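
The code shown on this slide did not survive in the transcript. As a rough stand-in, the single-layer network can be sketched in NumPy rather than TensorFlow. The initialization range, learning rate, and the name `predict` are assumptions, not the values from the lab.

```python
import numpy as np

input_size = 16       # one-hot state, 0~15
output_size = 4       # one Q-value per action
learning_rate = 0.1   # assumed value

rng = np.random.default_rng(0)
# Small random initial weights (assumed range).
W = rng.uniform(0.0, 0.01, size=(input_size, output_size))


def predict(state_one_hot, W):
    """Q-values for all actions: (1, 16) @ (16, 4) -> (1, 4)."""
    return state_one_hot @ W


q = predict(np.identity(input_size)[0:1], W)
print(q.shape)  # (1, 4)
```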

Code: Training
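
The training code on this slide is also missing from the transcript. The sketch below shows the general shape of such a loop in NumPy. It is not the author's code: `ToyGrid` is a hypothetical deterministic 4x4 grid used in place of the Gym environment, the hyperparameters and exploration schedule are assumptions, and experience replay from Mnih et al. is left out.

```python
import numpy as np

MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


class ToyGrid:
    """Hypothetical deterministic 4x4 grid (stand-in for the Gym env)."""
    HOLES, GOAL = {5, 7, 11, 12}, 15

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        row, col = divmod(self.s, 4)
        dr, dc = MOVES[a]
        row = min(max(row + dr, 0), 3)       # stay inside the grid
        col = min(max(col + dc, 0), 3)
        self.s = row * 4 + col
        done = self.s in self.HOLES or self.s == self.GOAL
        return self.s, float(self.s == self.GOAL), done


def one_hot(x):
    return np.identity(16)[x:x + 1]


def train(num_episodes=500, gamma=0.99, lr=0.1, max_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    env = ToyGrid()
    W = rng.uniform(0.0, 0.01, size=(16, 4))
    rewards = []
    for i in range(num_episodes):
        s, total = env.reset(), 0.0
        e = 1.0 / (i / 50 + 10)              # decaying exploration rate (assumed)
        for _ in range(max_steps):
            Qs = one_hot(s) @ W              # predicted Q-values, shape (1, 4)
            if rng.random() < e:
                a = int(rng.integers(4))     # explore
            else:
                a = int(np.argmax(Qs))       # exploit
            s1, r, done = env.step(a)
            # Y label: only the entry of the taken action differs from Qs.
            y = Qs.copy()
            y[0, a] = r if done else r + gamma * np.max(one_hot(s1) @ W)
            # One gradient step on the squared error between Qs and y.
            W -= lr * 2.0 * one_hot(s).T @ (Qs - y)
            total += r
            s = s1
            if done:
                break
        rewards.append(total)
    return W, rewards


W, rewards = train()
print("Success rate:", sum(rewards) / len(rewards))
```

Each step updates the network from a single transition, which is the setting the exercise slide later calls slow and a bit unstable.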

Code: results

Percent of successful episodes: 0.5195% (the printed value is a fraction of episodes, i.e. about 52%)

Q-Table vs. Network

Q-network: 0.5195 (about 52% of episodes successful)

Q-table: 0.653 (about 65% of episodes successful)

Array shape

Input (1x16): [[0, 2, …]]

W (16x4): [ [0,1,2,3], [3,1,2,3], [0,5,2,3], … ]

Output (1x4): [[a1,a2,a3,a4]]
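
The shapes can be verified directly in NumPy. In this sketch the weight values are made up so that each row is easy to recognize. It also shows what happens when the state is indexed instead of sliced.

```python
import numpy as np

x = np.identity(16)[7:8]                        # slice keeps 2-D: (1, 16)
W = np.arange(64, dtype=float).reshape(16, 4)   # (16, 4), made-up values
Qs = x @ W                                      # (1, 16) @ (16, 4) -> (1, 4)

print(x.shape, W.shape, Qs.shape)

# The output is a 1x4 array, so one Q-value is addressed as Qs[0, a].
best_action = int(np.argmax(Qs))
print(best_action, Qs[0, best_action])

# Indexing with a single integer drops a dimension.
flat = np.identity(16)[7]                       # shape (16,)
q_flat = flat @ W                               # shape (4,), not (1, 4)
print(flat.shape, q_flat.shape)
```

With a one-hot input, the product simply selects row 7 of `W`.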

Array Shape

[[a1,a2,a3,a4]] (1x4)

Exercise

• Too slow: minibatch?

• A bit unstable?
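
One possible reading of the minibatch idea, offered as a sketch rather than the intended solution: stack several one-hot states and their labels, then take a single gradient step on the mean squared error. The function names, learning rate, and batch contents are made up for illustration.

```python
import numpy as np


def mse(W, states, targets):
    """Mean over the batch of the summed squared error per sample."""
    return float(np.sum((states @ W - targets) ** 2) / states.shape[0])


def minibatch_step(W, states, targets, lr=0.1):
    """One gradient step on the batch.

    states:  (N, 16) stacked one-hot rows
    targets: (N, 4) stacked Y labels
    """
    n = states.shape[0]
    grad = 2.0 * states.T @ (states @ W - targets) / n
    return W - lr * grad


W = np.zeros((16, 4))
states = np.identity(16)[[0, 1, 4]]             # batch of three states
targets = np.array([[0.0, 0.5, 0.0, 0.0],       # made-up labels
                    [0.0, 0.0, 1.0, 0.0],
                    [0.2, 0.0, 0.0, 0.0]])

W_new = minibatch_step(W, states, targets)
print(mse(W, states, targets), mse(W_new, states, targets))
```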

Next

Lab: Q-network for cart pole