lab 6-1: q network - github pageshunkim.github.io/ml/rl/rl06-l1.pdfplaying atari with deep...

Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>

Upload: tranduong

Post on 09-Jun-2018

219 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Lab 6-1: Q Network

Reinforcement Learning with TensorFlow&OpenAI GymSung Kim <[email protected]>

State(0~15) as input

(2)Ws(1)sstate 7

Page 3: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

State(0~15) as input

(2)Ws(1)sOne-hot state 7

Page 4: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

np.identify

Page 5: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

State (0~15) as input

(2)Ws(1)sOne-hot state 7

Page 6: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Q-Network training (Network construction)

(2)Ws(1)s

Page 7: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Q-Network training (linear regression)

(2)Ws(1)s

y = r + �maxQ(s0)

cost(W ) = (Ws� y)2

Page 8: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Algorithm

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.

Page 9: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Algorithm

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.

Page 10: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Y label and loss function

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.

Page 11: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Code: Network and setup

Page 12: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Code: Training

Page 13: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Code: results

Percent of successful episodes: 0.5195%

Page 14: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Q-Table VS NetworkQ-network: 0.5195%

Q-table: 0.653

Page 15: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Array shape

[[0, 2, …]][ [0,1,2,3], [3,1,2,3], [0,5,2,3], … ]

1x16

16x4

[[a1,a2,a3,a4]]1x4

Page 16: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Array Shape

[[a1,a2,a3,a4]]1x4

Page 17: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Exercise

• Too slow- Minibatch?

• A bit unstable?

Page 18: Lab 6-1: Q Network - GitHub Pageshunkim.github.io/ml/RL/rl06-l1.pdfPlaying Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al. Algorithm Playing Atari with

Lab: Q-network for

cart pole

Atari 520ST Schematic

Atari 2600 Homebrew

De Re Atari - musanim.com · Preface Introduction by enter value here This manual is about the ATARI Home Computer. It covers both the ATARI 400TM and the ATARI 800TM Computers

Atari 2600 Adventure

Atari 410 Program Recorder Operator's Manual (Atari)

Atari 1040ST Schematic

Hard 'n Heavy - Atari ST - Manual - gamesdatabase · Hard 'n Heavy - Atari ST - Manual - gamesdatabase.org Author: gamesdatabase.org Subject: Atari ST game manual Keywords: Atari

Atari Puzzlements

Atari petition

Arcade Games Arcade Blaster List · Atari Baseball Atari Football Atari Mini Golf Atari Soccer Ataxx Athena Athena no Hatena ? Atomic Point Atomic Robo‐kid Attack Ufo Aurail Avalanche

Album Ploc Atari

Your Atari Computer, A Guide to Atari 400/800 Personal Computers

SIT-IN DEDICATED GAME OPERATION MANUAL Francisco Rush Man… · atari _.._ .__. . atari. \.._! ; atari! > _. : _‘“,:. .__ -... __.____ atari.-

Rana 1000 Systems - the ATARI site – the ATARI site

Atari ST Internals

Atari Vazno

Atari Software

Atari-HEAD Atari Human Eye-Tracking and Demonstrationzharucs/publications/AAAI2020-RLG.pdf · Atari-HEAD Atari Human Eye-Tracking and Demonstration Dataset Ruohan Zhang*, Calen Walshe,

Dungeon Master - Atari ST - Manual - gamesdatabase...Title Dungeon Master - Atari ST - Manual - gamesdatabase.org Author gamesdatabase.org Subject Atari ST game manual Keywords Atari

Atari Teenage Riot

Star Raiders - Atari 8-bit - Manual - gamesdatabase · atari 400/800 computer adventure star raiders ata company model cxl4011 use with atari' 400'" or atari personal computer systems

ATARI LOGO - atarionline.platarionline.pl/biblioteka/materialy_ksiazkowe/Atari Logo... · ATARI LOGO Introduction to Programming ... A Game Project 125 ... To use the ATARI Logo Cartridge,

Liste des jeux (64Go) - BoutiqueDuGeek.fr · Atari ST Pictionary Atari ST Pirates of the Barbary Coast Atari ST Player Manager Atari ST Police Quest : In Pursuit of the Death Angel

Atari System Reference manualmixinc.net/atari/books/Atari-System_Reference_manual.pdf · 2014. 12. 24. · Atari ∗System Reference ... 8 The numbers on top are the bit numbers

Status of CSR RL06 GRACE reprocessing and preliminary results · • Single ACC RL06 solutions from Nov 2016 to June 2017 will be ... is a 4-character mnemonic used to identify the

Atari - Lazard

Atari xbox

Atari Punk Console

Sea Battle - Atari 2600 - Manual - gamesdatabase Battle - Atari 2600 - Manual - gamesdatabase.org Author: gamesdatabase.org Subject: Atari 2600 game manual Keywords: Atari 2600 …

Battlezone - Atari 2600 - Manual - gamesdatabase€¦ · Battlezone - Atari 2600 - Manual - gamesdatabase.org Atari 2600 1983 Atari Simulation system game manual

Zaxxon - Atari 2600 - Manual - gamesdatabase · Zaxxon - Atari 2600 - Manual - gamesdatabase.org Author: gamesdatabase.org Subject: Atari 2600 game manual Keywords: Atari 2600 1982

Atari The book

ATARI® PROGRAM EXCHANGE Program... · Atari, Inc., created the ATARI Program Exchange (APX) to distribute user-written software for ATARI Home Computers. Our goal is to increase

Atari-HEAD: Atari Human Eye-Tracking and Demonstration Dataset