lab 6-1: q network - github pages · 2017-10-02 · lab 6-1: q network reinforcement learning with...

Lab 6-1: Q Network Reinforcement Learning with TensorFlow&OpenAI Gym Sung Kim <[email protected]>

Post on 17-Mar-2020

State (0~15) as input

(1) s: state 7
(2) Ws

State (0~15) as input, one-hot encoded

(1) s: one-hot state 7, built with np.identity
(2) Ws
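A minimal sketch of the one-hot encoding named on this slide, assuming NumPy is imported as np. The helper name one_hot is illustrative and does not come from the slides.

```python
import numpy as np

def one_hot(state, num_states=16):
    """Return the state as a 1 x num_states one-hot row vector."""
    # Slicing a single row of the identity matrix keeps the result 2-D.
    return np.identity(num_states)[state:state + 1]

s = one_hot(7)
print(s.shape)   # (1, 16)
print(s[0, 7])   # 1.0
```

Slicing with state:state + 1 rather than indexing with state keeps the row as a 1x16 matrix, which is the shape the network input expects.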

Q-Network training (Network construction)

(1) s
(2) Ws

Q-Network training (linear regression)

(1) s
(2) Ws

$y = r + \gamma \max Q(s')$

$\mathrm{cost}(W) = (Ws - y)^2$
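The target and cost above can be worked through numerically. This is only a sketch under stated assumptions: the discount factor 0.99, the constant weights, the next state 11, the reward, and the chosen action are all illustrative values, not taken from the slides.

```python
import numpy as np

gamma = 0.99                      # discount factor (assumed value)
W = np.full((16, 4), 0.5)         # illustrative weights: 16 states x 4 actions
s = np.identity(16)[7:8]          # current state 7, one-hot, shape (1, 16)
s_next = np.identity(16)[11:12]   # hypothetical next state s'
r = 1.0                           # hypothetical reward
a = 2                             # hypothetical action taken in s

# y = r + gamma * max Q(s'), where Q(s') = s' W
y = r + gamma * np.max(s_next @ W)

# cost(W) = (Ws - y)^2 for the entry of Ws that belongs to action a
q_sa = (s @ W)[0, a]
cost = (q_sa - y) ** 2
print(y, cost)   # y is 1.495, cost is about 0.990
```

Minimizing this squared error with respect to W is ordinary linear regression, with y treated as a fixed label.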

Algorithm

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
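The algorithm in the cited paper picks a random action with probability ε and the highest-valued action otherwise. A minimal sketch of that selection step follows; the function name epsilon_greedy, the random generator, and the sample Q-values are assumptions made for illustration.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(q_values.shape[1]))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q = np.array([[0.1, 0.7, 0.2, 0.0]])   # illustrative 1 x 4 Q-values

# With epsilon = 0 the choice is always greedy, so action 1 is selected.
action = epsilon_greedy(q, epsilon=0.0, rng=rng)
print(action)   # 1
```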

Y label and loss function

Playing Atari with Deep Reinforcement Learning - University of Toronto by V Mnih et al.
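In the cited paper the label is the reward alone for a terminal transition and the reward plus the discounted maximum next-state value otherwise. The sketch below shows one way to build such a label and its squared-error loss; the names make_label and loss, the default discount factor, and the sample numbers are assumptions, not the slide's code.

```python
import numpy as np

def make_label(Qs, action, reward, Qs_next, done, gamma=0.99):
    """Copy the 1 x 4 prediction and overwrite only the taken action's entry."""
    y = Qs.copy()
    if done:
        y[0, action] = reward                          # terminal: no future value
    else:
        y[0, action] = reward + gamma * np.max(Qs_next)
    return y

def loss(Qs, y):
    """Sum of squared differences between label and prediction."""
    return float(np.sum((y - Qs) ** 2))

Qs = np.array([[0.2, 0.4, 0.1, 0.3]])        # illustrative Q(s)
Qs_next = np.array([[0.0, 0.5, 0.0, 0.0]])   # illustrative Q(s')
y = make_label(Qs, action=2, reward=0.0, Qs_next=Qs_next, done=False)
print(y, loss(Qs, y))
```

Because the label copies the prediction for the untaken actions, only the entry for the taken action contributes to the loss.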

Code: Network and setup

Code: Training
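The slide's code did not survive in this transcript. As a stand-in, here is a rough NumPy-only sketch of a linear Q-network training loop. Everything in it is an assumption: the deterministic 4x4 grid replaces the Gym environment, and the hole positions, learning rate, discount factor, ε schedule, and step cap are illustrative choices, not the lab's TensorFlow code.

```python
import numpy as np

N_STATES, N_ACTIONS = 16, 4
HOLES, GOAL = {5, 7, 11, 12}, 15   # hypothetical layout of the 4x4 grid

def step(state, action):
    """Deterministic grid move: 0=left, 1=down, 2=right, 3=up."""
    row, col = divmod(state, 4)
    if action == 0:
        col = max(col - 1, 0)
    elif action == 1:
        row = min(row + 1, 3)
    elif action == 2:
        col = min(col + 1, 3)
    else:
        row = max(row - 1, 0)
    new_state = row * 4 + col
    done = new_state in HOLES or new_state == GOAL
    reward = 1.0 if new_state == GOAL else 0.0
    return new_state, reward, done

def one_hot(state):
    return np.identity(N_STATES)[state:state + 1]

def train(num_episodes=500, learning_rate=0.1, gamma=0.99, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.0, 0.01, size=(N_STATES, N_ACTIONS))
    successes = []
    for i in range(num_episodes):
        state, total_reward = 0, 0.0
        epsilon = 1.0 / (i / 50 + 10)          # decaying exploration rate
        for _ in range(100):                    # cap the episode length
            x = one_hot(state)
            Qs = x @ W                          # 1 x 4 prediction
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))
            else:
                action = int(np.argmax(Qs))
            new_state, reward, done = step(state, action)
            y = Qs.copy()                       # label differs only at `action`
            if done:
                y[0, action] = reward
            else:
                y[0, action] = reward + gamma * np.max(one_hot(new_state) @ W)
            # Gradient of sum((xW - y)^2) with respect to W is 2 x^T (xW - y).
            W -= learning_rate * 2.0 * x.T @ (Qs - y)
            total_reward += reward
            state = new_state
            if done:
                break
        successes.append(total_reward)
    return W, successes

W, successes = train()
print("Fraction of successful episodes:", sum(successes) / len(successes))
```

Since the input is one-hot, each update changes only the weight for the visited state and the taken action, so this linear network behaves like a Q-table trained by gradient descent.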

Code: results

Percent of successful episodes: 0.5195%

Q-Table vs. Network

Q-network: 0.5195%

Q-table: 0.653

Array shape

Input: [[0, 2, …]] (1x16)

Weights: [ [0,1,2,3], [3,1,2,3], [0,5,2,3], … ] (16x4)

Output: [[a1,a2,a3,a4]] (1x4)
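The shapes on this slide can be checked with a short NumPy sketch. The zero weights and the variable names are placeholders chosen for illustration.

```python
import numpy as np

x = np.identity(16)[0:1]    # input: one-hot state 0, shape (1, 16)
W = np.zeros((16, 4))       # weights, shape (16, 4)
Qs = x @ W                  # output: one Q-value per action, shape (1, 4)
print(x.shape, W.shape, Qs.shape)   # (1, 16) (16, 4) (1, 4)

a = 2
q_a = Qs[0, a]              # the output is 2-D, so index row 0 then the action
```

Because the output is a 1x4 matrix rather than a flat vector, a single Q-value is read with two indices.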

Array Shape

[[a1,a2,a3,a4]] (1x4)

Exercise

• Too slow: minibatch?

• A bit unstable?

Lab: Q-network for cart pole

Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
