Non-linear Value Function Approximation: Double Deep Q-Networks
Alina Vereshchaka
CSE4/510 Reinforcement Learning, Fall 2019
avereshc@buffalo.edu
October 8, 2019
*Slides are based on Deep Reinforcement Learning: Q-Learning by Garima Lalwani, Karan Ganju, and Unnat Jain (University of Illinois).
Overview
1 Recap: DQN
2 Double Deep Q Network
Recap: Deep Q-Networks (DQN)
Represent the value function by a deep Q-network with weights w:
Q(s, a, w) ≈ Qπ(s, a)
Define the objective function as the mean squared TD error, with frozen target weights w⁻:
L(w) = E[(r + γ max_{a′} Q(s′, a′, w⁻) − Q(s, a, w))²]
leading to the following Q-learning gradient:
∂L(w)/∂w = E[(r + γ max_{a′} Q(s′, a′, w⁻) − Q(s, a, w)) ∂Q(s, a, w)/∂w]
Optimize the objective end-to-end by SGD, using ∂L(w)/∂w.
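A minimal PyTorch sketch of this loss, assuming hypothetical q_net (online weights w) and q_target (frozen weights w⁻) modules and a batch of transition tensors:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, q_target, batch, gamma=0.99):
    """Squared TD error: (r + gamma * max_a' Q(s',a'; w-) - Q(s,a; w))^2."""
    s, a, r, s_next, done = batch          # a is a LongTensor of action indices
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; w)
    with torch.no_grad():                  # no gradient through the frozen target
        q_next = q_target(s_next).max(dim=1).values        # max_{a'} Q(s', a'; w-)
        target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q_sa, target)
```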
Deep Q-Networks
DQN provides a stable solution to deep value-based RL
1 Use experience replay
2 Freeze target Q-network
3 Clip rewards or adaptively normalize values to a sensible range (a sketch of the first two tricks follows below)
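A minimal sketch of the first two tricks, with a uniform replay buffer and a periodic hard copy of the online weights into the target network (ReplayBuffer and sync_target are illustrative names, not from the slides):

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: break sample correlations by replaying past transitions."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)   # oldest transitions fall off the front

    def push(self, transition):             # transition = (s, a, r, s_next, done)
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

def sync_target(q_net, q_target):
    """Freeze the target Q-network: hard-copy online weights into it every C steps."""
    q_target.load_state_dict(q_net.state_dict())
```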
Deep Q-Network (DQN) Architecture
[Figure: "Naive DQN" architecture vs. the "Optimized DQN" architecture used by DeepMind]
DQN in Atari
DQN Algorithm
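The slide shows the full DQN algorithm (Mnih et al., 2015). A condensed sketch of the training loop under stated assumptions: a Gymnasium-style env, torch modules for q_net and q_target, and the hypothetical ReplayBuffer, sync_target, and dqn_loss helpers sketched above:

```python
import random
import numpy as np
import torch

def train_dqn(env, q_net, q_target, optimizer, steps=50_000, gamma=0.99,
              eps=0.1, batch_size=32, sync_every=1_000):
    buf = ReplayBuffer()
    s, _ = env.reset()
    for t in range(steps):
        # epsilon-greedy behaviour policy
        if random.random() < eps:
            a = env.action_space.sample()
        else:
            with torch.no_grad():
                q = q_net(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0))
                a = int(q.argmax(dim=1).item())
        s_next, r, terminated, truncated, _ = env.step(a)
        buf.push((s, a, r, s_next, float(terminated)))
        s = env.reset()[0] if (terminated or truncated) else s_next

        if len(buf) >= batch_size:            # learn from replayed minibatches
            s_b, a_b, r_b, sn_b, d_b = map(np.asarray, zip(*buf.sample(batch_size)))
            batch = (torch.as_tensor(s_b, dtype=torch.float32),
                     torch.as_tensor(a_b, dtype=torch.int64),
                     torch.as_tensor(r_b, dtype=torch.float32),
                     torch.as_tensor(sn_b, dtype=torch.float32),
                     torch.as_tensor(d_b, dtype=torch.float32))
            loss = dqn_loss(q_net, q_target, batch, gamma)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if t % sync_every == 0:
            sync_target(q_net, q_target)      # refresh the frozen target network
```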
Double Q-learning
Two estimators:
Estimator Q1: obtain the best action
Estimator Q2: evaluate Q for that action
What is the main motivation? The chance that both estimators overestimate Q at the same action is much lower than the chance that a single estimator overestimates its own maximizing action.
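This is the overestimation bias of the max operator: even if every estimate Q̂(s, a) is unbiased but noisy, max_{a} Q̂(s, a) is biased upward, since the max tends to pick whichever estimate happens to err high. A small NumPy check with made-up numbers (true Q is 0 for every action):

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 100_000

# Two independent, unbiased, noisy estimators of a true Q that is 0 everywhere.
q_hat1 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
q_hat2 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

# Single estimator: take the max of its own estimates -> biased upward.
single = q_hat1.max(axis=1).mean()

# Double estimator: Q1 picks the action, independent Q2 evaluates it -> unbiased.
best = q_hat1.argmax(axis=1)
double = q_hat2[np.arange(n_trials), best].mean()

print(f"single-estimator max: {single:.3f}")   # roughly 1.54, far above the true 0
print(f"double-estimator:     {double:.3f}")   # roughly 0.00
```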
Double Q-learning
Two estimators:
Estimator Q1: obtain the best action
Estimator Q2: evaluate Q for that action
Update rule: Q1(s, a) ← Q1(s, a) + α (Target − Q1(s, a))
Q-learning target: r(s, a) + γ max_{a′} Q1(s′, a′)
Double Q-learning target: r(s, a) + γ Q2(s′, argmax_{a′} Q1(s′, a′))
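A sketch of one tabular double Q-learning step, assuming NumPy arrays Q1 and Q2 indexed by [state, action]; as in Hasselt's double Q-learning, the roles of the two tables are swapped at random on each step:

```python
import random
import numpy as np

def double_q_update(Q1, Q2, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One double Q-learning step: one table picks argmax a', the other evaluates it."""
    if random.random() < 0.5:
        a_best = np.argmax(Q1[s_next])            # argmax_{a'} Q1(s', a')
        target = r + gamma * Q2[s_next, a_best]   # evaluate with Q2
        Q1[s, a] += alpha * (target - Q1[s, a])
    else:
        a_best = np.argmax(Q2[s_next])            # roles swapped
        target = r + gamma * Q1[s_next, a_best]
        Q2[s, a] += alpha * (target - Q2[s, a])
```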
Double Deep Q Network
The same two estimators, played by the two networks DQN already maintains:
Estimator Q1, the online network: obtain the best action at s′
Estimator Q2, the target network: evaluate Q for that action
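A minimal sketch of the changed target computation, reusing the hypothetical q_net / q_target names from above; the only difference from dqn_loss is that the argmax comes from the online network while the value comes from the target network:

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(q_net, q_target, batch, gamma=0.99):
    """Double DQN: online net selects a', frozen target net evaluates Q(s', a')."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_best = q_net(s_next).argmax(dim=1, keepdim=True)       # select with online net
        q_next = q_target(s_next).gather(1, a_best).squeeze(1)   # evaluate with target net
        target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q_sa, target)
```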
Are the Q-values accurate?
Empirically, on Atari games the value estimates produced by DQN substantially exceed the discounted returns its policies actually achieve, while Double DQN's estimates stay much closer to the true values (van Hasselt et al., 2016).