Non-linear Value Function Approximation: Double Deep Q-Networks
Alina Vereshchaka
CSE4/510 Reinforcement Learning, Fall 2019
avereshc@buffalo.edu
October 8, 2019
*Slides are based on Deep Reinforcement Learning: Q-Learning by Garima Lalwani, Karan Ganju, and Unnat Jain (University of Illinois).
Overview
1 Recap: DQN
2 Double Deep Q Network
Recap: Deep Q-Networks (DQN)
Represent the value function by a deep Q-network with weights w:
Q(s, a, w) ≈ Qπ(s, a)
Define the objective function as the mean squared TD error, with frozen target weights w⁻:
L(w) = E[(r + γ max_{a′} Q(s′, a′, w⁻) − Q(s, a, w))²]
leading to the following Q-learning gradient:
∂L(w)/∂w = E[(r + γ max_{a′} Q(s′, a′, w⁻) − Q(s, a, w)) ∂Q(s, a, w)/∂w]
Optimize the objective end-to-end by SGD, using ∂L(w)/∂w.
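A minimal PyTorch sketch of this loss, assuming hypothetical q_net (online weights w) and q_target (frozen weights w⁻) modules and a batch of transition tensors:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, q_target, batch, gamma=0.99):
    """Squared TD error: (r + gamma * max_a' Q(s',a'; w-) - Q(s,a; w))^2."""
    s, a, r, s_next, done = batch          # a is a LongTensor of action indices
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; w)
    with torch.no_grad():                  # no gradient through the frozen target
        q_next = q_target(s_next).max(dim=1).values        # max_{a'} Q(s', a'; w-)
        target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q_sa, target)
```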
Deep Q-Networks
DQN provides a stable solution to deep value-based RL
1 Use experience replay
2 Freeze target Q-network
3 Clip rewards or adaptively normalize values to a sensible range (a sketch of the first two tricks follows below)
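A minimal sketch of the first two tricks, with a uniform replay buffer and a periodic hard copy of the online weights into the target network (ReplayBuffer and sync_target are illustrative names, not from the slides):

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: break sample correlations by replaying past transitions."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)   # oldest transitions fall off the front

    def push(self, transition):             # transition = (s, a, r, s_next, done)
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

def sync_target(q_net, q_target):
    """Freeze the target Q-network: hard-copy online weights into it every C steps."""
    q_target.load_state_dict(q_net.state_dict())
```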
Deep Q-Network (DQN) Architecture
[Figure: "Naive DQN" architecture vs. the "Optimized DQN" architecture used by DeepMind]
DQN in Atari
DQN Algorithm
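The slide shows the full DQN algorithm (Mnih et al., 2015). A condensed sketch of the training loop under stated assumptions: a Gymnasium-style env, torch modules for q_net and q_target, and the hypothetical ReplayBuffer, sync_target, and dqn_loss helpers sketched above:

```python
import random
import numpy as np
import torch

def train_dqn(env, q_net, q_target, optimizer, steps=50_000, gamma=0.99,
              eps=0.1, batch_size=32, sync_every=1_000):
    buf = ReplayBuffer()
    s, _ = env.reset()
    for t in range(steps):
        # epsilon-greedy behaviour policy
        if random.random() < eps:
            a = env.action_space.sample()
        else:
            with torch.no_grad():
                q = q_net(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0))
                a = int(q.argmax(dim=1).item())
        s_next, r, terminated, truncated, _ = env.step(a)
        buf.push((s, a, r, s_next, float(terminated)))
        s = env.reset()[0] if (terminated or truncated) else s_next

        if len(buf) >= batch_size:            # learn from replayed minibatches
            s_b, a_b, r_b, sn_b, d_b = map(np.asarray, zip(*buf.sample(batch_size)))
            batch = (torch.as_tensor(s_b, dtype=torch.float32),
                     torch.as_tensor(a_b, dtype=torch.int64),
                     torch.as_tensor(r_b, dtype=torch.float32),
                     torch.as_tensor(sn_b, dtype=torch.float32),
                     torch.as_tensor(d_b, dtype=torch.float32))
            loss = dqn_loss(q_net, q_target, batch, gamma)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if t % sync_every == 0:
            sync_target(q_net, q_target)      # refresh the frozen target network
```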
Double Q-learning
Two estimators:
Estimator Q1: obtain the best action
Estimator Q2: evaluate Q for that action
What is the main motivation? The chance that both estimators overestimate Q at the same action is much lower than the chance that a single estimator overestimates its own maximizing action.
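This is the overestimation bias of the max operator: even if every estimate Q̂(s, a) is unbiased but noisy, max_{a} Q̂(s, a) is biased upward, since the max tends to pick whichever estimate happens to err high. A small NumPy check with made-up numbers (true Q is 0 for every action):

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 100_000

# Two independent, unbiased, noisy estimators of a true Q that is 0 everywhere.
q_hat1 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))
q_hat2 = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

# Single estimator: take the max of its own estimates -> biased upward.
single = q_hat1.max(axis=1).mean()

# Double estimator: Q1 picks the action, independent Q2 evaluates it -> unbiased.
best = q_hat1.argmax(axis=1)
double = q_hat2[np.arange(n_trials), best].mean()

print(f"single-estimator max: {single:.3f}")   # roughly 1.54, far above the true 0
print(f"double-estimator:     {double:.3f}")   # roughly 0.00
```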
Double Q-learning
Two estimators:
Estimator Q1: obtain the best action
Estimator Q2: evaluate Q for that action
Update rule: Q1(s, a) ← Q1(s, a) + α (Target − Q1(s, a))
Q-learning target: r(s, a) + γ max_{a′} Q1(s′, a′)
Double Q-learning target: r(s, a) + γ Q2(s′, argmax_{a′} Q1(s′, a′))
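A sketch of one tabular double Q-learning step, assuming NumPy arrays Q1 and Q2 indexed by [state, action]; as in Hasselt's double Q-learning, the roles of the two tables are swapped at random on each step:

```python
import random
import numpy as np

def double_q_update(Q1, Q2, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One double Q-learning step: one table picks argmax a', the other evaluates it."""
    if random.random() < 0.5:
        a_best = np.argmax(Q1[s_next])            # argmax_{a'} Q1(s', a')
        target = r + gamma * Q2[s_next, a_best]   # evaluate with Q2
        Q1[s, a] += alpha * (target - Q1[s, a])
    else:
        a_best = np.argmax(Q2[s_next])            # roles swapped
        target = r + gamma * Q1[s_next, a_best]
        Q2[s, a] += alpha * (target - Q2[s, a])
```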
Double Deep Q Network
The same two estimators, played by the two networks DQN already maintains:
Estimator Q1, the online network: obtain the best action at s′
Estimator Q2, the target network: evaluate Q for that action
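A minimal sketch of the changed target computation, reusing the hypothetical q_net / q_target names from above; the only difference from dqn_loss is that the argmax comes from the online network while the value comes from the target network:

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(q_net, q_target, batch, gamma=0.99):
    """Double DQN: online net selects a', frozen target net evaluates Q(s', a')."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_best = q_net(s_next).argmax(dim=1, keepdim=True)       # select with online net
        q_next = q_target(s_next).gather(1, a_best).squeeze(1)   # evaluate with target net
        target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q_sa, target)
```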
Are the Q-values accurate?
Empirically, on Atari games the value estimates produced by DQN substantially exceed the discounted returns its policies actually achieve, while Double DQN's estimates stay much closer to the true values (van Hasselt et al., 2016).