cs230.stanford.edu...Deep Q-learning uses the same DQN to select and evaluate actions, which can...


Post on 21-May-2020


Deep Q-learning uses the same DQN to select and evaluate actions, which can result in overestimation of Q-values [10]. Those overestimations may lead to "overoptimism"
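
The overestimation described above can be illustrated with a small numerical sketch. It is not taken from the report: the function name `max_bias`, the zero-valued true Q-values, the Gaussian noise model, and the second independent estimate `q_b` are all assumptions made for illustration. The sketch compares the value obtained when one noisy estimate both selects and evaluates the greedy action with the value obtained when a second, independent estimate evaluates that same action.

```python
import numpy as np


def max_bias(num_actions=10, noise_std=1.0, trials=20_000, seed=0):
    """Compare coupled and decoupled evaluation of a greedy action.

    Hypothetical setup: every action has a true value of 0, so any
    positive average returned here is pure estimation bias.
    """
    rng = np.random.default_rng(seed)
    true_q = np.zeros(num_actions)

    # Two independent noisy estimates of the same true values.
    q_a = true_q + rng.normal(0.0, noise_std, size=(trials, num_actions))
    q_b = true_q + rng.normal(0.0, noise_std, size=(trials, num_actions))

    # The greedy action is always chosen from q_a.
    greedy = q_a.argmax(axis=1)
    rows = np.arange(trials)

    # Same estimate selects and evaluates: the max picks up positive noise.
    coupled = q_a[rows, greedy].mean()
    # Independent estimate evaluates: the selection noise is not reused.
    decoupled = q_b[rows, greedy].mean()
    return coupled, decoupled


same, decoupled = max_bias()
print(f"select and evaluate with the same estimate: {same:.3f}")
print(f"select with one estimate, evaluate with another: {decoupled:.3f}")
```

Under these assumptions the first average comes out clearly above the true value of 0, while the second stays close to it. This is only a toy model of the effect; a real DQN's errors are neither independent across actions nor Gaussian.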