cs230.stanford.edu

Deep Q-learning uses the same DQN to select and evaluate actions, which can result in overestimation of Q-values [10]. Those overestimations may lead to overoptimism
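To make the overestimation claim concrete, here is a minimal sketch of the effect in a toy setting. Nothing in it comes from the source document: the four equally good actions, the Gaussian noise model, and the function names are all illustrative assumptions. It shows only that taking the maximum over noisy value estimates, and then reporting that same maximum as the value, is biased upward even when each individual estimate is unbiased.

```python
import random
import statistics


def max_of_noisy_estimates(true_q, noise_std, rng):
    """Select and evaluate with the same noisy estimates.

    Each action's estimate is its true value plus zero-mean Gaussian noise
    (an assumed noise model). The action with the largest estimate is chosen
    and that same estimate is returned as its value.
    """
    noisy = [q + rng.gauss(0.0, noise_std) for q in true_q]
    return max(noisy)


def average_max_estimate(true_q, noise_std=1.0, trials=20000, seed=0):
    """Average the max-of-noisy-estimates over many independent trials."""
    rng = random.Random(seed)
    return statistics.fmean(
        max_of_noisy_estimates(true_q, noise_std, rng) for _ in range(trials)
    )


# Hypothetical toy problem: four actions that are all equally good.
true_q = [0.0, 0.0, 0.0, 0.0]
true_max = max(true_q)
estimated_max = average_max_estimate(true_q)

print(f"true max Q:                        {true_max:.3f}")
print(f"average max over noisy estimates:  {estimated_max:.3f}")
```

In this sketch the average of the maximum lands well above the true maximum of zero, although every individual estimate is centred on the true value. With a single action there is nothing to select, so the bias disappears, which suggests that the selection step is what introduces it.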