
Making Deep Q-learning Approaches Robust to Time Discretization
Corentin Tallec, Léonard Blier, Yann Ollivier
Université Paris-Sud, Facebook AI Research
June 4, 2019
C. Tallec et al. (UPSUD, FAIR) Framerate robust DQ Learning June 4, 2019 1 / 4
Reinforcement Learning in Near Continuous Time
What happens when standard RL methods are used with a small time discretization, i.e., a high framerate?
Usual RL algorithm + high framerate → failure
Scalability is limited by the algorithms: better hardware, sensors, and actuators → worse performance.
This contributes to the lack of robustness of Deep RL: new environment → different framerate → new hyperparameters.
[Videos: low FPS vs. high FPS]
[Video: ddpg_low_best.mp4]
As δt → 0, Q^π(s, a) → V^π(s)
Q^π no longer depends on the action as δt → 0 ⟹ Q^π cannot be used to select actions!
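A toy calculation (not from the slides) illustrates the collapse: with a per-step reward of r·δt and discount γ^δt, a one-step look-ahead gives Q(s, a) ≈ r(s, a)·δt + γ^δt·V(s′). Assuming, for simplicity, that both actions lead to states of the same value, the gap between their Q-values shrinks linearly with δt:

```python
# Toy illustration: why Q-values collapse onto V as the time step shrinks.
# One-step look-ahead with reward r * delta_t and discount gamma ** delta_t:
#   Q(s, a) ~= r(s, a) * delta_t + gamma**delta_t * V(s')
# Here both actions are assumed to reach states of equal value v_next,
# so the action gap is driven entirely by the reward term.

def q_gap(delta_t, gamma=0.99, r_good=1.0, r_bad=0.0, v_next=10.0):
    """Q-value gap between a good and a bad action for one step of size delta_t."""
    q_good = r_good * delta_t + gamma**delta_t * v_next
    q_bad = r_bad * delta_t + gamma**delta_t * v_next
    return q_good - q_bad

for dt in [1.0, 0.1, 0.01, 0.001]:
    print(f"delta_t={dt:>6}: action gap = {q_gap(dt):.4f}")
```

The gap is O(δt), so at high framerates it drowns in function-approximation noise, which is the failure mode the slides describe.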
There is no continuous-time ε-greedy exploration.
ε-greedy with ε = 1 on pendulum: δt = 0.05 vs. δt = 0.0001
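A hypothetical simulation (illustrative setup, not the authors' code) shows why ε-greedy degenerates in near-continuous time: fully random actions of size ±1, each applied for one step of size δt over a fixed physical duration T, form a random walk whose spread scales like √(T·δt), so the explored range vanishes as δt → 0:

```python
import random

# Fully random (epsilon = 1) actions of size +/-1, applied over a fixed
# physical duration total_time. Each step moves the state by a * delta_t,
# so the net displacement is a random walk with spread ~ sqrt(total_time * delta_t).

def exploration_spread(delta_t, total_time=10.0, n_runs=2000, seed=0):
    """Mean absolute displacement of a random-action policy after total_time."""
    rng = random.Random(seed)
    n_steps = int(total_time / delta_t)
    total = 0.0
    for _ in range(n_runs):
        x = 0.0
        for _ in range(n_steps):
            x += rng.choice((-1.0, 1.0)) * delta_t
        total += abs(x)
    return total / n_runs

for dt in [0.1, 0.01]:
    print(f"delta_t={dt}: mean |displacement| = {exploration_spread(dt):.3f}")
```

Halving δt shrinks the explored range by a factor of √2, matching the pendulum videos: at δt = 0.0001 the fully random policy barely moves.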
[Video: eps-0-05.mp4]
Poster #32 this evening
[Videos: low FPS vs. high FPS]
[Video: advup_low_best.mp4]