practice theory · 2020. 1. 3. · practice theory powerful modeling, simple exploration...


Post on 17-Oct-2020



Practice vs. Theory

Practice: powerful modeling, simple exploration (e.g., Atari deep reinforcement learning).

Theory: sophisticated exploration in small-state MDPs (e.g., E³ and R-MAX algorithms).

Limited theory for rich observations.

Goal

Develop reinforcement learning approaches guaranteed to learn an optimal policy with a small number of samples despite rich observations.


Model                          PAC guarantees
Small-state MDPs               Known
Structured large-state MDPs    New
Reactive POMDPs                Extended
Reactive PSRs                  New
LQR (continuous actions)       Known


𝐻


$\pi(x_h)$


𝑥


𝜋 𝑥 ) *

Distribution of initial state

Distribution of next state

Instantaneous reward


$V^\star(x) = \max_a \left\{ \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma(x,a)}[V^\star(x')] \right\}$

Distribution of initial state

Distribution of next state

Instantaneous reward

Optimal action


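$V^\star(x) = \max_a Q^\star(x,a)$, where $Q^\star(x,a) = \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma(x,a)}[V^\star(x')]$

$\pi^\star(x) = \arg\max_a Q^\star(x,a)$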
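The relations between $V^\star$, $Q^\star$ and $\pi^\star$ above can be checked numerically on a toy problem. The following is a minimal sketch, assuming a tabular episodic MDP with horizon `H`, expected rewards `R[x, a]` and transition probabilities `P[x, a, x']`; the function name, the array layout and the two-state example are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def backward_induction(R, P, H):
    """Compute Q*, pi* per step and V* at the first step by backward induction.

    R[x, a]     : expected instantaneous reward (assumed layout)
    P[x, a, x'] : probability of next state x' (assumed layout)
    """
    n_states, _ = R.shape
    V = np.zeros(n_states)             # value after the last step is zero
    Q_steps, pi_steps = [], []
    for _ in range(H):                 # iterate from the last step to the first
        Q = R + P @ V                  # Q*(x,a) = E[r(a)] + E_{x'}[V*(x')]
        pi_steps.append(Q.argmax(axis=1))  # pi*(x) = argmax_a Q*(x,a)
        Q_steps.append(Q)
        V = Q.max(axis=1)              # V*(x) = max_a Q*(x,a)
    Q_steps.reverse()                  # index 0 now refers to the first step
    pi_steps.reverse()
    return Q_steps, pi_steps, V

# Hypothetical two-state, two-action MDP used only for illustration.
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0   # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0   # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0   # state 1, action 0: stay in state 1 (reward 1)
P[1, 1, 0] = 1.0   # state 1, action 1: move back to state 0

Q_steps, pi_steps, V1 = backward_induction(R, P, H=3)
print("V* at the first step:", V1)
print("pi* at the first step:", pi_steps[0])
```

With these made-up numbers and `H=3`, the first-step values are 2 for state 0 and 3 for state 1, and the greedy first-step policy moves from state 0 to state 1 and then stays there.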


$Q^\star(x,a) = \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma(x,a)}[V^\star(x')]$

$\phantom{Q^\star(x,a)} = \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma(x,a)}[\max_{a'} Q^\star(x',a')]$

$\phantom{Q^\star(x,a)} = \mathbb{E}_{r \sim R_x}[r(a)] + \mathbb{E}_{x' \sim \Gamma(x,a)}[Q^\star(x', \pi^\star(x'))]$


$\mathbb{E}\left[ f(x_h, a_h) - r_h - f(x_{h+1}, a_{h+1}) \right]$

$x_h$
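To make the expectation above concrete, here is a small sketch that evaluates it exactly on a toy tabular MDP. It assumes that both actions are chosen greedily with respect to the candidate $f$ and that the distribution `mu` of $x_h$ is given; the transcript does not show how $x_h$ is generated, so `mu`, the function name and the example numbers are all assumptions for illustration.

```python
import numpy as np

def average_bellman_error(f_h, f_next, R, P, mu):
    """Exact E[f(x_h,a_h) - r_h - f(x_{h+1},a_{h+1})] with x_h ~ mu.

    Assumes a_h and a_{h+1} are greedy with respect to f (an assumption).
    f_h[x, a], f_next[x, a] : candidate values at steps h and h+1
    R[x, a], P[x, a, x']    : expected reward and transition probabilities
    """
    idx = np.arange(len(mu))
    a = f_h.argmax(axis=1)                        # a_h = argmax_a f(x_h, a)
    predicted = f_h[idx, a]                       # f(x_h, a_h)
    reward = R[idx, a]                            # E[r_h | x_h, a_h]
    next_value = P[idx, a] @ f_next.max(axis=1)   # E[f(x_{h+1}, a_{h+1})]
    return float(mu @ (predicted - reward - next_value))

# Hypothetical two-state, two-action MDP, two steps remaining.
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0

q_last = R.copy()                        # Q* at the final step
q_prev = R + P @ q_last.max(axis=1)      # Q* one step earlier
mu = np.array([0.5, 0.5])                # assumed distribution of x_h

err_star = average_bellman_error(q_prev, q_last, R, P, mu)

# Inflate one entry of the candidate to break Bellman consistency.
q_bad = q_prev.copy()
q_bad[0, 1] += 0.5
err_bad = average_bellman_error(q_bad, q_last, R, P, mu)

print("error for Q*:", err_star)
print("error for perturbed f:", err_bad)
```

In this example the error is zero for $f = Q^\star$ and 0.25 for the perturbed candidate, which is the kind of signal that can be used to rule out an inconsistent $f$.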


§ Validity condition


§ $\mathbb{E}_{x \sim \Gamma_1}\left[\max_a Q^\star(x,a)\right]$

$= \mathbb{E}_{x \sim \Gamma_1}\left[Q^\star(x, \pi^\star(x))\right]$


§ $V_f = \mathbb{E}_{x \sim \Gamma_1}\left[f(x, \pi_f(x))\right]$


Optimism under uncertainty: guess for $V(\pi^\star)$ if $f = Q^\star$

Checking our optimistic belief

Prune the possible solutions
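The three annotations above (optimistic guess, check, prune) suggest an elimination loop over a set of candidate functions. The sketch below is one possible reading, not the authors' procedure: it assumes a finite candidate set `F` of tabular functions of shape `(H, states, actions)`, replaces sampled estimates with exact expectations, and uses hypothetical helper names throughout. For the actual algorithm and its sample-based estimates, see the paper linked at the end.

```python
import numpy as np

def rollin(policy, P, gamma1, h):
    """State distribution at step h (0-indexed) when following `policy`."""
    d = gamma1.copy()
    idx = np.arange(len(d))
    for t in range(h):
        d = d @ P[idx, policy[t]]      # push the distribution one step forward
    return d

def bellman_error(g, h, d, R, P):
    """Average Bellman error of g at step h under state distribution d."""
    H, S, _ = g.shape
    idx = np.arange(S)
    a = g[h].argmax(axis=1)            # greedy action of g at step h
    nxt = g[h + 1].max(axis=1) if h + 1 < H else np.zeros(S)
    return float(d @ (g[h][idx, a] - R[idx, a] - P[idx, a] @ nxt))

def optimistic_elimination(F, R, P, gamma1, tol=1e-9):
    F = list(F)
    H = F[0].shape[0]
    while F:
        # Optimism: pick the candidate with the largest V_f.
        f = max(F, key=lambda g: float(gamma1 @ g[0].max(axis=1)))
        pi_f = f.argmax(axis=2)
        dists = [rollin(pi_f, P, gamma1, h) for h in range(H)]
        # Check the optimistic belief along pi_f's own trajectory.
        errs = [bellman_error(f, h, dists[h], R, P) for h in range(H)]
        if max(abs(e) for e in errs) <= tol:
            return pi_f, f
        # Prune every candidate that is inconsistent at the worst step.
        h_bad = int(np.argmax(np.abs(errs)))
        F = [g for g in F
             if abs(bellman_error(g, h_bad, dists[h_bad], R, P)) <= tol]
    raise ValueError("every candidate was eliminated")

# Hypothetical two-state, two-action MDP with H = 2.
R = np.array([[0.0, 0.0], [1.0, 0.0]])
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
gamma1 = np.array([1.0, 0.0])          # always start in state 0

q_star = np.stack([R + P @ R.max(axis=1), R])   # true Q* for both steps
q_opt = q_star + 1.0                            # over-optimistic candidate
q_zero = np.zeros_like(q_star)                  # pessimistic candidate

policy, chosen = optimistic_elimination([q_zero, q_opt, q_star], R, P, gamma1)
print(policy)
```

In this toy run the over-optimistic candidate is tried first, fails the check at the last step, and is pruned together with the all-zero candidate, so the loop returns the greedy policy of `q_star`. Because the selected candidate is always inconsistent when pruning happens, the candidate set shrinks on every round and the loop terminates.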


Details at: https://arxiv.org/abs/1610.09512