RL 2: Q(s,a) easily gets us U(s) and π(s); ← with α above = move towards target, weighted...
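Reading the title as shorthand (my reconstruction, not text from the slides): with action values Q(s,a) in hand, the state value and greedy policy fall out directly, and the "← with α" notation is the usual rule of moving the current estimate partway toward a target, shown here with the Q-learning target:

```latex
U(s) = \max_a Q(s,a), \qquad \pi(s) = \operatorname*{arg\,max}_a Q(s,a)

Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
```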

• T, R -> planner -> policy (Dynamic Programming)
• s, a, s', r -> learner -> policy (Monte Carlo)
• s, a, s', r -> modeler -> T, R
• model -> simulate -> s, a, s', r (these boxes are wired together in the sketch below)
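A minimal Python sketch of how the four boxes can be combined in a Dyna-style loop, assuming a small tabular environment with `reset()`, `step(a)`, and `actions(s)` methods (those names, and constants like `n_planning`, are my placeholders, not from the slides): real experience feeds both the learner (a direct Q update) and the modeler (a remembered transition), and the model is then sampled to simulate extra experience for more updates.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=200, alpha=0.1, gamma=0.95, eps=0.1, n_planning=10):
    """Dyna-style loop: learner + modeler + simulate, per the slide's diagram."""
    Q = defaultdict(float)            # learner: action-value table
    model = {}                        # modeler: (s, a) -> (s', r, done), deterministic memory
    seen = defaultdict(set)           # which actions have been tried in each state

    def greedy(s):
        return max(env.actions(s), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(env.actions(s)) if random.random() < eps else greedy(s)
            s2, r, done = env.step(a)

            # learner: direct Q-learning update from real experience
            target = r if done else r + gamma * Q[(s2, greedy(s2))]
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # modeler: remember the observed transition
            model[(s, a)] = (s2, r, done)
            seen[s].add(a)

            # simulate: replay n_planning transitions sampled from the model
            for _ in range(n_planning):
                ps = random.choice(list(seen))
                pa = random.choice(list(seen[ps]))
                ps2, pr, pdone = model[(ps, pa)]
                ptarget = pr if pdone else pr + gamma * Q[(ps2, greedy(ps2))]
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

            s = s2
    return Q
```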

• Q-Learning?
• TD?

• DP, MC, TD
• Model, online/offline update, bootstrapping, on-policy/off-policy (the bootstrapping contrast is sketched below)
• Batch updating vs. online updating
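To make the bootstrapping row concrete, a small sketch of the two tabular value updates being contrasted (the `[(s, r, s_next), ...]` episode format is my own convention): MC waits for the end of the episode and backs up the full return G, while TD(0) backs up one reward plus the current estimate of the next state.

```python
from collections import defaultdict

def mc_update(V, episode, alpha, gamma=1.0):
    """Every-visit Monte Carlo: move V(s) toward the observed return G (no bootstrapping)."""
    G = 0.0
    for s, r, _ in reversed(episode):            # episode = [(s, r, s_next), ...]
        G = r + gamma * G                        # return that followed this visit to s
        V[s] += alpha * (G - V[s])

def td0_update(V, episode, alpha, gamma=1.0):
    """TD(0): move V(s) toward r + gamma * V(s_next), bootstrapping on the current estimate."""
    for s, r, s_next in episode:
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

V = defaultdict(float)                           # value table, unseen states default to 0
```

Batch updating then means sweeping these updates repeatedly over a fixed set of episodes until the table stops changing, rather than applying each update once as the data arrives online.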

Ex. 6.4

• V(B) = ¾
• V(A) = 0? Or ¾? (worked through in the sketch below)
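This is the two-state batch example from Sutton and Barto (eight episodes: A,0,B,0 once; B,1 six times; B,0 once). A hedged sketch of how the two candidate answers arise (the episode encoding, and the shortcut of reading off the batch-TD fixed point instead of iterating TD to convergence, are mine):

```python
# Eight batch episodes, written as (state, reward received on leaving it) pairs.
episodes = [[("A", 0), ("B", 0)]] + [[("B", 1)]] * 6 + [[("B", 0)]]

# Batch Monte Carlo: V(s) is the average return observed after visiting s.
returns = {"A": [], "B": []}
for ep in episodes:
    for i, (s, _) in enumerate(ep):
        returns[s].append(sum(r for _, r in ep[i:]))     # undiscounted return from s
V_mc = {s: sum(g) / len(g) for s, g in returns.items()}
print(V_mc)   # {'A': 0.0, 'B': 0.75} -> the only return ever seen from A is 0

# Batch TD(0) converges to the values of the maximum-likelihood model of the data:
# A always moved to B with reward 0, so its value is 0 + V(B) = 3/4.
V_td = {"B": V_mc["B"], "A": 0 + V_mc["B"]}
print(V_td)   # {'B': 0.75, 'A': 0.75}
```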

• 4 is terminal state
• V(3) = 0.5
• TD(0) here is better than MC for V(2). Why?

• By the kth visit to state 2, state 3 has been visited 10k times
• Variance for MC will be much higher than TD(0) because of bootstrapping (see the simulation sketch below)
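A hedged simulation of that argument, assuming the chain 2 → 3 → 4 with a 0/1 reward of mean 0.5 on leaving state 3 and no other reward (my reading of the slide; the visit counts k and 10k are placeholders): MC must estimate V(2) from only the k returns that start at 2, while TD(0) bootstraps off a V(3) estimate built from 10k visits, so its V(2) estimate is far less noisy.

```python
import random
import statistics

def spread_of_v2(k=10, runs=2000):
    """Std. dev. of the MC vs TD(0) estimate of V(2) on the chain 2 -> 3 -> 4 (terminal)."""
    mc, td = [], []
    for _ in range(runs):
        # State 3 gets 10k visits of its own, so its estimated value is tight around 0.5.
        v3_hat = statistics.mean(float(random.random() < 0.5) for _ in range(10 * k))
        # State 2 gets only k visits; each episode from it is 2 -> 3 -> 4 with one 0/1 reward.
        returns_from_2 = [float(random.random() < 0.5) for _ in range(k)]
        mc.append(statistics.mean(returns_from_2))   # MC: average of k noisy full returns
        td.append(0.0 + v3_hat)                      # TD(0): r(2->3) = 0 plus the V(3) estimate
    return statistics.stdev(mc), statistics.stdev(td)

print(spread_of_v2())   # roughly (0.16, 0.05): MC spread ~ 0.5/sqrt(k), TD ~ 0.5/sqrt(10k)
```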

• 4 is terminal state
• V(3) = 0.5

• Change so that R(3,a,4) is deterministic
• Now, MC would be faster

• What happened in the 1st episode?

Q-Learning

• Decaying learning rate?
• Decaying exploration rate? (both sketched below)
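A hedged sketch of the usual answer to both questions, again assuming a tabular `env` with `reset()`, `step(a)`, and `actions(s)` (names and schedules are my placeholders, not from the slides): decay the learning rate per (s,a) pair, e.g. α = 1/N(s,a), so that Σα = ∞ and Σα² < ∞ and Q can converge, and decay ε over episodes so exploration tapers off toward a greedy policy.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=5000, gamma=0.99, eps_min=0.01):
    """Tabular Q-learning with a decaying per-pair learning rate and decaying epsilon."""
    Q = defaultdict(float)
    visits = defaultdict(int)                        # N(s, a): visit count per pair

    for ep in range(episodes):
        eps = max(eps_min, 1.0 / (1.0 + 0.01 * ep))  # exploration rate decays over episodes
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            a = random.choice(acts) if random.random() < eps else max(acts, key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)

            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)]             # learning rate decays per (s, a)
            target = r if done else r + gamma * max(Q[(s2, x)] for x in env.actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```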
