Decision Making Under Uncertainty
Lec #10: Partially Observable MDPs
UIUC CS 598: Section EA
Professor: Eyal Amir
Spring Semester 2006
Some slides by Jeremy Wyatt (U Birmingham), Alp Sardağ,and Craig Boutilier (Toronto)
Partially Observable Planning
Today
• Partially Observable Markov Decision Processes:
  – Stochastic domains
  – Partially observable
POMDPs
• Partially observable Markov Decision Process (POMDP):
  – a stochastic system (S, A, P) as before
  – a finite set O of observations
• Pa(o|s) = probability of observation o in state s after executing action a
– Require that for each a and s, ∑o in O Pa(o|s) = 1
• O models partial observability
  – The controller can't observe s directly; it can only observe o
  – The same observation o can occur in more than one state
• Why do the observations depend on the action a? Why do we have Pa(o|s) rather than P(o|s)?
  – This is a way to model sensing actions, which do not change the state but make some observation available (e.g., a sensor reading)
Example of a Sensing Action
• Suppose there are a state s1, an action a1, and an observation o1 with the following properties:
  – For every state s, Pa1(s|s) = 1
    • a1 does not change the state
  – Pa1(o1|s1) = 1, and Pa1(o1|s) = 0 for every state s ≠ s1
    • After performing a1, o1 occurs if and only if we're in state s1
• Then to tell whether you're in state s1, just perform action a1 and see whether you observe o1
• Two states s and s’ are indistinguishable if for every o and a, Pa(o|s) = Pa(o|s’)
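The indistinguishability test above can be sketched directly. This is a minimal sketch under assumptions not in the slides: Pa(o|s) is encoded as a dict keyed by (action, observation, state), and the states, action, and probabilities in the example are hypothetical.

```python
# Sketch: two states are indistinguishable if Pa(o|s) = Pa(o|s') for
# every action a and observation o. P_obs is an assumed encoding:
# P_obs[(a, o, s)] = Pa(o|s), with missing keys treated as probability 0.

def indistinguishable(s1, s2, actions, observations, P_obs):
    return all(
        P_obs.get((a, o, s1), 0.0) == P_obs.get((a, o, s2), 0.0)
        for a in actions
        for o in observations
    )

# Hypothetical example: one action, two observations; sB and sC look the
# same under "look", while sA is identifiable.
P_obs = {
    ("look", "o1", "sA"): 1.0,
    ("look", "o2", "sA"): 0.0,
    ("look", "o1", "sB"): 0.5,
    ("look", "o2", "sB"): 0.5,
    ("look", "o1", "sC"): 0.5,
    ("look", "o2", "sC"): 0.5,
}

print(indistinguishable("sB", "sC", ["look"], ["o1", "o2"], P_obs))  # True
print(indistinguishable("sA", "sB", ["look"], ["o1", "o2"], P_obs))  # False
```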
Belief States
• At each point we will have a probability distribution b(s) over the states in S
  – b is called a belief state (our belief about what state we're in)
• Basic properties: – 0 ≤ b(s) ≤ 1 for every s in S
– ∑s in S b(s) = 1
• Definitions:
  – ba = the belief state after doing action a in belief state b
• Thus ba(s) = P(in s after doing a in b) = ∑s' in S Pa(s|s') b(s')
– ba(o) = P(observe o after doing a in b) = ∑s in S Pa(o|s) b(s)
– bao(s) = P(in s after doing a in b and observing o)
Belief States (Continued)
• Recall that in general, P(x|y,z) P(y|z) = P(x,y|z)
• Thus
  Pa(o|s) ba(s) = P(observe o after doing a in s) · P(in s after doing a in b)
                = P(in s and observe o after doing a in b)
• Similarly,
  bao(s) ba(o) = P(in s after doing a in b and observing o) · P(observe o after doing a in b)
               = P(in s and observe o after doing a in b)
• Thus bao(s) = Pa(o|s) ba(s) / ba(o)
• Can use this to distinguish states that would otherwise be indistinguishable
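The three update formulas above (ba, ba(o), and bao via Bayes' rule) can be turned into a short sketch. The dict encodings of Pa(s|s') and Pa(o|s), and the two-state sensor example, are illustrative assumptions, not from the slides.

```python
# A minimal sketch of the belief-update formulas. Assumed encodings:
# P_trans[(a, s_prev, s)] = Pa(s|s_prev), P_obs[(a, o, s)] = Pa(o|s);
# missing keys mean probability 0. Beliefs are dicts state -> probability.

def update_action(b, a, states, P_trans):
    # ba(s) = sum_{s'} Pa(s|s') b(s')
    return {s: sum(P_trans.get((a, sp, s), 0.0) * b[sp] for sp in states)
            for s in states}

def prob_obs(b_a, a, o, states, P_obs):
    # ba(o) = sum_s Pa(o|s) ba(s)
    return sum(P_obs.get((a, o, s), 0.0) * b_a[s] for s in states)

def update_obs(b_a, a, o, states, P_obs):
    # bao(s) = Pa(o|s) ba(s) / ba(o)
    p_o = prob_obs(b_a, a, o, states, P_obs)
    return {s: P_obs.get((a, o, s), 0.0) * b_a[s] / p_o for s in states}

# Hypothetical 2-state example: a sensing action that doesn't change the
# state but reports "hot" with prob 0.8 in sX and 0.2 in sY.
states = ["sX", "sY"]
P_trans = {("sense", s, s): 1.0 for s in states}
P_obs = {("sense", "hot", "sX"): 0.8, ("sense", "cold", "sX"): 0.2,
         ("sense", "hot", "sY"): 0.2, ("sense", "cold", "sY"): 0.8}

b = {"sX": 0.5, "sY": 0.5}
b_a = update_action(b, "sense", states, P_trans)
print(update_obs(b_a, "sense", "hot", states, P_obs)["sX"])  # 0.8
```

Observing "hot" shifts the uniform belief toward sX, exactly as the Bayes-rule formula predicts.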
Example
• Robot r1 can move between l1 and l2
  – move(r1,l1,l2)
– move(r1,l2,l1)
• There may be a container c1 in location l2
  – in(c1,l2)
• O = {full, empty}
  – full: c1 is present
– empty: c1 is absent
– abbreviate full as f, and empty as e
[Figure: belief-state table for a = move(r1,l1,l2), states s1–s4, columns b and ba]
Example (Continued)

• Neither "move" action returns useful observations
• For every state s and for a = either "move" action,
  Pa(f|s) = Pa(e|s) = 0.5
• Thus if there are no other actions, then
  – s1 and s2 are indistinguishable
  – s3 and s4 are indistinguishable

[Figure: belief-state table for a = move(r1,l1,l2), states s1–s4, columns b and ba]
Example (Continued)

• Suppose there's a sensing action see that works perfectly in location l2:
  Psee(f|s4) = Psee(e|s3) = 1
  Psee(f|s3) = Psee(e|s4) = 0
• see does not work elsewhere:
  Psee(f|s1) = Psee(e|s1) = Psee(f|s2) = Psee(e|s2) = 0.5
• Then
  – s1 and s2 are still indistinguishable
  – s3 and s4 are now distinguishable

[Figure: belief-state table for a = move(r1,l1,l2), states s1–s4, columns b and ba]
Example (Continued)

• By itself, see doesn't tell us the state with certainty:
  – bsee^e(s3) = Psee(e|s3) · bsee(s3) / bsee(e) = 1 · 0.25 / 0.5 = 0.5
• If we first do a = move(r1,l1,l2) and then do see, this will tell us the state with certainty:
  – Let b' = ba
  – b'see^e(s3) = Psee(e|s3) · b'see(s3) / b'see(e) = 1 · 0.5 / 0.5 = 1

[Figure: belief-state table for a = move(r1,l1,l2), states s1–s4, columns b and b' = ba]
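The numbers in this example can be checked with a few lines of code. Assumptions beyond the slides: the initial belief b is uniform over {s1, s2, s3, s4}, and move(r1,l1,l2) deterministically maps s1→s3 and s2→s4 while leaving s3 and s4 unchanged.

```python
# Numeric check of the see example. Psee(o|s) is taken from the slides
# (f = full, e = empty); the uniform initial belief and the deterministic
# effect of move(r1,l1,l2) are assumptions for illustration.

states = ["s1", "s2", "s3", "s4"]
b = {s: 0.25 for s in states}

P_see = {
    ("f", "s1"): 0.5, ("e", "s1"): 0.5,
    ("f", "s2"): 0.5, ("e", "s2"): 0.5,
    ("f", "s3"): 0.0, ("e", "s3"): 1.0,
    ("f", "s4"): 1.0, ("e", "s4"): 0.0,
}

def see_update(b, o):
    # see does not change the state, so b_see = b; then apply Bayes' rule.
    p_o = sum(P_see[(o, s)] * b[s] for s in states)
    return {s: P_see[(o, s)] * b[s] / p_o for s in states}

# see alone: still uncertain.
print(see_update(b, "e")["s3"])          # 0.5

# move then see: certainty.
b_prime = {"s1": 0.0, "s2": 0.0,
           "s3": b["s1"] + b["s3"],      # s1 -> s3 (assumed)
           "s4": b["s2"] + b["s4"]}      # s2 -> s4 (assumed)
print(see_update(b_prime, "e")["s3"])    # 1.0
```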
Policies on Belief States
• Let B be the set of all belief states
• In a partially observable domain, a policy is a partial function from B into A
• S was finite, but B is infinite and continuous
  – A policy may be either finite or infinite
Modified Example

• Suppose we know the initial belief state is b
• Policy to tell if there's a container in l2:
  – π = {(b, move(r1,l1,l2)), (b', see)}

[Figure: belief-state table for a = move(r1,l1,l2), states s1–s4, columns b and b' = ba]
Solving POMDPs
• Information-state MDPs
  – Belief states of the POMDP become states in a new MDP
  – Continuous state space
  – Discretise
• Policy-tree algorithm
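The "discretise" step can be sketched for a two-state belief space. The uniform grid and the L1 nearest-neighbour snap are illustrative choices, not from the slides; a full information-state MDP would then run value iteration over the grid points (omitted here).

```python
# Sketch: approximate the continuous belief space B over two states by a
# uniform grid, and map any belief to its nearest grid point.

def belief_grid(n):
    # grid points (b(s1), b(s2)) with b(s1) in {0, 1/n, ..., 1}
    return [(i / n, 1 - i / n) for i in range(n + 1)]

def snap(b, grid):
    # nearest grid point in L1 distance (an assumed discretisation rule)
    return min(grid, key=lambda g: abs(g[0] - b[0]) + abs(g[1] - b[1]))

grid = belief_grid(4)
print(grid)                      # [(0.0, 1.0), (0.25, 0.75), ..., (1.0, 0.0)]
print(snap((0.3, 0.7), grid))    # (0.25, 0.75)
```

Finer grids approximate B better at the cost of a larger information-state MDP.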
Policy Trees
• Tree(a,T) – create a new policy tree with action a at the root and subtree T(z) attached for each observation z
• Vp – vector for value function for policy tree p with one component per state
• Act(p) – action at root of tree p
• Subtree(p,z) – subtree of p after obs z
• Stval(a,z,p) – vector for probability-weighted value of tree p after a,z
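The operations above suggest a small data structure. This sketch is an assumed encoding rather than anything specified on the slides; Vp and Stval, which require the transition and reward models, are omitted, and the example actions are hypothetical.

```python
# A minimal policy-tree structure matching Tree, Act, and Subtree above.
from dataclasses import dataclass, field

@dataclass
class PolicyTree:
    action: str                                   # Act(p): action at the root
    subtrees: dict = field(default_factory=dict)  # observation z -> subtree

def tree(a, T):
    # Tree(a, T): new tree with action a at the root, subtree T[z] per obs z.
    return PolicyTree(action=a, subtrees=dict(T))

def act(p):
    return p.action

def subtree(p, z):
    # Subtree(p, z): the subtree of p followed after observing z.
    return p.subtrees[z]

# Tiny hypothetical example: sense, then branch on the observation.
leaf_f = PolicyTree("pickup")
leaf_e = PolicyTree("move_back")
p = tree("see", {"f": leaf_f, "e": leaf_e})
print(act(subtree(p, "f")))   # pickup
```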
Monahan Enumeration Phase
• Generate all vectors:
  Number of generated vectors = |A| · M^|O|
  where M is the number of vectors from the previous step
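Each new policy tree picks one of the M previous vectors per observation and one of |A| actions, giving |A| · M^|O| candidates. A hypothetical sizing shows the blow-up:

```python
# Enumeration blow-up |A| * M**|O| for hypothetical sizes.
num_actions = 4   # |A|
num_obs = 2       # |O|
M = 10            # vectors kept from the previous step
print(num_actions * M ** num_obs)  # 400
```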
Monahan Reduction Phase

• All vectors can be kept:
  – Each time, maximize over all vectors
  – Lots of excess baggage
  – The number of vectors at the next step will be even larger
• An LP is used to trim away useless vectors
Remove Dominated Policy Trees

• For a vector to be useful, there must be at least one belief point at which it gives a larger value than every other vector
• Thus, we solve an LP for policy tree p against every other tree p' and check whether d > 0
• If the optimal d is 0, p is never strictly better and we remove it

  maximize d
  subject to:
    ∑s x[s] Vp(s) ≥ ∑s x[s] Vp'(s) + d   for every p' ≠ p
    ∑s x[s] = 1
    x[s] ≥ 0
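The exact test is the LP above. The sketch below replaces the LP with a brute-force search over a grid of belief points, an approximation chosen to stay dependency-free, which is enough to see pruning happen; the two-state example vectors are hypothetical.

```python
# Grid approximation of the dominance LP over a 2-state belief simplex:
# for vector v_p, estimate max over beliefs x of
# min over competitors v_q of sum_s x[s] * (v_p[s] - v_q[s]).
# A positive result means v_p wins at some belief point (keep it).

def approx_max_d(v_p, others, grid=101):
    best = float("-inf")
    for i in range(grid):
        x = [i / (grid - 1), 1 - i / (grid - 1)]   # sampled belief point
        d = min(sum(xs * (vp - vq) for xs, vp, vq in zip(x, v_p, v_q))
                for v_q in others)
        best = max(best, d)
    return best

# Hypothetical value vectors over 2 states:
v1 = [1.0, 0.0]
v2 = [0.0, 1.0]
v3 = [0.4, 0.4]   # dominated: never beats both v1 and v2 simultaneously

print(approx_max_d(v1, [v2, v3]) > 0)   # True  -> keep v1
print(approx_max_d(v3, [v1, v2]) > 0)   # False -> prune v3
```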
Witness Algorithm for Enumeration of Policy Trees

• Maintain a set U of useful policies
• If a new p is not dominated by U, then U is not complete:
  – Find a belief point x where p dominates U
  – Find the p' that dominates all other trees at x and is lexicographically better than all others
  – This p' is useful – add it to U
• How do we find p'?
Witness/Sondik’s Algorithm
• Create the set U_t of depth-t trees from U_{t-1} of depth t-1:

  V_t(x) = max_{p in U_t} ∑s x[s] Vp(s)
         = max_{p in U_t} ∑s x[s] (R(s,a) + ∑z stval(a, z, subtree(p,z))[s]),   where a = Act(p)
         = max_a (∑s x[s] R(s,a) + ∑z max_{p in U_{t-1}} ∑s x[s] stval(a, z, p)[s])
Homework
1. Read readings for next time:
Alex’s paper