![Page 1: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/1.jpg)
Projection Methods (Symbolic tools we have used to do…)
Ron Parr
Duke University
Joint work with:
Carlos Guestrin (Stanford)
Daphne Koller (Stanford)
![Page 2: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/2.jpg)
Overview
• Why?
  – MDPs need value functions
  – Value function approximation
  – “Good” approximate value functions
• How
  – Approximation architecture
  – Dot products in large state spaces
  – Expectation in large state spaces
  – Orthogonal projection
  – MAX in large state spaces
![Page 3: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/3.jpg)
Why You Need Value Functions
Given current configuration:
• Expected value of all widgets produced by factory
• Expected number of steps before failure
![Page 4: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/4.jpg)
DBN - MDPs
[Figure: DBN with state variables X, Y, Z at times t and t+1, plus an action node]
![Page 5: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/5.jpg)
Adding rewards
[Figure: DBN over X, Y, Z at times t and t+1, with reward nodes R1 and R2]
Rewards have small sets of parent variables too.
Total reward adds sub-rewards: R = R1 + R2
![Page 6: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/6.jpg)
Computing Values
$V = R + \gamma P V$
where $V$ is the value function and $P$ is the symbolic transition model (DBN).
Q: Does V have a convenient, compact form?
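As a point of reference, here is a minimal sketch of what solving $V = R + \gamma P V$ looks like when the state space is written out explicitly. The two-state chain, the function name `evaluate_policy`, and the sweep count are illustrative assumptions, not part of the slides. The table has one entry per state, which is exactly what becomes infeasible when the state is a joint assignment to many DBN variables.

```python
def evaluate_policy(P, R, gamma, sweeps=2000):
    """Iterate V <- R + gamma * P V over an explicit table of states."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
             for s in range(n)]
    return V


# Hypothetical two-state chain: state 0 earns reward 1 and may fall into
# the absorbing, zero-reward state 1.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]
V = evaluate_policy(P, R, gamma=0.9)
print(V)
```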
![Page 7: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/7.jpg)
Compact Models = Compact V?
[Figure: DBN over X, Y, Z (time t to t+1) with reward R = +1, followed by tree-structured representations over x, y, z shown for steps t+2 and t+3]
![Page 8: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/8.jpg)
Enter Value Function Approximation
• Not enough structure for exact, symbolic methods in many domains
• Our approach:
  – Combine symbolic methods with VFA
  – Define a restricted class of value functions
  – Find the “best” in that class
  – Bound error
![Page 9: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/9.jpg)
Linearly Decomposable Value Functions
Approximate high-dimensional functions with a combination of lower-dimensional functions
Motivation: Multi-attribute utility theory (Keeney & Raiffa)
Note: Overlapping is allowed!
![Page 10: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/10.jpg)
Decomposable Value Functions
Linear combination of functions:
$\tilde{V}(s) = \sum_i w_i h_i(s)$
• Each $h_i$ has a domain of a small set of variables
• Each $h_i$ is a feature of a complex system
  – status of a machine
  – inventory of a store
• Also: think of each $h_i$ as a basis function
![Page 11: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/11.jpg)
Matrix Form
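$\hat{V} = A w$
$A = \begin{pmatrix} h_1(s_1) & h_2(s_1) & \cdots \\ h_1(s_2) & h_2(s_2) & \cdots \\ \vdots & \vdots & \end{pmatrix}$ (one row per state, one column for each of the $k$ basis functions)
$\hat{V}$ assigns a value to every state.
Note for linear algebra fans: $\hat{V}$ is a linear function in the column space of $h_1 \ldots h_k$.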
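The matrix form can be illustrated on a toy problem. The sketch below assumes three binary state variables named x, y, z, three made-up basis functions, and arbitrary weights; none of these come from the slides. It builds the matrix of basis values explicitly, one row per state, and forms the product with the weight vector.

```python
from itertools import product

VARS = ("x", "y", "z")
# All 2^3 joint assignments, in the order (0,0,0), (0,0,1), ..., (1,1,1).
states = [dict(zip(VARS, bits)) for bits in product((0, 1), repeat=len(VARS))]

# Hypothetical restricted-domain basis functions. Their domains may overlap.
basis = [
    lambda s: 1.0,                       # constant basis function
    lambda s: float(s["x"]),             # depends only on x
    lambda s: float(s["y"] and s["z"]),  # depends only on y and z
]


def basis_matrix(states, basis):
    """Rows are states, columns are basis functions: A[s][i] = h_i(s)."""
    return [[h(s) for h in basis] for s in states]


def apply_weights(A, w):
    """Compute the vector A w, i.e. one approximate value per state."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]


A = basis_matrix(states, basis)
V_hat = apply_weights(A, [0.5, 2.0, 1.0])
print(len(A), "states,", len(A[0]), "basis functions")
```

The matrix has exponentially many rows in the number of variables, so it is only ever written out like this for illustration.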
![Page 12: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/12.jpg)
Defining a fixed point
Standard fixed-point equation: $V = R + \gamma P V$
Fixed point with approximation: $\hat{V} = A w = \Pi (R + \gamma P A w)$, where $\Pi$ is the projection operator.
We use orthogonal projection to force V to have the desired form.
![Page 13: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/13.jpg)
Solving for the fixed point
$w = (A^T A - \gamma A^T P A)^{-1} A^T R$
Theorem: w has a solution for all but finitely many discount factors [Koller & Parr 00]
Note: The existence of a solution is a weaker condition than the contraction property required for iterative, value iteration based methods.
LSTD [Bradtke & Barto 96]
$O(k^2 n)$
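A small dense sketch of the closed-form solve may help make the weight equation concrete. It assumes explicit matrices for a three-state chain and two made-up basis functions; the helper names (`matmul`, `solve`, `projected_fixed_point`) are illustrative. It computes the weights from the k×k linear system built from the basis matrix A, the transition matrix P, and the reward R, and will fail with a division by zero if that system is singular.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def projected_fixed_point(A, P, R, gamma):
    """Return w solving (A^T A - gamma A^T P A) w = A^T R."""
    At = transpose(A)
    AtA = matmul(At, A)
    AtPA = matmul(At, matmul(P, A))
    k = len(AtA)
    M = [[AtA[i][j] - gamma * AtPA[i][j] for j in range(k)] for i in range(k)]
    AtR = [sum(a * r for a, r in zip(row, R)) for row in At]
    return solve(M, AtR)


# Hypothetical 3-state chain with k = 2 basis functions (constant, state index).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
R = [0.0, 0.0, 1.0]
gamma = 0.9
w = projected_fixed_point(A, P, R, gamma)
print(w)
```

Here the products over states are computed by brute force. The point of the slides that follow is that the same k×k system can be assembled without enumerating states.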
![Page 14: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/14.jpg)
Key Operations
• Backprojection of a basis function: $P h_i$
• Dot product of two restricted domain basis functions: $h_i \cdot h_j$
If these two operations can be done efficiently:
Solution cost for k basis functions: k×k matrix inversion
$w = (A^T A - \gamma A^T P A)^{-1} A^T R$
where $A^T A - \gamma A^T P A$ is $k \times k$ and $A^T R$ is $k \times 1$.
![Page 15: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/15.jpg)
Backprojection = 1-step Expectation
Important: Single step of lookahead only, no more
[Figure: DBN over X, Y, Z with tree representations of a basis function before and after backprojection]
$h_1 = f(z)$, $P h_1 = f(y, z)$
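A minimal sketch of backprojection as a one-step expectation follows. It assumes binary variables, no edges within a time slice, and a made-up DBN in which the next value of z depends on the current y and z; the conditional probabilities and the `(scope, table)` representation are illustrative choices. The result is a function over the parents of the variables in the basis function's domain, and nothing larger.

```python
from itertools import product

# Hypothetical DBN: for each variable, its parents at time t and
# P(variable = 1 at time t+1 | parent values).
DBN = {
    "x": (("x",), {(0,): 0.2, (1,): 0.8}),
    "y": (("x", "y"), {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.9}),
    "z": (("y", "z"), {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}),
}


def backproject(h_scope, h_table, dbn):
    """Return (scope, table) for g(s) = E[h(s') | s]."""
    parents = sorted({p for v in h_scope for p in dbn[v][0]})
    table = {}
    for pa_vals in product((0, 1), repeat=len(parents)):
        pa = dict(zip(parents, pa_vals))
        total = 0.0
        for nxt in product((0, 1), repeat=len(h_scope)):
            prob = 1.0
            for v, val in zip(h_scope, nxt):
                v_parents, cpt = dbn[v]
                p_one = cpt[tuple(pa[q] for q in v_parents)]
                prob *= p_one if val == 1 else 1.0 - p_one
            total += prob * h_table[nxt]
        table[pa_vals] = total
    return tuple(parents), table


# A basis function of z alone: the indicator that z = 1.
h1_scope, h1_table = ("z",), {(0,): 0.0, (1,): 1.0}
g_scope, g_table = backproject(h1_scope, h1_table, DBN)
print(g_scope)  # the backprojection depends on y and z only
```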
![Page 16: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/16.jpg)
Efficient dot product
Need to compute: $\sum_s h_i(s) h_j(s)$
e.g.: $h_1 = f(x)$, $h_2 = f(y)$
[Figure: tree representations of $h_1$, $h_2$, and their product $h_1 \cdot h_2$]
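The dot product over all states can be sketched as follows, assuming binary variables and the same illustrative `(scope, table)` representation for restricted-domain functions. The product of the two functions depends only on the union of their domains, so it is enough to sum over assignments to that union and multiply by the number of assignments to the remaining variables.

```python
from itertools import product


def dot(scope1, table1, scope2, table2, n_vars):
    """Sum of h1(s) * h2(s) over all 2**n_vars states of binary variables."""
    union = sorted(set(scope1) | set(scope2))
    total = 0.0
    for vals in product((0, 1), repeat=len(union)):
        a = dict(zip(union, vals))
        v1 = table1[tuple(a[v] for v in scope1)]
        v2 = table2[tuple(a[v] for v in scope2)]
        total += v1 * v2
    # Every variable outside the union contributes a factor of 2.
    return total * 2 ** (n_vars - len(union))


# Made-up tables for h1 = f(x) and h2 = f(y) in a 3-variable state space.
h1 = (("x",), {(0,): 1.0, (1,): 2.0})
h2 = (("y",), {(0,): 3.0, (1,): 5.0})
result = dot(h1[0], h1[1], h2[0], h2[1], n_vars=3)
print(result)
```

The loop visits 4 joint assignments to (x, y) instead of all 8 states, and the gap widens exponentially as variables are added.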
![Page 17: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/17.jpg)
Symbolic Linear VFA
• Incurs only one step's worth of representation blowup
  – Solve directly for fixed point
  – Contrast with bisimulation/structured DP
• Exact
• Iterative – representation grows with each step
• No a priori quality guarantees
• A posteriori quality guarantees
![Page 18: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/18.jpg)
Error Bounds
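How are we doing?
$\|V^* - \hat{V}\|_\infty \le \dfrac{\|\hat{V} - (R + \gamma P \hat{V})\|_\infty}{1 - \gamma}$
Here $R + \gamma P \hat{V}$ is the one-step lookahead expected value, and the numerator is the max one-step error.
Claim:
• Equivalent to maximizing sum of restricted domain functions
• Use a cost network (Dechter 99)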
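As a worked check of the a posteriori bound, the sketch below computes the max one-step error divided by one minus the discount factor for an explicit two-state model. The model, the candidate value function, and the name `bellman_error_bound` are assumptions for illustration. The maximization is done by enumeration, which is the step a cost network would replace in a large factored state space.

```python
def bellman_error_bound(V_hat, P, R, gamma):
    """Max one-step error of V_hat divided by (1 - gamma)."""
    n = len(R)
    lookahead = [R[s] + gamma * sum(P[s][t] * V_hat[t] for t in range(n))
                 for s in range(n)]
    max_one_step_error = max(abs(v - l) for v, l in zip(V_hat, lookahead))
    return max_one_step_error / (1.0 - gamma)


# Hypothetical model: state 0 moves to state 1, which is absorbing with reward 1.
P = [[0.0, 1.0],
     [0.0, 1.0]]
R = [0.0, 1.0]
gamma = 0.5
exact = [1.0, 2.0]   # satisfies V = R + gamma * P V for this model
guess = [0.0, 0.0]   # a deliberately poor approximation

print(bellman_error_bound(exact, P, R, gamma))  # zero error for the exact V
print(bellman_error_bound(guess, P, R, gamma))
```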
![Page 19: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/19.jpg)
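Cost Networks
• Can use variable elimination to maximize over state space [Bertele & Brioschi ‘72]:
$\max_{A,B,C,D} \left[ f_1(A,B) + f_2(A,C) + f_3(C,D) + f_4(B,D) \right]$
$= \max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right] \right]$
$= \max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + g_1(B,C) \right]$
[Figure: cost network over nodes A, B, C, D with edges labeled $f_1$, $f_2$, $f_3$, $f_4$]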
As in Bayes nets, maximization is exponential in size of largest factor.
NP-hard in general
Here we need only 16, instead of 64 sum operations.
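The elimination can be sketched in a few lines for the four-factor example. The sketch assumes binary variables and made-up integer tables for the four factors; the function names are illustrative. Eliminating D first produces an intermediate factor over (B, C), matching the role of $g_1$ in the derivation, and the remaining variables are then maxed out one at a time.

```python
from itertools import product


def eliminate_max(factors, var):
    """Replace all factors mentioning `var` by one factor with `var` maxed out."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = sorted({v for s, _ in touching for v in s} - {var})
    table = {}
    for vals in product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, vals))
        best = float("-inf")
        for x in (0, 1):
            a[var] = x
            best = max(best, sum(t[tuple(a[v] for v in s)] for s, t in touching))
        table[vals] = best
    return rest + [(tuple(scope), table)]


def max_sum(factors, order):
    """Maximize a sum of factors by eliminating variables in the given order."""
    for var in order:
        factors = eliminate_max(factors, var)
    # Every remaining factor has an empty scope and a single entry.
    return sum(t[()] for _, t in factors)


pairs = list(product((0, 1), repeat=2))
# Made-up tables for f1(A,B), f2(A,C), f3(C,D), f4(B,D).
f1 = {(a, b): 2 * a - b for a, b in pairs}
f2 = {(a, c): a * c for a, c in pairs}
f3 = {(c, d): 3 * (c != d) for c, d in pairs}
f4 = {(b, d): b + d for b, d in pairs}
factors = [(("A", "B"), f1), (("A", "C"), f2), (("C", "D"), f3), (("B", "D"), f4)]

best = max_sum(factors, order=["D", "A", "B", "C"])
print(best)
```

The elimination order is a free choice. A poor order can create large intermediate factors, which is where the exponential dependence on the largest factor shows up.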
![Page 20: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/20.jpg)
Checkpoint
• Starting with:
  – Factored model (DBN)
  – Restricted value function space (restricted domain basis functions)
• Find fixed point in restricted space
• Bound solution quality a posteriori
• But: Fixed point may not have lowest max norm error
![Page 21: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/21.jpg)
Max-norm Error Minimization
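$A w \approx R + \gamma P A w$
$\arg\min_w \|A w - \gamma P A w - R\|_\infty$
• General max-norm error minimization: $\arg\min_w \|H w - b\|_\infty$ (here $H = A - \gamma P A$ and $b = R$)
• Symbolic operation over large state spaces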
![Page 22: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/22.jpg)
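General Max-norm Error Minimization
• Algorithm for finding: $w^* = \arg\min_w \|H w - b\|_\infty$
Solve by Linear Programming [Cheney ’82]:
Variables: $w_1, \ldots, w_k, \phi$;
Minimize: $\phi$;
Subject to: $\phi \ge \max_s \left[ \sum_{i=1}^{k} w_i h_i(s) - b(s) \right]$ and $\phi \ge \max_s \left[ b(s) - \sum_{i=1}^{k} w_i h_i(s) \right]$.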
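The sketch below shows how such a linear program could be assembled in its explicit form, with two constraints per state. It only builds and checks the constraints and does not solve the program; the output would be handed to an LP solver of your choice. The bound variable is called phi here, and the matrix, target, and function names are illustrative assumptions.

```python
def max_norm_lp(H, b):
    """Build: minimize c.x subject to rows.x <= rhs, with x = (w_1..w_k, phi)."""
    k = len(H[0])
    c = [0.0] * k + [1.0]                       # objective: minimize phi
    rows, rhs = [], []
    for h_s, b_s in zip(H, b):
        rows.append(list(h_s) + [-1.0])         #  h(s).w - phi <=  b(s)
        rhs.append(b_s)
        rows.append([-x for x in h_s] + [-1.0])  # -h(s).w - phi <= -b(s)
        rhs.append(-b_s)
    return c, rows, rhs


def is_feasible(rows, rhs, x, tol=1e-9):
    return all(sum(r_i * x_i for r_i, x_i in zip(r, x)) <= v + tol
               for r, v in zip(rows, rhs))


def max_norm_error(H, w, b):
    return max(abs(sum(h * wi for h, wi in zip(row, w)) - b_s)
               for row, b_s in zip(H, b))


# Made-up 3-state problem with k = 2 basis functions.
H = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [0.0, 1.0, 3.0]
w = [0.0, 1.0]

c, rows, rhs = max_norm_lp(H, b)
err = max_norm_error(H, w, b)
# For fixed w, the smallest feasible phi is exactly the max-norm error.
print(err, is_feasible(rows, rhs, w + [err]))
```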
![Page 23: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/23.jpg)
Symbolic max-norm minimization
• For fixed weights $w$, compute max-norm: $\|H w - b\|_\infty = \max_s \left| \sum_i w_i h_i(s) - b(s) \right|$
However, if basis and target are functions of only a few variables, we can do it efficiently!
Cost Networks can maximize over large state spaces efficiently when function is factored:
$\max_{X_1, \ldots, X_n} \sum_i f_i(C_i)$, where $C_i \subseteq \{X_1, \ldots, X_n\}$
![Page 24: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/24.jpg)
Representing the Constraints
• Explicit representation is exponential ($|S| = 2^n$): $\phi \ge \sum_{i=1}^{k} w_i h_i(s) - b(s), \quad \forall s \in S$
If basis and target are factored, can use Cost Networks to represent the constraints:
$\max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right] \right]$
$\max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + g_1^{(B,C)} \right]$, with $g_1^{(B,C)} \ge f_3(C,D) + f_4(B,D)$
![Page 25: Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)](https://reader033.vdocuments.us/reader033/viewer/2022051517/5697bfe91a28abf838cb677c/html5/thumbnails/25.jpg)
Conclusions
• Value function approximation w/error bounds
• Symbolic operations (no sampling!)
• Methods over large state spaces
  – Orthogonal Projection
  – Max-norm error minimization
• Tools over large state spaces
  – Expectation
  – Dot product
  – Max