Projection Methods (Symbolic tools we have used to do…)
Ron Parr
Duke University
Joint work with:
Carlos Guestrin (Stanford)
Daphne Koller (Stanford)
Overview
• Why?
  – MDPs need value functions
  – Value function approximation
  – "Good" approximate value functions
• How
  – Approximation architecture
  – Dot products in large state spaces
  – Expectation in large state spaces
  – Orthogonal projection
  – MAX in large state spaces
Why You Need Value Functions
Given current configuration:
• Expected value of all widgets produced by factory
• Expected number of steps before failure
DBN-MDPs
[Figure: DBN with state variables X, Y, Z at times t and t+1, plus an action node.]
Adding rewards
[Figure: DBN over X, Y, Z from t to t+1, with reward nodes R1 and R2 attached to small sets of parents.]
Rewards have small sets of parent variables too.
Total reward adds sub-rewards: $R = R_1 + R_2$
Computing Values
$V = \gamma PV + R$
where $V$ is the value function and $P$ comes from the symbolic transition model (DBN).
Q: Does V have a convenient, compact form?
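For scale, note that with an explicit state space the fixed point is just a linear solve; a minimal numpy sketch with a made-up three-state MDP (all numbers illustrative), which is exactly what becomes infeasible when $|S| = 2^n$:

```python
import numpy as np

# Toy explicit MDP: 3 states, made-up numbers for illustration.
P = np.array([[0.9, 0.1, 0.0],   # row s: distribution over next states
              [0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5]])
R = np.array([1.0, 0.0, 2.0])    # reward per state
gamma = 0.95

# V = gamma * P V + R  =>  (I - gamma * P) V = R
V = np.linalg.solve(np.eye(3) - gamma * P, R)
print(V)
```

With n binary state variables, P would have $2^n \times 2^n$ entries, so this direct solve is exactly what the symbolic methods below avoid.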
Compact Models = Compact V?
[Figure: DBN over X, Y, Z with reward R = +1; unrolling to t+1, t+2, t+3, the tree representation of the value function grows with each step of lookahead.]
Enter Value Function Approximation
• Not enough structure for exact, symbolic methods in many domains
• Our approach:
  – Combine symbolic methods with VFA
  – Define a restricted class of value functions
  – Find the "best" in that class
  – Bound error
Linearly Decomposable Value Functions
Approximate high-dimensional functions with a combination of lower-dimensional functions.
Motivation: Multi-attribute utility theory (Keeney & Raiffa)
Note: Overlapping is allowed!
Decomposable Value Functions
• Each $h_i$ has a domain of a small set of variables
• Each $h_i$ is a feature of a complex system
  – status of a machine
  – inventory of a store
• Also: think of each $h_i$ as a basis function

Linear combination of functions: $\tilde{V}(s) = \sum_i w_i h_i(s)$
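A concrete (hypothetical) instance in Python: each basis function reads only its own small set of state variables:

```python
# Hypothetical restricted-domain basis functions over binary state
# variables x, y, z; each one looks at only a small subset.
def h1(s): return 1.0 if s['x'] else 0.0        # domain: {x}
def h2(s): return 2.0 * s['y'] + s['z']         # domain: {y, z}

def v_tilde(s, w):
    """Approximate value: linear combination of basis functions."""
    return w[0] * h1(s) + w[1] * h2(s)

print(v_tilde({'x': 1, 'y': 0, 'z': 1}, [0.5, 1.5]))   # 0.5*1 + 1.5*1 = 2.0
```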
Matrix Form
Note for linear algebra fans: $\hat{V}$ is a linear function in the column space of $h_1, \ldots, h_k$.

$\hat{V} = Aw$ assigns a value to every state, where $A$ has one row per state and one column per each of the $k$ basis functions:

$A = \begin{bmatrix} h_1(s_1) & h_2(s_1) & \cdots \\ h_1(s_2) & h_2(s_2) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}$
Defining a fixed point
Standard fixed-point equation: $V = \gamma PV + R$

Fixed point with approximation: $\hat{V} = Aw = \Pi(\gamma PAw + R)$, where $\Pi$ is the projection operator.

We use orthogonal projection to force $\hat{V}$ to have the desired form.
Solving for the fixed point
$w = (A^T A - \gamma A^T PA)^{-1} A^T R$
Theorem: w has a solution for all but finitely many discount factors [Koller & Parr 00]
Note: The existence of a solution is a weaker condition than the contraction property required for iterative, value-iteration-based methods.
LSTD [Bradtke & Barto 96]: $O(k^2 n)$
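On an explicit toy MDP, the weights come straight from the formula above; a numpy sketch (the features and numbers are made up):

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5]])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.95

# A: one row per state, one column per basis function (toy features).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

# w = (A^T A - gamma * A^T P A)^{-1} A^T R
w = np.linalg.solve(A.T @ A - gamma * A.T @ P @ A, A.T @ R)
V_hat = A @ w   # approximate value for every state
print(w, V_hat)
```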
Key Operations
• Backprojection of a basis function: $Ph_i$
• Dot product of two restricted-domain basis functions: $h_i \cdot h_j$

If these two operations can be done efficiently:
Solution cost for $k$ basis functions:
$w = (A^T A - \gamma A^T PA)^{-1} A^T R$
The matrix inverted is $k \times k$ and $A^T R$ is $k \times 1$, so the dominant cost is a $k \times k$ matrix inversion.
Backprojection = 1-step Expectation
Important: a single step of lookahead only, no more.
[Figure: DBN over X, Y, Z. With $h_1 = f(z)$, the backprojection is $Ph_1 = f(y, z)$: a function of the parents of $z'$ only.]
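A minimal sketch of one backprojection step, assuming a hypothetical CPT in which $z'$ has parents $\{y, z\}$ (probabilities invented):

```python
import itertools

# Hypothetical DBN fragment: z' depends only on (y, z).
# cpt_z[(y, z)] = P(z' = 1 | y, z); made-up numbers.
cpt_z = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.3, (1, 1): 0.9}

def h1(z):              # basis function with domain {z}
    return 5.0 if z else 1.0

def backproject_h1(y, z):
    """(P h1)(y, z) = E[h1(z') | y, z]: one-step expectation via the DBN."""
    p = cpt_z[(y, z)]
    return p * h1(1) + (1 - p) * h1(0)

for y, z in itertools.product([0, 1], repeat=2):
    print(y, z, backproject_h1(y, z))   # Ph1 is a function of (y, z) alone
```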
Efficient dot product
Need to compute: $\sum_s h_i(s)\, h_j(s)$
e.g.: $h_1 = f(x)$, $h_2 = f(y)$
[Figure: tree representations of $h_1$, $h_2$, and the product $h_1 h_2$ over variables x, y, z; the product depends only on the union of the two domains.]
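Since the product depends only on the union of the two domains, the sum over all $2^n$ states collapses to a tiny enumeration; a hypothetical sketch:

```python
import itertools

n = 40                                   # binary state variables; 2^40 states
def h1(x): return 3.0 if x else 1.0      # domain {x}
def h2(y): return 2.0 if y else 0.0      # domain {y}

# sum_s h1(s) h2(s): enumerate only (x, y); every other variable is free,
# so each (x, y) setting is shared by 2^(n-2) states.
free = 2 ** (n - 2)
dot = sum(h1(x) * h2(y) for x, y in itertools.product([0, 1], repeat=2)) * free
print(dot)   # exact, despite the 2^40-state space
```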
Symbolic Linear VFA
• Incurs only one step's worth of representation blowup
  – Solve directly for fixed point
  – Contrast with bisimulation/structured DP: exact, but iterative, and the representation grows with each step
• No a priori quality guarantees, but a posteriori quality guarantees
Error Bounds
How are we doing?

$\|V^* - \hat{V}\|_\infty \le \frac{\|\hat{V} - (\gamma P\hat{V} + R)\|_\infty}{1 - \gamma}$

($\gamma P\hat{V} + R$ is the one-step lookahead expected value; the numerator is the max one-step error.)

Claim:
• Equivalent to maximizing a sum of restricted-domain functions
• Use a cost network (Dechter 99)
• Can use variable elimination to maximize over state space: [Bertele & Brioschi ‘72]
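The bound is easy to evaluate numerically once $\hat{V}$ is in hand; a toy numpy check (the MDP and $\hat{V}$ below are made up, and the assert verifies the bound against the exact solution):

```python
import numpy as np

P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.5, 0.0, 0.5]])
R = np.array([1.0, 0.0, 2.0])
gamma = 0.95
V_hat = np.array([5.0, 6.0, 7.0])        # any approximate value function

V_star = np.linalg.solve(np.eye(3) - gamma * P, R)   # exact, for comparison
bellman_err = np.max(np.abs(V_hat - (gamma * P @ V_hat + R)))
bound = bellman_err / (1 - gamma)
assert np.max(np.abs(V_star - V_hat)) <= bound + 1e-9
print(bellman_err, bound)
```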
Cost Networks
$\max_{A,B,C,D} f_1(A,B) + f_2(A,C) + f_3(C,D) + f_4(B,D)$
$= \max_{A,B,C} f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right]$
$= \max_{A,B,C} f_1(A,B) + f_2(A,C) + g_1(B,C)$

[Figure: cost network over variables A, B, C, D with factors $f_1(A,B)$, $f_2(A,C)$, $f_3(C,D)$, $f_4(B,D)$.]
As in Bayes nets, maximization is exponential in the size of the largest factor.
NP-hard in general.
Here we need only 16 instead of 64 sum operations (see the sketch below).
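A minimal Python sketch of this elimination, with made-up tables for $f_1 \ldots f_4$ over binary variables; the brute-force check confirms the factored maximization:

```python
import itertools

# Made-up binary factors for the cost network above.
f1 = {(a, b): a + 2*b for a, b in itertools.product([0, 1], repeat=2)}
f2 = {(a, c): 3*a - c for a, c in itertools.product([0, 1], repeat=2)}
f3 = {(c, d): c * d   for c, d in itertools.product([0, 1], repeat=2)}
f4 = {(b, d): 2*b - d for b, d in itertools.product([0, 1], repeat=2)}

# Eliminate D: g1(B, C) = max_D [f3(C, D) + f4(B, D)]
g1 = {(b, c): max(f3[(c, d)] + f4[(b, d)] for d in (0, 1))
      for b, c in itertools.product([0, 1], repeat=2)}

# Remaining maximization ranges over A, B, C only.
best = max(f1[(a, b)] + f2[(a, c)] + g1[(b, c)]
           for a, b, c in itertools.product([0, 1], repeat=3))

# Brute force over all 16 joint assignments agrees.
brute = max(f1[(a, b)] + f2[(a, c)] + f3[(c, d)] + f4[(b, d)]
            for a, b, c, d in itertools.product([0, 1], repeat=4))
assert best == brute
print(best)
```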
Checkpoint
• Starting with:
  – Factored model (DBN)
  – Restricted value function space (restricted-domain basis functions)
• Find fixed point in restricted space
• Bound solution quality a posteriori
• But: fixed point may not have lowest max-norm error
Max-norm Error Minimization
• General max-norm error minimization: $\arg\min_w \|Hw - b\|_\infty$
• Symbolic operation over large state spaces: $\arg\min_w \|Aw - \gamma PAw - R\|_\infty$, i.e., $H = A - \gamma PA$ and $b = R$
General Max-norm Error Minimization

• Algorithm for finding $w^* = \arg\min_w \|Hw - b\|_\infty$
• Solve by linear programming [Cheney '82] (sketch below):

Variables: $w_1, \ldots, w_k, \phi$
Minimize: $\phi$
Subject to: $\phi \ge \max_s \left( \sum_{i=1}^k w_i h_i(s) - b(s) \right)$ and $\phi \ge \max_s \left( b(s) - \sum_{i=1}^k w_i h_i(s) \right)$
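For a small explicit problem this LP can be handed directly to a solver; a sketch using scipy.optimize.linprog with one constraint pair per state ($H$ and $b$ are invented toy values):

```python
import numpy as np
from scipy.optimize import linprog

H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])        # toy H = A - gamma*P*A, explicit here
b = np.array([1.0, 0.0, 2.0])

k = H.shape[1]
# Decision variables: (w_1..w_k, phi); minimize phi.
c = np.zeros(k + 1); c[-1] = 1.0
# Hw - b <= phi  and  b - Hw <= phi, one pair of rows per state.
A_ub = np.vstack([np.hstack([H, -np.ones((len(b), 1))]),
                  np.hstack([-H, -np.ones((len(b), 1))])])
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (k + 1))
w, phi = res.x[:k], res.x[-1]
print(w, phi)   # phi = min_w ||Hw - b||_inf
```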
Symbolic max-norm minimization
• For fixed weights $w$, compute the max-norm:
$\|Hw - b\|_\infty = \max_s \left| \sum_i w_i h_i(s) - b(s) \right|$
However, if basis and target are functions of only a few variables, we can do it efficiently!
Cost networks can maximize over large state spaces efficiently when the function is factored:
$\max_{X_1, \ldots, X_n} \sum_i f_i(C_i), \quad \text{where } C_i \subseteq \{X_1, \ldots, X_n\}$
Representing the Constraints
• Explicit representation is exponential ($|S| = 2^n$):
$\phi \ge \sum_{i=1}^k w_i h_i(s) - b(s), \quad \forall s \in S$
If basis and target are factored, can use Cost Networks to represent the constraints:
$\max_{A,B,C,D} f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right]$

Replace the inner max with new LP variables $g_1(B,C)$ and constraints:
$g_1(B,C) \ge f_3(C,D) + f_4(B,D)$
$\max_{A,B,C} f_1(A,B) + f_2(A,C) + g_1(B,C)$
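A sketch of why the compact constraint set suffices, reusing the made-up factors from the cost-network example: the tightest feasible $g_1$ and $\phi$ recover the explicit maximum, while the constraint count scales with factor sizes rather than $|S|$:

```python
import itertools

# Reuse the made-up binary factors from the cost-network example.
f1 = {(a, b): a + 2*b for a, b in itertools.product([0, 1], repeat=2)}
f2 = {(a, c): 3*a - c for a, c in itertools.product([0, 1], repeat=2)}
f3 = {(c, d): c * d   for c, d in itertools.product([0, 1], repeat=2)}
f4 = {(b, d): 2*b - d for b, d in itertools.product([0, 1], repeat=2)}

# Compact constraints: g1(B,C) >= f3(C,D) + f4(B,D) for each (B,C,D),
# then phi >= f1(A,B) + f2(A,C) + g1(B,C) for each (A,B,C).
# The tightest feasible assignment recovers the explicit maximum:
g1 = {(b, c): max(f3[(c, d)] + f4[(b, d)] for d in (0, 1))
      for b, c in itertools.product([0, 1], repeat=2)}
phi = max(f1[(a, b)] + f2[(a, c)] + g1[(b, c)]
          for a, b, c in itertools.product([0, 1], repeat=3))

explicit = max(f1[(a, b)] + f2[(a, c)] + f3[(c, d)] + f4[(b, d)]
               for a, b, c, d in itertools.product([0, 1], repeat=4))
assert phi == explicit
print(phi, "constraints:", 8 + 8, "vs one per state:", 2 ** 4)
```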
Conclusions
• Value function approximation with error bounds
• Symbolic operations (no sampling!)
• Methods over large state spaces
  – Orthogonal projection
  – Max-norm error minimization
• Tools over large state spaces
  – Expectation
  – Dot product
  – Max