
Projection Methods (Symbolic tools we have used to do…)

Ron Parr

Duke University

Joint work with:

Carlos Guestrin (Stanford)

Daphne Koller (Stanford)

Overview

• Why?
  – MDPs need value functions
  – Value function approximation
  – “Good” approximate value functions

• How
  – Approximation architecture
  – Dot products in large state spaces
  – Expectation in large state spaces
  – Orthogonal projection
  – MAX in large state spaces

Why You Need Value Functions

Given current configuration:
• Expected value of all widgets produced by factory
• Expected number of steps before failure

DBN - MDPs

[Figure: DBN with state variables X, Y, Z at times t and t+1, and an action]

Adding rewards

[Figure: DBN over X, Y, Z at times t and t+1, with reward nodes R1 and R2]

Rewards have small sets of parent variables too

Total reward adds sub-rewards: R = R1 + R2

Computing Values

$V = R + \gamma P V$

where $V$ is the value function and $P$ is the symbolic transition model (DBN)

Q: Does V have a convenient, compact form?

Compact Models = Compact V?

[Figure: DBN over X, Y, Z (times t, t+1) with reward R = +1, and tree representations over x, y, z that are repeated for steps t+2 and t+3]

Enter Value Function Approximation

• Not enough structure for exact, symbolic methods in many domains

• Our approach:
  – Combine symbolic methods with VFA
  – Define a restricted class of value functions
  – Find the “best” in that class
  – Bound error

Linearly Decomposable Value Functions

Approximate high-dimensional functions with a combination of lower-dimensional functions

Motivation: Multi-attribute utility theory (Keeney & Raiffa)

Note: Overlapping is allowed!

Decomposable Value Functions

Linear combination of functions: $\tilde{V}(s) = \sum_i w_i h_i(s)$

• Each $h_i$ has a domain of a small set of variables

• Each $h_i$ is a feature of a complex system
  – status of a machine
  – inventory of a store

• Also: think of each $h_i$ as a basis function

Matrix Form

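$\hat{V} = Aw$

$A = \begin{pmatrix} h_1(s_1) & h_2(s_1) & \cdots \\ h_1(s_2) & h_2(s_2) & \cdots \\ \vdots & \vdots & \end{pmatrix}$ (one row per state, one column for each of the k basis functions)

$\hat{V}$ assigns a value to every state

Note for linear algebra fans: $\hat{V}$ is a linear function in the column space of $h_1 \ldots h_k$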
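The matrix form can be made concrete with a tiny sketch. The three binary variables, the three basis functions, and the weights below are hypothetical, chosen only to show how the rows and columns of A line up with states and basis functions.

```python
from itertools import product

# Hypothetical example: three binary state variables (x, y, z), so 8 states.
states = list(product([0, 1], repeat=3))

# Illustrative basis functions, each looking at only a few variables.
def h0(s):
    return 1.0                      # constant function

def h1(s):
    return float(s[0])              # domain {x}

def h2(s):
    return float(s[1] and s[2])     # domain {y, z}

basis = [h0, h1, h2]

# A has one row per state and one column per basis function.
A = [[h(s) for h in basis] for s in states]

def approximate_value(w):
    """Return V_hat = A w as a list with one entry per state."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]

w = [0.5, 2.0, -1.0]                # arbitrary illustrative weights
V_hat = approximate_value(w)
```

The explicit A is only practical here because the state space is tiny; the remaining slides are about avoiding this enumeration.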

Defining a fixed point

Standard fixed point equation: $V = R + \gamma P V$

Fixed point with approximation: $\hat{V} = Aw = \Pi(R + \gamma P A w)$, where $\Pi$ is the projection operator

We use orthogonal projection to force V to have the desired form.
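A short derivation of what this fixed point implies for the weights may help. It assumes, beyond what the slide states, that the projection is the ordinary unweighted least-squares projection onto the column space of A and that A has full column rank; $\Pi$ denotes that projection.

```latex
\begin{align*}
\Pi &= A (A^T A)^{-1} A^T \\
Aw &= \Pi (R + \gamma P A w) \\
A^T A w &= A^T R + \gamma A^T P A w
  && \text{(multiply by } A^T \text{ and use } A^T \Pi = A^T\text{)} \\
(A^T A - \gamma A^T P A)\, w &= A^T R
\end{align*}
```

Whenever the k×k matrix on the left is invertible, the last line determines w.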

Solving for the fixed point

$w = (A^T A - \gamma A^T P A)^{-1} A^T R$

Theorem: w has a solution for all but finitely many discount factors [Koller & Parr 00]

Note: The existence of a solution is a weaker condition than the contraction property required for iterative, value iteration based methods.

LSTD [Bradtke & Barto 96]

O(k2n)
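As a sanity check on the closed form, here is a minimal sketch that solves for the weights on an explicitly enumerated model. The 4-state chain, the two basis functions, and the discount factor are hypothetical; the helper names are illustrative, and the explicit matrices are exactly what the symbolic methods are meant to avoid on large state spaces.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def projected_fixed_point(A, P, R, gamma):
    """Weights w with (A^T A - gamma A^T P A) w = A^T R."""
    At = transpose(A)
    AtA = matmul(At, A)
    AtPA = matmul(At, matmul(P, A))
    k = len(AtA)
    M = [[AtA[i][j] - gamma * AtPA[i][j] for j in range(k)] for i in range(k)]
    AtR = [sum(a * r for a, r in zip(row, R)) for row in At]
    return solve(M, AtR)

# Hypothetical 4-state chain under a fixed policy.
P = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.0, 0.0, 0.5]]
R = [0.0, 0.0, 0.0, 1.0]
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # constant and state index
gamma = 0.9

w = projected_fixed_point(A, P, R, gamma)
V_hat = [sum(a * wi for a, wi in zip(row, w)) for row in A]
```

The only system solved is k×k (here 2×2), which is why the cost depends on the number of basis functions once the matrix entries are available.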

Key Operations

• Backprojection of a basis function: $P h_i$

• Dot product of two restricted domain basis functions: $h_i \cdot h_j$

If these two operations can be done efficiently:

$w = (A^T A - \gamma A^T P A)^{-1} A^T R$, where $A^T A$ and $A^T P A$ are k×k and $A^T R$ is k×1

Solution cost for k basis functions: k×k matrix inversion

Backprojection = 1-step Expectation

Important: Single step of lookahead only - no more

[Figure: DBN over X, Y, Z with tree representations over x, y, z before and after backprojection]

$h_1 = f(z)$, $P h_1 = f(y, z)$
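The following sketch illustrates why backprojecting a function of z yields a function of only the parents of Z. The DBN structure (Z at t+1 depends on Y and Z at t) and all probabilities are hypothetical values picked for illustration.

```python
from itertools import product

# Hypothetical DBN over binary X, Y, Z. Each function returns the probability
# that the variable equals 1 at time t+1, given its parents at time t.
def p_x(x):          # X' depends on X
    return 0.9 if x else 0.2

def p_y(x, y):       # Y' depends on X, Y
    return 0.7 if (x and y) else 0.1

def p_z(y, z):       # Z' depends on Y, Z
    return 0.8 if (y or z) else 0.3

def h1(z):
    """Basis function with domain {Z}."""
    return 5.0 if z else 1.0

def bern(p, value):
    """Probability of a binary outcome when P(outcome = 1) = p."""
    return p if value else 1.0 - p

def backproject_full(x, y, z):
    """(P h1)(x, y, z) by summing over every next state."""
    total = 0.0
    for x2, y2, z2 in product([0, 1], repeat=3):
        prob = bern(p_x(x), x2) * bern(p_y(x, y), y2) * bern(p_z(y, z), z2)
        total += prob * h1(z2)
    return total

def backproject_factored(y, z):
    """Same quantity using only the CPT of Z': a function of (y, z) alone."""
    return sum(bern(p_z(y, z), z2) * h1(z2) for z2 in (0, 1))
```

The full version touches every next state, while the factored version sums over only the variables in the domain of h1; both agree because the other next-step variables sum out to 1.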

Efficient dot product

Need to compute: $\sum_s h_i(s)\, h_j(s)$

e.g.: h1 = f(x), h2 = f(y)

[Figure: tree representations over x, y, z of $h_1$, $h_2$, and their product $h_1 \cdot h_2$]
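A minimal sketch of the dot product for restricted-domain functions, assuming binary variables identified by index. The function names and the example functions are hypothetical. The idea is to enumerate only the union of the two domains and multiply by the number of assignments to the remaining variables.

```python
from itertools import product

def factored_dot(scope_i, f_i, scope_j, f_j, n_vars):
    """Sum of h_i(s) * h_j(s) over all 2**n_vars binary states,
    enumerating only the variables in the union of the two scopes."""
    union = sorted(set(scope_i) | set(scope_j))
    total = 0.0
    for vals in product([0, 1], repeat=len(union)):
        assign = dict(zip(union, vals))
        total += (f_i(*[assign[v] for v in scope_i]) *
                  f_j(*[assign[v] for v in scope_j]))
    # Every other variable is free and contributes a factor of 2.
    return total * 2 ** (n_vars - len(union))

def brute_force_dot(scope_i, f_i, scope_j, f_j, n_vars):
    """Reference implementation that enumerates the whole state space."""
    total = 0.0
    for s in product([0, 1], repeat=n_vars):
        total += (f_i(*[s[v] for v in scope_i]) *
                  f_j(*[s[v] for v in scope_j]))
    return total

# Hypothetical example in the spirit of h1 = f(x), h2 = f(y).
f_x = lambda x: 1.0 + x
f_y = lambda y: 2.0 if y else 3.0
n = 10
fast = factored_dot((0,), f_x, (1,), f_y, n)
slow = brute_force_dot((0,), f_x, (1,), f_y, n)
```

Here the factored version evaluates 4 joint assignments instead of 1024.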

Symbolic Linear VFA

• Incurs only 1 step's worth of representation blowup
  – Solve directly for fixed point
  – Contrast with bisimulation/structured DP
    • Exact
    • Iterative – representation grows with each step

• No a priori quality guarantees

• A posteriori quality guarantees

Error Bounds

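How are we doing?:

$\|V^* - \hat{V}\|_\infty \le \dfrac{\|\hat{V} - (R + \gamma P \hat{V})\|_\infty}{1 - \gamma}$

Here $R + \gamma P \hat{V}$ is the one-step lookahead expected value, and the numerator is the max one-step error.

Claim:
• Equivalent to maximizing sum of restricted domain functions
• Use a cost network (Dechter 99)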

• Can use variable elimination to maximize over state space: [Bertele & Brioschi ‘72]
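The a posteriori bound can be checked numerically on a small explicit model. This is a sketch under stated assumptions: a hypothetical 3-state chain with a fixed transition model (no maximization over actions), an arbitrary candidate value function, and the standard bound that divides the max one-step error by 1 - γ.

```python
def one_step_lookahead(P, R, gamma, V):
    """Return R + gamma * P V as a list."""
    return [r + gamma * sum(p * v for p, v in zip(row, V))
            for row, r in zip(P, R)]

def max_norm_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def error_bound(P, R, gamma, V_hat):
    """Max one-step error of V_hat divided by (1 - gamma)."""
    lookahead = one_step_lookahead(P, R, gamma, V_hat)
    return max_norm_diff(V_hat, lookahead) / (1.0 - gamma)

# Hypothetical model and candidate value function.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
R = [0.0, 0.0, 1.0]
gamma = 0.9
V_hat = [3.0, 4.0, 5.0]

# Reference value function from many exact backups (feasible only because
# the state space is tiny).
V_star = [0.0, 0.0, 0.0]
for _ in range(2000):
    V_star = one_step_lookahead(P, R, gamma, V_star)

true_error = max_norm_diff(V_star, V_hat)
bound = error_bound(P, R, gamma, V_hat)
```

Computing the bound needs only V_hat and one backup, not V_star; on large state spaces the maximization inside `max_norm_diff` is the step that calls for a cost network.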

Cost Networks

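$\max_{A,B,C,D} \left[ f_1(A,B) + f_2(A,C) + f_3(C,D) + f_4(B,D) \right]$

$= \max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right] \right]$

$= \max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + g_1(B,C) \right]$

[Figure: cost network over nodes A, B, C, D with edges labeled f1, f2, f3, f4]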

As in Bayes nets, maximization is exponential in size of largest factor.

NP-hard in general

Here we need only 16, instead of 64 sum operations.
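The elimination of D can be sketched directly. The table values below are hypothetical and all variables are assumed binary; the structure (f1 over A,B; f2 over A,C; f3 over C,D; f4 over B,D) follows the example on this slide.

```python
from itertools import product

# Hypothetical tables over binary variables.
f1 = {(0, 0): 1, (0, 1): 4, (1, 0): 2, (1, 1): 0}   # f1(A, B)
f2 = {(0, 0): 3, (0, 1): 1, (1, 0): 0, (1, 1): 5}   # f2(A, C)
f3 = {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 3}   # f3(C, D)
f4 = {(0, 0): 0, (0, 1): 2, (1, 0): 4, (1, 1): 1}   # f4(B, D)

# Step 1: eliminate D, producing a new function g1 over (B, C).
g1 = {(b, c): max(f3[(c, d)] + f4[(b, d)] for d in (0, 1))
      for b, c in product([0, 1], repeat=2)}

# Step 2: maximize the remaining sum over A, B, C.
eliminated_max = max(f1[(a, b)] + f2[(a, c)] + g1[(b, c)]
                     for a, b, c in product([0, 1], repeat=3))

# Reference: brute-force maximization over all four variables.
brute_max = max(f1[(a, b)] + f2[(a, c)] + f3[(c, d)] + f4[(b, d)]
                for a, b, c, d in product([0, 1], repeat=4))
```

Both computations return the same maximum; the elimination version never builds a table over all four variables at once.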

Checkpoint

• Starting with:
  – Factored model (DBN)
  – Restricted value function space (restricted domain basis functions)

• Find fixed point in restricted space

• Bound solution quality a posteriori

• But: Fixed point may not have lowest max norm error

Max-norm Error Minimization

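$Aw \approx R + \gamma P A w$

$w = \arg\min_w \|(A - \gamma P A)\, w - R\|_\infty$, which has the form $w = \arg\min_w \|Hw - b\|_\infty$

• General max-norm error minimization

• Symbolic operation over large state spaces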

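General Max-norm Error Minimization

• Algorithm for finding: $w^* = \arg\min_w \|Hw - b\|_\infty$

Solve by Linear Programming: [Cheney ’82]

Variables: $w_1, \ldots, w_k, \phi$;
Minimize: $\phi$;
Subject to: $\phi \ge \max_s \left[ \sum_{i=1}^{k} w_i h_i(s) - b(s) \right]$ and
$\phi \ge \max_s \left[ b(s) - \sum_{i=1}^{k} w_i h_i(s) \right]$.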

Symbolic max-norm minimization

• For fixed weights w, compute max-norm:

$\|Hw - b\|_\infty = \max_s \left| \sum_i w_i h_i(s) - b(s) \right|$

However, if basis and target are functions of only a few variables, we can do it efficiently!

Cost Networks can maximize over large state spaces efficiently when function is factored:

$\max_{X_1, \ldots, X_n} \sum_i f_i(C_i)$, where $C_i \subseteq \{X_1, \ldots, X_n\}$
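For fixed weights, the max-norm can be computed with two cost-network maximizations, one for each sign of the error. The sketch below assumes binary variables and represents each factor as a hypothetical `(scope, table)` pair; the function names, the elimination order, and the example tables are illustrative, not taken from the slides.

```python
from itertools import product

def max_sum(factors, order):
    """Maximize a sum of factors over binary variables by variable elimination.
    Each factor is (scope, table): a tuple of variable names and a dict that
    maps value tuples to numbers. `order` must list every variable used."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {var}))
        new_table = {}
        for vals in product([0, 1], repeat=len(new_scope)):
            assign = dict(zip(new_scope, vals))
            best = float("-inf")
            for x in (0, 1):
                assign[var] = x
                total = sum(t[tuple(assign[v] for v in scope)]
                            for scope, t in touching)
                best = max(best, total)
            new_table[vals] = best
        factors = rest + [(new_scope, new_table)]
    # Every remaining factor now has an empty scope.
    return sum(t[()] for _, t in factors)

def scaled(c, factor):
    scope, table = factor
    return scope, {k: c * v for k, v in table.items()}

def max_norm_error(weights, basis, target, order):
    """max_s |sum_i w_i h_i(s) - b(s)| for factored h_i and b."""
    pos = [scaled(w, h) for w, h in zip(weights, basis)] + [scaled(-1.0, target)]
    neg = [scaled(-1.0, f) for f in pos]
    return max(max_sum(pos, order), max_sum(neg, order))

# Hypothetical example over binary x, y, z.
variables = ["x", "y", "z"]
h1 = (("x",), {(0,): 0.0, (1,): 1.0})
h2 = (("y", "z"), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0})
b = (("x", "y"), {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 4.0})
w = [2.0, 1.0]

err = max_norm_error(w, [h1, h2], b, variables)
```

The cost of each elimination step is exponential only in the size of the intermediate scope, not in the total number of variables.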

Representing the Constraints

• Explicit representation is exponential ($|S| = 2^n$):

$\phi \ge \sum_{i=1}^{k} w_i h_i(s) - b(s), \quad \forall s \in S$ (with $\phi$ the max-norm error being minimized)

If basis and target are factored, can use Cost Networks to represent the constraints:

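$\max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + \max_D \left[ f_3(C,D) + f_4(B,D) \right] \right]$

becomes

$\max_{A,B,C} \left[ f_1(A,B) + f_2(A,C) + g_1^{(B,C)} \right]$ with $g_1^{(B,C)} \ge f_3(C,D) + f_4(B,D)$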

Conclusions

• Value function approximation w/error bounds

• Symbolic operations (no sampling!)

• Methods over large state spaces
  – Orthogonal Projection
  – Max-norm error minimization

• Tools over large state spaces
  – Expectation
  – Dot product
  – Max
