Projection Methods (Symbolic tools we have used to do…) Ron Parr Duke University Joint work with: Carlos Guestrin (Stanford) Daphne Koller (Stanford)


Projection Methods (Symbolic tools we have used to do…)

Ron Parr

Duke University

Joint work with:

Carlos Guestrin (Stanford)

Daphne Koller (Stanford)


Overview

• Why?
  – MDPs need value functions
  – Value function approximation
  – “Good” approximate value functions

• How
  – Approximation architecture
  – Dot products in large state spaces
  – Expectation in large state spaces
  – Orthogonal projection
  – MAX in large state spaces


Why You Need Value Functions

Given current configuration:
• Expected value of all widgets produced by factory
• Expected number of steps before failure


DBN - MDPs

[Figure: two-slice DBN. State variables X, Y, Z at time t and time t+1, plus an action node.]


Adding rewards

[Figure: two-slice DBN over X, Y, Z (times t and t+1) with reward nodes R1 and R2.]

Rewards have small sets of parent variables too.

Total reward adds sub-rewards: $R = R_1 + R_2$


Computing Values

$V = \gamma P V + R$

where $V$ is the value function and $P$ is the symbolic transition model (DBN).

Q: Does V have a convenient, compact form?
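For a small, explicitly enumerated model, the fixed point equation on this slide can be solved by repeated backups. The sketch below is only an illustration: the two-state chain, its numbers, and the function name `evaluate` are assumptions, not part of the talk. It also shows why explicit enumeration does not scale, since the vectors have one entry per state.

```python
def evaluate(P, R, gamma, tol=1e-12, max_iters=100_000):
    """Iterate V <- R + gamma * P V until the update is smaller than tol."""
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iters):
        V_new = [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
                 for s in range(n)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V

# Hypothetical two-state chain: state 0 = "working", state 1 = "failed".
P = [[0.9, 0.1],
     [0.0, 1.0]]
R = [1.0, 0.0]   # one widget per step while working
gamma = 0.9

V = evaluate(P, R, gamma)
```

With n binary state variables the explicit vectors have 2^n entries, which is the motivation for the compact representations discussed in the following slides.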


Compact Models = Compact V?

[Figure: DBN over X, Y, Z at times t, t+1, t+2, t+3 with reward R = +1, shown alongside decision trees over x, y, z.]


Enter Value Function Approximation

• Not enough structure for exact, symbolic methods in many domains

• Our approach:
  – Combine symbolic methods with VFA
  – Define a restricted class of value functions
  – Find the “best” in that class
  – Bound error


Linearly Decomposable Value Functions

Approximate high-dimensional functions with a combination of lower-dimensional functions

Motivation: Multi-attribute utility theory (Keeney & Raiffa)

Note: Overlapping is allowed!


Decomposable Value Functions

Linear combination of functions:

$\tilde{V}(s) = \sum_i w_i h_i(s)$

• Each $h_i$ has a domain of a small set of variables

• Each $h_i$ is a feature of a complex system
  – status of a machine
  – inventory of a store

• Also: think of each $h_i$ as a basis function


Matrix Form

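$\hat{V} = Aw$

$A = \begin{pmatrix} h_1(s_1) & h_2(s_1) & \cdots \\ h_1(s_2) & h_2(s_2) & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}$ (one row per state, one column for each of the $k$ basis functions)

$\hat{V}$ assigns a value to every state.

Note for linear algebra fans: $\hat{V}$ is a linear function in the column space of $h_1, \dots, h_k$.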
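To make the matrix form concrete, the sketch below builds A explicitly for three binary state variables and applies a weight vector. Everything here (the variable names, the three basis functions, the weights) is an illustrative assumption. The explicit matrix is exactly what the symbolic methods in the talk avoid constructing.

```python
from itertools import product

VARIABLES = ("x", "y", "z")  # three binary state variables (illustrative)

# Each basis function is (scope, table): the table maps an assignment of the
# scope variables to a number, so the function only "sees" its own scope.
BASIS = [
    ((), {(): 1.0}),                                   # constant
    (("x",), {(0,): 0.0, (1,): 1.0}),                  # depends on x only
    (("y", "z"), {(0, 0): 0.0, (0, 1): 1.0,
                  (1, 0): 1.0, (1, 1): 2.0}),          # depends on y and z
]

def h_value(basis_fn, state):
    """Evaluate one restricted-domain basis function on a full state."""
    scope, table = basis_fn
    return table[tuple(state[v] for v in scope)]

def build_A(basis, variables=VARIABLES):
    """Rows are states (all 2^n of them), columns are basis functions."""
    states = [dict(zip(variables, vals))
              for vals in product((0, 1), repeat=len(variables))]
    A = [[h_value(h, s) for h in basis] for s in states]
    return states, A

def apply_weights(A, w):
    """Compute A w, i.e. one approximate value per state."""
    return [sum(a * wi for a, wi in zip(row, w)) for row in A]

states, A = build_A(BASIS)
V_hat = apply_weights(A, [0.5, 2.0, -1.0])
```

Note that the second and third basis functions could overlap in scope without changing anything in this construction.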


Defining a fixed point

Standard fixed point equation: $V = \gamma P V + R$

Fixed point with approximation: $\hat{V} = Aw = \Pi(\gamma P A w + R)$, where $\Pi$ is the projection operator.

We use orthogonal projection to force V to have the desired form.


Solving for the fixed point

$w = (A^T A - \gamma A^T P A)^{-1} A^T R$

Theorem: w has a solution for all but finitely many discount factors [Koller & Parr 00]

Note: The existence of a solution is a weaker condition than the contraction property required for iterative, value-iteration-based methods.

LSTD [Bradtke & Barto 96]

$O(k^2 n)$
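The closed form for w can be checked numerically on a tiny explicit model. The following is a minimal sketch, assuming a hypothetical three-state chain and two basis functions. The helper names (`matmul`, `solve`, `projected_fixed_point`) are made up for illustration, and the dense matrices stand in for the symbolic operations used in the talk.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def solve(M, b):
    """Solve M w = b by Gaussian elimination with partial pivoting."""
    k = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("singular system for this discount factor")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, k):
            f = aug[r][c] / aug[c][c]
            for j in range(c, k + 1):
                aug[r][j] -= f * aug[c][j]
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (aug[i][k] - tail) / aug[i][i]
    return w

def projected_fixed_point(A, P, R, gamma):
    """w = (A^T A - gamma A^T P A)^{-1} A^T R, computed via a k x k solve."""
    At = transpose(A)
    AtA = matmul(At, A)
    AtPA = matmul(At, matmul(P, A))
    k = len(AtA)
    M = [[AtA[i][j] - gamma * AtPA[i][j] for j in range(k)] for i in range(k)]
    AtR = [sum(At[i][s] * R[s] for s in range(len(R))) for i in range(k)]
    return solve(M, AtR)

# Hypothetical 3-state chain; basis = constant and indicator of state 0.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
R = [1.0, 0.5, 0.0]
A = [[1.0, 1.0],
     [1.0, 0.0],
     [1.0, 0.0]]
w = projected_fixed_point(A, P, R, gamma=0.9)
```

At the solution, the residual Aw - (γPAw + R) is orthogonal to every column of A, which is what the orthogonal projection requires.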


Key Operations

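• Backprojection of a basis function: $P h_i$

• Dot product of two restricted-domain basis functions: $h_i \cdot h_j$

If these two operations can be done efficiently:

$w = (A^T A - \gamma A^T P A)^{-1} A^T R$

where $A^T A$ and $A^T P A$ are $k \times k$ and $A^T R$ is $k \times 1$.

Solution cost for k basis functions: $k \times k$ matrix inversion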


Backprojection = 1-step Expectation

Important: single step of lookahead only, no more

[Figure: DBN over X, Y, Z with decision trees over x, y, z.]

$h_1 = f(z)$, $P h_1 = f(y, z)$
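The slide's example can be sketched in code: if h1 depends only on z, and Z at time t+1 has parents Y and Z at time t, then the backprojection depends only on (y, z). The conditional probabilities and the values of h1 below are invented for illustration. The brute-force function exists only to check the factored result.

```python
from itertools import product

# Hypothetical CPDs: each entry is P(var' = 1 | parents at time t).
P_X = {(x,): 0.9 if x else 0.2 for x in (0, 1)}            # X' depends on X
P_Y = {(x, y): 0.8 if (x and y) else 0.3
       for x, y in product((0, 1), repeat=2)}               # Y' depends on X, Y
P_Z = {(y, z): 0.7 if (y or z) else 0.1
       for y, z in product((0, 1), repeat=2)}               # Z' depends on Y, Z

def h1(z):
    """Basis function with scope {z} (values are illustrative)."""
    return 5.0 if z else 1.0

def backproject_h1():
    """g(y, z) = E[h1(Z') | y, z]; only the parents of Z' are involved."""
    return {(y, z): P_Z[(y, z)] * h1(1) + (1 - P_Z[(y, z)]) * h1(0)
            for y, z in product((0, 1), repeat=2)}

def backproject_brute_force(x, y, z):
    """Same expectation, summing over every joint next state."""
    total = 0.0
    for x2, y2, z2 in product((0, 1), repeat=3):
        px = P_X[(x,)] if x2 else 1 - P_X[(x,)]
        py = P_Y[(x, y)] if y2 else 1 - P_Y[(x, y)]
        pz = P_Z[(y, z)] if z2 else 1 - P_Z[(y, z)]
        total += px * py * pz * h1(z2)
    return total

g = backproject_h1()
```

The factored version touches 4 table entries regardless of how many other variables the model has, while the brute-force version grows with the full state space.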


Efficient dot product

Need to compute: $\sum_s h_i(s) h_j(s)$

e.g.: $h_1 = f(x)$, $h_2 = f(y)$

[Figure: three tree diagrams labeled $h_1$, $h_2$, and $h_1 \cdot h_2$.]
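For the example h1 = f(x), h2 = f(y), the sum over all states only needs to enumerate the union of the two scopes. Each remaining binary variable contributes a factor of 2. The sketch below assumes three binary variables and made-up table values, and compares the factored computation with brute-force enumeration.

```python
from itertools import product

N_VARS = 3  # binary state variables X, Y, Z (illustrative)

h1 = {0: 2.0, 1: 3.0}   # h1 = f(x), values are made up
h2 = {0: 1.0, 1: 4.0}   # h2 = f(y), values are made up

def dot_factored(h1, h2, n_vars=N_VARS):
    """Sum over the joint scope {x, y}, then account for the other variables."""
    scoped = sum(h1[x] * h2[y] for x, y in product((0, 1), repeat=2))
    return scoped * 2 ** (n_vars - 2)

def dot_brute_force(h1, h2, n_vars=N_VARS):
    """Sum over every full state s = (x, y, z, ...)."""
    return sum(h1[s[0]] * h2[s[1]] for s in product((0, 1), repeat=n_vars))

factored = dot_factored(h1, h2)
brute = dot_brute_force(h1, h2)
```

The factored cost depends on the size of the joint scope, not on the total number of state variables.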


Symbolic Linear VFA

• Incurs only one step's worth of representation blowup
  – Solve directly for fixed point
  – Contrast with bisimulation/structured DP:
    • Exact
    • Iterative – representation grows with each step

• No a priori quality guarantees

• A posteriori quality guarantees


Error Bounds

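How are we doing?

$\|V^* - \hat{V}\|_\infty \le \frac{\|\hat{V} - (\gamma P \hat{V} + R)\|_\infty}{1 - \gamma}$

Here $\gamma P \hat{V} + R$ is the one-step lookahead expected value, and the numerator is the max one-step error.

Claim:
• Computing the max one-step error is equivalent to maximizing a sum of restricted-domain functions
• Use a cost network (Dechter 99)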
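The bound can be checked on a model small enough to enumerate. The sketch below assumes a hypothetical two-state chain and an arbitrary approximation `V_hat`. It takes the max over states by brute force, whereas the talk computes that max with a cost network.

```python
def backup(V, P, R, gamma):
    """One-step lookahead expected value: R + gamma * P V."""
    n = len(V)
    return [R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)]

def max_norm(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def a_posteriori_bound(V_hat, P, R, gamma):
    """Max one-step error divided by (1 - gamma)."""
    return max_norm(V_hat, backup(V_hat, P, R, gamma)) / (1.0 - gamma)

# Hypothetical two-state chain and an arbitrary approximate value function.
P = [[0.9, 0.1],
     [0.0, 1.0]]
R = [1.0, 0.0]
gamma = 0.9
V_hat = [5.0, 0.5]

bound = a_posteriori_bound(V_hat, P, R, gamma)

# Reference value obtained by many exact backups, for comparison only.
V = [0.0, 0.0]
for _ in range(2000):
    V = backup(V, P, R, gamma)
true_error = max_norm(V, V_hat)
```

The bound needs only V_hat and one backup, so it can be evaluated after the fact without knowing the true value function.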


Cost Networks

• Can use variable elimination to maximize over state space: [Bertele & Brioschi ‘72]

$\max_{A,B,C,D} f_1(A,B) + f_2(A,C) + f_3(C,D) + f_4(B,D)$

$= \max_{A,B,C} f_1(A,B) + f_2(A,C) + \max_D [f_3(C,D) + f_4(B,D)]$

$= \max_{A,B,C} f_1(A,B) + f_2(A,C) + g_1(B,C)$

[Figure: cost network on nodes A, B, C, D with one edge per factor $f_1$, $f_2$, $f_3$, $f_4$.]

Here we need only 16, instead of 64 sum operations.

As in Bayes nets, maximization is exponential in size of largest factor.

NP-hard in general
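The elimination of D in the example above can be written out directly. The factor tables below are invented for illustration, and only the scopes (A,B), (A,C), (C,D), (B,D) come from the slide. The brute-force maximization is included as a check.

```python
from itertools import product

VALS = (0, 1)

# Hypothetical tables for the four factors; keys follow the slide's scopes.
f1 = {(a, b): 1.0 * a + 2.0 * b for a, b in product(VALS, repeat=2)}   # f1(A, B)
f2 = {(a, c): 3.0 * (a != c) for a, c in product(VALS, repeat=2)}      # f2(A, C)
f3 = {(c, d): 2.0 * c - 1.0 * d for c, d in product(VALS, repeat=2)}   # f3(C, D)
f4 = {(b, d): 4.0 * (b == d) for b, d in product(VALS, repeat=2)}      # f4(B, D)

def max_brute_force():
    """Maximize over all joint assignments of A, B, C, D."""
    return max(f1[a, b] + f2[a, c] + f3[c, d] + f4[b, d]
               for a, b, c, d in product(VALS, repeat=4))

def max_eliminating_d():
    """First build g1(B, C) = max_D [f3(C, D) + f4(B, D)], then maximize."""
    g1 = {(b, c): max(f3[c, d] + f4[b, d] for d in VALS)
          for b, c in product(VALS, repeat=2)}
    return max(f1[a, b] + f2[a, c] + g1[b, c]
               for a, b, c in product(VALS, repeat=3))

best_brute = max_brute_force()
best_elim = max_eliminating_d()
```

The intermediate table g1 has scope (B, C), which is the new factor introduced by eliminating D.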


Checkpoint

• Starting with:
  – Factored model (DBN)
  – Restricted value function space (restricted-domain basis functions)

• Find fixed point in restricted space

• Bound solution quality a posteriori

• But: Fixed point may not have lowest max norm error


Max-norm Error Minimization

• General max-norm error minimization

• Symbolic operation over large state spaces

$Aw \approx \gamma P A w + R$

$w^* = \arg\min_w \|Aw - (\gamma P A w + R)\|_\infty$

$w^* = \arg\min_w \|Hw - b\|_\infty$


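General Max-norm Error Minimization

• Algorithm for finding: $w^* = \arg\min_w \|Hw - b\|_\infty$

Variables: $w_1, \dots, w_k, \phi$;
Minimize: $\phi$;
Subject to: $\phi \ge \max_s \left[\sum_{i=1}^{k} w_i h_i(s) - b(s)\right]$ and $\phi \ge \max_s \left[b(s) - \sum_{i=1}^{k} w_i h_i(s)\right]$.

Solve by Linear Programming: [Cheney ’82]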


Symbolic max-norm minimization

• For fixed weights w, compute max-norm:

$\|Hw - b\|_\infty = \max_s \left|\sum_i w_i h_i(s) - b(s)\right|$

However, if basis and target are functions of only a few variables, we can do it efficiently!

Cost networks can maximize over large state spaces efficiently when the function is factored:

$\max_{X_1, \dots, X_n} \sum_i f_i(C_i)$, where $C_i \subseteq \{X_1, \dots, X_n\}$
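One way to read this slide: for fixed w, the max-norm is the larger of two maximizations, one over Hw - b and one over b - Hw, and each is a maximum of a sum of restricted-domain functions. The sketch below implements a small generic max-sum variable elimination under that reading. The function names, the basis functions, the target, and the weights are all assumptions made for illustration.

```python
from itertools import product

def eliminate_max(factors, order, vals=(0, 1)):
    """Maximize a sum of factors by eliminating variables in the given order.

    Each factor is (scope, fn): scope is a tuple of variable names and fn
    takes one value per scope variable.
    """
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {var}))
        table = {}
        for assignment in product(vals, repeat=len(new_scope)):
            env = dict(zip(new_scope, assignment))
            best = float("-inf")
            for x in vals:
                env[var] = x
                best = max(best, sum(fn(*(env[v] for v in scope))
                                     for scope, fn in touching))
            table[assignment] = best
        # Replace the eliminated factors with one new factor over new_scope.
        factors = rest + [(new_scope, lambda *args, t=table: t[args])]
    # Every variable is eliminated, so the remaining factors are constants.
    return sum(fn() for _, fn in factors)

def max_norm(weights, basis, target, variables):
    """||Hw - b||_inf as the larger of two cost-network maximizations."""
    def signed(sign):
        terms = [(scope, lambda *a, fn=fn, w=w: sign * w * fn(*a))
                 for w, (scope, fn) in zip(weights, basis)]
        t_scope, t_fn = target
        terms.append((t_scope, lambda *a: -sign * t_fn(*a)))
        return eliminate_max(terms, variables)
    return max(signed(+1.0), signed(-1.0))

# Hypothetical basis functions, target, and weights over binary x, y, z.
basis = [(("x", "y"), lambda x, y: float(x ^ y)),
         (("y", "z"), lambda y, z: float(y + z))]
target = (("z",), lambda z: 1.5 * z)
w = [2.0, 1.0]
norm = max_norm(w, basis, target, ("x", "y", "z"))
```

The elimination order is passed in explicitly; as with Bayes nets, a poor order can create large intermediate factors.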


Representing the Constraints

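• Explicit representation is exponential ($|S| = 2^n$):

$\phi \ge \sum_{i=1}^{k} w_i h_i(s) - b(s), \quad \forall s \in S$

If basis and target are factored, can use cost networks to represent the constraints:

$\max_{A,B,C,D} f_1(A,B) + f_2(A,C) + f_3(C,D) + f_4(B,D) = \max_{A,B,C} f_1(A,B) + f_2(A,C) + \max_D [f_3(C,D) + f_4(B,D)]$

is represented by

$\max_{A,B,C} f_1(A,B) + f_2(A,C) + g_1^{(B,C)}$, with $g_1^{(B,C)} \ge f_3(C,D) + f_4(B,D)$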


Conclusions

• Value function approximation w/error bounds

• Symbolic operations (no sampling!)

• Methods over large state spaces
  – Orthogonal projection
  – Max-norm error minimization

• Tools over large state spaces
  – Expectation
  – Dot product
  – Max