
Efficient Solution Algorithms for Factored MDPs

by Carlos Guestrin, Daphne Koller, Ronald Parr, Shobha Venkataraman

Presented by Arkady Epshteyn


Problem with MDPs

• Exponential number of states
• Example: Sysadmin Problem
• 4 computers: M1, M2, M3, M4
• Each machine is working or has failed.
• State space: $2^4 = 16$ states
• 8 actions: whether to reboot each machine or not
• Reward: depends on the number of working machines
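To make the blow-up concrete, here is a small Python sketch that enumerates the joint states of n binary machines. It also counts the entries that an explicit transition table needs for a single action. The function names are illustrative and do not come from the paper.

```python
from itertools import product

def sysadmin_states(n):
    """All joint states of n machines; 1 = working, 0 = failed."""
    return list(product((0, 1), repeat=n))

def table_entries_per_action(n):
    """An explicit P(x' | x) for one action has one entry per (x, x') pair."""
    num_states = 2 ** n
    return num_states * num_states

print(len(sysadmin_states(4)))          # 16 states for M1..M4
for n in (4, 10, 20):
    print(n, 2 ** n, table_entries_per_action(n))
```

The state count doubles with every added machine, and an explicit transition table grows as its square.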


Factored Representation

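• Transition model: dynamic Bayesian network (DBN)
• Reward model: a sum of local reward functions, each depending on only a few state variables:

$$R(x) = \sum_{j=1}^{k} r_j(x)$$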
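The DBN factorization can be sketched in Python for the Sysadmin example. Everything specific here is an assumption made for illustration:

- a ring topology, in which machine i's next status depends on its own status and its left neighbour's;
- made-up probabilities;
- actions encoded as a tuple of per-machine reboot flags;
- a reward term r_j(x) = 1 when machine j works.

```python
from itertools import product

N = 4  # machines M1..M4 arranged in a ring (assumed topology)

def p_works_next(i, x, reboot):
    """Hypothetical CPD P(X_i' = 1 | X_i, X_{i-1}, a); the numbers are made up."""
    if reboot[i]:
        return 0.95                      # a rebooted machine almost always works
    if x[i] == 0:
        return 0.05                      # a failed machine rarely recovers by itself
    # a working machine is less reliable when its ring neighbour has failed
    return 0.9 if x[(i - 1) % N] == 1 else 0.5

def transition_prob(x_next, x, reboot):
    """DBN factorization: P(x' | x, a) = prod_i P(x_i' | Parents(X_i'), a)."""
    prob = 1.0
    for i in range(N):
        p1 = p_works_next(i, x, reboot)
        prob *= p1 if x_next[i] == 1 else 1.0 - p1
    return prob

def reward(x):
    """Factored reward R(x) = sum_j r_j(x), assuming r_j(x) = 1 when machine j works."""
    return sum(x[j] for j in range(N))

x, reboot = (1, 0, 1, 1), (0, 1, 0, 0)   # reboot M2 only
total = sum(transition_prob(xn, x, reboot) for xn in product((0, 1), repeat=N))
print(round(total, 10), reward(x))       # 1.0 3
```

Four small local CPDs stand in for a full 16 × 16 matrix per action. The product over machines is the structure that the factored algorithms exploit.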


Approximate Value Function

• Linear value function:

$$V(x) = \sum_{j=1}^{k} w_j h_j(x)$$

• Basis functions:

$$h_i(X_i = \text{true}) = 1, \qquad h_i(X_i = \text{false}) = 0, \qquad h_0 = 1$$
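Here is a minimal sketch of the linear value function with the indicator basis above. It assumes states are tuples of booleans. The weights are arbitrary placeholders, not fitted values.

```python
def make_indicator_basis(n):
    """h_0(x) = 1, plus h_i(x) = 1 if X_i is true else 0, for i = 1..n."""
    basis = [lambda x: 1.0]
    for i in range(n):
        basis.append(lambda x, i=i: 1.0 if x[i] else 0.0)  # i=i binds the current index
    return basis

def linear_value(weights, basis, x):
    """V(x) = sum_j w_j h_j(x): k weights instead of a 2^n-entry table."""
    return sum(w * h(x) for w, h in zip(weights, basis))

basis = make_indicator_basis(4)
weights = [2.0, 1.0, 1.0, 1.0, 1.0]  # hypothetical weights
print(linear_value(weights, basis, (True, False, True, True)))  # 5.0
```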


Markov Decision Processes

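For a fixed policy $\pi$:

$$V_\pi(x) = R_\pi(x) + \gamma \sum_{x'} P_\pi(x' \mid x)\, V_\pi(x')$$

The optimal value function $V^*$:

$$V^*(x) = \max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \Big]$$

where $\gamma \in [0, 1)$ is the discount factor.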
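The optimality equation can be solved on a tiny, fully enumerated MDP by value iteration, which repeatedly applies the equation's right-hand side. This sketch assumes NumPy. The two-state MDP is hypothetical:

- action 0 stays put and action 1 switches state;
- the reward is 1 in state 1;
- γ = 0.9.

It was chosen so that V* = (9, 10) can be checked by hand.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Iterate V(x) <- max_a [R_a(x) + gamma * sum_x' P_a(x'|x) V(x')] until it stops changing.

    P[a, x, x'] = P_a(x' | x) has shape (A, N, N); R[a, x] = R_a(x) has shape (A, N).
    """
    V = np.zeros(P.shape[1])
    while True:
        Q = R + gamma * (P @ V)          # Q[a, x]: one-step backup for each action
        V_new = Q.max(axis=0)            # best action in each state
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Hypothetical toy MDP: action 0 = stay, action 1 = switch; reward 1 in state 1.
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0], [0.0, 1.0]])
print(value_iteration(P, R))             # approximately [9. 10.]
```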


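Solving MDPs, Method 1: Policy Iteration

• Value determination:

$$V^{(t)}(x) = R_{\pi^{(t)}}(x) + \gamma \sum_{x'} P_{\pi^{(t)}}(x' \mid x)\, V^{(t)}(x')$$

• Policy improvement:

$$\pi^{(t+1)}(x) = \arg\max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^{(t)}(x') \Big]$$

• Polynomial in the number of states N
• Exponential in the number of state variables K (N = 2^K for K binary variables)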
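Here is a compact sketch of both steps on an enumerated MDP. It assumes NumPy and uses the same hypothetical two-state example. Value determination is a dense N × N linear solve. That solve is why the method is polynomial in N but exponential in K.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration on an explicit (enumerated) MDP.

    P[a, x, x'] = P_a(x' | x) with shape (A, N, N); R[a, x] = R_a(x) with shape (A, N).
    """
    _, N, _ = P.shape
    states = np.arange(N)
    policy = np.zeros(N, dtype=int)                  # start from "always action 0"
    while True:
        # Value determination: solve the N x N system (I - gamma P_pi) V = R_pi.
        P_pi = P[policy, states]                     # row x is P_{pi(x)}(. | x)
        R_pi = R[policy, states]
        V = np.linalg.solve(np.eye(N) - gamma * P_pi, R_pi)
        # Policy improvement: greedy with respect to the one-step backup of V.
        new_policy = (R + gamma * (P @ V)).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])   # action 0 = stay, 1 = switch
R = np.array([[0.0, 1.0], [0.0, 1.0]])                # reward 1 in state 1
policy, V = policy_iteration(P, R)
print(policy, V)   # [1 0] and approximately [9. 10.]
```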


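Solving MDPs, Method 2: Linear Programming

$$
\begin{aligned}
\text{Variables: } & V_1, \dots, V_N \\
\text{Minimize: } & \sum_i \alpha(x_i)\, V_i, \quad \alpha(x_i) > 0 \\
\text{Subject to: } & V_i \ge R_a(x_i) + \gamma \sum_j P_a(x_j \mid x_i)\, V_j \quad \forall i, a
\end{aligned}
$$

Intuition: compare with the fixed point of V(x):

$$V^*(x) = \max_a \Big[ R_a(x) + \gamma \sum_{x'} P_a(x' \mid x)\, V^*(x') \Big]$$

Each constraint forces $V_i$ to be at least the one-step backup for every action, so $V_i \ge \max_a[\dots]$. Minimizing a positively weighted sum of the $V_i$ then drives $V$ down to the fixed point $V^*$.

• Polynomial in the number of states N
• Exponential in the number of state variables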
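The exact LP can be handed to an off-the-shelf solver. This sketch assumes SciPy's `scipy.optimize.linprog` with the HiGHS method and a uniform α. Because `linprog` expects constraints of the form `A_ub @ V <= b_ub`, each Bellman constraint is negated. The two-state MDP is the same hypothetical toy as before.

```python
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, R, gamma=0.9, alpha=None):
    """Exact LP: minimise sum_i alpha(x_i) V_i s.t. V_i >= R_a(x_i) + gamma * sum_j P_a(x_j|x_i) V_j."""
    A, N, _ = P.shape
    alpha = np.full(N, 1.0 / N) if alpha is None else alpha     # any alpha > 0 works
    # Rewrite each constraint as gamma * P_a(.|x_i) @ V - V_i <= -R_a(x_i):
    # one row per (action, state) pair, actions stacked in the same order as R.reshape(-1).
    A_ub = np.vstack([gamma * P[a] - np.eye(N) for a in range(A)])
    b_ub = -R.reshape(-1)
    res = linprog(alpha, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * N, method="highs")
    return res.x

P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0], [0.0, 1.0]])
print(solve_mdp_lp(P, R))   # approximately [9. 10.]
```

The explicit `bounds` argument matters: `linprog` otherwise restricts every variable to be non-negative.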


Value Function Approximation

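Replace each $V_i$ in the exact LP by the linear approximation $\sum_i w_i h_i(x)$:

$$
\begin{aligned}
\text{Variables: } & w_1, \dots, w_k \\
\text{Minimize: } & \sum_x \alpha(x) \sum_i w_i h_i(x) \\
\text{Subject to: } & \sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') \quad \forall x, a
\end{aligned}
$$

The exact LP has one variable $V_i$ per state. Here only the $k$ weights are variables, but there is still one constraint per state and action.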
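Here is a sketch of the approximate LP on an enumerated MDP, again assuming SciPy's `linprog`. The basis matrix H (with H[x, i] = h_i(x)) and the toy MDP are hypothetical. Because this H spans every function on two states, the LP recovers V* exactly.

```python
import numpy as np
from scipy.optimize import linprog

def solve_approx_lp(P, R, H, gamma=0.9, alpha=None):
    """Approximate LP over weights w, with V = H @ w and H[x, i] = h_i(x).

    Still enumerates every state x, so it illustrates the LP, not the factored solver.
    """
    A, N, _ = P.shape
    alpha = np.full(N, 1.0 / N) if alpha is None else alpha
    c = alpha @ H                                    # sum_x alpha(x) h_i(x), one per weight
    # H w >= R_a + gamma P_a H w  <=>  (gamma P_a H - H) w <= -R_a
    A_ub = np.vstack([gamma * P[a] @ H - H for a in range(A)])
    b_ub = -R.reshape(-1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * H.shape[1], method="highs")
    return res.x

P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0], [1.0, 1.0]])   # h_0 = 1 and h_1 = [state is 1]
w = solve_approx_lp(P, R, H)
print(w, H @ w)   # approximately [9. 1.] [9. 10.]
```

With fewer basis functions than states, any feasible w satisfies H w ≥ V* pointwise. The LP therefore picks the feasible upper bound with the smallest α-weighted sum.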


Objective function

• Objective function polynomial in the number of basis functions:

$$\sum_x \alpha(x) \sum_i w_i h_i(x) = \sum_i w_i \sum_x \alpha(x)\, h_i(x) = \sum_i w_i \sum_{c_i \in \mathrm{Dom}(C_i)} \alpha(c_i)\, h_i(c_i)$$

Here $C_i$ is the scope of $h_i$, and $\alpha(c_i) = \sum_{x \sim c_i} \alpha(x)$ sums $\alpha$ over all states $x$ that agree with $c_i$ on $C_i$.
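The identity above can be checked numerically. This Python sketch makes two assumptions: a basis function with scope C_i = {X1, X3} whose value is 1 when both variables are true, and a uniform α(x) = 2^-n. Under uniform α the marginal α(c_i) is simply 2^-|C_i|. The local sum therefore touches 4 entries instead of 16 states.

```python
from itertools import product

def alpha_coefficient_bruteforce(h, n):
    """sum_x alpha(x) h(x) with uniform alpha(x) = 2^-n, enumerating all 2^n states."""
    return sum(h(x) for x in product((0, 1), repeat=n)) / 2 ** n

def alpha_coefficient_local(h_local, scope_size):
    """Same sum over Dom(C_i) only: for uniform alpha, alpha(c_i) = 2^-|C_i|."""
    return sum(h_local(*c) for c in product((0, 1), repeat=scope_size)) / 2 ** scope_size

def h_local(x1, x3):
    """Hypothetical basis function on scope {X1, X3}: 1 iff both are true."""
    return float(x1 and x3)

def h_full(x):
    """The same basis function viewed as a function of the full state (X1..X4)."""
    return h_local(x[0], x[2])

print(alpha_coefficient_bruteforce(h_full, 4), alpha_coefficient_local(h_local, 2))  # 0.25 0.25
```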


Each Constraint: Backprojection

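$$\sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') = \sum_i w_i \sum_{x'} P_a(x' \mid x)\, h_i(x') = \sum_i w_i\, g_i^a(x)$$

$$g_i^a(x) = \sum_{x'} P_a(x' \mid x)\, h_i(x') = \mathbb{E}[h_i(x') \mid x] = \mathbb{E}[h_i(c_i') \mid x] = \mathbb{E}[h_i(c_i') \mid \mathrm{Parents}_a(C_i')]$$

• The backprojection $g_i^a$ of $h_i$ depends only on the parents of $C_i'$ in the DBN, not on the full state.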
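For an indicator basis h_i(x') = [X_i' = 1], the backprojection is just a CPD entry. The sketch below checks this against a brute-force sum over all next states. It reuses the hypothetical ring-topology CPD (made-up numbers, per-machine reboot flags). `backproject_indicator` reads only X_i, X_{i-1} and the action.

```python
from itertools import product

N = 4  # hypothetical ring of machines

def p_works_next(i, x, reboot):
    """Hypothetical CPD P(X_i' = 1 | X_i, X_{i-1}, a)."""
    if reboot[i]:
        return 0.95
    if x[i] == 0:
        return 0.05
    return 0.9 if x[(i - 1) % N] == 1 else 0.5

def transition_prob(x_next, x, reboot):
    """P(x' | x, a) as a product of the per-machine CPDs."""
    prob = 1.0
    for i in range(N):
        p1 = p_works_next(i, x, reboot)
        prob *= p1 if x_next[i] == 1 else 1.0 - p1
    return prob

def backproject_bruteforce(h, x, reboot):
    """g(x) = sum_{x'} P_a(x' | x) h(x'): a sum over all 2^N next states."""
    return sum(transition_prob(xn, x, reboot) * h(xn) for xn in product((0, 1), repeat=N))

def backproject_indicator(i, x, reboot):
    """For h_i(x') = [X_i' = 1], g_i(x) = P(X_i' = 1 | Parents(X_i'), a)."""
    return p_works_next(i, x, reboot)

x, reboot = (1, 0, 1, 1), (0, 0, 0, 0)
for i in range(N):
    g = backproject_bruteforce(lambda xn: xn[i], x, reboot)
    print(i, round(g, 6), backproject_indicator(i, x, reboot))
```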


Representing Exponentially Many Constraints

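$$\sum_i w_i h_i(x) \ge R_a(x) + \gamma \sum_{x'} P_a(x' \mid x) \sum_i w_i h_i(x') \quad \forall x, a$$

$$\Longleftrightarrow \quad 0 \ge \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \quad \forall x, a$$

$$\Longleftrightarrow \quad 0 \ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \quad \forall a$$

• For each action, the exponentially many constraints collapse into a single constraint on a maximum over x. The remaining task is to compute that maximum without enumerating x.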
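Here is a brute-force Python sketch of the last equivalence. It assumes NumPy, and the toy MDP and basis are hypothetical. For each action it computes max_x of the bracketed expression. The weights w are feasible for all N·|A| constraints exactly when every value is ≤ 0. The sketch still enumerates x, which is the cost the factored approach removes.

```python
import numpy as np

def worst_violation(w, P, R, H, gamma=0.9):
    """For each action a: max_x [ R_a(x) + sum_i w_i (gamma * sum_x' P_a(x'|x) h_i(x') - h_i(x)) ].

    Equivalently max_x [ R_a + gamma * P_a @ V - V ] with V = H @ w; feasible iff all <= 0.
    """
    V = H @ w
    return np.array([np.max(R[a] + gamma * P[a] @ V - V) for a in range(P.shape[0])])

P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0], [1.0, 1.0]])
print(worst_violation(np.array([9.0, 1.0]), P, R, H))   # about [0, 0]: feasible
print(worst_violation(np.array([8.0, 1.0]), P, R, H))   # positive entries: infeasible
```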


Restricted Domain

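$$
\begin{aligned}
0 &\ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \\
&= \max_x \Big\{ \sum_i w_i \big[\gamma\, g_i^a(x) - h_i(x)\big] + R_a(x) \Big\} \\
&= \max_x \Big\{ \sum_i w_i f_i(x) + \sum_j r_j(x) \Big\}, \qquad f_i = \gamma\, g_i^a - h_i
\end{aligned}
$$

1. Backprojection $g_i^a$: depends on few variables
2. Basis function $h_i$: depends on few variables
3. Reward function $R_a = \sum_j r_j$: each $r_j$ depends on few variables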
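To see the restricted domains concretely, the sketch below stores each term of the maximand as a table over its own scope. It assumes:

- the hypothetical Sysadmin ring;
- the no-reboot action;
- the indicator basis h_i(x) = x_i;
- γ = 0.9 and r_j(x) = x_j.

Under these assumptions f_i = γ g_i − h_i depends only on {X_{i−1}, X_i}.

```python
from itertools import product

GAMMA = 0.9
N = 4

class LocalFunction:
    """A function stored as a table over its (small) scope, evaluated on a full state x."""
    def __init__(self, scope, fn):
        self.scope = tuple(scope)
        self.table = {vals: fn(*vals) for vals in product((0, 1), repeat=len(self.scope))}

    def __call__(self, x):
        return self.table[tuple(x[v] for v in self.scope)]

def p_works_next(x_i, x_left):
    """Hypothetical no-reboot CPD P(X_i' = 1 | X_i, X_{i-1})."""
    return 0.05 if x_i == 0 else (0.9 if x_left == 1 else 0.5)

# f_i = gamma * g_i - h_i for h_i(x) = x_i; its scope is just (X_{i-1}, X_i).
f = [LocalFunction(((i - 1) % N, i), lambda left, xi: GAMMA * p_works_next(xi, left) - xi)
     for i in range(N)]
r = [LocalFunction((j,), lambda xj: float(xj)) for j in range(N)]

def objective(w, x):
    """sum_i w_i f_i(x) + sum_j r_j(x), using only the local tables."""
    return sum(wi * fi(x) for wi, fi in zip(w, f)) + sum(rj(x) for rj in r)

entries = sum(len(t.table) for t in f + r)
print(entries, 2 ** N)   # 24 table entries in total vs 16 joint states
```

For N machines the tables hold 6N numbers, compared with 2^N joint states. With N = 4 that is 24 versus 16, an instance of the conclusion that the factored form can be larger when the state space is small.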


Variable Elimination

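$$
\begin{aligned}
&\max_x \Big[ \sum_i w_i f_i(x) + \sum_j r_j(x) \Big] \\
&= \max_{x_1,x_2,x_3,x_4} \big[ w_1 f_1(x_1,x_2) + w_2 f_2(x_1,x_3) + r_1(x_2,x_4) + r_2(x_3,x_4) \big] \\
&= \max_{x_1,x_2,x_3} \Big[ w_1 f_1(x_1,x_2) + w_2 f_2(x_1,x_3) + \max_{x_4} \big[ r_1(x_2,x_4) + r_2(x_3,x_4) \big] \Big] \\
&= \max_{x_1,x_2,x_3} \big[ w_1 f_1(x_1,x_2) + w_2 f_2(x_1,x_3) + e_1(x_2,x_3) \big]
\end{aligned}
$$

where $e_1(x_2,x_3) = \max_{x_4} \big[ r_1(x_2,x_4) + r_2(x_3,x_4) \big]$. Since $x_4$ appears only in $r_1$ and $r_2$, the max over $x_4$ can be pushed inside.

• Similar to variable elimination in Bayesian networks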
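Here is a generic max-sum variable elimination sketch in Python, with each function stored as a (scope, table) pair over binary variables. The table values are random integers standing in for w_1 f_1, w_2 f_2, r_1 and r_2. The elimination order is a free choice. Any order returns the same maximum, though at different cost.

```python
import random
from itertools import product

def eliminate(functions, var):
    """Replace every function that mentions var by e(rest) = max over var of their sum."""
    touching = [f for f in functions if var in f[0]]
    rest = [f for f in functions if var not in f[0]]
    new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {var}))
    table = {}
    for vals in product((0, 1), repeat=len(new_scope)):
        assign = dict(zip(new_scope, vals))
        best = float("-inf")
        for value in (0, 1):
            assign[var] = value
            total = sum(t[tuple(assign[v] for v in scope)] for scope, t in touching)
            best = max(best, total)
        table[vals] = best
    return rest + [(new_scope, table)]

def max_sum(functions, order):
    """max_x sum_f f(x) by variable elimination in the given order."""
    for var in order:
        functions = eliminate(functions, var)
    return sum(t[()] for _, t in functions)   # only empty-scope functions remain

def random_table(rng, arity):
    return {vals: rng.randint(-5, 5) for vals in product((0, 1), repeat=arity)}

rng = random.Random(0)
# The slide's example; the weights w1, w2 are folded into the first two tables.
functions = [(("x1", "x2"), random_table(rng, 2)), (("x1", "x3"), random_table(rng, 2)),
             (("x2", "x4"), random_table(rng, 2)), (("x3", "x4"), random_table(rng, 2))]
print(max_sum(functions, ["x4", "x3", "x2", "x1"]))
```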


Maximization as Linear Constraints

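$$e_1(x_2,x_3) = \max_{x_4} \big[ r_1(x_2,x_4) + r_2(x_3,x_4) \big]$$

Equivalent to the constraints:

$$e_1(x_2,x_3) \ge r_1(x_2,x_4) + r_2(x_3,x_4) \quad \forall\, x_2, x_3, x_4$$

Each value $e_1(x_2,x_3)$ becomes a new LP variable, with one linear constraint per joint assignment of $x_2, x_3, x_4$. The LP only needs $0 \ge \max_x[\dots]$, so each $e_1$ only has to upper-bound the inner maximum, and these $\ge$ constraints suffice.

• Exponential in the size of each function's domain, not the number of states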
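The rewrite from a max into linear constraints is mechanical. This hypothetical helper prints one constraint per joint assignment of the new function's scope plus the eliminated variable. In the real LP the right-hand sides would be linear expressions in w rather than symbols.

```python
from itertools import product

def term(name, scope, assign):
    """Render f(scope) at a given assignment, e.g. r1(x2=0,x4=1)."""
    return f"{name}(" + ",".join(f"{v}={assign[v]}" for v in scope) + ")"

def elimination_constraints(new_name, touching, var):
    """LP constraints new(z) >= sum of touching functions, one per assignment of (z, var).

    touching: (name, scope) pairs for the functions that mention var.
    """
    new_scope = sorted({v for _, scope in touching for v in scope} - {var})
    rows = []
    for vals in product((0, 1), repeat=len(new_scope) + 1):
        assign = dict(zip(new_scope + [var], vals))
        rhs = " + ".join(term(name, scope, assign) for name, scope in touching)
        rows.append(f"{term(new_name, new_scope, assign)} >= {rhs}")
    return rows

rows = elimination_constraints("e1", [("r1", ["x2", "x4"]), ("r2", ["x3", "x4"])], "x4")
print(len(rows))   # 8 = 2^3 constraints: exponential in |{x2, x3, x4}|, not in the state count
print(rows[0])     # e1(x2=0,x3=0) >= r1(x2=0,x4=0) + r2(x3=0,x4=0)
```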


Factored LP: Scaling


Rule-based Representation


Approximate Value Function

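$$V(x) = \sum_{j=1}^{k} w_j h_j(x) = \sum_{j=1}^{k} w_j h_j(x_1,x_2,x_3,x_4) = \sum_{j=1}^{k} \sum_i w_j\, \mathrm{Rule}_i^{h_j}(x_1,x_2,x_3,x_4)$$

Example: $h_1$ as a decision tree splits on $x_1$. One branch is a leaf with value 0; the other splits on $x_3$ into leaves 5 and 0.6. As rules:

$$\mathrm{Rule}_1: \langle \bar{x}_1 : 0 \rangle \qquad \mathrm{Rule}_2: \langle x_1 \wedge x_3 : 5 \rangle \qquad \mathrm{Rule}_3: \langle x_1 \wedge \bar{x}_3 : 0.6 \rangle$$

Notice: compact representation ($h_1$ uses 2 of the 4 variables and 3 rules instead of 16 table entries).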
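Here is a minimal Python sketch of a rule-based function, with each rule written as a (context, value) pair. The polarity of the literals in h_1's rules is an assumption; the tree only shows which variables are tested. The weight in the example is arbitrary.

```python
def rule_value(rules, x):
    """Evaluate a rule-based function: each rule <context : value> contributes its value
    when x agrees with every literal in the context, and 0 otherwise."""
    return sum(value for context, value in rules
               if all(x[var] == val for var, val in context.items()))

# h1 as three rules (the literal polarities are an assumption).
h1 = [({"x1": 0}, 0.0), ({"x1": 1, "x3": 1}, 5.0), ({"x1": 1, "x3": 0}, 0.6)]

def value(weights, basis_rules, x):
    """V(x) = sum_j w_j sum_i Rule_i^{h_j}(x)."""
    return sum(w * rule_value(rules, x) for w, rules in zip(weights, basis_rules))

x = {"x1": 1, "x2": 0, "x3": 0, "x4": 1}
print(rule_value(h1, x), value([2.0], [h1], x))   # 0.6 1.2
```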


Summing Over Rules

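$$V(x) = \sum_{j=1}^{k} \sum_i w_j\, \mathrm{Rule}_i^{h_j}(x_1,x_2,x_3,x_4)$$

[Figure: adding two decision trees. $h_1(x)$ splits on $x_1$ and $x_3$ with leaves $u_1, u_2, u_3$; $h_2(x)$ splits on $x_2$ and $x_1$ with leaves $u_4, u_5, u_6$. Their sum $h_1(x) + h_2(x)$ is a single tree over $x_2, x_1, x_3$ whose leaves add the values of consistent leaves, e.g. $u_2+u_6$, $u_3+u_6$, $u_2+u_4$, $u_3+u_4$.]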
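One way to implement the tree-style sum assumes that each rule set partitions the state space, as tree leaves do. The sum then combines every consistent pair of rules. The trees h_1 and h_2 and the leaf values u_1..u_6 = 1..6 are hypothetical stand-ins for the figure.

```python
from itertools import product

def consistent(c1, c2):
    """Two contexts are consistent if they never assign the same variable differently."""
    return all(c2.get(var, val) == val for var, val in c1.items())

def rule_value(rules, x):
    """Sum of the values of all rules whose context x satisfies."""
    return sum(v for c, v in rules if all(x[k] == val for k, val in c.items()))

def add_rule_functions(f, g):
    """f + g for rule sets that each partition the state space (like tree leaves):
    one rule per consistent pair, with merged context and summed value."""
    return [({**c1, **c2}, v1 + v2) for c1, v1 in f for c2, v2 in g if consistent(c1, c2)]

# Hypothetical trees with leaf values u1..u6 = 1..6.
h1 = [({"x1": 0}, 1), ({"x1": 1, "x3": 1}, 2), ({"x1": 1, "x3": 0}, 3)]
h2 = [({"x2": 1, "x1": 1}, 4), ({"x2": 1, "x1": 0}, 5), ({"x2": 0}, 6)]
s = add_rule_functions(h1, h2)
print(len(s))   # 6 rules, one per leaf of the merged tree
```

Under the sum-of-rules semantics of `rule_value`, simply concatenating the two rule lists would also give correct values. The pairwise construction instead keeps exactly one matching rule per state, which mirrors the merged tree.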


Multiplying over Rules

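• Analogous construction: the product of two rule-based functions multiplies the values of consistent rules. This is needed for the products $P_a(x' \mid x)\, h_i(x')$ inside the constraints:

$$0 \ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \quad \forall a$$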
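The product follows the same pattern, with multiplication in place of addition. This sketch again assumes that both rule sets partition the state space. The two rule functions are hypothetical.

```python
from itertools import product

def consistent(c1, c2):
    """Contexts agree on every variable they both mention."""
    return all(c2.get(var, val) == val for var, val in c1.items())

def rule_value(rules, x):
    """Sum of the values of all rules whose context x satisfies."""
    return sum(v for c, v in rules if all(x[k] == val for k, val in c.items()))

def multiply_rule_functions(f, g):
    """f * g for partition rule sets: merged context and product value per consistent pair."""
    return [({**c1, **c2}, v1 * v2) for c1, v1 in f for c2, v2 in g if consistent(c1, c2)]

# Hypothetical rule functions: f on {x1, x2} (probability-like values), g on {x1, x3}.
f = [({"x1": 0}, 0.1), ({"x1": 1, "x2": 1}, 0.9), ({"x1": 1, "x2": 0}, 0.5)]
g = [({"x3": 1}, 2.0), ({"x3": 0, "x1": 1}, 4.0), ({"x3": 0, "x1": 0}, 0.0)]
fg = multiply_rule_functions(f, g)
print(len(fg))   # 6 consistent pairs
```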


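Rule-based Maximization

$$0 \ge \max_x \Big\{ \sum_i w_i \Big[\gamma \sum_{x'} P_a(x' \mid x)\, h_i(x') - h_i(x)\Big] + R_a(x) \Big\} \quad \forall a$$

Before: $x_1$ splits into a leaf $u_1$ and a subtree on $x_2$; $x_2$ splits into a leaf $u_2$ and a subtree on $x_3$ with leaves $u_3$, $u_4$.

Eliminate $x_2$: the leaf $u_1$ is unchanged; the other branch of $x_1$ now splits on $x_3$ with leaves $\max(u_2,u_3)$ and $\max(u_2,u_4)$.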
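Here is a Python sketch of eliminating a variable from a rule set, assuming the rules partition the state space. It fixes the variable to each value, then takes the pointwise max of the two restricted functions over consistent pairs. The literal polarities and the leaf values u_1..u_4 = 1, 2, 5, 0 are hypothetical. The output has the same shape as the result above: u_1 unchanged, then max(u_2, u_3) and max(u_2, u_4).

```python
from itertools import product

def consistent(c1, c2):
    """Contexts agree on every variable they both mention."""
    return all(c2.get(var, val) == val for var, val in c1.items())

def rule_value(rules, x):
    """Sum of the values of all rules whose context x satisfies."""
    return sum(v for c, v in rules if all(x[k] == val for k, val in c.items()))

def restrict(rules, var, value):
    """Rules of f with var fixed to value: drop conflicting rules, remove var from contexts."""
    return [({k: v for k, v in c.items() if k != var}, val)
            for c, val in rules if c.get(var, value) == value]

def max_out(rules, var):
    """max over var of a partition rule set: pointwise max of the two restrictions."""
    f0, f1 = restrict(rules, var, 0), restrict(rules, var, 1)
    return [({**c0, **c1}, max(v0, v1)) for c0, v0 in f0 for c1, v1 in f1 if consistent(c0, c1)]

# The tree above with hypothetical polarities and leaf values u1..u4 = 1, 2, 5, 0.
tree = [({"x1": 0}, 1), ({"x1": 1, "x2": 0}, 2),
        ({"x1": 1, "x2": 1, "x3": 0}, 5), ({"x1": 1, "x2": 1, "x3": 1}, 0)]
print(max_out(tree, "x2"))
# [({'x1': 0}, 1), ({'x1': 1, 'x3': 0}, 5), ({'x1': 1, 'x3': 1}, 2)]
```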


Rule-based Linear Program

• Backprojection and objective function: handled in a similar way
• All the operations (summation, multiplication, maximization) keep the rule representation intact
• $w_j\, \mathrm{Rule}_i^{h_j}(x_1,x_2,x_3,x_4)$ is a linear function of the weight $w_j$, so the constraints stay linear


Conclusions

• Compact representation can be exploited to solve MDPs with exponentially many states efficiently.

• Still NP-complete in the worst case.
• Factored solution may increase the size of the LP when the number of states is small (but it scales better).

• Success depends on the choice of the basis functions for value approximation and the factored decomposition of rewards and transition probabilities.