TRANSCRIPT
Probability
CSE 473 – Autumn 2003
Henry Kautz
ExpectiMax
ExpectiMax(n) =
  U(n)                                        if n is a terminal node
  max{ ExpectiMax(s) | s ∈ children(n) }      if n is a max node
  Σ_{s ∈ children(n)} P(s) · ExpectiMax(s)    if n is a chance node
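The recurrence above can be sketched directly in code. This is a minimal illustration, not from the slides; the `Node` class and its field names (`kind`, `utility`, `children`, `probs`) are hypothetical choices for the example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                       # "terminal", "max", or "chance"
    utility: float = 0.0            # U(n); used only at terminal nodes
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)   # P(s) for each child s

def expectimax(n: Node) -> float:
    if n.kind == "terminal":
        return n.utility                               # U(n)
    if n.kind == "max":
        return max(expectimax(s) for s in n.children)  # best child
    # chance node: probability-weighted average over children
    return sum(p * expectimax(s) for p, s in zip(n.probs, n.children))

# A chance node that yields 1 with probability 2/3 and 0 otherwise:
c = Node("chance",
         children=[Node("terminal", 1.0), Node("terminal", 0.0)],
         probs=[2/3, 1/3])
print(expectimax(c))   # ≈ 0.6667
```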
Hungry Monkey: 2-Ply Game Tree
[Figure: the 2-ply game tree. The root is a max node choosing jump or shake; each action leads to a chance node whose outcomes lead to max nodes for the second action, ending in leaf utilities equal to the number of bananas obtained (0, 1, or 2). Recovered edge probabilities: jump lands the monkey on the table with probability 2/3 (it stays on the floor with probability 1/3); shake yields a banana with probability 2/3 from the table and 1/6 from the floor.]
ExpectiMax 1 – Chance Nodes
[Figure: the same tree with the second-ply chance nodes evaluated from the leaves. For example: shake from the table, (2/3)·1 + (1/3)·0 = 2/3; shake from the floor, (1/6)·1 + (5/6)·0 = 1/6; shake from the floor while already holding a banana, (1/6)·2 + (5/6)·1 = 7/6. A jump at the last step yields no additional banana.]
ExpectiMax 2 – Max Nodes
[Figure: the second-ply max nodes each take the maximum over their jump and shake children, giving 2/3 (on table), 1/6 (on floor), 7/6 (on floor with one banana), and 1/6 (on floor).]
ExpectiMax 3 – Chance Nodes
[Figure: the top-level chance nodes are evaluated next. jump: (2/3)·(2/3) + (1/3)·(1/6) = 1/2; shake: (1/6)·(7/6) + (5/6)·(1/6) = 1/3.]
ExpectiMax 4 – Max Node
[Figure: the root max node takes max(1/2, 1/3) = 1/2, so the optimal first action is jump.]
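The four passes above can be checked with a short script. This is a sketch under the transition model recovered from the slides (jump puts the monkey on the table with probability 2/3; shake yields a banana with probability 2/3 on the table and 1/6 on the floor); the function and parameter names are chosen for illustration.

```python
def value(on_table: bool, bananas: int, steps: int) -> float:
    """ExpectiMax value of a Hungry Monkey state with `steps` actions left."""
    if steps == 0:
        return bananas                      # utility = bananas obtained
    p = 2/3 if on_table else 1/6            # shake success probability
    shake = p * value(on_table, bananas + 1, steps - 1) \
          + (1 - p) * value(on_table, bananas, steps - 1)
    if on_table:
        jump = value(True, bananas, steps - 1)      # nothing to gain
    else:
        jump = 2/3 * value(True, bananas, steps - 1) \
             + 1/3 * value(False, bananas, steps - 1)
    return max(jump, shake)

print(value(False, 0, 2))   # ≈ 0.5, matching the root value on the slide
```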
Policies
• The result of the ExpectiMax analysis is a conditional plan (also called a policy):
  – Optimal plan for 2 steps: jump; shake
  – Optimal plan for 3 steps: jump; if (ontable) {shake; shake} else {jump; shake}
• Probabilistic planning can be generalized in many ways, including:
  – Action costs
  – Hidden state
• The general problem is that of solving a Markov Decision Process (MDP)
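The policy is just the argmax taken at each max node, so it branches on the observed state. A self-contained sketch under the same assumed Hungry Monkey model (jump succeeds with probability 2/3; shake yields a banana with probability 2/3 on the table, 1/6 on the floor); all names here are illustrative:

```python
def value(on_table, bananas, steps):
    if steps == 0:
        return bananas
    return max(q for _, q in q_values(on_table, bananas, steps))

def q_values(on_table, bananas, steps):
    """Expected utility of each action in the given state."""
    p = 2/3 if on_table else 1/6            # shake success probability
    shake = p * value(on_table, bananas + 1, steps - 1) \
          + (1 - p) * value(on_table, bananas, steps - 1)
    if on_table:
        jump = value(True, bananas, steps - 1)
    else:
        jump = 2/3 * value(True, bananas, steps - 1) \
             + 1/3 * value(False, bananas, steps - 1)
    return [("jump", jump), ("shake", shake)]

def best_action(on_table, bananas, steps):
    return max(q_values(on_table, bananas, steps), key=lambda aq: aq[1])[0]

# The 3-step conditional plan: act, then branch on where the jump landed.
print(best_action(False, 0, 3))   # jump
print(best_action(True, 0, 2))    # shake  (then shake again)
print(best_action(False, 0, 2))   # jump   (then shake)
```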
2 Player Games of Chance
ExpectiMiniMax(n) =
  U(n)                                           if n is a terminal node
  max{ ExpectiMiniMax(s) | s ∈ children(n) }     if n is a max node
  min{ ExpectiMiniMax(s) | s ∈ children(n) }     if n is a min node
  Σ_{s ∈ children(n)} P(s) · ExpectiMiniMax(s)   if n is a chance node
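The recurrence above extends ExpectiMax with min nodes for the opponent. A minimal sketch; the tuple-based node representation is a hypothetical choice for illustration.

```python
def expectiminimax(node):
    kind = node[0]
    if kind == "terminal":
        return node[1]                      # U(n)
    if kind == "max":
        return max(expectiminimax(s) for s in node[1])
    if kind == "min":
        return min(expectiminimax(s) for s in node[1])
    # chance node: node[1] is a list of (P(s), child) pairs
    return sum(p * expectiminimax(s) for p, s in node[1])

# A fair coin-flip chance node; on one outcome the opponent (min) replies:
tree = ("chance", [
    (0.5, ("min", [("terminal", 3), ("terminal", 1)])),   # min picks 1
    (0.5, ("terminal", 4)),
])
print(expectiminimax(tree))   # 0.5*1 + 0.5*4 = 2.5
```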
Backgammon
• Branching factor:
  – Chance node: 21
  – Max node: about 20 on average
  – Size of tree: O(c^k m^k)
  – In practice: can search 3 plies
• Neurogammon & TD-Gammon (Tesauro 1995)
  – Learned weights on static evaluation function by playing against itself
  – Use results of games to optimize weights:
    • “Punish” features that were on in losing games
    • “Reward” features that were on in winning games
  – A kind of reinforcement learning
  – Became world’s best backgammon player!
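The “punish/reward” idea can be sketched as a simple weight update. This is only an illustration of the description above, assuming binary features and a hypothetical learning rate; TD-Gammon’s actual update is the more refined temporal-difference rule.

```python
def update_weights(weights, active_features, won, lr=0.01):
    """Nudge the weight of every feature that was on during the game:
    up if the game was won, down if it was lost.  (Illustrative only;
    `lr` is a hypothetical learning rate, not TD-Gammon's.)"""
    sign = 1.0 if won else -1.0
    return [w + sign * lr if active else w
            for w, active in zip(weights, active_features)]

w = [0.5, 0.5, 0.5]
w = update_weights(w, [True, False, True], won=True)   # reward features 0 and 2
print(w)   # ≈ [0.51, 0.5, 0.51]
```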