Probability CSE 473 – Autumn 2003 Henry Kautz

ExpectiMax

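\[
\text{ExpectiMax}(n) =
\begin{cases}
U(n) & \text{if } n \text{ is a terminal node} \\
\max\{\text{ExpectiMax}(s) \mid s \in \text{children}(n)\} & \text{if } n \text{ is a max node} \\
\sum_{s \in \text{children}(n)} P(s)\,\text{ExpectiMax}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]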
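The recursive definition of ExpectiMax can be sketched directly in code. This is a minimal illustration, not the course's implementation: the tuple encoding of nodes and the name `expectimax` are assumptions made for this example, and exact fractions are used so the result is not blurred by rounding.

```python
from fractions import Fraction


def expectimax(node):
    """Return the ExpectiMax value of a node.

    Node encoding (an assumption of this sketch):
      ("leaf", utility)
      ("max", [child, ...])
      ("chance", [(probability, child), ...])
    """
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectimax(child) for child in payload)
    if kind == "chance":
        return sum(p * expectimax(child) for p, child in payload)
    raise ValueError(f"unknown node kind: {kind}")


# A max node choosing between a sure 0 and a gamble
# that pays 1 with probability 2/3 and 0 otherwise.
gamble = ("chance", [(Fraction(2, 3), ("leaf", 1)), (Fraction(1, 3), ("leaf", 0))])
tree = ("max", [("leaf", 0), gamble])
print(expectimax(tree))  # 2/3
```

The max node prefers the gamble because its expected utility, 2/3, exceeds the sure 0.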

Hungry Monkey: 2-Ply Game Tree

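[Figure: 2-ply game tree. The root is a max node with two actions, jump and shake. Each action leads to a chance node, each chance outcome leads to a max node with the same two actions, and each of those leads to a chance node over two leaves. Chance edges are labeled 2/3 and 1/3, or 1/6 and 5/6. Leaf utilities, left to right: 0 0 1 0 0 0 1 0 1 1 2 1 0 0 1 0]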

ExpectiMax 1 – Chance Nodes

[Figure: the Hungry Monkey tree with the eight lower chance nodes evaluated. Expected values, left to right: 0, 2/3, 0, 1/6, 1, 7/6, 0, 1/6. Leaf utilities and edge labels are unchanged.]

ExpectiMax 2 – Max Nodes

[Figure: the Hungry Monkey tree with the four lower max nodes evaluated. Each takes the larger of its two chance-node values. Values, left to right: 2/3, 1/6, 7/6, 1/6.]

ExpectiMax 3 – Chance Nodes

[Figure: the Hungry Monkey tree with the two upper chance nodes evaluated. Expected values, left to right: 1/2 (jump), 1/3 (shake).]

ExpectiMax 4 – Max Node

[Figure: the Hungry Monkey tree with the root max node evaluated. Its value is 1/2, the larger of the two upper chance-node values 1/2 and 1/3.]
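The four backup passes can be checked by hand. The arithmetic below assumes the tree structure implied by the slide values; in particular, which leaf carries which probability is inferred from the backed-up numbers rather than stated on the slides.

\[
\begin{aligned}
\text{lower chance nodes:}\quad & \tfrac{2}{3}\cdot 1 + \tfrac{1}{3}\cdot 0 = \tfrac{2}{3}, \qquad \tfrac{1}{6}\cdot 1 + \tfrac{5}{6}\cdot 0 = \tfrac{1}{6}, \qquad \tfrac{1}{6}\cdot 2 + \tfrac{5}{6}\cdot 1 = \tfrac{7}{6} \\
\text{lower max nodes:}\quad & \max(0, \tfrac{2}{3}) = \tfrac{2}{3}, \quad \max(0, \tfrac{1}{6}) = \tfrac{1}{6}, \quad \max(1, \tfrac{7}{6}) = \tfrac{7}{6}, \quad \max(0, \tfrac{1}{6}) = \tfrac{1}{6} \\
\text{upper chance nodes:}\quad & \text{jump: } \tfrac{2}{3}\cdot\tfrac{2}{3} + \tfrac{1}{3}\cdot\tfrac{1}{6} = \tfrac{1}{2}, \qquad \text{shake: } \tfrac{1}{6}\cdot\tfrac{7}{6} + \tfrac{5}{6}\cdot\tfrac{1}{6} = \tfrac{1}{3} \\
\text{root max node:}\quad & \max(\tfrac{1}{2}, \tfrac{1}{3}) = \tfrac{1}{2}
\end{aligned}
\]

Chance nodes whose two leaves are equal (0, 0 or 1, 1) back up that common value regardless of the probabilities.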

Policies

• The result of the ExpectiMax analysis is a conditional plan (also called a policy):
  – Optimal plan for 2 steps: jump; shake
  – Optimal plan for 3 steps:
      jump; if (ontable) {shake; shake}
      else {jump; shake}

• Probabilistic planning can be generalized in many ways, including:
  – Action costs
  – Hidden state

• The general problem is that of solving a Markov Decision Process (MDP)
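To make the link between the backed-up values and the policy concrete, here is a small sketch that records the maximizing action at each max node of the 2-ply Hungry Monkey tree. The tree is reconstructed from the numbers on the slides; the helper names (`leaf`, `chance`, `decide`, `best_action`) and the outcome names `jump_hi` and `jump_lo` are hypothetical, and the assignment of probabilities to leaves is inferred from the backed-up values.

```python
from fractions import Fraction as F


def leaf(u):
    return ("leaf", u)


def chance(p, first, second):
    # Two-outcome chance node: `first` with probability p, `second` otherwise.
    return ("chance", [(p, first), (1 - p, second)])


def decide(jump, shake):
    # Max node with the two actions named on the slides.
    return ("max", {"jump": jump, "shake": shake})


def value(node):
    kind, body = node
    if kind == "leaf":
        return body
    if kind == "chance":
        return sum(p * value(child) for p, child in body)
    return max(value(child) for child in body.values())


def best_action(node):
    # Only meaningful at max nodes: the action with the highest backed-up value.
    _, actions = node
    return max(actions, key=lambda a: value(actions[a]))


# Second-step max nodes; leaves copied left to right from the slides.
jump_hi = decide(chance(F(2, 3), leaf(0), leaf(0)), chance(F(2, 3), leaf(1), leaf(0)))
jump_lo = decide(chance(F(2, 3), leaf(0), leaf(0)), chance(F(1, 6), leaf(1), leaf(0)))
shake_hi = decide(chance(F(2, 3), leaf(1), leaf(1)), chance(F(1, 6), leaf(2), leaf(1)))
shake_lo = decide(chance(F(2, 3), leaf(0), leaf(0)), chance(F(1, 6), leaf(1), leaf(0)))

root = decide(
    chance(F(2, 3), jump_hi, jump_lo),
    chance(F(1, 6), shake_hi, shake_lo),
)

first = best_action(root)
second = {"jump_hi": best_action(jump_hi), "jump_lo": best_action(jump_lo)}
print(first, value(root), second)
```

Under these assumptions the first action is jump with expected utility 1/2, and shake is the best second action after either outcome of the jump, which agrees with the 2-step plan "jump; shake". The 3-step plan is not reproduced here because the slides do not show the 3-ply tree.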

2 Player Games of Chance

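\[
\text{ExpectiMiniMax}(n) =
\begin{cases}
U(n) & \text{if } n \text{ is a terminal node} \\
\max\{\text{ExpectiMiniMax}(s) \mid s \in \text{children}(n)\} & \text{if } n \text{ is a max node} \\
\min\{\text{ExpectiMiniMax}(s) \mid s \in \text{children}(n)\} & \text{if } n \text{ is a min node} \\
\sum_{s \in \text{children}(n)} P(s)\,\text{ExpectiMiniMax}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]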
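The only change from ExpectiMax is the extra case for min nodes. A minimal sketch follows; the tuple node encoding and the toy game (max moves, a fair coin is flipped, then min replies) are invented for illustration and do not come from the slides.

```python
from fractions import Fraction


def expectiminimax(node):
    """Return the ExpectiMiniMax value of a node.

    Node encoding (an assumption of this sketch):
      ("leaf", utility)
      ("max", [child, ...]) or ("min", [child, ...])
      ("chance", [(probability, child), ...])
    """
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(c) for c in payload)
    if kind == "min":
        return min(expectiminimax(c) for c in payload)
    if kind == "chance":
        return sum(p * expectiminimax(c) for p, c in payload)
    raise ValueError(f"unknown node kind: {kind}")


half = Fraction(1, 2)
# Hypothetical toy game: each of max's two moves leads to a fair coin flip,
# after which min picks the leaf that is worst for max.
left = ("chance", [(half, ("min", [("leaf", 3), ("leaf", 5)])),
                   (half, ("min", [("leaf", 1), ("leaf", 4)]))])
right = ("chance", [(half, ("min", [("leaf", 2), ("leaf", 2)])),
                    (half, ("min", [("leaf", 6), ("leaf", 0)]))])
game = ("max", [left, right])
print(expectiminimax(game))  # 2
```

The left move is worth (3 + 1)/2 = 2 and the right move (2 + 0)/2 = 1, so max chooses left.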

Backgammon

• Branching factor:
  – Chance node: 21
  – Max node: about 20 on average
  – Size of tree: O(c^k m^k)
  – In practice: can search 3 plies

• Neurogammon & TD-Gammon (Tesauro 1995)
  – Learned weights on static evaluation function by playing against itself
  – Use results of games to optimize weights:
    • “Punish” features that were on in losing games
    • “Reward” features that were on in winning games
  – A kind of reinforcement learning
  – Became world’s best backgammon player!
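A rough calculation shows why search depth is so limited. The sketch below plugs the slide's branching factors (c = 21 chance outcomes, m = 20 moves) into the c^k m^k bound; treating k as the number of roll-plus-move rounds is an assumption of this example, and `tree_size` is a hypothetical helper name.

```python
def tree_size(chance_branching, move_branching, plies):
    """Leaf-count estimate (c * m) ** k, matching the O(c^k m^k) bound."""
    return (chance_branching * move_branching) ** plies


for k in (1, 2, 3):
    print(k, tree_size(21, 20, k))
# 1 420
# 2 176400
# 3 74088000
```

Each additional round multiplies the tree by 420, so a fourth round would already exceed 31 billion leaves.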