Games of Chance: Introduction to Artificial Intelligence, COS302, Michael L. Littman, Fall 2001
![Page 1: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/1.jpg)
Games of Chance
Introduction to Artificial Intelligence
COS302
Michael L. Littman
Fall 2001
![Page 2: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/2.jpg)
Administration
Rush hour (10/22).
Today not part of midterm (10/24), just final.
![Page 3: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/3.jpg)
Uncertainty in Search
We’ve assumed everything is known: starting state, neighbors, goals, etc.
Often need to make decisions even though some things are uncertain.
Complicates things…
![Page 4: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/4.jpg)
Types of Uncertainty
Opponent: What will the other player do?
• Minimax
Outcome: Which neighbor do we get?
• Model via probability distribution
State: Where are we now?
• Hidden information
Transition: What are the rules?
• Need to use learning to find out
![Page 5: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/5.jpg)
Nim-Rand
Pile of sticks.
• Lose if take last stick.
• On your turn, take 1 or 2.
• Flip a coin. If H, take 1 more.
Which type of uncertainty?
![Page 6: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/6.jpg)
Value of a Game
Without randomness: maximize your winnings in the worst case.
With randomness: maximize your expected winnings in the worst case.
Want to do well on average.
What games are like this?
![Page 7: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/7.jpg)
Nim-Rand Tree
[Figure: Nim-Rand game tree rooted at (|||)-X. Moves are labeled 1 and 2, coin flips pass through chance nodes (c), and the states shown include (||)-Y, (|)-Y, (|)-X, ()-X, and ()-Y. Leaves are valued +1 or -1.]
![Page 8: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/8.jpg)
Nim-Rand Values
[Figure: the same Nim-Rand tree with values backed up from the leaves (+1, -1) through the internal nodes (+0, +0.5). The value at the root (|||)-X is +0.5.]
![Page 9: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/9.jpg)
Search Model
States, terminal states (G), values for terminal states (V).
X states (maximizer), Y states (minimizer), Z states (chance)
For all s in Z, for all s’ in N(s): P(s’|s) is the probability of reaching s’ from s.
![Page 10: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/10.jpg)
Game Value (no loops)
Gameval(s) = {
    If (G(s)) return V(s)
    Else if s in X
        return max_{s’ in N(s)} Gameval(s’)
    Else if s in Y
        return min_{s’ in N(s)} Gameval(s’)
    Else
        return sum_{s’ in N(s)} P(s’|s) Gameval(s’)
}
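As a rough illustration, the Gameval recursion can be run on Nim-Rand. This is a minimal sketch, not the lecture's code: the state encoding `(sticks, player, coin_pending)`, the helper names `kind`, `value`, and `successors`, and the rule that the coin is skipped once the pile is empty are all assumptions made for the example.

```python
from functools import lru_cache

OTHER = {"X": "Y", "Y": "X"}

# Assumed encoding: (sticks, player, coin_pending).
# coin_pending=False, sticks>0: `player` chooses to take 1 or 2 sticks.
# coin_pending=True:            chance node, `player` has just taken sticks.
# sticks == 0:                  terminal, `player` took the last stick and lost.

def kind(state):
    """Return 'G' (terminal), 'X', 'Y', or 'Z' (chance)."""
    sticks, player, coin_pending = state
    if sticks == 0:
        return "G"
    return "Z" if coin_pending else player

def value(state):
    """V(s) from X's point of view: -1 if X took the last stick, else +1."""
    return -1 if state[1] == "X" else +1

def successors(state):
    """List of (probability, next_state); probability is None at X and Y states."""
    sticks, player, coin_pending = state
    if coin_pending:
        left = sticks - 1  # heads: the same player takes 1 more
        heads = (0, player, False) if left == 0 else (left, OTHER[player], False)
        tails = (sticks, OTHER[player], False)
        return [(0.5, heads), (0.5, tails)]
    moves = []
    for take in (1, 2):
        if take <= sticks:
            left = sticks - take
            # Assumption: no coin flip once the pile is empty.
            moves.append((None, (0, player, False) if left == 0 else (left, player, True)))
    return moves

@lru_cache(maxsize=None)
def gameval(state):
    k = kind(state)
    if k == "G":
        return value(state)
    succ = successors(state)
    if k == "X":
        return max(gameval(s2) for _, s2 in succ)
    if k == "Y":
        return min(gameval(s2) for _, s2 in succ)
    return sum(p * gameval(s2) for p, s2 in succ)  # chance node: expectation

root_value = gameval((3, "X", False))
```

Under these assumptions, `root_value` is the value of the three-stick game with X to move.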
![Page 11: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/11.jpg)
Games with Loops
No known poly time algorithm.
Approximated by value iteration:
For all s, if G(s), L(s) = V(s), else 0
Repeat until changes are small:
    for all s, L(s) = max, min, avg L(s’), s’ in N(s)
    depending on s in X, Y, or Z.
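The value iteration loop might be sketched as follows. The dictionary-based game representation, the synchronous update, the tolerance, and the tiny example game (an X state and a chance state that can cycle between each other) are assumptions for illustration. "avg" is taken to mean the probability-weighted average, and terminal states keep L(s) = V(s).

```python
def value_iteration(states, kind, neighbors, V, tol=1e-9, max_sweeps=10_000):
    """kind[s] is 'G', 'X', 'Y', or 'Z'; neighbors[s] is a list of (P(s'|s), s')."""
    L = {s: (V[s] if kind[s] == "G" else 0.0) for s in states}
    for _ in range(max_sweeps):
        new_L = dict(L)
        for s in states:
            if kind[s] == "G":
                continue  # terminal values stay fixed
            vals = [(p, L[s2]) for p, s2 in neighbors[s]]
            if kind[s] == "X":
                new_L[s] = max(v for _, v in vals)
            elif kind[s] == "Y":
                new_L[s] = min(v for _, v in vals)
            else:
                new_L[s] = sum(p * v for p, v in vals)
        change = max(abs(new_L[s] - L[s]) for s in states)
        L = new_L
        if change < tol:  # "repeat until changes are small"
            break
    return L

# Hypothetical loopy game: X may stop at "draw" (0) or move to chance state "z";
# from "z" the game ends in "win" (+1) or returns to "x", each with probability 0.5.
states = ["x", "z", "win", "draw"]
kind = {"x": "X", "z": "Z", "win": "G", "draw": "G"}
V = {"win": 1.0, "draw": 0.0}
neighbors = {
    "x": [(None, "z"), (None, "draw")],
    "z": [(0.5, "x"), (0.5, "win")],
}
L = value_iteration(states, kind, neighbors, V)
```

In this example the estimates for "x" and "z" climb toward 1 from below, halving the remaining gap every two sweeps, so the loop only approximates the value rather than reaching it exactly.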
![Page 12: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/12.jpg)
Hidden Information
Games like Poker, 2-player bridge, Scrabble™, Diplomacy, Stratego
Don’t fit game tree model, even when chance nodes included.
![Page 13: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/13.jpg)
Pure Strategies
X:
I: 1=L, 4=L
II: 1=L, 4=R
III: 1=R, 4=L
IV: 1=R, 4=R
Y:
I: 2=L, 3=R
II: 2=M, 3=R
III: 2=R, 3=R
[Figure: game tree with decision nodes X-1, Y-2, Y-3, and X-4. Moves are labeled L and R (L, M, and R at Y-2). Leaf values are +7, +3, -1, +5, and +4.]
![Page 14: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/14.jpg)
Matrix Form
Summarizes all of a player's decisions in a single choice for each player, chosen simultaneously

| | X-I | X-II | X-III | X-IV |
|---|---|---|---|---|
| Y-I | 7 | 7 | 2 | 2 |
| Y-II | 3 | 3 | 2 | 2 |
| Y-III | -1 | 4 | 2 | 2 |
![Page 15: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/15.jpg)
Value of Matrix Game
X picks column with largest min
Y picks row with smallest max

| | X-I | X-II | X-III | X-IV |
|---|---|---|---|---|
| Y-I | 7 | 7 | 2 | 2 |
| Y-II | 3 | 3 | 2 | 2 |
| Y-III | -1 | 4 | 2 | 2 |
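The two rules on this slide can be checked mechanically. The sketch below hard-codes the matrix from the slide; the function names and the 0-based indices are choices made for the example.

```python
# Rows are Y's pure strategies (Y-I..Y-III); columns are X's (X-I..X-IV).
M = [
    [7, 7, 2, 2],
    [3, 3, 2, 2],
    [-1, 4, 2, 2],
]

def maximin_column(M):
    """X picks the column whose smallest entry is largest."""
    cols = list(zip(*M))
    mins = [min(c) for c in cols]
    best = max(range(len(cols)), key=lambda j: mins[j])
    return best, mins[best]

def minimax_row(M):
    """Y picks the row whose largest entry is smallest."""
    maxes = [max(r) for r in M]
    best = min(range(len(M)), key=lambda i: maxes[i])
    return best, maxes[best]

x_choice = maximin_column(M)  # (column index, guaranteed payoff)
y_choice = minimax_row(M)     # (row index, payoff cap)
```

Here both rules land on the same entry (column X-II, row Y-II, payoff 3), so neither player gains by deviating.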
![Page 16: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/16.jpg)
Minimax
Von Neumann proved that in a zero-sum matrix game, minimax = maximin.
Given perfect information (no state uncertainty), there exists an optimal pure strategy for each player.
![Page 17: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/17.jpg)
Game w/ Chance Nodes
[Figure: game tree. At X-1, L leads to a chance node (0.5: +4, 0.5: -20) and R leads to Y-3. At Y-3, L leads to a chance node (0.8: -5, 0.2: +10) and R leads to +3.]
Use expected values

| | X-I (L) | X-II (R) |
|---|---|---|
| Y-I (L) | -8 | -2 |
| Y-II (R) | -8 | +3 |
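The matrix entries can be reproduced by taking expectations at the chance nodes. The worked arithmetic below assumes that the 0.5/0.5 chance node sits under X's move L with leaves +4 and -20, and that the 0.8/0.2 chance node sits under Y's move L with leaves -5 and +10. That reading is inferred from the matrix, not stated on the slide.

```latex
\begin{aligned}
\text{X-I (L), either Y strategy:}\quad & 0.5(+4) + 0.5(-20) = -8 \\
\text{X-II (R) vs. Y-I (L):}\quad & 0.8(-5) + 0.2(+10) = -2 \\
\text{X-II (R) vs. Y-II (R):}\quad & +3
\end{aligned}
```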
![Page 18: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/18.jpg)
More General Matrices
What game tree leads to this matrix?
Does von Neumann’s theorem still hold?

| | X-I (L) | X-II (R) |
|---|---|---|
| Y-I (L) | 1 | 0 |
| Y-II (R) | 0 | 1 |
![Page 19: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/19.jpg)
Hidden Info. Matrices
X picks L or R, keeping the choice hidden from Y.
Y makes a choice.
X’s choice is revealed and game ends.

| | X-I (L) | X-II (R) |
|---|---|---|
| Y-I (L) | 1 | 0 |
| Y-II (R) | 0 | 1 |
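One way to see what changes with hidden information is to apply the pure-strategy rules (largest column min for X, smallest row max for Y) to this matrix. The snippet is only a sketch of that check, and the variable names are illustrative.

```python
# Rows: Y-I (L), Y-II (R); columns: X-I (L), X-II (R).
M = [[1, 0],
     [0, 1]]

# Best payoff X can guarantee by committing to one column.
maximin = max(min(col) for col in zip(*M))
# Lowest payoff Y can hold X to by committing to one row.
minimax = min(max(row) for row in M)

pure_gap = minimax - maximin
```

With pure strategies the two numbers no longer agree here, which is the gap that the mixed strategies later in the deck address.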
![Page 20: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/20.jpg)
Micro Poker
X is dealt high or low card, holds/folds.
Y folds/sees.
High card wins
Y can’t see X’s card.
[Figure: game tree. A chance node (c) deals X a low or high card with probability 0.5 each (nodes X-L, X-H). X-L folds (-20) or holds; X-H holds. After a hold, Y folds (+10) or sees (-40 against the low card, +30 against the high card).]
![Page 21: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/21.jpg)
Matrix Form
Player X can guarantee itself +1 on average. How?
It can even announce its strategy.

| | X-I (fold) | X-II (hold) |
|---|---|---|
| Y-I (fold) | -5 | +10 |
| Y-II (see) | +5 | -5 |
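The four entries can be derived from the Micro Poker tree by averaging over the deal. The sketch below assumes a particular reading of the tree: X's strategy only governs the low card (X always holds a high card), a fold by X on a low card costs 20 before Y acts, and Y's payoffs after a hold are +10 on a fold, -40 on seeing a low card, and +30 on seeing a high card. The function names are hypothetical.

```python
P_LOW = P_HIGH = 0.5

def payoff(card, x_action, y_action):
    """Leaf payoff to X under the assumed tree layout."""
    if card == "low" and x_action == "fold":
        return -20  # X folds the low card; Y's choice never matters
    if y_action == "fold":
        return +10  # Y folds after X holds
    return -40 if card == "low" else +30  # Y sees; high card wins

def entry(x_low_action, y_action):
    """Expected payoff when X plays x_low_action on a low card and holds a high card."""
    return (P_LOW * payoff("low", x_low_action, y_action)
            + P_HIGH * payoff("high", "hold", y_action))

# Keys are (Y's action, X's action on a low card), matching the table's rows and columns.
matrix = {(y, x): entry(x, y) for y in ("fold", "see") for x in ("fold", "hold")}
```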
![Page 22: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/22.jpg)
Mixed Strategies
Pick a number p.
X: With prob. p, fold; else hold.
Since Y doesn’t know what’s coming, the response will sometimes work, sometimes not.
![Page 23: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/23.jpg)
Guess a Probability
X announces p=1/3.
Y’s pick?

| | X-I (fold) | X-II (hold) |
|---|---|---|
| Y-I (fold) | -5 | +10 |
| Y-II (see) | +5 | -5 |

Fold: +5
See: -1 2/3
Y picks: see
![Page 24: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/24.jpg)
Guess a Probability
X announces p=2/3.
Y’s pick?

| | X-I (fold) | X-II (hold) |
|---|---|---|
| Y-I (fold) | -5 | +10 |
| Y-II (see) | +5 | -5 |

Fold: +0
See: +1 2/3
Y picks: fold
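The two announcements tried so far (p = 1/3 and p = 2/3) follow the same recipe: compute X's expected payoff against each of Y's replies, then let Y take the smaller one. A minimal sketch, using exact fractions and an illustrative function name:

```python
from fractions import Fraction

# X's payoff, indexed as M[Y's action][X's action on a low card].
M = {"fold": {"fold": -5, "hold": 10},
     "see":  {"fold": 5,  "hold": -5}}

def y_best_response(p):
    """X folds with probability p. Return Y's payoff-minimizing reply and both expectations."""
    expected = {y: p * row["fold"] + (1 - p) * row["hold"] for y, row in M.items()}
    return min(expected, key=expected.get), expected

reply_a, payoffs_a = y_best_response(Fraction(1, 3))
reply_b, payoffs_b = y_best_response(Fraction(2, 3))
```

Y's reply flips between the two announcements, which suggests the best p for X lies somewhere in between.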
![Page 25: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/25.jpg)
All Strategies
What should X pick for p to maximize its worst case?
p=0.6
Payoff +1
[Plot: X's expected payoff (vertical axis, -5 to 10) against p (horizontal axis, 0 to 1), with one line for Y's reply "see" and one for "fold".]
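The value p = 0.6 is where the two payoff lines cross. Using the Micro Poker matrix entries, with p the probability that X folds:

```latex
\begin{aligned}
\text{Y folds:}\quad & -5p + 10(1-p) = 10 - 15p \\
\text{Y sees:}\quad & 5p - 5(1-p) = 10p - 5 \\
10 - 15p = 10p - 5 \;&\Rightarrow\; p = 0.6, \qquad \text{payoff} = 10 - 15(0.6) = 1
\end{aligned}
```

To the left of the crossing Y's best reply is "see", to the right it is "fold", so X's worst case peaks exactly at the crossing.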
![Page 26: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/26.jpg)
Randomizing Y
If Y randomizes, the answer is the same.
No matter what, X can guarantee itself +1.
[Plot: X's expected payoff (vertical axis, -5 to 10) against p (horizontal axis, 0 to 1), with lines for "see" and "fold".]
![Page 27: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/27.jpg)
Bluffing
[Figure: the Micro Poker game tree. A chance node (c) deals X a low or high card with probability 0.5 each (nodes X-L, X-H). X-L folds (-20) or holds; X-H holds. After a hold, Y folds (+10) or sees (-40 or +30).]
X: On a low card, bluff with prob. 0.4.
Y: On hold, fold with prob. 0.4.
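Y's probability can be checked the same way as X's, using the Micro Poker matrix entries. Here q (a symbol introduced for this note) is the probability that Y folds after a hold:

```latex
\begin{aligned}
\text{X folds low card:}\quad & -5q + 5(1-q) = 5 - 10q \\
\text{X holds low card (bluffs):}\quad & 10q - 5(1-q) = 15q - 5 \\
5 - 10q = 15q - 5 \;&\Rightarrow\; q = 0.4, \qquad \text{payoff} = 5 - 10(0.4) = 1
\end{aligned}
```

At q = 0.4 X gets +1 whether or not it bluffs, matching the +1 that X can guarantee itself with p = 0.6 (that is, bluffing with probability 0.4).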
![Page 28: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/28.jpg)
Solving 2x2 Game
X-I with prob. $p$
X’s expected gain
vs. Y-I: $m_{11}p + m_{12}(1-p)$
vs. Y-II: $m_{21}p + m_{22}(1-p)$

| | X-I | X-II |
|---|---|---|
| Y-I | $m_{11}$ | $m_{12}$ |
| Y-II | $m_{21}$ | $m_{22}$ |

Maximize the minimum.
Try p=0, p=1, where lines meet.
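The three-candidate recipe can be written out directly. This is a sketch rather than a general solver: the function name is made up, the arguments are assumed to be integers (so that exact fractions can be used), and the crossing point is only considered when it falls in [0, 1].

```python
from fractions import Fraction

def solve_2x2(m11, m12, m21, m22):
    """Return (p, value): X plays X-I with probability p to maximize its worst-case gain."""
    def worst_case(p):
        vs_y1 = m11 * p + m12 * (1 - p)
        vs_y2 = m21 * p + m22 * (1 - p)
        return min(vs_y1, vs_y2)

    candidates = [Fraction(0), Fraction(1)]
    # The lines meet where m11*p + m12*(1-p) = m21*p + m22*(1-p).
    denom = (m11 - m12) - (m21 - m22)
    if denom != 0:
        p_meet = Fraction(m22 - m12, denom)
        if 0 <= p_meet <= 1:
            candidates.append(p_meet)
    best = max(candidates, key=worst_case)
    return best, worst_case(best)

poker = solve_2x2(-5, 10, 5, -5)   # Micro Poker matrix
hidden = solve_2x2(1, 0, 0, 1)     # hidden-choice matrix
```

Because the minimum of two lines is concave and piecewise linear in p, its maximum over [0, 1] has to sit at an endpoint or at the crossing, which is why three candidates suffice.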
![Page 29: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/29.jpg)
Solving General mxn
Linear program: $p_1, \ldots, p_n$.
$p_1 + \cdots + p_n = 1$, $p_i \ge 0$
Maximize X’s gain, $g$
vs Y-I: $m_{11} p_1 + \cdots + m_{n1} p_n \ge g$
vs Y-II: $m_{12} p_1 + \cdots + m_{n2} p_n \ge g$
…
Against all Y strategies.
(Here $m_{ij}$ is X's gain when X plays its strategy $i$ and Y plays its strategy $j$.)
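The sketch below does not solve the linear program (that needs an LP solver). It only evaluates the quantity being maximized: for a candidate mixed strategy p, the largest g that satisfies every "vs Y" constraint is the smallest of the expected gains. The function names are hypothetical, and `m[i][j]` is assumed to be X's gain when X plays strategy i and Y plays strategy j.

```python
from fractions import Fraction

def guaranteed_gain(m, p):
    """Largest g with sum_i m[i][j] * p[i] >= g for every Y strategy j."""
    assert sum(p) == 1 and all(pi >= 0 for pi in p), "p must be a probability vector"
    n_y = len(m[0])
    return min(sum(m[i][j] * p[i] for i in range(len(p))) for j in range(n_y))

def is_feasible(m, p, g):
    """Check whether (p, g) satisfies all the constraints of the linear program."""
    return guaranteed_gain(m, p) >= g

# Micro Poker in this indexing: rows are X's strategies (fold, hold),
# columns are Y's strategies (fold, see).
m = [[-5, 5],
     [10, -5]]
g_opt = guaranteed_gain(m, [Fraction(3, 5), Fraction(2, 5)])
g_third = guaranteed_gain(m, [Fraction(1, 3), Fraction(2, 3)])
```

An LP solver searches over all p for the one with the largest `guaranteed_gain`. For Micro Poker that search ends at p = (0.6, 0.4).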
![Page 30: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/30.jpg)
Issues
Can we solve poker?
• More than 2 players
• Not zero sum (collude)
• Huge state space
Poker: Opponent modeling
Bridge: Use simulation to approximate
![Page 31: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/31.jpg)
What to Learn
Minimax value in games of chance and the DFS algorithm for computing it.
Converting games to matrix form.
Solve 2x2 game.
![Page 32: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/32.jpg)
Homework 5 (due 11/7)
1. The value iteration algorithm from the Games of Chance lecture can be applied to deterministic games with loops. Argue that it produces the same answer as the “Loopy” algorithm from the Game Tree lecture.
2. Write the matrix form of the game tree below.
![Page 33: Games of Chance Introduction to Artificial Intelligence COS302 Michael L. Littman Fall 2001](https://reader030.vdocuments.us/reader030/viewer/2022032522/56649d605503460f94a419b5/html5/thumbnails/33.jpg)
Game Tree
[Figure: game tree with decision nodes X-1, Y-2, Y-3, and X-4, each with moves L and R. Leaf values are +2, -1, +4, +5, and +2.]