search with other agents - sharif
TRANSCRIPT
Soleymani
Search with Other Agents
CE417: Introduction to Artificial Intelligence
Sharif University of Technology
Fall 2020
"Artificial Intelligence: A Modern Approach", 3rd Edition, Chapter 5
Most slides have been adopted from Klein and Abbeel, CS188, UC Berkeley.
Adversarial Games
2
Zero-Sum Games
Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended the 40-year reign of human champion Marion Tinsley using a complete 8-piece endgame. 2007: Checkers solved!
Chess: 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic.
Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods.
3
Zero-Sum Games
Go: 2016: AlphaGo defeats human champion. Uses Monte Carlo Tree Search and an evaluation function learned by deep learning.
4
Many different kinds of games!
Axes:
Deterministic or stochastic?
One, two, or more players?
Zero sum?
Perfect information (can you see the state)?
Want algorithms for calculating a strategy (policy) which recommends a move from each state
Types of Games
5
Zero-Sum Games
Zero-Sum Games
Agents have opposite utilities (values on outcomes)
Lets us think of a single value that one maximizes and the other minimizes
Adversarial, pure competition
General Games
Agents have independent utilities (values on outcomes)
Cooperation, indifference, competition, and more are all possible
More later on non-zero-sum games
6
Primary assumptions
We start with these games:
Two-player
Turn taking
agents act alternately
Zero-sum
agents' goals are in conflict: sum of utility values at the end of the game is zero or constant
Perfect information
fully observable
7
Examples: Tic-tac-toe, chess, checkers
Deterministic Games
Many possible formalizations, one is:
States: S (start at s0)
Players: P = {1...N} (usually take turns)
Actions: A (may depend on player / state)
Transition Function: S × A → S
Terminal Test: S → {t, f}
Terminal Utilities: S × P → R
Solution for a player is a policy: S → A (see the sketch below)
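A minimal Python sketch of this formalization, using tic-tac-toe as the example (all class and method names here are illustrative, not prescribed by the slides):

class TicTacToe:
    # Two-player, turn-taking, zero-sum, perfect-information game.
    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
                 (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def initial_state(self):
        return (('.',) * 9, 0)               # empty board, player 0 (X) moves

    def player(self, state):                 # whose turn it is
        return state[1]

    def actions(self, state):                # A: may depend on the state
        board, _ = state
        return [i for i, c in enumerate(board) if c == '.']

    def result(self, state, action):         # transition function: S x A -> S
        board, p = state
        cells = list(board)
        cells[action] = 'XO'[p]
        return (tuple(cells), 1 - p)

    def is_terminal(self, state):            # terminal test: S -> {t, f}
        return self._winner(state) is not None or '.' not in state[0]

    def utility(self, state, p):             # terminal utilities: S x P -> R
        w = self._winner(state)
        return 0 if w is None else (+1 if w == p else -1)

    def _winner(self, state):
        board, _ = state
        for i, j, k in self.WIN_LINES:
            if board[i] != '.' and board[i] == board[j] == board[k]:
                return 'XO'.index(board[i])
        return None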
8
Single-Agent Trees
[Figure: single-agent search tree; root value 8, terminal values 2 0 2 6 4 6 …]
9
Value of a State
[Figure: tree annotated with values; non-terminal states (root value 8) vs. terminal states (2 0 2 6 4 6 …)]
Value of a state: The best achievable outcome (utility) from that state
10
Adversarial Search
11
Game tree (tic-tac-toe)
12
Two players: P1 and P2 (P1 is now searching to find a good move)
Zero-sum games: P1 gets U(t), P2 gets C − U(t) for terminal node t
Utilities from the point of view of P1
1 ply = half a move
[Figure: tic-tac-toe game tree; levels alternate between P1 (X) and P2 (O)]
Adversarial Game Trees
[Figure: adversarial game tree with terminal values -20 -8 … -18 -5 … -10 +4 … -20 +8]
13
Minimax Values
[Figure: minimax values +8, -10, -5, -8 shown for terminal states, states under the agent's control (max), and states under the opponent's control (min)]
14
Tic-Tac-Toe Game Tree
15
Adversarial Search (Minimax)
Opponent is assumed optimal
Minimax search:
A state-space search tree
Players alternate turns
Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
[Figure: minimax tree; max root with value 5 over min nodes with values 2 and 5; terminal values 8 2 5 6. Terminal values are part of the game.]
16
Minimax
MINIMAX(s) gives the best achievable outcome of being in state s (assumption: optimal opponent)
17
$$\text{MINIMAX}(s)=\begin{cases}\text{UTILITY}(s,\text{MAX}) & \text{if TERMINAL-TEST}(s)\\ \max_{a\in\text{ACTIONS}(s)}\text{MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}\text{MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MIN}\end{cases}$$
(MINIMAX(s) is the utility of being in state s)
[Figure: example tree; min-node values 3, 2, 2 and root max value 3]
Minimax (Cont.)
18
Optimal strategy: move to the state with the highest minimax value
Best achievable payoff against best play
Maximizes the worst-case outcome for MAX
It works for zero-sum games
Minimax Implementation
def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v
19
Minimax Implementation (Dispatch)
def value(state):
if the state is a terminal state: return the stateβs utility
if the next agent is MAX: return max-value(state)
if the next agent is MIN: return min-value(state)
def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
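The same dispatch scheme as runnable Python, assuming a game object with the illustrative interface sketched earlier (player, actions, result, is_terminal, utility):

import math

MAX_PLAYER = 0   # assumption: player 0 is MAX, player 1 is MIN

def value(game, state):
    if game.is_terminal(state):
        return game.utility(state, MAX_PLAYER)   # utilities from MAX's view
    if game.player(state) == MAX_PLAYER:
        return max_value(game, state)
    return min_value(game, state)

def max_value(game, state):
    v = -math.inf
    for a in game.actions(state):
        v = max(v, value(game, game.result(state, a)))
    return v

def min_value(game, state):
    v = math.inf
    for a in game.actions(state):
        v = min(v, value(game, game.result(state, a)))
    return v

def minimax_decision(game, state):
    # choose the action whose successor has the highest minimax value
    return max(game.actions(state),
               key=lambda a: value(game, game.result(state, a)))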
20
Minimax algorithm
21
Depth first search
function MINIMAX-DECISION(state) returns an action
    return argmax over a in ACTIONS(state) of MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a)))
    return v

function MIN-VALUE(state) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a)))
    return v
Properties of minimax
Complete? Yes (when tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity: O(b^m)
Space complexity: O(bm) (depth-first exploration)
For chess, b ≈ 35, m > 50 for reasonable games
Finding exact solution is completely infeasible
22
Minimax Properties
Optimal against a perfect player. Otherwise?
[Figure: max root over min nodes; terminal values 10 10 9 100]
23
Video of Demo Min vs. Exp (Min)
24
Video of Demo Min vs. Exp (Exp)
25
Game Tree Pruning
26
Pruning
27
Correct minimax decision without looking at every node in the game tree
α-β pruning
Branch & bound algorithm
Prunes away branches that cannot influence the final decision
α-β pruning example
28
α-β pruning example
29
α-β pruning example
30
α-β pruning example
31
α-β pruning example
32
Alpha-Beta Pruning
General configuration (MIN version)
We're computing the MIN-VALUE at some node n
We're looping over n's children
Who cares about n's value? MAX
Let α be the best value that MAX can get at any choice point along the current path from the root
If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)
MAX version is symmetric
[Figure: alternating MAX/MIN layers on the path from the root to node n; α is MAX's best value so far along that path]
33
α-β pruning
34
Assuming depth-first generation of tree
We prune node n when the player has a better choice m at (the parent or) any ancestor of n
Two types of pruning (cuts):
pruning of max nodes (α-cuts)
pruning of min nodes (β-cuts)
Why is it called α-β?
α: value of the best (highest) choice found so far at any choice point along the path for MAX
β: value of the best (lowest) choice found so far at any choice point along the path for MIN
Updating α and β during the search process
For a MAX node, once the value of this node is known to be more than the current β (v ≥ β), its remaining branches are pruned.
For a MIN node, once the value of this node is known to be less than the current α (v ≤ α), its remaining branches are pruned.
35
Alpha-Beta Quiz 2
36
[Figure: quiz tree with annotated node values and bounds: 10, 10, ≥100, 2, ≤2]
37
α-β pruning (another example)
38
[Figure: pruning example; leaf values 1, 5, 2, 3 with annotated bounds ≤2 and ≥5]
Alpha-Beta Implementation
def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v
α: MAX's best option on path to root
β: MIN's best option on path to root
39
40
function ALPHA-BETA-SEARCH(state) returns an action
    v ← MAX-VALUE(state, -∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← -∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
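For reference, a runnable Python sketch of the same α-β search, again against the illustrative game interface used above (an assumed interface, not the official pseudocode):

import math

MAX_PLAYER = 0   # assumption: player 0 is the maximizer

def alpha_beta_search(game, state):
    best_action, alpha, beta = None, -math.inf, math.inf
    for a in game.actions(state):
        v = min_value(game, game.result(state, a), alpha, beta)
        if v > alpha:                       # the root behaves as a MAX node
            alpha, best_action = v, a
    return best_action

def max_value(game, state, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, MAX_PLAYER)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game, game.result(state, a), alpha, beta))
        if v >= beta:                       # MIN above will never allow this
            return v
        alpha = max(alpha, v)
    return v

def min_value(game, state, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state, MAX_PLAYER)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game, game.result(state, a), alpha, beta))
        if v <= alpha:                      # MAX above already has better
            return v
        beta = min(beta, v)
    return v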
Alpha-Beta Pruning Properties
This pruning has no effect on the minimax value computed for the root!
Values of intermediate nodes might be wrong
Important: children of the root may have the wrong value
So the most naïve version won't let you do action selection
Good child ordering improves effectiveness of pruning
With "perfect ordering":
Time complexity drops to O(b^(m/2))
Doubles solvable depth!
Full search of, e.g., chess is still hopeless…
This is a simple example of metareasoning (computing about what to compute)
[Figure: max root over min nodes; terminal values 10 10 0]
41
Order of moves
Good move ordering improves effectiveness of pruning
Best order: time complexity is O(b^(m/2))
42
Computational time limit (example)
43
100 secs is allowed for each move (game rule)
10^4 nodes/sec (processor speed)
We can explore just 10^6 nodes for each move
b^m = 10^6, b = 35 ⟹ m = 4
(4-ply look-ahead is a hopeless chess player!)
α-β pruning reaches about depth 8: a decent chess program
It is still hopeless
Resource Limits
44
Resource Limits
Problem: In realistic games, cannot search to leaves!
Solution: Depth-limited search. Instead, search only to a limited depth in the tree
Replace terminal utilities with an evaluation function for non-terminal positions
Guarantee of optimal play is gone
More plies makes a BIG difference
Use iterative deepening for an anytime algorithm
[Figure: depth-limited tree; max root over min nodes with values -2 and 4, leaf estimates -1 -2 4 9, deeper subtrees marked "?"]
45
Computational time limit: Solution
46
Cut off the search and apply a heuristic evaluation function
cutoff test: turns non-terminal nodes into terminal leaves
Cutoff test instead of terminal test (e.g., depth limit)
evaluation function: estimated desirability of a state
Heuristic function evaluation instead of utility function
This approach does not guarantee optimality.
Heuristic minimax
47
$$\text{H-MINIMAX}(s,d)=\begin{cases}\text{EVAL}(s,\text{MAX}) & \text{if CUTOFF-TEST}(s,d)\\ \max_{a\in\text{ACTIONS}(s)}\text{H-MINIMAX}(\text{RESULT}(s,a),\,d+1) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}\text{H-MINIMAX}(\text{RESULT}(s,a),\,d+1) & \text{if PLAYER}(s)=\text{MIN}\end{cases}$$
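As a sketch, the depth-limited variant is a small change to the minimax code shown earlier; eval_fn is a caller-supplied heuristic (an assumed parameter, not defined in the slides):

MAX_PLAYER = 0   # assumption: player 0 is the maximizer

def h_minimax(game, state, d, depth_limit, eval_fn):
    if game.is_terminal(state):
        return game.utility(state, MAX_PLAYER)
    if d >= depth_limit:                    # cutoff test: depth limit reached
        return eval_fn(state)               # estimated desirability of state
    children = [h_minimax(game, game.result(state, a), d + 1,
                          depth_limit, eval_fn)
                for a in game.actions(state)]
    return max(children) if game.player(state) == MAX_PLAYER else min(children)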
Depth Matters
Evaluation functions are always imperfect
The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters
An important example of the tradeoff between complexity of features and complexity of computation
48
Video of Demo Limited Depth (2)
49
Video of Demo Limited Depth (10)
50
Evaluation Functions
51
Evaluation functions
52
For terminal states, it should order them in the same way as the true utility function.
For non-terminal states, it should be strongly correlated with the actual chances of winning.
It must not be computationally expensive.
Evaluation functions based on features
53
Example: features for evaluation of the chess states
Number of each kind of piece: number of white pawns, black pawns, white queens, black queens, etc.
King safety
Good pawn structure
β¦
Evaluation Functions
Evaluation functions score non-terminals in depth-limited search
Ideal function: returns the actual minimax value of the position
In practice: typically a weighted linear sum of features:
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
e.g. f1(s) = (num white queens − num black queens), etc.
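A sketch of such a weighted linear evaluation in Python; the features and weights below are made-up illustrations, not tuned values:

def queen_balance(state):                  # f1(s)
    return state.num_white_queens - state.num_black_queens

def pawn_balance(state):                   # f2(s), another illustrative feature
    return state.num_white_pawns - state.num_black_pawns

FEATURES = [queen_balance, pawn_balance]   # f1 ... fn
WEIGHTS = [9.0, 1.0]                       # w1 ... wn (assumed values)

def evaluate(state):
    # Eval(s) = w1*f1(s) + ... + wn*fn(s)
    return sum(w * f(state) for w, f in zip(WEIGHTS, FEATURES))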
54
Evaluation for Pacman
[Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)]
55
Video of Demo Thrashing (d=3)
56
Why Pacman Starves
A danger of replanning agents! He knows his score will go up by eating the dot now (west, east)
He knows his score will go up just as much by eating the dot later (east, west)
There are no point-scoring opportunities after eating the dot (within the horizon, two here)
Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning!
57
Fixing the evaluation function
Evaluation function:
# of eaten dots + 1/(distance to the closest dot)
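A sketch of this fixed evaluation function in Python; the state fields and the distance helper are hypothetical names for illustration:

def manhattan_distance(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def evaluate(state):
    # state.eaten_dots, state.remaining_dots, state.pacman_pos: assumed fields
    if not state.remaining_dots:
        return state.eaten_dots + 1.0       # every dot eaten: best possible
    closest = min(manhattan_distance(state.pacman_pos, dot)
                  for dot in state.remaining_dots)
    # the 1/closest term makes eating sooner strictly better than waiting
    return state.eaten_dots + 1.0 / closest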
58
Video of Demo Thrashing -- Fixed (d=3)
59
Video of Demo Smart Ghosts (Coordination)
60
Video of Demo Smart Ghosts (Coordination) - Zoomed In
61
Other Game Types
62
Multi-Agent Utilities
What if the game is not zero-sum, or has multiple players?
Generalization of minimax: Terminals have utility tuples
Node values are also utility tuples
Each player maximizes its own component
Can give rise to cooperation and competition dynamically…
[Figure: three-player game tree with utility triples at the leaves (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5); root value (1,6,6)]
63
Uncertain Outcomes
64
Probabilities
65
Reminder: Probabilities
A random variable represents an event whose outcome is unknown
A probability distribution is an assignment of weights to outcomes
Example: Traffic on freeway
Random variable: T = whether there's traffic
Outcomes: T in {none, light, heavy}
Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
Some laws of probability (more later):
Probabilities are always non-negative
Probabilities over all possible outcomes sum to one
As we get more evidence, probabilities may change:
P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
We'll talk about methods for reasoning and updating probabilities later
66
The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes
Example: How long to get to the airport?
Reminder: Expectations
Probability: 0.25, 0.50, 0.25; Time: 20 min, 30 min, 60 min
Expected time = 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
67
Mixed Layer Types
E.g. Backgammon
Expectiminimax
Environment is an extra "random agent" player that moves after each min/max agent
Each node computes the appropriate combination of its children
68
Stochastic games
69
$$\text{EXPECT-MINIMAX}(s)=\begin{cases}\text{UTILITY}(s,\text{MAX}) & \text{if TERMINAL-TEST}(s)\\ \max_{a\in\text{ACTIONS}(s)}\text{EXPECT-MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MAX}\\ \min_{a\in\text{ACTIONS}(s)}\text{EXPECT-MINIMAX}(\text{RESULT}(s,a)) & \text{if PLAYER}(s)=\text{MIN}\\ \sum_{r}P(r)\,\text{EXPECT-MINIMAX}(\text{RESULT}(s,r)) & \text{if PLAYER}(s)=\text{CHANCE}\end{cases}$$
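A compact Python sketch of this recursion; it assumes the game interface exposes chance nodes through a hypothetical outcomes(state) method that yields (probability, successor-state) pairs:

MAX_PLAYER = 0       # assumption: player 0 is the maximizer
CHANCE = "chance"    # assumption: game.player(s) returns this at chance nodes

def expect_minimax(game, state):
    if game.is_terminal(state):
        return game.utility(state, MAX_PLAYER)
    turn = game.player(state)
    if turn == CHANCE:
        return sum(p * expect_minimax(game, s)
                   for p, s in game.outcomes(state))
    children = [expect_minimax(game, game.result(state, a))
                for a in game.actions(state)]
    return max(children) if turn == MAX_PLAYER else min(children)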
Stochastic games: Backgammon
70
Stochastic games
71
Expected utility: Chance nodes take the average (expectation) over all possible outcomes.
It is consistent with the definition of rational agents trying to maximize expected utility.
[Figure: chance nodes over leaves 2, 3 and 1, 4 with values 2.1 and 1.3: the average of the values weighted by their probabilities]
Evaluation functions for stochastic games
72
An order-preserving transformation on leaf values is not a sufficient condition.
The evaluation function must be a positive linear transformation of the expected utility of a position.
Properties of search space for stochastic games
73
O(b^m n^m): Backgammon: b ≈ 20 (can be up to 4000 for double dice rolls), n = 21 (no. of different possible dice rolls)
3 plies is manageable (≈ 10^8 nodes)
Probability of reaching a given node decreases enormously with increasing depth (multiplying probabilities)
Forming detailed plans of action may be pointless
Limiting depth is not so damaging, particularly when the probability values (for each non-deterministic situation) are close to each other
But pruning is not straightforward.
Example: Backgammon
Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon: about 20 legal moves
Depth 2 = 20 × (21 × 20)^3 = 1.2 × 10^9
As depth increases, probability of reaching a given search node shrinks
So usefulness of search is diminished
So limiting depth is less damaging
But pruning is trickierβ¦
Historic AI: TD-Gammon uses depth-2 search + very good evaluation function + reinforcement learning
Image: Wikipedia
74
1st AI world champion in any game!
State-of-the-art game programs
Chess (b ≈ 35): In 1997, Deep Blue defeated Kasparov.
ran on a parallel computer doing alpha-beta search.
reached depth 14 plies routinely, with techniques to extend the effective search depth
Hydra: reaches depth 18 plies using more heuristics.
Checkers (b < 10): Chinook (ran on a regular PC and uses alpha-beta search) ended the 40-year reign of human world champion Tinsley in 1994.
Since 2007, Chinook has been able to play perfectly by using alpha-beta search combined with a database of 39 trillion endgame positions.
75
State-of-the-art game programs (Cont.)
76
Othello (b is usually between 5 and 15): Logistello defeated the human world champion by six games to none in 1997.
Human champions are no match for computers at Othello.
Go (b > 300): Human champions refuse to compete against computers (current programs are still at advanced amateur level).
MOGO avoided alpha-beta search and used Monte Carlo rollouts.
AlphaGo (2016) has beaten professionals without handicaps.
State-of-the-art game programs (Cont.)
77
Backgammon (stochastic): TD-Gammon (1992) was competitive with top human players.
Depth 2 or 3 search along with a good evaluation function developed by learning methods
Bridge (partially observable, multiplayer): In 1998, GIB was 12th in a field of 35 in the par contest at the human world championship.
In 2005, Jack defeated three out of seven top champion pairs. Overall, it lost by a small margin.
Scrabble (partially observable & stochastic): In 2006, Quackle defeated the former world champion 3-2.
Utilities & Expectimax
Utilities
79
Utilities
Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
Where do utilities come from? In a game, may be simple (+1/-1)
Utilities summarize the agentβs goals
Theorem: any "rational" preferences can be summarized as a utility function
We hard-wire utilities and let behaviors emerge
Why don't we let agents pick utilities?
Why don't we prescribe behaviors?
80
Utilities: Uncertain Outcomes
[Figure: getting ice cream; choose "Get Single" or "Get Double", with uncertain outcomes "Oops" and "Whew!"]
81
Expectimax Search (One Player Game)
Why wouldn't we know what the result of an action will be?
Explicit randomness: rolling dice
Unpredictable opponents: the ghosts respond randomly
Actions can fail: when moving a robot, wheels might slip
Values should now reflect average-case (expectimax) outcomes, not worst-case outcomes
Expectimax search: compute the average score under optimal play
Max nodes as in minimax search
Chance nodes are like min nodes but the outcome is uncertain
Calculate their expected utilities, i.e. take the weighted average (expectation) of children
Soon, we'll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
[Figure: expectimax tree; max root over chance nodes with values 10, 4, 5, 7; leaf values 10 10 9 100]
82
Expectimax Pseudocode: One Player
def value(state):
if the state is a terminal state: return the stateβs utility
if the next agent is MAX: return max-value(state)
if the next agent is EXP: return exp-value(state)
def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v
83
Expectimax Pseudocode
def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

[Figure: chance node with successor values 8, 24, -12 and probabilities 1/2, 1/3, 1/6]
v = (1/2) (8) + (1/3) (24) + (1/6) (-12) = 10
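The same computation in a few lines of Python, just to confirm the arithmetic:

probs = [1/2, 1/3, 1/6]     # probabilities of the three outcomes
vals = [8, 24, -12]         # corresponding successor values
v = sum(p * x for p, x in zip(probs, vals))
print(v)                    # ≈ 10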
84
Expectimax Pruning?
[Figure: expectimax tree with leaf values 12, 9, 3, 2; can we prune at chance nodes?]
85
What Utilities to Use?
For worst-case minimax reasoning, the terminal function's scale doesn't matter
We just want better states to have higher evaluations (get the ordering right)
We call this insensitivity to monotonic transformations
For average-case expectimax reasoning, we need magnitudes to be meaningful
[Figure: leaf values 0, 40, 20, 30; applying x^2 gives 0, 1600, 400, 900]
86
Expectimax Search
Why wouldn't we know what the result of an action will be?
Explicit randomness: rolling dice
Unpredictable opponents: the ghosts respond randomly
Unpredictable humans: humans are not perfect
Actions can fail: when moving a robot, wheels might slip
Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
Expectimax search: compute the average score under optimal play
Max nodes as in minimax search
Chance nodes are like min nodes but the outcome is uncertain
Calculate their expected utilities, i.e. take the weighted average (expectation) of children
Later, we'll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
[Demo: min vs exp (L7D1,2)]
87
Worst-Case vs. Average Case (One-Player Game)
[Figure: max root over min nodes; terminal values 10 10 9 100]
Idea: Uncertain outcomes controlled by chance, not an adversary!
88
Video of Demo Minimax vs Expectimax (Min)
89
Video of Demo Minimax vs Expectimax (Exp)
90
What Probabilities to Use?
Having a probabilistic belief about another agentβs action
does not mean that the agent is flipping any coins!
92
In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
could be a simple uniform distribution (roll a die)
could be sophisticated and require a great deal of computation
a chance node for any outcome out of our control: opponent or environment
The model might say that adversarial actions are likely!
For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes
Quiz: Informed Probabilities
Let's say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
Question: What tree search should you use?
Answer: Expectimax!
To figure out EACH chance node's probabilities, you have to run a simulation of your opponent
This kind of thing gets very slow very quickly
Even worse if you have to simulate your opponent simulating you…
… except for minimax and maximax, which have the nice property that it all collapses into one game tree
This is basically how you would model a human, except for their utility: their utility might be the same as yours (i.e. you try to help them, but they are depth 2 and noisy), or they might have a slightly different utility (like another person navigating in the office)
93
Modeling Assumptions
94
The Dangers of Optimism and Pessimism
Dangerous Optimism: assuming chance when the world is adversarial
Dangerous Pessimism: assuming the worst case when it's not likely
95
Video of Demo World Assumptions
Random Ghost - Expectimax Pacman
96
Video of Demo World Assumptions
Adversarial Ghost - Minimax Pacman
97
Video of Demo World Assumptions
Adversarial Ghost - Expectimax Pacman
98
Video of Demo World Assumptions
Random Ghost - Minimax Pacman
99
Assumptions vs. Reality
                    Adversarial Ghost           Random Ghost
Minimax Pacman      Won 5/5, Avg. Score: 483    Won 5/5, Avg. Score: 493
Expectimax Pacman   Won 1/5, Avg. Score: -303   Won 5/5, Avg. Score: 503
Results from playing 5 games
Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman
100
Maximum Expected Utility
Why should we average utilities? Why not minimax?
Principle of maximum expected utility:
A rational agent should choose the action that maximizes its expected utility, given its knowledge
Questions:
Where do utilities come from?
How do we know such utilities even exist?
How do we know that averaging even makes sense?
What if our behavior (preferences) can't be described by utilities?
101