It’s All in the Game - Welcome to the OLLI at UCI Blog
Introduction to Game Theory
- What is a “game?”
… Basic concepts and assumptions
… Brief early history of game theory
- Dominant and dominated strategies
- Two-player zero-sum games: the minimax theorem
- Two-player non-zero-sum games: The Prisoner’s Dilemma
What is a “Game?”
- A situation in which entities (players) with opposing interests
must select from among different courses of action (strategies)
… Each strategy leads to a different outcome (payoff)
- Applies to many fields:
… Economics
… Biology
… Sociology
… War
… Politics
… Philosophy
… Recreation and entertainment
- The object of game theory is to determine the players’ best
strategies
Example: “Matching Pennies”
- Two players each display a penny at the same time
- If both pennies match (i.e. both are heads or both are tails),
Player 1 wins a penny
- If the pennies do not match, Player 2 wins a penny
                             Player 2
                             Show head            Show tail
Player 1    Show head        (+1, -1) Match       (-1, +1) Mis-match
            Show tail        (-1, +1) Mis-match   (+1, -1) Match
Game matrix for “Matching Pennies” - Player 1 wins all matches; Player 2 wins all mis-matches
- Convention: Each cell shows
(Player 1’s payoff, Player 2’s payoff)
Observations about “Matching Pennies”
- Both players make their “moves” at the same time
- The payoffs in each cell add up to zero (zero-sum game)
- No fixed strategy guarantees a win for either player
… Fixed strategies – for example, showing heads all
the time – are called pure strategies
- What if a player shows heads or tails randomly?
… This type of strategy is called a mixed strategy
… If each player plays a 50% mixed strategy of heads/tails
(for example, by flipping the coins) then each player will
win 50% of the time on the average
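The 50% claim can be checked with a minimal Python sketch, assuming the +1/-1 penny payoffs from the matrix above; `P1_PAYOFF` and `expected_payoff_p1` are illustrative names, not from the slides. Exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction
from itertools import product

# Player 1's payoff for (Player 1's coin, Player 2's coin); Player 2 gets the negative.
P1_PAYOFF = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}


def expected_payoff_p1(heads_1, heads_2):
    """Player 1's expected payoff when Player 1 shows heads with probability
    heads_1 and Player 2 shows heads with probability heads_2."""
    probs_1 = {"H": heads_1, "T": 1 - heads_1}
    probs_2 = {"H": heads_2, "T": 1 - heads_2}
    return sum(probs_1[a] * probs_2[b] * P1_PAYOFF[(a, b)]
               for a, b in product("HT", repeat=2))


half = Fraction(1, 2)
print(expected_payoff_p1(half, half))              # 0: both players flip their coins
print(expected_payoff_p1(half, Fraction(9, 10)))   # 0: a 50% mix cannot be exploited
print(expected_payoff_p1(1, 0))                    # -1: "always heads" loses to "always tails"
```

If either player mixes 50/50, the expected payoff is zero whatever the other player does, while any fixed (pure) strategy can be beaten by the opposite pure strategy.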
Another Representation: Extensive Form
Player 1
├─ Action A → Player 2
│   ├─ Action X → Payoff from (A, X)
│   └─ Action Y → Payoff from (A, Y)
└─ Action B → Player 2
    ├─ Action X → Payoff from (B, X)
    └─ Action Y → Payoff from (B, Y)
- Extensive form is used when players’ actions alternate
over time so that each player knows the other’s prior move(s)
Social Contract Theory – The State of Nature
- Thomas Hobbes: Without a central government the natural
state of humanity is a state of war of every individual against
every other individual
- In this state, individuals make economic decisions based
strictly on what they can compel by their own personal force
Whatsoever therefore is consequent to a time of Warre,
where every man is Enemy to every man; the same is
consequent to the time, wherein men live without other
security, than what their own strength, and their own
invention shall furnish them withall. In such condition, there
is no place for Industry; because the fruit thereof is uncertain:
and consequently no Culture of the Earth; no Navigation, nor
use of the commodities that may be imported by Sea; no
commodious Building; no Instruments of moving, and removing
such things as require much force; no Knowledge of the face
of the Earth; no account of Time; no Arts; no Letters; no
Society; and which is worst of all, continuall feare, and danger
of violent death; And the life of man, solitary, poore, nasty,
brutish, and short.
- Thomas Hobbes, Leviathan, 1651
Example: The Farmers’ Dilemma
(David Hume, 1740)
- Two farmers each have a bumper crop of corn (400 acres)
- Each farmer needs the other’s help to completely harvest
the corn, or a substantial portion (200 acres) will rot in the field
- The effort expended to harvest the neighboring farmer’s
field results in loss of yield of 100 acres of one’s own field
- Will the farmer whose corn ripens later (Farmer 2) help the
farmer whose corn ripens first (Farmer 1)?
- Starting from the end of the game tree, backward induction
eliminates actions that are not payoff-maximizing at each stage
Farmer 2
├─ Help → Farmer 1
│   ├─ Help → Farmer 2: 300, Farmer 1: 300
│   └─ Don’t help → Farmer 2: 100, Farmer 1: 400
└─ Don’t help → Farmer 1
    ├─ Help → Farmer 2: 400, Farmer 1: 100
    └─ Don’t help → Farmer 2: 200, Farmer 1: 200
Payoffs are acres harvested
- Backward induction, step 1: Farmer 1 moves last, and at each of
Farmer 1’s decision points “Don’t help” pays more than “Help”
(400 vs. 300 after Farmer 2 helps; 200 vs. 100 after Farmer 2
doesn’t), so both of Farmer 1’s “Help” branches are crossed out (x)
- Step 2: knowing this, Farmer 2 compares 100 acres (help, but get
no help back) with 200 acres (don’t help) and crosses out “Help”
- Result: neither farmer helps the other, and each harvests 200 acres
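Backward induction can be sketched in a few lines of Python. The payoffs below (acres harvested) follow from the rules stated above (400 acres each, 200 lost without help, 100 lost by helping); the tree encoding and the name `backward_induction` are illustrative assumptions. Ties, if any, go to the first action listed.

```python
# A decision node is (player, {action: subtree}); a leaf is a dict of payoffs.
TREE = ("Farmer 2", {
    "Help": ("Farmer 1", {
        "Help": {"Farmer 1": 300, "Farmer 2": 300},
        "Don't help": {"Farmer 1": 400, "Farmer 2": 100},
    }),
    "Don't help": ("Farmer 1", {
        "Help": {"Farmer 1": 100, "Farmer 2": 400},
        "Don't help": {"Farmer 1": 200, "Farmer 2": 200},
    }),
})


def backward_induction(node, path=()):
    """Return (payoffs, plan): the payoffs reached when every player picks the
    payoff-maximizing action at each node, and the action chosen at each node."""
    if isinstance(node, dict):  # leaf: nothing left to decide
        return node, {}
    player, moves = node
    plan = {}
    best_action, best_payoffs = None, None
    for action, child in moves.items():
        # Solve the subtree first (this is the "backward" part).
        payoffs, sub_plan = backward_induction(child, path + (action,))
        plan.update(sub_plan)
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    plan[path] = (player, best_action)
    return best_payoffs, plan


outcome, plan = backward_induction(TREE)
print(outcome)  # {'Farmer 1': 200, 'Farmer 2': 200}
for path, (player, action) in sorted(plan.items()):
    print(path, player, "->", action)
```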
Observations about the Farmers’ Dilemma
- Rational analysis seems to lead to a sub-optimal result
… If the farmers could somehow trust each other, and
cooperate, they would each get a payoff of 300 vs. 200
- Foreshadows other paradoxes of rationality that are
revealed by game theory, notably in the Prisoner’s Dilemma
Your corn is ripe today; mine will be so tomorrow. ’Tis
profitable for us both, that I shou’d labour with you to-day, and
that you shou’d aid me to-morrow. I have no kindness for you,
and know you have as little for me. I will not, therefore, take
any pains on your account; and should I labour with you upon
my own account, in expectation of a return, I know I shou’d
be disappointed, and that I shou’d in vain depend upon your
gratitude. Here then I leave you to labour alone: You treat me
in the same manner. The seasons change; and both of us
lose our harvests for want of mutual confidence and security.
- David Hume
Basic assumptions of game theory
- Players are rational
… “We wish to find the mathematically complete principles
which define ‘rational behavior’ for the participants in a
social economy, and to derive from them the general
characteristics of that behavior.” (von Neumann and
Morgenstern, Theory of Games and Economic Behavior)
- Players are seeking to maximize “utility”
… Max utility = most preferred outcome
- Players’ interests are opposed
… Their preferences may still coincide
- Players’ strategies comprehensively specify their complete
plans of action under all possible circumstances
Basic assumptions of game theory -
Rationality
- Are people really rational in conflict situations?
… Experiments suggest they often are not
- Rationality assumes complete (“perfect”) information
… All players know all other players, outcomes, payoffs, etc.
… All players know all preferences of all other players
… All the above are “common knowledge”
- In many cases people do not actually have perfect information
… Partial knowledge results in bounded rationality
(myopic behavior, poor decisions, etc.)
- The assumption of rationality can lead to paradoxes
Example: The Cuban Missile Crisis
- In October 1962 the U.S. discovered that Russia was building
a missile base in Cuba
- A Russian fleet was on its way to Cuba to bring in supplies
(including missiles with nuclear warheads)
- The U.S. set up a naval blockade
- Basically the U.S. and Russia were playing a game of
“Chicken” … who would “back down” first?
- Suppose that President Kennedy had been notified that his
staff was infiltrated by Russian spies and that Premier
Khrushchev would immediately learn of all his decisions
… How great an advantage would this have given Khrushchev?
The Cuban Missile Crisis
Possible Outcomes and Preferences (4 = most desirable, 1 = least desirable)
       US                                    Russia
4      US stands firm, Russia backs down     US backs down, Russia stands firm
3      US backs down, Russia backs down      US backs down, Russia backs down
2      US backs down, Russia stands firm     US stands firm, Russia backs down
1      US stands firm, Russia stands firm    US stands firm, Russia stands firm
                          Russia
                          Back down    Stand firm
US        Back down       (3,3)        (2,4)
          Stand firm      (4,2)        (1,1)
Game Matrix for the Cuban Missile Crisis (payoffs listed as (US, Russia))
The Cuban Missile Crisis: Paradox
- Foreknowledge of Kennedy’s decision would have worked
to Khrushchev’s disadvantage!
… Assumes Kennedy was aware of the foreknowledge
- If Kennedy decides to back down, Khrushchev will know it,
and will decide to stand firm (payoff 4 vs. 3: outcome (2,4))
- If Kennedy decides to stand firm, Khrushchev will know it,
and will decide to back down (payoff 2 vs. 1: outcome (4,2))
- Therefore if Kennedy has to choose between only those two
alternatives, he will choose to stand firm (payoff 4 vs. 2),
causing Khrushchev to back down (which is what happened)
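The paradox can be checked by turning the matrix into a two-stage game: the US commits first, and Russia, with foreknowledge, replies. A minimal Python sketch, assuming the ordinal preferences (4 down to 1) can be compared as numbers and that Kennedy knows his choice will be seen; `PAYOFFS`, `russia_reply` and `us_choice` are illustrative names.

```python
# (US payoff, Russia payoff) for each (US move, Russia move).
PAYOFFS = {
    ("back down", "back down"): (3, 3),
    ("back down", "stand firm"): (2, 4),
    ("stand firm", "back down"): (4, 2),
    ("stand firm", "stand firm"): (1, 1),
}
MOVES = ("back down", "stand firm")


def russia_reply(us_move):
    """Russia sees the US move in advance and picks its own best reply."""
    return max(MOVES, key=lambda r: PAYOFFS[(us_move, r)][1])


def us_choice():
    """The US anticipates Russia's reply and picks the move that is best for it."""
    return max(MOVES, key=lambda u: PAYOFFS[(u, russia_reply(u))][0])


us = us_choice()
ru = russia_reply(us)
print(us, "/", ru, PAYOFFS[(us, ru)])  # stand firm / back down (4, 2)
```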
Basic assumptions of game theory - Utility
- “Utility” is supposed to measure players’ preferences
… Not necessarily the same as monetary payoff, but could
be correlated with it
… Takes into account altruism, risk tolerance, etc.
- Utility is closely related to the definition of rationality
… Rational decision makers should maximize expected utility
- Do people actually try to maximize utility in conflict situations?
… Experiments suggest they often may not
… People may make the “first decision that works”, rather
than the “best possible” decision
Herbert Simon
1916-2001
- American psychologist and
decision science theorist, Carnegie-
Mellon University; Nobel Prize, 1978
- One of the key founders of the fields
of artificial intelligence and decision
theory
- Introduced the concept of bounded
rationality
- Put forth the key idea that decision makers satisfice instead
of actually attempting to maximize utility; that is, they tend to
make the first decision that satisfies a set of requirements they
are trying to fulfill, instead of the truly optimal decision
Utility and Risk Attitudes - Which would you prefer?
… A lottery ticket that pays out $10 with probability 50% and $0
otherwise, or
… A lottery ticket that pays out $3 with probability 100%
- How about:
… A lottery ticket that pays out $100,000,000 with probability 50%
and $0 otherwise, or
… A lottery ticket that pays out $30,000,000 with probability 100%
- Risk-neutral people care only about the ticket’s expected value
- Risk-averse people (most people) prefer receiving the ticket’s
expected value for certain to holding the ticket
- Risk-seeking people prefer holding the ticket to receiving its
expected value for certain
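One hedged way to model the switch from gambling on small stakes to taking the sure thing on large stakes is a concave utility of final wealth. The sketch below assumes u(w) = ln(w) and a hypothetical starting wealth of $50,000; neither comes from the slides.

```python
import math

WEALTH = 50_000  # hypothetical starting wealth (an assumption, not from the slides)


def expected_log_utility(outcomes):
    """Expected utility under u(w) = ln(w), where w is final wealth.
    `outcomes` is a list of (probability, prize) pairs."""
    return sum(p * math.log(WEALTH + prize) for p, prize in outcomes)


def prefers_gamble(gamble, sure_prize):
    """True if the gamble's expected utility beats the sure prize's utility."""
    return expected_log_utility(gamble) > expected_log_utility([(1.0, sure_prize)])


small = prefers_gamble([(0.5, 10), (0.5, 0)], 3)
large = prefers_gamble([(0.5, 100_000_000), (0.5, 0)], 30_000_000)
print(small, large)  # True False
```

A risk-neutral person (u(w) = w) would take the ticket in both cases, since its expected value ($5 or $50,000,000) exceeds the sure amount ($3 or $30,000,000).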
Maximizing Expected Utility
Figure: risk-averse utility function with diminishing marginal utility (utility vs. money): $200 buys a bike (utility = 1), $1500 buys a nicer bike (utility = 2), $5000 buys a used car (utility = 3)
- Risk-averse utility functions flatten out the more money is
involved because of diminishing marginal utility
… Each additional dollar provides less utility than the dollar
before it
- Choice 1: get $1500 with probability 100%
- Choice 2: Lottery – get 60% chance of $200 or a 40% chance
of $5000
- Which is preferred with a risk-averse utility function?
- Answer:
… Choice 1 (sure $1500): expected utility = 2
… Choice 2 (lottery): expected utility = 60%(1) + 40%(3) = 1.8 < 2,
even though its expected value = 60%($200) + 40%($5000) = $2120 > $1500
- So: maximizing expected utility is consistent with risk aversion
Figure: the same risk-averse utility curve with a straight “lottery function” drawn between the $200 and $5000 points; at Choice 2’s expected value ($2120), the lottery line gives Choice 2’s expected utility (1.8), which lies below the curve
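The expected-utility arithmetic for Choices 1 and 2 can be reproduced with a small Python sketch. The `UTILITY` table only encodes the three labelled points of the curve (it is not a full utility function), and exact fractions avoid floating-point noise; the names are illustrative.

```python
from fractions import Fraction

# Utility of each amount, read off the three labelled points of the curve.
UTILITY = {200: 1, 1500: 2, 5000: 3}


def expected_value(lottery):
    """lottery: list of (probability, dollars) pairs."""
    return sum(p * dollars for p, dollars in lottery)


def expected_utility(lottery):
    return sum(p * UTILITY[dollars] for p, dollars in lottery)


choice1 = [(Fraction(1), 1500)]
choice2 = [(Fraction(6, 10), 200), (Fraction(4, 10), 5000)]

for name, lottery in (("Choice 1", choice1), ("Choice 2", choice2)):
    print(name, "EV =", expected_value(lottery), "EU =", expected_utility(lottery))
# Choice 1 EV = 1500 EU = 2
# Choice 2 EV = 2120 EU = 9/5
```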
Utility Theory Axioms (“>” = “is preferred to”, “=” = “is indifferent to”)
- Completeness: Either A > B, B > A or A = B
- Transitivity: If A > B and B > C, then A > C
- Independence: If A > B, then a weighted average of
A and C > the same weighted average of B and C
- Continuity: If A > B > C, it should always be possible to
find a probability P that makes the individual indifferent
between gambling and not gambling in the following
situation –
… Gamble: get A with probability P%, or C with probability 1-P%
… Don’t gamble: get B for certain
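Under expected utility, the indifference probability in the continuity axiom solves P·u(A) + (1-P)·u(C) = u(B). A minimal sketch, assuming utilities are already measured on an expected-utility scale; reusing the bike/car utilities from the earlier slides (A = $5000, B = $1500, C = $200) is an illustrative choice, and `indifference_probability` is a hypothetical name.

```python
from fractions import Fraction


def indifference_probability(u_a, u_b, u_c):
    """Probability P of getting A (otherwise C) that makes the gamble's
    expected utility equal u_b, i.e. P*u_a + (1-P)*u_c = u_b."""
    if not (u_a > u_b > u_c):
        raise ValueError("continuity needs A > B > C")
    return Fraction(u_b - u_c, u_a - u_c)


# A = $5000 (utility 3), B = $1500 (utility 2), C = $200 (utility 1)
p = indifference_probability(3, 2, 1)
print(p)  # 1/2
```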
Utility Functions and Rationality
- Only risk-neutrality or risk-aversion are rational
… Risk-aversion must also be independent of size of risk
- Transitivity implies that only rigorously consistent
preferences are rational
- Experiments (Tversky, Kahneman, Thaler) show that some
realistic utility functions are not rational
… Risk-seeking (gambling)
… Risk-aversion increasing with the ratio of risk to capital/wealth
… Intransitive: Just because I prefer A to B, and B to C,
does not necessarily mean I always prefer A to C
- Preferences can depend on changes in the reference points
from which gains and losses will occur
Daniel Kahneman
1934-2024
- Israeli psychologist and economist,
Hebrew University, University of
British Columbia, University of
California-Berkeley and Princeton
University; Nobel Prize, 2002
- Expert in behavioral economics
- With Amos Tversky, developed prospect theory,
which proposes that people do not actually maximize expected
utility when making decisions, but rather tend to adjust their
risk-aversion to reference points from which gains and losses
will occur, causing violations of the transitivity assumption of
rationality (inconsistency of preferences)
Example: Reference Point Sensitivity
(Richard Thaler’s experiment)
- Option 1: Everyone just won $30:
Flip a coin vs. no coin flip
… Heads: win $9
… Tails: lose $9
… 70% chose the coin flip
- Option 2: Everyone starts at $0:
Flip a coin vs. $30 for sure
… Heads: win $39
… Tails: win $21
… 43% chose the coin flip
- Participants based their choice on the reference point
… No actual economic difference between Options 1 and 2
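A model that looks only at final wealth, as standard expected-utility theory does, cannot tell the two options apart. A minimal Python sketch (the function name `final_wealth` is hypothetical) that converts each option into a probability distribution over final wealth:

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def final_wealth(start, changes):
    """Map each final-wealth amount to its probability.
    `changes` is a list of (probability, gain) pairs applied to `start`."""
    dist = {}
    for p, gain in changes:
        dist[start + gain] = dist.get(start + gain, 0) + p
    return dist


option1_flip = final_wealth(30, [(HALF, +9), (HALF, -9)])  # just won $30, then flip
option1_keep = final_wealth(30, [(1, 0)])                  # just won $30, no flip
option2_flip = final_wealth(0, [(HALF, 39), (HALF, 21)])   # start at $0, then flip
option2_sure = final_wealth(0, [(1, 30)])                  # start at $0, take $30

print(option1_flip == option2_flip, option1_keep == option2_sure)  # True True
```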
Basic assumptions of game theory -
Players’ interests are opposed
- If players’ interests are opposed, can they cooperate?
… Their preferences can still coincide
- Cooperation can arise naturally from opposing interests
… Communication, signaling, bargaining, negotiation
… Coalitions (for example, labor unions vs. management)
… Evolution over time in repeated games (Iterated Prisoner’s
Dilemma)
Basic assumptions of game theory –
Comprehensive strategies
- Is it really possible to formulate comprehensive strategies
specifying what actions players will take under all possible
circumstances?
… For some types of games this may not be possible
… Players may not know all the other players, all possible
outcomes, etc. (bounded rationality)
… It may be impractical to calculate comprehensive
strategies even though they may be theoretically possible
(for example, chess)
Brief early history of game theory
- Pascal’s Wager
- Cournot
- Borel
- Von Neumann: Minimax theory
- Dresher, Flood and Tucker: RAND Corporation
experiments and the Prisoner’s Dilemma
Blaise Pascal
(1623-1662)
- French mathematician and
philosopher
- Believed that nature is actually
infinite and that human reason
is capable of a nobility and
dignity that transcend human
finitude
- Maintained that although we are incapable of knowing
whether or not God exists, we must make a “wager” on one
or the other alternative in determining how we live our lives
“God is, or He is not.” But to which side shall we incline?
Reason can decide nothing here. There is an infinite
chaos which separated us. A game is being played at
the extremity of this infinite distance where heads or tails
will turn up... Which will you choose then? Let us see.
Since you must choose, let us see which interests you least.
You have two things to lose, the true and the good; and two
things to stake, your reason and your will, your knowledge
and your happiness; and your nature has two things to shun,
error and misery. Your reason is no more shocked in choosing
one rather than the other, since you must of necessity choose...
But your happiness? Let us weigh the gain and the loss in
wagering that God is...
-- Pascal
Game Matrix for Pascal’s Wager
                                    Reality
                                    God exists (p)        God does not exist (1-p)
You  Act as if God exists           Eternal reward        Pious life only
     Act as if God does not exist   Eternal punishment    Earthly reward only
But there is here an infinity of an infinitely
happy life to gain, a chance of gain against
a finite number of chances of loss, and
what you stake is finite. It is all divided;
wherever the infinite is and there is not an
infinity of chances of loss against that of
gain, there is no time to hesitate, you must
give all...
-- Pascal
- The infinity of the payoffs if God exists makes the payoffs if God
does not exist irrelevant
- Therefore the strategy of acting as if God exists dominates the
strategy of acting as if God does not exist
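Read as an expected-value calculation, the argument can be sketched in Python with floating-point infinities. The finite values for the earthly outcomes are placeholders (assumptions); any finite numbers give the same result. p = 0 is excluded because 0 × ∞ is undefined (it yields nan in floating point).

```python
import math

# Eternal outcomes as +/- infinity; the finite values are hypothetical placeholders.
ETERNAL_REWARD = math.inf
ETERNAL_PUNISHMENT = -math.inf
PIOUS_LIFE_ONLY = 1.0
EARTHLY_REWARD_ONLY = 10.0


def expected_payoffs(p):
    """Expected payoff of each strategy when God exists with probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    act_as_if_exists = p * ETERNAL_REWARD + (1 - p) * PIOUS_LIFE_ONLY
    act_as_if_not = p * ETERNAL_PUNISHMENT + (1 - p) * EARTHLY_REWARD_ONLY
    return act_as_if_exists, act_as_if_not


print(expected_payoffs(0.001))  # (inf, -inf)
```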
Dominated Strategies
- Strategy A dominates strategy B if, no matter what the other
players do, a player’s payoff with strategy A is greater than or
equal to that of strategy B, and strictly greater in at least one case
- It’s often possible to simplify a complex game by
eliminating dominated strategies
Example: Dominated Strategies
                            Player 2
                            Strategy 1   Strategy 2   Strategy 3
Player 1    Strategy A      (7,-7)       (9,-9)       (8,-8)
            Strategy B      (9,-9)       (10,-10)     (12,-12)
            Strategy C      (8,-8)       (8,-8)       (8,-8)
- For Player 1, Strategy B dominates the other two strategies
because its payoffs are greater no matter whether Player 2
follows Strategy 1, 2 or 3
- For Player 2, Strategy 1 dominates the other two strategies
because its losses are less than or equal to those of Strategies
2 and 3 no matter what strategy Player 1 follows
- Therefore Player 1 will play Strategy B, Player 2 will play
Strategy 1 and the resulting payoff will be (9,-9)
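Iterated elimination of dominated strategies can be sketched in Python. This version uses weak dominance, matching the “greater than or equal to” definition above; with weak dominance the order of elimination can matter in some games, though here it reaches the same result as the slides. Names are illustrative.

```python
# Player 1's payoffs (rows = Strategies A-C, columns = Player 2's Strategies 1-3);
# the game is zero-sum, so Player 2 receives the negative.
ROWS = ["A", "B", "C"]
COLS = ["1", "2", "3"]
P1 = [
    [7, 9, 8],
    [9, 10, 12],
    [8, 8, 8],
]


def dominates(a, b):
    """Weak dominance: a is never worse than b and better at least once."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def find_dominated_row(rows, cols):
    for i in rows:
        for k in rows:
            if k != i and dominates([P1[k][j] for j in cols], [P1[i][j] for j in cols]):
                return i
    return None


def find_dominated_col(rows, cols):
    for j in cols:
        for k in cols:
            # Player 2's payoff is the negative of Player 1's.
            if k != j and dominates([-P1[i][k] for i in rows], [-P1[i][j] for i in rows]):
                return j
    return None


def iterated_elimination():
    rows, cols = list(range(len(ROWS))), list(range(len(COLS)))
    while True:
        i = find_dominated_row(rows, cols)
        if i is not None:
            rows.remove(i)
            continue
        j = find_dominated_col(rows, cols)
        if j is not None:
            cols.remove(j)
            continue
        return rows, cols


rows, cols = iterated_elimination()
r, c = rows[0], cols[0]
print(ROWS[r], COLS[c], (P1[r][c], -P1[r][c]))  # B 1 (9, -9)
```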
Antoine Augustin Cournot
1801-1877
- French mathematician and economist
- Published his Recherches in 1838,
founding the theory of the firm
- Based his analysis of the economic
behavior of firms on a primitive
notion of game theory, specifically,
the idea of rivals achieving an equilibrium based on each one
making its best response to the other’s actions
Emile Borel
1871-1956
- French mathematician and
politician
- First to mathematically define the
notion of games of strategy
- Gave the first modern formulation
of a mixed strategy along with
a method of finding the best
strategy for certain two-person games
John von Neumann
1903-1957
- Jewish-Hungarian-American
prodigy (became Catholic in
his first marriage); one of the
greatest mathematicians
of the 20th century
- Made key contributions to many
fields, including set theory, quantum
physics, computer science, and the
development of the atomic bomb
- Proved the minimax theorem in 1926 in a paper presented to
the Göttingen Mathematical Society
- With Oskar Morgenstern, published Theory of Games and
Economic Behavior in 1944, effectively founding game theory
von Neumann’s Minimax Theorem
- In every finite two-person zero-sum game with perfect
information, there exists a strategy (possibly a mixed one) for
each player which allows both players to minimize their maximum
losses (hence the name “minimax”)
- In a game matrix the minimax payoff and its corresponding
strategies are easily recognized if they are pure strategies
… The minimax payoff is the smallest in its row and the
largest in its column when looked at from the point of
view of the row player
- When a pure minimax strategy exists in a game, that strategy
will be an equilibrium for that game
… Once the players settle on a pair of strategies
corresponding to a minimax point, they have no reason
to change strategies
Example of Minimax Equilibrium:
Abortion Issue
- The Democrats and Republicans are trying to formulate
their strategies about the abortion issue
- Each party will hold a convention, determine its platform
privately and announce it simultaneously
- Polling prior to the conventions indicates the percentage
of the vote that each party is likely to get with each possible
outcome of platform choices
- What strategy should each party adopt? Pro-life?
Pro-choice? Or dodge the issue?
Game Matrix for Abortion Issue (vote shares listed as (Democrats, Republicans))
                                 Republicans
                                 Pro-life      Pro-choice    Dodge issue
Democrats    Pro-life            (35%,65%)     (10%,90%)     (60%,40%)
             Pro-choice          (45%,55%)     (55%,45%)     (50%,50%)
             Dodge issue         (40%,60%)     (10%,90%)     (65%,35%)
- 45% (Democrats pro-choice, Republicans pro-life) is the largest
value in its column (looked at from the Democrats’ point of view)
- 45% is also the smallest value in its row
- 45% is therefore called a saddle point and is the minimax
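Because the vote shares always add to 100%, the Republicans’ goal of maximizing their share is the same as minimizing the Democrats’ share, so the game can be treated as zero-sum. A short Python sketch (names illustrative) that searches the Democrats’ matrix for cells that are the smallest in their row and the largest in their column, and checks that the best row minimum equals the smallest column maximum:

```python
# Democrats' vote share (%) for (Democrats' platform, Republicans' platform).
PLATFORMS = ["Pro-life", "Pro-choice", "Dodge issue"]
DEM_SHARE = [
    [35, 10, 60],  # Democrats Pro-life
    [45, 55, 50],  # Democrats Pro-choice
    [40, 10, 65],  # Democrats Dodge issue
]


def pure_saddle_points(m):
    """Cells that are the smallest in their row and the largest in their column
    (the row player maximizes, the column player minimizes)."""
    points = []
    for i, row in enumerate(m):
        for j, value in enumerate(row):
            column = [r[j] for r in m]
            if value == min(row) and value == max(column):
                points.append((i, j, value))
    return points


for i, j, v in pure_saddle_points(DEM_SHARE):
    print(f"Democrats {PLATFORMS[i]}, Republicans {PLATFORMS[j]}: {v}%")

maximin = max(min(row) for row in DEM_SHARE)
minimax = min(max(col) for col in zip(*DEM_SHARE))
print(maximin, minimax)  # 45 45
```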
Why Is the Minimax an Equilibrium?
- The payoff associated with the saddle point is called
the value of the game
- By playing the equilibrium strategy (i.e. the strategy that
results in the saddle point payoff), each player gets at
least the value of the game
- By playing the equilibrium strategy, an opponent can stop
a player from getting any more than the value of the game
- Since the game is zero-sum, a player’s opponent wants to
minimize that player’s payoff
- Neither player can gain by changing strategies unilaterally
What If There Is No Pure Strategy Minimax?
- By von Neumann’s theorem, for a finite two-person zero-sum
game with perfect information, a mixed-strategy minimax must
exist
… This means each player selects from among the various
pure strategies in the game with some random probability
… This strategy will give the value of the game on average,
rather than with certainty as in a pure strategy minimax
- Finding the mixed-strategy minimax is often difficult
- One way is to choose a mixed strategy that gives the same
average payoff (expected value) whatever the opponent
does
… Expected value = sum (payoff x probability of payoff)
Example: The Escaped Convict
- A convict escapes from jail
- He has two possible routes of escape:
The highway or the forest
- The sheriff can only cover one route
- If they take different routes, escape is certain
- If both take the highway, the convict will certainly be caught
- If both take the forest, the probability of escape is 1-1/n,
where n represents the difficulty of searching in the forest
                          Sheriff
                          Highway      Forest
Convict    Highway        (0,1)        (1,0)
           Forest         (1,0)        (1-1/n,1/n)
Game Matrix for The Escaped Convict (Matrix entries indicate (probability of escape, probability of capture))
- First consider what happens from the Convict’s point of view if
the Sheriff takes the Highway
… Let p be the Convict’s probability of taking the Highway
(so 1-p is his probability of taking the Forest)
… Convict’s average payoff = 0(p) + 1(1-p) = 1-p
- Next consider what happens from the Convict’s point of view if
the Sheriff goes into the Forest
… Convict’s average payoff = 1(p) + (1-1/n)(1-p)
= p + 1 – p – 1/n + p/n
= 1 – 1/n + p/n
Analysis of The Escaped Convict
- If the Sheriff takes the Highway: Convict’s average payoff is
1-p
- If the Sheriff takes the Forest: Convict’s average payoff is
1 – 1/n + p/n
- To find the probability of taking the Highway that gives
the same average payoff for the Convict whatever
the Sheriff does, set these two payoffs equal to each other:
1-p = 1 – 1/n + p/n
-p = -1/n + p/n
-p – p/n = -1/n
-np – p = -1
np + p = 1
p(n + 1) = 1
p = 1/(n+1) = Convict’s probability of taking Highway
- So the Convict should take the Highway with probability
p = 1/(n+1) and the Forest with probability 1-p = 1 - 1/(n+1)
- Repeating the analysis from the Sheriff’s point of view, let q be
the Sheriff’s probability of taking the Highway (1-q: Forest); his
payoff is the probability of capture
… If the Convict takes the Highway: 1(q) + 0(1-q) = q
… If the Convict takes the Forest: 0(q) + (1/n)(1-q) = (1-q)/n
- It turns out that the Sheriff’s probabilities are the same:
q = (1-q)/n; nq = 1-q; (n+1)q = 1; q = 1/(n+1) and 1-q = 1-1/(n+1)
- The bigger n is, the more likely the forest route is for both
- The bigger n is, the more likely the Convict is to escape
An Executive Decision Maker for the Escaped Convict Game
Figure: a spinner (circle) divided into a Highway part, p, q = 1/(n+1), and a Forest part, p, q = 1 - 1/(n+1); the Highway part of the circle gets smaller and the Forest part gets bigger the bigger n gets
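A small Python sketch of the solution, using exact fractions for n = 3 (an arbitrary illustrative value); the function names are hypothetical. It checks that p = 1/(n+1) leaves the Convict with the same average payoff whichever route the Sheriff takes (and likewise for q and the Sheriff), and uses a seeded random number generator as a stand-in for the spinner.

```python
import random
from fractions import Fraction


def convict_game_solution(n):
    """Equilibrium mixed strategies; n (a positive integer here) is the
    difficulty of searching the forest."""
    p = Fraction(1, n + 1)  # Convict's probability of taking the Highway
    q = Fraction(1, n + 1)  # Sheriff's probability of taking the Highway
    escape = 1 - p          # value of the game: the Convict's escape probability
    return p, q, escape


def convict_payoffs(p, n):
    """Convict's average escape probability if the Sheriff takes the Highway / Forest."""
    vs_highway = 0 * p + 1 * (1 - p)
    vs_forest = 1 * p + (1 - Fraction(1, n)) * (1 - p)
    return vs_highway, vs_forest


def sheriff_payoffs(q, n):
    """Sheriff's capture probability if the Convict takes the Highway / Forest."""
    return 1 * q + 0 * (1 - q), 0 * q + Fraction(1, n) * (1 - q)


def spin(prob_highway, rng=random):
    """One spin of the 'executive decision maker'."""
    return "Highway" if rng.random() < prob_highway else "Forest"


p, q, escape = convict_game_solution(3)
print(p, escape, convict_payoffs(p, 3), sheriff_payoffs(q, 3))
print(spin(float(p), random.Random(0)))
```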
Melvin Dresher
1911-1992
- Polish-American mathematician
- Studied problems of equilibrium
in non-zero-sum games
- With Merrill Flood, ran an
experiment at the RAND
Corporation in 1950 that resulted
in the formulation of the game
theoretical model of conflict and
cooperation later called the “Prisoner’s Dilemma” by Albert W.
Tucker (who was John Nash’s Ph.D. thesis advisor at Princeton)
Dresher and Flood’s RAND Experiment
- Dresher and Flood wondered whether real people actually
would tend to find the equilibrium point in a game when
the game had a potentially “better” outcome
- They ran 100 repetitions of the following game between
two players (Armen Alchian of UCLA’s economics
department and John Williams, head of RAND’s
mathematics department)
                          Williams
                          Strategy 1            Strategy 2
Alchian    Strategy 1     (-1 cent, 2 cents)    (½ cent, 1 cent)
           Strategy 2     (0, ½ cent)           (1 cent, -1 cent)
Payoffs are listed as (Alchian, Williams)
- For Alchian, Strategy 2 dominates Strategy 1, because his
payoffs in Strategy 2 are higher than his payoffs in Strategy 1
no matter what strategy Williams chooses
- For Williams, Strategy 1 dominates Strategy 2, because his
payoffs from Strategy 1 are higher than his payoffs from
Strategy 2 no matter what strategy Alchian chooses
- (0, ½ cent) is therefore the equilibrium point
- But (½ cent, 1 cent) is a better outcome for both players, even though
it is biased in favor of Williams
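The equilibrium reasoning above can be checked mechanically: a cell is a (pure-strategy) equilibrium if neither player can do better by switching alone. A minimal Python sketch with exact fractions for the half-cent payoffs; names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# (Alchian's payoff, Williams's payoff) in cents; rows = Alchian, columns = Williams.
GAME = [
    [(-1, 2), (HALF, 1)],   # Alchian Strategy 1
    [(0, HALF), (1, -1)],   # Alchian Strategy 2
]


def pure_nash_equilibria(game):
    """Cells where neither player can gain by switching strategies unilaterally.
    Returns (Alchian's strategy, Williams's strategy, payoffs), numbered from 1."""
    rows, cols = len(game), len(game[0])
    result = []
    for i in range(rows):
        for j in range(cols):
            a, w = game[i][j]
            best_for_alchian = all(game[k][j][0] <= a for k in range(rows))
            best_for_williams = all(game[i][k][1] <= w for k in range(cols))
            if best_for_alchian and best_for_williams:
                result.append((i + 1, j + 1, game[i][j]))
    return result


eq = pure_nash_equilibria(GAME)
print(eq)  # one equilibrium: Alchian Strategy 2, Williams Strategy 1, payoffs (0, 1/2)

cooperative = GAME[0][1]
print(all(c > e for c, e in zip(cooperative, eq[0][2])))  # True: better for both
```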
Dresher and Flood’s RAND Experiment
- The experiment showed no evidence of any instinctive
preference for the equilibrium point
… Chosen only 14 times out of the 100 games
- The players appeared to struggle over the course of the
experiment to secure mutual cooperation
… This is apparent from reading the logs of the comments
made by the players during the experiment
Dresher and Flood’s RAND Experiment Excerpts from Log
(Game number, strategies played by Alchian and Williams, and the players’ comments)
- Game 1: Alchian 2, Williams 2
… Alchian: “Williams will play Strategy 1 – sure win. Hence if I play 1 – I lose.”
… Williams: “Hope he’s bright.”
- Game 2: Alchian 2, Williams 2
… Alchian: “What is he doing?!!”
… Williams: “He isn’t but maybe he’ll wise up.”
- Game 3: Alchian 2, Williams 1
… Alchian: “Trying mixed?”
… Williams: “Okay, dope.”
- Game 4: Alchian 2, Williams 1
… Alchian: “Has he settled on 1?”
… Williams: “Okay, dope.”
- Game 5: Alchian 1, Williams 1
… Alchian: “Perverse!”
… Williams: “It isn’t the best of all possible worlds.”
- Game 6: Alchian 2, Williams 2
… Alchian: “I’m sticking to 2 since he will mix for at least 4 more times.”
… Williams: “Oh ho! Guess I’ll have to give him another chance.”
- Game 17: Alchian 1, Williams 1
… “The stinker.”
- Game 18: Alchian 1, Williams 1
… “He’s crazy. I’ll teach him the hard way.”
- Game 19: Alchian 2, Williams 1
… Alchian: “I’m completely confused. Is he trying to convey information to me?”
… Williams: “Let him suffer.”
- Game 20: Alchian 2, Williams 1 (no comments)
- Game 21: Alchian 2, Williams 2
… “Maybe he’ll be a good boy now.”
- Game 22: Alchian 1, Williams 2
… “Always takes time to learn.”
Albert W. Tucker
1905-1995
- Canadian-born American
mathematician, chair of
Princeton University mathematics
department
- Chaired the AP Calculus
committee of the College
Board during the 1960s
- Ph.D. thesis advisor of John Nash
- Restructured the game in Dresher and Flood’s experiment
and gave it the narrative we know today as the Prisoner’s
Dilemma
The Prisoner’s Dilemma
- Two criminals collaborate in the commission of a robbery
- They are arrested and held separately on a charge of
carrying concealed weapons, which carries a one-year
jail term
- The testimony of each one is required in order to convict
the other of the robbery, which carries a 20-year jail term
- Each prisoner is offered immunity and a suspended sentence if
he will turn state’s evidence and testify against the other
prisoner (but he could still be convicted on the basis of the
testimony of the other prisoner!)
- If both prisoners confess, they each go to jail for 5 years
- If neither prisoner confesses, they each go to jail for 1 year
on the concealed weapons charge
                                              Second Prisoner
                                              Do not confess       Confess
                                              (“Cooperate”)        (“Defect”)
First      Do not confess (“Cooperate”)       (-1 yr., -1 yr.)     (-20 yr., 0)
Prisoner   Confess (“Defect”)                 (0, -20 yr.)         (-5 yr., -5 yr.)
Game Matrix for the Prisoner’s Dilemma
Observations about the Prisoner’s Dilemma
- This game is not zero-sum (conditions for minimax do not hold)
- The “minimax” strategy for each prisoner is to defect
… Look at the game from the first prisoner’s point of view
… If the second prisoner defects, our prisoner
must either cooperate and go to jail for 20 years, or
defect and go to jail for 5 years
… If the second prisoner cooperates, our prisoner can serve
1 year by cooperating also, or go free by defecting
- If the prisoners could somehow both cooperate by not
confessing, they would achieve better payoffs
(they both would serve 1 year as opposed to 5 years)
- Experiments show that people frequently do defect in
situations similar to the Prisoner’s Dilemma
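The first prisoner’s case-by-case reasoning can be checked mechanically by comparing the two rows of the matrix column by column. This is a minimal Python sketch; the dictionary layout and the helper name `strictly_dominates` are illustrative choices, not part of the slides, and payoffs are years in jail written as negative numbers.

```python
# Payoffs from the game matrix above, in years (negative = years in jail).
# Key: (first prisoner's move, second prisoner's move) -> (first payoff, second payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "defect"): (-20, 0),
    ("defect", "cooperate"): (0, -20),
    ("defect", "defect"): (-5, -5),
}
MOVES = ("cooperate", "defect")

def strictly_dominates(mine, other):
    """True if the first prisoner does strictly better with `mine` than with
    `other`, whatever the second prisoner does."""
    return all(
        PAYOFFS[(mine, theirs)][0] > PAYOFFS[(other, theirs)][0]
        for theirs in MOVES
    )

print(strictly_dominates("defect", "cooperate"))  # True
print(strictly_dominates("cooperate", "defect"))  # False

# Yet both prisoners prefer mutual cooperation to mutual defection
both_better = all(
    c > d for c, d in zip(PAYOFFS[("cooperate", "cooperate")], PAYOFFS[("defect", "defect")])
)
print(both_better)  # True
```

The last check is the heart of the dilemma: each prisoner’s dominant choice leads to an outcome that both of them rank below mutual cooperation.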
              Cooperate    Defect
Cooperate     (R, R)       (S, T)
Defect        (T, S)       (P, P)
Canonical Prisoner’s Dilemma Matrix
R = Reward for mutual cooperation
S = Sucker’s payoff
T = Temptation to defect
P = Punishment for mutual defection
T > R > P > S
2R > T + S (mutual cooperation pays more than taking turns exploiting each other)
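Both inequalities are easy to test for any candidate set of payoffs. A small Python sketch; the function name `is_prisoners_dilemma` is hypothetical. The second inequality matters because two players who alternate exploiting each other average (T + S)/2 per move, which must stay below the R per move of steady cooperation.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check the canonical conditions T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S

# The values Axelrod used in his tournament: T=5, R=3, P=1, S=0
print(is_prisoners_dilemma(5, 3, 1, 0))      # True
# The prison sentences above, as negative years: T=0, R=-1, P=-5, S=-20
print(is_prisoners_dilemma(0, -1, -5, -20))  # True
# Here alternating exploitation pays exactly as much as cooperating, so 2R > T + S fails
print(is_prisoners_dilemma(6, 3, 1, 0))      # False
```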
Applications of the Prisoner’s Dilemma
- Nuclear arms race
… Any one country can “defect” by breaking an arms treaty
- Global climate change
… Any one country can “defect” by not adhering to an
emissions standard
- Steroid use in professional athletics
… Any one athlete can “defect” by using performance-enhancing
drugs
- OPEC (or any other economic cartel)
… Any one country can “defect” by shipping more oil than
its production ceiling allows
What if we play the Prisoner’s Dilemma
repeatedly? (Iterated Prisoner’s Dilemma)
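- If the game is played exactly N times (and both players know N),
then backward induction says to defect every time
- Why?
… Assume the game is played N+1 times
… Defect on the last play, since no retaliation is possible
… But then the last play is a foregone conclusion, so the game reduces
to N times, and the same argument applies to the new last play … etc.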
- If we don’t know how many times the game will be played,
it gets a lot more interesting
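The first step of the backward-induction argument can be seen in a short simulation. This is a minimal Python sketch that uses Axelrod’s tournament payoffs (T=5, R=3, P=1, S=0) instead of prison terms; `tit_for_tat_defect_last` is a hypothetical strategy introduced only to illustrate the temptation on a known last move.

```python
# Points for (my move, opponent's move), using T=5, R=3, P=1, S=0
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(round_index, n_rounds, their_history):
    # Cooperate first, then copy the opponent's previous move
    return their_history[-1] if their_history else "C"

def tit_for_tat_defect_last(round_index, n_rounds, their_history):
    # Hypothetical variant: Tit For Tat, except it defects on the known last round
    if round_index == n_rounds - 1:
        return "D"
    return tit_for_tat(round_index, n_rounds, their_history)

def play(a, b, n_rounds):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for i in range(n_rounds):
        move_a = a(i, n_rounds, hist_b)
        move_b = b(i, n_rounds, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat, 10))              # (30, 30)
print(play(tit_for_tat_defect_last, tit_for_tat, 10))  # (32, 27)
```

Defecting on the known last move earns T - R = 2 extra points with no chance of retaliation. Once both players anticipate this, the next-to-last move becomes the effective last move, and the same reasoning repeats.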
Robert Axelrod
1943-
- Professor of Political Science and
Public Policy, University of Michigan
- Winner of a MacArthur Fellowship
- Published seminal work, The
Evolution of Cooperation, in 1984,
summarizing the results of his study
of the Iterated Prisoner’s Dilemma
Axelrod’s Iterated Prisoner’s Dilemma
Tournament
- Different computer programs were run against each other
in two pairwise round-robin tournaments
… Each game lasted 200 “moves” (but this was not told
to the program developers before the tournament)
- Each program embodied a particular strategy
… Examples: “Always defect”; “Always cooperate”; “Co-
operate until your opponent defects, then always defect”;
“Co-operate but defect 10% of the time”; etc.
- Participants in the second tournament had access to the
results of the first one
- There was also a “random” competitor in each tournament
… Defected or cooperated with 50% probability
Axelrod’s Iterated Prisoner’s Dilemma
Tournament – Game Matrix
              Cooperate    Defect
Cooperate     (3, 3)       (0, 5)
Defect        (5, 0)       (1, 1)
- The winner was the program accumulating the most total
points over the entire tournament
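A round-robin tournament of this kind takes only a few lines to sketch. This Python version uses the matrix above, 200 moves per game, and a few of the example strategies listed earlier; it is an illustration under stated assumptions, not a reconstruction of Axelrod’s actual entries. In particular, the strategy names are hypothetical, each pair of distinct programs meets exactly once (no program plays a copy of itself here), and one seeded random generator drives every randomized strategy so runs are repeatable.

```python
import itertools
import random

C, D = "C", "D"
# Points earned for (my move, opponent's move): the tournament matrix above
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

# Each strategy sees its own history, the opponent's history and a shared RNG
def always_defect(mine, theirs, rng):
    return D

def always_cooperate(mine, theirs, rng):
    return C

def grim(mine, theirs, rng):
    # "Co-operate until your opponent defects, then always defect"
    return D if D in theirs else C

def mostly_cooperate(mine, theirs, rng):
    # "Co-operate but defect 10% of the time"
    return D if rng.random() < 0.1 else C

def random_player(mine, theirs, rng):
    # Defects or cooperates with 50% probability
    return D if rng.random() < 0.5 else C

def tit_for_tat(mine, theirs, rng):
    # Cooperate first, then copy the opponent's previous move
    return theirs[-1] if theirs else C

def match(s1, s2, rng, moves=200):
    """Play one game and return (s1's points, s2's points)."""
    h1, h2 = [], []
    p1 = p2 = 0
    for _ in range(moves):
        m1, m2 = s1(h1, h2, rng), s2(h2, h1, rng)
        h1.append(m1)
        h2.append(m2)
        p1 += PAYOFF[(m1, m2)]
        p2 += PAYOFF[(m2, m1)]
    return p1, p2

def round_robin(strategies, seed=0, moves=200):
    """Every pair of distinct strategies meets once; rank by total points."""
    rng = random.Random(seed)
    totals = {s.__name__: 0 for s in strategies}
    for s1, s2 in itertools.combinations(strategies, 2):
        p1, p2 = match(s1, s2, rng, moves)
        totals[s1.__name__] += p1
        totals[s2.__name__] += p2
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

STRATEGIES = [always_defect, always_cooperate, grim,
              mostly_cooperate, random_player, tit_for_tat]
for name, total in round_robin(STRATEGIES):
    print(f"{name:>16}: {total}")
```

The ranking depends heavily on which strategies are entered. That sensitivity to the field is part of what Axelrod’s results below illustrate.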
Axelrod’s Iterated Prisoner’s Dilemma
Tournament – The results
- A program called “Tit for Tat” won both tournaments
… Submitted by Anatol Rapoport of U. of Toronto
… Started by cooperating
… Then did whatever its opponent did on the prior move
… Strangely like the “Golden Rule” plus lex talionis
- Other programs that did well also started by cooperating
… If they started by cooperating, many continued doing so
… These programs did well because they did well with
each other and because there were enough of them
to raise each other’s average score
- “Greedy” or “selfish” programs didn’t do as well
… They tended to be rapidly punished by counter-defection
Axelrod’s observations about successful
Iterated Prisoner’s Dilemma strategies
- Nice strategies finish first
… “Nice” = not being the first to defect
- Good strategies are retaliatory
… Punish defection immediately, no matter how
cooperative the interaction has been so far
- Good strategies are forgiving
… Do not continue punishing defection for more than
one move
- Good strategies are clear
… Other strategies can easily recognize them and adjust to them
… None of the more complex programs performed as well
as the simple “Tit For Tat”
Implications of the Iterated Prisoner’s
Dilemma
- Biology: Suppose programs reproduce or die out according
to their scores
… Even strategies that do poorly can affect which strategies
do best
… “Tit for Tat” and programs like it end up dominating in
population simulations
… Suggests that cooperation may evolve in a world of
competing entities
- Philosophy: Suggests that moral principles underlying
cooperation (modeled in a very primitive fashion by Axelrod’s
observations about successful strategies) could result from
evolution
- Politics and law: Suggests a basis for social contract theory
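The biological reading can be sketched as an “ecological” simulation. In each generation, a strategy’s share of the population grows in proportion to its average score against the current population. This Python sketch assumes only three strategies, 200-move games, equal starting shares, and an infinitely large population; the function names are hypothetical, and the numbers are illustrative rather than a reproduction of Axelrod’s own simulation.

```python
# Points for (my move, opponent's move), Axelrod's tournament values
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(theirs):
    return theirs[-1] if theirs else "C"

def always_defect(theirs):
    return "D"

def always_cooperate(theirs):
    return "C"

STRATEGIES = [tit_for_tat, always_defect, always_cooperate]

def score(s1, s2, moves=200):
    """Points s1 earns against s2 over one game."""
    h1, h2, total = [], [], 0
    for _ in range(moves):
        m1, m2 = s1(h2), s2(h1)
        h1.append(m1)
        h2.append(m2)
        total += PAYOFF[(m1, m2)]
    return total

def evolve(generations=100):
    names = [s.__name__ for s in STRATEGIES]
    # Precompute every pairwise score (all three strategies are deterministic)
    table = {(a.__name__, b.__name__): score(a, b)
             for a in STRATEGIES for b in STRATEGIES}
    shares = {n: 1 / len(names) for n in names}  # equal starting shares
    for _ in range(generations):
        # Fitness = expected score against a randomly drawn member of the population
        fitness = {a: sum(shares[b] * table[(a, b)] for b in names) for a in names}
        average = sum(shares[a] * fitness[a] for a in names)
        # Each share grows or shrinks in proportion to its relative fitness
        shares = {a: shares[a] * fitness[a] / average for a in names}
    return shares

for name, share in evolve().items():
    print(f"{name:>16}: {share:.3f}")
```

In this sketch, Always Defect first grows by exploiting Always Cooperate. Once that prey becomes scarce, it dies out, leaving Tit For Tat as the largest group. This matches the slide’s point that poorly performing strategies can still shape which strategies do best.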
Normal Form Representation: Game Matrix
                       Player 2
                       Action X              Action Y
Player 1   Action A    Payoff from (A, X)    Payoff from (A, Y)
           Action B    Payoff from (B, X)    Payoff from (B, Y)
- Convention: Each cell shows
(Player 1’s payoff, Player 2’s payoff)
The Cuban Missile Crisis: Paradox
- Khrushchev’s foreknowledge of Kennedy’s decision would
have worked to Khrushchev’s disadvantage, assuming that
Kennedy was aware of Khrushchev’s foreknowledge!
- Why? Reason as follows –
… If Kennedy decides to back down, Khrushchev will know it,
and will decide to stand firm: payoff (2,4) (vs. backing down
with payoff (3,3))
… If Kennedy decides to stand firm, Khrushchev will know it,
and will decide to back down: payoff (4,2) (vs. standing firm
with payoff (1,1))
- Thus if Kennedy backs down the outcome will be (2,4),
whereas if he stands firm the outcome will be (4,2); he would
therefore choose to stand firm, causing Khrushchev to
back down (which is what actually transpired)
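The reasoning above is backward induction: Kennedy effectively moves first, and Khrushchev best-responds to a choice he can see. A minimal Python sketch using the payoffs quoted on this slide, read as (Kennedy, Khrushchev) with 4 as the best outcome; the function names are illustrative.

```python
# Payoffs as (Kennedy, Khrushchev), keyed by (Kennedy's choice, Khrushchev's choice)
PAYOFFS = {
    ("back down", "back down"): (3, 3),
    ("back down", "stand firm"): (2, 4),
    ("stand firm", "back down"): (4, 2),
    ("stand firm", "stand firm"): (1, 1),
}
OPTIONS = ("back down", "stand firm")

def khrushchev_reply(kennedy_choice):
    # Khrushchev knows Kennedy's choice and picks his own best payoff
    return max(OPTIONS, key=lambda k: PAYOFFS[(kennedy_choice, k)][1])

def kennedy_choice():
    # Kennedy anticipates Khrushchev's reply to each of his options
    return max(OPTIONS, key=lambda c: PAYOFFS[(c, khrushchev_reply(c))][0])

choice = kennedy_choice()
reply = khrushchev_reply(choice)
print(choice, reply, PAYOFFS[(choice, reply)])  # stand firm back down (4, 2)
```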
A Utility Function
[Figure: utility function U(W) and expected value function E(W) plotted against W, for a wager pW1+(1-p)W2 between amounts W1 and W2]
- The utility of the wager, U[pW1+(1-p)W2], is less than its
expected value, E[pW1+(1-p)W2], so this is a risk-averse utility function
- W = amount of money; p = probability (between 0 and 1);
U = utility; E = expected value
A Utility Function
[Figure: utility function U(W) and expected value function E(W) for the wager 50%(100) + 50%(200), i.e. W1 = 100, W2 = 200, p = 50%]
- E[pW1+(1-p)W2] = 150 and U[pW1+(1-p)W2] = 140
- The utility of the wager (140) is less than its expected value (150),
so this is a risk-averse utility function
- W = amount of money; p = probability (between 0 and 1) = 50%;
U = utility; E = expected value
Another Utility Function
[Figure: utility function U(W) and expected value function E(W) plotted against W, for a wager pW1+(1-p)W2 between amounts W1 and W2]
- The utility of the wager, U[pW1+(1-p)W2], is greater than its
expected value, E[pW1+(1-p)W2], so this is a risk-seeking utility function
- W = amount of money; p = probability (between 0 and 1);
U = utility; E = expected value
A Utility Function
[Figure: utility function U(W) and expected value function E(W) for the wager 50%(100) + 50%(200), i.e. W1 = 100, W2 = 200, p = 50%]
- E[pW1+(1-p)W2] = 150 and U[pW1+(1-p)W2] = 160
- The utility of the wager (160) is greater than its expected value (150),
so this is a risk-seeking utility function
- W = amount of money; p = probability (between 0 and 1) = 50%;
U = utility; E = expected value
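One standard way to put the “utility of the wager” in money terms is the certainty equivalent. This is the sure amount the decision maker values as much as the wager, and it is the inverse utility of the wager’s expected utility. The slides do not give the functions behind the 140 and 160 figures, so this Python sketch uses two hypothetical utilities: a concave U(W) = √W and a convex U(W) = W². For the same 50/50 wager on 100 or 200, the concave utility gives a certainty equivalent below 150 (risk-averse), and the convex one gives a value above 150 (risk-seeking).

```python
import math

def certainty_equivalent(u, u_inverse, outcomes, probs):
    """Sure amount valued as much as the wager: the inverse utility of the
    wager's expected utility."""
    expected_utility = sum(p * u(w) for w, p in zip(outcomes, probs))
    return u_inverse(expected_utility)

outcomes, probs = (100, 200), (0.5, 0.5)
expected_value = sum(p * w for w, p in zip(outcomes, probs))

# Hypothetical concave utility (risk-averse): U(W) = sqrt(W)
averse = certainty_equivalent(math.sqrt, lambda x: x ** 2, outcomes, probs)
# Hypothetical convex utility (risk-seeking): U(W) = W^2
seeking = certainty_equivalent(lambda w: w ** 2, math.sqrt, outcomes, probs)

print(expected_value, round(averse, 2), round(seeking, 2))  # 150.0 145.71 158.11
```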
Basic assumptions of game theory - Utility
Four axioms of utility theory define a rational decision maker:
- Completeness: For every A and B, either A>B, A=B or A<B:
Either A is preferred to B, as good as B or worse than B
- Transitivity: For every A, B and C, if A>B and B>C then A>C:
If A is preferred to B and B is preferred to C, then A is always preferred to C
- Independence: For every set of gambles A, B and C where A>B, every
weighting factor 0<w<1 must satisfy wA+(1-w)C > wB+(1-w)C:
If A is preferred to B, then the weighted average of A and C should be
preferred to the weighted average of B and C (for every such weight)
- Continuity: For every set of gambles A, B and C where A>B>C, there must
be a probability p such that B = pA + (1-p)C:
If B is ranked between A and C, there must be a possible combination of A
and C that makes the individual indifferent between that combination and B
(otherwise, it would not be logical for B to be ranked between A and C)
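When preferences follow expected utility, the probability promised by the continuity axiom can be solved for directly. Setting p·U(A) + (1-p)·U(C) = U(B) gives p = (U(B) - U(C)) / (U(A) - U(C)), which lies strictly between 0 and 1 whenever A > B > C. A Python sketch with hypothetical utility numbers:

```python
def indifference_probability(u_a, u_b, u_c):
    """Probability p with U(B) = p*U(A) + (1-p)*U(C), assuming expected-utility
    preferences and U(A) > U(B) > U(C)."""
    if not u_a > u_b > u_c:
        raise ValueError("continuity needs A > B > C")
    return (u_b - u_c) / (u_a - u_c)

# Hypothetical utilities: U(A) = 10, U(B) = 4, U(C) = 0
p = indifference_probability(10, 4, 0)
print(p)  # 0.4, since 0.4 * 10 + 0.6 * 0 = 4
```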
Observations about Pascal’s Wager
- The strategy of acting as if God exists dominates the
strategy of acting as if God does not exist
… The infinite magnitude of the punishment in the case
that God actually exists means that even if the
probability (p) of that case is minuscule, the risk is still
too great
… The infinite magnitude of the reward in the case that
God actually exists means that even if the probability (p)
of that case is minuscule, the expected reward is still infinite
… Therefore the infinity of the payoffs following from God’s
existence makes the payoffs following from God’s
nonexistence irrelevant
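The role of infinity in this argument can be made concrete with floating-point expected values. The full wager matrix does not appear in this part of the slides, so the finite payoffs for the “God does not exist” column below are hypothetical placeholders. Any finite values give the same result as long as p is greater than zero.

```python
import math

def expected_payoff(p_exists, if_exists, if_not):
    # p * payoff(God exists) + (1 - p) * payoff(God does not exist)
    return p_exists * if_exists + (1 - p_exists) * if_not

# Hypothetical finite stakes if God does not exist
COST_OF_BELIEF = -10
GAIN_OF_UNBELIEF = 10

p = 1e-9  # a minuscule, but non-zero, probability
act_as_if_exists = expected_payoff(p, math.inf, COST_OF_BELIEF)
act_as_if_not = expected_payoff(p, -math.inf, GAIN_OF_UNBELIEF)
print(act_as_if_exists, act_as_if_not)  # inf -inf
```

The argument needs p > 0. With p exactly 0, IEEE arithmetic gives 0 × ∞ = nan, so the expected value is undefined rather than infinite.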
                             Player 2
                             Show head (q)    Show tail (1-q)
Player 1   Show head (p)     (+1, -1)         (-1, +1)
           Show tail (1-p)   (-1, +1)         (+1, -1)
Game matrix for “Matching Pennies”
- If Player 2 shows heads with probability q, the expected value to
Player 1 of showing heads = q(+1) + (1-q)(-1) = 2q-1
- The expected value to Player 1 of showing tails = q(-1) + (1-q)(+1) = 1-2q
- Player 2 should choose q so that Player 1 does equally well with heads
or tails (otherwise Player 1 could exploit the imbalance):
2q-1 = 1-2q; 4q = 2; q = ½, 1-q = ½
- Therefore Player 2 should show heads ½ the time and tails
½ the time, randomly
- Similarly, if Player 1 shows heads with probability p, the expected value
to Player 2 of showing heads = p(-1) + (1-p)(+1) = 1-2p, and of showing
tails = p(+1) + (1-p)(-1) = 2p-1
- Player 1 should choose p so that 1-2p = 2p-1; 4p = 2; p = ½
- Therefore Player 1 should show heads ½ the time and tails
½ the time, randomly
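The same indifference argument works for any two-player zero-sum game with two choices each, as long as the game has no saddle point (no pure-strategy solution). This Python sketch uses exact fractions. The function name is illustrative, and the closed-form formulas are the standard algebraic solution of the two indifference equations, not something stated in the slides.

```python
from fractions import Fraction

def mixed_equilibrium_2x2(a, b, c, d):
    """Mixed-strategy solution of a 2x2 zero-sum game with Player 1's payoffs
        [[a, b],
         [c, d]]
    (rows: Player 1's choices, columns: Player 2's choices).
    Assumes there is no saddle point, so both players must randomize."""
    denom = a - b - c + d
    # p = Player 1's probability of row 1, chosen so Player 2 is indifferent
    p = Fraction(d - c, denom)
    # q = Player 2's probability of column 1, chosen so Player 1 is indifferent
    q = Fraction(d - b, denom)
    # Player 1's expected payoff (the value of the game)
    value = Fraction(a * d - b * c, denom)
    return p, q, value

# Matching Pennies, Player 1's payoffs: match = +1, mismatch = -1
p, q, value = mixed_equilibrium_2x2(1, -1, -1, 1)
print(p, q, value)  # 1/2 1/2 0
```

A value of 0 confirms the earlier observation. With both players mixing 50/50, each wins half the time on average.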
Axelrod’s Iterated Prisoner’s Dilemma
Tournament – The results
- Example: Tit For Tat vs. Joss
… Joss: Similar to Tit For Tat, but defects randomly (10%
of the time) after the other player cooperates
… In this sequence Joss randomly defects on the 6th move
… Tit For Tat then defects back on the 7th move
even though Joss returns to cooperating
… Joss then defects back in response on the 8th move
… This results in an “echo effect”: Joss defects
on all the later even-numbered moves and Tit For Tat
defects on all the later odd-numbered moves
… On the 25th move Joss randomly defects again while Tit For Tat
is defecting in the echo, so both programs defect on that move;
from then on each retaliates against the other, and both
programs defect on every remaining move
Axelrod’s Iterated Prisoner’s Dilemma
Tournament – Tit For Tat vs. Joss
Moves      Results
1-20 11111 23232 32323 23232
21-40 32324 44444 44444 44444
41-60 44444 44444 44444 44444
61-80 44444 44444 44444 44444
81-100 44444 44444 44444 44444
101-120 44444 44444 44444 44444
121-140 44444 44444 44444 44444
141-160 44444 44444 44444 44444
161-180 44444 44444 44444 44444
181-200 44444 44444 44444 44444
1 = both cooperated                2 = Tit For Tat only cooperated
3 = Joss only cooperated           4 = both defected
Final score: Tit For Tat 236, Joss 241
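The recorded game can be reproduced in simulation. Joss’s 10% random draw is replaced here with a fixed schedule that defects on moves 6 and 25, the two random defections described above. The deterministic `make_joss` stand-in is an assumption made for reproducibility; it is not Joss’s actual code. With Axelrod’s payoffs and 200 moves, the sketch regenerates the first 25 result codes and the final scores in the table.

```python
C, D = "C", "D"
# Points earned for (my move, opponent's move), Axelrod's tournament values
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}
# Result codes used in the table, keyed by (Tit For Tat's move, Joss's move)
CODES = {(C, C): "1", (C, D): "2", (D, C): "3", (D, D): "4"}

def tit_for_tat(opponent_history, move_number):
    # Cooperate on the first move, then copy the opponent's previous move
    return opponent_history[-1] if opponent_history else C

def make_joss(random_defection_moves):
    # Hypothetical deterministic stand-in for Joss: it plays like Tit For Tat,
    # but on the given move numbers it defects even though the opponent just
    # cooperated (replacing Joss's 10% random draw with a fixed schedule)
    def joss(opponent_history, move_number):
        move = tit_for_tat(opponent_history, move_number)
        if move == C and move_number in random_defection_moves:
            return D
        return move
    return joss

def play(first, second, moves=200):
    first_moves, second_moves = [], []
    first_score = second_score = 0
    for n in range(1, moves + 1):
        a = first(second_moves, n)
        b = second(first_moves, n)
        first_moves.append(a)
        second_moves.append(b)
        first_score += PAYOFF[(a, b)]
        second_score += PAYOFF[(b, a)]
    return first_score, second_score, first_moves, second_moves

tft_score, joss_score, tft_moves, joss_moves = play(tit_for_tat, make_joss({6, 25}))
codes = "".join(CODES[pair] for pair in zip(tft_moves, joss_moves))
print(" ".join(codes[i:i + 5] for i in range(0, 25, 5)))  # 11111 23232 32323 23232 32324
print(tft_score, joss_score)                                # 236 241
```

The scores break down as follows. Both programs earn 15 points in the first five moves and 176 points in the final 176 moves of mutual defection. In between, during the echo, Tit For Tat collects 5 points on 9 moves (45) and Joss collects 5 points on 10 moves (50).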
[Figure: strategies in a population: Tit For Tat; Always Cooperate; Spiteful Bully (defect until the opponent defects back, then cooperate unless the opponent defects three times in a row)]
Collectively stable strategies in the
Iterated Prisoner’s Dilemma
- A strategy is collectively stable if in a population of
entities using it, no other strategy can successfully
invade and establish itself
… The new strategy would have to get a higher score
against the “native” strategy than the “native” strategy
gets against another copy of the “native” strategy
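The invasion condition can be checked directly for a pair of strategies. This Python sketch assumes Axelrod’s payoffs and games of exactly 200 moves, and it tests only one candidate invader at a time. Collective stability, as defined above, requires that no strategy at all can invade, so this check alone cannot establish it. `can_invade` is a hypothetical helper name.

```python
C, D = "C", "D"
# Points for (my move, opponent's move), Axelrod's tournament values
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}

def tit_for_tat(theirs):
    return theirs[-1] if theirs else C

def always_defect(theirs):
    return D

def score(s1, s2, moves=200):
    """Points s1 earns against s2 in one game."""
    h1, h2, total = [], [], 0
    for _ in range(moves):
        m1, m2 = s1(h2), s2(h1)
        h1.append(m1)
        h2.append(m2)
        total += PAYOFF[(m1, m2)]
    return total

def can_invade(mutant, native, moves=200):
    # The mutant invades if it scores more against the native strategy
    # than the native strategy scores against another copy of itself
    return score(mutant, native, moves) > score(native, native, moves)

print(can_invade(always_defect, tit_for_tat))  # False: 204 < 600
print(can_invade(tit_for_tat, always_defect))  # False: 199 < 200
```

With these numbers, neither strategy can invade a population of the other when it arrives as a lone individual.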