It’s All in the Game - Welcome to the OLLI at UCI Blog

It’s All in the Game: An Introduction to Game Theory

Introduction to Game Theory

- What is a “game?”

… Basic concepts and assumptions

… Brief early history of game theory

- Dominant and dominated strategies

- Two-player zero-sum games: the minimax theorem

- Two-player non-zero-sum games: The Prisoner’s Dilemma

What is a “Game?”

- A situation in which entities (players) with opposing interests

must select from among different courses of action (strategies)

… Each strategy leads to a different outcome (payoff)

- Applies to many fields:

… Economics

… Biology

… Sociology

… War

… Politics

… Philosophy

… Recreation and entertainment

- The object of game theory is to determine the players’ best

strategies

Example: “Matching Pennies”

- Two players each display a penny at the same time

- If both pennies match (i.e. both are heads or both are tails),

Player 1 wins a penny

- If the pennies do not match, Player 2 wins a penny

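                              Player 2
                       Show head     Show tail
Player 1   Show head   (+1, -1)      (-1, +1)
           Show tail   (-1, +1)      (+1, -1)

Game matrix for “Matching Pennies” - Player 1 wins all matches; Player 2 wins all mis-matches

… Matches: both show heads or both show tails (the diagonal cells)

… Mis-matches: one shows heads and the other shows tails (the off-diagonal cells)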

- Convention: Each cell shows

(Player 1’s payoff, Player 2’s payoff)

Observations about “Matching Pennies”

- Both players make their “moves” at the same time

- The payoffs in each cell add up to zero (zero-sum game)

- No fixed strategy guarantees a win for either player

… Fixed strategies – for example, showing heads all

the time – are called pure strategies

- What if a player shows heads or tails randomly?

… This type of strategy is called a mixed strategy

… If each player plays a 50% mixed strategy of heads/tails

(for example, by flipping the coins) then each player will

win 50% of the time on the average
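The 50% claim can be checked with a short calculation. The following is a minimal sketch in Python, assuming the payoffs from the game matrix for “Matching Pennies”; the function name expected_payoff_p1 is made up for this illustration.

```python
from fractions import Fraction

# Player 1's payoffs from the "Matching Pennies" matrix.
# Keys are (Player 1's coin, Player 2's coin); Player 2's payoff is the
# negative of each entry because the game is zero-sum.
PAYOFF_P1 = {
    ("H", "H"): 1, ("H", "T"): -1,
    ("T", "H"): -1, ("T", "T"): 1,
}

def expected_payoff_p1(p_heads, q_heads):
    """Average payoff to Player 1 when Player 1 shows heads with probability
    p_heads and Player 2 shows heads with probability q_heads."""
    p = {"H": p_heads, "T": 1 - p_heads}
    q = {"H": q_heads, "T": 1 - q_heads}
    return sum(p[a] * q[b] * PAYOFF_P1[(a, b)] for a in p for b in q)

half = Fraction(1, 2)
print(expected_payoff_p1(half, half))  # both flip fair coins: 0
print(expected_payoff_p1(1, 0))        # always heads vs. always tails: -1
print(expected_payoff_p1(half, 0))     # fair coin vs. always tails: 0
```

An average payoff of 0 means wins and losses balance out, which is the “each player wins 50% of the time” result. The last line suggests why the 50% mix is safe: it gives the same average no matter what the opponent does.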

Another Representation: Extensive Form

Player 1
   Action A → Player 2
                 Action X → Payoff from (A, X)
                 Action Y → Payoff from (A, Y)
   Action B → Player 2
                 Action X → Payoff from (B, X)
                 Action Y → Payoff from (B, Y)

- Extensive form is used when players’ actions alternate

over time so that each player knows the other’s prior move(s)

Social Contract Theory –

The State of Nature

- Thomas Hobbes: Without a central government the natural

state of humanity is a state of war of every individual against

every other individual

- In this state, individuals make economic decisions based

strictly on what they can compel by their own personal force

Whatsoever therefore is consequent to a time of Warre,

where every man is Enemy to every man; the same is

consequent to the time, wherein men live without other

security, than what their own strength, and their own

invention shall furnish them withall. In such condition, there

is no place for Industry; because the fruit thereof is uncertain:

and consequently no Culture of the Earth; no Navigation, nor

use of the commodities that may be imported by Sea; no

commodious Building; no Instruments of moving, and removing

such things as require much force; no Knowledge of the face

of the Earth; no account of Time; no Arts; no Letters; no

Society; and which is worst of all, continuall feare, and danger

of violent death; And the life of man, solitary, poore, nasty,

brutish, and short.

- Thomas Hobbes, Leviathan, 1651

Example: The Farmers’ Dilemma

(David Hume, 1740)

- Two farmers each have a bumper crop of corn (400 acres)

- Each farmer needs the other’s help to completely harvest

the corn, or a substantial portion (200 acres) will rot in the field

- The effort expended to harvest the neighboring farmer’s

field results in loss of yield of 100 acres of one’s own field

- Will the farmer whose corn ripens later (Farmer 2) help the

farmer whose corn ripens first (Farmer 1)?

- Starting from the end of the game tree, backward induction

eliminates actions that are not payoff-maximizing at each stage


Game tree for the Farmers’ Dilemma (payoffs in acres harvested, listed as first mover, second mover)

Farmer 2
   Help → Farmer 1
             Help → (300,300)
             Don’t help → (100,400)
   Don’t help → Farmer 1
             Help → (400,100)
             Don’t help → (200,200)


(Game tree repeated, with one action crossed out (x) by backward induction)


(Game tree repeated, with two actions crossed out (x) by backward induction)
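Backward induction on this tree can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of the original slides: it assumes that Farmer 2 moves first (deciding whether to help Farmer 1), that Farmer 1 then decides whether to return the help, and that each payoff pair is listed as (first mover, second mover). The dictionary layout and the function name backward_induction are hypothetical.

```python
# A leaf is a payoff tuple; an internal node names the player who moves
# (0 = first mover, 1 = second mover) and maps each action to a subtree.
# Assumption: payoffs are (first mover, second mover) in acres harvested.
tree = {
    "player": 0,
    "actions": {
        "Help": {
            "player": 1,
            "actions": {"Help": (300, 300), "Don't help": (100, 400)},
        },
        "Don't help": {
            "player": 1,
            "actions": {"Help": (400, 100), "Don't help": (200, 200)},
        },
    },
}

def backward_induction(node):
    """Return (payoffs, actions taken) when every player maximizes
    his own payoff, working from the end of the tree toward the start."""
    if isinstance(node, tuple):      # leaf: nothing left to decide
        return node, []
    player = node["player"]
    best = None
    for action, subtree in node["actions"].items():
        payoffs, path = backward_induction(subtree)
        if best is None or payoffs[player] > best[0][player]:
            best = (payoffs, [action] + path)
    return best

payoffs, path = backward_induction(tree)
print(payoffs, path)  # (200, 200) ["Don't help", "Don't help"]
```

At each of the second mover's nodes, “Don't help” pays more (400 vs. 300, and 200 vs. 100), so the first mover is really choosing between 100 and 200 and also declines to help.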

Observations about the Farmers’ Dilemma

- Rational analysis seems to lead to a sub-optimal result

… If the farmers could somehow trust each other, and

cooperate, they would each get a payoff of 300 vs. 200

- Foreshadows other paradoxes of rationality that are

revealed by game theory, notably in the Prisoner’s Dilemma

Your corn is ripe today; mine will be so tomorrow. ’Tis

profitable for us both, that I shou’d labour with you to-day, and

that you shou’d aid me to-morrow. I have no kindness for you,

and know you have as little for me. I will not, therefore, take

any pains on your account; and should I labour with you upon

my own account, in expectation of a return, I know I shou’d

be disappointed, and that I shou’d in vain depend upon your

gratitude. Here then I leave you to labour alone: You treat me

in the same manner. The seasons change; and both of us

lose our harvests for want of mutual confidence and security.

- David Hume

Basic assumptions of game theory

- Players are rational

… “We wish to find the mathematically complete principles

which define “rational behavior” for the participants in a

social economy, and to derive from them the general

characteristics of that behavior.” (Von Neumann and

Morgenstern, Theory of Games and Economic Behavior)

- Players are seeking to maximize “utility”

… Max utility = most preferred outcome

- Players’ interests are opposed

… Their preferences may still coincide

- Players’ strategies comprehensively specify their complete

plans of action under all possible circumstances

Basic assumptions of game theory -

Rationality

- Are people really rational in conflict situations?

… Experiments suggest they often are not

- Rationality is usually paired with the assumption of complete information

… All players know all other players, outcomes, payoffs, etc.

… All players know all preferences of all other players

… All the above are “common knowledge”

… (“Perfect” information is a separate notion: every player sees all earlier moves, as in chess)

- In many cases people do not actually have complete information

… Partial knowledge results in bounded rationality

(myopic behavior, poor decisions, etc.)

- The assumption of rationality can lead to paradoxes

Example: The Cuban Missile Crisis

- In October 1962 the U.S. discovered that Russia was building

a missile base in Cuba

- A Russian fleet was on its way to Cuba to bring in supplies

(including missiles with nuclear warheads)

- The U.S. set up a naval blockade

- Basically the U.S. and Russia were playing a game of

“Chicken” … who would “back down” first?

- Suppose that President Kennedy had been notified that his

staff was infiltrated by Russian spies and that Premier

Khrushchev would immediately learn of all his decisions

… How great an advantage would this have given Khrushchev?

The Cuban Missile Crisis

Possible Outcomes and Preferences (4 = most desirable, 1 = least desirable)

Rank   US                                     Russia
4      US stands firm, Russia backs down      US backs down, Russia stands firm
3      US backs down, Russia backs down       US backs down, Russia backs down
2      US backs down, Russia stands firm      US stands firm, Russia backs down
1      US stands firm, Russia stands firm     US stands firm, Russia stands firm

                       Russia
                 Back down    Stand firm
US  Back down    (3,3)        (2,4)
    Stand firm   (4,2)        (1,1)

Game Matrix for the Cuban Missile Crisis (each cell shows US rank, Russia rank)

The Cuban Missile Crisis: Paradox

- Foreknowledge of Kennedy’s decision would have worked

to Khrushchev’s disadvantage!

… Assumes Kennedy was aware of the foreknowledge



- If Kennedy decides to back down, Khrushchev will know it,

and will decide to stand firm (payoff 4 vs. 3: outcome (2,4))



- If Kennedy decides to stand firm, Khrushchev will know it,

and will decide to back down (payoff 2 vs. 1: outcome (4,2))



- Therefore if Kennedy has to choose between only those two

alternatives, he will choose to stand firm (payoff 4 vs. 2),

causing Khrushchev to back down (which is what happened)
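The reasoning in this paradox can be traced step by step in code. The following Python sketch assumes the ordinal payoffs from the game matrix for the Cuban Missile Crisis and assumes that Kennedy knows his decision will be observed; the function names are invented for this illustration.

```python
ACTIONS = ("Back down", "Stand firm")

# Ordinal payoffs (US, Russia) from the game matrix; 4 = most desirable.
# Keys are (US action, Russia action).
PAYOFFS = {
    ("Back down", "Back down"): (3, 3),
    ("Back down", "Stand firm"): (2, 4),
    ("Stand firm", "Back down"): (4, 2),
    ("Stand firm", "Stand firm"): (1, 1),
}

def best_reply_russia(us_action):
    """Russia's best action once it knows what the US has decided."""
    return max(ACTIONS, key=lambda r: PAYOFFS[(us_action, r)][1])

def us_choice_when_observed():
    """The US action that is best for the US, given that Russia will
    see the decision and reply with its own best action."""
    return max(ACTIONS, key=lambda u: PAYOFFS[(u, best_reply_russia(u))][0])

us = us_choice_when_observed()
russia = best_reply_russia(us)
print(us, "/", russia, "/", PAYOFFS[(us, russia)])
# Stand firm / Back down / (4, 2)
```

The player whose decision is known in advance ends up with rank 4, while the player with the foreknowledge ends up with rank 2.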

Basic assumptions of game theory - Utility

- “Utility” is supposed to measure players’ preferences

… Not necessarily the same as monetary payoff, but could

be correlated with it

… Takes into account altruism, risk tolerance, etc.

- Utility is closely related to the definition of rationality

… Rational decision makers should maximize expected utility

- Do people actually try to maximize utility in conflict situations?

… Experiments suggest they often may not

… People may make the “first decision that works”, rather

than the “best possible” decision

Herbert Simon

1916-2001

- American psychologist and

decision science theorist, Carnegie-

Mellon University; Nobel Prize, 1978

- One of the key founders of the fields

of artificial intelligence and decision

theory

- Introduced the concept of bounded

rationality

- Put forth the key idea that decision makers satisfice instead

of actually attempting to maximize utility; that is, they tend to

make the first decision that satisfies a set of requirements they

are trying to fulfill, instead of the truly optimal decision


Utility and Risk Attitudes

- Which would you prefer?

… A lottery ticket that pays out $10 with probability 50% and $0 otherwise, or

… A lottery ticket that pays out $3 with probability 100%

- How about:

… A lottery ticket that pays out $100,000,000 with probability 50% and $0 otherwise, or

…


- Risk-neutral people only care about the ticket’s expected value

- Risk-averse people (most people) prefer receiving the ticket’s expected value for certain to holding the ticket

- Risk-seeking people prefer holding the ticket to receiving its expected value for certain

Maximizing Expected Utility

[Figure: risk-averse utility function with diminishing marginal utility. Horizontal axis: Money; vertical axis: Utility. Marked points: $200, buy a bike (utility = 1); $1500, buy a nicer bike (utility = 2); $5000, buy a used car (utility = 3)]

- Risk-averse utility functions flatten out the more money is

involved because of diminishing marginal utility

… Each additional dollar provides less utility than the dollar

before it

Maximizing Expected Utility

[Figure repeated: risk-averse utility function with diminishing marginal utility; Money ($200, $1500, $5000) on the horizontal axis, Utility on the vertical axis]

- Choice 1: get $1500 with probability 100%

- Choice 2: Lottery – get 60% chance of $200 or a 40% chance

of $5000

- Which is preferred with a risk-averse utility function?

Maximizing Expected Utility

- Choice 1: get $1500 with probability 100%

… Expected utility = 2

- Choice 2: Lottery – get 60% chance of $200 or a 40% chance of $5000

… Expected utility = 60%(1) + 40%(3) = 1.8 < 2

… Expected value = 60%($200) + 40%($5000) = $2120 > $1500

- So: maximizing expected utility is consistent with risk aversion
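The arithmetic behind this comparison can be reproduced directly. This is a minimal Python sketch that uses only the three utility values given in the example ($200 has utility 1, $1500 has utility 2, $5000 has utility 3); the helper names are made up for illustration.

```python
from fractions import Fraction

# Utility of each money amount, taken from the bike / nicer bike / used car example.
UTILITY = {200: 1, 1500: 2, 5000: 3}

def expected_value(lottery):
    """Average money payout of a lottery given as (amount, probability) pairs."""
    return sum(p * amount for amount, p in lottery)

def expected_utility(lottery):
    """Average utility of a lottery given as (amount, probability) pairs."""
    return sum(p * UTILITY[amount] for amount, p in lottery)

choice_1 = [(1500, Fraction(1))]
choice_2 = [(200, Fraction(3, 5)), (5000, Fraction(2, 5))]

print(expected_value(choice_1), expected_utility(choice_1))  # 1500 2
print(expected_value(choice_2), expected_utility(choice_2))  # 2120 9/5
```

Choice 2 has the higher expected value ($2120 vs. $1500) but the lower expected utility (9/5 = 1.8 vs. 2), so an expected-utility maximizer with this utility function takes the sure $1500.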

[Figure: the same risk-averse utility curve, with the lottery marked on it. Expected utility of Choice 2 = 1.8; expected value of Choice 2 = $2120]

Utility Theory Axioms (“>” = “is preferred to”)

- Completeness: Either A > B, B > A or A = B

- Transitivity: If A > B and B > C, then A > C

- Independence: If A > B, then a weighted average of

A and C > a weighted average of B and C

- Continuity: If A > B > C, it should always be possible to

find a probability P that makes the individual indifferent

between gambling and not gambling in the following

situation –

… Don’t gamble: receive B for certain

… Gamble: receive A with probability P, or C with probability 1 - P
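For a decision maker who maximizes expected utility, the indifference probability in the continuity axiom can be computed: setting u(B) = P·u(A) + (1 - P)·u(C) gives P = (u(B) - u(C)) / (u(A) - u(C)). The Python sketch below is illustrative only; the utility numbers 3, 2 and 1 are assumed for the example and are not part of the axiom.

```python
from fractions import Fraction

def indifference_probability(u_a, u_b, u_c):
    """Probability P of winning A at which the gamble (A with probability P,
    C with probability 1 - P) has the same expected utility as B for certain.
    Expects whole-number utilities with u(A) > u(B) > u(C)."""
    if not (u_a > u_b > u_c):
        raise ValueError("requires u(A) > u(B) > u(C)")
    return Fraction(u_b - u_c, u_a - u_c)

# Assumed utilities for illustration: u(A) = 3, u(B) = 2, u(C) = 1.
P = indifference_probability(u_a=3, u_b=2, u_c=1)
print(P)                    # 1/2
print(P * 3 + (1 - P) * 1)  # 2, the same as the utility of B
```

Because u(B) lies strictly between u(C) and u(A), the result always falls strictly between 0 and 1, which is what the axiom asks for.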

Utility Functions and Rationality

- Only risk-neutrality or risk-aversion are rational

… Risk-aversion must also be independent of size of risk

- Transitivity implies that only rigorously consistent

preferences are rational

- Experiments (Tversky, Kahneman, Thaler) show that some

realistic utility functions are not rational

… Risk-seeking (gambling)

… Risk-aversion increasing with ratio of risk to capital/wealth

… Intransitive: Just because I prefer A to B, and B to C,

does not necessarily mean I always prefer A to C

- Preferences can depend on changes in the reference points

from which gains and losses will occur

Daniel Kahneman

1934-2024

- Israeli psychologist and economist,

Hebrew University, University of

British Columbia, University of

California-Berkeley and Princeton

University; Nobel Prize, 2002

- Expert in behavioral economics

- With Amos Tversky, developed prospect theory (on which Richard Thaler later built),

which proposes that people do not actually maximize expected

utility when making decisions, but rather tend to adjust their

risk-aversion to reference points from which gains and losses

will occur, causing violations of the transitivity assumption of

rationality (inconsistency of preferences)


Example: Reference Point Sensitivity

(Richard Thaler’s experiment)

- Option 1: Everyone just won $30:

Flip a coin vs. no coin flip

… Heads: win $9

… Tails: lose $9

… 70% chose the coin flip

- Option 2: Everyone starts at $0:

Flip a coin vs. $30 for sure

… Heads: win $39

… Tails: win $21

… 43% chose the coin flip

- Participants based their choice on the reference point

… No actual economic difference between Options 1 and 2
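That the two options are economically identical can be confirmed by adding each gain or loss to the starting amount. A small Python sketch, with the function name final_outcomes invented for the illustration:

```python
def final_outcomes(start, coin_flip_changes, sure_change):
    """Return (sorted final amounts after the coin flip,
    final amount without the coin flip)."""
    after_flip = sorted(start + change for change in coin_flip_changes)
    return after_flip, start + sure_change

# Option 1: already holding $30; the flip wins or loses $9, no flip changes nothing.
option_1 = final_outcomes(start=30, coin_flip_changes=(+9, -9), sure_change=0)

# Option 2: starting at $0; the flip wins $39 or $21, no flip pays $30 for sure.
option_2 = final_outcomes(start=0, coin_flip_changes=(+39, +21), sure_change=30)

print(option_1)              # ([21, 39], 30)
print(option_2)              # ([21, 39], 30)
print(option_1 == option_2)  # True
```

Both descriptions lead to the same final amounts ($21 or $39 with the flip, $30 without), so the difference in how many people chose the coin flip comes only from the reference point.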

Basic assumptions of game theory -

Players’ interests are opposed

- If players’ interests are opposed, can they cooperate?

… Their preferences can still coincide

- Cooperation can arise naturally from opposing interests

… Communication, signaling, bargaining, negotiation

… Coalitions (for example, labor unions vs. management)

… Evolution over time in repeated games (Iterated Prisoner’s

Dilemma)

Basic assumptions of game theory –

Comprehensive strategies

- Is it really possible to formulate comprehensive strategies

specifying what actions players will take under all possible

circumstances?

… For some types of games this may not be possible

… Players may not know all the other players, all possible

outcomes, etc. (bounded rationality)

… It may be impractical to calculate comprehensive

strategies even though they may be theoretically possible

(for example, chess)

Brief early history of game theory

- Pascal’s Wager

- Cournot

- Borel

- Von Neumann: Minimax theory

- Dresher and Flood and Tucker: RAND Corporation

experiments and the Prisoner’s Dilemma

Blaise Pascal

(1623-1662)

- French mathematician and

philosopher

- Believed that nature is actually

infinite and that human reason

is capable of a nobility and

dignity that transcend human

finitude

- Maintained that although we are incapable of knowing

whether or not God exists, we must make a “wager” on one

or the other alternative in determining how we live our lives

“God is, or He is not.” But to which side shall we incline?

Reason can decide nothing here. There is an infinite

chaos which separated us. A game is being played at

the extremity of this infinite distance where heads or tails

will turn up... Which will you choose then? Let us see.

Since you must choose, let us see which interests you least.

You have two things to lose, the true and the good; and two

things to stake, your reason and your will, your knowledge

and your happiness; and your nature has two things to shun,

error and misery. Your reason is no more shocked in choosing

one rather than the other, since you must of necessity choose...

But your happiness? Let us weigh the gain and the loss in

wagering that God is...

-- Pascal

Game Matrix for Pascal’s Wager

                                             Reality
                                God exists (p)        God does not exist (1-p)
You  Act as if God exists       Eternal reward        Pious life only
     Act as if God does         Eternal punishment    Earthly reward only
     not exist

But there is here an infinity of an infinitely

happy life to gain, a chance of gain against

a finite number of chances of loss, and

what you stake is finite. It is all divided;

wherever the infinite is and there is not an

infinity of chances of loss against that of

gain, there is no time to hesitate, you must

give all...

-- Pascal

Game Matrix for Pascal’s Wager (repeated, with the two “God does not exist” payoffs crossed out)

- The infinity of the payoffs if God exists makes the payoffs if God does not exist irrelevant

- Therefore the strategy of acting as if God exists dominates the

strategy of acting as if God does not exist

Dominated Strategies

- Strategy A dominates strategy B if a player’s payoff with

strategy A is always greater than or equal to that of strategy

B no matter what the other players do

- It’s often possible to simplify a complex game by

eliminating dominated strategies

Example: Dominated Strategies

                                   Player 2
                       Strategy 1   Strategy 2   Strategy 3
Player 1  Strategy A   (7,-7)       (9,-9)       (8,-8)
          Strategy B   (9,-9)       (10,-10)     (12,-12)
          Strategy C   (8,-8)       (8,-8)       (8,-8)


- For Player 1, Strategy B dominates the other two strategies

because its payoffs are greater no matter whether Player 2

follows Strategy 1, 2 or 3


- For Player 2, Strategy 1 dominates the other two strategies

because its negative payoffs are less than or equal to those

of Strategies 2 or 3 no matter what strategy Player 1 follows


- Therefore Player 1 will play Strategy B, Player 2 will play Strategy 1 and the resulting payoff will be (9,-9)
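The dominance checks in this example can be automated. The Python sketch below uses the payoffs from the 3 x 3 example matrix and the “greater than or equal” definition of dominance given earlier; the helper names dominates and dominant_strategy are hypothetical.

```python
# Player 1's payoffs: rows are Strategies A, B, C; columns are Player 2's
# Strategies 1, 2, 3. The game is zero-sum, so Player 2 receives the negative.
P1 = [
    [7, 9, 8],     # Strategy A
    [9, 10, 12],   # Strategy B
    [8, 8, 8],     # Strategy C
]

# Player 2's payoffs, indexed by Player 2's own strategy first
# (the transpose of P1 with every sign flipped).
P2 = [[-P1[r][c] for r in range(3)] for c in range(3)]

def dominates(payoffs, a, b):
    """True if option a pays at least as much as option b against
    every choice the other player can make."""
    return all(x >= y for x, y in zip(payoffs[a], payoffs[b]))

def dominant_strategy(payoffs):
    """Index of an option that dominates all the others, or None."""
    for a in range(len(payoffs)):
        if all(dominates(payoffs, a, b) for b in range(len(payoffs)) if b != a):
            return a
    return None

row = dominant_strategy(P1)   # 1, i.e. Strategy B
col = dominant_strategy(P2)   # 0, i.e. Strategy 1
print("ABC"[row], "123"[col], (P1[row][col], -P1[row][col]))  # B 1 (9, -9)
```

The same two functions work for any game matrix written this way; when no option dominates all the others, dominant_strategy returns None and some other method is needed.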

Antoine Augustin Cournot

1801-1877

- French mathematician and economist

- Published his Recherches in 1838,

founding the theory of the firm

- Based his analysis of the economic

behavior of firms on a primitive

notion of game theory, specifically,

the idea of rivals achieving an equilibrium based on each one

making its best response to the other’s actions

Emile Borel

1871-1956

- French mathematician and

politician

- First to mathematically define the

notion of games of strategy

- Gave the first modern formulation

of a mixed strategy along with

a method of finding the best

strategy for certain two-person games

John von Neumann

1903-1957

- Jewish-Hungarian-American

prodigy (became Catholic in

his first marriage); one of the

greatest mathematicians

of the 20th century

- Made key contributions to many

fields, including set theory, quantum

physics, computer science, and the

development of the atomic bomb

- Proved the minimax theorem in 1926 in a paper presented to

the Göttingen Mathematical Society

- With Oskar Morgenstern, published Theory of Games and

Economic Behavior in 1944, effectively founding game theory

von Neumann’s Minimax Theorem

- In every finite two-person zero-sum game, there exists a strategy (possibly a mixed strategy) for each player which allows both players to minimize their maximum losses (hence the name “minimax”)

- In a game matrix the minimax payoff and its corresponding

strategies are easily recognized if they are pure strategies

… The minimax payoff is the smallest in its row and the

largest in its column when looked at from the point of

view of the row player

- When a pure minimax strategy exists in a game, that strategy

will be an equilibrium for that game

… Once the players settle on a pair of strategies

corresponding to a minimax point, they have no reason

to change strategies

Example of Minimax Equilibrium:

Abortion Issue

- The Democrats and Republicans are trying to formulate

their strategies about the abortion issue

- Each party will hold a convention, determine its platform

privately and announce it simultaneously

- Polling prior to the conventions indicates the percentage

of the vote that each party is likely to get with each possible

outcome of platform choices

- What strategy should each party adopt? Pro-life?

Pro-choice? Or dodge the issue?

Game Matrix for Abortion Issue (each cell shows Democrats’ share of the vote, Republicans’ share of the vote)

                                        Republicans
                          Pro-life     Pro-choice    Dodge issue
Democrats  Pro-life       (35%,65%)    (10%,90%)     (60%,40%)
           Pro-choice     (45%,55%)    (55%,45%)     (50%,50%)
           Dodge issue    (40%,60%)    (10%,90%)     (65%,35%)


45% is the largest value in the column (looked

at from the Democrats’ point of view)


45% is the smallest value in the row


45% is called a saddle point and is the minimax
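The saddle-point test (smallest in its row, largest in its column, from the row player's point of view) is easy to mechanize. Here is a minimal Python sketch using the Democrats' vote shares from the game matrix for the abortion issue; the function name saddle_points is made up for the illustration.

```python
# Democrats' share of the vote (%). Rows: Democrats choose Pro-life,
# Pro-choice, Dodge issue; columns: Republicans choose the same three.
DEM_SHARE = [
    [35, 10, 60],
    [45, 55, 50],
    [40, 10, 65],
]
LABELS = ("Pro-life", "Pro-choice", "Dodge issue")

def saddle_points(matrix):
    """Return (row, column, value) for every entry that is the smallest
    in its row and the largest in its column."""
    points = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            column = [matrix[k][j] for k in range(len(matrix))]
            if value == min(row) and value == max(column):
                points.append((i, j, value))
    return points

for i, j, value in saddle_points(DEM_SHARE):
    print("Democrats:", LABELS[i], "/ Republicans:", LABELS[j], "/", value)
# Democrats: Pro-choice / Republicans: Pro-life / 45
```

If the function returns an empty list, the game has no pure-strategy minimax.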

Why Is the Minimax an Equilibrium?

- The payoff associated with the saddle point is called

the value of the game

- By playing the equilibrium strategy (i.e. the strategy that

results in the saddle point payoff), each player gets at

least the value of the game

- By playing the equilibrium strategy, an opponent can stop

a player from getting any more than the value of the game

- Since the game is zero-sum, a player’s opponent wants to

minimize that player’s payoff

- Neither player can gain by changing strategies unilaterally

What If There Is No Pure Strategy Minimax?

- By Von Neumann’s theorem, every finite two-person zero-sum game has a minimax, so when there is no pure-strategy minimax a mixed-strategy minimax must exist

… This means each player selects from among the various

pure strategies in the game with some random probability

… This strategy will give the value of the game on average,

rather than with certainty as in a pure strategy minimax

- Finding the mixed-strategy minimax is often difficult

- One way is to choose a mixed strategy that gives the same

average payoff (expected value) whatever the opponent

does

… Expected value = sum (payoff x probability of payoff)

Example: The Escaped Convict

- A convict escapes from jail

- He has two possible routes of escape:

The highway or the forest

- The sheriff can only cover one route

- If they take different routes, escape is certain

- If both take the highway, the convict will certainly be caught

- If both take the forest, the probability of escape is 1-1/n,

where n represents the difficulty of searching in the forest

                           Sheriff
                    Highway     Forest
Convict  Highway    (0,1)       (1,0)
         Forest     (1,0)       (1-1/n,1/n)

Game Matrix for The Escaped Convict (each cell shows the Convict’s probability of escape, then the Sheriff’s probability of capture)



- First consider what happens from the Convict’s point of view if

the Sheriff takes the Highway

… Suppose the Convict takes the Highway with probability p and the Forest with probability 1-p

… Convict’s average payoff if the Sheriff takes the Highway: 0(p) + 1(1-p) = 1-p



- Next consider what happens from the Convict’s point of view if

the Sheriff goes into the Forest

… Convict’s average payoff if the Sheriff goes into the Forest (p = probability that the Convict takes the Highway): 1(p) + (1-1/n)(1-p) = p + 1 – p – 1/n + p/n = 1 – 1/n + p/n

Analysis of The Escaped Convict

- If the Sheriff takes the Highway: Convict’s average payoff is

1-p

- If the Sheriff takes the Forest: Convict’s average payoff is

1 – 1/n + p/n

- To find the probability of taking the Highway that gives

the same average payoff for the Convict whatever

the Sheriff does, set these two payoffs equal to each other:

1-p = 1 – 1/n + p/n

-p = -1/n + p/n

-p – p/n = -1/n

-np – p = -1

np + p = 1

p(n + 1) = 1

p = 1/(n+1) = Convict’s probability of taking Highway

… Convict takes the Highway with probability p = 1/(n+1)

… Convict takes the Forest with probability 1-p = 1 - 1/(n+1)

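- Now let the Sheriff take the Highway with probability q and the Forest with probability 1-q, and look at the Sheriff’s average payoff (probability of capture)

… If the Convict takes the Highway: 1(q) + 0(1-q) = q

… If the Convict takes the Forest: 0(q) + (1/n)(1-q) = (1-q)/n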

- It turns out that the Sheriff’s probabilities are the same:

q = (1-q)/n; nq = 1-q; (n+1)q = 1; q = 1/(n+1) and 1-q = 1-1/(n+1)

- The bigger n is, the more likely the forest route is for both

… Probability of the Forest for both: 1-p = 1-q = 1 - 1/(n+1)

- The bigger n is, the more likely the Convict is to escape

… Convict’s average payoff (probability of escape) at equilibrium: 1-p = 1 - 1/(n+1)
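The result p = q = 1/(n+1) can be checked numerically for particular values of n. The Python sketch below assumes the payoffs from the game matrix for The Escaped Convict; the function name check_equilibrium is invented for this illustration, and it verifies the derived probabilities rather than deriving them.

```python
from fractions import Fraction

def check_equilibrium(n):
    """Return (p, q, escape probability) for search difficulty n, after
    confirming that each player's mix leaves the other player indifferent."""
    p = Fraction(1, n + 1)   # Convict's probability of taking the Highway
    q = Fraction(1, n + 1)   # Sheriff's probability of taking the Highway
    forest_escape = 1 - Fraction(1, n)

    # Convict's average payoff (probability of escape) against each Sheriff route.
    if_sheriff_highway = 0 * p + 1 * (1 - p)
    if_sheriff_forest = 1 * p + forest_escape * (1 - p)
    assert if_sheriff_highway == if_sheriff_forest

    # Sheriff's average payoff (probability of capture) against each Convict route.
    if_convict_highway = 1 * q + 0 * (1 - q)
    if_convict_forest = 0 * q + Fraction(1, n) * (1 - q)
    assert if_convict_highway == if_convict_forest

    return p, q, if_sheriff_highway

for n in (1, 2, 5, 10):
    p, q, escape = check_equilibrium(n)
    print(n, p, q, escape)
# 1 1/2 1/2 1/2
# 2 1/3 1/3 2/3
# 5 1/6 1/6 5/6
# 10 1/11 1/11 10/11
```

The last column is 1 - 1/(n+1), so the probability of escape rises toward 1 as the forest becomes harder to search.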

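An Executive Decision Maker for The Escaped Convict Game

[Figure: a circle divided into a Highway part and a Forest part]

… Highway: probability 1/(n+1) for both players (p = q = 1/(n+1)); the part of the circle representing Highway gets smaller the bigger n gets

… Forest: probability 1 - 1/(n+1) for both players; the part of the circle representing Forest gets bigger the bigger n gets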

Melvin Dresher

1911-1992

- Polish-American mathematician

- Studied problems of equilibrium

in non-zero-sum games

- With Merrill Flood, ran an

experiment at the RAND

Corporation in 1950 that resulted

in the formulation of the game

theoretical model of conflict and

cooperation later called the “Prisoner’s Dilemma” by Albert W.

Tucker (who was John Nash’s Ph.D. thesis advisor at Princeton)


Dresher and Flood’s RAND Experiment

- Dresher and Flood wondered whether real people actually

would tend to find the equilibrium point in a game when

the game had a potentially “better” outcome

- They ran 100 repetitions of the following game between

two players (Armen Alchian of UCLA’s economics

department and John Williams, head of RAND’s

mathematics department)

                                     Williams
                        Strategy 1            Strategy 2
Alchian  Strategy 1     (-1 cent, 2 cents)    (½ cent, 1 cent)
         Strategy 2     (0, ½ cent)           (1 cent, -1 cent)

(Each cell shows Alchian’s payoff, Williams’s payoff)




- For Alchian, Strategy 2 dominates Strategy 1, because his

payoffs in Strategy 2 are higher than his payoffs in Strategy 1

no matter what strategy Williams chooses



- For Williams, Strategy 1 dominates Strategy 2, because his

payoffs from Strategy 1 are higher than his payoffs from

Strategy 2 no matter what strategy Alchian chooses



- (0, ½ cent) is therefore the equilibrium point

- But (½ cent,1 cent) is a better outcome for both, even though

it is biased in favor of Williams


- The experiment showed no evidence of any instinctive

preference for the equilibrium point

… Chosen only 14 times out of the 100 games

- The players appeared to struggle over the course of the

experiment to secure mutual cooperation

… This is apparent from reading the logs of the comments

made by the players during the experiment

Dresher and Flood’s RAND Experiment: Excerpts from Log

- Game 1: Alchian plays Strategy 2, Williams plays Strategy 2

… Alchian: Williams will play Strategy 1 – sure win. Hence if I play 1 – I lose.

… Williams: Hope he’s bright.

- Game 2: Alchian plays Strategy 2, Williams plays Strategy 2

… Alchian: What is he doing?!!

… Williams: He isn’t but maybe he’ll wise up.

- Game 3: Alchian plays Strategy 2, Williams plays Strategy 1

… Alchian: Trying mixed?

… Williams: Okay, dope.

- Game 4: Alchian plays Strategy 2, Williams plays Strategy 1

… Alchian: Has he settled on 1?

… Williams: Okay, dope.

- Game 5: Alchian plays Strategy 1, Williams plays Strategy 1

… Alchian: Perverse!

… Williams: It isn’t the best of all possible worlds.

- Game 6: Alchian plays Strategy 2, Williams plays Strategy 2

… Alchian: I’m sticking to 2 since he will mix for at least 4 more times.

… Williams: Oh ho! Guess I’ll have to give him another chance.

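Dresher and Flood’s RAND Experiment: Excerpts from Log

- Game 17: Alchian plays Strategy 1, Williams plays Strategy 1

… Williams: The stinker.

- Game 18: Alchian plays Strategy 1, Williams plays Strategy 1

… Williams: He’s crazy. I’ll teach him the hard way.

- Game 19: Alchian plays Strategy 2, Williams plays Strategy 1

… Alchian: I’m completely confused. Is he trying to convey information to me?

… Williams: Let him suffer.

- Game 20: Alchian plays Strategy 2, Williams plays Strategy 1

- Game 21: Alchian plays Strategy 2, Williams plays Strategy 2

… Williams: Maybe he’ll be a good boy now.

- Game 22: Alchian plays Strategy 1, Williams plays Strategy 2

… Williams: Always takes time to learn.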

Albert W. Tucker

1905-1995

- Canadian-born American

mathematician, chair of

Princeton University mathematics

department

- Chaired the AP Calculus

committee of the College

Board during the 1960s

- Ph.D. thesis advisor of John Nash

- Restructured the game in Dresher and Flood’s experiment

and gave it the narrative we know today as the Prisoner’s

Dilemma

The Prisoner’s Dilemma

- Two criminals collaborate in the commission of a robbery

- They are arrested and held separately on a charge of

carrying concealed weapons, which carries a one year

jail term

- The testimony of each one is required in order to convict

the other of the robbery, which carries a 20 year jail term

- Each prisoner is offered immunity and a suspended sentence if

he will turn state’s evidence and testify against the other

prisoner (but he could still be convicted on the basis of the

testimony of the other prisoner!)

- If both prisoners confess, they each go to jail for 5 years

- If neither prisoner confesses, they each go to jail for 1 year

on the concealed weapons charge

                                          Second Prisoner
                               Do not confess        Confess
                               (“Cooperate”)         (“Defect”)
First     Do not confess       (-1 yr., -1 yr.)      (-20 yr., 0)
Prisoner  (“Cooperate”)
          Confess              (0, -20 yr.)          (-5 yr., -5 yr.)
          (“Defect”)

Game Matrix for the Prisoner’s Dilemma (each cell shows First Prisoner’s payoff, Second Prisoner’s payoff)

Observations about the Prisoner’s Dilemma

- This game is not zero-sum (conditions for minimax do not hold)

- The “minimax” strategy for each prisoner is to defect

… Look at the game from the first prisoner’s point of view

… If the second prisoner defects, our prisoner

must either cooperate and go to jail for 20 years, or

defect and go to jail for 5 years

… If the second prisoner cooperates, our prisoner can serve

1 year by cooperating also, or go free by defecting

- If the prisoners could somehow both cooperate by not

confessing, they would achieve better payoffs

(they both would serve 1 year as opposed to 5 years)

- Experiments show that people frequently do defect in

situations similar to the Prisoner’s Dilemma

Canonical Prisoner’s Dilemma Matrix

              Cooperate   Defect
Cooperate     (R, R)      (S, T)
Defect        (T, S)      (P, P)

R = Reward for mutual cooperation

S = Sucker’s payoff

T = Temptation to defect

P = Punishment for defection

T > R > P > S

2R > T + S
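The two conditions can be tested against concrete numbers. This is a small Python sketch; the function name is_prisoners_dilemma is hypothetical. The first set of values uses the jail terms from the story (years as negative payoffs), the second uses the point values from the tournament game matrix, and the third is a made-up counterexample.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Check the canonical conditions T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S

# Jail-term story: go free (0), 1 year each, 5 years each, 20 years.
print(is_prisoners_dilemma(T=0, R=-1, P=-5, S=-20))  # True

# Point values from the tournament game matrix.
print(is_prisoners_dilemma(T=5, R=3, P=1, S=0))      # True

# Made-up values where the temptation is so large that 2R > T + S fails.
print(is_prisoners_dilemma(T=10, R=3, P=1, S=0))     # False
```

The first condition makes defecting the better reply to either move; the second makes mutual cooperation better for the pair than taking turns exploiting each other.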

Applications of the Prisoner’s Dilemma

- Nuclear arms race

… Any one country can “defect” by breaking an arms treaty

- Global climate change

… Any one country can “defect” by not adhering to an

emissions standard

- Steroid use in professional athletics

… Any one athlete can “defect” by using performance

enhancing drugs

- OPEC (or any other economic cartel)

… Any one country can “defect” by shipping more oil than

its production ceiling allows

What if we play the Prisoner’s Dilemma

repeatedly? (Iterated Prisoner’s Dilemma)

- If the game is played exactly N times (and both players know N), then defecting every time is the strategy that backward induction leads to

- Why?

… Assume the game is played N+1 times

… Defect on the last play, since no retaliation is possible

… But then the game reduces to N times … etc.

- If we don’t know how many times the game will be played,

it gets a lot more interesting

Robert Axelrod

1943-

- Professor of Political Science and

Public Policy, University of Michigan

- Winner of MacArthur Fellowship

- Published seminal work, The

Evolution of Cooperation, in 1984,

summarizing the results of his study

of the Iterated Prisoner’s Dilemma

Axelrod’s Iterated Prisoner’s Dilemma

Tournament

- Different computer programs were run against each other

in two pairwise round robin tournaments

… Each game lasted 200 “moves” (but this was not told

to the program developers before the tournament)

- Each program embodied a particular strategy

… Examples: “Always defect”; “Always cooperate”; “Cooperate until your opponent defects, then always defect”; “Cooperate but defect 10% of the time”; etc.

- Participants in the second tournament had access to the

results of the first one

- There was also a “random” competitor in each tournament

… Defected or cooperated with 50% probability


Tournament – Game Matrix

              Cooperate   Defect
Cooperate     (3, 3)      (0, 5)
Defect        (5, 0)      (1, 1)

- The winner was the program accumulating the most total points over the entire tournament
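A minimal sketch of a pairwise round robin under these payoffs. It is not Axelrod’s tournament code: the function names, the three hand-picked strategies, and the omission of self-play and of the random competitor are all assumptions made to keep the example short and deterministic.

```python
from itertools import combinations

# Points from the game matrix: PAYOFF[(my move, their move)] -> my points
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(mine, theirs):
    """Cooperate first, then copy the opponent's previous move."""
    return theirs[-1] if theirs else "C"


def always_defect(mine, theirs):
    return "D"


def grudger(mine, theirs):
    """Cooperate until the opponent defects, then always defect."""
    return "D" if "D" in theirs else "C"


def play(strat_a, strat_b, moves=200):
    """Play one game and return (score of strat_a, score of strat_b)."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        a = strat_a(hist_a, hist_b)
        b = strat_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


def round_robin(strategies, moves=200):
    """Total points per strategy over all pairings (self-play omitted)."""
    totals = {s.__name__: 0 for s in strategies}
    for a, b in combinations(strategies, 2):
        score_a, score_b = play(a, b, moves)
        totals[a.__name__] += score_a
        totals[b.__name__] += score_b
    return totals


print(play(tit_for_tat, always_defect))  # (199, 204)
print(round_robin([tit_for_tat, always_defect, grudger]))
# {'tit_for_tat': 799, 'always_defect': 408, 'grudger': 799}
```

Tit for Tat loses its single game against Always Defect by five points, yet finishes ahead on total points in this small field. The ranking depends on which strategies are entered, so a different field can produce a different order.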


Tournament – The results

- A program called “Tit for Tat” won both tournaments

… Submitted by Anatol Rapoport of U. of Toronto

… Started by cooperating

… Then did whatever its opponent did on the prior move

… Strangely like the “Golden Rule” plus lex talionis

- Other programs that did well also started by cooperating

… If they started by cooperating, many continued doing so

… These programs did well because they did well with

each other and because there were enough of them

to raise each other’s average score

- “Greedy” or “selfish” programs didn’t do as well

… They tended to be rapidly punished by counter-defection

Axelrod’s observations about successful

Iterated Prisoner’s Dilemma strategies

- Nice strategies finish first

… “Nice” = not being the first to defect

- Good strategies are retaliatory

… Punish defection immediately, no matter how

cooperative the interaction has been so far

- Good strategies are forgiving

… Do not continue punishing defection for more than

one move

- Good strategies are clear

… Other strategies easily recognize and adjust to them

… None of the more complex programs performed as well

as the simple “Tit For Tat”

Implications of the Iterated Prisoner’s Dilemma

- Biology: Suppose programs reproduce or die out according to their scores

… Even strategies that do poorly can affect which strategies

do best

… “Tit for Tat” and programs like it end up dominating in

population simulations

… Suggests that cooperation may evolve in a world of

competing entities

- Philosophy: Suggests that moral principles underlying

cooperation (modeled in a very primitive fashion by Axelrod’s

observations about successful strategies) could result from

evolution

- Politics and law: Suggests a basis for social contract theory

                        Player 2
                        Action X              Action Y
Player 1   Action A     Payoff from (A, X)    Payoff from (A, Y)
           Action B     Payoff from (B, X)    Payoff from (B, Y)

Normal Form Representation: Game Matrix

- Convention: Each cell shows (Player 1’s payoff, Player 2’s payoff)

The Cuban Missile Crisis: Paradox

- Khrushchev’s foreknowledge of Kennedy’s decision would

have worked to Khrushchev’s disadvantage, assuming that

Kennedy was aware of Khrushchev’s foreknowledge!

- Why? Reason as follows –

… If Kennedy decides to back down, Khrushchev will know it,

and will decide to stand firm: payoff (2,4) (vs. backing down

with payoff (3,3))

… If Kennedy decides to stand firm, Khrushchev will know it,

and will decide to back down: payoff (4,2) (vs. standing firm

with payoff (1,1))

- Thus if Kennedy backs down the outcome will be (2,4),

whereas if he stands firm the outcome will be (4,2); he would

therefore choose to stand firm, causing Khrushchev to

back down (which is what actually transpired)
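The argument can be traced step by step in a short sketch. The four payoff pairs are the ones quoted above; reading each pair as (Kennedy, Khrushchev) with higher numbers better is an assumption consistent with the reasoning, and the function names are hypothetical.

```python
# Payoffs as (Kennedy, Khrushchev), keyed by (Kennedy's move, Khrushchev's move).
PAYOFF = {
    ("back down", "back down"): (3, 3),
    ("back down", "stand firm"): (2, 4),
    ("stand firm", "back down"): (4, 2),
    ("stand firm", "stand firm"): (1, 1),
}
MOVES = ("back down", "stand firm")


def khrushchev_reply(kennedy_move):
    """Khrushchev knows Kennedy's move in advance and picks his own best reply."""
    return max(MOVES, key=lambda k: PAYOFF[(kennedy_move, k)][1])


def kennedy_choice():
    """Kennedy, aware of the foreknowledge, anticipates each reply and picks his best move."""
    return max(MOVES, key=lambda m: PAYOFF[(m, khrushchev_reply(m))][0])


move = kennedy_choice()
reply = khrushchev_reply(move)
print(move, "/", reply, "->", PAYOFF[(move, reply)])
# stand firm / back down -> (4, 2)
```

Khrushchev’s payoff of 2 is lower than the 3 he would get if both backed down, which is the sense in which his foreknowledge works against him.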

A Utility Function

[Figure: utility function U(W) and expected value function E(W) plotted against W, with W1, W2 and the wager pW1+(1-p)W2 marked on the W axis, and E[pW1+(1-p)W2] and U[pW1+(1-p)W2] marked on the vertical axis]

W = amount of money; p = probability (between 0 and 1); U = utility; E = expected value

- The utility of the expected value is less than the expected value, so this is a risk-averse utility function

A Utility Function

[Figure: utility function U(W) and expected value function E(W) plotted against W, with W1=100 and W2=200 marked on the W axis; Wager: 50%(100) + 50%(200); E[pW1+(1-p)W2] = 150; U[pW1+(1-p)W2] = 140]

W = amount of money; p = probability (between 0 and 1) = 50%; U = utility; E = expected value

- The utility of the expected value is less than the expected value, so this is a risk-averse utility function

Another Utility Function

[Figure: utility function U(W) and expected value function E(W) plotted against W, with W1, W2 and the wager pW1+(1-p)W2 marked on the W axis, and E[pW1+(1-p)W2] and U[pW1+(1-p)W2] marked on the vertical axis]

W = amount of money; p = probability (between 0 and 1); U = utility; E = expected value

- The utility of the expected value is greater than the expected value, so this is a risk-seeking utility function

A Utility Function

[Figure: utility function U(W) and expected value function E(W) plotted against W, with 100 and 200 marked on the W axis; Wager: 50%(100)+50%(200); E[pW1+(1-p)W2] = 150; U[pW1+(1-p)W2] = 160]

W = amount of money; p = probability (between 0 and 1) = 50%; U = utility; E = expected value

- The utility of the expected value is greater than the expected value, so this is a risk-seeking utility function
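One common way to put the risk-averse versus risk-seeking comparison in money terms is the certainty equivalent: the sure amount that has the same utility as the wager. The sketch below uses the 100/200 wager at 50%. The square-root and squared utility functions are assumptions chosen for illustration, not the curves plotted in the figures, so the results differ from the 140 and 160 shown there.

```python
import math


def expected_value(p, w1, w2):
    """Expected amount of money from the wager."""
    return p * w1 + (1 - p) * w2


def certainty_equivalent(p, w1, w2, utility, inverse_utility):
    """Sure amount of money whose utility equals the wager's expected utility."""
    expected_utility = p * utility(w1) + (1 - p) * utility(w2)
    return inverse_utility(expected_utility)


p, w1, w2 = 0.5, 100, 200
ev = expected_value(p, w1, w2)
averse = certainty_equivalent(p, w1, w2, math.sqrt, lambda u: u ** 2)    # concave utility
seeking = certainty_equivalent(p, w1, w2, lambda w: w ** 2, math.sqrt)   # convex utility

print(round(ev, 2), round(averse, 2), round(seeking, 2))
# 150.0 145.71 158.11
```

With the concave function the wager is worth less than its expected value (risk-averse); with the convex one it is worth more (risk-seeking).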

Basic assumptions of game theory - Utility

Four axioms of utility theory define a rational decision maker:

- Completeness: For every A and B, either A>B, A=B or A<B:

Either A is preferred to B, as good as B or worse than B

- Transitivity: For every A, B and C, if A>B and B>C then A>C:

If A is preferred to B and B is preferred to C, then A is always preferred to C

- Independence: For every set of gambles A, B and C where A>B, and for every weighting factor 0<w≤1, wA+(1-w)C > wB+(1-w)C:

If A is preferred to B, then the weighted average of A and C should be preferred to the weighted average of B and C (for any nonzero weight on A and B)

- Continuity: For every set of gambles A, B and C where A>B>C, there must

be a probability p such that B = pA + (1-p)C:

If B is ranked between A and C, there must be a possible combination of A

and C that makes the individual indifferent between that combination and B

(otherwise, it would not be logical for B to be ranked between A and C)
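For the continuity axiom, if preferences are represented by numeric utilities, the indifference probability can be solved for directly. This is a sketch under that assumption; the function name and the utility values are hypothetical.

```python
def indifference_probability(u_a, u_b, u_c):
    """Solve u_b = p*u_a + (1-p)*u_c for p, given utilities with u_a > u_b > u_c."""
    if not u_a > u_b > u_c:
        raise ValueError("expected u_a > u_b > u_c")
    return (u_b - u_c) / (u_a - u_c)


p = indifference_probability(u_a=10, u_b=7, u_c=2)
print(p)                      # 0.625
print(p * 10 + (1 - p) * 2)   # 7.0, the utility of B
```

Because B is ranked strictly between A and C, the result always falls strictly between 0 and 1.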

Prospect Theory Utility Function

http://en.wikipedia.org/wiki/File:Valuefun.jpg

Example: “Matching Pennies”

- Two players each display a penny at the same time

- If both pennies match (i.e. both are heads or both are tails),

Player 1 wins a penny

- If the pennies do not match, Player 2 wins a penny

Observations about Pascal’s Wager

- The strategy of acting as if God exists dominates the

strategy of acting as if God does not exist

… The infinite magnitude of the punishment in the case

that God actually exists means that even if the

probability (p) of that case is minuscule, the risk is still

too great

… The infinite magnitude of the reward in the case that

God actually exists means that even if the probability (p)

of that case is minuscule, the reward is still very great

… Therefore the infinity of the payoffs following from God’s

existence makes the payoffs following from God’s

nonexistence irrelevant
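The role of the infinite payoffs can be seen in a small expected-value sketch. The finite payoffs of -1 and +1 and the probability value are hypothetical placeholders; only the infinities come from the argument above.

```python
import math


def expected_payoff(p, payoff_if_god_exists, payoff_if_not):
    """Expected payoff of one strategy when God exists with probability p."""
    return p * payoff_if_god_exists + (1 - p) * payoff_if_not


p = 1e-9  # a minuscule but nonzero probability (hypothetical)
act_as_if_exists = expected_payoff(p, math.inf, -1)   # small finite cost if God does not exist
act_as_if_not = expected_payoff(p, -math.inf, +1)     # small finite gain if God does not exist

print(act_as_if_exists, act_as_if_not)  # inf -inf
```

Any finite values in place of -1 and +1 give the same result, as long as p is greater than zero.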

                        Player 2
                        Show head    Show tail
Player 1   Show head    (+1, -1)     (-1, +1)
           Show tail    (-1, +1)     (+1, -1)

Game matrix for “Matching Pennies”

                              Player 2
                              Show head (q)   Show tail (1-q)
Player 1   Show head (p)      (+1, -1)        (-1, +1)
           Show tail (1-p)    (-1, +1)        (+1, -1)


Player 1 shows heads: (+1)q + (-1)(1-q) = q - 1 + q = 2q-1

- Expected value to Player 1 of showing heads (when Player 2 shows heads with probability q) = 2q-1

Player 1 shows tails: (-1)q + (+1)(1-q) = -q + 1 - q = 1-2q

- Expected value to Player 1 of showing tails (when Player 2 shows heads with probability q) = 1-2q

- Player 2 will randomize if 2q-1 = 1-2q; 4q = 2; q = ½, 1-q = ½

… With this q, Player 1 expects the same payoff (zero) from heads and from tails, so Player 1 cannot exploit Player 2’s choices

- Therefore Player 2 should show heads ½ the time and tails ½ the time, randomly

Player 2 shows heads (when Player 1 shows heads with probability p): (-1)p + (+1)(1-p) = -p + 1 - p = 1-2p

Player 2 shows tails (when Player 1 shows heads with probability p): (+1)p + (-1)(1-p) = p - 1 + p = 2p-1

- Similarly, Player 1 will randomize if 1-2p = 2p-1; 4p = 2; p = ½

- Therefore Player 1 should show heads ½ the time and tails ½ the time, randomly
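The same indifference calculation can be written once for any 2x2 zero-sum game. The sketch below is illustrative: the function name and argument layout are assumptions, and it is only meaningful when the game has no pure-strategy solution, so that both probabilities fall between 0 and 1.

```python
from fractions import Fraction


def mixed_equilibrium(a, b, c, d):
    """Mixed strategies for a 2x2 zero-sum game, found by indifference.

    Player 1's payoffs are [[a, b], [c, d]] (rows: Player 1, columns: Player 2);
    Player 2's payoffs are the negatives. Returns (p, q): the probability that
    Player 1 plays the first row and that Player 2 plays the first column.
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("indifference equations have no unique solution")
    q = Fraction(d - b, denom)  # makes Player 1 indifferent between the rows
    p = Fraction(d - c, denom)  # makes Player 2 indifferent between the columns
    return p, q


p, q = mixed_equilibrium(1, -1, -1, 1)  # Matching Pennies, Player 1's payoffs
print(p, q)  # 1/2 1/2
```

For Matching Pennies this reproduces p = ½ and q = ½.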


Tournament – The results

- Example: Tit For Tat vs. Joss

… Joss: Similar to Tit For Tat, but defects randomly (10%

of the time) after the other player cooperates

… In this sequence Joss randomly defects on the 6th move

… Tit For Tat then defects back on the 7th move

even though Joss returns to cooperating

… Joss then defects back in response on the 8th move

… This results in an “echo effect” -- Joss defects

on all the later even numbered moves and Tit For Tat

defects on all the later odd numbered moves

… On the 25th move Joss randomly defects again, causing Tit

for Tat to defect back, and another echo begins, causing

both programs to defect on every move


Tournament – Tit For Tat vs. Joss

Moves      Results
1-20       11111 23232 32323 23232
21-40      32324 44444 44444 44444
41-60      44444 44444 44444 44444
61-80      44444 44444 44444 44444
81-100     44444 44444 44444 44444
101-120    44444 44444 44444 44444
121-140    44444 44444 44444 44444
141-160    44444 44444 44444 44444
161-180    44444 44444 44444 44444
181-200    44444 44444 44444 44444

1 = both cooperated; 2 = Tit for Tat only cooperated; 3 = Joss only cooperated; 4 = both defected

Final score: Tit For Tat 236, Joss 241
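The final score can be recomputed from the move table. This sketch assumes the tournament point values (3 for mutual cooperation, 5 and 0 when only one side cooperates, 1 for mutual defection); the variable names are illustrative.

```python
# Points per move as (Tit For Tat, Joss) for each result code in the table.
POINTS = {
    "1": (3, 3),  # both cooperated
    "2": (0, 5),  # Tit For Tat only cooperated
    "3": (5, 0),  # Joss only cooperated
    "4": (1, 1),  # both defected
}

# Moves 1-25 copied from the table; moves 26-200 are all "4".
first_25 = "11111" "23232" "32323" "23232" "32324"
results = first_25 + "4" * 175

tit_for_tat_score = sum(POINTS[code][0] for code in results)
joss_score = sum(POINTS[code][1] for code in results)
print(len(results), tit_for_tat_score, joss_score)  # 200 236 241
```

Both totals are far below the 600 points each program would have earned from 200 moves of mutual cooperation.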

Tit For Tat Always Cooperate

Spiteful Bully (Defect until opponent defects back, then

cooperate unless opponent defects three times in a row)

Collectively stable strategies in the

Iterated Prisoner’s Dilemma

- A strategy is collectively stable if in a population of

entities using it, no other strategy can successfully

invade and establish itself

… The new strategy would have to get a higher score

against the “native” strategy than the “native” strategy

gets against another copy of the “native” strategy
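The invasion test can be sketched for two simple strategies. The fixed 200-move game, the tournament point values, and the function names are assumptions for illustration; the comparison itself follows the definition above.

```python
# Points from the tournament matrix: PAYOFF[(my move, their move)] -> my points
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(theirs):
    """Cooperate first, then copy the opponent's previous move."""
    return theirs[-1] if theirs else "C"


def always_defect(theirs):
    return "D"


def score(strat_a, strat_b, moves=200):
    """Total points strat_a earns in one game against strat_b."""
    hist_a, hist_b, total = [], [], 0
    for _ in range(moves):
        a, b = strat_a(hist_b), strat_b(hist_a)
        total += PAYOFF[(a, b)]
        hist_a.append(a)
        hist_b.append(b)
    return total


def can_invade(newcomer, native, moves=200):
    """True if the newcomer scores more against the native than the native does against itself."""
    return score(newcomer, native, moves) > score(native, native, moves)


print(can_invade(always_defect, tit_for_tat))  # False: 204 vs. 600
print(can_invade(tit_for_tat, always_defect))  # False: 199 vs. 200
```

Under these assumptions neither strategy can invade a population made up entirely of the other when it arrives as a lone newcomer.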
