Game Theory II
Post on 22-Dec-2015
Definition of Nash Equilibrium
• A game has n players.
• Each player i has a strategy set Si.
– This is his set of possible actions.
• Each player i has a payoff function πi : S → R.
• A strategy ti ∈ Si is a best response if there is no other strategy in Si that produces a higher payoff, given the opponents' strategies.
Definition of Nash Equilibrium
• A strategy profile is a list (s1, s2, …, sn) of the strategies each player is using.
• If each strategy is a best response given the other strategies in the profile, the profile is a Nash equilibrium.
• Why is this important?
– If we assume players are rational, they will play Nash strategies.
– Even less-than-rational play will often converge to Nash in repeated settings.
An Example of a Nash Equilibrium

                Column
                a      b
Row      a     1,2    0,1
         b     2,1    1,0

(b,a) is a Nash equilibrium. To prove this: given that Column is playing a, Row's best response is b; given that Row is playing b, Column's best response is a.
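This best-response check can be mechanized. A minimal sketch in Python, with the payoffs of the 2x2 example (as reconstructed from the slide) written out as a dictionary:

```python
# Sketch: brute-force Nash check for the 2x2 example.
# payoffs[(row, col)] = (row player's payoff, column player's payoff)
payoffs = {('a', 'a'): (1, 2), ('a', 'b'): (0, 1),
           ('b', 'a'): (2, 1), ('b', 'b'): (1, 0)}
strategies = ['a', 'b']

def is_nash(r, c):
    # (r, c) is a Nash equilibrium if each strategy is a best response.
    row_best = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in strategies)
    col_best = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in strategies)
    return row_best and col_best

print([cell for cell in payoffs if is_nash(*cell)])  # [('b', 'a')]
```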
Finding Nash Equilibria – Dominated Strategies
• What to do when it’s not obvious what the equilibrium is?
• In some cases, we can eliminate dominated strategies.
– These are strategies that are inferior for every opponent action.
• In the previous example, row = a is dominated.
Example

• A 3x3 example:

                  Column
                a        b        c
Row      a    73,25    57,42    66,32
         b    80,26    35,12    32,54
         c    28,27    63,31    54,29
c dominates a for the column player.
b is then dominated by both a and c for the row player.
Given this, b dominates c for the column player – the column player will always play b.
Since column is playing b, row will prefer c.
We verify that (c,b) is a Nash equilibrium by observation: if Row plays c, b is the best response for Column; if Column plays b, c is the best response for Row.
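The elimination steps above can be sketched programmatically. This assumes the 3x3 payoffs as read off the slide, and repeatedly removes strictly dominated rows and columns until nothing changes:

```python
# Sketch: iterated elimination of strictly dominated strategies
# on the 3x3 example. payoffs[(row, col)] = (row payoff, column payoff).
payoffs = {('a', 'a'): (73, 25), ('a', 'b'): (57, 42), ('a', 'c'): (66, 32),
           ('b', 'a'): (80, 26), ('b', 'b'): (35, 12), ('b', 'c'): (32, 54),
           ('c', 'a'): (28, 27), ('c', 'b'): (63, 31), ('c', 'c'): (54, 29)}

def eliminate(rows, cols):
    changed = True
    while changed:
        changed = False
        for r in list(rows):  # is row r strictly dominated by some r2?
            if any(all(payoffs[(r2, c)][0] > payoffs[(r, c)][0] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r); changed = True
        for c in list(cols):  # is column c strictly dominated by some c2?
            if any(all(payoffs[(r, c2)][1] > payoffs[(r, c)][1] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c); changed = True
    return rows, cols

print(eliminate(['a', 'b', 'c'], ['a', 'b', 'c']))  # (['c'], ['b'])
```

The surviving profile (c,b) is exactly the Nash equilibrium found by hand.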
Coordination Games

• Consider the following problem:
– A supplier and a buyer need to decide whether to adopt a new purchasing system.

                    Buyer
                 new       old
Supplier  new   20,20      0,0
          old    0,0       5,5

No dominated strategies!
Coordination Games

• This game has two Nash equilibria: (new,new) and (old,old).
• Real-life examples: Beta vs VHS, Mac vs Windows vs Linux, others?
• Each player wants to do what the other does
– which may be different than what they say they'll do.
• How to choose a strategy? Nothing is dominated.
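A quick enumeration confirms the two equilibria (a sketch; payoffs taken from the adoption game above):

```python
# Sketch: enumerate pure-strategy Nash equilibria of the adoption game.
# payoffs[(supplier, buyer)] = (supplier payoff, buyer payoff)
payoffs = {('new', 'new'): (20, 20), ('new', 'old'): (0, 0),
           ('old', 'new'): (0, 0),   ('old', 'old'): (5, 5)}
S = ['new', 'old']

nash = [(r, c) for r in S for c in S
        if all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in S)
        and all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in S)]
print(nash)  # [('new', 'new'), ('old', 'old')]
```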
Solving Coordination Games
• Coordination games turn out to be an important real-life problem.
– Technology/policy/strategy adoption, delegation of authority, synchronization.
• Human agents tend to use "focal points".
– Solutions that seem to make "natural sense".
• e.g. pick a number between 1 and 10.
• Social norms/rules are also used.
– Driving on the right/left side of the road.
• These strategies change the structure of the game.
Finding Nash Equilibria – Simultaneous Equations
• We can also express a game as a set of equations.
• Demand for corn is governed by the following equation:
– Quantity = 100,000 × (10 – p)
• Government price supports say that p must be at least $0.25 (and it can't be more than $10).
• Three farmers can each choose to sell 0–600,000 lbs of corn.
• What are the Nash equilibria?
Setup
• Quantity: q = q1 + q2 + q3
• Price: p = a – bq (downward-sloping line)
• Farmer 1 is trying to decide a quantity to sell.
• Maximize profit = price × quantity: p·q1 = (a – bq)·q1
• Profit = (a – b(q1 + q2 + q3))·q1 = a·q1 – b·q1² – b·q1·q2 – b·q1·q3
• Differentiate: Pr′ = a – 2b·q1 – b·q2 – b·q3
• To maximize, set this equal to zero.
Setup
• So solutions must satisfy:
– a – b(q2 + q3) – 2b·q1 = 0
• So what if q1 = q2 = q3 (everyone ships the same amount)?
– Since the game is symmetric, this should be a solution.
– a – 4b·q1 = 0, so a = 4b·q1 and q1 = a/4b.
– q = 3a/4b, p = a/4. Each farmer earns a²/16b.
– In this problem, a = 10, b = 1/100,000.
– Price = $2.50, q1 = 250,000, profit = $625,000.
– q1 = q2 = q3 = 250,000 is a solution.
– Price supports are not used in this solution.
Setup
• What if farmers 2 & 3 send everything they have?
– q2 + q3 = 1,200,000
• If farmer 1 then shipped nothing, price would be:
– 10 – 1,200,000/100,000 = –2
• But prices can't fall below $0.25, so they'd be capped there.
• Adding quantity would reduce the price, except for the supports.
– So farmer 1 should sell all his corn at $0.25, earning 600,000 × $0.25 = $150,000.
• So everyone selling everything at the lowest price (q1 = q2 = q3 = 600,000) is also a Nash equilibrium.
– These are the only pure-strategy Nash equilibria.
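The corner equilibrium can be checked the same way: with the other farmers shipping everything, the price is pinned at the floor, so farmer 1's profit grows linearly in his quantity. A sketch under the same assumed demand curve:

```python
# Sketch: check the all-out corner equilibrium under the $0.25 price floor.
def price(total_q):
    return max(0.25, 10 - total_q / 100_000)

def profit(q1, q2, q3):
    return price(q1 + q2 + q3) * q1

# With the other two farmers shipping 600,000 lbs each, the price is
# pinned at the floor, so farmer 1's profit is 0.25 * q1: ship everything.
best = max(range(0, 600_001, 1000), key=lambda q1: profit(q1, 600_000, 600_000))
print(best, profit(best, 600_000, 600_000))  # 600000 150000.0
```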
Price-matching Example
• Two sellers are offering the same book for sale.
• This book costs each seller $25.
• The lowest price gets all the customers; if they match, profits are split.
• What is the Nash Equilibrium strategy?
Price-matching Example
• Suppose the monopoly price of the book is $30.
– (the price that maximizes profit without competition)
• Each seller offers a rebate: if you find the book cheaper somewhere else, we'll sell it to you with double the difference subtracted.
– E.g. $30 at store 1, $24 at store 2 – get it for $30 – 2 × ($30 – $24) = $18 from store 1.
• Now what is each seller’s Nash strategy?
Price-matching example
• Observation 1: sellers want to have the same price.
– Each suffers from giving the rebate.
• Effective price received = p1 – 2(p1 – p2) = –p1 + 2p2
• Its derivative with respect to p1 is –1.
– There is no local maximum. So, to maximize profits, maximize price.
• At that point, the rebate 2(p1 – p2) is 0, and p1 is as high as possible.
– The factor of 2 makes up for sharing the market.
Cooperative Games and Coalitions
• When a group of agents decides to cooperate to improve their payoffs (for example, by adopting a technology), we call them a coalition.
– Side payments, bribes, or intimidation may be used to set up a coalition.
– Example: A, B, C are running for class president. The president receives $10, everyone else $0.
– Each player's strategy is to vote for themselves.
– A offers B $5 to vote for her – now both A and B are happier and have formed a coalition.
Efficiency
• We say that a coalition is efficient if there's no choice of action that can improve one person's payoff without decreasing another's.
– Same reasoning as Nash equilibria, market equilibria.
– If someone could change their strategy without hurting anyone and improve their payoff, it's not efficient.
– Money is left "on the table".
• Example: cake-cutting.
Mixed strategies
• Unfortunately, not every game has a pure-strategy equilibrium.
– Rock-paper-scissors
• However, every game has a mixed-strategy Nash equilibrium.
• Each action is assigned a probability of play.
• The player is indifferent between actions, given these probabilities.
Mixed Strategies
• In many games (such as coordination games) a player might not have a pure strategy.
• Instead, optimizing payoff might require a randomized strategy (also called a mixed strategy).

                        Wife
                 football   shopping
Husband football   2,1        0,0
        shopping   0,0        1,2
Strategy Selection

If we limit to pure strategies:
Husband: U(football) = 0.5 * 2 + 0.5 * 0 = 1
U(shopping) = 0.5 * 0 + 0.5 * 1 = ½
Wife: U(shopping) = 1, U(football) = ½
Problem: this won't lead to coordination!
Mixed strategy
• Instead, each player selects a probability associated with each action.
– Goal: the utility of each action is equal.
– Players are indifferent between choices at these probabilities.
• a = probability husband chooses football
• b = probability wife chooses shopping
• Since the husband's payoffs must be equal:
– b × 1 = (1 – b) × 2, so b = 2/3
• For the wife:
– a × 1 = (1 – a) × 2, so a = 2/3
• In each case, the expected payoff is 2/3.
– 2/9 of the time they go to football, 2/9 shopping, 5/9 they miscoordinate.
• If they could synchronize ahead of time, they could do better.
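These indifference conditions are easy to verify exactly. A sketch using Python's fractions module:

```python
# Sketch: check the mixed equilibrium of the husband/wife game.
from fractions import Fraction as F

a = F(2, 3)  # P(husband chooses football)
b = F(2, 3)  # P(wife chooses shopping)

# Husband's expected payoff from each pure action, given the wife's mix:
u_football = (1 - b) * 2 + b * 0
u_shopping = (1 - b) * 0 + b * 1
print(u_football, u_shopping)  # 2/3 2/3 -> indifferent, as required

# Outcome probabilities:
both_football = a * (1 - b)
both_shopping = (1 - a) * b
print(both_football, both_shopping, 1 - both_football - both_shopping)  # 2/9 2/9 5/9
```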
Example: Rock-paper-scissors

                     Column
               rock    paper   scissors
Row  rock      0,0     -1,1     1,-1
     paper     1,-1     0,0    -1,1
     scissors  -1,1     1,-1    0,0
Setup
• Player 1 plays rock with probability pr, scissors with probability ps, and paper with probability 1 – pr – ps.
• P2: Utility(rock) = 0·pr + 1·ps – 1·(1 – pr – ps) = 2ps + pr – 1
• P2: Utility(scissors) = 0·ps + 1·(1 – pr – ps) – 1·pr = 1 – 2pr – ps
• P2: Utility(paper) = 0·(1 – pr – ps) + 1·pr – 1·ps = pr – ps
Player 1 wants to choose his probabilities so that Player 2's expected payoff is the same for every strategy.
Setup
• Player 2 is indifferent when: 2ps + pr – 1 = 1 – 2pr – ps = pr – ps
• It turns out (after some algebra) that pr = ps = 1/3: the optimal mixed strategy is to play each strategy 1/3 of the time.
• Intuition: What if you played rock half the time? Your opponent would then play paper half the time, and you’d lose more often than you won.
• So you’d decrease the fraction of times you played rock, until your opponent had no ‘edge’ in guessing what you’ll do.
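To confirm: with pr = ps = 1/3, the opponent's expected payoff is 0 no matter what he plays. A sketch reusing the utility expressions above:

```python
# Sketch: verify the uniform mix makes the opponent indifferent in RPS.
from fractions import Fraction as F

pr, ps = F(1, 3), F(1, 3)   # P1's probabilities for rock and scissors
pp = 1 - pr - ps            # remaining probability goes to paper

u_rock     = 0 * pr + 1 * ps - 1 * pp   # rock beats scissors, loses to paper
u_scissors = 0 * ps + 1 * pp - 1 * pr   # scissors beats paper, loses to rock
u_paper    = 0 * pp + 1 * pr - 1 * ps   # paper beats rock, loses to scissors
print(u_rock, u_scissors, u_paper)      # 0 0 0
```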
Repeated games
• Many games get played repeatedly.
• A common strategy for the husband-wife problem is to alternate.
– This leads to a payoff of 1, 2, 1, 2, …
– an average of 1.5 per week.
• Requires initial synchronization, plus trust that the partner will go along.
• Difference in formulation: we are now thinking of the game as a repeated set of interactions, rather than as a one-shot exchange.
Repeated vs Stage Games
• There are two types of multiple-action games:
– Stage games: players take a number of actions and then receive a payoff.
• Checkers, chess, bidding in an ascending auction
– Repeated games: players repeatedly play a shorter game, receiving payoffs along the way.
• Poker, blackjack, rock-paper-scissors, etc.
Analyzing Stage Games
• Analyzing stage games requires backward induction.
• We start at the last action, determine what should happen there, and work backwards.
– Just like a game tree in extensive form.
• Strange things can happen here:
– The centipede game
• Players alternate – each can either cooperate and get $1 from nature, or defect and steal $2 from their opponent.
• The game ends when one player has $100 or one player defects.
Analyzing Repeated Games
• Analyzing repeated games requires us to examine the expected utility of different actions.
• Assumption: the game is played "infinitely often".
– Weird endgame effects go away.
• Prisoner's Dilemma again:
– In this case, tit-for-tat outperforms defection.
• Collusion can also be explained this way.
– The short-term cost of undercutting is less than the long-run gains from avoiding competition.
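A small simulation illustrates the point. The Prisoner's Dilemma payoffs here (temptation 5, mutual cooperation 3, mutual defection 1, sucker 0) are standard values assumed for illustration, not given in the slides:

```python
# Sketch: iterated Prisoner's Dilemma with assumed payoffs
# PAYOFF[(my move, opponent move)] = (my payoff, opponent payoff)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(strat1, strat2, rounds=100):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strat1(h2), strat2(h1)  # each sees the other's history
        p1, p2 = PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
        s1 += p1; s2 += p2
    return s1, s2

def tit_for_tat(opp_history):
    # Cooperate first, then copy the opponent's last move.
    return 'C' if not opp_history else opp_history[-1]

def always_defect(opp_history):
    return 'D'

print(play(tit_for_tat, tit_for_tat))      # (300, 300)
print(play(always_defect, always_defect))  # (100, 100)
```

Two tit-for-tat players sustain cooperation at 3 per round, while mutual defection earns only 1 per round: the long-run gain that makes tit-for-tat (and collusion) sustainable in repeated play.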