Game Theory II
Post on 22-Dec-2015
Definition of Nash Equilibrium
• A game has n players.
• Each player i has a strategy set Si.
– This is his set of possible actions.
• Each player i has a payoff function πi : S → R.
• A strategy ti ∈ Si is a best response if there is no other strategy in Si that produces a higher payoff, given the opponents' strategies.
Definition of Nash Equilibrium
• A strategy profile is a list (s1, s2, …, sn) of the strategies each player is using.
• If each strategy is a best response given the other strategies in the profile, the profile is a Nash equilibrium.
• Why is this important?
– If we assume players are rational, they will play Nash strategies.
– Even less-than-rational play will often converge to Nash in repeated settings.
An Example of a Nash Equilibrium

                Column
                a      b
Row      a     1,2    0,1
         b     2,1    1,0

(b,a) is a Nash equilibrium. To prove this: given that Column is playing a, Row's best response is b; given that Row is playing b, Column's best response is a.
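This best-response check can be mechanized. A minimal sketch in Python, with the payoffs of the 2x2 example (as reconstructed from the slide) written out as a dictionary:

```python
# Sketch: brute-force Nash check for the 2x2 example.
# payoffs[(row, col)] = (row player's payoff, column player's payoff)
payoffs = {('a', 'a'): (1, 2), ('a', 'b'): (0, 1),
           ('b', 'a'): (2, 1), ('b', 'b'): (1, 0)}
strategies = ['a', 'b']

def is_nash(r, c):
    # (r, c) is a Nash equilibrium if each strategy is a best response.
    row_best = all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in strategies)
    col_best = all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in strategies)
    return row_best and col_best

print([cell for cell in payoffs if is_nash(*cell)])  # [('b', 'a')]
```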
Finding Nash Equilibria – Dominated Strategies
• What to do when it’s not obvious what the equilibrium is?
• In some cases, we can eliminate dominated strategies.
– These are strategies that are inferior for every opponent action.
• In the previous example, row = a is dominated.
Example

• A 3x3 example:

                  Column
                a        b        c
Row      a    73,25    57,42    66,32
         b    80,26    35,12    32,54
         c    28,27    63,31    54,29
c dominates a for the column player.
b is then dominated by both a and c for the row player.
Given this, b dominates c for the column player – the column player will always play b.
Since column is playing b, row will prefer c.
We verify that (c,b) is a Nash equilibrium by observation: if Row plays c, b is the best response for Column; if Column plays b, c is the best response for Row.
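The elimination steps above can be sketched programmatically. This assumes the 3x3 payoffs as read off the slide, and repeatedly removes strictly dominated rows and columns until nothing changes:

```python
# Sketch: iterated elimination of strictly dominated strategies
# on the 3x3 example. payoffs[(row, col)] = (row payoff, column payoff).
payoffs = {('a', 'a'): (73, 25), ('a', 'b'): (57, 42), ('a', 'c'): (66, 32),
           ('b', 'a'): (80, 26), ('b', 'b'): (35, 12), ('b', 'c'): (32, 54),
           ('c', 'a'): (28, 27), ('c', 'b'): (63, 31), ('c', 'c'): (54, 29)}

def eliminate(rows, cols):
    changed = True
    while changed:
        changed = False
        for r in list(rows):  # is row r strictly dominated by some r2?
            if any(all(payoffs[(r2, c)][0] > payoffs[(r, c)][0] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r); changed = True
        for c in list(cols):  # is column c strictly dominated by some c2?
            if any(all(payoffs[(r, c2)][1] > payoffs[(r, c)][1] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c); changed = True
    return rows, cols

print(eliminate(['a', 'b', 'c'], ['a', 'b', 'c']))  # (['c'], ['b'])
```

The surviving profile (c,b) is exactly the Nash equilibrium found by hand.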
Coordination Games

• Consider the following problem:
– A supplier and a buyer need to decide whether to adopt a new purchasing system.

                    Buyer
                 new       old
Supplier  new   20,20      0,0
          old    0,0       5,5

No dominated strategies!
Coordination Games

• This game has two Nash equilibria: (new,new) and (old,old).
• Real-life examples: Beta vs VHS, Mac vs Windows vs Linux, others?
• Each player wants to do what the other does
– which may be different than what they say they'll do.
• How to choose a strategy? Nothing is dominated.
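A quick enumeration confirms the two equilibria (a sketch; payoffs taken from the adoption game above):

```python
# Sketch: enumerate pure-strategy Nash equilibria of the adoption game.
# payoffs[(supplier, buyer)] = (supplier payoff, buyer payoff)
payoffs = {('new', 'new'): (20, 20), ('new', 'old'): (0, 0),
           ('old', 'new'): (0, 0),   ('old', 'old'): (5, 5)}
S = ['new', 'old']

nash = [(r, c) for r in S for c in S
        if all(payoffs[(r, c)][0] >= payoffs[(r2, c)][0] for r2 in S)
        and all(payoffs[(r, c)][1] >= payoffs[(r, c2)][1] for c2 in S)]
print(nash)  # [('new', 'new'), ('old', 'old')]
```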
Solving Coordination Games
• Coordination games turn out to be an important real-life problem.
– Technology/policy/strategy adoption, delegation of authority, synchronization.
• Human agents tend to use "focal points".
– Solutions that seem to make "natural sense".
• e.g. pick a number between 1 and 10.
• Social norms/rules are also used.
– Driving on the right/left side of the road.
• These strategies change the structure of the game.
Finding Nash Equilibria – Simultaneous Equations
• We can also express a game as a set of equations.
• Demand for corn is governed by the following equation:
– Quantity = 100,000 × (10 – p)
• Government price supports say that p must be at least $0.25 (and it can't be more than $10).
• Three farmers can each choose to sell 0–600,000 lbs of corn.
• What are the Nash equilibria?
Setup
• Quantity: q = q1 + q2 + q3
• Price: p = a – bq (downward-sloping line)
• Farmer 1 is trying to decide a quantity to sell.
• Maximize profit = price × quantity: p·q1 = (a – bq)·q1
• Profit = (a – b(q1 + q2 + q3))·q1 = a·q1 – b·q1² – b·q1·q2 – b·q1·q3
• Differentiate: Pr′ = a – 2b·q1 – b·q2 – b·q3
• To maximize, set this equal to zero.
Setup
• So solutions must satisfy:
– a – b(q2 + q3) – 2b·q1 = 0
• So what if q1 = q2 = q3 (everyone ships the same amount)?
– Since the game is symmetric, this should be a solution.
– a – 4b·q1 = 0, so a = 4b·q1 and q1 = a/4b.
– q = 3a/4b, p = a/4. Each farmer earns a²/16b.
– In this problem, a = 10, b = 1/100,000.
– Price = $2.50, q1 = 250,000, profit = $625,000.
– q1 = q2 = q3 = 250,000 is a solution.
– Price supports are not used in this solution.
Setup
• What if farmers 2 & 3 send everything they have?
– q2 + q3 = 1,200,000
• If farmer 1 then shipped nothing, price would be:
– 10 – 1,200,000/100,000 = –2
• But prices can't fall below $0.25, so they'd be capped there.
• Adding quantity would reduce the price, except for the supports.
– So farmer 1 should sell all his corn at $0.25, earning 600,000 × $0.25 = $150,000.
• So everyone selling everything at the lowest price (q1 = q2 = q3 = 600,000) is also a Nash equilibrium.
– These are the only pure-strategy Nash equilibria.
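The corner equilibrium can be checked the same way: with the other farmers shipping everything, the price is pinned at the floor, so farmer 1's profit grows linearly in his quantity. A sketch under the same assumed demand curve:

```python
# Sketch: check the all-out corner equilibrium under the $0.25 price floor.
def price(total_q):
    return max(0.25, 10 - total_q / 100_000)

def profit(q1, q2, q3):
    return price(q1 + q2 + q3) * q1

# With the other two farmers shipping 600,000 lbs each, the price is
# pinned at the floor, so farmer 1's profit is 0.25 * q1: ship everything.
best = max(range(0, 600_001, 1000), key=lambda q1: profit(q1, 600_000, 600_000))
print(best, profit(best, 600_000, 600_000))  # 600000 150000.0
```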
Price-matching Example
• Two sellers are offering the same book for sale.
• This book costs each seller $25.
• The lowest price gets all the customers; if they match, profits are split.
• What is the Nash Equilibrium strategy?
Price-matching Example
• Suppose the monopoly price of the book is $30.
– (the price that maximizes profit without competition)
• Each seller offers a rebate: if you find the book cheaper somewhere else, we'll sell it to you with double the difference subtracted.
– E.g. $30 at store 1, $24 at store 2 – get it for $30 – 2 × ($30 – $24) = $18 from store 1.
• Now what is each seller’s Nash strategy?
Price-matching example
• Observation 1: sellers want to have the same price.
– Each suffers from giving the rebate.
• Effective price received = p1 – 2(p1 – p2) = –p1 + 2p2
• Its derivative with respect to p1 is –1.
– There is no local maximum. So, to maximize profits, maximize price.
• At that point, the rebate 2(p1 – p2) is 0, and p1 is as high as possible.
– The factor of 2 makes up for sharing the market.
Cooperative Games and Coalitions
• When a group of agents decides to cooperate to improve their payoffs (for example, by adopting a technology), we call them a coalition.
– Side payments, bribes, or intimidation may be used to set up a coalition.
– Example: A, B, C are running for class president. The president receives $10, everyone else $0.
– Each player's strategy is to vote for themselves.
– A offers B $5 to vote for her – now both A and B are happier and have formed a coalition.
Efficiency
• We say that a coalition is efficient if there's no choice of action that can improve one person's payoff without decreasing another's.
– Same reasoning as Nash equilibria, market equilibria.
– If someone could change their strategy without hurting anyone and improve their payoff, it's not efficient.
– Money is left "on the table".
• Example: cake-cutting.
Mixed strategies
• Unfortunately, not every game has a pure-strategy equilibrium.
– Rock-paper-scissors
• However, every game has a mixed-strategy Nash equilibrium.
• Each action is assigned a probability of play.
• The player is indifferent between actions, given these probabilities.
Mixed Strategies
• In many games (such as coordination games) a player might not have a pure strategy.
• Instead, optimizing payoff might require a randomized strategy (also called a mixed strategy).

                        Wife
                 football   shopping
Husband football   2,1        0,0
        shopping   0,0        1,2
Strategy Selection

If we limit to pure strategies:
Husband: U(football) = 0.5 * 2 + 0.5 * 0 = 1
U(shopping) = 0.5 * 0 + 0.5 * 1 = ½
Wife: U(shopping) = 1, U(football) = ½
Problem: this won't lead to coordination!
Mixed strategy
• Instead, each player selects a probability associated with each action.
– Goal: the utility of each action is equal.
– Players are indifferent between choices at these probabilities.
• a = probability husband chooses football
• b = probability wife chooses shopping
• Since the husband's payoffs must be equal:
– b × 1 = (1 – b) × 2, so b = 2/3
• For the wife:
– a × 1 = (1 – a) × 2, so a = 2/3
• In each case, the expected payoff is 2/3.
– 2/9 of the time they go to football, 2/9 shopping, 5/9 they miscoordinate.
• If they could synchronize ahead of time, they could do better.
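These indifference conditions are easy to verify exactly. A sketch using Python's fractions module:

```python
# Sketch: check the mixed equilibrium of the husband/wife game.
from fractions import Fraction as F

a = F(2, 3)  # P(husband chooses football)
b = F(2, 3)  # P(wife chooses shopping)

# Husband's expected payoff from each pure action, given the wife's mix:
u_football = (1 - b) * 2 + b * 0
u_shopping = (1 - b) * 0 + b * 1
print(u_football, u_shopping)  # 2/3 2/3 -> indifferent, as required

# Outcome probabilities:
both_football = a * (1 - b)
both_shopping = (1 - a) * b
print(both_football, both_shopping, 1 - both_football - both_shopping)  # 2/9 2/9 5/9
```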
Example: Rock-paper-scissors

                     Column
               rock    paper   scissors
Row  rock      0,0     -1,1     1,-1
     paper     1,-1     0,0    -1,1
     scissors  -1,1     1,-1    0,0
Setup
• Player 1 plays rock with probability pr, scissors with probability ps, and paper with probability 1 – pr – ps.
• P2: Utility(rock) = 0·pr + 1·ps – 1·(1 – pr – ps) = 2ps + pr – 1
• P2: Utility(scissors) = 0·ps + 1·(1 – pr – ps) – 1·pr = 1 – 2pr – ps
• P2: Utility(paper) = 0·(1 – pr – ps) + 1·pr – 1·ps = pr – ps
Player 1 wants to choose his probabilities so that Player 2's expected payoff is the same for every strategy.
Setup
• Player 2 is indifferent when: 2ps + pr – 1 = 1 – 2pr – ps = pr – ps
• It turns out (after some algebra) that pr = ps = 1/3: the optimal mixed strategy is to play each strategy 1/3 of the time.
• Intuition: What if you played rock half the time? Your opponent would then play paper half the time, and you’d lose more often than you won.
• So you’d decrease the fraction of times you played rock, until your opponent had no ‘edge’ in guessing what you’ll do.
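To confirm: with pr = ps = 1/3, the opponent's expected payoff is 0 no matter what he plays. A sketch reusing the utility expressions above:

```python
# Sketch: verify the uniform mix makes the opponent indifferent in RPS.
from fractions import Fraction as F

pr, ps = F(1, 3), F(1, 3)   # P1's probabilities for rock and scissors
pp = 1 - pr - ps            # remaining probability goes to paper

u_rock     = 0 * pr + 1 * ps - 1 * pp   # rock beats scissors, loses to paper
u_scissors = 0 * ps + 1 * pp - 1 * pr   # scissors beats paper, loses to rock
u_paper    = 0 * pp + 1 * pr - 1 * ps   # paper beats rock, loses to scissors
print(u_rock, u_scissors, u_paper)      # 0 0 0
```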
Repeated games
• Many games get played repeatedly.
• A common strategy for the husband-wife problem is to alternate.
– This leads to a payoff of 1, 2, 1, 2, …
– an average of 1.5 per week.
• Requires initial synchronization, plus trust that the partner will go along.
• Difference in formulation: we are now thinking of the game as a repeated set of interactions, rather than as a one-shot exchange.
Repeated vs Stage Games
• There are two types of multiple-action games:
– Stage games: players take a number of actions and then receive a payoff.
• Checkers, chess, bidding in an ascending auction
– Repeated games: players repeatedly play a shorter game, receiving payoffs along the way.
• Poker, blackjack, rock-paper-scissors, etc.
Analyzing Stage Games
• Analyzing stage games requires backward induction.
• We start at the last action, determine what should happen there, and work backwards.
– Just like a game tree in extensive form.
• Strange things can happen here:
– The centipede game
• Players alternate – each can either cooperate and get $1 from nature, or defect and steal $2 from their opponent.
• The game ends when one player has $100 or one player defects.
Analyzing Repeated Games
• Analyzing repeated games requires us to examine the expected utility of different actions.
• Assumption: the game is played "infinitely often".
– Weird endgame effects go away.
• Prisoner's Dilemma again:
– In this case, tit-for-tat outperforms defection.
• Collusion can also be explained this way.
– The short-term cost of undercutting is less than the long-run gains from avoiding competition.
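A small simulation illustrates the point. The Prisoner's Dilemma payoffs here (temptation 5, mutual cooperation 3, mutual defection 1, sucker 0) are standard values assumed for illustration, not given in the slides:

```python
# Sketch: iterated Prisoner's Dilemma with assumed payoffs
# PAYOFF[(my move, opponent move)] = (my payoff, opponent payoff)
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(strat1, strat2, rounds=100):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strat1(h2), strat2(h1)  # each sees the other's history
        p1, p2 = PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
        s1 += p1; s2 += p2
    return s1, s2

def tit_for_tat(opp_history):
    # Cooperate first, then copy the opponent's last move.
    return 'C' if not opp_history else opp_history[-1]

def always_defect(opp_history):
    return 'D'

print(play(tit_for_tat, tit_for_tat))      # (300, 300)
print(play(always_defect, always_defect))  # (100, 100)
```

Two tit-for-tat players sustain cooperation at 3 per round, while mutual defection earns only 1 per round: the long-run gain that makes tit-for-tat (and collusion) sustainable in repeated play.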