Unit III: The Evolution of Cooperation


Unit III: The Evolution of Cooperation

• Can Selfishness Save the Environment?
• Repeated Games: the Folk Theorem
• Evolutionary Games
• A Tournament
• How to Promote Cooperation/Unit Review


Repeated Games

Some Questions:

• What happens when a game is repeated?
• Can threats and promises about the future influence behavior in the present?
• Cheap talk
• Finitely repeated games: Backward induction
• Indefinitely repeated games: Trigger strategies

The Folk Theorem

[Figure: the feasible payoff set of the repeated PD, the convex hull of (R,R), (T,S), (S,T), and (P,P)]

Theorem: Any payoff vector that Pareto-dominates the one-shot Nash equilibrium can be supported in a subgame-perfect Nash equilibrium (SPNE) of the repeated game, provided the discount parameter is sufficiently high.

The Folk Theorem


In other words, in the repeated game, if the future matters “enough,” i.e., d > d*, there are zillions of equilibria!
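The threshold d* has a simple closed form for particular strategy pairs. A minimal sketch for GRIM TRIGGER supporting cooperation against itself, assuming the standard Axelrod payoffs T=5, R=3, P=1, S=0 (an assumption; any PD payoffs work the same way):

```python
# Sketch: critical discount factor d* for sustaining cooperation with
# GRIM TRIGGER in the repeated Prisoner's Dilemma.
# Payoffs are Axelrod's usual values (an assumption): T=5, R=3, P=1, S=0.

T, R, P, S = 5, 3, 1, 0

def cooperation_value(d):
    """Discounted value of mutual cooperation forever: R + dR + d^2 R + ..."""
    return R / (1 - d)

def defection_value(d):
    """Value of defecting once against GRIM TRIGGER: T now, P forever after."""
    return T + d * P / (1 - d)

# Cooperation is sustainable iff R/(1-d) >= T + dP/(1-d), i.e. d >= (T-R)/(T-P).
d_star = (T - R) / (T - P)
print(d_star)  # 0.5

assert cooperation_value(0.6) > defection_value(0.6)   # future matters enough
assert cooperation_value(0.4) < defection_value(0.4)   # too impatient
```

For these payoffs, any d above 1/2 supports cooperation; the folk theorem's "zillions" of equilibria appear because many other strategy pairs can be supported the same way.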

• The theorem tells us that, in general, repeated games give rise to a very large set of Nash equilibria. In the repeated PD, these are Pareto-rankable, i.e., some are efficient and some are not.

• In this context, evolution can be seen as a process that selects for repeated game strategies with efficient payoffs.

“Survival of the Fittest”


Evolutionary Games

Fifteen months after I had begun my systematic enquiry, I happened to read for amusement ‘Malthus on Population’ . . . It at once struck me that . . . favorable variations would tend to be preserved, and unfavorable ones to be destroyed. Here then I had at last got a theory by which to work.

Charles Darwin

Evolutionary Games

• Evolutionary Stability (ESS)
• Hawk-Dove: an example
• The Replicator Dynamic
• The Trouble with TIT FOR TAT
• Designing Repeated Game Strategies
• Finite Automata

Evolutionary Games

Biological Evolution: Under the pressure of natural selection, any population (capable of reproduction and variation) will evolve so as to become better adapted to its environment, i.e., will develop in the direction of increasing “fitness.”

Economic Evolution: Firms that adopt efficient “routines” will survive, expand, and multiply, whereas others will be “weeded out” (Nelson and Winter, 1982).

Evolutionary Stability

Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy.

(Maynard Smith & Price, 1973)

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B
ii) either V(A/A) > V(B/A), or V(A/B) > V(B/B), for all B

Hawk-Dove: an example

Imagine a population of Hawks and Doves competing over a scarce resource (say food in a given area). The share of each type in the population changes according to the payoff matrix, so that payoffs determine the number of offspring left to the next generation.  

v = value of the resourcec = cost of fighting

H/D: Hawk gets the resource; Dove flees (v, 0)
D/D: Share the resource (v/2, v/2)
H/H: Share the resource less the cost of fighting ((v-c)/2, (v-c)/2)

(See Hargreaves Heap and Varoufakis: 195-214; Casti: 71-75.)

Hawk-Dove: an example

         H                   D
H    (v-c)/2, (v-c)/2     v, 0
D    0, v                 v/2, v/2

v = value of the resource
c = cost of fighting

Hawk-Dove: an example

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

v = value of the resource = 4
c = cost of fighting = 6

Hawk-Dove: an example

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

NE = {(1,0); (0,1); (2/3,2/3)}

The two pure-strategy equilibria are unstable as population states; the mixed equilibrium is stable.

The mixed NE corresponds to a population that is 2/3 Hawks and 1/3 Doves

Hawk-Dove: an example

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

NE = {(1,0); (0,1); (2/3,2/3)}

Is any strategy ESS?

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B
ii) either V(A/A) > V(B/A), or V(A/B) > V(B/B), for all B

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B

In other words, to be an ESS, a strategy must be a NE with itself.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B

In other words, to be an ESS, a strategy must be a NE with itself.

Neither H nor D is a NE with itself (for these payoffs): V(D/H) = 0 > V(H/H) = -1, and V(H/D) = 4 > V(D/D) = 2. So neither is ESS.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B
ii) either V(A/A) > V(B/A), or V(A/B) > V(B/B), for all B

What about the mixed NE strategy?

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

V(H/H) = -1    V(H/D) = 4    V(D/H) = 0    V(D/D) = 2

V(H/M) = 2/3 V(H/H) + 1/3 V(H/D) = 2/3
V(M/H) = 2/3 V(H/H) + 1/3 V(D/H) = -2/3
V(D/M) = 2/3 V(D/H) + 1/3 V(D/D) = 2/3
V(M/D) = 2/3 V(H/D) + 1/3 V(D/D) = 10/3
V(M/M) = 4/9 V(H/H) + 2/9 V(H/D) + 2/9 V(D/H) + 1/9 V(D/D) = 2/3

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

where M is the mixed strategy 2/3 Hawk, 1/3 Dove.

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

For example, V(H/M) = 2/3 V(H/H) + 1/3 V(H/D) = 2/3(-1) + 1/3(4) = 2/3.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

And V(M/M) = 4/9 V(H/H) + 2/9 V(H/D) + 2/9 V(D/H) + 1/9 V(D/D)
           = 4/9(-1) + 2/9(4) + 2/9(0) + 1/9(2) = 2/3.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

For M to be an ESS:

i) V(M/M) ≥ V(B/M), for all B

ii) either V(M/M) > V(B/M), or V(M/B) > V(B/B), for all B

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

For M to be an ESS:

i) V(M/M) = V(H/M) = V(D/M) = 2/3, so condition (i) holds, but only with equality

ii) either V(M/M) > V(B/M), or V(M/B) > V(B/B), for all B


Hawk-Dove: an example

NE = {(1,0);(0,1);(2/3,2/3)}

Condition (ii) holds: V(M/D) = 10/3 > V(D/D) = 2, and V(M/H) = -2/3 > V(H/H) = -1. So the mixed strategy M is an ESS.
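This conclusion is easy to verify numerically. A small sketch (the function V and the probability encoding are illustrative, not part of the slides):

```python
# Numerical check (a sketch) that the mixed strategy M = 2/3 H + 1/3 D
# is an ESS in Hawk-Dove with v = 4, c = 6.
# Row player's payoffs: V(H/H) = -1, V(H/D) = 4, V(D/H) = 0, V(D/D) = 2.

def V(p, q):
    """Expected payoff of a p-mixture (prob. of playing H) vs. a q-mixture."""
    return (p * q * -1) + (p * (1 - q) * 4) + ((1 - p) * (1 - q) * 2)

M, H, D = 2/3, 1.0, 0.0

# Condition (i) holds only with equality: every reply earns 2/3 against M.
assert abs(V(M, M) - 2/3) < 1e-9
assert abs(V(H, M) - 2/3) < 1e-9
assert abs(V(D, M) - 2/3) < 1e-9

# Condition (ii): M does strictly better against each pure mutant than
# the mutant does against itself.
assert V(M, H) > V(H, H)   # -2/3 > -1
assert V(M, D) > V(D, D)   # 10/3 > 2
print("M = (2/3, 1/3) is an ESS")
```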

Evolutionary Stability in IRPD?

Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy.

(Maynard Smith & Price, 1973)

Is D an ESS?

i) V(D/D) > V(STFT/D)?
ii) either V(D/D) > V(STFT/D), or V(D/STFT) > V(STFT/STFT)?

Consider a mutant strategy, e.g., SUSPICIOUS TIT FOR TAT (STFT): STFT defects on the first round, then plays like TFT.

Evolutionary Stability in IRPD?

Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy.

(Maynard Smith & Price, 1973)

Is D an ESS?

i) V(D/D) = V(STFT/D)
ii) V(D/D) = V(STFT/D) and V(D/STFT) = V(STFT/STFT): neither inequality is strict

Consider a mutant strategy, e.g., SUSPICIOUS TIT FOR TAT (STFT): STFT defects on the first round, then plays like TFT.

D and STFT are “neutral mutants”
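The neutrality is easy to confirm by simulation; a sketch with assumed Axelrod payoffs (T=5, R=3, P=1, S=0) and no discounting:

```python
# Sketch: ALWAYS DEFECT (D) and SUSPICIOUS TIT FOR TAT (STFT) are neutral
# mutants -- their realized play against each other and against themselves
# is identical (all defection), so all four payoffs are equal.

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def all_d(opp_hist):
    return "D"

def stft(opp_hist):
    # defect on the first round, then play TIT FOR TAT
    return "D" if not opp_hist else opp_hist[-1]

def avg_payoff(s1, s2, rounds=100):
    """Player 1's average per-round payoff when s1 meets s2."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)
        total += PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
    return total / rounds

# V(D/D) = V(STFT/D) and V(D/STFT) = V(STFT/STFT): condition (i) holds
# only with equality and condition (ii) fails, so D is not an ESS.
assert avg_payoff(all_d, all_d) == avg_payoff(stft, all_d) == 1.0
assert avg_payoff(all_d, stft) == avg_payoff(stft, stft) == 1.0
```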

Evolutionary Stability in IRPD?

Axelrod & Hamilton (1981) demonstrated that D is not an ESS, opening the way to subsequent tournament studies of the game.

This is a sort-of Folk Theorem for evolutionary games: In the one-shot Prisoner’s Dilemma, DEFECT is strictly dominant. But in the repeated game, ALWAYS DEFECT (D) can be invaded by a mutant strategy, e.g., SUSPICIOUS TIT FOR TAT (STFT).

• Many cooperative strategies do better than D, thus they can gain a foothold and grow as a share of the population.

• Depending on the initial population, the equilibrium reached can exhibit any amount of cooperation.

Is STFT an ESS?

Evolutionary Stability in IRPD?

It can be shown that there is no ESS in IRPD (Boyd & Lorberbaum, 1987; Lorberbaum, 1994).

There can be stable polymorphisms among neutral mutants, whose realized behaviors are indistinguishable from one another. (This is the case, for example, of a population of C and TFT).

Noise

If the system is perturbed by “noise,” these behaviors become distinct and differences in their reproductive success rates are amplified.

As a result, interest has shifted from the proof of the existence of a solution to the design of repeated game strategies that perform well against other sophisticated strategies.

Consider a population of strategies competing over a niche that can only maintain a fixed number of individuals, i.e., the population’s size is upwardly bounded by the system’s carrying capacity.

In each generation, each strategy is matched against every other, itself, and RANDOM in pairwise games.

Between generations, the strategies reproduce, where the chance of successful reproduction (“fitness”) is determined by the payoffs (i.e., payoffs play the role of reproductive rates). 

Then, strategies that do better than average will grow as a share of the population and those that do worse than average will eventually die out . . .

Replicator Dynamics

There is a very simple way to describe this process. Let:

x(A) = the proportion of the population using strategy A in a given generation
V(A) = strategy A’s tournament score
V̄ = the population’s average score

Then A’s population share in the next generation is:

x'(A) = x(A) · V(A)/V̄

Replicator Dynamics

For any finite set of strategies, the replicator dynamic will attain a fixed point, where population shares do not change and all strategies are equally fit, i.e., V(A) = V(B), for all B.

However, the dynamic described is population-specific. For instance, if the population consists entirely of naive cooperators (ALWAYS COOPERATE), then x(A) = x’(A) = 1, and the process is at a fixed-point. To be sure, the population is in equilibrium, but only in a very weak sense. For if a single D strategy were to “invade” the population, the system would be driven away from equilibrium, and C would be driven toward extinction.
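The update rule is easy to simulate. A minimal sketch for the Hawk-Dove game with v=4, c=6; the constant added to all payoffs is an assumption needed to keep fitness positive, and it changes neither the fixed points nor their stability:

```python
# Discrete-time replicator dynamic for Hawk-Dove (v=4, c=6), a sketch.
# A constant shift keeps fitness positive so the share update is well
# defined (an assumption; fixed points are unaffected).

def replicator_step(x, shift=2.0):
    """x = current share of Hawks; returns the next generation's share."""
    f_H = x * (-1) + (1 - x) * 4 + shift   # Hawk's expected payoff
    f_D = x * 0 + (1 - x) * 2 + shift      # Dove's expected payoff
    f_bar = x * f_H + (1 - x) * f_D        # population average
    return x * f_H / f_bar                 # x'(H) = x(H) * V(H) / Vbar

x = 0.10                                   # start with 10% Hawks
for _ in range(200):
    x = replicator_step(x)
print(round(x, 4))  # 0.6667 -- the mixed-NE share of Hawks
```

Starting from almost any interior population, the share of Hawks converges to 2/3, the stable mixed equilibrium computed above.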

Simulating Evolution

An evolutionary model includes three components: Reproduction + Selection + Variation

[Diagram: a Population of Strategies is acted on by a Selection Mechanism (Reproduction, Competition, Invasion) and a Variation Mechanism (Mutation or Learning)]

The Trouble with TIT FOR TAT

TIT FOR TAT is susceptible to two types of perturbations:

Mutations: random Cs can invade TFT (TFT is not ESS), which in turn allows exploiters to gain a foothold.

Noise: a “mistake” between a pair of TFTs induces CD, DC cycles (“mirroring” or “echo” effect).

TIT FOR TAT never beats its opponent; it wins because it elicits reciprocal cooperation. It never exploits “naively” nice strategies.

(See Poundstone: 242-248; Casti 76-84.)

Noise in the form of random errors in implementing or perceiving an action is a common problem in real-world interactions. Such misunderstandings may lead “well-intentioned” cooperators into periods of alternating or mutual defection resulting in lower tournament scores.

TFT: C C C C
TFT: C C C D   <- "mistake"

The Trouble with TIT FOR TAT


TFT: C C C C D C D …
TFT: C C C D C D C …   <- "mistake" in round 4

Avg payoff falls from R to (T+S)/2.

The Trouble with TIT FOR TAT

Nowak and Sigmund (1993) ran an extensive series of computer-based experiments and found the simple learning rule PAVLOV outperformed TIT FOR TAT in the presence of noise.

PAVLOV (win-stay, lose-shift): Cooperate after both cooperated or both defected; otherwise defect.

The Trouble with TIT FOR TAT

PAVLOV cannot be invaded by random C; PAVLOV is an exploiter (will “fleece a sucker” once it discovers no need to fear retaliation).

A mistake between a pair of PAVLOVs causes only a single round of mutual defection followed by a return to mutual cooperation.

PAV: C C C C D C C
PAV: C C C D D C C   <- "mistake" in round 4
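The contrast between the two strategies' error recovery can be reproduced with a short simulation (the function names and the forced-error mechanism are illustrative):

```python
# A sketch contrasting how a pair of TFTs and a pair of PAVLOVs recover
# from a single implementation error (noise).

def tft(my_hist, opp_hist):
    return opp_hist[-1] if opp_hist else "C"

def pavlov(my_hist, opp_hist):
    # win-stay, lose-shift: cooperate after CC or DD, otherwise defect
    if not my_hist:
        return "C"
    return "C" if my_hist[-1] == opp_hist[-1] else "D"

def play(strategy, rounds=8, mistake_round=3):
    """Self-play; player 1's intended move is flipped at mistake_round."""
    h1, h2 = [], []
    for t in range(rounds):
        m1 = strategy(h1, h2)
        m2 = strategy(h2, h1)
        if t == mistake_round:               # noise: flip player 1's move
            m1 = "D" if m1 == "C" else "C"
        h1.append(m1); h2.append(m2)
    return "".join(h1), "".join(h2)

print(play(tft))     # ('CCCDCDCD', 'CCCCDCDC') -- the echo never ends
print(play(pavlov))  # ('CCCDDCCC', 'CCCCDCCC') -- one DD round, then recovery
```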

The Trouble with TIT FOR TAT

[Figure: population shares plotted over 800 generations]

Simulating Evolution

[Figure: population shares in Axelrod's ecological tournament; numbers mark each entry's position after the 1st generation, with 1 (TFT) on top]

Source: Axelrod 1984, p. 51.

Simulating Evolution

[Figure: population shares (0.00 to 0.50) for 6 RPD strategies (C, D, TFT, GRIM (TRIGGER), PAVLOV, and RANDOM), with noise at the 0.01 level, plotted over generations]

GTFT?

In the Repeated Prisoner’s Dilemma, it has been suggested that “uncooperative behavior is the result of ‘unbounded rationality’, i.e., the assumed availability of unlimited reasoning and computational resources to the players” (Papadimitriou, 1992: 122). If players are boundedly rational, on the other hand, the cooperative outcome may emerge as the result of a “muddling” process: they reason inductively and adapt (imitate or learn) locally superior strategies.

Thus, not only is bounded rationality a more “realistic” approach, it may also solve some deep analytical problems, e.g., resolution of finite horizon paradoxes.

Bounded Rationality

Tournament Assignment

Design a strategy to play an Evolutionary Prisoner’s Dilemma Tournament.

Entries will meet in a round robin tournament, with 1% noise (i.e., for each intended choice there is a 1% chance that the opposite choice will be implemented). Games will last at least 1000 repetitions (each generation), and after each generation, population shares will be adjusted according to the replicator dynamic, so that strategies that do better than average will grow as a share of the population whereas others will be driven to extinction. The winner or winners will be those strategies that survive after at least 10,000 generations. 

Designing Repeated Game Strategies

Imagine a very simple decision making machine playing a repeated game. The machine has very little information at the start of the game: no knowledge of the payoffs or “priors” over the opponent’s behavior. It merely makes a choice, receives a payoff, then adapts its behavior, and so on.

The machine, though very simple, is able to implement a strategy against any possible opponent, i.e., it “knows what to do” in any possible situation of the game.

Designing Repeated Game Strategies

A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far:

           … T-3 T-2 T-1   T0
Player 1:  C C C C D C C    ?
Player 2:  C C C D D C D

History at time T0

Designing Repeated Game Strategies

A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far, subject to the constraint of a finite memory:

           … T-3 T-2 T-1   T0
Player 1:  C C C C D C C    ?
Player 2:  C C C D D C C

History of memory-4 (only the last four rounds are recalled)

Designing Repeated Game Strategies

TIT FOR TAT is a remarkably simple repeated game strategy. It merely requires recall of what happened in the last round (memory-1).

           … T-3 T-2 T-1   T0
Player 1:  C C C C D D C    ?
Player 2:  C C C D D C D

History of memory-1 (only the last round is recalled)

Finite Automata

A FINITE AUTOMATON (FA) is a mathematical representation of a simple decision-making process. FA are completely described by:

• A finite set of internal states
• An initial state
• An output function
• A transition function

The output function determines an action, C or D, in each state. The transition function determines how the FA changes states in response to the inputs it receives (e.g., actions of other FA).

(Rubinstein, “Finite Automata Play the Repeated Prisoner’s Dilemma,” JET, 1986)

FA will implement a strategy against any possible opponent, i.e., they “know what to do” in any possible situation of the game.

FA meet in 2-player repeated games and make a move in each round (either C or D). Depending upon the outcome of that round, they “decide” what to play on the next round, and so on.

FA are very simple, have no knowledge of the payoffs or priors over the opponent’s behavior, and no deductive ability. They simply read and react to what happens. Nonetheless, they are capable of a crude form of “learning” — they receive payoffs that reinforce certain behaviors and “punish” others.
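A sketch of this definition in code (the dictionary encoding is illustrative; the deck's fa tool uses its own format):

```python
# A minimal finite-automaton representation of a repeated-game strategy:
# a set of states, an initial state, an output function (action per state),
# and a transition function (next state given the opponent's move).

class FiniteAutomaton:
    def __init__(self, output, transition, start):
        self.output = output          # state -> "C" or "D"
        self.transition = transition  # (state, opp_move) -> next state
        self.state = start

    def move(self):
        return self.output[self.state]

    def observe(self, opp_move):
        self.state = self.transition[(self.state, opp_move)]

# TIT FOR TAT as a 2-state FA: start in c; mirror the opponent's last move.
tft = FiniteAutomaton(
    output={"c": "C", "d": "D"},
    transition={("c", "C"): "c", ("c", "D"): "d",
                ("d", "C"): "c", ("d", "D"): "d"},
    start="c",
)

tft.observe("D")
print(tft.move())  # D -- retaliates once
tft.observe("C")
print(tft.move())  # C -- forgives immediately
```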

Finite Automata

[State diagram: TIT FOR TAT: two states, C and D; start in C; move to the state matching the opponent’s last move]

[State diagram: TIT FOR TWO TATS: defects only after two consecutive defections by the opponent]

Finite Automata

Some examples:

[State diagrams: ALWAYS DEFECT, TIT FOR TAT, GRIM (TRIGGER), PAVLOV, and M5]

Calculating Automata Payoffs

Time-average payoffs can be calculated because any pair of FA will achieve cycles, since each FA takes as input only the actions in the previous period (i.e., it is “Markovian”).

For example, consider the following pair of FA:

[State diagrams: PAVLOV and M5]
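Because the joint state space is finite, the cycle can be found mechanically. A sketch that detects the cycle and computes exact time-average payoffs; M5's diagram is not recoverable here, so PAVLOV is matched against ALWAYS DEFECT instead (an illustrative substitution), with Axelrod payoffs assumed:

```python
# Sketch: two finite automata jointly form a Markov chain over a finite
# set of (state, state) pairs, so their play must eventually cycle.
# Detecting the cycle gives exact time-average payoffs.
# PD payoffs T=5, R=3, P=1, S=0 (Axelrod's values, an assumption).

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Each FA: (output dict, transition dict, start state)
PAVLOV = ({"c": "C", "d": "D"},
          {("c", "C"): "c", ("c", "D"): "d",   # stay after CC, shift after CD
           ("d", "C"): "d", ("d", "D"): "c"},  # stay after DC, shift after DD
          "c")
ALL_D = ({"d": "D"}, {("d", "C"): "d", ("d", "D"): "d"}, "d")

def time_average(fa1, fa2):
    (out1, tr1, s1), (out2, tr2, s2) = fa1, fa2
    seen, history = {}, []
    while (s1, s2) not in seen:
        seen[(s1, s2)] = len(history)
        m1, m2 = out1[s1], out2[s2]
        history.append(PAYOFF[(m1, m2)])
        s1, s2 = tr1[(s1, m2)], tr2[(s2, m1)]
    cycle = history[seen[(s1, s2)]:]           # the repeating segment
    n = len(cycle)
    return sum(p for p, _ in cycle) / n, sum(q for _, q in cycle) / n

print(time_average(PAVLOV, ALL_D))  # (0.5, 3.0) -- PAVLOV alternates C and D
```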

Calculating Automata Payoffs

[State diagrams: PAVLOV and M5]

PAVLOV: C
M5:     D

Calculating Automata Payoffs

[State diagrams: PAVLOV and M5]

PAVLOV: C D
M5:     D C

Calculating Automata Payoffs

[State diagrams: PAVLOV and M5]

Payoff:  0 5 1 0 5 1 0 5   AVG = 2
PAVLOV:  C D D C D D C D
M5:      D C D D C D D C
Payoff:  5 0 1 5 0 1 5 0   AVG = 2

(the three-round cycle repeats)

Tournament Assignment

To design your strategy, access the programs through your fas Unix account. The Finite Automaton Creation Tool (fa) will prompt you to create a finite automaton to implement your strategy. Select the number of internal states, designate the initial state, and define the output and transition functions, which together determine how an automaton “behaves.” The program also allows you to specify probabilistic output and transition functions. Simple probabilistic strategies such as GENEROUS TIT FOR TAT have been shown to perform particularly well in noisy environments, because they avoid costly sequences of alternating defections that undermine sustained cooperation.

Some examples:

[State diagrams: ALWAYS DEFECT, TIT FOR TAT, and GENEROUS PAVLOV (with a probabilistic transition at .9)]

Tournament Assignment

A number of test runs will be held and results will be distributed to the class. You can revise your strategy as often as you like before the final submission date. You can also create your own tournament environment and test various designs before submitting.

Entries must be submitted by 5pm, Friday, May 6.


Creating your automaton

To create a finite automaton (fa) you need to run the fa creation program. Log into your unix account via an ice server and at the ice% prompt, type:

~neugebor/simulation/fa

From there, simply follow the instructions provided. Use your user name as the name for the fa. If anything goes wrong, simply press “ctrl-c” and start over.

Computer Instructions

Creating your automaton

The program prompts the user to:

• specify the number of states in the automaton, with an upper limit of 50.

For each state, the program asks the user to:

• choose an action (cooperate or defect); and
• specify, in response to cooperate (defect), which state to transition to.

Finally, the program asks the user to:

• specify the initial state.

The program also allows the user to specify probabilistic outputs and transitions.

Computer Instructions

Submitting your automaton

After creating the fa, submit it by typing:

cp username.fa ~neugebor/ece1040.11
chmod 744 ~neugebor/ece1040.11/username.fa

where username is your user name. You may resubmit as often as you like before the submission deadline.

Computer Instructions

Testing your automaton

You may wish to test your fa before submitting it. You can do this by running sample tournaments with different fa’s you’ve created. To run the tournament program, you must copy it into your own account. You can do this by typing:

mkdir simulation
cp ~neugebor/simulation/* simulation

To change into the directory with the tournament program, type:

cd simulation

Then, to run the tournament, type:

./tournament

Computer Instructions

Testing your automaton

Follow the instructions provided. Note that running a tournament with many fa’s can be computationally intensive and may take a long time to complete. Use your favorite text editor to view the results of the tournament (“less” is an easy option if you are unfamiliar with unix: type “less textfilename” to open a text file).

To create extra automata to test in your tournament, type:

./fa

Name each fa whatever you want by entering any name you wish to use instead of your user name. Initially, six different kinds of fa’s are in the directory: D, C, TFT, GRIM, PAVLOV, and RANDOM. Experiment with these and others as you like.

Computer Instructions

4/13 How to Promote Cooperation?
Axelrod, Ch. 6-9: 104-191.

Next Time
