Unit III: The Evolution of Cooperation


Unit III: The Evolution of Cooperation

• Can Selfishness Save the Environment?
• Repeated Games: the Folk Theorem
• Evolutionary Games
• A Tournament
• How to Promote Cooperation/Unit Review


Repeated Games

Some Questions:

• What happens when a game is repeated?
• Can threats and promises about the future influence behavior in the present?
• Cheap talk
• Finitely repeated games: Backward induction
• Indefinitely repeated games: Trigger strategies

The Folk Theorem

[Figure: the feasible payoff set of the repeated PD, the convex hull of (R,R), (T,S), (S,T), and (P,P)]

Theorem: Any payoff vector that Pareto-dominates the one-shot Nash equilibrium can be supported in a subgame-perfect Nash equilibrium (SPNE) of the repeated game, provided the discount parameter is sufficiently high.

The Folk Theorem


In other words, in the repeated game, if the future matters “enough,” i.e., d > d*, there are zillions of equilibria!
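The threshold d* has a simple closed form for particular strategy pairs. A minimal sketch for GRIM TRIGGER supporting cooperation against itself, assuming the standard Axelrod payoffs T=5, R=3, P=1, S=0 (an assumption; any PD payoffs work the same way):

```python
# Sketch: critical discount factor d* for sustaining cooperation with
# GRIM TRIGGER in the repeated Prisoner's Dilemma.
# Payoffs are Axelrod's usual values (an assumption): T=5, R=3, P=1, S=0.

T, R, P, S = 5, 3, 1, 0

def cooperation_value(d):
    """Discounted value of mutual cooperation forever: R + dR + d^2 R + ..."""
    return R / (1 - d)

def defection_value(d):
    """Value of defecting once against GRIM TRIGGER: T now, P forever after."""
    return T + d * P / (1 - d)

# Cooperation is sustainable iff R/(1-d) >= T + dP/(1-d), i.e. d >= (T-R)/(T-P).
d_star = (T - R) / (T - P)
print(d_star)  # 0.5

assert cooperation_value(0.6) > defection_value(0.6)   # future matters enough
assert cooperation_value(0.4) < defection_value(0.4)   # too impatient
```

For these payoffs, any d above 1/2 supports cooperation; the folk theorem's "zillions" of equilibria appear because many other strategy pairs can be supported the same way.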

• The theorem tells us that, in general, repeated games give rise to a very large set of Nash equilibria. In the repeated PD, these are Pareto-rankable, i.e., some are efficient and some are not.

• In this context, evolution can be seen as a process that selects for repeated game strategies with efficient payoffs.

“Survival of the Fittest”


Evolutionary Games

Fifteen months after I had begun my systematic enquiry, I happened to read for amusement ‘Malthus on Population’ . . . It at once struck me that . . . favorable variations would tend to be preserved, and unfavorable ones to be destroyed. Here then I had at last got a theory by which to work.

Charles Darwin

Evolutionary Games

• Evolutionary Stability (ESS)
• Hawk-Dove: an example
• The Replicator Dynamic
• The Trouble with TIT FOR TAT
• Designing Repeated Game Strategies
• Finite Automata

Evolutionary Games

Biological Evolution: Under the pressure of natural selection, any population (capable of reproduction and variation) will evolve so as to become better adapted to its environment, i.e., will develop in the direction of increasing “fitness.”

Economic Evolution: Firms that adopt efficient “routines” will survive, expand, and multiply, whereas others will be “weeded out” (Nelson and Winter, 1982).

Evolutionary Stability

Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy.

(Maynard Smith & Price, 1973)

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B
ii) either V(A/A) > V(B/A), or V(A/B) > V(B/B), for all B

Hawk-Dove: an example

Imagine a population of Hawks and Doves competing over a scarce resource (say food in a given area). The share of each type in the population changes according to the payoff matrix, so that payoffs determine the number of offspring left to the next generation.  

v = value of the resourcec = cost of fighting

H/D: Hawk gets the resource; Dove flees (v, 0)
D/D: Share the resource (v/2, v/2)
H/H: Share the resource less the cost of fighting ((v-c)/2, (v-c)/2)

(See Hargreaves Heap and Varoufakis: 195-214; Casti: 71-75.)

Hawk-Dove: an example

         H                   D
H    (v-c)/2, (v-c)/2     v, 0
D    0, v                 v/2, v/2

v = value of the resource
c = cost of fighting

Hawk-Dove: an example

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

v = value of the resource = 4
c = cost of fighting = 6

Hawk-Dove: an example

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

NE = {(1,0); (0,1); (2/3,2/3)}

The two pure-strategy equilibria are unstable as population states; the mixed equilibrium is stable.

The mixed NE corresponds to a population that is 2/3 Hawks and 1/3 Doves

Hawk-Dove: an example

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

NE = {(1,0); (0,1); (2/3,2/3)}

Is any strategy ESS?

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B
ii) either V(A/A) > V(B/A), or V(A/B) > V(B/B), for all B

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B

In other words, to be an ESS, a strategy must be a NE with itself.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B

In other words, to be an ESS, a strategy must be a NE with itself.

Neither H nor D is a NE with itself (for these payoffs): V(D/H) = 0 > V(H/H) = -1, and V(H/D) = 4 > V(D/D) = 2. So neither is ESS.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

A strategy A is an ESS if:

i) V(A/A) ≥ V(B/A), for all B
ii) either V(A/A) > V(B/A), or V(A/B) > V(B/B), for all B

What about the mixed NE strategy?

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

V(H/H) = -1    V(H/D) = 4    V(D/H) = 0    V(D/D) = 2

V(H/M) = 2/3 V(H/H) + 1/3 V(H/D) = 2/3
V(M/H) = 2/3 V(H/H) + 1/3 V(D/H) = -2/3
V(D/M) = 2/3 V(D/H) + 1/3 V(D/D) = 2/3
V(M/D) = 2/3 V(H/D) + 1/3 V(D/D) = 10/3
V(M/M) = 4/9 V(H/H) + 2/9 V(H/D) + 2/9 V(D/H) + 1/9 V(D/D) = 2/3

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

where M is the mixed strategy 2/3 Hawk, 1/3 Dove.

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

For example, V(H/M) = 2/3 V(H/H) + 1/3 V(H/D) = 2/3(-1) + 1/3(4) = 2/3.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

And V(M/M) = 4/9 V(H/H) + 2/9 V(H/D) + 2/9 V(D/H) + 1/9 V(D/D)
           = 4/9(-1) + 2/9(4) + 2/9(0) + 1/9(2) = 2/3.

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

         H         D
H    -1, -1     4, 0
D     0, 4      2, 2

For M to be an ESS:

i) V(M/M) ≥ V(B/M), for all B

ii) either V(M/M) > V(B/M), or V(M/B) > V(B/B), for all B

Hawk-Dove: an example

NE = {(1,0); (0,1); (2/3,2/3)}

For M to be an ESS:

i) V(M/M) = V(H/M) = V(D/M) = 2/3, so condition (i) holds, but only with equality

ii) either V(M/M) > V(B/M), or V(M/B) > V(B/B), for all B


Hawk-Dove: an example

NE = {(1,0);(0,1);(2/3,2/3)}

Condition (ii) holds: V(M/D) = 10/3 > V(D/D) = 2, and V(M/H) = -2/3 > V(H/H) = -1. So the mixed strategy M is an ESS.
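This conclusion is easy to verify numerically. A small sketch (the function V and the probability encoding are illustrative, not part of the slides):

```python
# Numerical check (a sketch) that the mixed strategy M = 2/3 H + 1/3 D
# is an ESS in Hawk-Dove with v = 4, c = 6.
# Row player's payoffs: V(H/H) = -1, V(H/D) = 4, V(D/H) = 0, V(D/D) = 2.

def V(p, q):
    """Expected payoff of a p-mixture (prob. of playing H) vs. a q-mixture."""
    return (p * q * -1) + (p * (1 - q) * 4) + ((1 - p) * (1 - q) * 2)

M, H, D = 2/3, 1.0, 0.0

# Condition (i) holds only with equality: every reply earns 2/3 against M.
assert abs(V(M, M) - 2/3) < 1e-9
assert abs(V(H, M) - 2/3) < 1e-9
assert abs(V(D, M) - 2/3) < 1e-9

# Condition (ii): M does strictly better against each pure mutant than
# the mutant does against itself.
assert V(M, H) > V(H, H)   # -2/3 > -1
assert V(M, D) > V(D, D)   # 10/3 > 2
print("M = (2/3, 1/3) is an ESS")
```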

Evolutionary Stability in IRPD?

Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy.

(Maynard Smith & Price, 1973)

Is D an ESS?

i) V(D/D) > V(STFT/D)?
ii) either V(D/D) > V(STFT/D), or V(D/STFT) > V(STFT/STFT)?

Consider a mutant strategy, e.g., SUSPICIOUS TIT FOR TAT (STFT): STFT defects on the first round, then plays like TFT.

Evolutionary Stability in IRPD?

Evolutionarily Stable Strategy (ESS): A strategy is evolutionarily stable if it cannot be invaded by a mutant strategy.

(Maynard Smith & Price, 1973)

Is D an ESS?

i) V(D/D) = V(STFT/D)
ii) V(D/D) = V(STFT/D) and V(D/STFT) = V(STFT/STFT): neither inequality is strict

Consider a mutant strategy, e.g., SUSPICIOUS TIT FOR TAT (STFT): STFT defects on the first round, then plays like TFT.

D and STFT are “neutral mutants”
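The neutrality is easy to confirm by simulation; a sketch with assumed Axelrod payoffs (T=5, R=3, P=1, S=0) and no discounting:

```python
# Sketch: ALWAYS DEFECT (D) and SUSPICIOUS TIT FOR TAT (STFT) are neutral
# mutants -- their realized play against each other and against themselves
# is identical (all defection), so all four payoffs are equal.

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def all_d(opp_hist):
    return "D"

def stft(opp_hist):
    # defect on the first round, then play TIT FOR TAT
    return "D" if not opp_hist else opp_hist[-1]

def avg_payoff(s1, s2, rounds=100):
    """Player 1's average per-round payoff when s1 meets s2."""
    h1, h2, total = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)
        total += PAYOFF[(m1, m2)]
        h1.append(m1); h2.append(m2)
    return total / rounds

# V(D/D) = V(STFT/D) and V(D/STFT) = V(STFT/STFT): condition (i) holds
# only with equality and condition (ii) fails, so D is not an ESS.
assert avg_payoff(all_d, all_d) == avg_payoff(stft, all_d) == 1.0
assert avg_payoff(all_d, stft) == avg_payoff(stft, stft) == 1.0
```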

Evolutionary Stability in IRPD?

Axelrod & Hamilton (1981) demonstrated that D is not an ESS, opening the way to subsequent tournament studies of the game.

This is a sort-of Folk Theorem for evolutionary games: In the one-shot Prisoner’s Dilemma, DEFECT is strictly dominant. But in the repeated game, ALWAYS DEFECT (D) can be invaded by a mutant strategy, e.g., SUSPICIOUS TIT FOR TAT (STFT).

• Many cooperative strategies do better than D, thus they can gain a foothold and grow as a share of the population.

• Depending on the initial population, the equilibrium reached can exhibit any amount of cooperation.

Is STFT an ESS?

Evolutionary Stability in IRPD?

It can be shown that there is no ESS in IRPD (Boyd & Lorberbaum, 1987; Lorberbaum, 1994).

There can be stable polymorphisms among neutral mutants, whose realized behaviors are indistinguishable from one another. (This is the case, for example, of a population of C and TFT).

Noise

If the system is perturbed by “noise,” these behaviors become distinct and differences in their reproductive success rates are amplified.

As a result, interest has shifted from the proof of the existence of a solution to the design of repeated game strategies that perform well against other sophisticated strategies.

Consider a population of strategies competing over a niche that can only maintain a fixed number of individuals, i.e., the population’s size is upwardly bounded by the system’s carrying capacity.

In each generation, each strategy is matched against every other, itself, and RANDOM in pairwise games.

Between generations, the strategies reproduce, where the chance of successful reproduction (“fitness”) is determined by the payoffs (i.e., payoffs play the role of reproductive rates). 

Then, strategies that do better than average will grow as a share of the population and those that do worse than average will eventually die out . . .

Replicator Dynamics

There is a very simple way to describe this process. Let:

x(A) = the proportion of the population using strategy A in a given generation
V(A) = strategy A’s tournament score
V̄ = the population’s average score

Then A’s population share in the next generation is:

x'(A) = x(A) · V(A)/V̄

Replicator Dynamics

For any finite set of strategies, the replicator dynamic will attain a fixed point, where population shares do not change and all strategies are equally fit, i.e., V(A) = V(B), for all B.

However, the dynamic described is population-specific. For instance, if the population consists entirely of naive cooperators (ALWAYS COOPERATE), then x(A) = x’(A) = 1, and the process is at a fixed-point. To be sure, the population is in equilibrium, but only in a very weak sense. For if a single D strategy were to “invade” the population, the system would be driven away from equilibrium, and C would be driven toward extinction.
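The update rule is easy to simulate. A minimal sketch for the Hawk-Dove game with v=4, c=6; the constant added to all payoffs is an assumption needed to keep fitness positive, and it changes neither the fixed points nor their stability:

```python
# Discrete-time replicator dynamic for Hawk-Dove (v=4, c=6), a sketch.
# A constant shift keeps fitness positive so the share update is well
# defined (an assumption; fixed points are unaffected).

def replicator_step(x, shift=2.0):
    """x = current share of Hawks; returns the next generation's share."""
    f_H = x * (-1) + (1 - x) * 4 + shift   # Hawk's expected payoff
    f_D = x * 0 + (1 - x) * 2 + shift      # Dove's expected payoff
    f_bar = x * f_H + (1 - x) * f_D        # population average
    return x * f_H / f_bar                 # x'(H) = x(H) * V(H) / Vbar

x = 0.10                                   # start with 10% Hawks
for _ in range(200):
    x = replicator_step(x)
print(round(x, 4))  # 0.6667 -- the mixed-NE share of Hawks
```

Starting from almost any interior population, the share of Hawks converges to 2/3, the stable mixed equilibrium computed above.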

Simulating Evolution

An evolutionary model includes three components: Reproduction + Selection + Variation

[Diagram: a Population of Strategies is acted on by a Selection Mechanism (Reproduction, Competition, Invasion) and a Variation Mechanism (Mutation or Learning)]

The Trouble with TIT FOR TAT

TIT FOR TAT is susceptible to two types of perturbations:

Mutations: random Cs can invade TFT (TFT is not ESS), which in turn allows exploiters to gain a foothold.

Noise: a “mistake” between a pair of TFTs induces CD, DC cycles (“mirroring” or “echo” effect).

TIT FOR TAT never beats its opponent; it wins because it elicits reciprocal cooperation. It never exploits “naively” nice strategies.

(See Poundstone: 242-248; Casti 76-84.)

Noise in the form of random errors in implementing or perceiving an action is a common problem in real-world interactions. Such misunderstandings may lead “well-intentioned” cooperators into periods of alternating or mutual defection resulting in lower tournament scores.

TFT: C C C C
TFT: C C C D   <- "mistake"

The Trouble with TIT FOR TAT


TFT: C C C C D C D …
TFT: C C C D C D C …   <- "mistake" in round 4

Avg payoff falls from R to (T+S)/2.

The Trouble with TIT FOR TAT

Nowak and Sigmund (1993) ran an extensive series of computer-based experiments and found the simple learning rule PAVLOV outperformed TIT FOR TAT in the presence of noise.

PAVLOV (win-stay, lose-shift): Cooperate after both cooperated or both defected; otherwise defect.

The Trouble with TIT FOR TAT

PAVLOV cannot be invaded by random C; PAVLOV is an exploiter (will “fleece a sucker” once it discovers no need to fear retaliation).

A mistake between a pair of PAVLOVs causes only a single round of mutual defection followed by a return to mutual cooperation.

PAV: C C C C D C C
PAV: C C C D D C C   <- "mistake" in round 4
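The contrast between the two strategies' error recovery can be reproduced with a short simulation (the function names and the forced-error mechanism are illustrative):

```python
# A sketch contrasting how a pair of TFTs and a pair of PAVLOVs recover
# from a single implementation error (noise).

def tft(my_hist, opp_hist):
    return opp_hist[-1] if opp_hist else "C"

def pavlov(my_hist, opp_hist):
    # win-stay, lose-shift: cooperate after CC or DD, otherwise defect
    if not my_hist:
        return "C"
    return "C" if my_hist[-1] == opp_hist[-1] else "D"

def play(strategy, rounds=8, mistake_round=3):
    """Self-play; player 1's intended move is flipped at mistake_round."""
    h1, h2 = [], []
    for t in range(rounds):
        m1 = strategy(h1, h2)
        m2 = strategy(h2, h1)
        if t == mistake_round:               # noise: flip player 1's move
            m1 = "D" if m1 == "C" else "C"
        h1.append(m1); h2.append(m2)
    return "".join(h1), "".join(h2)

print(play(tft))     # ('CCCDCDCD', 'CCCCDCDC') -- the echo never ends
print(play(pavlov))  # ('CCCDDCCC', 'CCCCDCCC') -- one DD round, then recovery
```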

The Trouble with TIT FOR TAT

[Figure: population shares plotted over 800 generations]

Simulating Evolution

[Figure: population shares in Axelrod's ecological tournament; numbers mark each entry's position after the 1st generation, with 1 (TFT) on top]

Source: Axelrod 1984, p. 51.

Simulating Evolution

[Figure: population shares (0.00 to 0.50) for 6 RPD strategies (C, D, TFT, GRIM (TRIGGER), PAVLOV, and RANDOM), with noise at the 0.01 level, plotted over generations]

GTFT?

In the Repeated Prisoner’s Dilemma, it has been suggested that “uncooperative behavior is the result of ‘unbounded rationality’, i.e., the assumed availability of unlimited reasoning and computational resources to the players” (Papadimitriou, 1992: 122). If players are boundedly rational, on the other hand, the cooperative outcome may emerge as the result of a “muddling” process: they reason inductively and adapt (imitate or learn) locally superior strategies.

Thus, not only is bounded rationality a more “realistic” approach, it may also solve some deep analytical problems, e.g., resolution of finite horizon paradoxes.

Bounded Rationality

Tournament Assignment

Design a strategy to play an Evolutionary Prisoner’s Dilemma Tournament.

Entries will meet in a round robin tournament, with 1% noise (i.e., for each intended choice there is a 1% chance that the opposite choice will be implemented). Games will last at least 1000 repetitions (each generation), and after each generation, population shares will be adjusted according to the replicator dynamic, so that strategies that do better than average will grow as a share of the population whereas others will be driven to extinction. The winner or winners will be those strategies that survive after at least 10,000 generations. 

Designing Repeated Game Strategies

Imagine a very simple decision making machine playing a repeated game. The machine has very little information at the start of the game: no knowledge of the payoffs or “priors” over the opponent’s behavior. It merely makes a choice, receives a payoff, then adapts its behavior, and so on.

The machine, though very simple, is able to implement a strategy against any possible opponent, i.e., it “knows what to do” in any possible situation of the game.

Designing Repeated Game Strategies

A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far:

           … T-3 T-2 T-1   T0
Player 1:  C C C C D C C    ?
Player 2:  C C C D D C D

History at time T0

Designing Repeated Game Strategies

A repeated game strategy is a map from a history to an action. A history is all the actions in the game thus far, subject to the constraint of a finite memory:

           … T-3 T-2 T-1   T0
Player 1:  C C C C D C C    ?
Player 2:  C C C D D C C

History of memory-4 (only the last four rounds are recalled)

Designing Repeated Game Strategies

TIT FOR TAT is a remarkably simple repeated game strategy. It merely requires recall of what happened in the last round (memory-1).

           … T-3 T-2 T-1   T0
Player 1:  C C C C D D C    ?
Player 2:  C C C D D C D

History of memory-1 (only the last round is recalled)

Finite Automata

A FINITE AUTOMATON (FA) is a mathematical representation of a simple decision-making process. FA are completely described by:

• A finite set of internal states
• An initial state
• An output function
• A transition function

The output function determines an action, C or D, in each state. The transition function determines how the FA changes states in response to the inputs it receives (e.g., actions of other FA).

(Rubinstein, “Finite Automata Play the Repeated Prisoner’s Dilemma,” JET, 1986)

FA will implement a strategy against any possible opponent, i.e., they “know what to do” in any possible situation of the game.

FA meet in 2-player repeated games and make a move in each round (either C or D). Depending upon the outcome of that round, they “decide” what to play on the next round, and so on.

FA are very simple, have no knowledge of the payoffs or priors over the opponent’s behavior, and no deductive ability. They simply read and react to what happens. Nonetheless, they are capable of a crude form of “learning” — they receive payoffs that reinforce certain behaviors and “punish” others.
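A sketch of this definition in code (the dictionary encoding is illustrative; the deck's fa tool uses its own format):

```python
# A minimal finite-automaton representation of a repeated-game strategy:
# a set of states, an initial state, an output function (action per state),
# and a transition function (next state given the opponent's move).

class FiniteAutomaton:
    def __init__(self, output, transition, start):
        self.output = output          # state -> "C" or "D"
        self.transition = transition  # (state, opp_move) -> next state
        self.state = start

    def move(self):
        return self.output[self.state]

    def observe(self, opp_move):
        self.state = self.transition[(self.state, opp_move)]

# TIT FOR TAT as a 2-state FA: start in c; mirror the opponent's last move.
tft = FiniteAutomaton(
    output={"c": "C", "d": "D"},
    transition={("c", "C"): "c", ("c", "D"): "d",
                ("d", "C"): "c", ("d", "D"): "d"},
    start="c",
)

tft.observe("D")
print(tft.move())  # D -- retaliates once
tft.observe("C")
print(tft.move())  # C -- forgives immediately
```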

Finite Automata

[State diagram: TIT FOR TAT: two states, C and D; start in C; move to the state matching the opponent’s last move]

[State diagram: TIT FOR TWO TATS: defects only after two consecutive defections by the opponent]

Finite Automata

Some examples:

[State diagrams: ALWAYS DEFECT, TIT FOR TAT, GRIM (TRIGGER), PAVLOV, and M5]

Calculating Automata Payoffs

Time-average payoffs can be calculated because any pair of FA will achieve cycles, since each FA takes as input only the actions in the previous period (i.e., it is “Markovian”).

For example, consider the following pair of FA:

[State diagrams: PAVLOV and M5]
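Because the joint state space is finite, the cycle can be found mechanically. A sketch that detects the cycle and computes exact time-average payoffs; M5's diagram is not recoverable here, so PAVLOV is matched against ALWAYS DEFECT instead (an illustrative substitution), with Axelrod payoffs assumed:

```python
# Sketch: two finite automata jointly form a Markov chain over a finite
# set of (state, state) pairs, so their play must eventually cycle.
# Detecting the cycle gives exact time-average payoffs.
# PD payoffs T=5, R=3, P=1, S=0 (Axelrod's values, an assumption).

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Each FA: (output dict, transition dict, start state)
PAVLOV = ({"c": "C", "d": "D"},
          {("c", "C"): "c", ("c", "D"): "d",   # stay after CC, shift after CD
           ("d", "C"): "d", ("d", "D"): "c"},  # stay after DC, shift after DD
          "c")
ALL_D = ({"d": "D"}, {("d", "C"): "d", ("d", "D"): "d"}, "d")

def time_average(fa1, fa2):
    (out1, tr1, s1), (out2, tr2, s2) = fa1, fa2
    seen, history = {}, []
    while (s1, s2) not in seen:
        seen[(s1, s2)] = len(history)
        m1, m2 = out1[s1], out2[s2]
        history.append(PAYOFF[(m1, m2)])
        s1, s2 = tr1[(s1, m2)], tr2[(s2, m1)]
    cycle = history[seen[(s1, s2)]:]           # the repeating segment
    n = len(cycle)
    return sum(p for p, _ in cycle) / n, sum(q for _, q in cycle) / n

print(time_average(PAVLOV, ALL_D))  # (0.5, 3.0) -- PAVLOV alternates C and D
```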

Calculating Automata Payoffs

[State diagrams: PAVLOV and M5]

PAVLOV: C
M5:     D

Calculating Automata Payoffs

[State diagrams: PAVLOV and M5]

PAVLOV: C D
M5:     D C

Calculating Automata Payoffs

[State diagrams: PAVLOV and M5]

Payoff:  0 5 1 0 5 1 0 5   AVG = 2
PAVLOV:  C D D C D D C D
M5:      D C D D C D D C
Payoff:  5 0 1 5 0 1 5 0   AVG = 2

(the three-round cycle repeats)

Tournament Assignment

To design your strategy, access the programs through your fas Unix account. The Finite Automaton Creation Tool (fa) will prompt you to create a finite automaton to implement your strategy. Select the number of internal states, designate the initial state, and define the output and transition functions, which together determine how an automaton “behaves.” The program also allows you to specify probabilistic output and transition functions. Simple probabilistic strategies such as GENEROUS TIT FOR TAT have been shown to perform particularly well in noisy environments, because they avoid costly sequences of alternating defections that undermine sustained cooperation.

Some examples:

[State diagrams: ALWAYS DEFECT, TIT FOR TAT, and GENEROUS PAVLOV (with a probabilistic transition at .9)]

Tournament Assignment

A number of test runs will be held and results will be distributed to the class. You can revise your strategy as often as you like before the final submission date. You can also create your own tournament environment and test various designs before submitting.

Entries must be submitted by 5pm, Friday, May 6.


Creating your automaton

To create a finite automaton (fa) you need to run the fa creation program. Log into your unix account via an ice server and at the ice% prompt, type:

~neugebor/simulation/fa

From there, simply follow the instructions provided. Use your user name as the name for the fa. If anything goes wrong, simply press “ctrl-c” and start over.

Computer Instructions

Creating your automaton

The program prompts the user to:

• specify the number of states in the automaton, with an upper limit of 50.

For each state, the program asks the user to:

• choose an action (cooperate or defect); and
• specify, in response to cooperate (defect), which state to transition to.

Finally, the program asks the user to:

• specify the initial state.

The program also allows the user to specify probabilistic outputs and transitions.

Computer Instructions

Submitting your automaton

After creating the fa, submit it by typing:

cp username.fa ~neugebor/ece1040.11
chmod 744 ~neugebor/ece1040.11/username.fa

where username is your user name. You may resubmit as often as you like before the submission deadline.

Computer Instructions

Testing your automaton

You may wish to test your fa before submitting it. You can do this by running sample tournaments with different fa’s you’ve created. To run the tournament program, you must copy it into your own account. You can do this by typing:

mkdir simulation
cp ~neugebor/simulation/* simulation

To change into the directory with the tournament program, type:

cd simulation

Then, to run the tournament, type:

./tournament

Computer Instructions

Testing your automaton

Follow the instructions provided. Note that running a tournament with many fa’s can be computationally intensive and may take a long time to complete. Use your favorite text editor to view the results of the tournament (“less” is an easy option if you are unfamiliar with unix: type “less textfilename” to open a text file).

To create extra automata to test in your tournament, type:

./fa

Name each fa whatever you want by entering any name you wish to use instead of your user name. Initially, six different kinds of fa’s are in the directory: D, C, TFT, GRIM, PAVLOV, and RANDOM. Experiment with these and others as you like.

Computer Instructions

4/13 How to Promote Cooperation?
Axelrod, Ch. 6-9: 104-191.

Next Time
