iterated prisoner’s dilemma game in evolutionary computation 2003. 10. 2 seung-ryong yang

Iterated Prisoner’s Dilemma Game in Evolutionary Computation

2003. 10. 2

Seung-Ryong Yang

http://www.yonsei.ac.kr/

2

Agenda

Motivation

Iterated Prisoner’s Dilemma Game

Related Works

Strategic Coalition

Improving Generalization Ability

Experimental Results

Conclusion


3

Motivation

Evolutionary approach
• Understanding complex behaviors by investigating simulation results of an evolutionary process
• Giving a way to find optimal strategies in a dynamic environment

IPD game
• Models complex phenomena such as social and economic behaviors
• Provides a testbed for modeling a dynamic environment

Objectives
• Obtaining multiple good strategies
• Forming coalitions to improve generalization ability


4

Iterated Prisoner’s Dilemma Game (1/2)

Overview
• Prisoner’s possible choices: defection, cooperation

Characteristics
• Non-cooperative
• Non-zero-sum

Types of Game
• 2IPD (2-player Iterated Prisoner’s Dilemma) game
• NIPD (N-player Iterated Prisoner’s Dilemma) game

Payoff Matrix of 2IPD Game by Axelrod, R. (1984)
(each cell: row player's payoff / column player's payoff)

             Cooperate   Defect
Cooperate    R / R       S / T
Defect       T / S       P / P

$T > R > P > S, \quad 2R > T + S$

             Cooperate   Defect
Cooperate    3 / 3       0 / 5
Defect       5 / 0       1 / 1
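The payoff matrix and its two constraints can be written down directly. The following is a minimal sketch, assuming the encoding 0 = cooperate and 1 = defect; the function names `is_prisoners_dilemma` and `payoff` are illustrative, not from the slides.

```python
# Payoff values from the slide: T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0
COOPERATE, DEFECT = 0, 1  # assumed encoding: 0 = cooperate, 1 = defect


def is_prisoners_dilemma(t, r, p, s):
    """Check the two conditions T > R > P > S and 2R > T + S."""
    return t > r > p > s and 2 * r > t + s


def payoff(my_move, opp_move):
    """Return (my payoff, opponent payoff) for a single round."""
    table = {
        (COOPERATE, COOPERATE): (R, R),
        (COOPERATE, DEFECT): (S, T),
        (DEFECT, COOPERATE): (T, S),
        (DEFECT, DEFECT): (P, P),
    }
    return table[(my_move, opp_move)]
```

For example, `payoff(COOPERATE, DEFECT)` gives the cooperating player S and the defecting opponent T.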


5

Iterated Prisoner’s Dilemma Game (2/2)

Representation of Strategy

[Figure: history table. The own history and the opponent’s history each run from the recent action to the last action; for l = 2 an example history is 11 01. The strategy is a bit string (0 1 0 ∙∙∙ 1) with one entry for each of the 2^N histories.]
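A history-table strategy can be read as a lookup table indexed by the recent moves of both players. The sketch below assumes a particular bit order (own history first, then the opponent's) and the encoding 0 = cooperate, 1 = defect; the slide does not fix either, and the helper names are hypothetical.

```python
def history_index(own_moves, opp_moves):
    """Pack the last l own moves and last l opponent moves into one integer.

    Assumed bit order: own history first, then the opponent's history.
    """
    index = 0
    for bit in list(own_moves) + list(opp_moves):
        index = (index << 1) | bit
    return index


def next_action(strategy, own_moves, opp_moves):
    """Look up the action stored for the current history."""
    return strategy[history_index(own_moves, opp_moves)]


l = 2
strategy = [0] * (2 ** (2 * l))              # 16 entries, all "cooperate"
strategy[history_index([1, 1], [0, 1])] = 1  # defect after history 11 01
```

With l = 2 the table has 16 entries, and the example history 11 01 selects one of them.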


6

Related Works

Previous Study
• Paul J. Darwen and Xin Yao (1997) : Speciation as Automatic Categorical Modularization
• Onn M. Shehory, et al. (1998) : Multi-agent Coordination through Coalition Formation
• Y. G. Seo and S. B. Cho (1999) : Exploiting Coalition in Co-Evolutionary Learning

Issues
• Existing work on coalition formation in multi-agent environments covers broad topics
• Darwen and Yao have studied coalition in the IPD game, but with a different approach
• The focus has been on cooperation, the number of players, payoff variances, etc.


7

What is Different?

Co-evolutionary Learning
• Selection method: rank-based, roulette wheel, tournament

Coalition Formation
• A coalition keeps surviving into the next generation
• The condition for forming a coalition is flexible

Decision Making in Coalition
• Adapting several decision-making methods to the coalition: Borda function, Condorcet function, average payoff, highest payoff, weighted voting
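The three selection methods named on this slide can be sketched as follows. This is a generic illustration, not the authors' implementation: `fitness` is a hypothetical callable that returns a non-negative score, and the linear rank weights are an assumption.

```python
import random


def rank_based(population, fitness, rng):
    """Pick one individual with probability proportional to its rank."""
    ranked = sorted(population, key=fitness)       # worst first
    weights = list(range(1, len(ranked) + 1))      # best gets the largest weight
    return rng.choices(ranked, weights=weights, k=1)[0]


def roulette_wheel(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    weights = [fitness(ind) for ind in population]  # must not all be zero
    return rng.choices(population, weights=weights, k=1)[0]


def tournament(population, fitness, rng, size=2):
    """Pick the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)
```

Rank-based selection depends only on the ordering of fitness values, while roulette wheel selection is sensitive to their magnitudes.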


8

Evolving Strategy

To evolve strategies, we use:
• Genetic algorithm
• Co-evolutionary learning
• Strategic coalition

Evolutionary Process


9

Evolution of Agents (1/2)

[Figure: coalitions C1, ..., Ci, ..., Ck in the previous, current, and next populations. Coalition Cj is added in the current population and Cl in the next population.]

Evolution of Agents
• Agents can develop their strategies using co-evolutionary learning
• Weak agents are removed from the population

Evolution of Coalition
• A formed coalition survives to the next generation
• Agents can join a coalition generation by generation
• A coalition survives or grows


10

Evolution of Agents (2/2)

Problem : Evolution may be driven by weak agents
• Caused by removing from the population the better agents, which belong to coalitions
• Remedy: making new agents by mixing better agents within a coalition

[Figure: the population contains coalitions Ck, Ci, and Cj. Agents A1 and A2 are extracted at random from a coalition and, after mutation, yield a new agent Ai. This is repeated as many times as there are agents in the coalition.]
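The remedy on this slide (random extraction of two coalition members, mixing, mutation, repeated once per coalition member) might look like the sketch below. The slide does not say how agents are mixed; one-point crossover is assumed here, and all function names are hypothetical.

```python
import random


def mix(parent_a, parent_b, rng):
    """One-point crossover (an assumed way of 'mixing' two agents)."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(genotype, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genotype]


def offspring_from_coalition(coalition, rate=0.001, rng=None):
    """Create as many new agents as the coalition has members."""
    rng = rng or random.Random()
    children = []
    for _ in range(len(coalition)):
        a1, a2 = rng.sample(coalition, 2)   # random extraction of two members
        children.append(mutate(mix(a1, a2, rng), rate, rng))
    return children
```

The coalition needs at least two members for the random extraction step to work.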


11

Strategic Coalition (1/2)

What is Coalition?
• A cooperative game as a set A of agents in which each subset of A is called a coalition (Matthias Klusch and Andreas Gerber, 2002)
• A group of agents that work jointly in order to accomplish their tasks (Onn M. Shehory, 1995)

Coalition in the IPD game

Forming coalition through round-robin game

Pursuing more payoff using generalization ability

Coalition forms autonomously without supervision


12

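Strategic Coalition (2/2)

Definitions

Definition 1 : Coalition Value

$$C_p = \frac{S_C}{|C|}, \quad \text{where } S_C = \sum_{i=1}^{|C|} p_i \cdot w_i \ \text{ and } \ w_i = \frac{p_i}{\sum_{i=1}^{|C|} p_i} \qquad (1)$$

Definition 2 : Payoff Function

$$T > R > P > S, \quad 2R > T + S$$

Definition 3 : Coalition Identification

Definition 4 : Decision Making

$$C_D = \begin{cases} 0\ (\text{Cooperate}) & \text{if } 1 < \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \\[2ex] 1\ (\text{Defect}) & \text{if } 0 < \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \le 1 \end{cases} \qquad (2)$$

Definition 5 : Payoff Distribution

$$p_i = w_i \cdot S_C \qquad (3)$$

where the weight $w_i$ of agent $i$ reflects its rank $Rank_i$ within the coalition $C$.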
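Definitions 1 and 4 can be illustrated with a short sketch. It assumes that the coalition value is the weighted sum of member payoffs divided by the coalition size, with weights proportional to payoffs, and it reads the weighted vote as "cooperate only if the total weight for cooperation strictly exceeds the total weight for defection". Both readings and the function names are assumptions, not the authors' code.

```python
def coalition_value(payoffs):
    """C_p = S_C / |C| with S_C = sum(p_i * w_i) and w_i = p_i / sum(p)."""
    total = sum(payoffs)
    weights = [p / total for p in payoffs]
    s_c = sum(p * w for p, w in zip(payoffs, weights))
    return s_c / len(payoffs)


def coalition_decision(votes, weights):
    """Weighted vote: return 0 (cooperate) or 1 (defect).

    votes[i] is agent i's preferred move (0 = cooperate, 1 = defect).
    Cooperation wins only if its total weight strictly exceeds that of
    defection, so ties go to defection.
    """
    coop = sum(w for v, w in zip(votes, weights) if v == 0)
    defect = sum(w for v, w in zip(votes, weights) if v == 1)
    return 0 if coop > defect else 1
```

Comparing the two weight totals directly avoids dividing by zero when no member votes for defection.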


13

Coalition Formation (1/2)

[Figure: agents A1, A2, ..., An of the initial population play the 2IPD game and form coalitions C1, C2, ..., Ci. The resulting population includes both coalitions and agents.]


14

Coalition Formation (2/2)

Algorithm

[Flowchart. Boxes: 2IPD game; forming coalition; joining coalition; genetic operation; stop. Decisions: exceeds iterations per generation?; game type? (agent vs. agent, agent vs. coalition, coalition vs. coalition); satisfies condition for forming coalition?; satisfies condition?]

[Conditions 1 to 3 for forming a coalition: thresholds on the payoffs $p_i$ and $p_j$ of the two players, stated in terms of $S$ and $T$.]

Forming coalition
1. Round-robin 2IPD game
2. Obtain rank
3. Determine confidence of each agent according to its rank

Joining coalition
1. Round-robin 2IPD game
2. Obtain rank
3. If the number of agents exceeds the maximum number of agents within a coalition, remove the weakest agent
4. Determine confidence of each agent
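The four "joining coalition" steps can be sketched as below. The slide does not give the rule that maps rank to confidence, so a linear rule whose weights sum to 1 is assumed; `score` is a hypothetical callback standing in for an agent's round-robin 2IPD payoff against the other members.

```python
def rank_confidences(n):
    """Assumed rank-to-confidence rule: linear weights that sum to 1."""
    total = n * (n + 1) / 2
    return [(n - rank) / total for rank in range(n)]   # rank 0 = best


def join_coalition(members, candidate, score, max_size):
    """Add `candidate`, re-rank, trim to `max_size`, assign confidences."""
    agents = members + [candidate]
    # Steps 1-2: round-robin game, then rank by payoff (best first).
    ranked = sorted(
        agents,
        key=lambda a: score(a, [b for b in agents if b is not a]),
        reverse=True,
    )
    # Step 3: drop the weakest agent if the coalition is too large.
    if len(ranked) > max_size:
        ranked = ranked[:-1]
    # Step 4: the confidence of each agent follows from its rank.
    return list(zip(ranked, rank_confidences(len(ranked))))
```

Forming a coalition follows the same pattern without the size check in step 3.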


15

Coalition Decision Making

Decision making
• To decide the coalition’s opinion
• Uses the weighted voting method

Sharing profits
• Payoff is distributed according to each agent’s confidence
• Rank influences each weight

Determining next action of coalition
• $C_i^C$ : Weight for cooperation of coalition Ci
• $C_i^D$ : Weight for defection of coalition Ci

[Figure: the previous actions of the members Ci, Cj, Ck, Cl are summed (∑) into $C_i^C$ and $C_i^D$, which determine the next action, C or D.]


16

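Weight of Agents

Adjusting weight
• Gives an incentive to agents in the coalition
• Reflects the decision making of the coalition

[Figure: the previous actions of the members Ci, Cj, Ck, Cl are summed (∑) into $C_i^C$ and $C_i^D$ to give the next action, C or D, after which the weights are adjusted.]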


17

Improving Generalization Ability (1/2)

Problem of one good strategy
• Not adaptive to a dynamic environment
• Obtain multiple good strategies for specific environments
• e.g., biological immune system

Method
• Fitness sharing
• Adjusting confidences of multiple strategies by evolution
• Co-evolution
• Coalition formation


18

Improving Generalization Ability (2/2)

Generalization ability: how well a player performs against unknown players

Evaluation

[Figure: 100 strategies are generated at random and play the 2IPD game against the top strategies extracted from the population of genetically evolved strategies (top strategies 1 to 10, for example 1: 0001110..., 2: 0000100..., 3: 0100100..., 4: 0001100..., 5: 0010010..., 10: 0000010...).]
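The evaluation procedure (play a strategy against 100 randomly generated strategies and look at its payoff) can be sketched as follows. The history length L = 2, the all-cooperate starting history, the number of rounds, and the function names are assumptions made for illustration.

```python
import random

PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}  # 0 = C, 1 = D
L = 2  # history length; a strategy has 2**(2*L) = 16 action bits


def index(own, opp):
    """Pack own and opponent histories (lists of bits) into a table index."""
    value = 0
    for bit in own + opp:
        value = (value << 1) | bit
    return value


def play(strat_a, strat_b, rounds=100):
    """Average per-round payoff of strat_a against strat_b."""
    hist_a, hist_b = [0] * L, [0] * L   # assumed start: all-cooperate history
    total = 0
    for _ in range(rounds):
        move_a = strat_a[index(hist_a, hist_b)]
        move_b = strat_b[index(hist_b, hist_a)]
        total += PAYOFF[(move_a, move_b)]
        hist_a = [move_a] + hist_a[:-1]   # most recent move first
        hist_b = [move_b] + hist_b[:-1]
    return total / rounds


def generalization_score(strategy, n_opponents=100, seed=0):
    """Mean payoff against freshly generated random strategies."""
    rng = random.Random(seed)
    size = 2 ** (2 * L)
    opponents = [[rng.randint(0, 1) for _ in range(size)]
                 for _ in range(n_opponents)]
    return sum(play(strategy, opp) for opp in opponents) / n_opponents
```

A strategy that scores well here has not seen these opponents during evolution, which is the sense of generalization used on this slide.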


19

Test Strategy

Test Strategies (examples use 0 = cooperate, 1 = defect)

Strategy      Characteristics                                                   Example Strategy
Tit-for-Tat   Initially cooperate, and then follow opponent                     0 0 1 0 1 1 0 0
Trigger       Initially cooperate. Once opponent defects, continuously defect   0 0 0 1 1 1 1 1
AllD          Always defect                                                     1 1 1 1 1 1 1 1
CDCD          Cooperate and defect alternately                                  0 1 0 1 0 1 0 1
CCD           Cooperate, cooperate, then defect, repeatedly                     0 0 1 0 0 1 0 0
Random        Random move                                                       1 1 0 1 0 0 1 1
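The six test strategies can be expressed as small move functions. This is a sketch under the encoding 0 = cooperate, 1 = defect; the shared signature `(round_no, opp_history, rng)` and the helper `moves` are illustrative choices, not part of the slides.

```python
import random

C, D = 0, 1  # encoding used in the example rows: 0 = cooperate, 1 = defect


def tit_for_tat(round_no, opp_history, rng):
    return C if round_no == 0 else opp_history[-1]


def trigger(round_no, opp_history, rng):
    return D if D in opp_history else C


def all_d(round_no, opp_history, rng):
    return D


def cdcd(round_no, opp_history, rng):
    return C if round_no % 2 == 0 else D


def ccd(round_no, opp_history, rng):
    return D if round_no % 3 == 2 else C


def random_move(round_no, opp_history, rng):
    return rng.choice([C, D])


def moves(strategy, opp_moves, rng=None):
    """Moves a test strategy plays against a fixed opponent sequence."""
    rng = rng or random.Random(0)
    return [strategy(i, opp_moves[:i], rng) for i in range(len(opp_moves))]
```

For instance, `moves(cdcd, [0] * 8)` and `moves(ccd, [0] * 8)` reproduce the CDCD and CCD example rows, since those strategies ignore the opponent.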


20

Example of Game

Tit-for-Tat vs. Evolved Strategy

1 0 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 1

[Figure: the two strategy tables, indexed by history (0 to 15), and the histories of both players over five rounds.]

Round                        1   2   3   4   5
Payoff (Tit-for-Tat)         3   0   1   1   1
Payoff (Evolved Strategy)    3   5   1   1   1


21

Test Environment

Population size : 100

Crossover rate : 0.3

Mutation rate : 0.001

Number of generations : 200

Number of iterations : a third of population

Training set : Six well-known strategies
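The parameters listed above can be plugged into a plain genetic algorithm loop. The sketch below is a generic illustration: the genome length of 20 bits, the truncation selection, the one-point crossover, and the `fitness` callable are assumptions, and the co-evolutionary game play and coalition steps are left out.

```python
import random
from dataclasses import dataclass


@dataclass
class GAConfig:
    population_size: int = 100
    crossover_rate: float = 0.3
    mutation_rate: float = 0.001
    generations: int = 200
    genome_length: int = 20   # assumption, not stated on this slide


def evolve(fitness, cfg=GAConfig(), seed=0):
    """Return the fittest genotype of the final generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(cfg.genome_length)]
           for _ in range(cfg.population_size)]
    for _ in range(cfg.generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: cfg.population_size // 2]   # assumed truncation selection
        children = []
        while len(children) < cfg.population_size:
            a, b = rng.sample(parents, 2)
            child = list(a)
            if rng.random() < cfg.crossover_rate:
                cut = rng.randrange(1, cfg.genome_length)
                child = a[:cut] + b[cut:]              # one-point crossover
            child = [bit ^ 1 if rng.random() < cfg.mutation_rate else bit
                     for bit in child]                 # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

In the setting of these slides the fitness of a genotype would come from its 2IPD payoffs rather than from a fixed function.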

Experimental Result


22

Evolved Strategy vs. Random

[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., Random Payoff, Random S.D.]

Rank   Genotype of evolved strategy   Evolved Avg. Payoff   Evolved S.D.   Random Avg. Payoff   Random S.D.
1      10111001111010111101           3.080000              1.998399       0.480000             0.499600
2      00111001111010111101           2.800000              1.989975       0.550000             0.497494
3      10111011111111111111           2.920000              1.998399       0.520000             0.499600
4      00111011111111111101           2.880000              1.996397       0.570000             0.667158
5      00111011111011111011           2.940000              1.989070       0.540000             0.555338
6      00110000111111111111           2.680000              1.690444       2.350000             1.996873
7      00111011111111111011           3.040000              1.999600       0.490000             0.499900
8      00111011111011111011           3.160000              1.993590       0.500000             0.670820
9      10111111111111111111           3.480000              1.941546       0.380000             0.485386
10     10111001111111111101           2.760000              1.985548       0.560000             0.496387

The Random strategy is one of the weakest strategies in the 2IPD game. The evolved strategies perform well here: all ten win against the Random test strategy with high payoffs.
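The averages and standard deviations in the Evolved Strategy vs. Random results can be cross-checked with a small calculation. The sketch below uses one hypothetical reading of the rank-1 row: over 100 rounds the evolved strategy always defects and Random cooperates in 52 of them. The slides do not state the number of rounds or the move sequence, so this is only a consistency check.

```python
from statistics import mean, pstdev

# Hypothetical reading of the rank-1 row: 100 rounds, the evolved
# strategy always defects, Random cooperates in 52 rounds.
evolved = [5] * 52 + [1] * 48   # T when Random cooperates, P otherwise
random_ = [0] * 52 + [1] * 48   # S when Random cooperates, P otherwise

print(round(mean(evolved), 6), round(pstdev(evolved), 6))
print(round(mean(random_), 6), round(pstdev(random_), 6))
```

Under this reading the values agree with the rank-1 entries 3.080000, 1.998399, 0.480000, and 0.499600, which suggests the reported S.D. is a population standard deviation over per-round payoffs.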


23

Evolved Strategy vs. Tit-for-Tat

[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., TFT Payoff, TFT S.D.]

Rank   Genotype of evolved strategy   Evolved Avg. Payoff   Evolved S.D.   Tit-for-Tat Avg. Payoff   Tit-for-Tat S.D.
1      11000100001011011100           3.020000              1.636948       2.640000                  2.061650
2      01101100001010011100           3.000000              0.000000       3.000000                  0.000000
3      10001000001011011100           1.040000              0.397995       0.990000                  0.099499
4      00000100001010011100           1.080000              0.560000       1.020000                  0.423792
5      10001000001011011100           2.980000              0.345832       2.970000                  0.411218
6      01010100001011011100           3.000000              1.624808       2.670000                  2.044774
7      11001000001010011100           1.040000              0.397995       0.990000                  0.099499
8      11001100001011011110           3.000000              0.000000       3.000000                  0.000000
9      01110100001011011100           3.020000              1.636948       2.640000                  2.061650
10     01010100011011011100           3.000000              0.000000       3.000000                  0.000000

Tit-for-Tat is a mimicking strategy that plays “cooperation” on the first move of the 2IPD game. The evolved strategies respond appropriately so as not to lose the game, which supports their generalization ability.


24

Evolved Strategy vs. Trigger

[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., Trigger Payoff, Trigger S.D.]

Rank   Genotype of evolved strategy   Evolved Avg. Payoff   Evolved S.D.   Trigger Avg. Payoff   Trigger S.D.
1      10111011110011101000           1.040000              0.397995       0.990000              0.099499
2      10111011110011101001           1.040000              0.397995       0.990000              0.099499
3      00111011110011111000           1.060000              0.443170       1.010000              0.223383
4      10111011110011111001           1.040000              0.397995       0.990000              0.099499
5      10111011110011111001           1.080000              0.483322       1.030000              0.298496
6      10111011110011111001           1.040000              0.397995       0.990000              0.099499
7      10111111110010111000           1.040000              0.397995       0.990000              0.099499
8      00111011110011111001           1.040000              0.397995       0.990000              0.099499
9      10111011110011111001           1.060000              0.443170       1.010000              0.223383
10     00111011110011111001           1.040000              0.397995       0.990000              0.099499

Trigger never forgives an opponent’s defection. The way to win a game against Trigger is therefore also to choose “defection” repeatedly.


25

Evolved Strategy vs. AllD

[Figure: payoff (y-axis, 0 to 4) of the superior 10 strategies (x-axis, 1 to 10). Series: Coalition Payoff, Coalition S.D., AllD Payoff, AllD S.D.]

Rank   Genotype of evolved strategy   Evolved Avg. Payoff   Evolved S.D.   AllD Avg. Payoff   AllD S.D.
1      00111111111110101111           1.000000              0.000000       1.000000           0.000000
2      00111111111110101111           1.000000              0.000000       1.000000           0.000000
3      00111111111110101111           1.000000              0.000000       1.000000           0.000000
4      00111011111110101111           1.000000              0.000000       1.000000           0.000000
5      10111111111110101111           1.000000              0.000000       1.000000           0.000000
6      00111111111110101111           1.000000              0.000000       1.000000           0.000000
7      10111011111110101111           1.000000              0.000000       1.040000           0.397995
8      00111111111110101111           1.000000              0.000000       1.040000           0.397995
9      00111111111110101011           1.000000              0.000000       1.000000           0.000000
10     00111111111110101111           1.000000              0.000000       1.000000           0.000000

The only way not to lose against AllD is to choose “defection” on every move. There is no room for cooperation in this game.


26

Number of Coalitions

[Figure: number of coalitions (y-axis, 0 to 30) against generation (x-axis, 0 to 100).]

Coalitions survive into the next generation. Most coalitions are formed early in the evolutionary process, which keeps genetic diversity high and gives better choices against opponents. A coalition can grow if agents satisfy the conditions for joining.


27

Comparing the Results

The evolved strategies get more payoff against Random, CCD and CDCD than against Tit-for-Tat, Trigger and AllD. This suggests that the evolved strategies exploit their opponents’ actions well.


28

Bias of the Strategy

[Figure: bias (y-axis, 0.4 to 1.1) against generation (x-axis, 0 to 200), one curve per test strategy: Random, TFT, Trigger, AllD, CDCD, CCD.]

Bias shows how the strategies select their next choice against their opponents. A higher bias means that a strategy chooses “cooperation” more often than “defection”, and vice versa.


29

Conclusions

Conclusion
• Strategic coalition might be a robust method that can adapt to a dynamic environment
• Decision-making methods influence the results, but not seriously
• The strategies evolved by coalition generalize well against various opponents

Discussion
• Can the strategic coalition be adapted to the N-player IPD game?
• Which parameters in the IPD game influence generalization ability?
• How should opponent strategies for testing be constructed?
• How can this approach be applied to real-world problems?


30

Examples (1)

Market Observer


31

Examples (2)

Forest Prediction

