Iterated Prisoner’s Dilemma Game in Evolutionary Computation
2003. 10. 2
Seung-Ryong Yang
2
Agenda
Motivation
Iterated Prisoner’s Dilemma Game
Related Works
Strategic Coalition
Improving Generalization Ability
Experimental Results
Conclusion
3
Motivation
Evolutionary approach
  Understanding complex behaviors by investigating simulation results using evolutionary process
  Giving a way to find optimal strategies in a dynamic environment
IPD game
  Model complex phenomena such as social and economic behaviors
  Provide a testbed to model dynamic environment
Objectives
  Obtaining multiple good strategies
  Forming coalition to improve generalization ability
4
Iterated Prisoner’s Dilemma Game (1/2)
Overview
  Prisoner's possible choice
    Defection
    Cooperation
Characteristics
  Non-cooperative
  Non-zero-sum
Types of Game
  2IPD (2-player Iterated Prisoner's Dilemma) game
  NIPD (N-player Iterated Prisoner's Dilemma) game

            Cooperate   Defect
Cooperate   R / R       T / S
Defect      S / T       P / P

Payoff Matrix of 2IPD Game by Axelrod, R. (1984)

$T > R > P > S, \quad 2R > T + S$

            Cooperate   Defect
Cooperate   3 / 3       0 / 5
Defect      5 / 0       1 / 1
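The payoff values and the two constraints on them can be checked with a short sketch. The names `PAYOFF` and `payoff`, the 0/1 move encoding, and reading the numeric table as (own move, opponent's move) are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch of the 2IPD payoff matrix with Axelrod's values.
# Assumption: moves are encoded as 0 = cooperate, 1 = defect.
COOPERATE, DEFECT = 0, 1

T, R, P, S = 5, 3, 1, 0  # temptation, reward, punishment, sucker's payoff

# The dilemma requires T > R > P > S and 2R > T + S.
assert T > R > P > S
assert 2 * R > T + S

# PAYOFF[(my_move, opponent_move)] -> (my_payoff, opponent_payoff)
PAYOFF = {
    (COOPERATE, COOPERATE): (R, R),
    (COOPERATE, DEFECT): (S, T),
    (DEFECT, COOPERATE): (T, S),
    (DEFECT, DEFECT): (P, P),
}


def payoff(my_move, opponent_move):
    """Return (my_payoff, opponent_payoff) for a single round."""
    return PAYOFF[(my_move, opponent_move)]


print(payoff(COOPERATE, DEFECT))  # (0, 5)
```

The second constraint, 2R > T + S, ensures that two players who keep cooperating earn more than two players who take turns exploiting each other.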
5
Iterated Prisoner's Dilemma Game (2/2)
Representation of Strategy
  History Table
    Own History : Recent Action ∙∙∙ Last Action
    Opponent's History : Recent Action ∙∙∙ Last Action
    Example entries : 0 1 0 ∙∙∙ 1
  l = 2 : Example History 11 01
  2N History
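One way to read the history table is as an index into a lookup table of moves. The sketch below assumes 0 = cooperate and 1 = defect, and assumes the index is the binary number formed by the own history followed by the opponent's history; the function names and the table contents are hypothetical.

```python
# Sketch of a history-indexed strategy table.
# Assumptions: 0 = cooperate, 1 = defect; the index is the binary number
# formed by the own-history bits followed by the opponent-history bits.

def history_index(own_history, opponent_history):
    """Map two equal-length move histories to a lookup-table index."""
    index = 0
    for bit in list(own_history) + list(opponent_history):
        index = (index << 1) | bit
    return index


def next_move(table, own_history, opponent_history):
    """Look up the next move in a table with 2 ** (2 * l) entries."""
    l = len(own_history)
    if len(opponent_history) != l or len(table) != 2 ** (2 * l):
        raise ValueError("histories must have length l and table 2 ** (2 * l) entries")
    return table[history_index(own_history, opponent_history)]


# Example with l = 2 and the history 11 01.
table = [0] * 16
table[13] = 1  # hypothetical entry: defect after history 11 01
print(history_index([1, 1], [0, 1]))     # 13
print(next_move(table, [1, 1], [0, 1]))  # 1
```

With l = 2 the table has 16 entries, one for each possible combination of the last two moves of both players.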
6
Related Works
Previous Study
  Paul J. Darwen and Xin Yao (1997) : Speciation as Automatic Categorical Modularization
  Onn M. Shehory, et al. (1998) : Multi-agent Coordination through Coalition Formation
  Y. G. Seo and S. B. Cho (1999) : Exploiting Coalition in Co-Evolutionary Learning
Issues
  Research topics on coalition formation in multi-agent environments are broad
  Darwen and Yao have studied coalition in the IPD game, but in a different way
  Previous work focused on cooperation, the number of players, payoff variances, etc.
7
What is Different?
Co-evolutionary Learning
  Selection Method
    Rank based
    Roulette wheel
    Tournament
Coalition Formation
  Coalition keeps surviving to the next generation
  Condition to form a coalition is flexible
Decision Making in Coalition
  Adapting several decision making methods to coalition
    Borda Function, Condorcet Function
    Average Payoff, Highest Payoff
    Weighted Voting
8
Evolving Strategy
To evolve strategies, we use:
  Genetic algorithm
  Co-evolutionary learning
  Strategic coalition
Evolutionary Process
9
Evolution of Agents (1/2)
[Figure: coalitions across generations. Before Population: Ci, C1, Ck. Current Population: Ci, C1, Ck, Cj. Next Population: Ci, C1, Ck, Cj, Cl.]
Evolution of Agents
  Agents can develop their strategy using co-evolutionary learning
  Weak agents are removed from the population
Evolution of Coalition
  Formed coalition survives to next generation
  Agents can join coalition generation by generation
  Coalition survives or grows
10
Evolution of Agents (2/2)
Problem : Possibility of evolution being driven by weak agents
  Caused by removing from the population the better agents, which belong to a coalition
  Making new agents by mixing better agents within the coalition
[Figure: the population contains coalitions (Ci, Cj, Ck) and agents (A1, A2). Random Extraction from a Coalition, followed by Mutation, produces a new agent Ai. Repeat as many times as the number of agents belonging to the coalition.]
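The refill step (random extraction from the coalition, then mutation, repeated once per coalition member) might look like the sketch below. The slides do not say how the extracted agents are mixed, so the uniform crossover here is an assumption, and `make_agent_from_coalition` and `refill_from_coalition` are hypothetical names. Strategies are assumed to be lists of 0/1 bits.

```python
import random


def make_agent_from_coalition(coalition, mutation_rate=0.001, rng=random):
    """Create one new agent by mixing two randomly extracted coalition members."""
    parent_a, parent_b = rng.sample(coalition, 2)
    # Assumption: uniform crossover, each bit is copied from one of the parents.
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    # Bit-flip mutation with a small per-bit probability.
    return [1 - bit if rng.random() < mutation_rate else bit for bit in child]


def refill_from_coalition(coalition, mutation_rate=0.001, rng=random):
    """Repeat the extraction once per agent that belongs to the coalition."""
    return [make_agent_from_coalition(coalition, mutation_rate, rng)
            for _ in range(len(coalition))]


coalition = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0]]
new_agents = refill_from_coalition(coalition, rng=random.Random(0))
print(len(new_agents))  # 3
```

The coalition needs at least two members for the extraction to work; a single-member coalition would need a different rule.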
11
Strategic Coalition (1/2)
What is Coalition?
  A cooperative game as a set A of agents in which each subset of A is called coalition - Matthias Klusch and Andreas Gerber, 2002
  A group of agents that work jointly in order to accomplish their tasks - Onn M. Shehory, 1995
Coalition in the IPD game
  Forming coalition through round-robin game
  Pursuing more payoff using generalization ability
  Coalition forms autonomously without supervision
12
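Strategic Coalition (2/2)
Definitions
  Definition 1 : Coalition Value
    [Equation (1), not legible in the source: defines the coalition value $S_C$ from the payoffs $p_i$ and weights $w_i$ of the agents $i = 1, \dots, |C|$ in coalition $C$.]
  Definition 2 : Payoff Function
    $T > R > P > S, \quad 2R > T + S$
  Definition 3 : Coalition Identification
  Definition 4 : Decision Making
    $$D_C = \begin{cases} 0 \ (\text{Cooperate}) & \text{if } 1 < \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \\ 1 \ (\text{Defect}) & \text{if } 0 \le \dfrac{\sum_{i=1}^{|C|} C_i^C \cdot w_i}{\sum_{i=1}^{|C|} C_i^D \cdot w_i} \le 1 \end{cases} \qquad (2)$$
  Definition 5 : Payoff Distribution
    [Equation (3), not legible in the source: gives each agent's payoff $p_i$ from the coalition value $S_C$ and a weight $w_i$ determined by the agent's rank $Rank_i$ within $C$.]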
13
Coalition Formation (1/2)
[Figure: agents in the Initial Population (A1, A2, A3, A4, A5, ..., Ai, Aj, Ak, Am, An) play the 2IPD game and form coalitions (C1, C2, ..., Ci). The result is a Population Including Coalition, in which coalitions and agents coexist.]
14
Coalition Formation (2/2)
Algorithm
[Flowchart: 2IPD Game; Exceeds iteration per generation? (Y/N); Game type? (Agent vs. Agent, Agent vs. Coalition, Coalition vs. Coalition); Satisfy condition for forming coalition? (Y/N); Forming Coalition or Joining Coalition; Genetic Operation; Satisfy condition? (Y/N); Stop.]
[Conditions 1 to 3, one per game type, are not legible in the source; they involve the payoffs $p_i$ and $p_j$ of the two players, the coalition $C$, and the payoff values $S$ and $T$.]
Forming coalition
  1. Round-robin 2IPD game
  2. Obtain rank
  3. Determine confidence of agent according to the rank
Joining coalition
  1. Round-robin 2IPD game
  2. Obtain rank
  3. If number of agents > max. number of agents within a coalition, remove the weakest agent
  4. Determine confidence of each agent
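The joining steps can be sketched as follows. The round-robin game itself is not simulated: payoffs are passed in as if they came from it. The mapping from rank to confidence is not given in the slides, so the normalized linear rank weights below are an assumption, and `join_coalition` is a hypothetical name.

```python
def join_coalition(members, newcomer, max_size):
    """Add a newcomer to a coalition and recompute confidences.

    members and newcomer are (name, payoff) pairs, where payoff is assumed
    to come from a round-robin 2IPD game. Returns a list of
    (name, payoff, confidence) tuples, strongest agent first.
    """
    # Steps 1-2: rank the agents by their round-robin payoff, highest first.
    ranked = sorted(members + [newcomer], key=lambda agent: agent[1], reverse=True)
    # Step 3: if the coalition is too large, remove the weakest agents.
    ranked = ranked[:max_size]
    # Step 4: confidence from rank. Assumption: linear weights n, n-1, ..., 1,
    # normalized so that the confidences sum to 1.
    n = len(ranked)
    total = n * (n + 1) / 2
    return [(name, payoff, (n - rank) / total)
            for rank, (name, payoff) in enumerate(ranked)]


coalition = join_coalition([("A1", 3.0), ("A2", 2.0)], ("A3", 2.5), max_size=2)
print([name for name, _, _ in coalition])  # ['A1', 'A3']
```

In this example A2 has the lowest payoff and is dropped, so the coalition keeps A1 and the newcomer A3, with A1 holding the higher confidence.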
15
Coalition Decision Making
Decision making
  To decide coalition's opinion
  Use weighted voting method
Sharing profits
  Distribute payoff according to each agent's confidence
  Rank influences each weight
Determining next action of coalition
  • $C_i^C$ : Weight for cooperation of coalition Ci
  • $C_i^D$ : Weight for defection of coalition Ci
[Figure: the previous actions of Ci, Cj, Ck, and Cl are summed (∑) into $C_i^C$ and $C_i^D$, which determine the next action, C or D.]
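A weighted vote of this kind can be sketched in a few lines. The sketch assumes each member contributes a weight for cooperation and a weight for defection, both scaled by the member's confidence, and that the coalition cooperates only when the weighted cooperation total is strictly larger. The function name, the argument layout, and the tie-breaking rule are assumptions.

```python
COOPERATE, DEFECT = 0, 1  # assumed encoding: 0 = cooperate, 1 = defect


def coalition_decision(coop_weights, defect_weights, confidences):
    """Return the coalition's next action by confidence-weighted voting."""
    coop = sum(c * w for c, w in zip(coop_weights, confidences))
    defect = sum(d * w for d, w in zip(defect_weights, confidences))
    # Cooperate only if cooperation strictly outweighs defection;
    # a tie is resolved as defection (assumption).
    return COOPERATE if coop > defect else DEFECT


# Three members: weighted cooperation 0.61 vs. weighted defection 0.39.
action = coalition_decision([0.8, 0.3, 0.6], [0.2, 0.7, 0.4], [0.5, 0.3, 0.2])
print(action)  # 0 (cooperate)
```

Comparing the two sums directly avoids a division by zero when no member puts any weight on defection.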
16
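Weight of Agents
Adjusting weight
  Give incentive to agents in coalition
  It reflects decision making of coalition
[Figure: the weighted-voting diagram again, in which the previous actions of Ci, Cj, Ck, and Cl are summed (∑) into $C_i^C$ and $C_i^D$ to give the next action, C or D, followed by an "Adjusting weight" step.]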
17
Improving Generalization Ability (1/2)
Problem of one good strategy
  Not adaptive to dynamic environment
Obtain multiple good strategies for specific environment
  Ex) Biological immune system
Method
  Fitness sharing
  Adjust confidences of multiple strategies by evolution
  Co-evolution
  Coalition formation
18
Improving Generalization Ability (2/2)
How well a player performs against unknown players
Evaluation
[Figure: evaluation procedure with the stages Random Generation of 100 Strategies, 2IPD Game, Extract Top Strategies in the Population, and IPD Game. The Top Strategies are the Genetically Evolved Strategies, listed as 1: 0001110..., 2: 0000100..., 3: 0100100..., 4: 0001100..., 5: 0010010..., 10: 0000010...]
19
Test Strategy
Test Strategies

Strategy      Characteristics                                                   Example Strategy
Tit-for-Tat   Initially cooperate, and then follow opponent                     0 0 1 0 1 1 0 0
Trigger       Initially cooperate. Once opponent defects, continuously defect   0 0 0 1 1 1 1 1
AllD          Always defect                                                     1 1 1 1 1 1 1 1
CDCD          Cooperate and defect over and over                                0 1 0 1 0 1 0 1
CCD           Cooperate and cooperate and defect                                0 0 1 0 0 1 0 0
Random        Random move                                                       1 1 0 1 0 0 1 1
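The six test strategies can be written as small functions of the move histories. The sketch assumes 0 = cooperate and 1 = defect, which is consistent with the all-ones AllD example; the function names and the `play` helper are hypothetical.

```python
import random

COOPERATE, DEFECT = 0, 1  # assumed encoding, consistent with AllD = 1 1 1 ...


def tit_for_tat(own, opp):
    """Cooperate first, then repeat the opponent's last move."""
    return COOPERATE if not opp else opp[-1]


def trigger(own, opp):
    """Cooperate until the opponent defects once, then defect forever."""
    return DEFECT if DEFECT in opp else COOPERATE


def all_d(own, opp):
    """Always defect."""
    return DEFECT


def cdcd(own, opp):
    """Alternate cooperate, defect, cooperate, defect, ..."""
    return COOPERATE if len(own) % 2 == 0 else DEFECT


def ccd(own, opp):
    """Repeat the cycle cooperate, cooperate, defect."""
    return DEFECT if len(own) % 3 == 2 else COOPERATE


def random_move(own, opp, rng=random):
    """Pick either move with equal probability."""
    return rng.choice((COOPERATE, DEFECT))


def play(strategy, opponent_moves):
    """Return the strategy's moves against a fixed opponent move sequence."""
    own = []
    for round_index in range(len(opponent_moves)):
        own.append(strategy(own, opponent_moves[:round_index]))
    return own


print(play(cdcd, [0] * 8))  # [0, 1, 0, 1, 0, 1, 0, 1]
print(play(ccd, [0] * 8))   # [0, 0, 1, 0, 0, 1, 0, 0]
```

The AllD, CDCD, and CCD outputs match the example sequences in the table regardless of the opponent. The Tit-for-Tat and Trigger examples depend on the opponent's moves, which the slide does not show.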
20
Example of Game
[Figure: example game, Tit-for-Tat vs. Evolved Strategy. Bit string shown: 1 0 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 0 1 0 0 1 1 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 1. Each player's 4-bit history selects one of 16 history entries (0 to 15). Histories shown over five rounds, in order: 0000, 1000, 1110, 1111, 1111, 0010, 1011, 1111, 1111, 0100. Payoffs shown for rounds 1 to 5: 3, 5, 1, 1, 1 and 3, 0, 1, 1, 1.]
21
Test Environment
Population size : 100
Crossover rate : 0.3
Mutation rate : 0.001
Number of generations : 200
Number of iterations : a third of population
Training set : Well-known 6 strategies
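These parameters could be collected in a small configuration object, as in the sketch below. The class and function names are hypothetical, the integer division for "a third of population" is an assumption, and the one-point crossover in `breed` is an illustrative choice, since the slides do not name the crossover operator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    population_size: int = 100
    crossover_rate: float = 0.3
    mutation_rate: float = 0.001
    generations: int = 200

    @property
    def iterations_per_generation(self):
        # "A third of population"; rounding down is an assumption.
        return self.population_size // 3


def breed(parent_a, parent_b, config, rng=random):
    """One-point crossover with probability crossover_rate, then bit-flip mutation."""
    child = list(parent_a)
    if rng.random() < config.crossover_rate:
        point = rng.randrange(1, len(parent_a))
        child = list(parent_a[:point]) + list(parent_b[point:])
    return [1 - bit if rng.random() < config.mutation_rate else bit
            for bit in child]


config = ExperimentConfig()
print(config.iterations_per_generation)  # 33
```

With a mutation rate of 0.001 and 20-bit genotypes, a bit flips in roughly one child out of fifty, so most variation comes from crossover.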
Experimental Result
22
Evolved Strategy vs. Random
[Figure: payoff of the superior 10 strategies. x-axis: Superior 10 Strategies (1 to 10); y-axis: Payoff (0 to 4); series: Coalition Payoff, Coalition S.D., Random Payoff, Random S.D.]

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  Random Avg. Payoff  Random S.D.
1     10111001111010111101          3.080000             1.998399      0.480000            0.499600
2     00111001111010111101          2.800000             1.989975      0.550000            0.497494
3     10111011111111111111          2.920000             1.998399      0.520000            0.499600
4     00111011111111111101          2.880000             1.996397      0.570000            0.667158
5     00111011111011111011          2.940000             1.989070      0.540000            0.555338
6     00110000111111111111          2.680000             1.690444      2.350000            1.996873
7     00111011111111111011          3.040000             1.999600      0.490000            0.499900
8     00111011111011111011          3.160000             1.993590      0.500000            0.670820
9     10111111111111111111          3.480000             1.941546      0.380000            0.485386
10    10111001111111111101          2.760000             1.985548      0.560000            0.496387
Random is one of the weakest strategies for the 2IPD game. The evolved strategies perform well in this game: all of them win against the Random test strategy with high payoffs.
23
Evolved Strategy vs. Tit-for-Tat
[Figure: payoff of the superior 10 strategies. x-axis: Superior 10 Strategies (1 to 10); y-axis: Payoff (0 to 4); series: Coalition Payoff, Coalition S.D., TFT Payoff, TFT S.D.]

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  Tit-for-Tat Avg. Payoff  Tit-for-Tat S.D.
1     11000100001011011100          3.020000             1.636948      2.640000                 2.061650
2     01101100001010011100          3.000000             0.000000      3.000000                 0.000000
3     10001000001011011100          1.040000             0.397995      0.990000                 0.099499
4     00000100001010011100          1.080000             0.560000      1.020000                 0.423792
5     10001000001011011100          2.980000             0.345832      2.970000                 0.411218
6     01010100001011011100          3.000000             1.624808      2.670000                 2.044774
7     11001000001010011100          1.040000             0.397995      0.990000                 0.099499
8     11001100001011011110          3.000000             0.000000      3.000000                 0.000000
9     01110100001011011100          3.020000             1.636948      2.640000                 2.061650
10    01010100011011011100          3.000000             0.000000      3.000000                 0.000000
Tit-for-Tat is a mimic strategy that cooperates on the first move of the 2IPD game. The evolved strategies respond appropriately so as not to lose the game, which illustrates their generalization ability.
24
Evolved Strategy vs. Trigger
[Figure: payoff of the superior 10 strategies. x-axis: Superior 10 Strategies (1 to 10); y-axis: Payoff (0 to 4); series: Coalition Payoff, Coalition S.D., Trigger Payoff, Trigger S.D.]

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  Trigger Avg. Payoff  Trigger S.D.
1     10111011110011101000          1.040000             0.397995      0.990000             0.099499
2     10111011110011101001          1.040000             0.397995      0.990000             0.099499
3     00111011110011111000          1.060000             0.443170      1.010000             0.223383
4     10111011110011111001          1.040000             0.397995      0.990000             0.099499
5     10111011110011111001          1.080000             0.483322      1.030000             0.298496
6     10111011110011111001          1.040000             0.397995      0.990000             0.099499
7     10111111110010111000          1.040000             0.397995      0.990000             0.099499
8     00111011110011111001          1.040000             0.397995      0.990000             0.099499
9     10111011110011111001          1.060000             0.443170      1.010000             0.223383
10    00111011110011111001          1.040000             0.397995      0.990000             0.099499
Trigger never forgives an opponent's defection. The way to win a game against Trigger is therefore to choose “defection” repeatedly as well.
25
Evolved Strategy vs. AllD
[Figure: payoff of the superior 10 strategies. x-axis: Superior 10 Strategies (1 to 10); y-axis: Payoff (0 to 4); series: Coalition Payoff, Coalition S.D., AllD Payoff, AllD S.D.]

Rank  Genotype of evolved strategy  Evolved Avg. Payoff  Evolved S.D.  AllD Avg. Payoff  AllD S.D.
1     00111111111110101111          1.000000             0.000000      1.000000          0.000000
2     00111111111110101111          1.000000             0.000000      1.000000          0.000000
3     00111111111110101111          1.000000             0.000000      1.000000          0.000000
4     00111011111110101111          1.000000             0.000000      1.000000          0.000000
5     10111111111110101111          1.000000             0.000000      1.000000          0.000000
6     00111111111110101111          1.000000             0.000000      1.000000          0.000000
7     10111011111110101111          1.000000             0.000000      1.040000          0.397995
8     00111111111110101111          1.000000             0.000000      1.040000          0.397995
9     00111111111110101011          1.000000             0.000000      1.000000          0.000000
10    00111111111110101111          1.000000             0.000000      1.000000          0.000000
The only way not to lose the game against AllD is to choose “defection” on every move. Cooperation never pays against this opponent.
26
Number of Coalition
[Figure: number of coalitions per generation. x-axis: Generation (0 to 100); y-axis: Coalition (0 to 30).]
Coalitions survive into the next generation. Most coalitions are formed early in the evolutionary process. This keeps genetic diversity high and allows better choices against opponents. A coalition can grow if agents satisfy the conditions for joining.
27
Comparing the Results
The evolved strategies get more payoff against Random, CCD, and CDCD than against Tit-for-Tat, Trigger, and AllD. This suggests that the evolved strategies exploit their opponents' actions well.
28
Bias of the Strategy
[Figure: bias of the strategy over generations. x-axis: Generation (0 to 200); y-axis: Bias (0.4 to 1.1); series: Random, TFT, Trigger, AllD, CDCD, CCD.]
Bias shows how the strategies select their next choice against their opponents. A higher bias means that a strategy chooses “cooperation” more often than “defection”, and vice versa.
29
Conclusions
Conclusion
  Strategic coalition might be a robust method that can adapt to a dynamic environment
  Decision making methods influence the results, but not seriously
  The strategies evolved by coalition generalize well against various opponents
Discussion
  Can the strategic coalition be adapted to the n-IPD game?
  Which parameters in the IPD game influence generalization ability?
  How can opponent strategies for testing be constructed?
  How can this approach be applied to real-world problems?