Multi-agent learning: Repeated games
Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.
Wednesday 30th April, 2014
Repeated games: motivation
1. Much interaction in MASs can be modelled through games.
2. Much learning in MASs can therefore be modelled through learning in games.
3. Learning in games usually takes place through the (gradual) adaptation of strategies (hence, behaviour) in a repeated game.
4. In most repeated games, one game (a.k.a. stage game) is played repeatedly. Possibilities:
■ A finite number of times.
■ An indefinite (same: indeterminate) number of times.
■ An infinite number of times.
5. Therefore, familiarity with the basic concepts and results from the theory of repeated games is essential to understand MAL.
Plan for today
■ NE in normal form games that are repeated a finite number of times.
● Principle of backward induction.
■ NE in normal form games that are repeated an indefinite number of times.
● Discount factor. Models the probability of continuation.
● Folk theorem. (Actually many FTs.) Repeated games generally do have infinitely many Nash equilibria.
● Trigger strategy, on-path vs. off-path play, the threat to "minmax" an opponent.

This presentation draws heavily on (Peters, 2008).

* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4. Ch. 8: Repeated games.
Recap:
Stage games
The expected payoff of a stage game
Prisoners' Dilemma:

                        Other:
                        Cooperate    Defect
You:   Cooperate        (3, 3)       (0, 5)
       Defect           (5, 0)       (1, 1)
■ Suppose the following strategy profile for one game:
● Row player (you) plays with mixed strategy 0.8 on C.
● Column player (other) plays with mixed strategy 0.7 on C.
■ Your expected payoff is 0.8(0.7 · 3 + 0.3 · 0) + 0.2(0.7 · 5 + 0.3 · 1) = 2.44
■ General formula (cf., e.g., Leyton-Brown et al., 2008):

Expected payoff_{i,t}(s) = ∑_{(a_1, ..., a_n) ∈ A^n} ( ∏_{k=1}^{n} s_k(a_k) ) · payoff_i(a_1, ..., a_n)
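To make this concrete, here is a minimal Python sketch (an illustration added here, not part of the original slides; the NumPy representation and variable names are my own) that reproduces the 2.44 above for the two-player case:

import numpy as np

# Stage-game payoffs of the Prisoners' Dilemma above; index 0 = Cooperate, 1 = Defect.
PAYOFF_ROW = np.array([[3, 0],
                       [5, 1]])   # row player's payoffs
PAYOFF_COL = np.array([[3, 5],
                       [0, 1]])   # column player's payoffs

def expected_stage_payoff(payoff, row_mix, col_mix):
    """Two-player instance of the general formula: sum over all action
    profiles of (product of the players' probabilities) times the payoff."""
    return row_mix @ payoff @ col_mix

row_mix = np.array([0.8, 0.2])    # row player: probability 0.8 on C
col_mix = np.array([0.7, 0.3])    # column player: probability 0.7 on C

print(expected_stage_payoff(PAYOFF_ROW, row_mix, col_mix))  # 2.44
print(expected_stage_payoff(PAYOFF_COL, row_mix, col_mix))  # 2.94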
Expected payoffs of two players in the stage PD
[Figure: expected-payoff surface. Player 1 may only move "back–front"; Player 2 may only move "left–right".]
Part I:
Nash equilibria
in normal form games
that are repeated
a finite number of times
Nash equilibria in playing the PD twice
Prisoners' Dilemma:

                        Other:
                        Cooperate    Defect
You:   Cooperate        (3, 3)       (0, 5)
       Defect           (5, 0)       (1, 1)

■ Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1).
■ This equilibrium is Pareto sub-optimal. (Because (3, 3) makes both players better off.)
■ Does the situation change if two parties get to play the Prisoners' Dilemma two times in succession?
■ The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.
Nash equilibria in playing the PD twice
[Game tree of the twice-played PD, with cumulative payoffs: after a first-round outcome CC (3, 3), CD (0, 5), DC (5, 0) or DD (1, 1), each second-round outcome adds its stage payoffs; the sixteen leaves are exactly the entries of the normal form on the next slide.]
Nash equilibria in playing the PD twice
In normal form:
                 Other:
                 CC         CD         DC         DD
You:   CC      (6, 6)     (3, 8)     (3, 8)     (0, 10)
       CD      (8, 3)     (4, 4)     (5, 5)     (1, 6)
       DC      (8, 3)     (5, 5)     (4, 4)     (1, 6)
       DD      (10, 0)    (6, 1)     (6, 1)     (2, 2)

■ The action profile (DD, DD) is the only Nash equilibrium.
■ With 3 successive games, we obtain a 2³ × 2³ matrix, where the action profile (DDD, DDD) still would be the only Nash equilibrium.
■ Generalise to N repetitions: (D^N, D^N) still is the only Nash equilibrium in a repeated game where the PD is played N times in succession.
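As a check (a sketch added here, not from the slides), the table and its unique pure Nash equilibrium can be computed by treating each strategy as a fixed action sequence, as the slide does; all names below are my own:

from itertools import product

# Stage-game payoffs; actions are 0 = C, 1 = D.
STAGE = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

def repeated_table(n):
    """Payoff table when both players commit to fixed action sequences of
    length n (the 2^n x 2^n normal form from the slide)."""
    seqs = list(product((0, 1), repeat=n))
    return seqs, {(r, c): tuple(sum(STAGE[(a, b)][i] for a, b in zip(r, c))
                                for i in (0, 1))
                  for r, c in product(seqs, repeat=2)}

def pure_nash(seqs, table):
    """Profiles from which no player gains by a unilateral deviation."""
    return [(r, c) for r, c in product(seqs, repeat=2)
            if all(table[(alt, c)][0] <= table[(r, c)][0] for alt in seqs)
            and all(table[(r, alt)][1] <= table[(r, c)][1] for alt in seqs)]

seqs, table = repeated_table(2)
print(pure_nash(seqs, table))     # [((1, 1), (1, 1))]  -- i.e. (DD, DD)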
Backward induction (version for repeated games)
■ Suppose G is a game in normal form for p players, where all players possess the same arsenal of possible actions A = {a_1, . . . , a_m}.
■ The game G^n arises by playing the stage game G n times in succession.
■ A history h of length k is an element of (A^p)^k, e.g., for p = 3 and k = 10,

a_7 a_5 a_3 a_6 a_1 a_9 a_2 a_7 a_7 a_3
a_6 a_9 a_2 a_4 a_2 a_9 a_9 a_1 a_1 a_4
a_1 a_2 a_7 a_9 a_6 a_1 a_1 a_8 a_2 a_4

is a history of length ten in a game with three players. The set of all possible histories is denoted by H. (Hence, |H_k| = m^{kp}.)
■ A (possibly mixed) strategy for one player is a function H → Pr(A).
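A possible concrete reading of these definitions (a sketch under representation choices of my own: histories as Python lists of action-profile tuples, distributions as dicts):

import random

# A history is a list of action profiles; here the actions are the strings "C"/"D".
def trigger(history):
    """A strategy as a map from histories to a distribution over actions
    (this is the trigger strategy T that reappears in Part III of these slides)."""
    if any("D" in profile for profile in history):
        return {"C": 0.0, "D": 1.0}
    return {"C": 1.0, "D": 0.0}

def sample(dist):
    """Draw one action from a distribution over actions."""
    actions = list(dist)
    return random.choices(actions, weights=[dist[a] for a in actions])[0]

h = [("C", "C"), ("C", "D")]        # a history of length 2 for two players
print(sample(trigger(h)))           # "D": a defection has already occurred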
Backward induction (version for repeated games)
■ For some repeated games of length n, the dominant (read: "clearly best") strategy for all players in round n (the last round) does not depend on the history of play. E.g., for the Prisoners' Dilemma in the last round:

"No matter what happened in rounds 1 . . . n − 1, I am better off playing D."

■ Fixed strategies (D, D) in round n determine play after round n − 1.
■ Independence of history, plus a determined future, leads to the following justification for playing D in round n − 1:

"No matter what happened in rounds 1 . . . n − 2 (the past), and given that I will receive a payoff of 1 in round n (the future), I am better off playing D now."

■ By induction, in round k where k ≥ 1: "No matter what happened in rounds 1 . . . k − 1, and given that I will receive a payoff of (n − k) · 1 in rounds (k + 1) . . . n, I am better off playing D in round k."
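The argument can be mechanised. A small Python sketch of it (an added illustration using the PD payoffs above; the helper names are my own), which exploits exactly the observation on this slide that the continuation value is a constant, so every round reduces to the stage game where D strictly dominates:

# Actions and stage-game payoffs of the Prisoners' Dilemma.
STAGE = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
         ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def dominant_action(player):
    """Return the stage-game action that strictly dominates the other one."""
    def payoff(mine, theirs):
        profile = (mine, theirs) if player == 0 else (theirs, mine)
        return STAGE[profile][player]
    for a in "CD":
        other = "D" if a == "C" else "C"
        if all(payoff(a, b) > payoff(other, b) for b in "CD"):
            return a
    raise ValueError("no strictly dominant action")

def backward_induction(n):
    """Solve the n-fold repetition from the last round backwards (n >= 1).
    Because the continuation value does not depend on the current round's
    actions, every round's solution is the pair of dominant actions."""
    round_play = (dominant_action(0), dominant_action(1))
    value = (0, 0)                      # continuation value after round n
    for _ in range(n):                  # rounds n, n-1, ..., 1
        stage = STAGE[round_play]
        value = (stage[0] + value[0], stage[1] + value[1])
    return round_play, value

print(backward_induction(3))            # (('D', 'D'), (3, 3))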
Part II:
Nash equilibria
in normal form games
that are repeated
an indefinite number of times
Indefinite number of repetitions
■ A Pareto-suboptimal outcome can be avoided in case the following three conditions are met.
1. The Prisoners' Dilemma is repeated an indefinite number of times (rounds).
2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.
3. The probability to continue, δ, must be large enough.
■ Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of richness (Peters, 2008).
■ Various Folk theorems state the existence of multiple equilibria in infinitely repeated games.
■ We now informally discuss one version of "the" Folk Theorem.
Example: Prisoners’ Dilemma repeated indefinitely
■ Consider the game G∗(δ). This is the PD game, G, played an indefinite number of times.
■ The number of times the stage game G is played is determined by a parameter 0 ≤ δ ≤ 1. The probability that the next stage (and the stages thereafter) will be played is δ.
Therefore, the probability that stage game G_t will be played is δ^t. (What if t = 0?)
■ The PD (of which every G_t is an incarnation) is called the stage game, as opposed to the overall game G∗(δ).
■ A history h of length t of a repeated game is a sequence of action profiles of length t.
■ A realisation h is a countably infinite sequence of action profiles.
Example: Prisoners’ Dilemma repeated indefinitely
■ Example of a history of length t = 10:

Row player:     C D D D C C D D D D
Column player:  C D D D D D D C D D
Round:          0 1 2 3 4 5 6 7 8 9

■ The set of all possible histories (of any length) is denoted by H.
■ A (mixed) strategy for Player i is a function s_i : H → ∆{C, D} such that

Pr( Player i plays C in round |h| + 1 | h ) = s_i(h)(C).

■ A strategy profile s is a combination of strategies, one for each player.
■ The expected payoff for player i given s can be computed. It is

Expected payoff_i(s) = ∑_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).
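This expected payoff can also be estimated by simulation: stage t is reached with probability δ^t, so the expectation of the realised (undiscounted) total equals the discounted sum above. A rough Python sketch (added here as an illustration; the strategy names are meant to match D∗ and T from the following slides):

import random

STAGE = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
         ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def always_defect(history):            # D*: defect no matter what
    return "D"

def trigger(history):                  # T: cooperate until a D is observed
    return "D" if any("D" in p for p in history) else "C"

def one_realisation(s_row, s_col, delta, rng):
    """Play G*(delta) once: stage 0 is always played, and after every stage
    the game continues with probability delta. Returns the realised totals."""
    history, total = [], [0.0, 0.0]
    while True:
        profile = (s_row(history), s_col(history))
        total[0] += STAGE[profile][0]
        total[1] += STAGE[profile][1]
        history.append(profile)
        if rng.random() > delta:       # stop: the next stage is not played
            return total

rng = random.Random(0)
runs = [one_realisation(trigger, trigger, 0.9, rng) for _ in range(10_000)]
print(sum(r[0] for r in runs) / len(runs))   # close to 3/(1 - 0.9) = 30
runs = [one_realisation(always_defect, always_defect, 0.9, rng) for _ in range(10_000)]
print(sum(r[0] for r in runs) / len(runs))   # close to 1/(1 - 0.9) = 10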
Subgame perfect equilibria of G∗(δ): D∗
Definition. A subgame perfect Nash equilibrium of an extensive (in this case: repeated) game is a Nash equilibrium of this extensive game whose restriction to every subgame (read: tailgame) also is a Nash equilibrium of that subgame.

Consider the strategy of iterated defection D∗: "always defect, no matter what".¹

Claim. The strategy profile (D∗, D∗) is a subgame perfect equilibrium in G∗(δ).

Proof. Consider any tailgame starting at round t. We are done if we can show that (D∗, D∗) is a NE for this subgame. This is true: given that one player always defects, it never pays off for the other player to play C at any time. Hence, everyone plays D∗.

¹ A notation like D∗ or (worse) D∞ is suggestive. Mathematically it makes no sense, but intuitively it does.
Part III:
Trigger strategies
Example: Cost of deviating in Round 4
Consider the so-called trigger strategy T: "always play C unless D has been played at least once. In that case play D forever".

Claim. The strategy profile (T, T) is a subgame perfect equilibrium in G∗(δ), provided the probability of continuation, δ, is sufficiently large.

Proof. Consider a typical play:
Row player:     C C C C C C D D D D . . .
Column player:  C C C C C D D D D D . . .
Round:          0 1 2 3 4 5 6 7 8 9 . . .
Column player defects after Round 4. By doing so he expects a payoff of
∑_{t=0}^{4} δ^t · 3 + δ^5 · 5 + ∑_{t=6}^{∞} δ^t · 1.
Example: Cost of deviating in Round 4
By cooperating throughout, the column player could have earned

∑_{t=0}^{∞} δ^t · 3

which means he forfeited

∑_{t=0}^{∞} δ^t · 3 − ( ∑_{t=0}^{4} δ^t · 3 + δ^5 · 5 + ∑_{t=6}^{∞} δ^t · 1 ) = −2δ^5 + 2 ∑_{t=6}^{∞} δ^t

by deviating from T. If δ ≠ 0,

−2δ^5 + 2 ∑_{t=6}^{∞} δ^t > 0
  ⇔ (dividing by 2δ^5)  ∑_{t=1}^{∞} δ^t − 1 > 0
  ⇔  ∑_{t=0}^{∞} δ^t − 2 > 0
  ⇔  1/(1 − δ) − 2 > 0
  ⇔  δ > 1/2.
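A quick numerical check of this computation (my own illustration; the infinite sums are truncated at a large horizon):

def forfeit(delta, horizon=5_000):
    """Payoff forfeited by the deviation analysed above: cooperate in rounds
    0..4, first D in round 5, mutual defection from round 6 on."""
    cooperate = sum(delta**t * 3 for t in range(horizon))
    deviate = (sum(delta**t * 3 for t in range(5))
               + delta**5 * 5
               + sum(delta**t * 1 for t in range(6, horizon)))
    return cooperate - deviate

for delta in (0.3, 0.5, 0.7):
    print(delta, round(forfeit(delta), 4))
# 0.3 -0.0028   deviating pays when delta < 1/2
# 0.5  0.0      indifferent at delta = 1/2
# 0.7  0.4482   deviating costs payoff when delta > 1/2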
Cost of deviating in Round N
The player starts to defect at Round N. By doing so he expects a payoff of

∑_{t=0}^{N−1} δ^t · 3 + δ^N · 5 + ∑_{t=N+1}^{∞} δ^t · 1

By playing C throughout, the column player could have earned

∑_{t=0}^{∞} δ^t · 3

which means he forfeited

δ^N · (3 − 5) + ∑_{t=N+1}^{∞} δ^t · (3 − 1) = −2δ^N + 2 ∑_{t=N+1}^{∞} δ^t

by deviating in round N (and onwards).
Cost of deviating in Round N
If δ ≠ 0, the player forfeits payoff by deviating precisely when

−2δ^N + 2 ∑_{t=N+1}^{∞} δ^t > 0.

This holds exactly when δ > 1/2:

−2δ^N + 2 ∑_{t=N+1}^{∞} δ^t > 0
  ⇔ (dividing by 2δ^N)  ∑_{t=N+1}^{∞} δ^{t−N} − 1 > 0
  ⇔  ∑_{t=0}^{∞} δ^{t+1} − 1 > 0
  ⇔  δ · ∑_{t=0}^{∞} δ^t − 1 > 0
  ⇔  δ / (1 − δ) − 1 > 0
  ⇔  δ > 1/2.

Thus, if δ > 1/2 every player forfeits payoff by deviating from T.
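For completeness (a small addition, not on the original slide), the same threshold can be read off in one step from the closed form of the geometric series:

−2δ^N + 2 ∑_{t=N+1}^{∞} δ^t = −2δ^N + 2δ^{N+1} / (1 − δ) = 2δ^N (2δ − 1) / (1 − δ),

which is positive exactly when δ > 1/2 (for 0 < δ < 1), independently of N.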
Example: an alternating trigger strategy
Author: Gerard Vreeswijk. Slides last modified on April 30th , 2014 at 22:58 Multi-agent learning: Repeated games
Yet another subgame perfect equilibrium:
Informal definition of strategies. A and B tacitly agree to alternate actions, i.e. (C, D), (D, C), (C, D), . . . . If one of them deviates, the other party plays D forever. (The party who originally deviated plays D forever thereafter as well.) Notice the CKR aspect!
Let A be the strategy that plays C in Round 1. Let B be the other strategy.
Claim. The strategy profile (A, B) is a subgame perfect equilibrium in G∗(δ), provided the probability of continuation, δ, is sufficiently large.
An analysis of this situation and a proof of this claim can be found in (Peters, 2008), pp. 104–105.*
*H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4.
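A minimal sketch of how the claim can be checked numerically, assuming the Prisoner's Dilemma payoffs (3, 3), (0, 5), (5, 0), (1, 1) used in these slides, writing profiles as (A's action, B's action), and truncating the infinite sums at a large horizon. The function names are illustrative, and the exact threshold on δ is the one derived in Peters (2008); the sketch only compares staying loyal with one plausible deviation.

```python
# A's discounted payoff if both players follow the alternating path
# (C, D), (D, C), (C, D), ...: A's stage payoffs alternate 0, 5, 0, 5, ...
def loyal_value(delta: float, horizon: int = 10_000) -> float:
    return sum((0, 5)[t % 2] * delta ** t for t in range(horizon))

# A's discounted payoff if A deviates in a round where A should play C:
# A grabs the (D, D) payoff 1 now and is answered with D forever after.
def deviation_value(delta: float, horizon: int = 10_000) -> float:
    return sum(1 * delta ** t for t in range(horizon))

for delta in (0.2, 0.3, 0.6, 0.9):
    print(delta, loyal_value(delta) >= deviation_value(delta))
# Loyalty pays once delta is large enough, as the claim asserts.
```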
Generalisation of trigger strategies
■ Every convex combination² of payoffs

α1(3, 3) + α2(0, 5) + α3(5, 0) + α4(1, 1)

can be established by smartly picking appropriate strategy patterns. E.g.: “We play 4 times (C, C). Then we play 7 times (C, D), (D, C), then 4 times (C, C), and so on.”

■ Ensure that, in the long run, (C, C) occurs in a fraction α1 of the stages, (C, D) in a fraction α2, (D, C) in α3, and (D, D) in α4 (a small sketch of this bookkeeping follows below).

■ As long as these limiting average payoffs exceed the payoff of (D, D) for each player (which is 1), associated trigger strategies can be formulated that lead to these payoffs and trigger eternal play of (D, D) after a deviation.

■ For δ high enough, these strategies again form an SGP NE.

² Meaning αi ≥ 0 and α1 + α2 + α3 + α4 = 1.
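A minimal sketch of the bookkeeping referred to above, assuming the Prisoner's Dilemma payoffs of these slides (the dictionary and function names are illustrative): the limiting average payoff of an indefinitely repeated pattern is simply its per-period average, and the pattern is enforceable by trigger strategies when each player's average exceeds 1.

```python
# Stage-game payoffs: key = (player 1's action, player 2's action).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def limiting_average(pattern):
    """Average payoff pair of a pattern of action profiles repeated forever."""
    n = len(pattern)
    return (sum(PAYOFF[p][0] for p in pattern) / n,
            sum(PAYOFF[p][1] for p in pattern) / n)

# One period of "4 times (C,C), then 7 times (C,D),(D,C)":
pattern = [('C', 'C')] * 4 + [('C', 'D'), ('D', 'C')] * 7
avg = limiting_average(pattern)
print(avg, 'enforceable:', all(v > 1 for v in avg))   # (2.61..., 2.61...) True
```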
Folk theorem for SGP NE in the repeated PD
[Figure: both players' payoffs plotted on axes running from 0 to 5; the four pure payoff pairs (1, 1), (0, 5), (5, 0) and (3, 3) are marked, with the feasible region striped and the enforceable region shaded.]
1. Feasible payoffs (striped): payoff combinations that can be obtained by jointly repeating patterns of actions (more accurately: patterns of action profiles).

2. Enforceable payoffs (shaded): every player resides above his minmax value.

For every payoff pair (x, y) in (1) ∩ (2), there is a δ(x, y) ∈ (0, 1) such that for all δ ≥ δ(x, y) the payoff (x, y) can be obtained as the limiting average in a subgame perfect equilibrium of G∗(δ).
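A minimal sketch of how membership in (1) ∩ (2) can be tested for this stage game. Feasibility is a small linear-feasibility problem (do convex weights over the four pure payoff pairs hit (x, y)?), and enforceability just compares each coordinate with the minmax value 1. The sketch assumes numpy and scipy are available; the names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

PURE = np.array([(3, 3), (0, 5), (5, 0), (1, 1)], dtype=float)  # pure payoff pairs
MINMAX = (1.0, 1.0)                                             # minmax values in the PD

def feasible_and_enforceable(x, y):
    # Feasible: exist alpha >= 0 with sum(alpha) = 1 and alpha @ PURE = (x, y).
    A_eq = np.vstack([PURE.T, np.ones(4)])
    b_eq = np.array([x, y, 1.0])
    res = linprog(c=np.zeros(4), A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 4)
    return res.success, (x > MINMAX[0] and y > MINMAX[1])

print(feasible_and_enforceable(2.0, 2.0))   # (True, True): covered by the folk theorem
print(feasible_and_enforceable(4.0, 4.0))   # (False, True): not feasible
```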
Family of Folk Theorems
There actually exist many Folk Theorems.

■ Horizon. May the game be repeated infinitely (as in our case), or is there an upper bound to the number of plays?

■ Information. Do players act on the basis of CKR (present case), or are certain parts of the history hidden?

■ Reward. Do players collect their payoff through a discount factor (present case) or through average rewards?

■ Equilibrium. Do we consider Nash equilibria, or other forms of equilibria, such as so-called ε-Nash equilibria or so-called correlated equilibria?

■ Subgame perfection. Do we consider subgame perfect equilibria (present case) or just Nash equilibria?
Part IV: non-SGP Nash equilibria
Existence of non-SGP Nash equilibria
■ We have seen that many subgame perfect equilibria exist for repeated games. (At least for repeated games where both players have all information, the horizon is infinite, and further assumptions hold.)

■ What about the existence of non-SGP Nash equilibria in repeated games, i.e., equilibria that are not necessarily subgame perfect?

■ Without the requirement of subgame perfection, deviations can be punished more severely: the equilibrium does not have to induce a Nash equilibrium in the punishment subgame.

■ We will now consider the consequences of relaxing the subgame perfection requirement for a Nash equilibrium in an infinitely repeated game.
Example: A repeated game with a non-SGP NE
Some game             Other: Left (L)   Right (R)
You:   Up (U)                (1, 1)     (0, 0)
       Down (D)              (0, 0)     (−1, 4)
1. The pure profile (U, L) is the only NE, even among mixed strategy profiles.

2. Define trigger strategies (T1, T2) such that the pattern [(D, R), (U, L)³]∗ is played indefinitely. (So we have periods of length 4.)

3. If this pattern is violated:

■ The fallback strategy of the row player (you) is the mixed strategy (0.8, 0.2)∗.

■ The fallback strategy of the column player is the pure strategy R∗.

This combination of fallbacks is not a NE. (Cf. 1.)
Example: a repeated game with a non-SGP NE
Claim. The combination of trigger strategies (T1, T2) is a Nash equilibrium for some δ ∈ (0, 1).

■ T1 ⇒ T2. If you play (the non-degenerate part of) T1, then the column player must play T2, for T2 is a best response to T1.

■ T2 ⇒ T1. If at all, the best moment for you to deviate is at D, for that would give you an incidental advantage of 1. After that your opponent falls back to R∗.

Total payoff for you: 0 (for cheating) + 0 + · · · + 0 (for being punished by your opponent).
Payoff for the row player if he had stayed loyal:

$$\begin{aligned}
(-1 + 1\cdot\delta + 1\cdot\delta^{2} + 1\cdot\delta^{3}) + (-1\cdot\delta^{4} + 1\cdot\delta^{5} + 1\cdot\delta^{6} + 1\cdot\delta^{7}) + \ldots
&= \sum_{k=0}^{\infty}\delta^{k} - 2\sum_{k=0}^{\infty}\delta^{4k}\\
&= \frac{1}{1-\delta} - \frac{2}{1-\delta^{4}}.
\end{aligned}$$

This expression is positive only if δ ≳ 0.54 (the root of a third-degree equation; a numerical check follows below).
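A small numerical check of this threshold (a minimal sketch; the function name is illustrative). It evaluates the closed form above and locates the root of the equivalent cubic δ³ + δ² + δ − 1 = 0 by bisection:

```python
def loyal_payoff(delta: float) -> float:
    """Discounted value of the loyal path: 1/(1-d) - 2/(1-d^4)."""
    return 1 / (1 - delta) - 2 / (1 - delta ** 4)

# The condition loyal_payoff(d) > 0 is equivalent to d^3 + d^2 + d - 1 > 0.
lo, hi = 0.0, 1.0
for _ in range(60):                      # bisection on (0, 1)
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if mid ** 3 + mid ** 2 + mid - 1 > 0 else (mid, hi)

print(round(hi, 4))                      # ~ 0.5437
print(loyal_payoff(0.54) > 0)            # False: cheating still pays
print(loyal_payoff(0.55) > 0)            # True: loyalty pays
```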
Retaliation in repeated games: playing the minmax
In the previous game, your “Plan B” was to play a mixed strategy (0.8, 0.2).

Questions:

■ Why may a mixed strategy (0.8, 0.2) be considered a punishment?

■ Is the mixed strategy (0.8, 0.2) the most severe punishment?

■ If so, why?
Retaliation in repeated games: playing minmax
                Other:  0       1       2
You:   0               (4, 9)  (7, 9)  (5, 8)
       1               (6, 7)  (8, 7)  (4, 8)
       2               (7, 9)  (5, 7)  (6, 7)
First for pure strategies.

■ Which of your actions (you are the row player) minimises the maximum payoff of your opponent?

■ This is the pure minmax: minimise the maximum payoff. It can be found by scanning the opponent's payoffs (the second entry of each cell) row by row.

It turns out that Action 1 keeps the payoff of your opponent below 9.

■ Similarly, if your opponent wishes to punish you, he scans your payoffs (the first entry of each cell) column by column to minimise your payoff ⇒ Action 2. (A sketch of both scans follows below.)

■ Alert 1: minmax may be ≠ maxmin (= security level strategy).

■ Alert 2: mixed minmax may be < pure minmax.
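A minimal sketch of the two scans just mentioned (the variable names are illustrative; each cell holds (your payoff, his payoff)):

```python
# Bimatrix of the example above: GAME[i][j] = (your payoff, his payoff).
GAME = [[(4, 9), (7, 9), (5, 8)],
        [(6, 7), (8, 7), (4, 8)],
        [(7, 9), (5, 7), (6, 7)]]

# Your pure minmax action: scan rows, minimise the opponent's maximum payoff.
his_max = [max(cell[1] for cell in row) for row in GAME]
print(his_max, '-> your punishment action:', his_max.index(min(his_max)))     # [9, 8, 9] -> 1

# His pure minmax action: scan columns, minimise your maximum payoff.
your_max = [max(GAME[i][j][0] for i in range(3)) for j in range(3)]
print(your_max, '-> his punishment action:', your_max.index(min(your_max)))   # [7, 8, 6] -> 2
```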
Minmax: payoff surface of the opponent
[Figure: the opponent's payoff surface as a function of your mix and his mix; your minmax mix pushes the range of his attainable payoffs down to its minimum.]
The minmax value as a threat
■ Your (possibly mixed) minmax strategy can be used as a threat to deter your opponent from deviating from the tacitly agreed path.

Cf. the threat:

If you do not comply with our normal pattern of actions, I am going to minmax you.

■ By actually executing this threat you might harm yourself as well ⇒ non-SGP.

■ For finite two-person strictly competitive games (such as zero-sum games), minmax = maxmin.

(Finite in the sense that the arsenal of playable actions, hence the payoff matrix, is finite.)
Example: A repeated game with a non-SGP NE
            Other:  L        R
You:   U           (1, 1)   (0, 0)
       D           (0, 0)   (−1, 4)
■ Your opponent can punish you maximally by playing R∗. How you can punish your opponent is less obvious.

■ If you play D∗, your opponent will earn 4∗. If you play U∗, your opponent will earn 1∗.

■ It is possible to punish your opponent even more by becoming unpredictable (within CKR!!).

Given your mixed strategy (u, d), your opponent maximises his payoff by choosing the right mix (l, r):

$$\max_{l,r}\;\bigl(ul\cdot 1 + dr\cdot 4\bigr)
= \max_{l}\;\bigl(ul + 4(1-u)(1-l)\bigr)
= \max_{l}\;\bigl((5u-4)\,l + 4 - 4u\bigr).$$

■ If 5u − 4 = 0, it does not matter what your opponent chooses for l: his expected payoff is always 4 − 4·(4/5) = 4/5.
Draw a picture of the payoff surface of the opponent.

[Figure: the opponent's expected payoff (5u − 4)l + 4 − 4u plotted over your mix u ∈ [0, 1] and his mix l ∈ [0, 1]; the corner values are 1, 0, 0 and 4, and the minimum of his best-response payoff, 4/5, is attained at u = 4/5.]

■ If 5u − 4 > 0, your opponent will play l = 1 and expects to earn (5u − 4) + 4 − 4u = u, which is > 4/5.

■ If 5u − 4 < 0, your opponent will play l = 0 and expects to earn 4 − 4u, which, again, is > 4/5.

Hence u = 4/5, i.e. the mix (0.8, 0.2), minimises your opponent's best-response payoff: your mixed minmax strategy holds him down to 4/5. These calculations are done by hand, and do not easily generalise to higher dimensions (a small numerical check follows below).
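A small numerical check of this hand calculation (a minimal sketch; the names are illustrative). It evaluates the opponent's best-response payoff on a grid of your mixes u and reports the minimising u:

```python
# Opponent's best-response payoff against your mix (u, 1-u); since his payoff is
# linear in l, checking the extreme points l = 0 and l = 1 suffices.
def best_response_payoff(u: float) -> float:
    return max((5 * u - 4) * l + 4 - 4 * u for l in (0.0, 1.0))

grid = [i / 1000 for i in range(1001)]
u_star = min(grid, key=best_response_payoff)
print(u_star, best_response_payoff(u_star))   # 0.8 and 0.8: the mix (0.8, 0.2), value 4/5
```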
Literature
Literature on game theory is vast. For me, the following sources were helpful and (no less important) offered different perspectives on repeated games.

* H. Gintis (2009): Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Interaction. Second Edition. Princeton University Press, Princeton. Ch. 9: Repeated Games.

* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4. Ch. 8: Repeated Games.

* K. Leyton-Brown & Y. Shoham (2008): Essentials of Game Theory: A Concise, Multidisciplinary Introduction. Morgan & Claypool Publishers. Ch. 6: Repeated and Stochastic Games.

* S.P. Hargreaves Heap & Y. Varoufakis (2004): Game Theory: A Critical Text. Routledge. Ch. 5: The Prisoner's Dilemma - the riddle of co-operation and its implications for collective agency.

* J. Ratliff (1997): Graduate-Level Course in Game Theory (a.k.a. "Jim Ratliff's Graduate-Level Course in Game Theory"). Lecture notes, Dept. of Economics, University of Arizona. (Available through the web but not officially published.) Sec. 5.3: A Folk Theorem Sampler.

* M.J. Osborne & A. Rubinstein (1994): A Course in Game Theory. MIT Press. Ch. 8: Repeated Games.
What next?
Now that we know that infinitely many equilibria exist in repeated games (an embarrassment of riches), there are a number of ways in which we may proceed.

■ Gradient Dynamics. Approximate NE of single-shot games (stage games) through gradient ascent (hill-climbing).

■ Reinforcement Learning. Agents simply execute the action(s) with maximal rewards in the past.

■ No-regret learning. Agents execute the action(s) with maximal virtual rewards in the past.

■ Fictitious Play. Sample the actions of the opponent(s) and play a best response.