Multi-agent learning: Repeated games

Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department, Faculty of Sciences, Utrecht University, The Netherlands.

Wednesday 30th April, 2014. Slides last modified on April 30th, 2014 at 22:58.

Repeated games: motivation


1. Much interaction in MASs can be modelled through games.

2. Much learning in MASs can therefore be modelled through learning in games.

3. Learning in games usually takes place through the (gradual) adaptation of strategies (hence, behaviour) in a repeated game.

4. In most repeated games, one game (a.k.a. stage game) is played repeatedly. Possibilities:

■ A finite number of times.

■ An indefinite (same: indeterminate) number of times.

■ An infinite number of times.

5. Therefore, familiarity with the basic concepts and results from the theory of repeated games is essential to understand MAL.

Plan for today


■ NE in normal form games that are repeated a finite number of times.

● Principle of backward induction.

■ NE in normal form games that are repeated an indefinite number of times.

● Discount factor. Models the probability of continuation.

● Folk theorem. (Actually many FTs.) Repeated games generally do have infinitely many Nash equilibria.

● Trigger strategy, on-path vs. off-path play, the threat to "minmax" an opponent.

This presentation draws heavily on (Peters, 2008).

* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN 978-3-540-69290-4. Ch. 8: Repeated games.


Recap: Stage games


The expected payoff of a stage game


Prisoners' Dilemma                     Other:
                           Cooperate      Defect
You:    Cooperate           (3, 3)        (0, 5)
        Defect              (5, 0)        (1, 1)

■ Suppose the following strategy profile for one game:

● Row player (you) plays with mixed strategy 0.8 on C.

● Column player (other) plays with mixed strategy 0.7 on C.

■ Your expected payoff is 0.8(0.7 · 3 + 0.3 · 0) + 0.2(0.7 · 5 + 0.3 · 1) = 2.44.

■ General formula (cf., e.g., Leyton-Brown et al., 2008):

  Expected payoff_{i,t}(s) = ∑_{(i_1, …, i_n) ∈ A^n} ( ∏_{k=1}^{n} s_{k, i_k} ) · payoff_i(a_{i_1}, …, a_{i_n}),

  where s_{k, i_k} is the probability that player k's mixed strategy assigns to action a_{i_k}.
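As a concrete check of this formula, here is a small Python sketch (my own illustration, not part of the slides; names such as expected_payoffs are invented here). It enumerates the action profiles of the stage PD, weights each payoff by the product of the mixed-strategy probabilities, and reproduces the 2.44 computed above.

```python
from itertools import product

# Stage-game payoffs of the Prisoners' Dilemma used above:
# PD[(row_action, col_action)] = (row player's payoff, column player's payoff)
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def expected_payoffs(row_mix, col_mix):
    """Sum over all action profiles of (product of probabilities) * payoff."""
    totals = [0.0, 0.0]
    for a_row, a_col in product("CD", repeat=2):
        prob = row_mix[a_row] * col_mix[a_col]
        totals[0] += prob * PD[(a_row, a_col)][0]
        totals[1] += prob * PD[(a_row, a_col)][1]
    return tuple(totals)

row = {"C": 0.8, "D": 0.2}   # row player puts probability 0.8 on C
col = {"C": 0.7, "D": 0.3}   # column player puts probability 0.7 on C
print(expected_payoffs(row, col))   # approximately (2.44, 2.94); 2.44 matches the slide
```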


Expected payoffs two players in stage PD


[Figure omitted: expected payoffs of the two players in the stage PD. Player 1 may only move "back–front"; Player 2 may only move "left–right".]

Part I: Nash equilibria in normal form games that are repeated a finite number of times


Nash equilibria in playing the PD twice


Prisoners' Dilemma                     Other:
                           Cooperate      Defect
You:    Cooperate           (3, 3)        (0, 5)
        Defect              (5, 0)        (1, 1)

■ Even if mixed strategies are allowed, the PD possesses one Nash equilibrium, viz. (D, D) with payoffs (1, 1).

■ This equilibrium is Pareto sub-optimal. (Because (3, 3) makes both players better off.)

■ Does the situation change if two parties get to play the Prisoners' Dilemma two times in succession?

■ The following diagram (hopefully) shows that playing the PD two times in succession does not yield an essentially new NE.

Game tree of the PD played twice (the root starts at (0, 0); each first-round outcome is followed by the cumulative payoffs after every possible second-round outcome):

■ First round CC (3, 3): then CC (6, 6), CD (3, 8), DC (8, 3), DD (4, 4).

■ First round CD (0, 5): then CC (3, 8), CD (0, 10), DC (5, 5), DD (1, 6).

■ First round DC (5, 0): then CC (8, 3), CD (5, 5), DC (10, 0), DD (6, 1).

■ First round DD (1, 1): then CC (4, 4), CD (1, 6), DC (6, 1), DD (2, 2).

In normal form:

                         Other:
               CC         CD         DC         DD
You:  CC     (6, 6)     (3, 8)     (3, 8)     (0, 10)
      CD     (8, 3)     (4, 4)     (5, 5)     (1, 6)
      DC     (8, 3)     (5, 5)     (4, 4)     (1, 6)
      DD     (10, 0)    (6, 1)     (6, 1)     (2, 2)

■ The action profile (DD, DD) is the only Nash equilibrium.

■ With 3 successive games, we obtain a 2^3 × 2^3 matrix, where the action profile (DDD, DDD) still would be the only Nash equilibrium.

■ Generalise to N repetitions: (D^N, D^N) still is the only Nash equilibrium in a repeated game where the PD is played N times in succession.
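That (DD, DD) is indeed the only pure Nash equilibrium of this reduced normal form (where strategies are fixed two-round action sequences) can be checked by brute force. The sketch below is my own illustration, not from the slides.

```python
strategies = ["CC", "CD", "DC", "DD"]

# payoffs[row][col] = (row player's total, column player's total) over both rounds
payoffs = {
    "CC": {"CC": (6, 6),  "CD": (3, 8), "DC": (3, 8), "DD": (0, 10)},
    "CD": {"CC": (8, 3),  "CD": (4, 4), "DC": (5, 5), "DD": (1, 6)},
    "DC": {"CC": (8, 3),  "CD": (5, 5), "DC": (4, 4), "DD": (1, 6)},
    "DD": {"CC": (10, 0), "CD": (6, 1), "DC": (6, 1), "DD": (2, 2)},
}

def is_pure_nash(r, c):
    """Neither player can gain by unilaterally switching to another strategy."""
    best_row = max(payoffs[alt][c][0] for alt in strategies)
    best_col = max(payoffs[r][alt][1] for alt in strategies)
    return payoffs[r][c] == (best_row, best_col)

print([(r, c) for r in strategies for c in strategies if is_pure_nash(r, c)])
# [('DD', 'DD')]
```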

Backward induction (version for repeated games)


■ Suppose G is a game in normal form for p players, where all players possess the same arsenal of possible actions A = {a_1, . . . , a_m}.

■ The game G^n arises by playing the stage game G n times in succession.

■ A history h of length k is an element of (A^p)^k, e.g., for p = 3 and k = 10,

  a_7 a_5 a_3 a_6 a_1 a_9 a_2 a_7 a_7 a_3
  a_6 a_9 a_2 a_4 a_2 a_9 a_9 a_1 a_1 a_4
  a_1 a_2 a_7 a_9 a_6 a_1 a_1 a_8 a_2 a_4

  is a history of length ten in a game with three players. The set of all possible histories is denoted by H. (Hence, |H_k| = m^{kp}.)

■ A (possibly mixed) strategy for one player is a function H → Pr(A).
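The idea of a strategy as a function H → Pr(A) can be made concrete with a small sketch. The code below (my own illustration, not from the slides) encodes two such functions for the repeated PD, always-defect and tit-for-tat, each mapping a history to a distribution over {C, D}.

```python
from typing import Dict, List, Tuple

History = List[Tuple[str, str]]    # one (own action, opponent action) pair per past round
Distribution = Dict[str, float]    # action -> probability, i.e. an element of Pr(A)

def always_defect(history: History) -> Distribution:
    """The constant strategy: defect after every history."""
    return {"C": 0.0, "D": 1.0}

def tit_for_tat(history: History) -> Distribution:
    """Cooperate in the first round, then copy the opponent's last action."""
    if not history:
        return {"C": 1.0, "D": 0.0}
    opponents_last = history[-1][1]
    return {"C": 1.0, "D": 0.0} if opponents_last == "C" else {"C": 0.0, "D": 1.0}

# After the history (C, C), (C, D) the opponent defected last, so tit-for-tat defects next.
print(tit_for_tat([("C", "C"), ("C", "D")]))   # {'C': 0.0, 'D': 1.0}
```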

■ For some repeated games of length n, the dominating (read: "clearly best") strategy for all players in round n (the last round) does not depend on the history of play. E.g., for the Prisoners' Dilemma in the last round:

  "No matter what happened in rounds 1 . . . n − 1, I am better off playing D."

■ Fixed strategies (D, D) in round n determine play after round n − 1.

■ Independence of history, plus a determined future, leads to the following justification for playing D in round n − 1:

  "No matter what happened in rounds 1 . . . n − 2 (the past), and given that I will receive a payoff of 1 in round n (the future), I am better off playing D now."

■ Per induction in round k, where k ≥ 1: "No matter what happened in rounds 1 . . . k − 1, and given that I will receive a payoff of (n − k) · 1 in rounds (k + 1) . . . n, I am better off playing D in round k."
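Because the last-round play is history-independent, the continuation after any round is fixed, and the backward induction collapses to playing the stage-game Nash equilibrium in every round. The sketch below (my own illustration, assuming the stage game has a unique pure NE, as the PD does) makes that bookkeeping explicit.

```python
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
      ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ACTIONS = ["C", "D"]

def stage_pure_nash():
    """All pure Nash equilibria of the stage game; for the PD this is [('D', 'D')]."""
    eqs = []
    for r in ACTIONS:
        for c in ACTIONS:
            best_r = max(PD[(a, c)][0] for a in ACTIONS)
            best_c = max(PD[(r, a)][1] for a in ACTIONS)
            if PD[(r, c)] == (best_r, best_c):
                eqs.append((r, c))
    return eqs

def backward_induction(n_rounds):
    """Work backwards from the last round: play in each round is the stage NE,
    independent of the history, so the value just accumulates round by round."""
    (r, c), = stage_pure_nash()          # assumes a unique pure stage NE
    value = (0, 0)                       # continuation value after the last round
    for _ in range(n_rounds):
        value = (value[0] + PD[(r, c)][0], value[1] + PD[(r, c)][1])
    return (r, c), value

print(backward_induction(5))   # (('D', 'D'), (5, 5)): D in every round, payoff 1 per round
```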

Part II: Nash equilibria in normal form games that are repeated an indefinite number of times


Indefinite number of repetitions


■ A Pareto-suboptimal outcome can be avoided in case the following three conditions are met.

1. The Prisoners' Dilemma is repeated an indefinite number of times (rounds).

2. A so-called discount factor δ ∈ [0, 1] determines the probability of continuing the game after each round.

3. The probability to continue, δ, must be large enough.

■ Under these conditions suddenly infinitely many Nash equilibria exist. This is sometimes called an embarrassment of richness (Peters, 2008).

■ Various Folk theorems state the existence of multiple equilibria in infinitely repeated games.

■ We now informally discuss one version of "the" Folk Theorem.

Example: Prisoners’ Dilemma repeated indefinitely


■ Consider the game G∗(δ). This is the PD game, G, played an indefinite number of times.

■ The number of times the stage game G is played is determined by a parameter 0 ≤ δ ≤ 1. The probability that the next stage (and the stages thereafter) will be played is δ.

  Therefore, the probability that stage game G_t will be played is δ^t. (What if t = 0?)

■ The PD (of which every G_t is an incarnation) is called the stage game, as opposed to the overall game G∗(δ).

■ A history h of length t of a repeated game is a sequence of action profiles of length t.

■ A realisation h is a countably infinite sequence of action profiles.

■ Example of a history of length t = 10:

  Round:           0 1 2 3 4 5 6 7 8 9
  Row player:      C D D D C C D D D D
  Column player:   C D D D D D D C D D

■ The set of all possible histories (of any length) is denoted by H.

■ A (mixed) strategy for Player i is a function s_i : H → Δ{C, D} such that

  Pr( Player i plays C in round |h| + 1 | h ) = s_i(h)(C).

■ A strategy profile s is a combination of strategies, one for each player.

■ The expected payoff for player i given s can be computed. It is

  Expected payoff_i(s) = ∑_{t=0}^{∞} δ^t · Expected payoff_{i,t}(s).
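The interpretation of δ as a continuation probability can be checked numerically: weighting round-t payoffs by δ^t gives the same expected total as simulating a game that stops after each round with probability 1 − δ. The following sketch (my own, not from the slides; δ = 0.9 is an arbitrary choice) does both for the profile in which both players always cooperate.

```python
import random

delta = 0.9             # continuation probability / discount factor (assumed value)
per_round_payoff = 3    # mutual cooperation yields 3 per round

# Discounted series: sum over t of delta**t * payoff in round t (truncated).
series = sum(delta**t * per_round_payoff for t in range(10_000))
closed_form = per_round_payoff / (1 - delta)   # 3 / (1 - 0.9) = 30

# Monte Carlo: round 0 is always played, then the game continues with probability delta.
def one_realisation():
    total = 0
    while True:
        total += per_round_payoff
        if random.random() >= delta:   # stop with probability 1 - delta
            return total

estimate = sum(one_realisation() for _ in range(100_000)) / 100_000
print(series, closed_form, estimate)   # all approximately 30
```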

Subgame perfect equilibria of G∗(δ): D∗


Definition. A subgame perfect Nash equilibrium of an extensive (in this case: repeated) game is a Nash equilibrium of this extensive game whose restriction to every subgame (read: tailgame) is also a Nash equilibrium of that subgame.

Consider the strategy of iterated defection D∗: “always defect, no matter what”.¹

Claim. The strategy profile (D∗, D∗) is a subgame perfect equilibrium in G∗(δ).

Proof. Consider any tailgame starting at round t. We are done if we can show that (D∗, D∗) is a NE for this subgame. This is true: given that one player always defects, it never pays off for the other player to play C at any time. Hence, everyone plays D∗.

¹A notation like D∗ or (worse) D∞ is suggestive. Mathematically it makes no sense, but intuitively it does.
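A small numeric companion to the proof (a sketch under the same PD payoffs; the helper name is ours): against an opponent who always defects, cooperating in any round only replaces a stage payoff of 1 by 0, so never cooperating maximises the discounted sum.

```python
def payoff_vs_all_D(coop_rounds, delta, horizon=200):
    """Discounted payoff of a player who faces D in every round and plays C
    exactly in the rounds listed in `coop_rounds` (D otherwise); stage
    payoffs are 0 against D when cooperating and 1 when defecting."""
    return sum(delta ** t * (0 if t in coop_rounds else 1)
               for t in range(horizon))

delta = 0.95
best = payoff_vs_all_D(set(), delta)              # never cooperate
for k in range(5):
    assert payoff_vs_all_D({k}, delta) < best     # cooperating once is worse
print("always defecting is a best response to D* (numerically)")
```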


Part III:

Trigger strategies


Example: Cost of deviating in Round 4


Consider the so-called trigger strategy T: “always play C unless D has been played at least once. In that case play D forever”.

Claim. The strategy profile (T, T) is a subgame perfect equilibrium in G∗(δ), provided the probability of continuation, δ, is sufficiently large.

Proof. Consider a typical play:

Round:          0 1 2 3 4 5 6 7 8 9 . . .
Row player:     C C C C C C D D D D . . .
Column player:  C C C C C D D D D D . . .

The column player defects after Round 4, i.e., his first D comes in Round 5. By doing so he expects a payoff of

∑_{t=0}^{4} δ^t · 3  +  δ^5 · 5  +  ∑_{t=6}^{∞} δ^t · 1.

By cooperating throughout, the column player could have earned

∑_{t=0}^{∞} δ^t · 3,

which means he forfeited

∑_{t=0}^{∞} δ^t · 3  −  ( ∑_{t=0}^{4} δ^t · 3 + δ^5 · 5 + ∑_{t=6}^{∞} δ^t · 1 )  =  −2δ^5 + 2 ∑_{t=6}^{∞} δ^t

by deviating from T. If δ ≠ 0,

−2δ^5 + 2 ∑_{t=6}^{∞} δ^t > 0  ⇔  ∑_{t=1}^{∞} δ^t − 1 > 0  ⇔  ∑_{t=0}^{∞} δ^t − 2 > 0

⇔  1/(1 − δ) − 2 > 0  ⇔  δ > 1/2.
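The forfeited amount can be checked numerically as well. The sketch below (function name ours) truncates the infinite sums at a large horizon and shows the sign of the forfeit switching at δ = 1/2.

```python
def forfeit_round5(delta, horizon=2000):
    """Cooperative payoff minus the deviation payoff computed above
    (deviation in Round 5, punishment from Round 6 on); sums truncated."""
    coop = sum(delta ** t * 3 for t in range(horizon))
    deviate = (sum(delta ** t * 3 for t in range(5))              # rounds 0..4
               + delta ** 5 * 5                                   # round 5
               + sum(delta ** t * 1 for t in range(6, horizon)))  # punished
    return coop - deviate

for delta in (0.3, 0.5, 0.7, 0.9):
    print(delta, round(forfeit_round5(delta), 4))
# negative for delta < 1/2 (deviating pays), zero at 1/2, positive above 1/2
```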

Cost of deviating in Round N


Player starts to defect at Round N. By doing so he expects a payoff of

∑_{t=0}^{N−1} δ^t · 3  +  δ^N · 5  +  ∑_{t=N+1}^{∞} δ^t · 1.

By playing C throughout, the column player could have earned

∑_{t=0}^{∞} δ^t · 3,

which means he forfeited

δ^N · (3 − 5)  +  ∑_{t=N+1}^{∞} δ^t · (3 − 1)  =  −2δ^N + 2 ∑_{t=N+1}^{∞} δ^t

by deviating in Round N (and onward).

If δ ≠ 0, and

−2δ^N + 2 ∑_{t=N+1}^{∞} δ^t > 0,

then the player forfeits payoff by deviating. This holds precisely when

−2δ^N + 2 ∑_{t=N+1}^{∞} δ^t > 0  ⇔  ∑_{t=N+1}^{∞} δ^{t−N} − 1 > 0  ⇔  δ ∑_{t=0}^{∞} δ^t − 1 > 0

⇔  δ · 1/(1 − δ) − 1 > 0  ⇔  δ > 1/2.

Thus, if δ > 1/2 every player forfeits payoff by deviating from T.
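In closed form the forfeit equals 2δ^N (δ/(1 − δ) − 1), so its sign does not depend on N. A quick numeric check (a sketch; the helper name is ours):

```python
def forfeit(N, delta):
    """-2*delta^N + 2 * sum_{t >= N+1} delta^t, in closed form."""
    return -2 * delta ** N + 2 * delta ** (N + 1) / (1 - delta)

for delta in (0.4, 0.5, 0.6, 0.9):
    print(delta, [round(forfeit(N, delta), 4) for N in (1, 3, 10)])
# sign is the same for every N: negative below delta = 1/2, positive above
```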

Example: an alternating trigger strategy


Yet another subgame perfect equilibrium:

Informal definition of strategies. A and B tacitly agree to alternate actions, i.e. (C, D), (D, C), (C, D), . . . . If one of them deviates, the other party plays D forever. (The party who originally deviated plays D forever thereafter as well.) Notice the CKR aspect!

Let A be the strategy that plays C in Round 1. Let B be the other strategy.

Claim. The strategy profile (A, B) is a subgame perfect equilibrium in G∗(δ), provided the probability of continuation, δ, is sufficiently large.

An analysis of this situation and a proof of this claim can be found in (Peters, 2008), pp. 104–105.*

*H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN: 978-3-540-69290-4.
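The analysis is in Peters (2008); purely as an illustration, the sketch below computes the two discounted payoff streams generated by the alternating pattern under the PD payoffs of these slides and compares them with the payoff of eternal mutual defection.

```python
def alternating_payoffs(delta, horizon=2000):
    """Discounted payoffs of A and B under (C, D), (D, C), (C, D), ...,
    with the PD payoffs of these slides: A earns 0, 5, 0, 5, ... and
    B earns 5, 0, 5, 0, ... (first round indexed as t = 0)."""
    a = sum(delta ** t * (0 if t % 2 == 0 else 5) for t in range(horizon))
    b = sum(delta ** t * (5 if t % 2 == 0 else 0) for t in range(horizon))
    return a, b   # closed forms: 5*delta/(1 - delta**2) and 5/(1 - delta**2)

print(alternating_payoffs(0.9))
# both exceed 1 / (1 - 0.9) = 10, the discounted payoff of eternal (D, D)
```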

Generalisation of trigger strategies


■ Every convex combination² of payoffs

α_1(3, 3) + α_2(0, 5) + α_3(5, 0) + α_4(1, 1)

can be established by smartly picking appropriate strategy patterns. E.g.: “We play 4 times (C, C), then 7 times (C, D), (D, C), then 4 times (C, C), and so on”.

■ Ensure that, in the long run, (C, C) occurs in a fraction α_1 of the stages, (C, D) in α_2, (D, C) in α_3, and (D, D) in α_4.

■ As long as these limiting average payoffs exceed payoff({D, D}) for each player (which is 1), associated trigger strategies can be formulated that lead to these payoffs and trigger eternal play of (D, D) after a deviation.

■ For δ high enough, these strategies again form a SGP NE.

²Meaning α_i ≥ 0 and α_1 + α_2 + α_3 + α_4 = 1.
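For instance, the pattern quoted above gives the limiting average payoffs computed in this small sketch (a rough illustration; the payoff table is the one used throughout these slides):

```python
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

# "4 times (C, C), then 7 times (C, D), (D, C)", repeated forever:
pattern = [('C', 'C')] * 4 + [('C', 'D'), ('D', 'C')] * 7

avg_row = sum(PAYOFF[p][0] for p in pattern) / len(pattern)
avg_col = sum(PAYOFF[p][1] for p in pattern) / len(pattern)
print(avg_row, avg_col)
# (4*3 + 7*0 + 7*5) / 18 = 47/18 ≈ 2.61 for each player, well above the
# (D, D) payoff of 1, so trigger strategies can enforce this pattern.
```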

Folk theorem for SGP NE in the repeated PD


[Figure: the feasible (striped) and enforceable (shaded) regions of payoff pairs, plotted on axes running from 0 to 5 for each player; the point (3, 3) is marked.]

1. Feasible payoffs (striped): payoff combos that can be obtained by jointly repeating patterns of actions (more accurately: patterns of action profiles).

2. Enforceable payoffs (shaded): everyone resides above minmax.

For every payoff pair (x, y) in (1) ∩ (2), there is a δ(x, y) ∈ (0, 1) such that for all δ ≥ δ(x, y) the payoff (x, y) can be obtained as the limiting average in a subgame perfect equilibrium of G∗(δ).
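Both conditions can be tested mechanically for this PD. The sketch below is our own illustration (it uses scipy's linprog for the small feasibility LP; the helper name is ours): it checks whether a candidate pair (x, y) is a convex combination of the four stage payoffs and lies strictly above the minmax payoff of 1 for each player.

```python
import numpy as np
from scipy.optimize import linprog

STAGE = np.array([(3, 3), (0, 5), (5, 0), (1, 1)], dtype=float)

def feasible_and_enforceable(x, y):
    """Feasible: (x, y) is a convex combination of the four stage payoffs
    (checked with a tiny LP).  Enforceable: strictly above the minmax
    payoff, which is 1 for each player in this PD."""
    A_eq = np.vstack([STAGE[:, 0], STAGE[:, 1], np.ones(4)])
    b_eq = [x, y, 1.0]
    res = linprog(c=np.zeros(4), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
    return bool(res.success) and x > 1 and y > 1

print(feasible_and_enforceable(3, 3))      # True
print(feasible_and_enforceable(2, 3.5))    # True: strictly inside the region
print(feasible_and_enforceable(0.5, 4))    # False: below player 1's minmax
```

Any pair passing this test is covered by the folk theorem above for δ close enough to 1.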

Family of Folk Theorems


There actually exist many Folk Theorems.

■ Horizon. May the game be repeated infinitely (as in our case), or is there an upper bound to the number of plays?

■ Information. Do players act on the basis of CKR (present case), or are certain parts of the history hidden?

■ Reward. Do players collect their payoff through a discount factor (present case) or through average rewards?

■ Equilibrium. Do we consider Nash equilibria, or other forms of equilibria, such as so-called ε-Nash equilibria or so-called correlated equilibria?

■ Subgame perfectness. Do we consider subgame perfect equilibria (present case) or just Nash equilibria?


Part IV:

non-SGP Nash equilibria


Existence of non-SGP Nash equilibria


■ We have seen that many subgame perfect equilibria exist for repeated games. (At least for repeated games where both players have all information, the horizon is infinite, and further assumptions hold.)

■ What about the existence of non-SGP Nash equilibria in repeated games, i.e., equilibria that are not necessarily subgame perfect?

■ Without the requirement of subgame perfection, deviations can be punished more severely: the equilibrium does not have to induce a Nash equilibrium in the punishment subgame.

■ We will now consider the consequences of relaxing the subgame perfection requirement for a Nash equilibrium in an infinitely repeated game.

Example: A repeated game with a non-SGP NE


Some game                  Other:
                           Left (L)     Right (R)
You:     Up (U)            (1, 1)       (0, 0)
         Down (D)          (0, 0)       (−1, 4)

1. The pure profile (U, L) is the only mixed strategy profile that is a NE.

2. Define trigger strategies (T1, T2) such that the pattern [(D, R), (U, L)³]∗ is played indefinitely. (So we have periods of length 4.)

3. If this pattern is violated:

■ Fallback strategy of the row player (you) is the mixed strategy (0.8, 0.2)∗, i.e., play U with probability 0.8 and D with probability 0.2 in every round.

■ Fallback strategy of the column player is pure R∗.

This combination is not a NE. (Cf. 1.)
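A numeric sketch of the fallback construction (our own check; payoffs are read off the table above, and the mix (0.8, 0.2) is interpreted as in the bullet): the mix holds the column player to 0.8 whatever he replies, and the fallback pair is indeed not a stage-game NE.

```python
ROW = {('U', 'L'): 1, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): -1}
COL = {('U', 'L'): 1, ('U', 'R'): 0, ('D', 'L'): 0, ('D', 'R'): 4}

p_U = 0.8  # reading (0.8, 0.2) as: U with probability 0.8, D with 0.2

# Column player's expected stage payoff against the fallback mix:
col_vs_L = p_U * COL[('U', 'L')] + (1 - p_U) * COL[('D', 'L')]
col_vs_R = p_U * COL[('U', 'R')] + (1 - p_U) * COL[('D', 'R')]
print(col_vs_L, col_vs_R)  # 0.8 and 0.8: the mix holds the column player to 0.8

# The fallback pair is not a stage-game NE: against pure R*, the row player's
# best reply is U (payoff 0), so he would not keep putting weight on D (-1).
print({'U': ROW[('U', 'R')], 'D': ROW[('D', 'R')]})   # {'U': 0, 'D': -1}
```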

Claim. The combination of trigger strategies (T1, T2) is a Nash equilibrium for some δ ∈ (0, 1).

■ T1 ⇒ T2. If you play (the non-degenerate part of) T1, then the column player must play T2, for T2 is a best response to T1.

■ T2 ⇒ T1. If at all, the best moment for you to deviate is at D, for that would give you an incidental advantage of 1. After that your opponent falls back to R∗.

Total payoff for you: 0 (for cheating) + 0 + · · · + 0 (for being punished by your opponent).

■ T2 ⇒ T1 (continued). Total payoff for row player: 0 (for cheating) +0 + · · ·+ 0 (for being punished by the column player). Payoff for rowplayer if he was loyal:

(−1 + 1· δ + 1· δ2 + 1· δ

3) + (−1· δ4 + 1· δ

5 + 1· δ6 + 1· δ

7) + . . .

=∞

∑k=0

δk − 2

∑k=0

δ4k =

1

1 − δ

Page 186: Multi-agent learning - Utrecht University€¦ · Author: Gerard Vreeswijk. Slides last modified on April 30th, 2014 at 22:58 Multi-agent learning: Repeated games 1. Much interaction

Example: a repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on April 30th , 2014 at 22:58 Multi-agent learning: Repeated games

Other:Some game Left ( L ) Right ( R )You: Up ( U ) (1, 1) (0, 0)Down ( D ) (0, 0) (−1, 4)

■ T2 ⇒ T1 (continued). Total payoff for row player: 0 (for cheating) +0 + · · ·+ 0 (for being punished by the column player). Payoff for rowplayer if he was loyal:

(−1 + 1· δ + 1· δ2 + 1· δ

3) + (−1· δ4 + 1· δ

5 + 1· δ6 + 1· δ

7) + . . .

=∞

∑k=0

δk − 2

∑k=0

δ4k =

1

1 − δ− 2

1

1 − δ4.

Page 187: Multi-agent learning - Utrecht University€¦ · Author: Gerard Vreeswijk. Slides last modified on April 30th, 2014 at 22:58 Multi-agent learning: Repeated games 1. Much interaction

Example: a repeated game with a non-SGP NE

Author: Gerard Vreeswijk. Slides last modified on April 30th , 2014 at 22:58 Multi-agent learning: Repeated games

Other:Some game Left ( L ) Right ( R )You: Up ( U ) (1, 1) (0, 0)Down ( D ) (0, 0) (−1, 4)

■ T2 ⇒ T1 (continued). Total payoff for the row player if he cheats: 0 (for cheating) + 0 + · · · + 0 (for being punished by the column player) = 0. Payoff for the row player if he stays loyal:

(−1 + 1·δ + 1·δ² + 1·δ³) + (−1·δ⁴ + 1·δ⁵ + 1·δ⁶ + 1·δ⁷) + …

    = ∑_{k=0}^{∞} δ^k − 2 ∑_{k=0}^{∞} δ^(4k)

    = 1/(1 − δ) − 2/(1 − δ⁴).

This expression is positive, so that loyalty beats cheating, only if δ ⪆ 0.54: the condition 1/(1 − δ) > 2/(1 − δ⁴) reduces to the 3rd-degree inequality δ + δ² + δ³ > 1, whose threshold lies near δ ≈ 0.544 (a numerical check follows below).
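To see where the ≈ 0.54 threshold comes from numerically, here is a minimal sketch in plain Python: it evaluates the closed form above and bisects for the discount factor at which loyalty starts to beat the deviation payoff of 0. The function names are illustrative, not from the slides.

# Loyal payoff under the trigger-strategy path: stage payoffs -1, 1, 1, 1 repeating,
# discounted by delta. Closed form from the slides: 1/(1 - delta) - 2/(1 - delta^4).
def loyal_payoff(delta):
    return 1.0 / (1.0 - delta) - 2.0 / (1.0 - delta ** 4)

def critical_delta(lo=0.01, hi=0.99, tol=1e-9):
    # The expression changes sign exactly once on (0, 1); bisect for that point.
    # Deviating yields a total payoff of 0, so the sign change is the break-even delta.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if loyal_payoff(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(round(critical_delta(), 4))   # ~0.5437: loyalty pays only for delta above this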

Retaliation in repeated games: playing the minmax


In the previous game, your “Plan B” was to play the mixed strategy (0.8, 0.2).

Questions:

■ Why may the mixed strategy (0.8, 0.2) be considered a punishment?

■ Is the mixed strategy (0.8, 0.2) the most severe punishment?

■ If so, why?

Retaliation in repeated games: playing minmax


                 Other:
                 0          1          2
You:   0         (4, 9)     (7, 9)     (5, 8)
       1         (6, 7)     (8, 7)     (4, 8)
       2         (7, 9)     (5, 7)     (6, 7)

First, consider pure strategies.

■ Which of your actions (as the row player) minimises the maximum payoff of your opponent?

■ This is the pure minmax: minimise the opponent's maximum payoff. It can be found by scanning the “blue rows” (= the payoffs of the opponent, the second coordinate in each cell).

It turns out that Action 1 keeps the payoff of your opponent at most 8, i.e. below the 9 he can reach against your other actions.

■ Similarly, if your opponent wishes to punish you, he scans the “green columns” (= your payoffs, the first coordinate in each cell) to minimise your payoff ⇒ Action 2 (see the code sketch after this list).

■ Alert 1: minmax may be ≠ maxmin (= the security-level strategy).

■ Alert 2: the mixed minmax may be < the pure minmax.
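The pure minmax scan above, and the maxmin of Alert 1, are easy to mechanise. Below is a minimal sketch in plain Python for the 3×3 bimatrix above; the helper names are illustrative, not from the slides.

# Bimatrix of the 3x3 game above: A = your payoffs, B = the opponent's payoffs.
A = [[4, 7, 5],
     [6, 8, 4],
     [7, 5, 6]]
B = [[9, 9, 8],
     [7, 7, 8],
     [9, 7, 7]]

def pure_minmax_against_opponent(B):
    # Your punishment of the opponent: pick the row that minimises his best reply,
    # i.e. min over rows of (max over columns of B[i][j]).
    worst_per_row = [max(row) for row in B]
    value = min(worst_per_row)
    return worst_per_row.index(value), value

def pure_minmax_on_you(A):
    # The opponent's punishment of you: min over columns of (max over rows of A[i][j]).
    worst_per_col = [max(col) for col in zip(*A)]
    value = min(worst_per_col)
    return worst_per_col.index(value), value

def pure_maxmin_for_you(A):
    # Your security level: max over rows of (min over columns of A[i][j]).
    best_guarantee = [min(row) for row in A]
    value = max(best_guarantee)
    return best_guarantee.index(value), value

print(pure_minmax_against_opponent(B))  # (1, 8): your Action 1 holds the opponent to 8
print(pure_minmax_on_you(A))            # (2, 6): his Action 2 holds you to 6
print(pure_maxmin_for_you(A))           # (2, 5): you can guarantee yourself 5

On this game the value your opponent can force you down to (6) differs from your security level (5), which is exactly Alert 1.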


Minmax: payoff surface of the opponent


[Figure: the opponent's payoff surface as a function of your mix and his mix; your minmax mix pushes his attainable payoff range down to its minimum.]

The minmax value as a threat


■ Your (possibly mixed) minmax strategy can be used as a threat to deter your opponent from deviating from the tacitly agreed path.

Cf. the threat:

If you do not comply with our normal pattern of actions, I am going to minmax you.

■ By actually executing this threat you might harm yourself as well ⇒ non-SGP.

■ For finite two-person strictly competitive games (such as zero-sum games), minmax = maxmin.

(Finite in the sense that the arsenal of playable actions, hence the payoff matrix, is finite.)

Example: A repeated game with a non-SGP NE


                 Other:
                 L           R
You:   U         (1, 1)      (0, 0)
       D         (0, 0)      (−1, 4)

■ Your opponent can punish you maximally by playing R∗. How you can punish your opponent is less obvious.

■ If you play D∗, your opponent will earn 4∗. If you play U∗, your opponent will earn 1∗.

■ It is possible to punish your opponent even more by becoming unpredictable (within CKR, i.e. common knowledge of rationality!).

Given your mixed strategy (u, d), your opponent maximises his payoff by choosing the right mix (l, r):

max_{l,r}  u·l·1 + d·r·4
  = max_l  u·l + 4(1 − u)(1 − l)      (substituting d = 1 − u, r = 1 − l)
  = max_l  (5u − 4)·l + 4 − 4u

■ If 5u − 4 = 0, it does not matter what your opponent chooses for l: his expected payoff is always 4 − 4·(4/5) = 4/5.

[Figure: the opponent's expected payoff as a function of your mix u ∈ [0, 1] (with u = 4/5 marked) and his mix l ∈ [0, 1]; the payoff ranges from 0 to 4.]

Draw a picture of the payoff surface of the opponent.

■ If 5u − 4 = 0, it does not matter what your opponent chooses for l. He expects 4 − 4·(4/5) = 4/5.

■ If 5u − 4 > 0, your opponent will play l = 1 and expects to earn (5u − 4) + 4 − 4u = u, which is more than 4/5.

■ If 5u − 4 < 0, your opponent will play l = 0 and expects to earn 4 − 4u, which, again, is more than 4/5.

Hence u = 4/5 minimises what your opponent can secure: the mix (0.8, 0.2) from before holds him down to 4/5. These calculations are done by hand, and do not easily generalise to higher dimensions (a general recipe is sketched below).
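For larger games the mixed minmax is a small linear programme: choose your mix x so as to minimise the opponent's best pure reply against it. Below is a minimal sketch of that recipe, assuming NumPy and SciPy are available; the function name is illustrative, not from the slides.

import numpy as np
from scipy.optimize import linprog

def mixed_minmax(B):
    # B[i][j]: the opponent's payoff when you play row i and he plays column j.
    # Find your mix x minimising v subject to: for every column j,
    # sum_i x[i] * B[i][j] <= v  (his best reply is bounded by v), and sum_i x[i] = 1.
    B = np.asarray(B, dtype=float)
    m, n = B.shape
    c = np.zeros(m + 1)
    c[-1] = 1.0                                  # minimise v (the last variable)
    A_ub = np.hstack([B.T, -np.ones((n, 1))])    # B^T x - v <= 0, one row per column j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                            # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Opponent payoffs of the 2x2 game above.
B = [[1, 0],
     [0, 4]]
mix, value = mixed_minmax(B)
print(mix.round(3), round(value, 3))   # ~[0.8 0.2] and ~0.8: the hand-computed result

The same routine applied to the 3×3 game of the earlier slides gives a mixed punishment strictly below the pure minmax of 8, illustrating Alert 2.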

Literature


Literature on game theory is vast. For me, the following sources were helpful and (no less important) offered different perspectives on repeated games.

* H. Gintis (2009): Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Interaction. Second Edition. Princeton University Press, Princeton. Ch. 9: Repeated Games.

* H. Peters (2008): Game Theory: A Multi-Leveled Approach. Springer, ISBN 978-3-540-69290-4. Ch. 8: Repeated games.

* K. Leyton-Brown & Y. Shoham (2008): Essentials of Game Theory: A Concise, Multidisciplinary Introduction. Morgan & Claypool Publishers. Ch. 6: Repeated and Stochastic Games.

* S.P. Hargreaves Heap & Y. Varoufakis (2004): Game Theory: A Critical Text. Routledge. Ch. 5: The Prisoners' Dilemma - the riddle of co-operation and its implications for collective agency.

* J. Ratliff (1997): Graduate-Level Course in Game Theory (a.k.a. "Jim Ratliff's Graduate-Level Course in Game Theory"). Lecture notes, Dept. of Economics, University of Arizona. (Available through the web but not officially published.) Sec. 5.3: A Folk Theorem Sampler.

* M.J. Osborne & A. Rubinstein (1994): A Course in Game Theory. MIT Press. Ch. 8: Repeated games.

What next?


Now that we know that infinitely many equilibria exist in repeated games (an embarrassment of riches), there are a number of ways in which we may proceed.

■ Gradient Dynamics. Approximate NE of single-shot games (stage games) through gradient ascent (hill-climbing).

■ Reinforcement Learning. Agents simply execute the action(s) with maximal rewards in the past.

■ No-regret learning. Agents execute the action(s) with maximal virtual rewards in the past.

■ Fictitious Play. Sample the actions of opponent(s) and play a best response (a minimal sketch follows below).
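As a taste of the last item, a minimal fictitious-play sketch on the 2×2 stage game used in these slides: each player tracks the opponent's empirical action frequencies and best-responds to them. The names are illustrative, not from the slides.

import random

# Stage game from these slides: PAYOFF[(row action, column action)] = (row payoff, column payoff).
PAYOFF = {('U', 'L'): (1, 1), ('U', 'R'): (0, 0),
          ('D', 'L'): (0, 0), ('D', 'R'): (-1, 4)}

def best_response(opponent_counts, my_actions, i_am_row):
    # Best reply against the opponent's empirical mixed strategy so far.
    total = sum(opponent_counts.values()) or 1
    def expected(a):
        return sum((PAYOFF[(a, b)][0] if i_am_row else PAYOFF[(b, a)][1]) * c
                   for b, c in opponent_counts.items()) / total
    return max(my_actions, key=expected)

row_counts = {'U': 0, 'D': 0}   # what the column player has observed of the row player
col_counts = {'L': 0, 'R': 0}   # what the row player has observed of the column player

for t in range(1000):
    a = best_response(col_counts, ['U', 'D'], True) if t > 0 else random.choice(['U', 'D'])
    b = best_response(row_counts, ['L', 'R'], False) if t > 0 else random.choice(['L', 'R'])
    row_counts[a] += 1
    col_counts[b] += 1

print(row_counts, col_counts)   # play concentrates on (U, L), the stage-game NE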