![Page 1: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/1.jpg)
Convergent Learning in Unknown Graphical Games
Dr Archie Chapman, Dr David Leslie,Dr Alex Rogers and Prof Nick Jennings
School of Mathematics, University of Bristol and School of Electronics and Computer Science
University of Southampton
![Page 2: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/2.jpg)
Playing games?
)0,0()1,1()1,1()1,1()0,0()1,1()1,1()1,1()0,0(
PaperScissors
RockPaperScissorsRock
![Page 3: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/3.jpg)
Playing games?
![Page 4: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/4.jpg)
Playing games?
Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment.
Berkeley Engineering
![Page 5: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/5.jpg)
Learning in games• Adapt to observations of past play• Hope to converge to something “good”• Why?!
– Bounded rationality justification of equilibrium– Robust to behaviour of “opponents”– Language to describe distributed optimisation
![Page 6: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/6.jpg)
Notation• Players• Discrete action sets• Joint action set • Reward functions
• Mixed strategies• Joint mixed strategy space• Reward functions extend to
},...,1{ NiiA
RAri :NAAA 1
)( iii AN 1
R:ir
![Page 7: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/7.jpg)
Best response / Equilibrium• Mixed strategies of all players other than i is
• Best response of player i is
• An equilibrium is a satisfying, for all i,
i
),(argmax)( iiiii rbii
)( iii b
![Page 8: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/8.jpg)
Fictitious playEstimate strategies of other players
Game matrix
Select best action given estimates
Update estimates
![Page 9: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/9.jpg)
Belief updates• Belief about strategy of player i is the MLE
• Online updating
taa i
ti
iti
)()(
1111 )( ttt
tt b
![Page 10: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/10.jpg)
• Processes of the form
where and• F is set-valued (convex and u.s.c.)• Limit points are chain-recurrent sets of the
differential inclusion
Stochastic approximation
1111 )( tttttt eMXFXX
0)|( 1 tt XME 0te
)(XFX
![Page 11: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/11.jpg)
Best-response dynamics• Fictitious play has M and e identically 0, and
• Limit points are limit points of the best-response differential inclusion
• In potential games (and zero-sum games and some others) the limit points must be Nash equilibria
)(b
tt1
![Page 12: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/12.jpg)
Generalised weakened fictitious play
• Consider any process such that
where andand also an interplay between and M.
• The convergence properties do not change
tttttt Mbt
111 )(
,0t t0t
![Page 13: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/13.jpg)
Fictitious playEstimate strategies of other players
Game matrix
Select best action given estimates
Update estimates
![Page 14: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/14.jpg)
?)(?,?)(?,?)(?,?)(?,?)(?,?)(?,?)(?,?)(?,?)(?,
PaperScissors
RockPaperScissorsRock
Learning the game
ti
ti
ti earR )(
![Page 15: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/15.jpg)
Reinforcement learning• Track the average reward for each joint
action• Play each joint action frequently enough• Estimates will be close to the expected value
• Estimated game converges to the true game
![Page 16: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/16.jpg)
Q-learned fictitious playEstimate strategies of other players
Game matrix
Select best action given estimates
Estimated game matrix
Select best action given estimates
Update estimates
![Page 17: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/17.jpg)
Theoretical result
Theorem – If all joint actions are played infinitely often then beliefs follow a GWFP
Proof: The estimated game converges to the true game, so selected strategies are -best responses.
![Page 18: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/18.jpg)
Playing games?
Dense deployment of sensors to detect pedestrian and vehicle activity within an urban environment.
Berkeley Engineering
![Page 19: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/19.jpg)
It’s impossible!• N players, each with A actions• Game matrix has AN entries to learn
• Each individual must estimate the strategy of every other individual
• It’s just not possible for realistic game scenarios
![Page 20: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/20.jpg)
Marginal contributions• Marginal contribution of player i is
total system reward – system reward if i absent
• Maximised marginal contributions implies system is at a (local) optimum
• Marginal contribution might depend only on the actions of a small number of neighbours
![Page 21: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/21.jpg)
Sensors – rewards• Global reward for action a is
• Marginal reward for i is
• Actually use
j
an
jj
jg
jEIEaU )(
eventsobserved is
nobservatio and events
1)(
ij
ananiggi
jjEaUaUarby
observed
)(1)(
events)()()(
ij
ananti
tj
tjR
by observed
)(1)(
![Page 22: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/22.jpg)
Local learningEstimate strategies of other players
Game matrix
Select best action given estimates
Estimated game matrix
Select best action given estimates
Update estimates
![Page 23: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/23.jpg)
Local learningEstimate strategies of neighbours
Game matrix
Select best action given estimates
Estimated game matrixfor local interactions
Select best action given estimates
Update estimates
![Page 24: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/24.jpg)
Theoretical result
Theorem – If all joint actions of local games are played infinitely often then beliefs follow a GWFP
Proof: The estimated game converges to the true game, so selected strategies are -best responses.
![Page 25: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/25.jpg)
Sensing results
![Page 26: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/26.jpg)
So what?!• Play converges to (local) optimum with only
noisy information and local communication
• An individual always chooses an action to maximise expected reward given information
• If an individual doesn’t “play cricket”, the other individuals will reach an optimal point conditional on the behaviour of the itinerant
![Page 27: Convergent Learning in Unknown Graphical Games](https://reader036.vdocuments.us/reader036/viewer/2022062316/56816865550346895ddec222/html5/thumbnails/27.jpg)
Summary• Learning the game while playing is essential
• This can be accommodated within the GWFP framework
• Exploiting the neighbourhood structure of marginal contributions is essential for feasibility