Towards Equilibrium Transfer in Markov Games
胡裕靖 2013-9-9
Outline
Background
Preliminary Ideas
Some Results
Background
Multi-agent Reinforcement Learning
Single-agent RL: Mountain Car, path finding
RL in multi-agent tasks: robot soccer, IKEA furniture robot
Markov Games
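A Markov decision process (single agent) is a tuple $\langle S, A, r, p \rangle$:
$S$: the discrete state space.
$A$: the action space of the agent.
$r: S \times A \to \mathbb{R}$ is the reward function.
$p: S \times A \times S \to [0, 1]$ is the transition function.
From one agent to more than one, a Markov game is a tuple $\langle N, S, \{A_i\}_{i \in N}, \{r_i\}_{i \in N}, p \rangle$:
$N$: the set of agents.
$S$: the discrete state space.
$A = A_1 \times \dots \times A_{|N|}$: the joint action space of the agents.
$r_i: S \times A \to \mathbb{R}$ is the reward function of agent $i$.
$p: S \times A \times S \to [0, 1]$ is the transition function.
Agents take joint actions.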
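As a rough illustration, the Markov game tuple can be written down as a small Python container. The class name `MarkovGame`, its field names, and the toy two-agent game below are assumptions made for this sketch and do not come from the slides; a single-agent MDP corresponds to the special case `n_agents == 1`.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

JointAction = Tuple[int, ...]


@dataclass
class MarkovGame:
    """Hypothetical container for the tuple <N, S, {A_i}, {r_i}, p>."""
    n_agents: int                                        # |N|
    states: List[str]                                    # S
    actions: List[List[int]]                             # actions[i] = A_i
    reward: Callable[[int, str, JointAction], float]     # r_i(s, a)
    transition: Callable[[str, JointAction], Dict[str, float]]  # p(. | s, a)

    def joint_actions(self) -> List[JointAction]:
        """Enumerate the joint action space A = A_1 x ... x A_n."""
        return list(product(*self.actions))


# Toy two-agent, two-state game: agents are rewarded for matching actions,
# and a matching joint action moves the game to state "s1".
game = MarkovGame(
    n_agents=2,
    states=["s0", "s1"],
    actions=[[0, 1], [0, 1]],
    reward=lambda i, s, a: 1.0 if a[0] == a[1] else 0.0,
    transition=lambda s, a: {"s1": 1.0} if a[0] == a[1] else {s: 1.0},
)

print(len(game.joint_actions()))  # number of joint actions in the toy game
```

At a fixed state, the rewards (or learned Q-values) over all joint actions form a normal-form game, which is the object the later slides transfer equilibria between.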
Equilibrium-based MARL
Some equilibrium solution concepts in game theory can be adopted
Our Previous Work
Equilibrium-based MARL:
Multi-agent reinforcement learning with meta equilibrium []
Multi-agent reinforcement learning by negotiation with unshared value functions []
Focusing on combining MARL with equilibrium solution concepts
Problematic issues:
Equilibrium computation is complicated and time consuming. A new complexity class: TFNP! []
For tasks with many agents, equilibrium-based MARL algorithms may take too much time.
How to accelerate the learning process of equilibrium-based MARL?
Transfer Learning in RL
Reuse learnt knowledge to accelerate learning: $MDP \to MDP'$, transferring instances, policies, value functions, models, …
Matthew E. Taylor, Peter Stone. Transfer learning for reinforcement learning domains. Journal of Machine Learning Research, 2009.
Alessandra Lazaric. Transfer in reinforcement learning: a framework and a survey. Reinforcement Learning, Springer, 2012.
Transfer Learning in Markov Games?
Inter-task transfer: $MarkovGame \to MarkovGame'$, transferring instances, policies, value functions, models, …
Inner-task transfer: within one Markov game, learning passes through a sequence of normal-form games $G(s), G(s'), G(s''), \dots$ over time $t$.
Why not transfer between these normal-form games within a Markov game?
Inner-task Transfer
$Q_1^{t}(s, a, b)$, $Q_1^{t+1}(s, a, b)$, ……
Transfer equilibria between similar normal-form games during learning in a Markov game:
Reuse the equilibria computed in previous games
Reduce learning time
Key problems:
Which games are similar? For example, the games that occur on different visits of a state
How to transfer an equilibrium?
Preliminary Ideas
Game Similarity
Games with the same action space?
Games with different action spaces?
Similarity as payoff distance?
Equilibrium-based similarity or equilibrium-independent similarity?
Drew Fudenberg and David M. Kreps. A theory of learning, experimentation and equilibrium in games. 1990.
Game Similarity
Why not take the first game's equilibrium in the second game?
Weird cycle:
Equilibrium-based similarity: find the equilibria of the two games and compute their similarity.
Equilibrium transfer: with both equilibria already computed, transfer seems senseless!
Our Idea
Transfer equilibria between games that are thought to be similar.
Evaluate how much loss the equilibrium transfer causes.
Transfer is acceptable when the loss is small.
$Q_1^{t}(s, a, b)$, $Q_1^{t+1}(s, a, b)$, ……
The two games differ in only one entry.
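To see why consecutive stage games at a state differ in a single entry, consider a minimal sketch of a tabular update. The update rule, the function name `q_update`, and the parameters `alpha`, `gamma`, and `next_value` are assumptions for illustration (a generic Nash-Q-style rule in which `next_value` stands in for the equilibrium value of the next state); the slides do not specify the rule.

```python
from typing import List

Matrix = List[List[float]]


def q_update(Q_s: Matrix, a: int, b: int, reward: float,
             next_value: float, alpha: float = 0.1, gamma: float = 0.9) -> Matrix:
    """Return a copy of agent 1's stage game at state s with only entry (a, b) updated."""
    new_Q = [row[:] for row in Q_s]
    new_Q[a][b] = (1 - alpha) * Q_s[a][b] + alpha * (reward + gamma * next_value)
    return new_Q


# Q_1^t(s, ., .) as a 2x2 payoff table over joint actions (a, b).
Q_t = [[0.0, 1.0], [2.0, 3.0]]
Q_next = q_update(Q_t, a=0, b=1, reward=1.0, next_value=2.0)

# Collect the joint actions whose entry changed between time t and t+1.
changed = [(a, b) for a in range(2) for b in range(2) if Q_next[a][b] != Q_t[a][b]]
print(changed)
```

Only the entry of the joint action actually played changes, so the game at time t+1 is a one-entry perturbation of the game at time t, which is the setting in which reusing the previous equilibrium looks attractive.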
Problem Definition
$(G, p^*)$ → transfer method → $(G', ?)$
Can we find a transfer method that transfers the computed Nash equilibrium $p^*$ in game $G$ to a strategy profile $p'$ in game $G'$ such that, for all $i \in N$ and $a_i \in A_i$, there holds
$$U_i^{G'}(a_i, p'_{-i}) \le U_i^{G'}(p') + \epsilon,$$
where $\epsilon$ is close to $0$? In other words, given a transfer method, if $\epsilon$ is small enough, then the transfer method is acceptable.
Furthermore, the transferred strategy profile is then an approximate Nash equilibrium of the new game.
Problem Definition
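For all $i \in N$ and $a_i \in A_i$, define the transfer error
$$\epsilon_i(a_i, p') = U_i^{G'}(a_i, p'_{-i}) - U_i^{G'}(p').$$
Let $\epsilon_i = \max_{a_i \in A_i} \epsilon_i(a_i, p')$. Let $\epsilon = \max_{i \in N} \epsilon_i$.
Given a transfer method, we need to find the bound of $\epsilon$!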
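The transfer error can be computed directly for small games. The sketch below assumes a two-player normal-form game stored as nested lists, where `U1[a][b]` and `U2[a][b]` are the payoffs of players 1 and 2 when player 1 plays row `a` and player 2 plays column `b`. The function names and the example game (matching pennies with one perturbed entry) are illustrative assumptions, not taken from the slides.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def expected_payoff(U: Matrix, p1: Sequence[float], p2: Sequence[float]) -> float:
    """Expected payoff under mixed strategies p1 (rows) and p2 (columns)."""
    return sum(p1[a] * p2[b] * U[a][b]
               for a in range(len(p1)) for b in range(len(p2)))


def transfer_error(U1: Matrix, U2: Matrix,
                   p1: Sequence[float], p2: Sequence[float]) -> float:
    """Largest gain any player gets from a unilateral pure deviation from (p1, p2)."""
    rows, cols = len(p1), len(p2)
    # Player 1 deviates to a pure row while player 2 keeps p2.
    dev1 = [sum(p2[b] * U1[a][b] for b in range(cols)) for a in range(rows)]
    eps1 = max(dev1) - expected_payoff(U1, p1, p2)
    # Player 2 deviates to a pure column while player 1 keeps p1.
    dev2 = [sum(p1[a] * U2[a][b] for a in range(rows)) for b in range(cols)]
    eps2 = max(dev2) - expected_payoff(U2, p1, p2)
    return max(eps1, eps2)


# G' is matching pennies with one entry of player 1's payoff changed from 1.0 to 1.2.
U1_new = [[1.2, -1.0], [-1.0, 1.0]]
U2 = [[-1.0, 1.0], [1.0, -1.0]]
p_star = [0.5, 0.5]  # equilibrium of the unperturbed game, reused unchanged

eps = transfer_error(U1_new, U2, p_star, p_star)
print(round(eps, 6))  # about 0.05 for this example
```

A value of zero means the profile is an exact Nash equilibrium of the game it is evaluated in; a small positive value means it is only an approximate one.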
A Naïve Transfer Method
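Direct transfer: $(G, p^*) \to (G', p^*)$, that is, reuse $p' = p^*$ unchanged.
Define the difference $\delta$ of the two games such that, for all $i \in N$ and all joint actions $\vec{a}$, $U_i^{G'}(\vec{a}) = U_i^{G}(\vec{a}) + \delta_i(\vec{a})$.
Examine the transfer error.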
A Naïve Transfer Method
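$$
\begin{aligned}
\epsilon_i(a_i, p') &= U_i^{G'}(a_i, p^*_{-i}) - U_i^{G'}(p^*) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} \sum_{\vec{a}_{-i}} p_i^*(a_i')\, p^*_{-i}(\vec{a}_{-i})\, U_i^{G'}(a_i', \vec{a}_{-i}) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G'}(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) + \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i') \big[ U_i^{G}(a_i', \vec{a}_{-i}) + \delta_i(a_i', \vec{a}_{-i}) \big] \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G}(a_i', \vec{a}_{-i}) \Big] + \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, \delta_i(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a})
\end{aligned}
$$
where $\delta_i^{+}(a_i, \vec{a}_{-i}) = \max\big(0, \delta_i(a_i, \vec{a}_{-i})\big)$.
The first inequality holds because $p^*$ is a Nash equilibrium of $G$, so the first bracketed sum is at most $0$. The second holds because $\delta_i \le \delta_i^{+}$, and the third because $\delta_i^{+} \ge 0$ and $p^*_{-i}(\vec{a}_{-i}) \le 1$.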
A Naïve Transfer Method
$$\epsilon_i(a_i, p') \le \sum_{\vec{a}_{-i}} \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a})$$
Many entries of $\delta_i$ are zero if the two games are very similar.
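The bound for direct transfer can be checked numerically on a small example. This is a minimal sketch under stated assumptions: a two-player game, each matrix oriented so that the deviating player's own actions index the rows, and matching pennies (with its uniform equilibrium) as the original game. The helper names `deviation_gain` and `direct_transfer_bound` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def deviation_gain(U: Matrix, p_self: Sequence[float],
                   p_other: Sequence[float], action: int) -> float:
    """Gain in game U from deviating to `action` while the opponent keeps p_other."""
    deviate = sum(p_other[b] * U[action][b] for b in range(len(p_other)))
    stay = sum(p_self[a] * p_other[b] * U[a][b]
               for a in range(len(p_self)) for b in range(len(p_other)))
    return deviate - stay


def direct_transfer_bound(delta: Matrix, p_self: Sequence[float],
                          p_other: Sequence[float], action: int) -> float:
    """Sum of positive parts of delta in the row `action`, minus the expected delta."""
    positive_part = sum(max(0.0, d) for d in delta[action])
    expected_delta = sum(p_self[a] * p_other[b] * delta[a][b]
                         for a in range(len(p_self)) for b in range(len(p_other)))
    return positive_part - expected_delta


# Original game G: matching pennies (player 1's payoffs), equilibrium p* is uniform.
U1 = [[1.0, -1.0], [-1.0, 1.0]]
p_star = [0.5, 0.5]

# New game G': one entry of player 1's payoff table has changed.
U1_new = [[1.2, -1.0], [-1.0, 1.0]]
delta1 = [[U1_new[a][b] - U1[a][b] for b in range(2)] for a in range(2)]

# Compare the actual transfer error with the bound for each action of player 1.
results = []
for action in range(2):
    eps = deviation_gain(U1_new, p_star, p_star, action)
    bound = direct_transfer_bound(delta1, p_star, p_star, action)
    results.append((eps, bound))
    print(action, round(eps, 6), round(bound, 6))
```

Because only one entry of `delta1` is nonzero, the bound involves a single term per action, which illustrates why the estimate is cheap to evaluate when the two games are very similar. Player 2's payoffs are unchanged in this example, so both the error and the bound are zero for player 2.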
Some Results
Future Work
Some problems:
Other transfer methods?
Only Nash equilibrium?
Equilibrium-finding algorithms
Transfer between games with different action spaces
Transfer between games with different numbers of agents
Game abstraction
Thanks!