Towards Equilibrium Transfer in Markov Games

Hu Yujing (胡裕靖), 2013-9-9

Outline

Background
Preliminary Ideas
Some Results

Background

Multi-agent Reinforcement Learning

Single-agent RL:

Mountain Car, path finding

RL in multi-agent tasks

Robot soccer, IKEA furniture robot

Markov Games

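A (single-agent) MDP is a tuple $\langle S, A, r, p \rangle$:
$S$: the discrete state space.
$A$: the action space of the agent.
$r : S \times A \to \mathbb{R}$ is the reward function.
$p : S \times A \times S \to [0, 1]$ is the transition function.

From one agent to more than one:

A Markov game is a tuple $\langle N, S, \{A_i\}_{i \in N}, \{r_i\}_{i \in N}, p \rangle$:
$N$: the set of agents.
$S$: the discrete state space.
$A = A_1 \times \dots \times A_n$: the joint action space of the agents.
$r_i : S \times A \to \mathbb{R}$ is the reward function of agent $i$.
$p : S \times A \times S \to [0, 1]$ is the transition function.

Agents take joint actions.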

Equilibrium-based MARL

Some equilibrium solution concepts in game theory can be adopted

Our Previous Work

Equilibrium-based MARL:

Multi-agent reinforcement learning with meta equilibrium []

Multi-agent reinforcement learning by negotiation with unshared value functions []

Focusing on combining MARL with equilibrium solution concepts

Problematic issues:
Equilibrium computing is complicated and time-consuming.
A new complexity class: TFNP! []
For tasks with many agents, equilibrium-based MARL algorithms may take too much time.

How to accelerate the learning process of equilibrium-based MARL?

Transfer Learning in RL

Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains. Journal of Machine Learning Research, 2009.

$MDP \to MDP'$: transfer instance / policy / value function / model / …

Alessandra Lazaric. Transfer in reinforcement learning: a framework and a survey. Reinforcement Learning, Springer, 2012.

Reuse learnt knowledge to accelerate learning

Transfer Learning in Markov Games?

$MarkovGame \to MarkovGame'$: transfer instance / policy / value function / model / …

[Figure: a Markov game viewed as a collection of normal-form games $G(s)$, $G(s')$, $G(s'')$, …]

Why not transfer between these normal-form games within a Markov game?

Inter-task transfer

Inner-task transfer

Inner-task Transfer

$Q_1^{t}(s, a, b) \to Q_1^{t+1}(s, a, b) \to \dots$

Transfer equilibrium between similar normal-form games during learning in a Markov game:

Reuse the computed equilibria in previous games
Reducing learning time

Key problems:
Which games are similar? For example: the games that occur on different visits of a state
How to transfer equilibrium?

Preliminary Ideas

Game Similarity
Games with the same action space?
Games with different action spaces?
Similarity as payoff distance?
Equilibrium-based similarity or equilibrium-independent similarity?

Drew Fudenberg and David M. Kreps. A theory of learning, experimentation and equilibrium in games. 1990.

Game Similarity

Why not take in the second game?

Equilibrium-based similarity

Equilibrium transfer

Find equilibria of two games and compute the similarity

Transfer seems senseless!

Weird Cycle

Our Idea

Transfer equilibrium between games that are thought to be similar.

Evaluate how much loss the equilibrium transfer incurs.

Transfer is acceptable when the loss is small.

$Q_1^{t}(s, a, b) \to Q_1^{t+1}(s, a, b) \to \dots$

The two games differ in only one entry.
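A minimal sketch of why two consecutive games at a state differ in a single entry, assuming a standard Q-learning-style update (the slides do not spell out the update rule). The names `q_update`, `changed_entries`, `alpha`, `gamma`, and `next_value` are illustrative, and the payoff numbers are made up; `next_value` stands in for whatever value an equilibrium-based learner assigns to the next state.

```python
from copy import deepcopy

def q_update(stage_game, a, b, reward, next_value, alpha=0.1, gamma=0.9):
    """Return a new payoff table in which only entry (a, b) has been updated."""
    updated = deepcopy(stage_game)
    updated[a][b] = (1 - alpha) * stage_game[a][b] + alpha * (reward + gamma * next_value)
    return updated

def changed_entries(old, new):
    """List the joint actions (a, b) whose payoff differs between two tables."""
    return [(a, b)
            for a in range(len(old))
            for b in range(len(old[a]))
            if old[a][b] != new[a][b]]

# Agent 1's Q-values at one state s, indexed [a][b] (made-up numbers).
q_t = [[1.0, 0.0],
       [0.0, 2.0]]
q_t1 = q_update(q_t, a=0, b=1, reward=1.0, next_value=2.0)

# Only the joint action that was actually played shows up as changed.
print(changed_entries(q_t, q_t1))
```

Under this assumption, the game defined by $Q^{t}$ and the game defined by $Q^{t+1}$ share every payoff except the one for the joint action just played.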

Problem Definition

$(G, p^*) \xrightarrow{\text{transfer method}} (G', \,?)$

Can we find a transfer method that transfers the computed Nash equilibrium $p^*$ in game $G$ to a strategy profile $p'$ in game $G'$ such that, $\forall i \in N$ and $\forall a_i \in A_i$, there holds

$$U_i^{G'}(a_i, p'_{-i}) \le U_i^{G'}(p') + \epsilon$$

where $\epsilon$ is close to 0. In other words, given a transfer method, if $\epsilon$ is small enough, then the transfer method is acceptable.

Furthermore,

Approximate Nash equilibrium

Problem Definition

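$\forall i \in N$ and $\forall a_i \in A_i$, define the transfer error

$$\epsilon_i(a_i, p') = U_i^{G'}(a_i, p'_{-i}) - U_i^{G'}(p')$$

Let $\epsilon_i = \max_{a_i \in A_i} \epsilon_i(a_i, p')$. Let $\epsilon = \max_{i \in N} \epsilon_i$.

Given a transfer method, we need to find the bound of $\epsilon$!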
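A minimal sketch of how the transfer error could be evaluated for a two-agent game, assuming the overall error is the largest gain any agent can obtain by deviating to a pure action. The function names and the example payoffs are hypothetical; `U1[a][b]` and `U2[a][b]` are the two agents' payoff tables in the target game, and `(p1, p2)` is the transferred strategy profile.

```python
def expected_payoff(U, p1, p2):
    """Expected payoff of table U[a][b] under mixed strategies p1 (rows) and p2 (columns)."""
    return sum(p1[a] * p2[b] * U[a][b]
               for a in range(len(p1)) for b in range(len(p2)))

def transfer_error(U1, U2, p1, p2):
    """Largest gain any agent can get by deviating to a pure action from (p1, p2)."""
    n_a, n_b = len(p1), len(p2)
    base1 = expected_payoff(U1, p1, p2)
    base2 = expected_payoff(U2, p1, p2)
    # Agent 1 plays pure action a against p2.
    eps1 = max(sum(p2[b] * U1[a][b] for b in range(n_b)) - base1 for a in range(n_a))
    # Agent 2 plays pure action b against p1.
    eps2 = max(sum(p1[a] * U2[a][b] for a in range(n_a)) - base2 for b in range(n_b))
    return max(eps1, eps2)

# Target game: matching pennies with one of agent 1's payoffs perturbed (made-up numbers).
U1_new = [[1.4, -1.0], [-1.0, 1.0]]
U2_new = [[-1.0, 1.0], [1.0, -1.0]]
uniform = [0.5, 0.5]

eps = transfer_error(U1_new, U2_new, uniform, uniform)
threshold = 0.2  # hypothetical tolerance for accepting the transferred profile
print(eps, eps <= threshold)
```

If the returned value is below the chosen tolerance, the transferred profile can be treated as an approximate Nash equilibrium of the target game and the equilibrium computation can be skipped.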

A Naïve Transfer Method

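Define the difference of the two games $\delta$ such that $\forall i \in N$ and $\forall \vec{a} \in A$, $\delta_i(\vec{a}) = U_i^{G'}(\vec{a}) - U_i^{G}(\vec{a})$.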

Examine the transfer error

Direct Transfer

$(G, p^*) \to (G', p')$ with $p' = p^*$

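$$
\begin{aligned}
\epsilon_i(a_i, p') &= U_i^{G'}(a_i, p^*_{-i}) - U_i^{G'}(p^*) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} \sum_{\vec{a}_{-i}} p_i^*(a_i')\, p^*_{-i}(\vec{a}_{-i})\, U_i^{G'}(a_i', \vec{a}_{-i}) \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G'}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G'}(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) + \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i') \big[ U_i^{G}(a_i', \vec{a}_{-i}) + \delta_i(a_i', \vec{a}_{-i}) \big] \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ U_i^{G}(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, U_i^{G}(a_i', \vec{a}_{-i}) \Big] \\
&\quad + \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i}) \Big[ \delta_i(a_i, \vec{a}_{-i}) - \sum_{a_i'} p_i^*(a_i')\, \delta_i(a_i', \vec{a}_{-i}) \Big] \\
&= \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, \delta_i(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} p^*_{-i}(\vec{a}_{-i})\, \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a}) \\
&\le \sum_{\vec{a}_{-i}} \delta_i^{+}(a_i, \vec{a}_{-i}) - \sum_{\vec{a}} p^*(\vec{a})\, \delta_i(\vec{a})
\end{aligned}
$$

where $\delta_i^{+}(a_i, \vec{a}_{-i}) = \max(0, \delta_i(a_i, \vec{a}_{-i}))$.

The first inequality holds because $p^*$ is a Nash equilibrium of $G$, so $U_i^{G}(a_i, p^*_{-i}) - U_i^{G}(p^*) \le 0$. The second uses $\delta_i \le \delta_i^{+}$, and the third uses $0 \le p^*_{-i}(\vec{a}_{-i}) \le 1$ together with $\delta_i^{+} \ge 0$.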

Many entries of $\delta_i$ are zero if the two games are very similar.
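The bound for direct transfer can be checked numerically. The sketch below is an illustration only: it handles agent 1 of a two-agent game, the function name `direct_transfer_check` is hypothetical, and the payoffs are made up (matching pennies with a single perturbed entry). It assumes `(p1, p2)` is a Nash equilibrium of the source game.

```python
def direct_transfer_check(U, U_new, p1, p2):
    """For each action a of agent 1, return (transfer error, bound) under p' = p*.

    U and U_new are agent 1's payoff tables in G and G', indexed [a][b];
    (p1, p2) is assumed to be a Nash equilibrium of G.
    """
    n_a, n_b = len(p1), len(p2)
    # delta(a, b) = U^{G'}(a, b) - U^{G}(a, b)
    delta = [[U_new[a][b] - U[a][b] for b in range(n_b)] for a in range(n_a)]
    # Expected payoff in G' and expected delta under the transferred profile p*.
    value_new = sum(p1[a] * p2[b] * U_new[a][b] for a in range(n_a) for b in range(n_b))
    mean_delta = sum(p1[a] * p2[b] * delta[a][b] for a in range(n_a) for b in range(n_b))
    pairs = []
    for a in range(n_a):
        # Gain from deviating to pure action a in G'.
        error = sum(p2[b] * U_new[a][b] for b in range(n_b)) - value_new
        # Sum of positive parts of delta for action a, minus the expected delta.
        bound = sum(max(0.0, delta[a][b]) for b in range(n_b)) - mean_delta
        pairs.append((error, bound))
    return pairs

# Matching pennies (agent 1's payoffs); the uniform profile is its Nash equilibrium.
U = [[1.0, -1.0], [-1.0, 1.0]]
# G' differs from G in a single entry.
U_new = [[1.4, -1.0], [-1.0, 1.0]]

for error, bound in direct_transfer_check(U, U_new, [0.5, 0.5], [0.5, 0.5]):
    # A small tolerance guards against floating-point rounding when the bound is tight.
    print(f"error={error:.3f}  bound={bound:.3f}  holds={error <= bound + 1e-12}")
```

Because the bound sums the positive parts of $\delta_i$ without weighting them by $p^*_{-i}$, it can be loose when the changed entries carry little probability; it becomes small when few entries change and the changes are small.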

Some Results

Future Work

Some problems:
Other transfer methods?
Only Nash equilibrium?
Equilibrium finding algorithms

Transfer between games with different action space

Transfer between games with different agent numbers

Game abstraction

Thanks!
