TRANSCRIPT
Multi-agent learning Emergence of Conventions
Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department,
Faculty of Sciences, Utrecht University, The Netherlands.
Gerard Vreeswijk. Last modified on April 3rd, 2014 at 13:17 Slide 1
Motivation
Simple example of a Markov process
• Return probabilities are usually omitted in diagrams.
• In this case it can be derived that, on average,
      P(Sun) = 6/7,   P(Rain) = 1/7.
• How? We’ll see . . .
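The transition diagram itself is omitted from this transcript, so the following sketch assumes hypothetical transition probabilities (Sun→Rain = 0.1, Rain→Sun = 0.6), chosen to be consistent with the stated stationary values:

```python
import numpy as np

# Hypothetical transition probabilities (the slide's diagram is omitted),
# chosen to be consistent with the claimed stationary distribution:
P = np.array([[0.9, 0.1],   # Sun -> Sun, Sun -> Rain
              [0.6, 0.4]])  # Rain -> Sun, Rain -> Rain

# The stationary distribution pi satisfies pi = pi P, i.e. pi is a left
# eigenvector of P for eigenvalue 1, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
print(pi)  # [6/7, 1/7], approximately [0.857, 0.143]
```

Any pair of transition probabilities with P(Rain→Sun) = 6 · P(Sun→Rain) yields the same stationary distribution.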
Plan for today
1. Markov processes. (Ergodic process, communicating states/class, transient
state/class, recurrent state/class, periodic state/class, absorbing state,
irreducible process, stationary distribution.)
Compute stationary distributions:
• Solve n linear equations.
• Compare n so-called z-trees (Freidlin and Wentzell, 1984).
2. Perturbed Markov processes. (Regular perturbed Markov process,
punctuated equilibrium, stochastically stable state.)
Compute stochastically stable states:
• Compare k so-called z-trees, where k is the number of so-called recurrent
classes (Peyton Young, 1993).
Plan for today
3. Applications.
• Emergence of a currency standard.
• Competing technologies: operating system A vs. operating system B.
• Competing technologies: cell phone company A vs. cell phone
company B. (If time allows.)
• Schelling’s model of segregation (1969).
Part 1: Markov processes
State transitions
Communication classes
Start state matters
Start state matters. . . but here it does not
The stationary distribution (and computing one)
P(A) = P(A|A′)P(A′) + P(A|B′)P(B′) + P(A|C′)P(C′) + P(A|D′)P(D′)
Let us assume that the visiting probabilities are stationary (P(A) = P(A′), P(B) = P(B′), . . . ):
P(A) = P(A|A)P(A) + P(A|B)P(B) + P(A|C)P(C) + P(A|D)P(D)
     = 0 · P(A) + 0 · P(B) + 1 · P(C) + 0 · P(D)
     = P(C)
Let us write this as A = C. Similarly, B = 0.8A, C = D, and D = 0.2A + B.
Four equations with four unknowns. (Is the system always regular, i.e. det ≠ 0?)
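On the linear-equations route, the four relations above plus normalisation pin down the distribution. A minimal Python sketch, where the transition matrix is read off from A = C, B = 0.8A, C = D, D = 0.2A + B:

```python
import numpy as np

# Transition matrix of the four-state chain (rows: from-state A, B, C, D).
P = np.array([
    [0.0, 0.8, 0.0, 0.2],  # from A: to B w.p. 0.8, to D w.p. 0.2
    [0.0, 0.0, 0.0, 1.0],  # from B: to D
    [1.0, 0.0, 0.0, 0.0],  # from C: to A
    [0.0, 0.0, 1.0, 0.0],  # from D: to C
])

# Stationary distribution: solve pi (P - I) = 0 together with sum(pi) = 1,
# replacing one redundant equation by the normalisation constraint.
n = P.shape[0]
M = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.array([0.0] * (n - 1) + [1.0])
pi = np.linalg.solve(M, b)
print(pi)  # P(A), P(B), P(C), P(D) = 5/19, 4/19, 5/19, 5/19
```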
Theory of discrete Markov processes
Definitions:
• Stationary distribution: fixed point of transition probabilities.
• Empirical distribution: long-run normalised frequency of visits.
• Limit distribution: long-run probability to visit a node.
• Process is path-dependent: empirical distribution depends on the start state. Ergodic otherwise.
• Class is recurrent: the process cannot escape it. Transient otherwise.
• Process is irreducible: all states can reach each other.
Facts:
• Node is recurrent: the process will return to it a.s.
• If there is a finite number of states:
  – There is at least one recurrent class.
  – If there is precisely one recurrent class then the process is ergodic, and conversely.
• A stationary distribution always exists. It is unique iff the process is ergodic; in that case, stationary distr. ≡ empirical distr.
• If ergodic and aperiodic, then stationary distr. ≡ limit distr.
Finding stationary distributions with many states is difficult
• Solve n equations in n unknowns. What if S is large?
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.0 0.1 0.1 0.2 0.0 0.1 0.0 0.3 0.0 0.2
0.5 0.2 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.0 0.1 0.1 0.2 0.0 0.1 0.0 0.3 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.3 0.1 0.2 0.0 0.1 0.0 0.0 0.0 0.3 0.0
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
• Freidlin & Wentzell (1984): only look at so-called state trees.
An irreducible (and finite) Markov process
One possible A-tree
Another possible A-tree
A perhaps easier way to compute the stationary distribution
• An s-tree, Ts, is a complete collection of disjoint paths from states ≠ s to s.
• The likelihood of an s-tree Ts, written ℓ(Ts), =Def the product of its edge probabilities.
• The likelihood of a state s, written ℓ(s), =Def the sum of the likelihoods of all s-trees.
Theorem (Freidlin & Wentzell, 1984). Let P be an irreducible
finite Markov process. Then, for all states, the likelihood of that
state is proportional to the stationary probability of that state.
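The theorem can be checked by brute force on a small chain; here the four-state example from the earlier slide is assumed. Every state other than the root picks one outgoing edge; when the chosen edges lead every state to the root they form an s-tree, and we sum the products of edge probabilities:

```python
import itertools
import math

# edges[x] lists the outgoing transitions (target, probability) of x.
edges = {
    'A': [('B', 0.8), ('D', 0.2)],
    'B': [('D', 1.0)],
    'C': [('A', 1.0)],
    'D': [('C', 1.0)],
}
states = sorted(edges)

def tree_likelihoods(root):
    """Sum of edge-probability products over all root-trees: each state
    != root picks one outgoing edge, and following the chosen edges from
    any state must lead to root (no cycle avoiding root)."""
    others = [s for s in states if s != root]
    total = 0.0
    for choice in itertools.product(*(edges[s] for s in others)):
        succ = {s: c[0] for s, c in zip(others, choice)}
        ok = True
        for s in others:
            seen, cur = set(), s
            while cur != root:
                if cur in seen:       # cycle not passing through root
                    ok = False
                    break
                seen.add(cur)
                cur = succ[cur]
            if not ok:
                break
        if ok:
            total += math.prod(c[1] for c in choice)
    return total

v = {s: tree_likelihoods(s) for s in states}
Z = sum(v.values())
mu = {s: v[s] / Z for s in states}
print(mu)  # proportional to the stationary distribution: 5/19, 4/19, 5/19, 5/19
```

This brute force is exponential in the number of states; the point of the theorem is conceptual, not computational speed.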
Counting s-trees with Freidlin & Wentzell: example
Freidlin & Wentzell (1984):

    µ(s) = v(s) / ∑_{t∈S} v(t),   where v(t) =Def ∑_{T ∈ T_t} ℓ(T)

The unique C-tree is coloured red. Computing ℓ(T_C) = 10ε · 1/4 · . . . = 5ε³/12.
Similarly:

    State:         A       B       C        D        E       F      G
    Distribution:  ε²/24   5ε³/9   5ε³/12   5ε²/24   ε²/24   ε/48   ε/32

Note what happens if ε → 0.
Part 2:
Perturbed Markov processes
Motivation
Most Markov processes are path-dependent (non-ergodic)
Make them ergodic by perturbing with ε^r(s,s′) here and there
Compute s-trees from P0-recurrent classes only (!)
Compute s-trees from P0-recurrent classes only (!)
Class {B, D, E} possesses lowest stochastic potential, viz. 4.
Example of P0 and Pε

lim_{ε→0} of

    0.0   0.2          0.2   0.1   0.5
    0.3   ε⁷           0.1   0.1   0.5 − ε⁷
    0.1   0.2          0.2   0.0   0.5
    0.7   0.1          0.2   0.0   0.0
    0.1   0.2 − ε²/2   0.2   ε²    0.5 − ε²/2
    0.0   0.0          0.1   0.0   0.9

equals

    0.0   0.2   0.2   0.1   0.5
    0.3   0.0   0.1   0.1   0.5
    0.1   0.2   0.2   0.0   0.5
    0.7   0.1   0.2   0.0   0.0
    0.1   0.2   0.2   0.0   0.5
    0.0   0.0   0.1   0.0   0.9

• Notice that some P0-positive probabilities “have to give way” to perturb P0-zero probabilities with ε. (Because row probabilities must add up to 1.)
Perturbed Markov processes
• P0 is a Markov process on a finite state space S.
• Let, for each ε ∈ (0, ε∗], Pε be a Markov process on the same state space.
• The collection
      { Pε | ε ∈ (0, ε∗] }
  is a regular perturbation of P0 if
  1. Each Pε is ergodic.
  2. It holds that lim_{ε→0} Pε = P0.
  3. If Pε_{s,s′} > 0 for some ε > 0, then
         0 < lim_{ε→0} Pε_{s,s′} / ε^r(s,s′) < ∞
     for some r(s, s′) ≥ 0. This number is called the resistance from s to s′.
Resistance
1. Each Pε is ergodic.
2. It holds that lim_{ε→0} Pε = P0.
3. If Pε_{s,s′} > 0 for some ε > 0, then
       0 < lim_{ε→0} Pε_{s,s′} / ε^r(s,s′) < ∞
   for some r(s, s′) ≥ 0.
4. For transitions s → s′ where P0_{s,s′} = Pε_{s,s′} = 0, the resistance is defined to be ∞.

Note:
• The number r(s, s′) is well-defined!
• If P0_{s,s′} > 0 then r(s, s′) = 0.
• If r(s, s′) = 0 then P0_{s,s′} > 0.
Stochastic stability
• Because each Pε is ergodic, the stationary distribution µε is uniquely defined for every ε ∈ (0, ε∗].
• It can be shown that lim_{ε→0} µε(s) exists for every s. Let us call this distribution µ0.
• A state s is said to be stochastically stable if µ0(s) > 0.
Remarks:
• It can be shown that µ0 is a stationary distribution of P0.
• It follows that every regular perturbed Markov process possesses at least
one stochastically stable state.
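A tiny numerical illustration (a hypothetical two-state example, not from the slides): take P0 = I on two absorbing states and perturb with resistances r(0→1) = 2 and r(1→0) = 1. As ε → 0 the stationary mass concentrates on state 0, which is therefore the stochastically stable state:

```python
import numpy as np

def stationary(P):
    """Solve pi (P - I) = 0 together with sum(pi) = 1."""
    n = P.shape[0]
    M = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    return np.linalg.solve(M, np.array([0.0] * (n - 1) + [1.0]))

# Hypothetical regular perturbation of P0 = identity (two absorbing
# states), with resistances r(0 -> 1) = 2 and r(1 -> 0) = 1:
mass_on_0 = []
for eps in [0.1, 0.01, 0.001]:
    P_eps = np.array([[1 - eps**2, eps**2],
                      [eps,        1 - eps]])
    mass_on_0.append(stationary(P_eps)[0])
print(mass_on_0)  # equals 1/(1 + eps) -> 1: state 0 is stochastically stable
```

Note that the state that is harder to leave (higher resistance out, lower resistance in) collects all the limiting mass.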
A way to compute stochastically stable states
• Recurrent classes 1, . . . , K.
• The resistance of a path from i to j =Def the sum of its edge resistances. (Why the sum?)
• Construct edges r_ij (between classes) with the minimum resistance from i to j.
• The resistance of a j-tree Tj, written r(Tj), =Def the sum of its edge resistances (in the class graph).
• The stochastic potential of recurrence class j, written p(j), =Def the minimum resistance over all j-trees.

Theorem (Young, 1993). Let { Pε | ε ∈ (0, ε∗] } be a regular perturbed Markov process, and let µε be the unique stationary distribution of Pε, ε > 0. Then
• lim_{ε→0} µε = µ0 exists.
• µ0 is a stationary distribution of P0.
• The stochastically stable states are precisely those that are contained in the recurrent class(es) of P0 with minimum stochastic potential.
Minimum path resistance: example
• Compute path resistance between all K recurrent classes.
• With K recurrent classes there are always K(K − 1) minimum path resistances to be computed. (We work on the complete directed graph on the K classes.)
Example:
• Suppose there are three recurrent classes E1, E2, and E3.
• Minimum path resistances here are 1, 5, 6, 7, 8, 9.
Nine j-trees generated by three recurrence classes
Revisit earlier example
1. The unperturbed Markov process
P0 possesses two recurrent classes,
viz. E1 = {A} and E2 = {F, G}.
2. The least-resistance transition from E1 to E2 has probability 10ε · . . . = ε/32: resistance 1.
3. The least-resistance transition from E2 to E1 has probability 1/3 · ε · . . . = ε²/24: resistance 2.
4. There is only one resistance tree to
either side, hence one minimum
resistance tree.
5. Stochastic potential of E1 is 2;
stochastic potential of E2 is 1.
6. Conclusion: E2 is stochastically
stable, E1 is not.
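Steps 4 and 5 can be mechanised. A brute-force sketch for this two-class example: the stochastic potential of a class is the minimum, over trees rooted at that class in the class graph, of the summed edge resistances (with two classes, each tree is a single edge):

```python
import itertools

# Pairwise minimum resistances between the recurrent classes of the
# example: r(E1 -> E2) = 1, r(E2 -> E1) = 2.
r = {('E1', 'E2'): 1, ('E2', 'E1'): 2}
classes = ['E1', 'E2']

def stochastic_potential(root):
    """Minimum total resistance over all root-trees in the class graph:
    every class != root picks one outgoing edge, and the chosen edges
    must lead every class to root."""
    others = [c for c in classes if c != root]
    targets = [[d for d in classes if d != c] for c in others]
    best = float('inf')
    for choice in itertools.product(*targets):
        succ = dict(zip(others, choice))
        ok = True
        for c in others:
            seen, cur = set(), c
            while cur != root:
                if cur in seen:       # cycle avoiding root: not a tree
                    ok = False
                    break
                seen.add(cur)
                cur = succ[cur]
            if not ok:
                break
        if ok:
            best = min(best, sum(r[(c, succ[c])] for c in others))
    return best

p = {c: stochastic_potential(c) for c in classes}
print(p)  # {'E1': 2, 'E2': 1}: E2 has minimum potential, hence is stable
```

The same enumeration works for any K, matching the "nine j-trees for three classes" count on the next slide.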
Part 3: Applications
Technology adoption
                            Other:
                            Operating system A   Operating system B
You: Operating system A         (a, a)               (0, 0)
     Operating system B         (0, 0)               (b, b)
Total number of players : n, for example n = 5
Sample size : s, for example s = 3
Total number of players currently playing A : m, for example m = 2
P( individual chooses A | AABBB ) = 3 / C(5, 3) = 3/10

P( #A’s = k | AABBB ) = C(5, k) · (3/10)^k · (7/10)^(5−k)
This process is path-dependent (non-ergodic): for example always
BABBB, BABBB, etc. → BBBBB. With b ≫ a even BAABB, etc. → BBBBB.
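The 3/10 can be reproduced by enumeration, under the slide's setup with a = b, so that an updating individual best-responds to the majority action in its sample of s = 3 of the n = 5 players:

```python
from itertools import combinations
from math import comb

population = ['A', 'A', 'B', 'B', 'B']            # n = 5, m = 2 playing A
samples = list(combinations(population, 3))       # all C(5,3) = 10 samples

# With a = b, the best response to a sample is its majority action.
p_A = sum(s.count('A') >= 2 for s in samples) / len(samples)
print(p_A)  # 0.3, i.e. 3 / C(5,3)

# If all five players update simultaneously, the number of A-players in
# the next round is binomially distributed:
dist = [comb(5, k) * p_A**k * (1 - p_A)**(5 - k) for k in range(6)]
```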
Idiosyncratic play in technology adoption
“How, then, might institutional change occur? Because best-response
play renders both conventions absorbing states, it is clear that in order
to understand institutional change, some kind of nonbest-response
play must be introduced. Suppose there is a probability ε that when
individuals are in the process of updating, each may switch their type
for idiosyncratic reasons. Thus, 1 − ε represents the probability that
the individual pursues the best-response updating process described
above. The idiosyncratic play accounting for nonbest responses need not
be irrational or odd; it simply represents actions whose reasons are not
explicitly modeled. Included is experimentation, whim, error, and
intentional acts seeking to affect game outcomes but whose
motivations are not captured by the above game.”
From Microeconomics: behavior, institutions, and evolution (Bowles, 2003).
The tipping effect
Total number of players : n, for example n = 5
Sample size : s, for example s = 3
Total number of players currently playing A : m, for example m = 2
Suppose a = b. Let
E1 = { s ∈ S | s ∼ AAAAA } = {AAAAA}
T1 = { s ∈ S | s ∼ AAAAB } = {AAAAB, AAABA, . . . , BAAAA}
T2 = { s ∈ S | s ∼ AAABB } = {AAABB, AABAB, . . . , BBBAA}
T3 = { s ∈ S | s ∼ ABBBB }
E2 = { s ∈ S | s ∼ BBBBB }
• How many idiosyncratic transitions must be made to move from E1 to E2?
• What is the resistance from E1 to E2? From E2 to E1?
• What is (are) the stochastically stable state(s)?
Tipping point (general case)
• Suppose we’re in all-B.
• Generally, an individual will choose A when
      a · k ≥ b · (s − k)  ⇔  k ≥ bs/(a + b).
  (Why “≥” instead of “>”?)
• Thus, ⌈bs/(a + b)⌉ idiosyncratic choices (ok: errors) must be made to move from BBBBB . . . into the first transient class that, without further idiosyncrasies, leads to AAAAA.
• With probability ε of an idiosyncratic choice, this has probability
      (ε/2)^⌈bs/(a+b)⌉.
  Indeed ε/2, if we assume that idiosyncrasy is uniformly distributed over A and B. In that case, half of the idiosyncratic choices are counter-productive again!
• With this payoff matrix, the Pareto-optimal outcome is favoured, provided s is large enough.
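A small sketch of the threshold computation (the function name is ours): the least k with a·k ≥ b·(s − k) is ⌈bs/(a + b)⌉:

```python
import math

def tipping_cost(a, b, s):
    """Smallest number k of idiosyncratic A-choices in a sample of size s
    that makes A a best response: a*k >= b*(s - k), i.e. k >= b*s/(a + b)."""
    return math.ceil(b * s / (a + b))

print(tipping_cost(1, 1, 3))  # 2 errors needed when a = b and s = 3
```

The corresponding transition has probability of order (ε/2)^tipping_cost(a, b, s), which determines the resistance between the two conventions.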
Part 4:
Schelling’s model
of segregation
Schelling’s model in 2D (torus)
Schelling’s model in 1D (circle)
• Schelling (1969, 1971, 1978).
• Isolated people are discontent.
(Other people are content.)
• Possible swaps:
Trade Profit
DD → CC 2
DC → CC 1
CD → DC 0
CC → CD −1
CC → DD −2
• This “problem” can be “solved” in
“hundreds” of ways. (Analytically,
stochastically, whatever.)
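The contentment rule can be made concrete (the helper name is ours): on the ring, an agent is discontent precisely when it is isolated, i.e. neither neighbour shares its type:

```python
def discontent(ring):
    """Count isolated (hence discontent) agents on a circular arrangement:
    an agent is isolated when neither neighbour has the same type."""
    n = len(ring)
    return sum(ring[i] != ring[i - 1] and ring[i] != ring[(i + 1) % n]
               for i in range(n))

print(discontent(list('AAAABBBB')))  # 0: fully segregated, everyone content
print(discontent(list('ABABABAB')))  # 8: alternating, everyone isolated
```

This already shows why the completely segregated configurations are absorbing: they contain no discontent agents, so no profitable swap remains.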
Young’s take on Schelling’s model
• Possible trades:
Trade Profit Probability
DD → CC 2 − 2m high
DC → CC 1 − 2m high
CD → DC 0 − 2m low: ǫa
CC → CD −1 − 2m lower: ǫb
CC → DD −2 − 2m lowest: ǫc
where 0 < a < b < c, and m are moving costs.
• The resulting Markov process is ergodic and regular.
Recurrent classes are just {{a} | a ∈ Absorbing }
1. Determine all recurrent classes of P0.
• All absorbing states A are
recurrent.
• If not in absorbing state, then a
mutually advantageous swap is
possible.
Thus, if not in absorbing state,
then transient state.
Therefore, all and only recurrent
classes are singletons of absorbing
states: R = {{a} | a ∈ A}.
Completely segregated vs. dispersed states
• Absorbing states are either completely segregated or dispersed: A = S ∪ D.
• For each s, s′ ∈ A, let r(s, s′) be
defined as usual.
Claims:
1. If s ∈ D, there does not exist an
s-tree from A\{s} with only
a-edges.
2. If s ∈ S, there does exist an s-tree
from A\{s} with only a-edges.
3. The classes with lowest potential
are L = {{s} | s ∈ S}.
Claim 1
If s ∈ D, there does not exist an s-tree from A\{s} with only a-edges.
Claim 2: a resistance
If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.
Claim 2: a resistance, discontent individual (no problem)
If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.
Absorbing state ⇔ state with low potential
Claim 1: If s ∈ D, there does not exist
an s-tree from A\{s} with only a-edges.
• Let s ∈ D. We must show that
some edges from A to s have
resistance > a.
• Well, edges from S to s, at least,
necessarily involve moves that
create at least one discontent (=
isolated) individual.
• Therefore, all j-trees from A to D
have resistance b > a or c > a.
Claim 2: If s ∈ S, there does exist an
s-tree from A\{s} with only a-edges.
• Let s ∈ S. We must show that all edges from A to s have resistance a.
  i) From elements in S to other elements in S: ok! Put head to tail repeatedly.
  ii) From elements in D to elements in S: ok! Put head to tail of small groups repeatedly. If one large cluster, continue as in i).