TRANSCRIPT
Multi-agent learning Emergence of Conventions
Gerard Vreeswijk, Intelligent Systems Group, Computer Science Department,
Faculty of Sciences, Utrecht University, The Netherlands.
Gerard Vreeswijk. Last modified on April 3rd, 2014 at 13:17 Slide 1
Motivation
Simple example of a Markov process
• Return probabilities are usually omitted in diagrams.
• In this case it can be derived that, on average,
      P(Sun) = 6/7,   P(Rain) = 1/7.
• How? We’ll see . . .
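The transition diagram itself is omitted from this transcript, so the following sketch assumes hypothetical transition probabilities (Sun→Rain = 0.1, Rain→Sun = 0.6), chosen to be consistent with the stated stationary values:

```python
import numpy as np

# Hypothetical transition probabilities (the slide's diagram is omitted),
# chosen to be consistent with the claimed stationary distribution:
P = np.array([[0.9, 0.1],   # Sun -> Sun, Sun -> Rain
              [0.6, 0.4]])  # Rain -> Sun, Rain -> Rain

# The stationary distribution pi satisfies pi = pi P, i.e. pi is a left
# eigenvector of P for eigenvalue 1, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
print(pi)  # [6/7, 1/7], approximately [0.857, 0.143]
```

Any pair of transition probabilities with P(Rain→Sun) = 6 · P(Sun→Rain) yields the same stationary distribution.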
Plan for today
1. Markov processes. (Ergodic process, communicating states/class, transient
state/class, recurrent state/class, periodic state/class, absorbing state,
irreducible process, stationary distribution.)
Compute stationary distributions:
• Solve n linear equations.
• Compare n so-called z-trees (Freidlin and Wentzell, 1984).
2. Perturbed Markov processes. (Regular perturbed Markov process,
punctuated equilibrium, stochastically stable state.)
Compute stochastically stable states:
• Compare k so-called z-trees, where k is the number of so-called recurrent
classes (Peyton Young, 1993).
Plan for today
3. Applications.
• Emergence of a currency standard.
• Competing technologies: operating system A vs. operating system B.
• Competing technologies: cell phone company A vs. cell phone
company B. (If time allows.)
• Schelling’s model of segregation (1969).
Part 1: Markov processes
State transitions
Communication classes
Start state matters
Start state matters. . . but here it does not
The stationary distribution (and computing one)
P(A) = P(A|A′)P(A′) + P(A|B′)P(B′) + P(A|C′)P(C′) + P(A|D′)P(D′)
Let us assume that the visiting probabilities are stationary (P(A) = P(A′), P(B) = P(B′), . . . ):
P(A) = P(A|A)P(A) + P(A|B)P(B) + P(A|C)P(C) + P(A|D)P(D)
     = 0 · P(A) + 0 · P(B) + 1 · P(C) + 0 · P(D)
     = P(C)
Let us write this as A = C. Similarly, B = 0.8A, C = D, and D = 0.2A + B.
Four equations with four unknowns. (Is the system always regular, i.e. det ≠ 0?)
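On the linear-equations route, the four relations above plus normalisation pin down the distribution. A minimal Python sketch, where the transition matrix is read off from A = C, B = 0.8A, C = D, D = 0.2A + B:

```python
import numpy as np

# Transition matrix of the four-state chain (rows: from-state A, B, C, D).
P = np.array([
    [0.0, 0.8, 0.0, 0.2],  # from A: to B w.p. 0.8, to D w.p. 0.2
    [0.0, 0.0, 0.0, 1.0],  # from B: to D
    [1.0, 0.0, 0.0, 0.0],  # from C: to A
    [0.0, 0.0, 1.0, 0.0],  # from D: to C
])

# Stationary distribution: solve pi (P - I) = 0 together with sum(pi) = 1,
# replacing one redundant equation by the normalisation constraint.
n = P.shape[0]
M = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
b = np.array([0.0] * (n - 1) + [1.0])
pi = np.linalg.solve(M, b)
print(pi)  # P(A), P(B), P(C), P(D) = 5/19, 4/19, 5/19, 5/19
```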
Theory of discrete Markov processes
Definitions:
• Stationary distribution: fixed point of transition probabilities.
• Empirical distribution: long-run normalised frequency of visits.
• Limit distribution: long-run probability to visit a node.
• Process is path-dependent: empirical distribution depends on the start state. Ergodic otherwise.
• Class is recurrent: the process cannot escape it. Transient otherwise.
• Process is irreducible: all states can reach each other.
Facts:
• Node is recurrent: the process will return to it a.s.
• If there is a finite number of states:
  – There is at least one recurrent class.
  – If there is precisely one recurrent class then the process is ergodic, and conversely.
• A stationary distribution always exists. It is unique iff the process is ergodic; in that case, stationary distr. ≡ empirical distr.
• If ergodic and aperiodic, then stationary distr. ≡ limit distr.
Finding stationary distributions with many states is difficult
• Solve n equations in n unknowns. What if S is large?
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.0 0.1 0.1 0.2 0.0 0.1 0.0 0.3 0.0 0.2
0.5 0.2 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.0 0.1 0.1 0.2 0.0 0.1 0.0 0.3 0.0 0.2
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
0.3 0.1 0.2 0.0 0.1 0.0 0.0 0.0 0.3 0.0
0.1 0.2 0.0 0.1 0.0 0.1 0.0 0.3 0.0 0.2
• Freidlin & Wentzell (1984): only look at so-called state trees.
An irreducible (and finite) Markov process
One possible A-tree
Another possible A-tree
A perhaps easier way to compute the stationary distribution
• An s-tree, Ts, is a complete collection of disjoint paths from states ≠ s to s.
• The likelihood of an s-tree Ts, written ℓ(Ts), =Def the product of its edge probabilities.
• The likelihood of a state s, written ℓ(s), =Def the sum of the likelihoods of all s-trees.
Theorem (Freidlin & Wentzell, 1984). Let P be an irreducible
finite Markov process. Then, for all states, the likelihood of that
state is proportional to the stationary probability of that state.
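The theorem can be checked by brute force on a small chain; here the four-state example from the earlier slide is assumed. Every state other than the root picks one outgoing edge; when the chosen edges lead every state to the root they form an s-tree, and we sum the products of edge probabilities:

```python
import itertools
import math

# edges[x] lists the outgoing transitions (target, probability) of x.
edges = {
    'A': [('B', 0.8), ('D', 0.2)],
    'B': [('D', 1.0)],
    'C': [('A', 1.0)],
    'D': [('C', 1.0)],
}
states = sorted(edges)

def tree_likelihoods(root):
    """Sum of edge-probability products over all root-trees: each state
    != root picks one outgoing edge, and following the chosen edges from
    any state must lead to root (no cycle avoiding root)."""
    others = [s for s in states if s != root]
    total = 0.0
    for choice in itertools.product(*(edges[s] for s in others)):
        succ = {s: c[0] for s, c in zip(others, choice)}
        ok = True
        for s in others:
            seen, cur = set(), s
            while cur != root:
                if cur in seen:       # cycle not passing through root
                    ok = False
                    break
                seen.add(cur)
                cur = succ[cur]
            if not ok:
                break
        if ok:
            total += math.prod(c[1] for c in choice)
    return total

v = {s: tree_likelihoods(s) for s in states}
Z = sum(v.values())
mu = {s: v[s] / Z for s in states}
print(mu)  # proportional to the stationary distribution: 5/19, 4/19, 5/19, 5/19
```

This brute force is exponential in the number of states; the point of the theorem is conceptual, not computational speed.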
Counting s-trees with Freidlin & Wentzell: example
Freidlin & Wentzell (1984):

    µ(s) = v(s) / ∑_{t∈S} v(t),   where v(t) =Def ∑_{T ∈ T_t} ℓ(T)

The unique C-tree is coloured red. Computing ℓ(T_C) = 10ε · 1/4 · . . . = 5ε³/12.
Similarly:

    State:         A       B       C        D        E       F      G
    Distribution:  ε²/24   5ε³/9   5ε³/12   5ε²/24   ε²/24   ε/48   ε/32

Note what happens if ε → 0.
Part 2:
Perturbed Markov processes
Motivation
Most Markov processes are path-dependent (non-ergodic)
Make them ergodic by perturbing with ε^r(s,s′) here and there
Compute s-trees from P0-recurrent classes only (!)
Compute s-trees from P0-recurrent classes only (!)
Class {B, D, E} possesses lowest stochastic potential, viz. 4.
Example of P0 and Pε

lim_{ε→0} of

    0.0   0.2          0.2   0.1   0.5
    0.3   ε⁷           0.1   0.1   0.5 − ε⁷
    0.1   0.2          0.2   0.0   0.5
    0.7   0.1          0.2   0.0   0.0
    0.1   0.2 − ε²/2   0.2   ε²    0.5 − ε²/2
    0.0   0.0          0.1   0.0   0.9

equals

    0.0   0.2   0.2   0.1   0.5
    0.3   0.0   0.1   0.1   0.5
    0.1   0.2   0.2   0.0   0.5
    0.7   0.1   0.2   0.0   0.0
    0.1   0.2   0.2   0.0   0.5
    0.0   0.0   0.1   0.0   0.9

• Notice that some P0-positive probabilities “have to give way” to perturb P0-zero probabilities with ε. (Because row probabilities must add up to 1.)
Perturbed Markov processes
• P0 is a Markov process on a finite state space S.
• Let, for each ε ∈ (0, ε∗], Pε be a Markov process on the same state space.
• The collection
      { Pε | ε ∈ (0, ε∗] }
  is a regular perturbation of P0 if
  1. Each Pε is ergodic.
  2. It holds that lim_{ε→0} Pε = P0.
  3. If Pε_{s,s′} > 0 for some ε > 0, then
         0 < lim_{ε→0} Pε_{s,s′} / ε^r(s,s′) < ∞
     for some r(s, s′) ≥ 0. This number is called the resistance from s to s′.
Resistance
1. Each Pε is ergodic.
2. It holds that lim_{ε→0} Pε = P0.
3. If Pε_{s,s′} > 0 for some ε > 0, then
       0 < lim_{ε→0} Pε_{s,s′} / ε^r(s,s′) < ∞
   for some r(s, s′) ≥ 0.
4. For transitions s → s′ where P0_{s,s′} = Pε_{s,s′} = 0, the resistance is defined to be ∞.

Note:
• The number r(s, s′) is well-defined!
• If P0_{s,s′} > 0 then r(s, s′) = 0.
• If r(s, s′) = 0 then P0_{s,s′} > 0.
Stochastic stability
• Because each Pε is ergodic, the stationary distribution µε is uniquely defined for every ε ∈ (0, ε∗].
• It can be shown that lim_{ε→0} µε(s) exists for every s. Let us call this distribution µ0.
• A state s is said to be stochastically stable if µ0(s) > 0.
Remarks:
• It can be shown that µ0 is a stationary distribution of P0.
• It follows that every regular perturbed Markov process possesses at least
one stochastically stable state.
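A tiny numerical illustration (a hypothetical two-state example, not from the slides): take P0 = I on two absorbing states and perturb with resistances r(0→1) = 2 and r(1→0) = 1. As ε → 0 the stationary mass concentrates on state 0, which is therefore the stochastically stable state:

```python
import numpy as np

def stationary(P):
    """Solve pi (P - I) = 0 together with sum(pi) = 1."""
    n = P.shape[0]
    M = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    return np.linalg.solve(M, np.array([0.0] * (n - 1) + [1.0]))

# Hypothetical regular perturbation of P0 = identity (two absorbing
# states), with resistances r(0 -> 1) = 2 and r(1 -> 0) = 1:
mass_on_0 = []
for eps in [0.1, 0.01, 0.001]:
    P_eps = np.array([[1 - eps**2, eps**2],
                      [eps,        1 - eps]])
    mass_on_0.append(stationary(P_eps)[0])
print(mass_on_0)  # equals 1/(1 + eps) -> 1: state 0 is stochastically stable
```

Note that the state that is harder to leave (higher resistance out, lower resistance in) collects all the limiting mass.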
A way to compute stochastically stable states
• Recurrent classes 1, . . . , K.
• The resistance of a path from i to j =Def the sum of its edge resistances. (Why the sum?)
• Construct edges r_ij (between classes) with the minimum resistance from i to j.
• The resistance of a j-tree Tj, written r(Tj), =Def the sum of its edge resistances (in the class graph).
• The stochastic potential of recurrence class j, written p(j), =Def the minimum resistance over all j-trees.

Theorem (Young, 1993). Let { Pε | ε ∈ (0, ε∗] } be a regular perturbed Markov process, and let µε be the unique stationary distribution of Pε, ε > 0. Then
• lim_{ε→0} µε = µ0 exists.
• µ0 is a stationary distribution of P0.
• The stochastically stable states are precisely those that are contained in the recurrent class(es) of P0 with minimum stochastic potential.
Minimum path resistance: example
• Compute path resistance between all K recurrent classes.
• With K recurrent classes there are always K(K − 1) minimum path resistances to be computed. (We work on the complete directed graph on the K classes.)
Example:
• Suppose there are three recurrent classes E1, E2, and E3.
• Minimum path resistances here are 1, 5, 6, 7, 8, 9.
Nine j-trees generated by three recurrence classes
Revisit earlier example
1. The unperturbed Markov process
P0 possesses two recurrent classes,
viz. E1 = {A} and E2 = {F, G}.
2. The least-resistance transition from E1 to E2 has probability 10ε · . . . = ε/32: resistance 1.
3. The least-resistance transition from E2 to E1 has probability 1/3 · ε · . . . = ε²/24: resistance 2.
4. There is only one resistance tree to
either side, hence one minimum
resistance tree.
5. Stochastic potential of E1 is 2;
stochastic potential of E2 is 1.
6. Conclusion: E2 is stochastically
stable, E1 is not.
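Steps 4 and 5 can be mechanised. A brute-force sketch for this two-class example: the stochastic potential of a class is the minimum, over trees rooted at that class in the class graph, of the summed edge resistances (with two classes, each tree is a single edge):

```python
import itertools

# Pairwise minimum resistances between the recurrent classes of the
# example: r(E1 -> E2) = 1, r(E2 -> E1) = 2.
r = {('E1', 'E2'): 1, ('E2', 'E1'): 2}
classes = ['E1', 'E2']

def stochastic_potential(root):
    """Minimum total resistance over all root-trees in the class graph:
    every class != root picks one outgoing edge, and the chosen edges
    must lead every class to root."""
    others = [c for c in classes if c != root]
    targets = [[d for d in classes if d != c] for c in others]
    best = float('inf')
    for choice in itertools.product(*targets):
        succ = dict(zip(others, choice))
        ok = True
        for c in others:
            seen, cur = set(), c
            while cur != root:
                if cur in seen:       # cycle avoiding root: not a tree
                    ok = False
                    break
                seen.add(cur)
                cur = succ[cur]
            if not ok:
                break
        if ok:
            best = min(best, sum(r[(c, succ[c])] for c in others))
    return best

p = {c: stochastic_potential(c) for c in classes}
print(p)  # {'E1': 2, 'E2': 1}: E2 has minimum potential, hence is stable
```

The same enumeration works for any K, matching the "nine j-trees for three classes" count on the next slide.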
Part 3: Applications
Technology adoption
                            Other:
                            Operating system A   Operating system B
You: Operating system A         (a, a)               (0, 0)
     Operating system B         (0, 0)               (b, b)
Total number of players : n, for example n = 5
Sample size : s, for example s = 3
Total number of players currently playing A : m, for example m = 2
P( individual chooses A | AABBB ) = 3 / C(5, 3) = 3/10

P( #A’s = k | AABBB ) = C(5, k) · (3/10)^k · (7/10)^(5−k)
This process is path-dependent (non-ergodic): for example always
BABBB, BABBB, etc. → BBBBB. With b ≫ a even BAABB, etc. → BBBBB.
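The 3/10 can be reproduced by enumeration, under the slide's setup with a = b, so that an updating individual best-responds to the majority action in its sample of s = 3 of the n = 5 players:

```python
from itertools import combinations
from math import comb

population = ['A', 'A', 'B', 'B', 'B']            # n = 5, m = 2 playing A
samples = list(combinations(population, 3))       # all C(5,3) = 10 samples

# With a = b, the best response to a sample is its majority action.
p_A = sum(s.count('A') >= 2 for s in samples) / len(samples)
print(p_A)  # 0.3, i.e. 3 / C(5,3)

# If all five players update simultaneously, the number of A-players in
# the next round is binomially distributed:
dist = [comb(5, k) * p_A**k * (1 - p_A)**(5 - k) for k in range(6)]
```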
Idiosyncratic play in technology adoption
“How, then, might institutional change occur? Because best-response
play renders both conventions absorbing states, it is clear that in order
to understand institutional change, some kind of nonbest-response
play must be introduced. Suppose there is a probability ε that when
individuals are in the process of updating, each may switch their type
for idiosyncratic reasons. Thus, 1 − ε represents the probability that
the individual pursues the best-response updating process described
above. The idiosyncratic play accounting for nonbest responses need not
be irrational or odd; it simply represents actions whose reasons are not
explicitly modeled. Included is experimentation, whim, error, and
intentional acts seeking to affect game outcomes but whose
motivations are not captured by the above game.”
From Microeconomics: behavior, institutions, and evolution (Bowles, 2003).
The tipping effect
Total number of players : n, for example n = 5
Sample size : s, for example s = 3
Total number of players currently playing A : m, for example m = 2
Suppose a = b. Let
E1 = { s ∈ S | s ∼ AAAAA } = {AAAAA}
T1 = { s ∈ S | s ∼ AAAAB } = {AAAAB, AAABA, . . . , BAAAA}
T2 = { s ∈ S | s ∼ AAABB } = {AAABB, AABAB, . . . , BBBAA}
T3 = { s ∈ S | s ∼ ABBBB }
E2 = { s ∈ S | s ∼ BBBBB }
• How many idiosyncratic transitions must be made to move from E1 to E2?
• What is the resistance from E1 to E2? From E2 to E1?
• What is (are) the stochastically stable state(s)?
Tipping point (general case)
• Suppose we’re in all-B.
• Generally, an individual will choose A when
      a · k ≥ b · (s − k)  ⇔  k ≥ bs/(a + b).
  (Why “≥” instead of “>”?)
• Thus, ⌈bs/(a + b)⌉ idiosyncratic choices (ok: errors) must be made to move from BBBBB . . . into the first transient class that, without further idiosyncrasies, leads to AAAAA.
• With probability ε of an idiosyncratic choice, this has probability
      (ε/2)^⌈bs/(a+b)⌉.
  Indeed ε/2, if we assume that idiosyncrasy is uniformly distributed over A and B. In that case, half of the idiosyncratic choices are counter-productive again!
• With this payoff matrix, the Pareto-optimal outcome is favoured, provided s is large enough.
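A small sketch of the threshold computation (the function name is ours): the least k with a·k ≥ b·(s − k) is ⌈bs/(a + b)⌉:

```python
import math

def tipping_cost(a, b, s):
    """Smallest number k of idiosyncratic A-choices in a sample of size s
    that makes A a best response: a*k >= b*(s - k), i.e. k >= b*s/(a + b)."""
    return math.ceil(b * s / (a + b))

print(tipping_cost(1, 1, 3))  # 2 errors needed when a = b and s = 3
```

The corresponding transition has probability of order (ε/2)^tipping_cost(a, b, s), which determines the resistance between the two conventions.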
Part 4:
Schelling’s model
of segregation
Schelling’s model in 2D (torus)
Schelling’s model in 1D (circle)
• Schelling (1969, 1971, 1978).
• Isolated people are discontent.
(Other people are content.)
• Possible swaps:
Trade Profit
DD → CC 2
DC → CC 1
CD → DC 0
CC → CD −1
CC → DD −2
• This “problem” can be “solved” in
“hundreds” of ways. (Analytically,
stochastically, whatever.)
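The contentment rule can be made concrete (the helper name is ours): on the ring, an agent is discontent precisely when it is isolated, i.e. neither neighbour shares its type:

```python
def discontent(ring):
    """Count isolated (hence discontent) agents on a circular arrangement:
    an agent is isolated when neither neighbour has the same type."""
    n = len(ring)
    return sum(ring[i] != ring[i - 1] and ring[i] != ring[(i + 1) % n]
               for i in range(n))

print(discontent(list('AAAABBBB')))  # 0: fully segregated, everyone content
print(discontent(list('ABABABAB')))  # 8: alternating, everyone isolated
```

This already shows why the completely segregated configurations are absorbing: they contain no discontent agents, so no profitable swap remains.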
Young’s take on Schelling’s model
• Possible trades:
Trade Profit Probability
DD → CC 2 − 2m high
DC → CC 1 − 2m high
CD → DC 0 − 2m low: ǫa
CC → CD −1 − 2m lower: ǫb
CC → DD −2 − 2m lowest: ǫc
where 0 < a < b < c, and m are moving costs.
• The resulting Markov process is ergodic and regular.
Recurrent classes are just {{a} | a ∈ Absorbing }
1. Determine all recurrent classes of P0.
• All absorbing states A are
recurrent.
• If not in absorbing state, then a
mutually advantageous swap is
possible.
Thus, if not in absorbing state,
then transient state.
Therefore, all and only recurrent
classes are singletons of absorbing
states: R = {{a} | a ∈ A}.
Completely segregated vs. dispersed states
• Absorbing states are either completely segregated or dispersed: A = S ∪ D.
• For each s, s′ ∈ A, let r(s, s′) be
defined as usual.
Claims:
1. If s ∈ D, there does not exist an
s-tree from A\{s} with only
a-edges.
2. If s ∈ S, there does exist an s-tree
from A\{s} with only a-edges.
3. The classes with lowest potential
are L = {{s} | s ∈ S}.
Claim 1
If s ∈ D, there does not exist an s-tree from A\{s} with only a-edges.
Claim 2: a resistance
If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.
Claim 2: a resistance, discontent individual (no problem)
If s ∈ S, there does exist an s-tree from A\{s} with only a-edges.
Absorbing state ⇔ state with low potential
Claim 1: If s ∈ D, there does not exist
an s-tree from A\{s} with only a-edges.
• Let s ∈ D. We must show that
some edges from A to s have
resistance > a.
• Well, edges from S to s, at least,
necessarily involve moves that
create at least one discontent (=
isolated) individual.
• Therefore, all j-trees from A to D
have resistance b > a or c > a.
Claim 2: If s ∈ S, there does exist an
s-tree from A\{s} with only a-edges.
• Let s ∈ S. We must show that all edges from A to s have resistance a.
  i) From elements in S to other elements in S: ok! Put head to tail repeatedly.
  ii) From elements in D to elements in S: ok! Put head to tail of small groups repeatedly. If one large cluster, continue as in i).