Markov Chain Basic 2014.07.11 Sanghyuk Chun

• Exact Counting

• #P-Complete

• Sampling and Counting

Previous Chapters


• Markov Chain Basic

• An ergodic MC has a unique stationary distribution

• Some basic concepts (Coupling, Mixing time)

• Coupling from the past

• Coupling detail

• Ising Model

• Bounding Mixing time via Coupling

• Random spanning tree

• Path coupling framework

• MC for graph k-coloring

Remaining Chapters



• Introduce Markov Chain

• Show a potential algorithmic use of Markov chains: sampling from a complex distribution

• Prove that an ergodic Markov chain always converges to a unique stationary distribution

• Introduce Coupling techniques and Mixing time

In this chapter…


• For a finite state space $\Omega$, we say a sequence of random variables $(X_t)$ on $\Omega$ is a Markov chain if the sequence is Markovian in the following sense

• For all $t$ and all $x_0, \dots, x_t, y \in \Omega$, we require

• $\Pr(X_{t+1} = y \mid X_0 = x_0, \dots, X_t = x_t) = \Pr(X_{t+1} = y \mid X_t = x_t)$

• The Markov property: “Memoryless”

Markov Chain
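The memoryless property can be illustrated with a short simulation. The following is a minimal Python sketch, not taken from the slides: the function name `simulate_chain`, the list-of-rows representation of the transition probabilities, and the two-state example are illustrative assumptions. Each new state is drawn using only the current state.

```python
import random


def simulate_chain(P, x0, steps, rng):
    """Simulate `steps` transitions of a chain on states 0..n-1.

    P[x][y] is assumed to hold Pr(X_{t+1} = y | X_t = x).
    """
    path = [x0]
    for _ in range(steps):
        current = path[-1]
        # Only the current state is consulted; earlier states are ignored.
        nxt = rng.choices(range(len(P)), weights=P[current])[0]
        path.append(nxt)
    return path


# Hypothetical example: a chain that always moves to the other state.
flip = [[0.0, 1.0], [1.0, 0.0]]
print(simulate_chain(flip, 0, 4, random.Random(0)))
```

Because every row of `flip` puts all its mass on one state, the printed path alternates between the two states regardless of the seed.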


• For a finite state space $\Omega$, we say a sequence of random variables $(X_t)$ on $\Omega$ is a Markov chain if the sequence is Markovian

• Let $\Omega$ denote the set of orderings of a 52-card deck (e.g. $X_1 = (1, 2, 3, \dots, 52)$)

• The next ordering depends only on the current ordering: $X_{t+1}$ depends only on $X_t$

• Question 1: How can we shuffle the cards uniformly at random?

• Question 2: Can we get a fast uniform shuffling algorithm?

Example of Markov Chain (Card Shuffling)


• Transition Matrix

• $P(x, y) = \Pr(X_{t+1} = y \mid X_t = x)$

• Transitions are independent of the time (time-homogeneous)

Transition Matrix


• The $t$-step distribution is defined in the natural way

• $P^t(x, y) = \begin{cases} P(x, y), & t = 1 \\ \sum_{z \in \Omega} P(x, z)\, P^{t-1}(z, y), & t > 1 \end{cases}$

Transition Matrix
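The recursive definition of the t-step distribution amounts to taking the t-th power of the transition matrix. A minimal sketch in plain Python follows; the helper names `mat_mul` and `t_step` and the two-state matrix are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def t_step(P, t):
    """Return P^t via the recursion P^t = P * P^(t-1), with P^1 = P."""
    if t < 1:
        raise ValueError("t must be at least 1")
    result = P
    for _ in range(t - 1):
        result = mat_mul(P, result)
    return result


# Hypothetical two-state chain used only as an example.
P = [[0.5, 0.5], [0.25, 0.75]]
for row in t_step(P, 2):
    print(row)
```

Each row of the result is itself a probability distribution: row x gives the distribution of the state t steps after starting at x.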


• A distribution $\pi$ is a stationary distribution if it is invariant with respect to the transition matrix: for all $y \in \Omega$, $\pi(y) = \sum_{x \in \Omega} \pi(x) P(x, y)$

• Theorem 1

• For a finite ergodic Markov Chain, there exists a unique stationary distribution 𝜋

• Proof?

Stationary Distribution
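The invariance condition can be checked numerically on a small chain. The sketch below uses repeated application of the transition matrix (power iteration), which is one common way to approximate a stationary distribution and is not prescribed by the slides; the function names and the example matrix are assumptions.

```python
def step_distribution(mu, P):
    """Return the distribution after one step: (mu P)(y) = sum_x mu(x) P(x, y)."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]


def stationary_by_iteration(P, iterations=200):
    """Approximate a stationary distribution by iterating mu <- mu P.

    Assumes the chain is ergodic, so the iteration converges.
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = step_distribution(mu, P)
    return mu


# Hypothetical two-state chain; solving pi = pi P by hand gives (1/3, 2/3).
P = [[0.5, 0.5], [0.25, 0.75]]
pi = stationary_by_iteration(P)
residual = max(abs(a - b) for a, b in zip(pi, step_distribution(pi, P)))
print(pi, residual)
```

The `residual` measures how far the result is from satisfying the invariance condition exactly.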


• A Markov chain is ergodic if there exists $t$ such that for all $x, y \in \Omega$, $P^t(x, y) > 0$

• It is possible to go from every state to every state in exactly $t$ steps (not necessarily in one move)

• For a finite MC, ergodicity is equivalent to the following two conditions holding together

• Irreducible:

• For all $x, y \in \Omega$, there exists $t = t(x, y)$ such that $P^t(x, y) > 0$

• Aperiodic:

• For all $x \in \Omega$, $\gcd\{t : P^t(x, x) > 0\} = 1$

• Since ergodic MCs converge to a unique stationary distribution regardless of their initial state, they are useful algorithmic tools

Ergodic Markov Chain
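For a small chain, the definition of ergodicity can be tested directly by looking for a power of the transition matrix with all entries positive. The sketch below tracks only which entries are positive. It stops at the power (n-1)^2 + 1, relying on Wielandt's bound for primitive matrices; that stopping rule and the function name `is_ergodic` are assumptions added here, not part of the slides.

```python
def is_ergodic(P):
    """Return True if some power P^t has all entries strictly positive."""
    n = len(P)
    support = [[P[x][y] > 0 for y in range(n)] for x in range(n)]
    power = support  # positivity pattern of P^1
    bound = (n - 1) ** 2 + 1  # Wielandt's bound on the first all-positive power
    for _ in range(bound):
        if all(all(row) for row in power):
            return True
        # Positivity pattern of the next power (boolean matrix product).
        power = [[any(power[x][z] and support[z][y] for z in range(n))
                  for y in range(n)] for x in range(n)]
    return False


print(is_ergodic([[0.5, 0.5], [0.25, 0.75]]))  # all entries already positive
print(is_ergodic([[0.0, 1.0], [1.0, 0.0]]))    # irreducible but periodic
print(is_ergodic([[1.0, 0.0], [0.0, 1.0]]))    # not irreducible
```

The second example shows why aperiodicity is needed: the chain can reach every state, but no single power of the matrix is positive everywhere.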


• Goal: we have a probability distribution from which we would like to generate random samples

• Solution via MC: if we can design an ergodic MC whose unique stationary distribution is the desired distribution, we can run the chain long enough and output its state as an approximate sample from that distribution

• Example: sampling matchings

Algorithmic usage of Ergodic Markov Chains


• For a graph $G = (V, E)$, let $\Omega$ denote the set of matchings of $G$

• We define a MC on $\Omega$ whose transitions are as follows

• Choose an edge $e$ uniformly at random from $E$

• Let $X' = \begin{cases} X_t \cup \{e\}, & \text{if } e \notin X_t \\ X_t \setminus \{e\}, & \text{if } e \in X_t \end{cases}$

• If $X' \in \Omega$, then set $X_{t+1} = X'$ with probability 1/2; otherwise set $X_{t+1} = X_t$

• The MC is aperiodic ($P(M, M) \ge 1/2$ for all $M \in \Omega$)

• The MC is irreducible (every matching can reach, and be reached from, the empty matching) with symmetric transition probabilities

• A symmetric transition matrix has the uniform distribution as a stationary distribution

• Thus, the unique stationary distribution is uniform over all matchings of $G$

Sampling Matching
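The transition rule of the matching chain can be sketched in a few lines of Python. The representation below (edges as vertex pairs, a matching as a `frozenset` of edges) and the names `is_matching`, `matching_chain_step` and `run_chain` are assumptions for illustration. The example graph is a path on three vertices, which has exactly three matchings.

```python
import random


def is_matching(edge_set):
    """Return True if no vertex is covered by more than one edge."""
    seen = set()
    for u, v in edge_set:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def matching_chain_step(current, edges, rng):
    """One transition: toggle a uniformly random edge, accept w.p. 1/2 if valid."""
    e = rng.choice(edges)
    proposal = current - {e} if e in current else current | {e}
    if is_matching(proposal) and rng.random() < 0.5:
        return proposal
    return current


def run_chain(edges, steps, seed=0):
    """Run the chain from the empty matching and count visits to each state."""
    rng = random.Random(seed)
    state = frozenset()
    counts = {}
    for _ in range(steps):
        state = matching_chain_step(state, edges, rng)
        counts[state] = counts.get(state, 0) + 1
    return counts


# Hypothetical example: the path 0 - 1 - 2.
path_edges = [(0, 1), (1, 2)]
visits = run_chain(path_edges, 30000)
for matching, count in visits.items():
    print(sorted(matching), count / 30000)
```

On this tiny graph the visit frequencies should come out close to 1/3 each, consistent with a uniform stationary distribution over matchings.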


• We will prove Theorem 1 using the coupling technique and the Coupling Lemma

Proof of Theorem (introduction)


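• For distributions $\mu, \nu$ on a finite set $\Omega$, a distribution $\omega$ on $\Omega \times \Omega$ is a coupling if

• for all $x \in \Omega$, $\sum_{y \in \Omega} \omega(x, y) = \mu(x)$, and for all $y \in \Omega$, $\sum_{x \in \Omega} \omega(x, y) = \nu(y)$

• In other words, $\omega$ is a joint distribution whose marginal distributions are $\mu$ and $\nu$

• The (total) variation distance between $\mu$ and $\nu$ is defined as

• $d_{TV}(\mu, \nu) = \frac{1}{2} \sum_{z \in \Omega} |\mu(z) - \nu(z)|$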

Coupling Technique
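Both notions on this slide are easy to compute for small distributions. The sketch below assumes the standard definition of variation distance, half the sum of absolute differences, and checks the marginal conditions of a coupling. The function names and the example distributions are illustrative assumptions; the product distribution is used as a simple coupling that always exists.

```python
def total_variation(mu, nu):
    """Half the L1 distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))


def is_coupling(omega, mu, nu, tol=1e-12):
    """Check that omega (a matrix) has row marginals mu and column marginals nu."""
    n = len(mu)
    rows_ok = all(abs(sum(omega[x]) - mu[x]) <= tol for x in range(n))
    cols_ok = all(abs(sum(omega[x][y] for x in range(n)) - nu[y]) <= tol
                  for y in range(n))
    return rows_ok and cols_ok


# Hypothetical distributions on three states.
mu = [0.5, 0.5, 0.0]
nu = [0.25, 0.25, 0.5]
# Independent (product) coupling: omega(x, y) = mu(x) * nu(y).
product = [[m * v for v in nu] for m in mu]
print(total_variation(mu, nu), is_coupling(product, mu, nu))
```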


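• Let $\mu, \nu$ be distributions on a finite set $\Omega$

• (a) For every coupling $\omega$ of $\mu$ and $\nu$ with $(X, Y) \sim \omega$: $\Pr(X \ne Y) \ge d_{TV}(\mu, \nu)$

• (b) There exists a coupling $\omega$ of $\mu$ and $\nu$ with $\Pr(X \ne Y) = d_{TV}(\mu, \nu)$

Coupling Lemma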


Network Intelligence and Analysis Lab

Proof of Lemma (a)


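• For all $z \in \Omega$, let

• $\omega(z, z) = \min\{\mu(z), \nu(z)\}$

• This choice gives $\Pr(X \ne Y) = 1 - \sum_{z \in \Omega} \min\{\mu(z), \nu(z)\} = d_{TV}(\mu, \nu)$

• We need to complete the construction of $\omega$ in a valid way

• For $y, z \in \Omega$, $y \ne z$, let (when $\mu \ne \nu$; if $\mu = \nu$ the diagonal already carries all the mass)

• $\omega(y, z) = \frac{(\mu(y) - \omega(y, y))(\nu(z) - \omega(z, z))}{1 - \sum_{x \in \Omega} \omega(x, x)}$

• It is straightforward to verify that $\omega$ is a valid coupling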

Proof of Lemma (b)
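The construction in this proof can be carried out numerically. The sketch below puts min{μ(z), ν(z)} on the diagonal and, as an assumption about the off-diagonal step, uses the standard choice ω(y, z) = (μ(y) − ω(y, y))(ν(z) − ω(z, z)) / (1 − Σ_x ω(x, x)). The function names and example distributions are illustrative.

```python
def total_variation(mu, nu):
    """Half the L1 distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))


def optimal_coupling(mu, nu):
    """Build a coupling omega of mu and nu with Pr(X != Y) = d_TV(mu, nu)."""
    n = len(mu)
    diag = [min(m, v) for m, v in zip(mu, nu)]
    off_mass = 1.0 - sum(diag)  # total mass left for the off-diagonal entries
    omega = [[0.0] * n for _ in range(n)]
    for z in range(n):
        omega[z][z] = diag[z]
    if off_mass > 0:
        for y in range(n):
            for z in range(n):
                if y != z:
                    omega[y][z] = (mu[y] - diag[y]) * (nu[z] - diag[z]) / off_mass
    return omega


# Hypothetical distributions on three states.
mu = [0.5, 0.5, 0.0]
nu = [0.25, 0.25, 0.5]
omega = optimal_coupling(mu, nu)
disagree = sum(omega[y][z] for y in range(3) for z in range(3) if y != z)
print(disagree, total_variation(mu, nu))
```

The two printed numbers should agree, and the row and column sums of `omega` should reproduce `mu` and `nu`.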


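• Consider a pair of Markov chains $(X_t), (Y_t)$ on $\Omega$ with transition matrices $P_X, P_Y$ respectively

• Typically, the MCs are identical in applications ($P_X = P_Y$)

• The Markov chain $(X'_t, Y'_t)$ on $\Omega \times \Omega$ is a Markovian coupling if, for all $a, b, c \in \Omega$,

• $\Pr(X'_{t+1} = c \mid X'_t = a, Y'_t = b) = P_X(a, c)$ and $\Pr(Y'_{t+1} = c \mid X'_t = a, Y'_t = b) = P_Y(b, c)$

• For such a Markovian coupling (with $P_X = P_Y = P$), the variation distance satisfies

• $d_{TV}(P^t(x, \cdot), P^t(y, \cdot)) \le \Pr(X'_t \ne Y'_t \mid X'_0 = x, Y'_0 = y)$

• If we draw $Y'_0$ from the stationary distribution $\pi$, then we have

• $d_{TV}(P^t(x, \cdot), \pi) \le \Pr(X'_t \ne Y'_t \mid X'_0 = x, Y'_0 \sim \pi)$

• This shows how we can use coupling to bound the distance from stationarity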

Couplings for Markov Chain


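• Create MCs $(X_t), (Y_t)$, where the initial states $X_0, Y_0$ are arbitrary states in $\Omega$

• Create a coupling for these chains in the following way

• From $(X_t, Y_t)$, choose $X_{t+1}$ according to the transition matrix $P$

• If $Y_t = X_t$, set $Y_{t+1} = X_{t+1}$; otherwise choose $Y_{t+1}$ according to $P$, independently of the choice for $X_{t+1}$

• By ergodicity, there exists $t^*$ such that for all $x, y \in \Omega$, $P^{t^*}(x, y) \ge \epsilon > 0$

• Therefore, for all $X_0, Y_0 \in \Omega$,

• $\Pr(X_{t^*} \ne Y_{t^*} \mid X_0, Y_0) \le 1 - \epsilon$

• Similarly, for the step $t^* \to 2t^*$,

• $\Pr(X_{2t^*} \ne Y_{2t^*} \mid X_{t^*} \ne Y_{t^*}) \le 1 - \epsilon$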

Proof of Theorem (1/4)
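The coupling used in this proof (move independently until the two chains meet, then move together) can be simulated to estimate the probability that the chains still differ after t steps. The following is a minimal sketch; the function names, the two-state matrix, and the trial count are assumptions chosen for illustration.

```python
import random


def coupled_step(x, y, P, rng):
    """Advance both chains one step under the 'stick together once met' coupling."""
    states = range(len(P))
    x_next = rng.choices(states, weights=P[x])[0]
    if x == y:
        return x_next, x_next  # already met: Y copies X's move
    y_next = rng.choices(states, weights=P[y])[0]  # independent move for Y
    return x_next, y_next


def estimate_disagreement(P, x0, y0, t, trials, seed=0):
    """Monte Carlo estimate of Pr(X_t != Y_t | X_0 = x0, Y_0 = y0)."""
    rng = random.Random(seed)
    differ = 0
    for _trial in range(trials):
        x, y = x0, y0
        for _step in range(t):
            x, y = coupled_step(x, y, P, rng)
        differ += (x != y)
    return differ / trials


# Hypothetical two-state chain.
P = [[0.5, 0.5], [0.25, 0.75]]
print(estimate_disagreement(P, 0, 1, t=3, trials=20000))
```

For this particular matrix a hand calculation gives a disagreement probability of 0.5 per step while the chains differ, so 0.125 after three steps, while the variation distance between the two 3-step distributions is 0.25^3. The estimate should therefore sit near 0.125 and above the variation distance, as the coupling bound requires.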


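• Once $X_s = Y_s$, we have $X_{s'} = Y_{s'}$ for all $s' \ge s$

• From the earlier observation,

• $\Pr(X_{2t^*} \ne Y_{2t^*} \mid X_0, Y_0) = \Pr(X_{2t^*} \ne Y_{2t^*} \mid X_{t^*} \ne Y_{t^*}) \Pr(X_{t^*} \ne Y_{t^*} \mid X_0, Y_0) \le (1 - \epsilon)^2$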

Proof of Theorem (2/4)


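• For integer $k > 0$,

• $\Pr(X_{kt^*} \ne Y_{kt^*} \mid X_0, Y_0) \le (1 - \epsilon)^k$

• Therefore,

• $\Pr(X_{kt^*} \ne Y_{kt^*} \mid X_0, Y_0) \to 0$ as $k \to \infty$

• Since $X_t = Y_t$ implies $X_{t'} = Y_{t'}$ for all $t' \ge t$, we have

• $\Pr(X_t \ne Y_t \mid X_0, Y_0) \to 0$ as $t \to \infty$

• Note that the coupling of the MCs we defined also defines a coupling of the distributions of $X_t$ and $Y_t$. Hence, by the Coupling Lemma, for all $x, y \in \Omega$,

• $d_{TV}(P^t(x, \cdot), P^t(y, \cdot)) \to 0$ as $t \to \infty$

• This proves that from any initial state we reach the same distribution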

Proof of Theorem (3/4)
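The geometric decay in this argument can be checked numerically on a small example. The sketch below assumes a chain whose one-step matrix is already entrywise positive, so that t* = 1 and ε is the smallest entry of P. It compares the largest variation distance between two rows of P^k with the bound (1 − ε)^k. All names and the example matrix are illustrative assumptions.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_variation(mu, nu):
    """Half the L1 distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))


def max_row_distance(M):
    """Largest variation distance between any two rows of M."""
    return max(total_variation(r, s) for r in M for s in M)


def decay_table(P, max_k):
    """Rows (k, max distance between rows of P^k, (1 - eps)^k), assuming t* = 1."""
    epsilon = min(min(row) for row in P)  # requires every entry of P to be positive
    rows, power = [], P
    for k in range(1, max_k + 1):
        rows.append((k, max_row_distance(power), (1 - epsilon) ** k))
        power = mat_mul(power, P)
    return rows


# Hypothetical two-state chain with all entries positive.
P = [[0.5, 0.5], [0.25, 0.75]]
for k, distance, bound in decay_table(P, 5):
    print(k, distance, bound)
```

In each printed row the distance should not exceed the bound, and both columns shrink toward zero as k grows.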


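• From the previous result, there is a limiting distribution $\sigma$: for all $x, y \in \Omega$, $\lim_{t \to \infty} P^t(x, y) = \sigma(y)$

• Question: is $\sigma$ a stationary distribution? That is, does it satisfy

• for all $y \in \Omega$, $\sigma(y) = \sum_{x \in \Omega} \sigma(x) P(x, y)$

• It does: for any fixed $z \in \Omega$, $\sum_{x \in \Omega} \sigma(x) P(x, y) = \lim_{t \to \infty} \sum_{x \in \Omega} P^t(z, x) P(x, y) = \lim_{t \to \infty} P^{t+1}(z, y) = \sigma(y)$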

Proof of Theorem (4/4)


• Convergence itself is guaranteed if the MC is ergodic

• However, this gives no indication of the convergence rate

• We define the mixing time $\tau_{mix}(\epsilon)$ as the time until the chain is within variation distance $\epsilon$ of $\pi$ from the worst initial state

• $\tau_{mix}(\epsilon) = \max_{X_0 \in \Omega} \min\{t : d_{TV}(P^t(X_0, \cdot), \pi) \le \epsilon\}$

• To get efficient sampling algorithms (e.g. for the matching chain), we hope that the mixing time is polynomial in the input size

Markov Chains for Algorithmic Purpose: Mixing Time
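For a chain small enough to write down explicitly, the mixing time can be computed by brute force straight from the definition. The sketch below does this for each starting state and takes the worst case. The function names, the `max_t` safety cap, and the two-state example with stationary distribution (1/3, 2/3) are assumptions for illustration; this approach is not feasible for the exponentially large state spaces that motivate the analysis.

```python
def step_distribution(mu, P):
    """Return the distribution after one step: (mu P)(y) = sum_x mu(x) P(x, y)."""
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]


def total_variation(mu, nu):
    """Half the L1 distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(m - v) for m, v in zip(mu, nu))


def mixing_time(P, pi, eps, max_t=10000):
    """Max over start states of the first t with d_TV(P^t(x0, .), pi) <= eps."""
    n = len(P)
    worst = 0
    for x0 in range(n):
        dist = [1.0 if y == x0 else 0.0 for y in range(n)]  # point mass at x0
        t = 0
        while total_variation(dist, pi) > eps:
            dist = step_distribution(dist, P)
            t += 1
            if t > max_t:
                raise RuntimeError("did not mix within max_t steps")
        worst = max(worst, t)
    return worst


# Hypothetical two-state chain and its stationary distribution.
P = [[0.5, 0.5], [0.25, 0.75]]
pi = [1.0 / 3.0, 2.0 / 3.0]
print(mixing_time(P, pi, 0.25), mixing_time(P, pi, 0.01))
```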