Markov Chain Basic
2014.07.11
Sanghyuk Chun
Network Intelligence and Analysis Lab
Previous Chapters
• Exact Counting
• #P Complete
• Sampling and Counting
Remaining Chapters
• Markov Chain Basic (today!)
• An ergodic MC has a unique stationary distribution
• Some basic concepts (coupling, mixing time)
• Coupling from the Past
• Coupling in detail
• Ising model
• Bounding mixing time via coupling
• Random spanning trees
• Path coupling framework
• MC for k-colorings of a graph
In this chapter…
• Introduce Markov chains
• Show a potential algorithmic use of Markov chains for sampling from complex distributions
• Prove that an ergodic Markov chain always converges to a unique stationary distribution
• Introduce coupling techniques and mixing time
Markov Chain
• For a finite state space Ω, we say a sequence of random variables (X_t) on Ω is a Markov chain if the sequence is Markovian in the following sense:
• For all t and all x_0, …, x_t, y ∈ Ω, we require
  Pr(X_{t+1} = y | X_0 = x_0, …, X_t = x_t) = Pr(X_{t+1} = y | X_t = x_t)
• The Markov property: "memorylessness"
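As a concrete illustration of the definition (not from the slides), here is a minimal Python sketch of simulating a chain from a transition matrix; the function name `simulate_chain` and the example matrix are my own choices:

```python
import random

def simulate_chain(P, x0, steps, rng=None):
    """Simulate a Markov chain with transition matrix P, starting at x0.

    P is a list of rows; row x is the distribution of X_{t+1} given X_t = x.
    """
    rng = rng or random.Random(0)
    x = x0
    path = [x]
    for _ in range(steps):
        # The next state depends only on the current state x (Markov property).
        x = rng.choices(range(len(P)), weights=P[x])[0]
        path.append(x)
    return path

# A two-state chain: from state 0 move to 1 w.p. 0.3; from 1 move to 0 w.p. 0.2.
P = [[0.7, 0.3], [0.2, 0.8]]
path = simulate_chain(P, x0=0, steps=1000)
```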
Example of a Markov Chain (Card Shuffling)
• For a finite state space Ω, we say a sequence of random variables (X_t) on Ω is a Markov chain if the sequence is Markovian
• Let Ω denote the set of shuffles (e.g. X_1 = (1, 2, 3, …, 52))
• The next shuffling state depends only on the previous shuffling state; that is, X_{t+1} depends only on X_t
• Question 1: How can we shuffle the cards uniformly?
• Question 2: Can we get a fast uniform shuffling algorithm?
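The slides leave the shuffle rule open; one classical choice (my example, not named in the slides) is the random-transposition chain, sketched below:

```python
import random

def transposition_step(deck, rng):
    """One step of the random-transposition chain: swap two uniformly
    random positions (possibly the same, which leaves the deck unchanged)."""
    i = rng.randrange(len(deck))
    j = rng.randrange(len(deck))
    deck[i], deck[j] = deck[j], deck[i]

rng = random.Random(42)
deck = list(range(1, 53))     # X_0 = (1, 2, ..., 52)
for _ in range(10000):        # run the chain; its stationary distribution
    transposition_step(deck, rng)   # is uniform over all 52! orderings
```

Every step is a permutation, so the state is always a valid arrangement of the 52 cards.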
Transition Matrix
• P(x, y) = Pr(X_{t+1} = y | X_t = x)
• Transitions are independent of time (the chain is time-homogeneous)
• The t-step distribution is defined in the natural way:
  P^t(x, y) = P(x, y) if t = 1, and P^t(x, y) = Σ_{z∈Ω} P(x, z) P^{t−1}(z, y) if t > 1
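The recursive definition of the t-step distribution can be transcribed directly; a minimal sketch (function name mine):

```python
def t_step(P, t):
    """P^t(x, y): equal to P(x, y) for t = 1, and to
    sum_z P(x, z) * P^{t-1}(z, y) for t > 1."""
    n = len(P)
    if t == 1:
        return [row[:] for row in P]
    Q = t_step(P, t - 1)
    return [[sum(P[x][z] * Q[z][y] for z in range(n)) for y in range(n)]
            for x in range(n)]

P = [[0.7, 0.3], [0.2, 0.8]]
P2 = t_step(P, 2)   # P2[0][0] = 0.7*0.7 + 0.3*0.2 = 0.55
```

Each row of P^t remains a probability distribution over Ω.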
Stationary Distribution
• A distribution π is a stationary distribution if it is invariant with respect to the transition matrix:
  for all y ∈ Ω, π(y) = Σ_{x∈Ω} π(x) P(x, y)
• Theorem 1
• For a finite ergodic Markov chain, there exists a unique stationary distribution π
• Proof?
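The invariance condition is easy to check numerically; a sketch (function name and example distribution are mine):

```python
def is_stationary(pi, P, tol=1e-9):
    """Check the invariance condition pi(y) = sum_x pi(x) * P(x, y)."""
    n = len(P)
    return all(abs(pi[y] - sum(pi[x] * P[x][y] for x in range(n))) < tol
               for y in range(n))

P = [[0.7, 0.3], [0.2, 0.8]]
stationary = is_stationary([0.4, 0.6], P)       # pi = (0.4, 0.6) is invariant
not_stationary = is_stationary([0.5, 0.5], P)   # the uniform pi is not
```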
Ergodic Markov Chain
• A Markov chain is ergodic if there exists t such that for all x, y ∈ Ω, P^t(x, y) > 0
• It is possible to go from every state to every state (not necessarily in one move)
• For a finite MC, ergodicity is equivalent to the following two conditions holding together:
• Irreducible: for all x, y ∈ Ω, there exists t = t(x, y) such that P^t(x, y) > 0
• Aperiodic: for all x ∈ Ω, gcd{t : P^t(x, x) > 0} = 1
• Since ergodic MCs eventually reach a unique stationary distribution regardless of their initial state, they are useful algorithmic tools
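The definition suggests a direct (if naive) ergodicity check, sketched below; the finite search bound relies on Wielandt's bound for primitive matrices, which is not stated in the slides:

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_ergodic(P):
    """A finite chain is ergodic iff some power P^t is entrywise positive;
    by Wielandt's bound it suffices to check t up to (n-1)^2 + 1."""
    n = len(P)
    Q = [row[:] for row in P]
    for _ in range((n - 1) ** 2 + 1):
        if all(q > 0 for row in Q for q in row):
            return True
        Q = matmul(Q, P)
    return False

ergodic = is_ergodic([[0.7, 0.3], [0.2, 0.8]])   # positive already at t = 1
periodic = is_ergodic([[0.0, 1.0], [1.0, 0.0]])  # period 2: never positive
```

The second chain alternates deterministically between its two states, so it is irreducible but not aperiodic.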
Algorithmic Usage of Ergodic Markov Chains
• Goal: we have a probability distribution we would like to generate random samples from
• Solution via MC: if we can design an ergodic MC whose unique stationary distribution is the desired distribution, we can run the chain and sample from (approximately) that distribution
• Example: sampling matchings
Sampling Matchings
• For a graph G = (V, E), let Ω denote the set of matchings of G
• We define a MC on Ω whose transitions are as follows:
• Choose an edge e uniformly at random from E
• Let X′ = X_t ∪ {e} if e ∉ X_t, and X′ = X_t \ {e} if e ∈ X_t
• If X′ ∈ Ω, then set X_{t+1} = X′ with probability ½; otherwise set X_{t+1} = X_t
• The MC is aperiodic (P(M, M) ≥ 1/2 for all M ∈ Ω)
• The MC is irreducible (any matching can reach any other via the empty matching) with symmetric transition probabilities
• A symmetric transition matrix has a uniform stationary distribution
• Thus, the unique stationary distribution is uniform over all matchings of G
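The transitions above can be sketched directly in Python; the helper names and the 4-cycle example graph are my own choices:

```python
import random

def is_matching(M):
    """A set of edges is a matching if no vertex is covered twice."""
    seen = set()
    for (u, v) in M:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def matching_chain_step(X, edges, rng):
    """One transition of the matchings chain: pick e uniformly from E,
    toggle it to get X', and move to X' with probability 1/2 if X' is
    still a matching; otherwise stay put."""
    e = rng.choice(edges)
    X_new = X - {e} if e in X else X | {e}
    if is_matching(X_new) and rng.random() < 0.5:
        return X_new
    return X

# A 4-cycle has 7 matchings: the empty one, 4 single edges, 2 perfect ones.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
rng = random.Random(0)
X = set()
for _ in range(5000):
    X = matching_chain_step(X, edges, rng)
```

Because every accepted move toggles one edge and invalid moves are rejected, the state is a matching at every step.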
Proof of Theorem (Introduction)
• We will prove the theorem using the coupling technique and the Coupling Lemma
Coupling Technique
• For distributions μ, ν on a finite set Ω, a distribution ω on Ω × Ω is a coupling if
  for all x ∈ Ω, Σ_{y∈Ω} ω(x, y) = μ(x), and for all y ∈ Ω, Σ_{x∈Ω} ω(x, y) = ν(y)
• In other words, ω is a joint distribution whose marginal distributions are μ and ν
• The variation distance between μ and ν is defined as
  d_TV(μ, ν) = (1/2) Σ_{x∈Ω} |μ(x) − ν(x)|
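The variation distance is a one-liner to compute; a sketch with illustrative distributions of my choosing:

```python
def tv_distance(mu, nu):
    """Variation distance: d_TV(mu, nu) = (1/2) * sum_x |mu(x) - nu(x)|."""
    return 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
d = tv_distance(mu, nu)   # 0.5 * (0.3 + 0.0 + 0.3) = 0.3
```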
Coupling Lemma
• Let ω be a coupling of μ and ν, and let (X, Y) be distributed according to ω; then
• (a) d_TV(μ, ν) ≤ Pr(X ≠ Y)
• (b) there exists a coupling ω for which d_TV(μ, ν) = Pr(X ≠ Y)
Proof of Lemma (a)
• Note that d_TV(μ, ν) = max_{A⊆Ω} (μ(A) − ν(A))
• For any event A ⊆ Ω, μ(A) − ν(A) = Pr(X ∈ A) − Pr(Y ∈ A) ≤ Pr(X ∈ A, Y ∉ A) ≤ Pr(X ≠ Y)
• Taking the maximum over A gives d_TV(μ, ν) ≤ Pr(X ≠ Y)
Proof of Lemma (b)
• For all z ∈ Ω, let
  ω(z, z) = min{μ(z), ν(z)}
• With this diagonal, Pr(X ≠ Y) = 1 − Σ_{z∈Ω} min{μ(z), ν(z)} = d_TV(μ, ν)
• We need to complete the construction of ω in a valid way
• For y, z ∈ Ω with y ≠ z, let
  ω(y, z) = (μ(y) − ω(y, y)) (ν(z) − ω(z, z)) / d_TV(μ, ν)
• It is straightforward to verify that ω is a valid coupling
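The construction can be checked numerically: put min(μ(z), ν(z)) on the diagonal and spread the leftover mass proportionally off the diagonal. A sketch (function name and example distributions are mine):

```python
def optimal_coupling(mu, nu):
    """Coupling achieving d_TV: put min(mu(z), nu(z)) on the diagonal and
    spread the leftover mass proportionally over the off-diagonal."""
    n = len(mu)
    d = 0.5 * sum(abs(mu[i] - nu[i]) for i in range(n))   # d_TV(mu, nu)
    w = [[0.0] * n for _ in range(n)]
    for z in range(n):
        w[z][z] = min(mu[z], nu[z])
    if d > 0:
        for y in range(n):
            for z in range(n):
                if y != z:
                    w[y][z] = (mu[y] - w[y][y]) * (nu[z] - w[z][z]) / d
    return w

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
w = optimal_coupling(mu, nu)
# Pr(X != Y) under w is the total off-diagonal mass.
pr_neq = sum(w[y][z] for y in range(3) for z in range(3) if y != z)
```

The checks below confirm both that the marginals are μ and ν and that Pr(X ≠ Y) equals d_TV(μ, ν).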
Couplings for Markov Chains
• Consider a pair of Markov chains X_t, Y_t on Ω with transition matrices P_X, P_Y respectively
• Typically, the two MCs are identical in applications (P_X = P_Y)
• A Markov chain (X′_t, Y′_t) on Ω × Ω is a Markovian coupling if its marginals move as faithful copies of the original chains:
  Pr(X′_{t+1} = x′ | X′_t = x, Y′_t = y) = P_X(x, x′) and Pr(Y′_{t+1} = y′ | X′_t = x, Y′_t = y) = P_Y(y, y′)
• For such a Markovian coupling, the Coupling Lemma bounds the variation distance:
  d_TV(P_X^t(X_0, ·), P_Y^t(Y_0, ·)) ≤ Pr(X′_t ≠ Y′_t)
• If we choose Y_0 according to the stationary distribution π, then Y_t ∼ π for all t, and we have
  d_TV(P^t(X_0, ·), π) ≤ Pr(X′_t ≠ Y′_t)
• This shows how we can use a coupling to bound the distance from stationarity
Proof of Theorem (1/4)
• Create MCs X_t, Y_t, where the initial states X_0, Y_0 are arbitrary states in Ω
• Create a coupling for these chains in the following way:
• From (X_t, Y_t), choose X_{t+1} according to the transition matrix P
• If Y_t = X_t, set Y_{t+1} = X_{t+1}; otherwise choose Y_{t+1} according to P, independently of the choice for X_{t+1}
• By ergodicity, there exists t* such that for all x, y ∈ Ω, P^{t*}(x, y) ≥ ε > 0
• Therefore, for all X_0, Y_0 ∈ Ω, Pr(X_{t*} ≠ Y_{t*}) ≤ 1 − ε
• The same argument applies to the steps from t* to 2t*
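The coupling in this proof (run the chains independently until they meet, then move them together) is easy to simulate; a sketch with a two-state example chain of my choosing:

```python
import random

def coupled_step(x, y, P, rng):
    """One step of the coupling from the proof: advance X by P; if the
    chains have already met, move Y identically, otherwise advance Y by P
    independently of X."""
    n = len(P)
    x_next = rng.choices(range(n), weights=P[x])[0]
    if y == x:
        y_next = x_next
    else:
        y_next = rng.choices(range(n), weights=P[y])[0]
    return x_next, y_next

P = [[0.7, 0.3], [0.2, 0.8]]
rng = random.Random(1)
x, y = 0, 1          # start the two copies in different states
met_at = None
for t in range(1, 10001):
    x, y = coupled_step(x, y, P, rng)
    if met_at is None and x == y:
        met_at = t   # once the chains meet, they stay together forever
```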
Proof of Theorem (2/4)
• Recall the coupling: from (X_t, Y_t), choose X_{t+1} according to the transition matrix P
• If Y_t = X_t, set Y_{t+1} = X_{t+1}; otherwise choose Y_{t+1} according to P, independently of the choice for X_{t+1}
• Once X_s = Y_s, we have X_{s′} = Y_{s′} for all s′ ≥ s
• From the earlier observation, applying the bound on each block of t* steps gives Pr(X_{2t*} ≠ Y_{2t*}) ≤ (1 − ε)²
Proof of Theorem (3/4)
• For integer k > 0, Pr(X_{kt*} ≠ Y_{kt*}) ≤ (1 − ε)^k
• Therefore, Pr(X_{kt*} ≠ Y_{kt*}) → 0 as k → ∞
• Since X_t = Y_t implies X_{t′} = Y_{t′} for all t′ ≥ t, Pr(X_t ≠ Y_t) is non-increasing and tends to 0
• Note that the coupling of MCs we defined induces a coupling of the distributions of X_t and Y_t; hence by the Coupling Lemma, for all x, y ∈ Ω,
  d_TV(P^t(x, ·), P^t(y, ·)) ≤ Pr(X_t ≠ Y_t | X_0 = x, Y_0 = y) → 0
• This proves that from any initial states we reach the same limiting distribution
Proof of Theorem (4/4)
• From the previous result, we have proved there is a limiting distribution σ
• Question: is σ a stationary distribution? That is, does it satisfy
  for all y ∈ Ω, σ(y) = Σ_{x∈Ω} σ(x) P(x, y)?
Markov Chains for Algorithmic Purposes: Mixing Time
• Convergence itself is guaranteed if the MC is ergodic
• However, ergodicity alone gives no indication of the convergence rate
• We define the mixing time τ_mix(ε) as the time until the chain is within variation distance ε of π from the worst initial state:
  τ_mix(ε) = max_{X_0∈Ω} min{ t : d_TV(P^t(X_0, ·), π) ≤ ε }
• To get efficient sampling algorithms (e.g. the matching chain), we hope the mixing time is polynomial in the input size
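For a small chain the mixing time can be computed exactly from the definition by iterating the distribution; a sketch (function names and the example chain are mine):

```python
def tv(p, q):
    """Variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, pi, eps):
    """tau_mix(eps): worst over starting states X0 of the first t with
    d_TV(P^t(X0, .), pi) <= eps, found by iterating the distribution."""
    n = len(P)
    worst = 0
    for x0 in range(n):
        dist = [1.0 if x == x0 else 0.0 for x in range(n)]
        t = 0
        while tv(dist, pi) > eps:
            dist = [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]
            t += 1
        worst = max(worst, t)
    return worst

P = [[0.7, 0.3], [0.2, 0.8]]
tau = mixing_time(P, pi=[0.4, 0.6], eps=0.01)
```

For this two-state chain the distance to π halves each step, so τ_mix(0.01) is small; for large Ω this brute-force approach is infeasible, which is exactly why bounding techniques such as coupling matter.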