characterization of cutoff for reversible markov chainscharacterization of cutoff for reversible...

Characterization of cutoff for reversible Markov chainsYuval Peres

Joint work with Riddhi Basu and Jonathan Hermon

23 Feb 2015

Joint work with Riddhi Basu and Jonathan Hermon Characterization of cutoff for reversible Markov chains 1 / 35

Notation

Transition matrix - P (reversible).

Stationary dist. - π.

Reversibility: π(x)P (x, y) = π(y)P (y, x), ∀x, y ∈ Ω.

Laziness P (x, x) ≥ 1/2, ∀x ∈ Ω.

TV distance

The total-variation distance between 2 dist. µ, ν on Ω is:

‖µ− ν‖TVd= max

A⊂Ωµ(A)− ν(A) .

d(t)d= max

x‖Ptx − π‖TV,

where Ptx(y) := P t(x, y).

The ε-mixing-time (0 < ε < 1) is:

tmix(ε)d= min t : d(t) ≤ ε

tmixd= tmix(1/4).

TV distance

The total-variation distance between 2 dist. µ, ν on Ω is:

‖µ− ν‖TVd= max

A⊂Ωµ(A)− ν(A) .

d(t)d= max

x‖Ptx − π‖TV,

where Ptx(y) := P t(x, y).

The ε-mixing-time (0 < ε < 1) is:

tmix(ε)d= min t : d(t) ≤ ε

tmixd= tmix(1/4).

Cutoff - definition

Def: a sequence of MCs (X(n)t ) exhibits cutoff if

t(n)mix(ε)− t(n)

mix(1− ε) = o(t(n)mix), ∀ 0 < ε < 1/4. (1)

(wn) is called a cutoff window for (X(n)t ) if: wn = o

(t(n)mix

), and

t(n)mix(ε)− t(n)

mix(1− ε) ≤ cεwn, ∀n ≥ 1,∀ε ∈ (0, 1/4).

Cutoff - definition

Def: a sequence of MCs (X(n)t ) exhibits cutoff if

t(n)mix(ε)− t(n)

mix(1− ε) = o(t(n)mix), ∀ 0 < ε < 1/4. (1)

(wn) is called a cutoff window for (X(n)t ) if: wn = o

(t(n)mix

), and

t(n)mix(ε)− t(n)

mix(1− ε) ≤ cεwn, ∀n ≥ 1, ∀ε ∈ (0, 1/4).

Cutoff

Figure : cutoff

Examples

Cutoff was first identified for random transpositions (Diaconis & Shashahani 81).tmix = n logn

2and wn = n . Many other card shuffling schemes have cutoff.

Lazy simple random walk on the hypercube 0, 1n exhibits a cutoff withtmix = n logn

2(half the coupon collector time) and wn = n (Aldous 83).

Biased nearest neighbor random walk on an 1, 2, . . . , n has cutoff, with a biastoward n has cutoff around E1[Tn].

The glauber dynamics of the Ising model in large temperature has cutoff for manyunderlying graphs (LS).

Lazy simple random walk on the n-cycle does not exhibit a cutoff (tmix = Θ(n2)).Same for higher dimensional tori of side length n.

Examples

Background

Many chains are believed to exhibit cutoff. Verifying the occurrence of cutoffrigorously is usually hard.

The name cutoff was coined by Aldous and Diaconis in their seminal 86 paper:

(AD 86) “the most interesting open problem”: Find verifiable conditions for cutoff.

So far cutoff was mostly studied by understanding examples.

Background

Spectral gap & relaxation-time

Let λ2 be the largest non-trivial e.v. of P .

Definition: gap = 1− λ2 - the spectral gap.

Def: trel := 1gap

- the relaxation-time.

The product condition (Prod. cond.)

In a 2004 Aim workshop I proposed that The product condition (Prod. Cond.) -gap(n)t

(n)mix →∞ (equivalently, t(n)

rel = o(t(n)mix))

should imply cutoff for ”nice” reversible chains.

(It is a necessary condition for cutoff)

It is not always sufficient (Aldous and Pak).

Problem: Find families of MCs s.t. Prod. Cond. =⇒ cutoff.

The product condition (Prod. cond.)

In a 2004 Aim workshop I proposed that The product condition (Prod. Cond.) -gap(n)t

(n)mix →∞ (equivalently, t(n)

rel = o(t(n)mix))

should imply cutoff for ”nice” reversible chains.

(It is a necessary condition for cutoff)

It is not always sufficient (Aldous and Pak).

Problem: Find families of MCs s.t. Prod. Cond. =⇒ cutoff.

Aldous’ example

Figure : Fixed bias to the right conditioned on a non-lazy step. Different holding probabilities along 2parallel branches of equal length.

t(n)rel = O(1).

Aldous’ example

Figure : d(t) is up to negligible terms the probability that the rightmost state was not hit by time t(starting from the leftmost state).

Hitting and Mixing

Def: The hitting time of a set A ⊂ Ω = TA := mint : Xt ∈ A.

Hitting times of “worst” sets are related to mixing:

Fact (Random Target): Under laziness and reversibilitytmix ≤ C maxx

∑y π(y)Ex[Ty] ≤ C maxx,y Ex[Ty].

Generalization (Aldous 82 LW 95) under reversibility and laziness

tmix ≤ C maxx,A

π(A)Ex[TA].

Hitting and Mixing

tmix ≤ C maxx,A

π(A)Ex[TA].

Hitting and Mixing

tmix ≤ C maxx,A

π(A)Ex[TA].

Hitting and Mixing

Oliveira (2011) and independently P.-Sousi (2011) (+ an by Griffiths et al. (2012)):for any irreducible reversible lazy MC

tmix = Θ(tH), where tH := maxx,A:π(A)≥1/2

Ex[TA]. (2)

The threshold 1/2 cannot be improved. For two n-cliques connected by a singleedge sets of π measure ≥ 1/2 + ε are hit in constant number of steps.

d(t) ∼ 12P(the other clique was hit by time t) =⇒ maybe we should look at tails

rather than on expectations.

Hitting and Mixing

Oliveira (2011) and independently P.-Sousi (2011) (+ an by Griffiths et al. (2012)):for any irreducible reversible lazy MC

tmix = Θ(tH), where tH := maxx,A:π(A)≥1/2

Ex[TA]. (2)

The threshold 1/2 cannot be improved. For two n-cliques connected by a singleedge sets of π measure ≥ 1/2 + ε are hit in constant number of steps.

d(t) ∼ 12P(the other clique was hit by time t) =⇒ maybe we should look at tails

rather than on expectations.

Hitting and Cutoff

Diaconis & Saloff-Coste (06) (separation cutoff) and Ding-Lubetzky-P. (10) (TVcutoff):

A seq. of birth and death chains exhibits cutoff iff the Prod. Cond. holds.

Both proofs reduce the problem to establishing concentration of hitting times.

We extend their results to weighted nearest-neighbor RWs on trees.

Hitting and Cutoff

Diaconis & Saloff-Coste (06) (separation cutoff) and Ding-Lubetzky-P. (10) (TVcutoff):

A seq. of birth and death chains exhibits cutoff iff the Prod. Cond. holds.

Both proofs reduce the problem to establishing concentration of hitting times.

We extend their results to weighted nearest-neighbor RWs on trees.

Cutoff for trees

Theorem

Let (V, P, π) be a lazy Markov chain on a tree T = (V,E). Then

tmix(ε)− tmix(1− ε) ≤ C| log ε|√treltmix, for any 0 < ε ≤ 1/4.

In particular, the Prod. Cond. implies cutoff with a cutoff window wn =

√t(n)rel t

(n)mix.

DLP (10) - for lazy BD chains tmix(ε)− tmix(1− ε) ≤ O(ε−1√treltmix).

In many cases any cutoff window must be Ω

(√t(n)rel t

(n)mix

Cutoff for trees

Theorem

Let (V, P, π) be a lazy Markov chain on a tree T = (V,E). Then

tmix(ε)− tmix(1− ε) ≤ C| log ε|√treltmix, for any 0 < ε ≤ 1/4.

In particular, the Prod. Cond. implies cutoff with a cutoff window wn =

√t(n)rel t

(n)mix.

DLP (10) - for lazy BD chains tmix(ε)− tmix(1− ε) ≤ O(ε−1√treltmix).

In many cases any cutoff window must be Ω

(√t(n)rel t

(n)mix

A mixing parameter in terms of hitting times

Let 0 < ε < 1.

hit(ε) := mint : Px[TA > t] ≤ ε : for all x and A ⊂ Ω s.t. π(A) ≥ 1/2.

Main abstract result

Definition: A sequence has hit-cutoff if

hit(n)(ε)− hit(n)(1− ε) = o(hit(n)(1/4)) for all 0 < ε < 1/4.

Main abstract result:

Theorem

Let (Ωn, Pn, πn) be a seq. of finite reversible lazy MCs. Then TFAE:

(i) Cutoff.

(ii) hit-cutoff.

We show that (ii) implies the Prod. Cond. (for (i) this is known).

Hence (i)⇐⇒(ii) follows from the following quantitative relation between hit andtmix using hit = Θ(tmix):

Theorem

(i) Cutoff.

(ii) hit-cutoff.

Theorem

(i) Cutoff.

(ii) hit-cutoff.

To mix - escape and then relax

For any reversible irreducible finite lazy chain and any 0 < ε ≤ 1/4,

hit(3ε)− trel| log(2ε)| ≤ tmix(2ε) ≤ hit(ε) + trel| log(2ε)|

Terms involving trel are negligible under the Prod. Cond..

Hitting times when X0 ∼ π

Easy direction: to mix, the chain must first escape from small sets = “first stage ofmixing”.

Fact: Let A ⊂ Ω be such that π(A) ≥ 1/2. Then (under reversibility)

Pπ[TA > 2strel] ≤e−s

2, for all s ≥ 0.

Follows by the spectral decomposition of PAc (the chain killed upon hitting A).

By a coupling argument,

Px[TA > t+ 2strel] ≤ d(t) + Pπ[TA > 2strel].

2, for all s ≥ 0.

The hard direction

We say the chain is ε-mixed w.r.t. A ⊂ Ω at time t+ s if for every x

|Pt+sx (A)− π(A)| ≤ ε.

We say that the event TG ≤ t “serves as a certificate” of being ε-mixedw.r.t. A ⊂ Ω at time t+ s if

|Px[Xt+s ∈ A | TG ≤ t]− π(A)| ≤ ε.

Application of the max ineq.

Claim: Denote t := hit(ε), s := trel × log(2/ε). Then

tmix(2ε) ≤ t+ s.

Proof: want for every A ⊂ Ω to have G = Gs(A) ⊂ Ω s.t. π(G) ≥ 1/2 and TG ≤ tserves as a certificate of “being ε-mixed w.r.t. A” at time t+ s.

Consider

G = Gs(A) :=g : ∀s ≥ s, |Psg(A)− π(A)| ≤ 2e−s/trel = ε

A consequence of Starr’s maximal ineq. is that π(G) ≥ 1/2.

=⇒ Px[TG > t] ≤ ε. (by def. of t).

For any x,A:

|Pt+sx (A)− π(A)| ≤ Px[TG > t] + maxg∈G,s≥s

|Psg(A)− π(A)| ≤ 2ε.

tmix(2ε) ≤ t+ s.

Consider

For any x,A:

|Psg(A)− π(A)| ≤ 2ε.

tmix(2ε) ≤ t+ s.

Consider

For any x,A:

|Psg(A)− π(A)| ≤ 2ε.

tmix(2ε) ≤ t+ s.

Consider

For any x,A:

|Psg(A)− π(A)| ≤ 2ε.

Remark:

The threshold 1/2 in the definition of hit can be replaced by any 0 < α < 1, i.e. ata price of an additive error of trel| log(1− α)| the analysis extends to

hitα,x(ε) := mint : Px[TA > t] ≤ ε : for all A ⊂ Ω s.t. π(A) ≥ α,

hitα(ε) := maxx∈Ω

hitα,x(ε)

If α > 1/2 one needs to assume that the Prod. Cond. holds to maintain theequivalence of the 2 notions of cutoff.

=⇒When trel is known, it suffices to consider escape probabilities from small setsin order to bound the mixing-time.

Remark:

The threshold 1/2 in the definition of hit can be replaced by any 0 < α < 1, i.e. ata price of an additive error of trel| log(1− α)| the analysis extends to

hitα,x(ε) := mint : Px[TA > t] ≤ ε : for all A ⊂ Ω s.t. π(A) ≥ α,

hitα(ε) := maxx∈Ω

hitα,x(ε)

If α > 1/2 one needs to assume that the Prod. Cond. holds to maintain theequivalence of the 2 notions of cutoff.

=⇒When trel is known, it suffices to consider escape probabilities from small setsin order to bound the mixing-time.

Def: For f ∈ RΩ, t ≥ 0, define P tf ∈ RΩ byP tf(x) := Ex[f(Xt)] =

P t(x, y)f(y).

For f ∈ RΩ define Eπ[f ] :=∑x∈Ω

π(x)f(x) and ‖f‖22 := Eπ[f2].

For g ∈ RΩ denote Varπg := ‖g − Eπ[g]‖22.

The following is well-known and follows from elementary linear-algebra.

Lemma (Contraction Lemma)

Let (Ω, P, π) be a finite rev. irr. lazy MC. Let A ⊂ Ω. Let t ≥ 0. Then

VarπPt1A ≤ e−2t/trel/4. (3)

P t(x, y)f(y).

π(x)f(x) and ‖f‖22 := Eπ[f2].

P t(x, y)f(y).

π(x)f(x) and ‖f‖22 := Eπ[f2].

P t(x, y)f(y).

π(x)f(x) and ‖f‖22 := Eπ[f2].

Maximal Inequality

The main ingredient in our approach is Starr’s maximal-inequality (66) (refines Stein’smax-inequality (61))

Theorem (Maximal inequality)

Let (Ω, P, π) be a lazy irreducible reversible Markov chain. Let f ∈ RΩ. Define thecorresponding maximal function f∗ ∈ RΩ as

f∗(x) := sup0≤k<∞

|P k(f)(x)| = sup0≤k<∞

|Ex[f(Xk)]|.

Then for 1 < p <∞,‖f∗‖p ≤ q‖f‖p 1/p+ 1/q = 1. (4)

Applying this (with p = 2) to fs := P s(1A − π(A)) in conjunction with the ContractionLemma yields the desired bound on the size of

Gc ⊂ (f∗s )2 > 8‖fs‖22 ⊂ (f∗s )2 > 2‖f∗s ‖22.

Maximal Inequality

The main ingredient in our approach is Starr’s maximal-inequality (66) (refines Stein’smax-inequality (61))

Theorem (Maximal inequality)

Let (Ω, P, π) be a lazy irreducible reversible Markov chain. Let f ∈ RΩ. Define thecorresponding maximal function f∗ ∈ RΩ as

f∗(x) := sup0≤k<∞

|P k(f)(x)| = sup0≤k<∞

|Ex[f(Xk)]|.

Then for 1 < p <∞,‖f∗‖p ≤ q‖f‖p 1/p+ 1/q = 1. (4)

Applying this (with p = 2) to fs := P s(1A − π(A)) in conjunction with the ContractionLemma yields the desired bound on the size of

Gc ⊂ (f∗s )2 > 8‖fs‖22 ⊂ (f∗s )2 > 2‖f∗s ‖22.

Let: T := (V,E) be a finite tree.

(V, P, π) a lazy MC corresponding to some (lazy) weighted nearest-neighbor walkon T (i.e. P (x, y) > 0 iff x, y ∈ E or y = x).

Fact: (Kolmogorov’s cycle condition) every MC on a tree is reversible.

Can the tree structure be used to determine the identity of the “worst” sets?

Easier question: what set of π measure ≥ 1/2 is the “hardest” to hit in a birth &death chain with state space [n] := 1, 2, . . . , n ?

Answer: take a state m with π([m]) ≥ 1/2 and π([m− 1]) < 1/2. Then the setworst set would be either [m] or [n] \ [m− 1].

How to generalize this to trees?

Central vertex

∏(Ci)≤1/2for all i

Figure : A vertex o ∈ V is called a central-vertex if each connected component of T \ o hasstationary probability at most 1/2.

There is always a central-vertex (and at most 2). We fix one, denote it by o andcall it the root.

There is a simple reduction: To is concentrated (from worst leaf) =⇒ hit-cutoff.

There is always a central-vertex (and at most 2). We fix one, denote it by o andcall it the root.

There is a simple reduction: To is concentrated (from worst leaf) =⇒ hit-cutoff.

It suffices to show that Ex[To] = Ω(tmix) =⇒ To is concentrated if X0 = x.

x=v0v1

Figure : Let v0 = x,v1, . . . , vk = o be the vertices along the path from x to o.

Proof of Concentration: Varx[To] ≤ Ctreltmix:

It suffices to show that Varx[To] ≤ 4trelEx[To].

If X0 = x then To is the sum of crossing times of the edges along the pathbetween x: τi := Tvi − Tvi−1

d= Tvi underX0 = vi−1

τ1, . . . , τk are independent =⇒ it suffices to bound the sum of their variancesVarx[To] =

∑Varx[τi].

The law of each τi is a mixture of geometrics with parameters ≥ 1/2trel =⇒∑Varx[τi] ≤

∑4trelEx[τi] = 4trelEx[To].

∑Varx[τi].

Beyond trees

The tree assumption can be relaxed. We can treat jumps to vertices of boundeddistance on a tree (i.e. the length of the path from u to v in the tree (which is nowjust an auxiliary structure) is > r =⇒ P (u, v) = 0) under some mild necessaryassumption.

Previously the BD assumption could not be relaxed mainly due to it beingexploited via a representation of hitting times result for BD chains.

In particular, if P (u, v) ≥ δ > 0 for all u, v s.t. dT (u, v) ≤ r (and otherwiseP (u, v) = 0), then

cutoff⇐⇒ the Prod. Cond. holds.

Beyond trees

Open problems

What can be said about the geometry of the “worst” sets in some interestingparticular cases (say, transitivity or monotonicity)?

When can the worst sets be described as |f2| ≤ C (Pf2 = λ2f2)?

Is it true that for transitive graphs the mixing time is bounded by the expected timeit takes the walk to escape a ball containing at least half the vertices?

Does the product condition imply cutoff for transitive graphs under certain mildconditions?

Does contraction implies cutoff? (under some mild conditions...)

Open problems

characterization of cutoff for reversible markov chainscharacterization of cutoff for reversible...

Documents