Stability of adaptive random-walk Metropolis algorithms


DESCRIPTION

"Stability of adaptive random-walk Metropolis algorithms", M. Vihola's talk at BigMC, 5th May 2011

TRANSCRIPT

Page 1: Stability of adaptive random-walk Metropolis algorithms

Stability of adaptive random-walk Metropolis algorithms

Matti Vihola ([email protected])
University of Jyväskylä, Department of Mathematics and Statistics

Joint work with Eero Saksman (University of Helsinki)

Paris, May 5th 2011

Page 2: Adaptive MCMC

◮ Adaptive MCMC algorithms have been successfully applied to ‘automatise’ the tuning of proposal distributions.

◮ In general, they are based on a collection of Markov kernels \(\{P_\theta\}_{\theta \in \Theta}\), each having \(\pi\) as the unique invariant distribution.

◮ The aim is to find a good kernel (a good value of \(\theta\)) automatically.

◮ The stochastic approximation framework is the most common way to construct such algorithms [Andrieu and Robert, 2001].

Page 3: Stochastic approximation framework

◮ One starts with some \((X_1, \Theta_1) \equiv (x_1, \theta_1) \in \mathbb{R}^d \times \mathbb{R}^e\), and for \(n \ge 2\),
\[
X_n \sim P_{\Theta_{n-1}}(X_{n-1}, \,\cdot\,), \qquad
\Theta_n = \Theta_{n-1} + \gamma_n H(\Theta_{n-1}, X_n),
\]
where \((\gamma_n)_{n \ge 2}\) is a non-negative, vanishing step size sequence and the function \(H : \mathbb{R}^e \times \mathbb{R}^d \to \mathbb{R}^e\) determines the adaptation.

◮ The adaptation aims at \(\Theta_n \to \theta_*\) such that \(h(\theta_*) = 0\), where
\[
h(\theta) := \int H(\theta, x) \, \pi(\mathrm{d}x).
\]
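As a concrete illustration (an addition, not from the talk), a minimal Python sketch of this recursion; the names sample_kernel, H and gamma are placeholders for the kernel sampler, adaptation function and step size sequence:

```python
import numpy as np

def adaptive_mcmc(sample_kernel, H, x1, theta1, n_iter, gamma, seed=0):
    """Generic stochastic-approximation adaptive MCMC loop (a sketch).

    sample_kernel(theta, x, rng) draws X_n ~ P_theta(X_{n-1}, .);
    H(theta, x) is the adaptation function; gamma(n) the step size.
    """
    rng = np.random.default_rng(seed)
    x, theta = np.asarray(x1, float), np.asarray(theta1, float)
    xs = [x.copy()]
    for n in range(2, n_iter + 1):
        x = sample_kernel(theta, x, rng)        # X_n ~ P_{Theta_{n-1}}(X_{n-1}, .)
        theta = theta + gamma(n) * H(theta, x)  # Theta_n = Theta_{n-1} + gamma_n H(Theta_{n-1}, X_n)
        xs.append(x.copy())
    return np.array(xs), theta
```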

Page 4: Stability? I

◮ Usually, one can identify a Lyapunov function \(w : \mathbb{R}^e \to [0, \infty)\) such that
\[
\langle \nabla w(\theta), h(\theta) \rangle < 0 \quad \text{whenever } h(\theta) \neq 0.
\]

◮ This suggests that, “on average”, there should be a “drift” of \(\Theta_n\) towards a \(\theta_*\) such that \(h(\theta_*) = 0\).

◮ To have sufficient “averaging”, the sequence of Markov kernels \(P_{\Theta_n}\) should have nice properties.

◮ To have nice properties of \(P_{\Theta_n}\), the sequence \((\Theta_n)_{n \ge 1}\) should behave nicely...

Page 5: Stability? II

◮ The usual way to overcome the stability issue is to enforce stability, or to use the adaptive reprojections approach of Andrieu and Moulines [2006] and Andrieu, Moulines, and Priouret [2005].

◮ There is a strong belief (based on overwhelming empirical evidence) that many algorithms are intrinsically stable (and do not require stabilisation).

Page 6: Adaptive Metropolis algorithm I

The AM algorithm [Haario, Saksman, and Tamminen, 2001]:

◮ Define \(X_1 \equiv x_1 \in \mathbb{R}^d\) such that \(\pi(x_1) > 0\).

◮ Choose parameters \(t > 0\) and \(\epsilon \ge 0\).

For \(n = 2, 3, \ldots\),
\[
Y_n = X_{n-1} + \Sigma_{n-1}^{1/2} W_n, \quad \text{where } W_n \sim N(0, I), \text{ and}
\]
\[
X_n = \begin{cases} Y_n, & \text{with probability } \alpha(X_{n-1}, Y_n), \\ X_{n-1}, & \text{otherwise,} \end{cases}
\]
where \(\Sigma_{n-1} := t^2 \operatorname{Cov}(X_1, \ldots, X_{n-1}) + \epsilon I\), and \(I \in \mathbb{R}^{d \times d}\) stands for the identity matrix. (Here \(\alpha(x, y) := \min\{1, \pi(y)/\pi(x)\}\) is the usual Metropolis acceptance probability for this symmetric proposal.)

Page 7: Adaptive Metropolis algorithm II

\(\operatorname{Cov}(\cdots)\) is some consistent covariance estimator (not necessarily the standard sample covariance!).

◮ In what follows, consider the definition \(\operatorname{Cov}(X_1, \ldots, X_n) := S_n\), where \((M_1, S_1) \equiv (x_1, s_1)\) with \(s_1 \in \mathbb{R}^{d \times d}\) symmetric and positive definite, and
\[
M_n = (1 - \gamma_n) M_{n-1} + \gamma_n X_n,
\]
\[
S_n = (1 - \gamma_n) S_{n-1} + \gamma_n (X_n - M_{n-1})(X_n - M_{n-1})^T.
\]
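A minimal Python sketch of the AM chain with this recursive estimator (an added illustration; \(\gamma_n = 1/n\), \(s_1 = I\) and the other defaults are assumptions, not prescriptions from the talk):

```python
import numpy as np

def am_chain(log_pi, x1, n_iter, t=None, eps=1e-6, seed=0):
    """Sketch of the AM chain with the recursive estimator (M_n, S_n)
    defined above; gamma_n = 1/n and s_1 = I are illustrative choices."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x1, dtype=float)
    d = x.size
    t = 2.38 / np.sqrt(d) if t is None else t  # common rule of thumb
    m, s = x.copy(), np.eye(d)                 # (M_1, S_1) = (x_1, I)
    xs = [x.copy()]
    for n in range(2, n_iter + 1):
        sigma = t**2 * s + eps * np.eye(d)     # Sigma_{n-1} = t^2 S_{n-1} + eps I
        y = x + np.linalg.cholesky(sigma) @ rng.standard_normal(d)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):  # accept w.p. alpha(x, y)
            x = y
        g = 1.0 / n                            # gamma_n
        m_prev = m.copy()
        m = (1 - g) * m + g * x                # M_n
        s = (1 - g) * s + g * np.outer(x - m_prev, x - m_prev)  # S_n
        xs.append(x.copy())
    return np.array(xs)
```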

Page 8: Example run of AM

Figure: The first 10, 100 and 1000 samples of the AM algorithm started with \(s_1 = (0.01)^2 I\), \(t = 2.38/\sqrt{2}\), \(\epsilon = 0\) and \(\gamma_n := n^{-1}\). [Three scatter plots omitted; axes roughly \(x \in [-2, 2]\), \(y \in [-10, 0]\).]

Page 9: Adaptive scaling Metropolis algorithm I

This algorithm, essentially proposed by Gilks, Roberts, and Sahu [1998] and Andrieu and Robert [2001], adjusts the size of the proposal jumps and tries to attain a given mean acceptance rate.

◮ Define \(X_1 \equiv x_1 \in \mathbb{R}^d\) such that \(\pi(x_1) > 0\).

◮ Let \(\Theta_1 \equiv \theta_1 > 0\).

◮ Define the desired mean acceptance rate \(\alpha_* \in (0, 1)\). (Usually \(\alpha_* = 0.44\) or \(\alpha_* = 0.234\)...)

Page 10: Adaptive scaling Metropolis algorithm II

For \(n = 2, 3, \ldots\), iterate
\[
Y_n = X_{n-1} + \Theta_{n-1} W_n, \quad \text{where } W_n \sim N(0, I), \text{ and}
\]
\[
X_n = \begin{cases} Y_n, & \text{with probability } \alpha(X_{n-1}, Y_n), \\ X_{n-1}, & \text{otherwise,} \end{cases}
\]
\[
\log \Theta_n = \log \Theta_{n-1} + \gamma_n \big[ \alpha(X_{n-1}, Y_n) - \alpha_* \big].
\]
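A sketch of this iteration in Python (added for illustration; \(\gamma_n = n^{-3/4}\) matches the example run on the next page, the rest is assumed):

```python
import numpy as np

def asm_chain(log_pi, x1, n_iter, theta1=1.0, alpha_star=0.44, seed=0):
    """Sketch of the adaptive scaling Metropolis recursion above."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x1, dtype=float)
    log_theta = np.log(theta1)
    xs = [x.copy()]
    for n in range(2, n_iter + 1):
        y = x + np.exp(log_theta) * rng.standard_normal(x.size)
        alpha = min(1.0, np.exp(log_pi(y) - log_pi(x)))   # acceptance probability
        if rng.uniform() < alpha:
            x = y
        log_theta += n**-0.75 * (alpha - alpha_star)      # log Theta_n update
        xs.append(x.copy())
    return np.array(xs)
```

Note that the update uses the acceptance probability \(\alpha(X_{n-1}, Y_n)\) itself, not the accept/reject indicator.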

Page 11: Example run of ASM

Figure: The first 10, 100 and 1000 samples of the ASM algorithm started with \(\theta_1 = 0.01\) and using \(\gamma_n = n^{-3/4}\). [Three scatter plots omitted; axes roughly \(x \in [-2, 2]\), \(y \in [-10, 0]\).]

Page 12: Adaptive scaling within adaptive Metropolis

The AM and ASM adaptations can be used simultaneously [Atchade and Fort, 2010, Andrieu and Thoms, 2008]:
\[
Y_n = X_{n-1} + \Theta_{n-1} \Sigma_{n-1}^{1/2} W_n, \quad \text{where } W_n \sim N(0, I), \text{ and}
\]
\[
X_n = \begin{cases} Y_n, & \text{with probability } \alpha(X_{n-1}, Y_n), \\ X_{n-1}, & \text{otherwise,} \end{cases}
\]
\[
\log \Theta_n = \log \Theta_{n-1} + \gamma_n \big[ \alpha(X_{n-1}, Y_n) - \alpha_* \big],
\]
\[
M_n = (1 - \gamma_n) M_{n-1} + \gamma_n X_n,
\]
\[
S_n = (1 - \gamma_n) S_{n-1} + \gamma_n (X_n - M_{n-1})(X_n - M_{n-1})^T,
\]
\[
\Sigma_n = t^2 S_n + \epsilon I.
\]
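A sketch of one combined iteration (added; the step size \(\gamma_n = n^{-3/4}\) and the name aswam_step are illustrative assumptions):

```python
import numpy as np

def aswam_step(x, log_theta, m, s, n, log_pi, rng,
               alpha_star=0.234, t=1.0, eps=1e-6):
    """One adaptive-scaling-within-AM iteration, as defined above."""
    d = x.size
    chol = np.linalg.cholesky(t**2 * s + eps * np.eye(d))  # Sigma_{n-1}^{1/2}
    y = x + np.exp(log_theta) * (chol @ rng.standard_normal(d))
    alpha = min(1.0, np.exp(log_pi(y) - log_pi(x)))
    if rng.uniform() < alpha:
        x = y
    g = n**-0.75                                  # gamma_n (assumed)
    log_theta += g * (alpha - alpha_star)         # scaling update
    m_prev = m.copy()
    m = (1 - g) * m + g * x                       # M_n
    s = (1 - g) * s + g * np.outer(x - m_prev, x - m_prev)  # S_n
    return x, log_theta, m, s
```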

Page 13: Robust adaptive Metropolis algorithm I

The RAM algorithm is a multidimensional ‘extension’ of the ASM adaptation mechanism [Vihola, 2010b].

◮ Define \(X_1 \equiv x_1 \in \mathbb{R}^d\) such that \(\pi(x_1) > 0\).

◮ Let \(\Sigma_1 \equiv s_1 \in \mathbb{R}^{d \times d}\) be symmetric and positive definite.

◮ Define the desired mean acceptance rate \(\alpha_* \in (0, 1)\).

Page 14: Robust adaptive Metropolis algorithm II

For \(n = 2, 3, \ldots\), iterate
\[
Y_n = X_{n-1} + \Sigma_{n-1}^{1/2} W_n, \quad \text{where } W_n \sim N(0, I), \text{ and}
\]
\[
X_n = \begin{cases} Y_n, & \text{with probability } \alpha(X_{n-1}, Y_n), \\ X_{n-1}, & \text{otherwise,} \end{cases}
\]
\[
\Sigma_n = \Sigma_{n-1} + \gamma_n \big[ \alpha(X_{n-1}, Y_n) - \alpha_* \big] \, \Sigma_{n-1}^{1/2} \frac{W_n W_n^T}{\|W_n\|^2} \Sigma_{n-1}^{1/2},
\]
where \(\Sigma_n\) will be symmetric and positive definite whenever \(\gamma_n \big[ \alpha(X_{n-1}, Y_n) - \alpha_* \big] > -1\).
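A sketch of one RAM iteration (added for illustration). Here \(\Sigma_n\) is re-factorised with a full \(O(d^3)\) Cholesky for simplicity; a rank-one Cholesky update would give the \(O(d^2)\) cost mentioned later:

```python
import numpy as np

def ram_step(x, L, n, log_pi, rng, alpha_star=0.234):
    """One RAM iteration; L is any factor with Sigma_{n-1} = L L^T.
    gamma_n = min(1, 2 n^{-3/4}) matches the example run below."""
    g = min(1.0, 2.0 * n**-0.75)
    w = rng.standard_normal(x.size)
    y = x + L @ w                              # Y_n = X_{n-1} + Sigma_{n-1}^{1/2} W_n
    alpha = min(1.0, np.exp(log_pi(y) - log_pi(x)))
    if rng.uniform() < alpha:
        x = y
    u = (L @ w) / np.linalg.norm(w)            # Sigma_{n-1}^{1/2} W_n / ||W_n||
    sigma = L @ L.T + g * (alpha - alpha_star) * np.outer(u, u)
    # Positive definite since g * (alpha - alpha_star) > -1 here.
    return x, np.linalg.cholesky(sigma)
```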

Page 15: Robust adaptive Metropolis algorithm III

Figure: The RAM update with \(\gamma_n [\alpha(X_{n-1}, Y_n) - \alpha_*] = \pm 0.8\), respectively. The ellipsoids defined by \(\Sigma_{n-1}\) (\(\Sigma_n\)) are drawn in solid (dashed), and the vector \(\Sigma_{n-1}^{1/2} W_n / \|W_n\|\) as a dot. [Illustration omitted.]

Page 16: Example run of RAM

Figure: The first 10, 100 and 1000 samples of the RAM algorithm started with \(\Sigma_1 = 0.01 \times I\) and using \(\gamma_n = \min\{1, 2 \cdot n^{-3/4}\}\). [Three scatter plots omitted; axes roughly \(x \in [-2, 2]\), \(y \in [-10, 0]\).]

Page 17: RAM vs. ASwAM

◮ Computationally, the two are of similar complexity (a Cholesky update of order \(O(d^2)\) per iteration).

◮ The RAM adaptation does not require that \(\pi\) has a finite second moment (or any moments at all).

◮ In RAM, the adaptation is ‘direct’, while estimating the covariance is ‘indirect’ (as it also requires a mean estimate).

◮ There are also situations where ASwAM seems to be more efficient than RAM...

Page 18: Previous conditions on validity I

◮ What conditions are sufficient to guarantee that a law of large numbers (LLN) holds:
\[
\frac{1}{n} \sum_{k=1}^{n} f(X_k) \xrightarrow{n \to \infty} \int f(x) \, \pi(x) \, \mathrm{d}x \;?
\]

◮ Key conditions due to Roberts and Rosenthal [2007]:

Diminishing adaptation (DA): the effect of the adaptation becomes smaller and smaller,
\[
\| P_{\Theta_n} - P_{\Theta_{n-1}} \| \xrightarrow{n \to \infty} 0.
\]

Containment (C): the kernels \(P_{\Theta_n}\) have, at all times, sufficiently good mixing properties:
\[
\Big( \sup_{n \ge 1} \sup_{k \ge m} \| P_{\Theta_n}^k (X_n, \,\cdot\,) - \pi \| \Big) \xrightarrow{m \to \infty} 0.
\]

Page 19: Previous conditions on validity II

◮ DA is usually easier to verify.

◮ Containment is often tightly related to the stability of the process \((\Theta_n)_{n \ge 1}\).

◮ Establishing containment (or stability) is often difficult before showing that a LLN holds...

◮ The typical solution has been to enforce containment.

◮ In the case of the AM, ASM and RAM algorithms, this means that the eigenvalues of \(\Sigma_n\), \(\Theta_n\) are constrained to lie within \([a, b]\) for some constants \(0 < a \le b < \infty\).

Page 20: Previous conditions on validity III

There are also other results in the literature on the ergodicity of adaptive MCMC under conditions similar to DA and C:

◮ The original work on AM [Haario, Saksman, and Tamminen, 2001].

◮ Atchade and Rosenthal [2005] analyse the ASM algorithm following the original mixingale approach.

◮ The results on the adaptive reprojections approach: Andrieu and Moulines [2006], Andrieu, Moulines, and Priouret [2005].

◮ The recent paper by Atchade and Fort [2010] is based on a resolvent kernel approach.

Page 21: New results I

Key ideas in the new results:

◮ Containment is, in fact, not a necessary condition for a LLN.

◮ That is, the ergodic properties of \(P_{\Theta_n}\) can become more and more unfavourable, and still a (strong) LLN can hold.

◮ The adaptation mechanism may include a strong explicit ‘drift’ away from ‘bad values’ of \(\theta\).

Page 22: General ergodicity result I

◮ Assume \(K_1 \subset K_2 \subset \cdots \subset \mathbb{R}^e\) are increasing subsets of the adaptation space. Assume that the adaptation follows the stochastic approximation dynamic
\[
X_n \sim P_{\Theta_{n-1}}(X_{n-1}, \,\cdot\,), \qquad
\Theta_n = \Theta_{n-1} + \gamma_n H(\Theta_{n-1}, X_n),
\]
and that the adaptation parameter satisfies \(\Theta_n \in K_n\) for all \(n \ge 1\).

◮ One may also enforce \(\Theta_n \in K_n\)...

◮ Consider the following conditions for constants \(c < \infty\) and \(0 \le \epsilon \ll 1\):

Page 23: General ergodicity result II

(A1) Drift and minorisation:

There is a drift function \(V : \mathbb{R}^d \to [1, \infty)\) such that for all \(n \ge 1\) and all \(\theta \in K_n\),
\[
P_\theta V(x) \le \lambda_n V(x) + \mathbf{1}_{C_n}(x) \, b_n \quad \text{and} \tag{1}
\]
\[
P_\theta(x, A) \ge \mathbf{1}_{C_n}(x) \, \delta_n \nu_n(A), \tag{2}
\]
where \(C_n \subset \mathbb{R}^d\) are Borel sets, \(\delta_n, \lambda_n \in (0, 1)\) and \(b_n < \infty\) are constants, and \(\nu_n\) is concentrated on \(C_n\). Furthermore, the constants are polynomially bounded so that
\[
(1 - \lambda_n)^{-1} \vee \delta_n^{-1} \vee b_n \le c \, n^{\epsilon}.
\]

Page 24: General ergodicity result III

(A2) Continuity:

For all \(n \ge 1\) and any \(r \in (0, 1]\), there is \(c' = c'(r) \ge 1\) such that for all \(\theta, \theta' \in K_n\),
\[
\| P_\theta f - P_{\theta'} f \|_{V^r} \le c' n^{\epsilon} \, \| f \|_{V^r} \, |\theta - \theta'|.
\]

(A3) Bound for the adaptation function:

There is a \(\beta \in [0, 1/2]\) such that for all \(n \ge 1\), \(\theta \in K_n\) and \(x \in \mathbb{R}^d\),
\[
|H(\theta, x)| \le c \, n^{\epsilon} V^{\beta}(x).
\]

Page 25: General ergodicity result IV

Theorem (Saksman and Vihola [2010])

Assume (A1)–(A3) hold and let \(f\) be a function with \(\| f \|_{V^{\alpha}} < \infty\) for some \(\alpha \in (0, 1 - \beta)\). Assume \(\epsilon < \kappa_*^{-1} \big[ (1/2) \wedge (1 - \alpha - \beta) \big]\), where \(\kappa_* \gg 1\) is an independent constant, and that \(\sum_{k=1}^{\infty} k^{\kappa_* \epsilon - 1} \gamma_k < \infty\). Then,
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \int f(x) \, \pi(x) \, \mathrm{d}x \quad \text{almost surely.}
\]

Page 26: Verifiable assumptions I

Definition (Strongly super-exponential target)

The target density \(\pi\) is continuously differentiable and has regular tails that decay super-exponentially:
\[
\limsup_{|x| \to \infty} \frac{x}{|x|} \cdot \frac{\nabla \pi(x)}{|\nabla \pi(x)|} < 0
\qquad \text{and} \qquad
\lim_{|x| \to \infty} \frac{x}{|x|^{\rho}} \cdot \nabla \log \pi(x) = -\infty,
\]
for some \(\rho > 1\).

(The “super-exponential case”, \(\rho = 1\), is due to Jarner and Hansen [2000], who use it to ensure the geometric ergodicity of a nonadaptive random-walk Metropolis process.)
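As a quick added check (not from the slides): the standard Gaussian target \(\pi(x) \propto e^{-|x|^2/2}\) is strongly super-exponential, since \(\nabla \log \pi(x) = -x\) and \(\nabla \pi(x) = -x \, \pi(x)\) give
\[
\frac{x}{|x|} \cdot \frac{\nabla \pi(x)}{|\nabla \pi(x)|} = \frac{x}{|x|} \cdot \Big( -\frac{x}{|x|} \Big) = -1 < 0
\qquad \text{and} \qquad
\frac{x}{|x|^{\rho}} \cdot \nabla \log \pi(x) = -|x|^{2 - \rho} \xrightarrow{|x| \to \infty} -\infty
\]
for any \(\rho \in (1, 2)\).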

Page 27: Verifiable assumptions II

Figure: The condition \(\limsup_{|x| \to \infty} \frac{x}{|x|} \cdot \frac{\nabla \pi(x)}{|\nabla \pi(x)|} < 0\) implies that there is an \(\varepsilon > 0\) such that for any sufficiently large \(|x|\), the angle \(\alpha < \pi/2 - \varepsilon\). [Sketch omitted; it shows a point \(x\), the origin \(0\), the gradient \(\nabla \pi(x)\) and the angle \(\alpha\).]

Page 28: AM when π has unbounded support

◮ Saksman and Vihola [2010]: the first ergodicity results for the original AM algorithm, without upper bounding the eigenvalues \(\lambda(\Sigma_n)\).

◮ Use the proposal covariance \(\Sigma_n = t^2 S_n + \epsilon I\), with \(\epsilon > 0\).

◮ SLLN and CLT for strongly super-exponential \(\pi\) and functions satisfying \(\sup_x |f(x)| \pi^{-p}(x) < \infty\) for some \(p \in (0, 1)\).

Page 29: AM without the covariance lower bound I

Vihola [2011b] contains partial results on the case \(\Sigma_n = t^2 S_n\), that is, without lower bounding the eigenvalues \(\lambda(\Sigma_n)\).

◮ First, an ‘adaptive random walk’ is analysed: AM run with the ‘flat target’ \(\pi \equiv 1\),
\[
X_{n+1} = X_n + t S_n^{1/2} W_{n+1},
\]
\[
M_{n+1} = (1 - \gamma_{n+1}) M_n + \gamma_{n+1} X_{n+1},
\]
\[
S_{n+1} = (1 - \gamma_{n+1}) S_n + \gamma_{n+1} (X_{n+1} - M_n)^2.
\]

◮ Ten sample paths of this (univariate) process started at \(x_0 = 0\), \(s_0 = 1\) and with \(t = 0.01\) and \(\gamma_n = n^{-1}\) are shown on the next two pages:

Page 30: AM without the covariance lower bound II

[Figure: ten sample paths of \(X_n\) (top panel, values within about \(\pm 0.1\)) and \(S_n\) (bottom panel, log scale from \(10^{-4}\) to \(10^{0}\)) over the first 10,000 iterations.]

Page 31: AM without the covariance lower bound III

[Figure: the same sample paths continued to \(2 \times 10^6\) iterations; \(X_n\) now ranges over hundreds, and \(S_n\) is shown on a log scale from \(10^{-5}\) to \(10^{5}\).]

Page 32: AM without the covariance lower bound IV

◮ It is shown that \(S_n \to \infty\) almost surely.

◮ The speed of growth is \(\mathbb{E}[S_n] \sim \exp\big( t \sum_{i=2}^{n} \sqrt{\gamma_i} \big)\). (In particular, \(\mathbb{E}[S_n] \sim e^{2 t \sqrt{n}}\) when \(\gamma_n = n^{-1}\).)

◮ Using the same techniques, one can show the stability (and ergodicity) of AM run with a univariate Laplace target \(\pi\).

◮ These results have little direct practical value, but they indicate that the AM covariance parameter \(S_n\) does not tend to collapse.
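A minimal sketch reproducing this ‘flat target’ experiment (added; parameter names are illustrative):

```python
import numpy as np

def adaptive_random_walk(n_iter, t=0.01, x0=0.0, s0=1.0, seed=0):
    """Univariate AM run with the flat target pi = 1 (every proposal accepted)."""
    rng = np.random.default_rng(seed)
    x, m, s = x0, x0, s0
    xs, ss = [x], [s]
    for n in range(1, n_iter):
        x = x + t * np.sqrt(s) * rng.standard_normal()  # X_{n+1}
        g = 1.0 / (n + 1)                               # gamma_{n+1} = 1/(n+1)
        m_prev = m
        m = (1 - g) * m + g * x                         # M_{n+1}
        s = (1 - g) * s + g * (x - m_prev)**2           # S_{n+1}
        xs.append(x)
        ss.append(s)
    return np.array(xs), np.array(ss)

# Ten paths as in the figures: e.g. [adaptive_random_walk(10_000, seed=k) for k in range(10)].
# The divergence S_n -> infinity is slow: E[S_n] ~ exp(2 t sqrt(n)) for gamma_n = 1/n.
```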

Page 33: AM with a fixed proposal component I

◮ Instead of lower bounding the eigenvalues \(\lambda(\Sigma_n)\), employ a fixed proposal component with probability \(\beta \in (0, 1)\) [Roberts and Rosenthal, 2009].

◮ This corresponds to an algorithm where the proposals \(Y_n\) are generated by
\[
Y_n = X_{n-1} + \begin{cases} \Sigma_0^{1/2} W_n, & \text{with probability } \beta, \\ \Sigma_{n-1}^{1/2} W_n, & \text{otherwise,} \end{cases}
\]
with some fixed symmetric and positive definite \(\Sigma_0\).

◮ In other words, one employs a mixture of ‘adaptive’ and ‘nonadaptive’ Markov kernels:
\[
P(X_n \in A \mid X_1, \ldots, X_{n-1}) = (1 - \beta) P_{\Sigma_{n-1}}(X_{n-1}, A) + \beta P_{\Sigma_0}(X_{n-1}, A).
\]
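A sketch of the mixture proposal draw (added; L0 and L stand for factors of \(\Sigma_0\) and \(\Sigma_{n-1}\)):

```python
import numpy as np

def mixture_proposal(x, L0, L, beta, rng):
    """Y_n = X_{n-1} + Sigma_0^{1/2} W_n with prob. beta, else Sigma_{n-1}^{1/2} W_n."""
    w = rng.standard_normal(x.size)
    return x + (L0 if rng.uniform() < beta else L) @ w
```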

Page 34: AM with a fixed proposal component II

Vihola [2011b] shows that, for example with a bounded and compactly supported \(\pi\), or with a super-exponential \(\pi\), the eigenvalues \(\lambda(\Sigma_n)\) are bounded away from zero.

◮ For a strongly super-exponential target, the SLLN (resp. CLT) holds for functions satisfying \(\sup_x |f(x)| \pi^{-p}(x) < \infty\) for some \(p \in (0, 1)\) (resp. \(p \in (0, 1/2)\)).

◮ Explicit upper and lower bounds for \(\Sigma_n\) are unnecessary.

◮ The mixture proposal may be better in practice than the lower bound \(\epsilon I\).

Bai, Roberts, and Rosenthal [2008] also analyse this algorithm:

◮ A completely different approach, with different assumptions.

◮ Exponentially decaying \(\pi\) are also considered; in this case the fixed covariance \(\Sigma_0\) must be large enough.

Page 35: Ergodicity of unconstrained ASM algorithm

Vihola [2011a] shows two results for the unmodified adaptation, without any (upper or lower) bounds for \(\Theta_n\). Assume:

◮ the desired acceptance rate \(\alpha_* \in (0, 1/2)\), and

◮ the adaptation weights satisfy \(\sum \gamma_n^2 < \infty\) (e.g. \(\gamma_n = n^{-\gamma}\) with \(\gamma \in (1/2, 1]\)).

Two cases:

1. \(\pi\) is bounded, bounded away from zero on its support, and the support \(\mathsf{X} = \{x : \pi(x) > 0\}\) is compact and has a smooth boundary. Then the SLLN holds for bounded functions.

2. \(\pi\) is strongly super-exponential with tails having uniformly smooth contours. Then the SLLN holds for functions satisfying \(\sup_x |f(x)| \pi^{-p}(x) < \infty\) for some \(p \in (0, 1)\).

Page 36: Ergodicity of ASM within AM

◮ The results in [Vihola, 2011a] also cover the adaptive scaling within adaptive Metropolis algorithm.

◮ However, one has to assume that the eigenvalues of the covariance factor \(\Sigma_n\) are bounded within \([a, b]\) for some constants \(0 < a \le b < \infty\).

Page 37: Final remarks I

◮ The current results show that some adaptive MCMC algorithms are intrinsically stable, requiring no additional stabilisation structures.

◮ This is easier for practitioners: there are fewer parameters to ‘tune’.

◮ It shows that the methods are fairly ‘safe’ to apply.

◮ The results are related to the more general question of the stability of Robbins–Monro stochastic approximation with Markovian noise.

◮ A manuscript on this is in preparation with C. Andrieu...

◮ Fort, Moulines, and Priouret [2010] also consider interacting MCMC (e.g. the equi-energy sampler) and extend the approach of [Saksman and Vihola, 2010].

Page 38: Final remarks II

◮ It is not necessary to use Gaussian proposal distributions. All the above results apply to elliptical proposals satisfying a particular tail decay condition. For example, the results apply to multivariate Student distributions of the form
\[
q_c(z) \propto \big( 1 + \| c^{-1/2} z \|^2 \big)^{-\frac{d + p}{2}},
\]
where \(p > 0\) [Vihola, 2011a].
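A sketch of drawing such a Student increment via the standard normal/chi-square construction (added; the factor \(p\) is absorbed into the scale \(c\)):

```python
import numpy as np

def student_increment(L, p, rng):
    """Elliptical Student draw: Z = L W / sqrt(G/p), W ~ N(0, I), G ~ chi^2_p.
    The density of Z is q_c above with c proportional to L L^T."""
    w = rng.standard_normal(L.shape[0])
    g = rng.chisquare(p)
    return (L @ w) / np.sqrt(g / p)
```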

◮ The current results apply only to targets with rapidly decaying tails. It would be important to establish similar results for heavy-tailed targets.

◮ Overall, there is a need for more general yet practically verifiable conditions for checking the validity of the methods.

Page 39: Final remarks III

◮ There is also free software available for testing several adaptive random-walk MCMC algorithms, including the new RAM approach [Vihola, 2010a]: http://iki.fi/mvihola/grapham/


Thank you!

References

C. Andrieu and E. Moulines. On the ergodicity properties of some adaptive MCMC algorithms. Ann. Appl. Probab., 16(3):1462–1505, 2006.

C. Andrieu and C. P. Robert. Controlled MCMC for optimal sampling. Technical Report Ceremade 0125, Université Paris Dauphine, 2001.

C. Andrieu and J. Thoms. A tutorial on adaptive MCMC. Statist. Comput., 18(4):343–373, Dec. 2008.

C. Andrieu, E. Moulines, and P. Priouret. Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44(1):283–312, 2005.


Y. Atchade and G. Fort. Limit theorems for some adaptive MCMC algorithms with subgeometric kernels. Bernoulli, 16(1):116–154, Feb. 2010.

Y. F. Atchade and J. S. Rosenthal. On adaptive Markov chain Monte Carlo algorithms. Bernoulli, 11(5):815–828, 2005.

Y. Bai, G. O. Roberts, and J. S. Rosenthal. On the containment condition for adaptive Markov chain Monte Carlo algorithms. Preprint, July 2008. URL http://probability.ca/jeff/research.html.

G. Fort, E. Moulines, and P. Priouret. Convergence of adaptive MCMC algorithms: ergodicity and law of large numbers. Technical report, 2010.


W. R. Gilks, G. O. Roberts, and S. K. Sahu. Adaptive Markov chain Monte Carlo through regeneration. J. Amer. Statist. Assoc., 93(443):1045–1054, 1998.

H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 7(2):223–242, 2001.

S. F. Jarner and E. Hansen. Geometric ergodicity of Metropolis algorithms. Stochastic Process. Appl., 85:341–361, 2000.

G. O. Roberts and J. S. Rosenthal. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms. J. Appl. Probab., 44(2):458–475, 2007.

G. O. Roberts and J. S. Rosenthal. Examples of adaptive MCMC. J. Comput. Graph. Statist., 18(2):349–367, 2009.


E. Saksman and M. Vihola. On the ergodicity of the adaptive Metropolis algorithm on unbounded domains. Ann. Appl. Probab., 20(6):2178–2203, Dec. 2010.

M. Vihola. Grapham: Graphical models with adaptive random walk Metropolis algorithms. Comput. Statist. Data Anal., 54(1):49–54, Jan. 2010a.

M. Vihola. Robust adaptive Metropolis algorithm with coerced acceptance rate. Preprint, arXiv:1011.4381v1, Nov. 2010b.

M. Vihola. On the stability and ergodicity of adaptive scaling Metropolis algorithms. Preprint, arXiv:0903.4061v3, Apr. 2011a.

M. Vihola. Can the adaptive Metropolis algorithm collapse without the covariance lower bound? Electron. J. Probab., 16:45–75, 2011b.
