
An introduction to advanced (?) MCMC methods

Christian P. Robert
Université Paris-Dauphine and CREST-INSEE
http://www.ceremade.dauphine.fr/~xian

Royal Statistical Society, October 13, 2010


1 Motivating example

2 The Metropolis-Hastings Algorithm

Motivating example

Latent structures make life harder!

Even simple models may lead to computational complications, as in latent variable models

   f(x|θ) = ∫ f⋆(x, x⋆|θ) dx⋆

If (x, x⋆) observed, fine!

If only x observed, trouble!

Example (Mixture models)

Models of mixtures of distributions:

   X ∼ fj with probability pj ,

for j = 1, 2, . . . , k, with overall density

   X ∼ p1 f1(x) + · · · + pk fk(x) .

For a sample of independent random variables (X1, · · · , Xn), the sample density is

   ∏_{i=1}^{n} { p1 f1(xi) + · · · + pk fk(xi) } .

Expanding this product involves k^n elementary terms: prohibitive to compute in large samples.

[Figure: log-likelihood surface of the mixture 0.3 N(µ1, 1) + 0.7 N(µ2, 1), as a function of (µ1, µ2) over (−1, 3)²]
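A minimal numpy sketch (not from the slides) of how such a log-likelihood surface is computed: each evaluation of the observed-data likelihood costs O(nk) operations, so the k^n blow-up only concerns the expansion over latent allocations. The sample size, the true means (0 and 2.5) and the grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.random(n) < 0.3                       # latent component labels (hypothetical truth)
x = np.where(z, rng.normal(0.0, 1.0, n), rng.normal(2.5, 1.0, n))

def mix_loglik(mu1, mu2, x):
    """Log-likelihood of 0.3 N(mu1,1) + 0.7 N(mu2,1), up to an additive constant."""
    d1 = 0.3 * np.exp(-0.5 * (x - mu1) ** 2)
    d2 = 0.7 * np.exp(-0.5 * (x - mu2) ** 2)
    return np.sum(np.log(d1 + d2))            # O(n k) per evaluation, not k^n

grid = np.linspace(-1.0, 3.0, 100)
surface = np.array([[mix_loglik(m1, m2, x) for m2 in grid] for m1 in grid])
```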

A typology of Bayes computational problems

(i) use of a complex parameter space, as for instance in constrained parameter sets like those resulting from imposing stationarity constraints in dynamic models;

(ii) use of a complex sampling model with an intractable likelihood, as for instance in missing data and graphical models;

(iii) use of a huge dataset;

(iv) use of a complex prior distribution (which may be the posterior distribution associated with an earlier sample);

(v) use of a complex inferential procedure, as for instance Bayes factors

   Bπ01(x) = { P(θ ∈ Θ0 | x) / P(θ ∈ Θ1 | x) } / { π(θ ∈ Θ0) / π(θ ∈ Θ1) } .

The Metropolis-Hastings Algorithm

1 Motivating example

2 The Metropolis-Hastings Algorithm
   Monte Carlo Methods based on Markov Chains
   The Metropolis–Hastings algorithm
   A collection of Metropolis-Hastings algorithms
   Extensions
   Convergence assessment

Monte Carlo Methods based on Markov Chains

Running Monte Carlo via Markov Chains

Fact: It is not necessary to use a sample from the distribution f to approximate the integral

   I = ∫ h(x) f(x) dx .

We can obtain X1, . . . , Xn ∼ f (approx) without directly simulating from f, using an ergodic Markov chain with stationary distribution f.

Running Monte Carlo via Markov Chains (2)

Idea
For an arbitrary starting value x(0), an ergodic chain (X(t)) is generated using a transition kernel with stationary distribution f.

Ensures the convergence in distribution of (X(t)) to a random variable from f.

For a "large enough" T0, X(T0) can be considered as distributed from f.

Produces a dependent sample X(T0), X(T0+1), . . ., which is generated from f, sufficient for most approximation purposes.
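A small sketch of this principle, with a placeholder kernel: an AR(1) chain whose stationary law is N(0, 1) stands in for any MCMC kernel with the right stationary distribution; T, T0 and h are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
T, T0 = 20_000, 1_000                # chain length and burn-in
phi = 0.9
chain = np.empty(T)
chain[0] = 5.0                       # arbitrary starting value x(0)
for t in range(1, T):
    # AR(1) kernel with stationary distribution N(0, 1)
    chain[t] = phi * chain[t - 1] + np.sqrt(1 - phi**2) * rng.normal()

h = lambda x: x**2                   # integrand of interest
I_hat = h(chain[T0:]).mean()         # ergodic average, approximates E_f[h(X)] = 1
```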

The Metropolis–Hastings algorithm

Problem: How can one build a Markov chain with a given stationary distribution?

MH basics
Algorithm that converges to the objective (target) density f, using an arbitrary transition kernel density q(x, y), called the instrumental (or proposal) distribution.

The MH algorithm

Algorithm (Metropolis–Hastings)
Given x(t),
1 Generate Yt ∼ q(x(t), y).
2 Take

   X(t+1) = Yt with prob. ρ(x(t), Yt), and X(t+1) = x(t) with prob. 1 − ρ(x(t), Yt),

where

   ρ(x, y) = min{ [f(y)/f(x)] [q(y, x)/q(x, y)] , 1 } .
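A minimal sketch of this step in Python, assuming an illustrative Cauchy target and Gaussian proposal (neither from the slides); note that the normalising constants of f and q never enter the ratio.

```python
import numpy as np

rng = np.random.default_rng(2)

f = lambda x: 1.0 / (1.0 + x**2)            # unnormalised Cauchy target (illustrative)

def q_sample(x):                            # draw Yt ~ q(x(t), .)
    return x + rng.normal()

def q_dens(x, y):                           # q(x, y), up to its normalising constant
    return np.exp(-0.5 * (y - x) ** 2)

T = 10_000
x = np.empty(T)
x[0] = 0.0
for t in range(T - 1):
    y = q_sample(x[t])
    rho = min(1.0, f(y) * q_dens(y, x[t]) / (f(x[t]) * q_dens(x[t], y)))
    x[t + 1] = y if rng.random() < rho else x[t]
```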

Features

Independent of normalizing constants for both f and q(x, ·) (i.e., those constants independent of x)

Never move to values with f(y) = 0

The chain (x(t))t may take the same value several times in a row, even though f is a density wrt Lebesgue measure

The sequence (yt)t is usually not a Markov chain

Satisfies the detailed balance condition

   f(x)K(x, y) = f(y)K(y, x)

[Green, 1995]

Convergence properties

1 The M-H Markov chain is reversible, with invariant/stationary density f.

2 As f is a probability measure, the chain is positive recurrent.

3 If

   Pr[ f(Yt) q(Yt, X(t)) / { f(X(t)) q(X(t), Yt) } ≥ 1 ] < 1 ,   (1)

i.e., if the event {X(t+1) = X(t)} occurs with positive probability, then the chain is aperiodic.

Convergence properties (2)

4 If

   q(x, y) > 0 for every (x, y),   (2)

the chain is irreducible.

5 For M-H, f-irreducibility implies Harris recurrence.

6 Thus, under conditions (1) and (2):

(i) For h with Ef |h(X)| < ∞,

   lim_{T→∞} (1/T) ∑_{t=1}^{T} h(X(t)) = ∫ h(x) f(x) dx   a.e. f.

(ii) and

   lim_{n→∞} ‖ ∫ K^n(x, ·) µ(dx) − f ‖_TV = 0

for every initial distribution µ, where K^n(x, ·) denotes the kernel for n transitions.

A collection of Metropolis-Hastings algorithms

The Independent Case

The instrumental distribution q(x, ·) is independent of x, and is denoted g.

Algorithm (Independent Metropolis-Hastings)
Given x(t),
1 Generate Yt ∼ g(y)
2 Take

   X(t+1) = Yt with prob. min{ [f(Yt) g(x(t))] / [f(x(t)) g(Yt)] , 1 }, and X(t+1) = x(t) otherwise.
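A sketch of the independent sampler, under illustrative choices (not from the slides): a Beta(2.7, 6.3) target with a Uniform(0, 1) proposal g, so that f/g is bounded, anticipating the theorem that follows.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):                                   # unnormalised Beta(2.7, 6.3) target
    return x**1.7 * (1.0 - x)**5.3

g = lambda x: 1.0                           # Uniform(0, 1) proposal density

T = 10_000
x = np.empty(T)
x[0] = 0.5
for t in range(T - 1):
    y = rng.random()                        # Yt ~ g, independent of x(t)
    rho = min(1.0, (f(y) * g(x[t])) / (f(x[t]) * g(y)))
    x[t + 1] = y if rng.random() < rho else x[t]
```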

Properties

The resulting sample is not iid, but there exist strong convergence properties:

Theorem (Ergodicity)
The algorithm produces a uniformly ergodic chain if there exists a constant M such that

   f(x) ≤ M g(x) ,   x ∈ supp f.

In this case,

   ‖K^n(x, ·) − f‖_TV ≤ (1 − 1/M)^n .

[Mengersen & Tweedie, 1996]

Example (Noisy AR(1))

Hidden Markov chain from a regular AR(1) model,

   xt+1 = ϕ xt + εt+1 ,   εt ∼ N(0, τ²)

and observables

   yt | xt ∼ N(xt², σ²)

The distribution of xt given xt−1, xt+1 and yt is proportional to

   exp −(1/2τ²) { (xt − ϕ xt−1)² + (xt+1 − ϕ xt)² + (τ²/σ²)(yt − xt²)² } .

Example (Noisy AR(1) too)

Use for proposal the N(µt, ωt²) distribution, with

   µt = ϕ (xt−1 + xt+1) / (1 + ϕ²)   and   ωt² = τ² / (1 + ϕ²) .

The ratio

   π(x) / qind(x) = exp −(yt − xt²)² / 2σ²

is bounded.
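A sketch of one such independent-MH update for xt, under hypothetical values of ϕ, τ, σ and of the neighbouring states: the Gaussian proposal absorbs the quadratic part of the full conditional, so the log acceptance ratio reduces to the bounded term above.

```python
import numpy as np

rng = np.random.default_rng(4)
phi, tau, sigma = 0.8, 1.0, 0.5            # hypothetical model parameters
x_prev, x_next, y_t = 0.3, 0.5, 0.2        # neighbouring states and observation
x_t = 0.4                                  # current value of xt

mu_t = phi * (x_prev + x_next) / (1 + phi**2)
omega2 = tau**2 / (1 + phi**2)

def log_ratio_term(x):                     # log of pi(x)/q_ind(x), up to a constant
    return -(y_t - x**2) ** 2 / (2 * sigma**2)

prop = mu_t + np.sqrt(omega2) * rng.normal()
log_rho = min(0.0, log_ratio_term(prop) - log_ratio_term(x_t))
if np.log(rng.random()) < log_rho:
    x_t = prop
```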

[Figure: (top) last 500 realisations of the chain {Xk} out of 10,000 iterations; (bottom) histogram of the chain, compared with the target distribution]

Random walk Metropolis–Hastings

Instead, use a local perturbation as proposal

   Yt = X(t) + εt ,

where εt ∼ g, independent of X(t). The instrumental density is now of the form g(y − x), and the Markov chain is a random walk if g is symmetric:

   g(x) = g(−x)

Algorithm (Random walk Metropolis)
Given x(t),
1 Generate Yt ∼ g(y − x(t))
2 Take

   X(t+1) = Yt with prob. min{ 1, f(Yt)/f(x(t)) }, and X(t+1) = x(t) otherwise.
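A minimal sketch of this step, assuming a Gaussian perturbation and an illustrative N(0, 1) target: by symmetry of g, the proposal density cancels from the acceptance probability. Tracking the acceptance rate anticipates the calibration discussion later in the talk.

```python
import numpy as np

rng = np.random.default_rng(5)

log_f = lambda x: -0.5 * x**2               # log of a N(0, 1) target, up to a constant
scale = 1.0                                 # random walk scale, to be calibrated

T = 10_000
x = np.empty(T)
x[0] = 0.0
accepted = 0
for t in range(T - 1):
    y = x[t] + scale * rng.normal()         # Yt = x(t) + eps_t, eps_t ~ N(0, scale^2)
    if np.log(rng.random()) < log_f(y) - log_f(x[t]):
        x[t + 1], accepted = y, accepted + 1
    else:
        x[t + 1] = x[t]
print("acceptance rate:", accepted / (T - 1))
```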

Probit illustration

Likelihood and posterior given by

   π(β|y, X) ∝ ℓ(β|y, X) ∝ ∏_{i=1}^{n} Φ(xiᵀβ)^{yi} (1 − Φ(xiᵀβ))^{ni − yi}

under the flat prior.
A random walk proposal works well for a small number of predictors. Use the maximum likelihood estimate β̂ as starting value and the asymptotic (Fisher) covariance matrix of the MLE, Σ̂, as scale.

MCMC algorithm

Probit random-walk Metropolis-Hastings

Initialization: Set β(0) = β̂ and compute Σ̂.
Iteration t:
1 Generate β̃ ∼ Nk+1(β(t−1), τ Σ̂)
2 Compute

   ρ(β(t−1), β̃) = min( 1, π(β̃|y) / π(β(t−1)|y) )

3 With probability ρ(β(t−1), β̃) set β(t) = β̃; otherwise set β(t) = β(t−1).
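A sketch of this scheme on simulated data (the bank dataset of the next slide is not reproduced here): the MLE comes from a numerical optimiser, and the BFGS inverse-Hessian serves as an approximation of Σ̂; τ and the data-generating coefficients are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = (X @ np.array([1.0, -0.5]) + rng.normal(size=n) > 0).astype(float)

def neg_loglik(beta):
    p = norm.cdf(X @ beta)
    p = np.clip(p, 1e-12, 1 - 1e-12)        # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

fit = minimize(neg_loglik, np.zeros(k))     # BFGS by default
beta_hat, Sigma_hat = fit.x, fit.hess_inv   # inverse Hessian ~ asymptotic covariance

tau, T = 1.0, 10_000
L = np.linalg.cholesky(tau * Sigma_hat)
beta = np.tile(beta_hat, (T, 1))
for t in range(1, T):
    prop = beta[t - 1] + L @ rng.normal(size=k)
    log_rho = neg_loglik(beta[t - 1]) - neg_loglik(prop)   # flat prior
    beta[t] = prop if np.log(rng.random()) < log_rho else beta[t - 1]
```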

R bank benchmark

Probit modelling with no intercept over the four measurements. Three different scales τ = 1, 0.1, 10: best mixing behavior is associated with τ = 1. Average of the parameters over 9,000 MCMC iterations gives the plug-in estimate

   p̂i = Φ(−1.2193 xi1 + 0.9540 xi2 + 0.9795 xi3 + 1.1481 xi4) .

[Figure: traces, histograms and autocorrelations of the four parameter components for each scale τ]

Example (Mixture models)

   π(θ|x) ∝ ∏_{j=1}^{n} ( ∑_{ℓ=1}^{k} pℓ f(xj |µℓ, σℓ) ) π(θ)

Metropolis-Hastings proposal:

   θ(t+1) = θ(t) + ωε(t) if u(t) < ρ(t), and θ(t+1) = θ(t) otherwise,

where

   ρ(t) = π(θ(t) + ωε(t)|x) / π(θ(t)|x) ∧ 1

and ω is scaled for a good acceptance rate.
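A sketch of this random walk MH for the two-mean mixture used in the figures below, with a flat prior on (µ1, µ2); the simulated data and its true means are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x = np.where(rng.random(n) < 0.7, rng.normal(0.0, 1.0, n), rng.normal(2.5, 1.0, n))

def log_post(mu):
    """Log posterior of (mu1, mu2) for .7 N(mu1,1) + .3 N(mu2,1), flat prior."""
    lik = 0.7 * np.exp(-0.5 * (x - mu[0]) ** 2) + 0.3 * np.exp(-0.5 * (x - mu[1]) ** 2)
    return np.sum(np.log(lik))

omega, T = 1.0, 10_000                      # compare omega = 1 and sqrt(.1) below
theta = np.tile([0.0, 0.0], (T, 1))
for t in range(1, T):
    prop = theta[t - 1] + omega * rng.normal(size=2)
    if np.log(rng.random()) < log_post(prop) - log_post(theta[t - 1]):
        theta[t] = prop
    else:
        theta[t] = theta[t - 1]
```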

[Figure: random walk MCMC output for .7 N(µ1, 1) + .3 N(µ2, 1) and scale 1, at iterations 1, 10, 100, 500 and 1000]

[Figure: random walk MCMC output for .7 N(µ1, 1) + .3 N(µ2, 1) and scale √.1, at iterations 10, 100, 500, 1000, 5000 and 10,000]

Convergence properties

Uniform ergodicity prohibited by random walk structure.
At best, geometric ergodicity:

Theorem (Sufficient ergodicity)
For a symmetric density f, log-concave in the tails, and a positive and symmetric density g, the chain (X(t)) is geometrically ergodic.

[Mengersen & Tweedie, 1996]

no tail effect

Example (Comparison of tail effects)

Random-walk Metropolis–Hastings algorithms based on a N(0, 1) instrumental for the generation of (a) a N(0, 1) distribution and (b) a distribution with density ψ(x) ∝ (1 + |x|)⁻³.

[Figure: 90% confidence envelopes of the means, derived from 500 parallel independent chains, for cases (a) and (b)]

Extensions

There are many other families of MH algorithms:

Adaptive Rejection Metropolis Sampling

Reversible Jump

Langevin algorithms

to name just a few...

Langevin Algorithms

Proposal based on the Langevin diffusion Lt, defined by the stochastic differential equation

   dLt = dBt + (1/2) ∇ log f(Lt) dt ,

where Bt is the standard Brownian motion.

Theorem
The Langevin diffusion is the only non-explosive diffusion which is reversible with respect to f.

Discretization

Because continuous time cannot be simulated, consider the discretised sequence

   x(t+1) = x(t) + (σ²/2) ∇ log f(x(t)) + σ εt ,   εt ∼ Np(0, Ip) ,

where σ² corresponds to the discretisation step.

[Figures: histograms of the discretised chain against the target f(x) = exp(−x⁴), for σ² = .1, .01, .001, .0001 and .0001∗]
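A sketch of this unadjusted discretised recursion for the one-dimensional target f(x) = exp(−x⁴), for which ∇ log f(x) = −4x³; the step size is an illustrative choice. Without the MH correction introduced below, the histogram is biased for large σ² and the chain can even be transient.

```python
import numpy as np

rng = np.random.default_rng(8)

grad_log_f = lambda x: -4.0 * x**3     # gradient of log f for f(x) = exp(-x^4)
sigma2 = 0.01                          # discretisation step (illustrative)
sigma = np.sqrt(sigma2)

T = 50_000
x = np.empty(T)
x[0] = 0.0
for t in range(T - 1):
    x[t + 1] = x[t] + 0.5 * sigma2 * grad_log_f(x[t]) + sigma * rng.normal()
```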

Discretization (2)

Unfortunately, the discretized chain may be transient, for instance when

   lim_{x→±∞} | σ² ∇ log f(x) |x|⁻¹ | > 1

Example of f(x) = exp(−x⁴) when σ² = .2

MH correction

Accept the new value Yt with probability

   f(Yt)/f(x(t)) · exp{ −‖Yt − x(t) − (σ²/2) ∇ log f(x(t))‖² / 2σ² } / exp{ −‖x(t) − Yt − (σ²/2) ∇ log f(Yt)‖² / 2σ² } ∧ 1 .

Choice of the scaling factor σ
Should lead to an acceptance rate of 0.574 to achieve optimal convergence rates (when the components of x are uncorrelated).

[Roberts & Rosenthal, 1998]
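A sketch combining the Langevin proposal of the previous slides with this acceptance probability (a Metropolis-adjusted Langevin step), on the same illustrative target f(x) = exp(−x⁴); σ² is a tuning choice to be pushed towards the 0.574 acceptance rate above.

```python
import numpy as np

rng = np.random.default_rng(9)

log_f = lambda x: -x**4
grad_log_f = lambda x: -4.0 * x**3
sigma2 = 0.2                                # even the "transient" step size is safe here
sigma = np.sqrt(sigma2)

def log_q(y, x):                            # log density of the Langevin proposal, up to a constant
    drift = x + 0.5 * sigma2 * grad_log_f(x)
    return -(y - drift) ** 2 / (2 * sigma2)

T = 50_000
x = np.empty(T)
x[0] = 0.0
for t in range(T - 1):
    y = x[t] + 0.5 * sigma2 * grad_log_f(x[t]) + sigma * rng.normal()
    log_rho = min(0.0, log_f(y) - log_f(x[t]) + log_q(x[t], y) - log_q(y, x[t]))
    x[t + 1] = y if np.log(rng.random()) < log_rho else x[t]
```

The MH correction rejects the explosive moves that make the unadjusted chain transient at σ² = .2.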

Optimizing the Acceptance Rate

Problem of choice of the transition kernel from a practical point of view. Most common alternatives:

1 a fully automated algorithm like ARMS;

2 an instrumental density g which approximates f, such that f/g is bounded for uniform ergodicity to apply;

3 a random walk

In both cases 2 and 3, the choice of g is critical.

Case of the random walk

Different approach to acceptance rates: a high acceptance rate does not indicate that the algorithm is moving correctly, since it indicates that the random walk is moving too slowly on the surface of f.
If x(t) and yt are close, i.e. f(x(t)) ≃ f(yt), then yt is accepted with probability

   min( f(yt)/f(x(t)), 1 ) ≃ 1 .

For multimodal densities with well separated modes, the negative effect of limited moves on the surface of f clearly shows.

Case of the random walk (2)

If the average acceptance rate is low, the successive values of f(yt) tend to be small compared with f(x(t)), which means that the random walk moves quickly on the surface of f since it often reaches the "borders" of the support of f.

Rule of thumb

In small dimensions, aim at an average acceptance rate of 50%. In large dimensions, at an average acceptance rate of 25%.

[Gelman, Gilks and Roberts, 1995]

This rule is to be taken with a pinch of salt!

Example (Noisy AR(1) continued)

For a Gaussian random walk with scale ω small enough, the random walk never jumps to the other mode. But if the scale ω is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.

[Figure: Markov chain based on a random walk with scale ω = .1]

[Figure: Markov chain based on a random walk with scale ω = .5]

Where do we stand?

MCMC in a nutshell:

Running a sequence Xt+1 = Ψ(Xt, Yt) provides an approximation to the target density f when the detailed balance condition holds:

   f(x)K(x, y) = f(y)K(y, x)

Easiest implementation of the principle is random walk Metropolis-Hastings:

   Yt = X(t) + εt

Practical convergence requires sufficient energy from the proposal, which is calibrated by trial and error.

Convergence assessment

Convergence diagnostics

How many iterations?

Rule # 1 There is no absolute number of simulations, i.e. 1,000 is neither large, nor small.

Rule # 2 It takes [much] longer to check for convergence than for the chain itself to converge.

Rule # 3 MCMC is a "what-you-get-is-what-you-see" algorithm: it fails to tell about unexplored parts of the space.

Rule # 4 When in doubt, run MCMC chains in parallel and check for consistency.

Many "quick-&-dirty" solutions in the literature, but not necessarily 100% trustworthy.

Example (Bimodal target)

Density

   f(x) = exp(−x²/2) √{ (4(x − .3)² + .01) / (4(1 + (.3)²) + .01) }

[Figure: the bimodal density f over (−4, 4)]

and use of a random walk Metropolis–Hastings algorithm with variance .04.
Evaluation of the missing mass by

   ∑_{t=1}^{T−1} [θ(t+1) − θ(t)] f(θ(t))
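A sketch of this mass evaluation, assuming (as in Philippe & Robert's Riemann sums) that θ(1) ≤ · · · ≤ θ(T) denote the ordered draws and that f is the normalised density: the sum approaches 1 when the chain has covered the support of f, so a value well below 1 flags missing mass.

```python
import numpy as np

def missing_mass_check(theta, f):
    """Riemann-sum estimate of the mass visited by the draws theta under density f."""
    s = np.sort(theta)                      # order statistics theta(1) <= ... <= theta(T)
    return np.sum(np.diff(s) * f(s[:-1]))   # ~ 1 when the support is fully explored
```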

[Figure: sequence (in blue) and mass evaluation (in brown) over 2,000 iterations]

[Philippe & Robert, 2001]

Effective sample size

How many iid simulations from π are equivalent to N simulations from the MCMC algorithm?

Based on the estimated k-th order auto-correlation,

   ρk = corr( x(t), x(t+k) ) ,

effective sample size

   N_ess = n ( 1 + 2 ∑_{k=1}^{T0} ρk )^{−1/2} .

Only a partial indicator: it fails to signal chains stuck in one mode of the target.
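A sketch of this formula with empirically estimated autocorrelations truncated at lag T0; the exponent −1/2 follows the slide, while the more common convention uses the exponent −1, i.e. n/(1 + 2 ∑ ρk).

```python
import numpy as np

def ess(chain, T0=100):
    """Effective sample size with autocorrelations truncated at lag T0."""
    x = chain - chain.mean()
    n = x.size
    acf = np.array([np.dot(x[:n - k], x[k:]) / np.dot(x, x)   # rho_k estimate
                    for k in range(1, T0 + 1)])
    return n * (1.0 + 2.0 * acf.sum()) ** -0.5                # exponent as on the slide
```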

Tempering

Facilitate exploration of π by flattening the target: simulate from πα(x) ∝ π(x)^α for α > 0 small enough.

Determine where the modal regions of π are (possibly with parallel versions using different α's)

Recycle simulations from π(x)^α into simulations from π by importance sampling

Simple modification of the Metropolis–Hastings algorithm, with new acceptance

   { ( π(θ′|x) / π(θ|x) )^α  q(θ|θ′) / q(θ′|θ) } ∧ 1
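A sketch of this tempered acceptance rule with a symmetric random walk proposal (so the q-ratio cancels), reusing the two-mean mixture posterior of the earlier example; α and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
alpha = 0.2                                 # flattening exponent, small to ease mode jumps

x = np.where(rng.random(500) < 0.7,
             rng.normal(0.0, 1.0, 500), rng.normal(2.5, 1.0, 500))

def log_post(mu):
    lik = 0.7 * np.exp(-0.5 * (x - mu[0]) ** 2) + 0.3 * np.exp(-0.5 * (x - mu[1]) ** 2)
    return np.sum(np.log(lik))

T = 10_000
theta = np.tile([0.0, 0.0], (T, 1))
for t in range(1, T):
    prop = theta[t - 1] + rng.normal(size=2)
    # tempered acceptance: ((pi'/pi)^alpha) ∧ 1, symmetric proposal
    log_rho = alpha * (log_post(prop) - log_post(theta[t - 1]))
    theta[t] = prop if np.log(rng.random()) < min(0.0, log_rho) else theta[t - 1]
```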

Tempering with the mean mixture

[Figure: random walk samples on the (µ1, µ2) surface for α = 1, 0.5 and 0.2]