MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
TRANSCRIPT
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
Patrick Lam
Outline
Introduction to Markov Chain Monte Carlo
Gibbs Sampling
The Metropolis-Hastings Algorithm
What is Markov Chain Monte Carlo (MCMC)?
Markov Chain: a stochastic process in which future states are independent of past states given the present state
Monte Carlo: simulation
Up until now, we've done a lot of Monte Carlo simulation to find integrals rather than computing them analytically, a process called Monte Carlo Integration.
This is basically a fancy way of saying that we can compute quantities of interest of a distribution from simulated draws from that distribution.
Monte Carlo Integration
Suppose we have a distribution p(θ) (perhaps a posterior) that we want to take quantities of interest from.
To derive them analytically, we need to take integrals:

I = \int_\Theta g(\theta)\, p(\theta)\, d\theta

where g(θ) is some function of θ (g(θ) = θ for the mean and g(θ) = (θ − E(θ))² for the variance).
We can approximate the integrals via Monte Carlo Integration by simulating M values from p(θ) and calculating

\hat{I}_M = \frac{1}{M} \sum_{i=1}^{M} g(\theta^{(i)})
For example, we can compute the expected value of the Beta(3,3) distribution analytically:

E(\theta) = \int_\Theta \theta\, p(\theta)\, d\theta = \int_\Theta \theta\, \frac{\Gamma(6)}{\Gamma(3)\Gamma(3)}\, \theta^{2} (1-\theta)^{2}\, d\theta = \frac{1}{2}

or via Monte Carlo methods:

> M <- 10000
> beta.sims <- rbeta(M, 3, 3)
> sum(beta.sims)/M
[1] 0.5013

Our Monte Carlo approximation Î_M is a simulation-consistent estimator of the true value I: Î_M → I as M → ∞.
We know this to be true from the Strong Law of Large Numbers.
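The same approximation can be sketched in Python (a hypothetical translation of the R snippet above; NumPy's Beta sampler plays the role of rbeta, and the sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

M = 100_000
# Simulate M draws from Beta(3, 3) and average g(theta) = theta
beta_sims = rng.beta(3, 3, size=M)
estimate = beta_sims.sum() / M

print(estimate)  # close to the analytic answer E(theta) = 1/2
```

As M grows, the estimate tightens around 1/2, which is exactly the simulation consistency claim.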
Strong Law of Large Numbers (SLLN)
Let X_1, X_2, \ldots be a sequence of independent and identically distributed random variables, each having a finite mean \mu = E(X_i). Then with probability 1,

\frac{X_1 + X_2 + \cdots + X_M}{M} \to \mu \quad \text{as } M \to \infty

In our previous example, each simulation draw was independent and distributed from the same Beta(3,3) distribution.
This also works with variances and other quantities of interest, since functions of i.i.d. random variables are also i.i.d. random variables.
But what if we can't generate draws that are independent?
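Before turning to dependent draws, the i.i.d. case can be checked numerically in Python (sample size and seed are arbitrary choices): the same Beta(3,3) draws recover both the mean and the variance, whose analytic value is 3·3/(6²·7) = 1/28.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 200_000
x = rng.beta(3, 3, size=M)

# g(theta) = theta for the mean
mean_est = x.mean()

# g(theta) = (theta - E(theta))^2 for the variance; the sample mean
# stands in for E(theta), and functions of i.i.d. draws are i.i.d.
var_est = ((x - x.mean()) ** 2).mean()

print(mean_est, var_est)  # near 1/2 and near 1/28
```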
Suppose we want to draw from our posterior distribution p(θ|y), but we cannot sample independent draws from it.
For example, we often do not know the normalizing constant.
However, we may be able to sample draws from p(θ|y) that are slightly dependent.
If we can sample slightly dependent draws using a Markov chain, then we can still find quantities of interest from those draws.
What is a Markov Chain?
Definition: a stochastic process in which future states are independent of past states given the present state.
Stochastic process: a consecutive set of random (not deterministic) quantities defined on some known state space Θ.
Think of Θ as our parameter space.
Consecutive implies a time component, indexed by t.
Consider a draw θ^(t) to be a state at iteration t. The next draw θ^(t+1) is dependent only on the current draw θ^(t), and not on any past draws.
This satisfies the Markov property:

p(\theta^{(t+1)} \mid \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(t)}) = p(\theta^{(t+1)} \mid \theta^{(t)})
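A minimal sketch of this structure in Python (the Gaussian update rule is an invented example, not anything from the slides): the function that generates θ^(t+1) takes only the current θ^(t) as input, so older draws cannot influence the next state.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(theta_t):
    # theta^(t+1) depends only on theta^(t), never on older draws
    return 0.5 * theta_t + rng.normal()

chain = [0.0]
for _ in range(1_000):
    chain.append(step(chain[-1]))

print(len(chain), chain[-1])
```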
So our Markov chain is a bunch of draws of θ that are each slightly dependent on the previous one. The chain wanders around the parameter space, remembering only where it has been in the last period.
What are the rules governing how the chain jumps from one state to another at each period?
The jumping rules are governed by a transition kernel, which is a mechanism that describes the probability of moving to some other state based on the current state.
Transition Kernel
For a discrete state space (k possible states): a k × k matrix of transition probabilities.
Example: Suppose k = 3. The 3 × 3 transition matrix P would be

\begin{pmatrix}
p(\theta_A^{(t+1)} \mid \theta_A^{(t)}) & p(\theta_B^{(t+1)} \mid \theta_A^{(t)}) & p(\theta_C^{(t+1)} \mid \theta_A^{(t)}) \\
p(\theta_A^{(t+1)} \mid \theta_B^{(t)}) & p(\theta_B^{(t+1)} \mid \theta_B^{(t)}) & p(\theta_C^{(t+1)} \mid \theta_B^{(t)}) \\
p(\theta_A^{(t+1)} \mid \theta_C^{(t)}) & p(\theta_B^{(t+1)} \mid \theta_C^{(t)}) & p(\theta_C^{(t+1)} \mid \theta_C^{(t)})
\end{pmatrix}

where the subscripts index the 3 possible values that θ can take.
The rows sum to one and define a conditional PMF, conditional on the current state. The columns are the marginal probabilities of being in a certain state in the next period.
For a continuous state space (infinite possible states), the transition kernel is a bunch of conditional PDFs: f(\theta_j^{(t+1)} \mid \theta_i^{(t)}).
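A hypothetical 3-state kernel in Python (the probabilities are invented for illustration): each row is the conditional PMF of the next state given the current one, and sampling a trajectory just means repeatedly drawing from the current state's row.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows: current state A, B, C; columns: next state A, B, C
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
])

# Each row is a conditional PMF, so every row must sum to one
row_sums = P.sum(axis=1)

# Draw a short trajectory by sampling from the current state's row
state = 0
path = [state]
for _ in range(10):
    state = rng.choice(3, p=P[state])
    path.append(state)

print(row_sums, path)
```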
How Does a Markov Chain Work? (Discrete Example)
1. Define a starting distribution \pi^{(0)} (a 1 × k vector of probabilities that sum to one).
2. At iteration 1, our distribution \pi^{(1)} (from which θ^(1) is drawn) is \pi^{(1)} = \pi^{(0)} \times P, with dimensions (1 × k) = (1 × k) × (k × k).
3. At iteration 2, our distribution \pi^{(2)} (from which θ^(2) is drawn) is \pi^{(2)} = \pi^{(1)} \times P.
4. At iteration t, our distribution \pi^{(t)} (from which θ^(t) is drawn) is \pi^{(t)} = \pi^{(t-1)} \times P = \pi^{(0)} \times P^{t}.
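These updates are just repeated vector-matrix multiplication, sketched below in Python with an invented 3-state kernel: two very different starting distributions π^(0) are pushed through π^(t) = π^(0) P^t and land on the same limiting vector.

```python
import numpy as np

# A hypothetical 3-state transition matrix (rows sum to one)
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
])

Pt = np.linalg.matrix_power(P, 200)  # P^t for t = 200

# Two different starting distributions pi^(0)
pi_a = np.array([1.0, 0.0, 0.0]) @ Pt
pi_b = np.array([0.0, 0.0, 1.0]) @ Pt

print(pi_a, pi_b)  # the same vector, to numerical precision
```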
Stationary (Limiting) Distribution
Define a stationary distribution π to be some distribution such that π = πP.
For all the MCMC algorithms we use in Bayesian statistics, the Markov chain will typically converge to π regardless of our starting points.
So if we can devise a Markov chain whose stationary distribution π is our desired posterior distribution p(θ|y), then we can run this chain to get draws that are approximately from p(θ|y) once the chain has converged.
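The defining equation π = πP says π is a left eigenvector of P with eigenvalue 1, so for a small invented kernel we can solve for it directly and check the fixed-point property:

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to one)
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.4, 0.5],
])

# pi = pi P means pi is an eigenvector of P transposed with eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, i])
pi = pi / pi.sum()  # normalize to a probability vector

residual = np.abs(pi @ P - pi).max()
print(pi, residual)  # residual is ~0: pi is a fixed point of P
```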
Burn-in
Since convergence usually occurs regardless of our starting point, we can usually pick any feasible starting point (for example, starting draws that are in the parameter space).
However, the time it takes for the chain to converge varies depending on the starting point.
As a matter of practice, most people throw out a certain number of the first draws, known as the burn-in. This is to make our draws closer to the stationary distribution and less dependent on the starting point.
However, it is unclear how much we should burn in, since our draws are all slightly dependent and we don't know exactly when convergence occurs.
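A toy illustration in Python (the AR(1) chain, starting value, and burn-in length are all invented): started far from its stationary distribution, the chain's early draws are dominated by the starting point, and dropping them as burn-in leaves draws that look stationary.

```python
import numpy as np

rng = np.random.default_rng(0)

n, burn = 20_000, 500
chain = np.empty(n)
chain[0] = 50.0  # a deliberately bad starting point
for t in range(1, n):
    # AR(1) chain whose stationary distribution has mean 0
    chain[t] = 0.5 * chain[t - 1] + rng.normal()

kept = chain[burn:]  # throw away the burn-in draws

print(chain[0], kept.mean())  # start is 50, but the kept mean is near 0
```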
Monte Carlo Integration on the Markov Chain
Once we have a Markov chain that has converged to the stationary distribution, the draws in our chain appear to be like draws from p(θ|y), so it seems like we should be able to use Monte Carlo Integration methods to find quantities of interest.
One problem: our draws are not independent, which is what we required for Monte Carlo Integration to work (remember the SLLN).
Luckily, we have the Ergodic Theorem.
Ergodic Theorem
Let \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(M)} be M values from a Markov chain that is aperiodic, irreducible, and positive recurrent (then the chain is ergodic), and suppose E[g(\theta)] < \infty.
Then with probability 1,

\frac{1}{M} \sum_{i=1}^{M} g(\theta^{(i)}) \to \int_\Theta g(\theta)\, \pi(\theta)\, d\theta

as M → ∞, where π is the stationary distribution.
This is the Markov chain analog to the SLLN, and it allows us to ignore the dependence between draws of the Markov chain when we calculate quantities of interest from the draws.
But what does it mean for a chain to be aperiodic, irreducible, and positive recurrent, and therefore ergodic?
Aperiodicity
A Markov chain is aperiodic if the only length of time for which the chain repeats some cycle of values is the trivial case with cycle length equal to one.
Let A, B, and C denote the states (analogous to the possible values of θ) in a 3-state Markov chain. The following chain is periodic with period 3, where the period is the number of steps that it takes to return to a certain state.

[Diagram: A → B → C → A, where each transition occurs with probability 1 and each self-transition has probability 0.]

As long as the chain is not repeating an identical cycle, then the chain is aperiodic.
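The periodic chain described above has a transition matrix we can write down and check in Python (assuming, as is standard for this example, that the arrows cycle A → B → C → A deterministically): three steps return the chain exactly to where it started.

```python
import numpy as np

# Deterministic cycle A -> B -> C -> A (period 3)
P = np.array([
    [0, 1, 0],  # from A, always go to B
    [0, 0, 1],  # from B, always go to C
    [1, 0, 0],  # from C, always go to A
])

P3 = np.linalg.matrix_power(P, 3)
print(P3)  # the identity matrix: every state returns in exactly 3 steps
```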
Irreducibility
A Markov chain is irreducible if it is possible to go from any state to any other state (not necessarily in one step).
The following chain is reducible, or not irreducible.

[Diagram: a 3-state chain with transition probabilities 0.5/0.5 out of A, 0.7/0.3 out of B, and 0.4/0.6 out of C, in which no arrow leads back into A.]

The chain is not irreducible because we cannot get to A from B or C regardless of the number of steps we take.
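Reducibility can be checked mechanically. With a hypothetical matrix consistent with the diagram (the exact placement of the probabilities is an assumption; what matters is that no transitions lead back into A), the entries of P + P² + P³ give the probability of reaching each state within three steps, and the A column stays zero from B and C no matter how many powers we add.

```python
import numpy as np

# Hypothetical reducible chain: no transitions from B or C back into A
P = np.array([
    [0.5, 0.5, 0.0],  # A stays or moves to B
    [0.0, 0.7, 0.3],  # B stays or moves to C
    [0.0, 0.6, 0.4],  # C stays or moves to B
])

# Probability of visiting each state within 3 steps, from each start
reach = P + np.linalg.matrix_power(P, 2) + np.linalg.matrix_power(P, 3)

print(reach[1, 0], reach[2, 0])  # 0.0 0.0: A is unreachable from B and C
print(reach[0, 2])               # positive: C is reachable from A
```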
Positive Recurrence
http://find/
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
85/233
Positive Recurrence
http://find/
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
86/233
A Markov chain is recurrent if for any given state i , if the chain
starts at i , it will eventually return to i with probability 1.
A Markov chain is positive recurrent if the expected return time to state i is finite; otherwise it is null recurrent.
So if our Markov chain is aperiodic, irreducible, and positive recurrent (all the ones we use in Bayesian statistics usually are), then it is ergodic, and the ergodic theorem allows us to do Monte Carlo Integration by calculating quantities of interest from our draws, ignoring the dependence between draws.
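As a quick illustration of the ergodic theorem (a hypothetical two-state example, not from the slides): even though consecutive draws from the chain below are clearly dependent, the long-run fraction of time spent in each state still converges to the stationary distribution, so averages over the draws are valid Monte Carlo estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# An ergodic two-state chain; its stationary distribution solves
# pi P = pi, giving pi = (2/3, 1/3).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

state, visits = 0, np.zeros(2)
for _ in range(200_000):
    state = rng.choice(2, p=P[state])
    visits[state] += 1

print(visits / visits.sum())  # close to [0.667, 0.333]
```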
Thinning the Chain
In order to break the dependence between draws in the Markov chain, some have suggested only keeping every d-th draw of the chain.
This is known as thinning.
Pros:
Perhaps gets you a little closer to i.i.d. draws.
Saves memory since you only store a fraction of the draws.
Cons:
Unnecessary given the ergodic theorem.
Shown to increase the variance of your Monte Carlo estimates.
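A small sketch of what thinning does (Python/NumPy, with a hypothetical AR(1) series standing in for correlated MCMC output): keeping every d-th draw reduces the autocorrelation between stored draws, at the cost of discarding most of the sample.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical correlated chain (AR(1) with coefficient 0.9)
# standing in for MCMC output.
draws = np.empty(50_000)
draws[0] = 0.0
for i in range(1, draws.size):
    draws[i] = 0.9 * draws[i - 1] + rng.normal()

d = 10
thinned = draws[::d]  # thinning: keep every d-th draw

def lag1_autocorr(x):
    x = x - x.mean()
    return float((x[:-1] * x[1:]).mean() / (x * x).mean())

print(lag1_autocorr(draws))    # heavily dependent (around 0.9)
print(lag1_autocorr(thinned))  # much closer to i.i.d.
print(thinned.size)            # most draws were discarded
```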
So Really, What is MCMC?
MCMC is a class of methods in which we can simulate draws that are slightly dependent and are approximately from a (posterior) distribution.
We then take those draws and calculate quantities of interest for the (posterior) distribution.
In Bayesian statistics, there are generally two MCMC algorithms that we use: the Gibbs Sampler and the Metropolis-Hastings algorithm.
Outline
Introduction to Markov Chain Monte Carlo
Gibbs Sampling
The Metropolis-Hastings Algorithm
Gibbs Sampling
Suppose we have a joint distribution p(θ1, . . . , θk) that we want to sample from (for example, a posterior distribution).
We can use the Gibbs sampler to sample from the joint distribution if we know the full conditional distributions for each parameter.
For each parameter, the full conditional distribution is the distribution of the parameter conditional on the known information and all the other parameters:
p(θj | θ−j, y)
How can we know the joint distribution simply by knowing the full conditional distributions?
The Hammersley-Clifford Theorem (for two blocks)
Suppose we have a joint density f(x, y). The theorem proves that we can write out the joint density in terms of the conditional densities f(x|y) and f(y|x):
f(x, y) = f(y|x) / ∫ [f(y|x) / f(x|y)] dy
We can write the denominator as

∫ [f(y|x) / f(x|y)] dy = ∫ [f(x, y)/f(x)] / [f(x, y)/f(y)] dy
= ∫ f(y)/f(x) dy
= 1/f(x)
Thus, our right-hand side is

f(y|x) / (1/f(x)) = f(y|x) f(x)
= f(x, y)
The theorem shows that knowledge of the conditional densities allows us to get the joint density.
This works for more than two blocks of parameters.
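The two-block identity can be checked numerically on a small discrete example. The sketch below (Python/NumPy, with a hypothetical random 3×3 joint density) rebuilds f(x, y) from the two conditionals exactly as in the derivation above, with the integral replaced by a sum.

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical discrete joint density f(x, y) on a 3x3 grid.
f = rng.random((3, 3))
f /= f.sum()

fx = f.sum(axis=1)             # marginal f(x)
fy = f.sum(axis=0)             # marginal f(y)
f_y_given_x = f / fx[:, None]  # f(y|x)
f_x_given_y = f / fy[None, :]  # f(x|y)

# Hammersley-Clifford for two blocks, with the integral as a sum:
# f(x, y) = f(y|x) / sum_y' [ f(y'|x) / f(x|y') ]
denom = (f_y_given_x / f_x_given_y).sum(axis=1)  # equals 1 / f(x)
reconstructed = f_y_given_x / denom[:, None]

print(np.allclose(denom, 1.0 / fx))   # True
print(np.allclose(reconstructed, f))  # True
```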
But how do we figure out the full conditionals?
Steps to Calculating Full Conditional Distributions
Suppose we have a posterior p(θ|y). To calculate the full conditionals for each θ, do the following:
1. Write out the full posterior ignoring constants of proportionality.
2. Pick a block of parameters (for example, θ1) and drop everything that doesn't depend on θ1.
3. Use your knowledge of distributions to figure out what the normalizing constant is (and thus what the full conditional distribution p(θ1|θ−1, y) is).
4. Repeat steps 2 and 3 for all parameter blocks.
Gibbs Sampler Steps
Let's suppose that we are interested in sampling from the posterior p(θ|y), where θ is a vector of three parameters, θ1, θ2, θ3.
The steps to a Gibbs Sampler (and the analogous steps in the MCMC process) are:
1. Pick a vector of starting values θ^(0). (Defining a starting distribution Q^(0) and drawing θ^(0) from it.)
2. Start with any θ (order does not matter, but I'll start with θ1 for convenience). Draw a value θ1^(1) from the full conditional p(θ1 | θ2^(0), θ3^(0), y).
3. Draw a value θ2^(1) (again, order does not matter) from the full conditional p(θ2 | θ1^(1), θ3^(0), y). Note that we must use the updated value of θ1^(1).
4. Draw a value θ3^(1) from the full conditional p(θ3 | θ1^(1), θ2^(1), y) using both updated values. (Steps 2-4 are analogous to multiplying Q^(0) and P to get Q^(1) and then drawing θ^(1) from Q^(1).)
5. Draw θ^(2) using θ^(1), continually using the most updated values.
6. Repeat until we get M draws, with each draw being a vector θ^(t).
7. Optional burn-in and/or thinning.
Our result is a Markov chain with a bunch of draws of θ that are approximately from our posterior.
We can do Monte Carlo Integration on those draws to get quantities of interest.
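The steps above can be sketched in code. Below is a minimal Gibbs sampler (Python/NumPy, a hypothetical two-parameter example not from the slides) for a bivariate standard normal with correlation ρ, whose full conditionals are θ1 | θ2 ~ N(ρθ2, 1 − ρ²) and symmetrically for θ2; note how each draw uses the most updated value of the other parameter.

```python
import numpy as np

rng = np.random.default_rng(3)

rho, M = 0.6, 20_000
sd = np.sqrt(1 - rho**2)  # conditional standard deviation
theta = np.empty((M, 2))
t1, t2 = 0.0, 0.0         # starting values theta^(0)

for m in range(M):
    t1 = rng.normal(rho * t2, sd)  # draw theta1 | theta2 (current t2)
    t2 = rng.normal(rho * t1, sd)  # draw theta2 | theta1 (updated t1)
    theta[m] = (t1, t2)

# Monte Carlo Integration on the (dependent) draws:
print(theta.mean(axis=0))          # both means near 0
print(np.corrcoef(theta.T)[0, 1])  # near rho = 0.6
```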
An Example (Robert and Casella, 10.17)
Robert, Christian P. and George Casella. 2004. Monte Carlo Statistical Methods, 2nd edition. Springer.
Suppose we have data of the number of failures (yi) for each of 10 pumps in a nuclear plant.
We also have the times (ti) at which each pump was observed.
> y <- c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22)
> t <- c(94, 16, 63, 126, 5, 31, 1, 1, 2, 10)
> rbind(y, t)
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
y    5    1    5   14    3   19    1    1    4   22
t   94   16   63  126    5   31    1    1    2   10
We want to model the number of failures with a Poisson likelihood, where the expected number of failures λi differs for each pump.
Since the time over which we observed each pump differs, we need to scale each λi by its observed time ti.
Our likelihood is ∏i=1..10 Poisson(λi ti).
Let's put a Gamma(α, β) prior on each λi with α = 1.8, so the λi's are drawn from the same distribution.
Also, let's put a Gamma(γ, δ) prior on β, with γ = 0.01 and δ = 1.
So our model has 11 unknown parameters (10 λi's and β).
Our posterior is

p(λ, β | y, t) ∝ [∏i=1..10 Poisson(λi ti) × Gamma(α, β)] × Gamma(γ, δ)
= ∏i=1..10 [ (e^(−λi ti) (λi ti)^yi / yi!) × (β^α / Γ(α)) λi^(α−1) e^(−βλi) ] × (δ^γ / Γ(γ)) β^(γ−1) e^(−δβ)
p(λ, β | y, t) ∝ ∏i=1..10 [ e^(−λi ti) (λi ti)^yi × β^α λi^(α−1) e^(−βλi) ] × β^(γ−1) e^(−δβ)
= [∏i=1..10 λi^(yi+α−1) e^(−(ti+β)λi)] × β^(10α+γ−1) e^(−δβ)
Finding the full conditionals:
p(λi | λ−i, β, y, t) ∝ λi^(yi+α−1) e^(−(ti+β)λi)
p(β | λ, y, t) ∝ β^(10α+γ−1) e^(−β(δ + Σi=1..10 λi))
p(λi | λ−i, β, y, t) is a Gamma(yi + α, ti + β) distribution.
p(β | λ, y, t) is a Gamma(10α + γ, δ + Σi=1..10 λi) distribution.
Coding the Gibbs Sampler
1. Define a starting value for β (we only need to define β here because we will draw λ first, and it only depends on β and other given values).
> beta.cur <- 1  # any reasonable starting value works
2. Draw λ^(1) from its full conditional (we're drawing all the λi's as a block because they each depend only on β and not on each other).
> lambda.update beta.cur
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
170/233
2. Draw λ(1) from its full conditional (we’re drawing all the λi sas a block because they all only depend on β and not eachother).
> lambda.update beta.cur
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
171/233
2. Draw λ(1) from its full conditional (we’re drawing all the λi sas a block because they all only depend on β and not eachother).
> lambda.update beta.update
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
172/233
4. Repeat using most updated values until we get M draws.
5 Optional burn in and thinning
http://find/
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
173/233
5. Optional burn-in and thinning.
4. Repeat using most updated values until we get M draws.
5 Optional burn in and thinning
http://find/
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
174/233
5. Optional burn-in and thinning.
6. Make it into a function.
> gibbs
-
8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm
175/233
g g y p g+ }+ for (i in 1:n.sims) { + lambda.cur
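The R commands on these slides are truncated in this transcript. As a self-contained sketch of the same sampler, here is a Python translation using only the standard library (the function and argument names, the default starting value for β, and the seed are illustrative, not from the slides):

```python
import random

def gibbs(n_sims, y, t, alpha, gamma_, delta, beta_start=1.0, seed=1):
    """Gibbs sampler for the model above.

    Full conditionals (shape/rate parameterization):
      lambda_i | beta, y, t ~ Gamma(y_i + alpha, t_i + beta)
      beta | lambda, y, t   ~ Gamma(n*alpha + gamma, delta + sum(lambda))
    random.gammavariate() takes (shape, scale), so we pass 1/rate.
    """
    random.seed(seed)
    n = len(y)
    beta_cur = beta_start                # step 1: starting value for beta
    lambda_draws, beta_draws = [], []
    for _ in range(n_sims):
        # step 2: draw all lambda_i as a block (each depends only on beta)
        lam = [random.gammavariate(y[i] + alpha, 1.0 / (t[i] + beta_cur))
               for i in range(n)]
        # step 3: draw beta using the lambda values just drawn
        beta_cur = random.gammavariate(n * alpha + gamma_,
                                       1.0 / (delta + sum(lam)))
        lambda_draws.append(lam)         # step 4: repeat until M draws
        beta_draws.append(beta_cur)
    return lambda_draws, beta_draws
```

Burn-in and thinning (step 5) would just slice the returned lists, e.g. `beta_draws[1000::10]`.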
7. Do Monte Carlo Integration on the resulting Markov chain, whose draws are approximately samples from the posterior.

> posterior
> colMeans(posterior$lambda.draws)
[1] 0.07113 0.15098 0.10447 0.12321 0.65680 0.62212 0.86522 0.85465
[9] 1.35524 1.92694
> mean(posterior$beta.draws)
[1] 2.389
> apply(posterior$lambda.draws, 2, sd)
[1] 0.02759 0.08974 0.04012 0.03071 0.30899 0.13676 0.55689 0.54814
[9] 0.60854 0.40812
> sd(posterior$beta.draws)
[1] 0.6986
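Step 7 is ordinary Monte Carlo integration over the retained draws: each posterior mean or standard deviation is just a column summary of the sampled matrix. A stdlib-only Python sketch of those summaries (with made-up toy draws, not the slides' output):

```python
import statistics

def column_means(draws):
    """Monte Carlo integration: approximate E[lambda_i | y] by averaging
    the i-th component across the retained draws (like colMeans in R)."""
    n = len(draws)
    return [sum(d[i] for d in draws) / n for i in range(len(draws[0]))]

def column_sds(draws):
    """Posterior standard deviations, column by column
    (like apply(draws, 2, sd) in R)."""
    return [statistics.stdev([d[i] for d in draws])
            for i in range(len(draws[0]))]

# toy usage with made-up draws; the real input would be the sampler output
demo = [[0.1, 1.4], [0.2, 1.3], [0.15, 1.35]]
print(column_means(demo))  # approximately [0.15, 1.35]
```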
Outline
Introduction to Markov Chain Monte Carlo
Gibbs Sampling
The Metropolis-Hastings Algorithm
Suppose we have a posterior p(θ|y) that we want to sample from, but

the posterior doesn't look like any distribution we know (no conjugacy)

the posterior consists of more than 2 parameters (grid approximations intractable)

some (or all) of the full conditionals do not look like any distributions we know (no Gibbs sampling for those whose full conditionals we don't know)

If all else fails, we can use the Metropolis-Hastings algorithm, which will always work.
Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm follows these steps:

1. Choose a starting value θ(0).

2. At iteration t, draw a candidate θ∗ from a jumping distribution Jt(θ∗ | θ(t−1)).

3. Compute an acceptance ratio (probability):

r = [ p(θ∗|y) / Jt(θ∗|θ(t−1)) ] / [ p(θ(t−1)|y) / Jt(θ(t−1)|θ∗) ]

4. Accept θ∗ as θ(t) with probability min(r, 1). If θ∗ is not accepted, then θ(t) = θ(t−1).
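The four steps above can be sketched in a few lines of stdlib Python. This sketch uses a Normal(θ(t−1), jump_sd) jumping distribution, which is symmetric, so Jt(θ∗|θ(t−1)) = Jt(θ(t−1)|θ∗) and the ratio r reduces to p(θ∗|y)/p(θ(t−1)|y); the target density and all parameter values are illustrative, not from the slides:

```python
import math
import random

def metropolis_hastings(log_post, n_sims, theta_start, jump_sd=1.0, seed=1):
    """Random-walk Metropolis-Hastings for a scalar parameter.

    log_post: unnormalized log posterior log p(theta | y); working on the
    log scale keeps the ratio numerically stable.
    """
    random.seed(seed)
    theta_cur = theta_start                       # step 1: starting value
    draws = []
    for _ in range(n_sims):
        theta_star = random.gauss(theta_cur, jump_sd)   # step 2: candidate
        # step 3: acceptance ratio (jumping terms cancel by symmetry)
        log_r = log_post(theta_star) - log_post(theta_cur)
        # step 4: accept with probability min(r, 1); else keep current value
        if random.random() < math.exp(min(0.0, log_r)):
            theta_cur = theta_star
        draws.append(theta_cur)
    return draws

# illustrative target: unnormalized log posterior of a Normal(3, 2^2)
draws = metropolis_hastings(lambda th: -0.5 * ((th - 3.0) / 2.0) ** 2,
                            n_sims=20000, theta_start=0.0)
```

Discarding an initial burn-in and then averaging the remaining draws recovers the posterior mean, exactly as in the Gibbs example.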