MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm

Patrick Lam

Outline

Introduction to Markov Chain Monte Carlo

Gibbs Sampling

The Metropolis-Hastings Algorithm


    What is Markov Chain Monte Carlo (MCMC)?

Markov Chain: a stochastic process in which future states are independent of past states given the present state.

    Monte Carlo: simulation

Up until now, we've done a lot of Monte Carlo simulation to find integrals rather than computing them analytically, a process called Monte Carlo Integration.

Basically, a fancy way of saying we can compute quantities of interest of a distribution from simulated draws from the distribution.


    Monte Carlo Integration

Suppose we have a distribution p(θ) (perhaps a posterior) that we want to take quantities of interest from.

To derive them analytically, we need to take integrals:

$$ I = \int_\Theta g(\theta)\, p(\theta)\, d\theta $$

where g(θ) is some function of θ (g(θ) = θ for the mean and g(θ) = (θ − E(θ))² for the variance).

We can approximate the integral via Monte Carlo Integration by simulating M values from p(θ) and calculating

$$ \hat{I}_M = \frac{1}{M} \sum_{i=1}^{M} g\!\left(\theta^{(i)}\right) $$


For example, we can compute the expected value of the Beta(3,3) distribution analytically:

$$ E(\theta) = \int_\Theta \theta\, p(\theta)\, d\theta = \int_\Theta \theta\, \frac{\Gamma(6)}{\Gamma(3)\,\Gamma(3)}\, \theta^{2}(1-\theta)^{2}\, d\theta = \frac{1}{2} $$

    or via Monte Carlo methods:

> M <- 10000   # the exact M used in the slides was lost in transcription; any large M works
> beta.sims <- rbeta(M, 3, 3)
> sum(beta.sims)/M
[1] 0.5013

Our Monte Carlo approximation Î_M is a simulation-consistent estimator of the true value I: Î_M → I as M → ∞.

    We know this to be true from the Strong Law of Large Numbers.


    Strong Law of Large Numbers (SLLN)

Let X1, X2, … be a sequence of independent and identically distributed random variables, each having a finite mean μ = E(Xi).

Then with probability 1,

$$ \frac{X_1 + X_2 + \cdots + X_M}{M} \to \mu \quad \text{as } M \to \infty $$

In our previous example, each simulation draw was independent and distributed from the same Beta(3,3) distribution.

This also works with variances and other quantities of interest, since functions of i.i.d. random variables are themselves i.i.d. random variables.
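As a quick sketch of the same idea for the variance (this snippet reuses beta.sims and M from the simulation above; it is an illustration, not code from the original slides):

> g <- function(theta) (theta - mean(beta.sims))^2
> sum(g(beta.sims))/M   # close to the true variance of Beta(3,3), 1/28 ≈ 0.0357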

But what if we can't generate draws that are independent?


Suppose we want to draw from our posterior distribution p(θ|y), but we cannot sample independent draws from it.

    For example, we often do not know the normalizing constant.

However, we may be able to sample draws from p(θ|y) that are slightly dependent.

If we can sample slightly dependent draws using a Markov chain, then we can still find quantities of interest from those draws.

What is a Markov Chain?

Definition: a stochastic process in which future states are independent of past states given the present state.

Stochastic process: a consecutive set of random (not deterministic) quantities defined on some known state space Θ.

think of Θ as our parameter space.

consecutive implies a time component, indexed by t.

Consider a draw of θ^(t) to be a state at iteration t. The next draw θ^(t+1) is dependent only on the current draw θ^(t), and not on any past draws. This satisfies the Markov property:

$$ p\left(\theta^{(t+1)} \mid \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(t)}\right) = p\left(\theta^{(t+1)} \mid \theta^{(t)}\right) $$


So our Markov chain is a bunch of draws of θ that are each slightly dependent on the previous one. The chain wanders around the parameter space, remembering only where it has been in the last period.

What are the rules governing how the chain jumps from one state to another at each period?

The jumping rules are governed by a transition kernel, which is a mechanism that describes the probability of moving to some other state based on the current state.

    Transition Kernel


For a discrete state space (k possible states): a k × k matrix of transition probabilities.

Example: Suppose k = 3. The 3 × 3 transition matrix P would be

$$
P = \begin{pmatrix}
p(\theta_A^{(t+1)} \mid \theta_A^{(t)}) & p(\theta_B^{(t+1)} \mid \theta_A^{(t)}) & p(\theta_C^{(t+1)} \mid \theta_A^{(t)}) \\
p(\theta_A^{(t+1)} \mid \theta_B^{(t)}) & p(\theta_B^{(t+1)} \mid \theta_B^{(t)}) & p(\theta_C^{(t+1)} \mid \theta_B^{(t)}) \\
p(\theta_A^{(t+1)} \mid \theta_C^{(t)}) & p(\theta_B^{(t+1)} \mid \theta_C^{(t)}) & p(\theta_C^{(t+1)} \mid \theta_C^{(t)})
\end{pmatrix}
$$

where the subscripts index the 3 possible values that θ can take.

The rows sum to one and define a conditional PMF, conditional on the current state. The columns are the marginal probabilities of being in a certain state in the next period.

For a continuous state space (infinitely many possible states), the transition kernel is a bunch of conditional PDFs: f(θj^(t+1) | θi^(t)).
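As a small illustration in R (the probabilities below are made up for this sketch, not taken from the slides), a 3-state transition kernel is just a matrix whose rows are conditional PMFs:

P <- matrix(c(0.5, 0.3, 0.2,
              0.1, 0.6, 0.3,
              0.2, 0.2, 0.6),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))
rowSums(P)   # each row sums to one: a conditional PMF given the current state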

    How Does a Markov Chain Work? (Discrete Example)


1. Define a starting distribution Q^(0) (a 1 × k vector of probabilities that sum to one).

2. At iteration 1, our distribution Q^(1) (from which θ^(1) is drawn) is

$$ \underset{(1 \times k)}{Q^{(1)}} = \underset{(1 \times k)}{Q^{(0)}} \times \underset{(k \times k)}{P} $$

3. At iteration 2, our distribution Q^(2) (from which θ^(2) is drawn) is

$$ Q^{(2)} = Q^{(1)} \times P $$

4. At iteration t, our distribution Q^(t) (from which θ^(t) is drawn) is

$$ Q^{(t)} = Q^{(t-1)} \times P = Q^{(0)} \times P^{t} $$
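A minimal sketch of these steps in R, reusing the hypothetical P defined above; the starting distribution Q^(0) is an arbitrary choice:

Q <- c(1, 0, 0)                # Q^(0): all mass on state A
for (t in 1:50) Q <- Q %*% P   # Q^(t) = Q^(t-1) x P
Q                              # after many iterations, Q essentially stops changing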

    Stationary (Limiting) Distribution


Define a stationary distribution π to be some distribution such that π = πP.

For all the MCMC algorithms we use in Bayesian statistics, the Markov chain will typically converge to π regardless of our starting points.

So if we can devise a Markov chain whose stationary distribution π is our desired posterior distribution p(θ|y), then we can run this chain to get draws that are approximately from p(θ|y) once the chain has converged.
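One way to find π numerically for the hypothetical P above (a sketch, not from the slides): since π = πP, π is a left eigenvector of P with eigenvalue 1, i.e. an eigenvector of t(P):

e <- eigen(t(P))                    # left eigenvectors of P
pi.stat <- Re(e$vectors[, 1])       # eigen() sorts by modulus, so eigenvalue 1 comes first
pi.stat <- pi.stat / sum(pi.stat)   # normalize to a probability vector
pi.stat %*% P                       # equals pi.stat: the condition pi = pi P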

    Burn-in


Since convergence usually occurs regardless of our starting point, we can usually pick any feasible starting point (for example, starting draws that are in the parameter space).

However, the time it takes for the chain to converge varies depending on the starting point.

As a matter of practice, most people throw out a certain number of the first draws, known as the burn-in. This is to make our draws closer to the stationary distribution and less dependent on the starting point.

However, it is unclear how much we should burn in, since our draws are all slightly dependent and we don't know exactly when convergence occurs.

    Monte Carlo Integration on the Markov Chain


Once we have a Markov chain that has converged to the stationary distribution, the draws in our chain appear to be like draws from p(θ|y), so it seems like we should be able to use Monte Carlo Integration methods to find quantities of interest.

One problem: our draws are not independent, which we required for Monte Carlo Integration to work (remember the SLLN).

Luckily, we have the Ergodic Theorem.

    Ergodic Theorem


Let θ^(1), θ^(2), …, θ^(M) be M values from a Markov chain that is aperiodic, irreducible, and positive recurrent (then the chain is ergodic), and E[g(θ)] < ∞. Then with probability 1,

$$ \frac{1}{M} \sum_{i=1}^{M} g(\theta_i) \to \int_\Theta g(\theta)\, \pi(\theta)\, d\theta $$

as M → ∞, where π is the stationary distribution.

This is the Markov chain analog to the SLLN, and it allows us to ignore the dependence between draws of the Markov chain when we calculate quantities of interest from the draws.

But what does it mean for a chain to be aperiodic, irreducible, and positive recurrent, and therefore ergodic?

    Aperiodicity


A Markov chain is aperiodic if the only length of time for which the chain repeats some cycle of values is the trivial case with cycle length equal to one.

Let A, B, and C denote the states (analogous to the possible values of θ) in a 3-state Markov chain. The following chain is periodic with period 3, where the period is the number of steps that it takes to return to a certain state.

[Diagram: A → B → C → A, where each transition occurs with probability 1 (and each self-transition has probability 0).]

As long as the chain is not repeating an identical cycle, then the chain is aperiodic.
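A quick numerical check of the period-3 chain in R (a sketch, not from the slides): its transition matrix is a cyclic permutation, so three steps return the chain exactly to where it started:

P3 <- matrix(c(0, 1, 0,
               0, 0, 1,
               1, 0, 0), nrow = 3, byrow = TRUE)   # A -> B -> C -> A, each w.p. 1
P3 %*% P3 %*% P3                                   # the 3 x 3 identity matrix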

    Irreducibility


A Markov chain is irreducible if it is possible to go from any state to any other state (not necessarily in one step).

The following chain is reducible, or not irreducible.

[Diagram: a 3-state chain in which A stays at A with probability 0.5 and moves to B with probability 0.5; B stays with probability 0.7 and moves to C with probability 0.3; C stays with probability 0.6 and moves to B with probability 0.4. No arrows lead back to A.]

The chain is not irreducible because we cannot get to A from B or C regardless of the number of steps we take.
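A sketch of the same point in R; the matrix below encodes one reading of the diagram (the transcription is ambiguous, so treat the exact probabilities as an assumption):

Pr <- matrix(c(0.5, 0.5, 0.0,     # from A: stay, or move to B
               0.0, 0.7, 0.3,     # from B: stay, or move to C (never to A)
               0.0, 0.4, 0.6),    # from C: stay, or move to B (never to A)
             nrow = 3, byrow = TRUE)
Pn <- diag(3)
for (i in 1:100) Pn <- Pn %*% Pr  # Pr to the 100th power
round(Pn, 4)                      # column 1 (state A) is ~0: A is unreachable from B and C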

    Positive Recurrence


A Markov chain is recurrent if, for any given state i, the chain starting at i will eventually return to i with probability 1.

A Markov chain is positive recurrent if the expected return time to state i is finite; otherwise it is null recurrent.

So if our Markov chain is aperiodic, irreducible, and positive recurrent (all the ones we use in Bayesian statistics usually are), then it is ergodic, and the ergodic theorem allows us to do Monte Carlo Integration by calculating quantities of interest from our draws, ignoring the dependence between draws.

    Thinning the Chain


In order to break the dependence between draws in the Markov chain, some have suggested only keeping every dth draw of the chain. This is known as thinning.

Pros:

Perhaps gets you a little closer to i.i.d. draws.

Saves memory since you only store a fraction of the draws.

Cons:

Unnecessary given the ergodic theorem.

Shown to increase the variance of your Monte Carlo estimates.
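As a one-line sketch in R (not from the slides), thinning by keeping every dth draw might look like:

draws <- rnorm(10000)                             # stand-in for a chain's output (hypothetical)
d <- 10
thinned <- draws[seq(d, length(draws), by = d)]   # keep every d-th draw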

    So Really, What is MCMC?


MCMC is a class of methods in which we can simulate draws that are slightly dependent and are approximately from a (posterior) distribution.

We then take those draws and calculate quantities of interest for the (posterior) distribution.

In Bayesian statistics, there are generally two MCMC algorithms that we use: the Gibbs Sampler and the Metropolis-Hastings algorithm.

    Outline


    Introduction to Markov Chain Monte Carlo

    Gibbs Sampling

    The Metropolis-Hastings Algorithm

    Gibbs Sampling


Suppose we have a joint distribution p(θ1, …, θk) that we want to sample from (for example, a posterior distribution).

We can use the Gibbs sampler to sample from the joint distribution if we know the full conditional distributions for each parameter.

For each parameter, the full conditional distribution is the distribution of that parameter conditional on the known information and all the other parameters: p(θj | θ−j, y).

How can we know the joint distribution simply by knowing the full conditional distributions?

    The Hammersley-Clifford Theorem (for two blocks)

Suppose we have a joint density f(x, y). The theorem proves that we can write out the joint density in terms of the conditional densities f(x|y) and f(y|x):

$$ f(x, y) = \frac{f(y \mid x)}{\int \frac{f(y \mid x)}{f(x \mid y)}\, dy} $$

We can write the denominator as

$$ \int \frac{f(y \mid x)}{f(x \mid y)}\, dy = \int \frac{f(x, y)/f(x)}{f(x, y)/f(y)}\, dy = \int \frac{f(y)}{f(x)}\, dy = \frac{1}{f(x)} $$

Thus, our right-hand side is

$$ \frac{f(y \mid x)}{1/f(x)} = f(y \mid x)\, f(x) = f(x, y) $$

The theorem shows that knowledge of the conditional densities allows us to get the joint density.

This works for more than two blocks of parameters.

But how do we figure out the full conditionals?

    Steps to Calculating Full Conditional Distributions


Suppose we have a posterior p(θ|y). To calculate the full conditionals for each θ, do the following:

1. Write out the full posterior, ignoring constants of proportionality.

2. Pick a block of parameters (for example, θ1) and drop everything that doesn't depend on θ1.

3. Use your knowledge of distributions to figure out what the normalizing constant is (and thus what the full conditional distribution p(θ1|θ−1, y) is).

4. Repeat steps 2 and 3 for all parameter blocks.
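As a worked sketch of these steps (a standard normal-model illustration chosen for this transcript, not an example from the original slides): suppose y1, …, yn are i.i.d. N(μ, 1/τ), with a flat prior on μ and a Gamma(a, b) prior on τ. Step 1 gives

$$ p(\mu, \tau \mid y) \propto \tau^{n/2} \exp\!\left(-\frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right) \tau^{a-1} e^{-b\tau} $$

Picking the block μ and dropping everything that does not depend on μ (step 2) leaves exp(−(τ/2) Σ(yi − μ)²) ∝ exp(−(nτ/2)(μ − ȳ)²), which we recognize (step 3) as a Normal kernel:

$$ \mu \mid \tau, y \sim N\!\left(\bar{y},\; \frac{1}{n\tau}\right) $$

Repeating for τ (step 4) leaves τ^(a + n/2 − 1) exp(−τ(b + ½ Σ(yi − μ)²)), a Gamma kernel:

$$ \tau \mid \mu, y \sim \text{Gamma}\!\left(a + \frac{n}{2},\; b + \frac{1}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right) $$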

    Gibbs Sampler Steps

Let's suppose that we are interested in sampling from the posterior p(θ|y), where θ is a vector of three parameters, θ1, θ2, θ3.

The steps to a Gibbs Sampler (and the analogous steps in the MCMC process) are

1. Pick a vector of starting values θ^(0). (Defining a starting distribution Q^(0) and drawing θ^(0) from it.)

2. Start with any θ (order does not matter, but I'll start with θ1 for convenience). Draw a value θ1^(1) from the full conditional p(θ1 | θ2^(0), θ3^(0), y).

3. Draw a value θ2^(1) (again, order does not matter) from the full conditional p(θ2 | θ1^(1), θ3^(0), y). Note that we must use the updated value of θ1^(1).

4. Draw a value θ3^(1) from the full conditional p(θ3 | θ1^(1), θ2^(1), y) using both updated values. (Steps 2-4 are analogous to multiplying Q^(0) and P to get Q^(1) and then drawing θ^(1) from Q^(1).)

5. Draw θ^(2) using θ^(1) and continually using the most updated values.

6. Repeat until we get M draws, with each draw being a vector θ^(t).

7. Optional burn-in and/or thinning.

    3.  Draw a value  θ(1)

    2   (again order does not matter) from the fullconditional  p (θ2|θ

    (1)1   , θ

    (0)3   , y). Note that we must use the

    updated value of  θ(1)1   .

    4.  Draw a value  θ(1)3   from the full conditional p (θ3|θ

    (1)1   , θ

    (1)2   , y)

    using both updated values.   (Steps 2-4 are analogous to multiplyingQ(0) and  P  to get

    http://goforward/http://find/http://goback/

  • 8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm

    142/233

    g pQ(1) and then drawing  θ(1) from

    Q(1).)

    5.   Draw  θ(2) using  θ(1) and continually using the most updatedvalues.

    6.  Repeat until we get  M  draws, with each draw being a vectorθ

    (t ).

    7.  Optional burn-in and/or thinning.

    Our result is a Markov chain with a bunch of draws of  θ  that areapproximately from our posterior.

    3.  Draw a value  θ(1)

    2   (again order does not matter) from the fullconditional  p (θ2|θ

    (1)1   , θ

    (0)3   , y). Note that we must use the

    updated value of  θ(1)1   .

    4.  Draw a value  θ(1)3   from the full conditional p (θ3|θ

    (1)1   , θ

    (1)2   , y)

    using both updated values.   (Steps 2-4 are analogous to multiplyingQ(0) and  P  to get

    http://find/

  • 8/21/2019 MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm

    143/233

    Q(1) and then drawing  θ(1) fromQ(1).)

    5.   Draw  θ(2) using  θ(1) and continually using the most updatedvalues.

    6.  Repeat until we get  M  draws, with each draw being a vectorθ

    (t ).

    7.  Optional burn-in and/or thinning.

    Our result is a Markov chain with a bunch of draws of  θ  that areapproximately from our posterior. We can do Monte CarloIntegration on those draws to get quantities of interest.
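To make the loop structure concrete, here is a minimal sketch in R. It is only a sketch: draw1, draw2, and draw3 are hypothetical stand-ins for functions that return one draw from each full conditional.

# Generic Gibbs skeleton for theta = (theta1, theta2, theta3).
# draw1(), draw2(), draw3() are hypothetical samplers for the full
# conditionals p(theta_k | theta_{-k}, y).
gibbs.skeleton <- function(n.sims, theta.start, draw1, draw2, draw3, y) {
  draws <- matrix(NA, nrow = n.sims, ncol = 3)   # one row per iteration
  theta <- theta.start                           # step 1: theta^(0)
  for (s in 1:n.sims) {
    theta[1] <- draw1(theta[2], theta[3], y)     # step 2
    theta[2] <- draw2(theta[1], theta[3], y)     # step 3: uses updated theta1
    theta[3] <- draw3(theta[1], theta[2], y)     # step 4: uses both updates
    draws[s, ] <- theta                          # steps 5-6: record theta^(s)
  }
  draws                                          # step 7 (burn-in/thinning) is left to the user
}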

An Example (Robert and Casella, 10.17)¹

Suppose we have data on the number of failures (yi) for each of 10 pumps in a nuclear plant. We also have the times (ti) at which each pump was observed.

> y <- c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22)
> t <- c(94, 16, 63, 126, 5, 31, 1, 1, 2, 10)
> rbind(y, t)
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
y    5    1    5   14    3   19    1    1    4    22
t   94   16   63  126    5   31    1    1    2    10

We want to model the number of failures with a Poisson likelihood, where the expected number of failures λi differs for each pump. Since the time over which we observed each pump differs, we need to scale each λi by its observed time ti.

Our likelihood is ∏_{i=1}^{10} Poisson(λi ti).

¹ Robert, Christian P. and George Casella. 2004. Monte Carlo Statistical Methods, 2nd edition. Springer.

Let's put a Gamma(α, β) prior on λi with α = 1.8, so the λi s are drawn from the same distribution.

Also, let's put a Gamma(γ, δ) prior on β, with γ = 0.01 and δ = 1.

So our model has 11 parameters that are unknown (10 λi s and β).

Our posterior is

p(λ, β|y, t) ∝ [∏_{i=1}^{10} Poisson(λi ti) × Gamma(α, β)] × Gamma(γ, δ)

             = [∏_{i=1}^{10} (e^{−λi ti} (λi ti)^{yi} / yi!) × (β^α / Γ(α)) λi^{α−1} e^{−βλi}] × (δ^γ / Γ(γ)) β^{γ−1} e^{−δβ}

             ∝ [∏_{i=1}^{10} e^{−λi ti} (λi ti)^{yi} × β^α λi^{α−1} e^{−βλi}] × β^{γ−1} e^{−δβ}

             = [∏_{i=1}^{10} λi^{yi+α−1} e^{−(ti+β)λi}] β^{10α+γ−1} e^{−δβ}

Finding the full conditionals:

p(λi|λ−i, β, y, t) ∝ λi^{yi+α−1} e^{−(ti+β)λi}

p(β|λ, y, t) ∝ e^{−β(δ + Σ_{i=1}^{10} λi)} β^{10α+γ−1}

p(λi|λ−i, β, y, t) is a Gamma(yi + α, ti + β) distribution.

p(β|λ, y, t) is a Gamma(10α + γ, δ + Σ_{i=1}^{10} λi) distribution.

Coding the Gibbs Sampler

1.  Define starting values for β (we only need to define β here because we will draw λ first and it only depends on β and other given values).
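A minimal sketch of this step; the particular value below is an assumption, and any reasonable positive number would do, since the chain forgets its starting point:

> beta.cur <- 1  # assumed starting value for beta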

2.  Draw λ^(1) from its full conditional (we're drawing all the λi s as a block because they all only depend on β and not on each other).
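A sketch of this update, drawing each λi from its Gamma(yi + α, ti + β) full conditional derived above; the function body here is an assumption consistent with that derivation:

lambda.update <- function(alpha, beta, y, t) {
  # one Gamma(y_i + alpha, t_i + beta) draw per pump, vectorized over i
  rgamma(length(y), shape = y + alpha, rate = t + beta)
}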

3.  Draw β^(1) from its full conditional, using the updated draws λ^(1).
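Similarly, a sketch for β, drawing from its Gamma(10α + γ, δ + Σλi) full conditional; again the body is an assumption consistent with the derivation:

beta.update <- function(alpha, gamma, delta, lambda, y) {
  # one draw; shape 10*alpha + gamma (length(y) = 10 pumps), rate delta + sum(lambda)
  rgamma(1, shape = length(y) * alpha + gamma, rate = delta + sum(lambda))
}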

4.  Repeat using most updated values until we get M draws.

5.  Optional burn-in and thinning.

6.  Make it into a function (a sketch follows).

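A minimal sketch of such a function, assuming it is assembled from the lambda.update and beta.update sketches above; the argument names and the returned list structure (lambda.draws, beta.draws) are assumptions, chosen to match the posterior$lambda.draws and posterior$beta.draws calls below:

gibbs <- function(n.sims, beta.start, alpha, gamma, delta, y, t,
                  burnin = 0, thin = 1) {
  # storage: one row per draw of lambda, one entry per draw of beta
  lambda.draws <- matrix(NA, nrow = n.sims, ncol = length(y))
  beta.draws <- rep(NA, n.sims)
  beta.cur <- beta.start
  for (i in 1:n.sims) {
    lambda.cur <- lambda.update(alpha = alpha, beta = beta.cur, y = y, t = t)
    beta.cur <- beta.update(alpha = alpha, gamma = gamma, delta = delta,
                            lambda = lambda.cur, y = y)
    lambda.draws[i, ] <- lambda.cur
    beta.draws[i] <- beta.cur
  }
  # optional burn-in and thinning (step 5)
  keep <- seq(burnin + 1, n.sims, by = thin)
  list(lambda.draws = lambda.draws[keep, , drop = FALSE],
       beta.draws = beta.draws[keep])
}

# Hypothetical call (n.sims and beta.start are assumptions; alpha, gamma,
# delta are the prior values given above):
# posterior <- gibbs(n.sims = 10000, beta.start = 1, alpha = 1.8,
#                    gamma = 0.01, delta = 1, y = y, t = t)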

7.  Do Monte Carlo Integration on the resulting Markov chain, whose draws are approximately samples from the posterior.

> posterior <- gibbs(...)
> colMeans(posterior$lambda.draws)
 [1] 0.07113 0.15098 0.10447 0.12321 0.65680 0.62212 0.86522 0.85465
 [9] 1.35524 1.92694

> mean(posterior$beta.draws)
[1] 2.389

> apply(posterior$lambda.draws, 2, sd)
 [1] 0.02759 0.08974 0.04012 0.03071 0.30899 0.13676 0.55689 0.54814
 [9] 0.60854 0.40812

> sd(posterior$beta.draws)
[1] 0.6986

    Outline

    Introduction to Markov Chain Monte Carlo


    Gibbs Sampling

    The Metropolis-Hastings Algorithm

Suppose we have a posterior p(θ|y) that we want to sample from, but

  - the posterior doesn't look like any distribution we know (no conjugacy)

  - the posterior consists of more than 2 parameters (grid approximations intractable)

  - some (or all) of the full conditionals do not look like any distributions we know (no Gibbs sampling for those whose full conditionals we don't know)

If all else fails, we can use the Metropolis-Hastings algorithm, which will always work.

Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm proceeds in the following steps:

1.  Choose a starting value θ^(0).

2.  At iteration t, draw a candidate θ* from a jumping distribution J_t(θ*|θ^(t−1)).

3.  Compute an acceptance ratio (probability):

    r = [p(θ*|y) / J_t(θ*|θ^(t−1))] / [p(θ^(t−1)|y) / J_t(θ^(t−1)|θ*)]

4.  Accept θ* as θ^(t) with probability min(r, 1). If θ* is not accepted, then θ^(t) = θ^(t−1).
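To make the steps concrete, here is a minimal sketch in R of the random-walk Metropolis special case, where the jumping distribution is a symmetric Normal so the J_t terms cancel in r. The target log.post and the tuning scale are hypothetical stand-ins:

# Random-walk Metropolis sketch (symmetric Normal jumps, so r reduces
# to p(theta*|y) / p(theta^(t-1)|y)); log.post and scale are assumptions.
metropolis <- function(n.sims, theta.start, log.post, scale = 1) {
  draws <- rep(NA, n.sims)
  theta.cur <- theta.start                               # step 1: starting value
  for (t in 1:n.sims) {
    theta.star <- rnorm(1, mean = theta.cur, sd = scale) # step 2: candidate
    log.r <- log.post(theta.star) - log.post(theta.cur)  # step 3: ratio, on the log scale
    if (log(runif(1)) < log.r) theta.cur <- theta.star   # step 4: accept w.p. min(r, 1)
    draws[t] <- theta.cur                                # if rejected, keep theta^(t-1)
  }
  draws
}

# Hypothetical usage: sample from a standard Normal "posterior"
# draws <- metropolis(10000, theta.start = 0,
#                     log.post = function(x) dnorm(x, log = TRUE))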