
Bayesian Statistics - an Introduction

Dr Lawrence Pettit
School of Mathematical Sciences, Queen Mary, University of London

July 22, 2008

What is Bayesian Statistics?

- Based on an idea of subjective probability.
- You have knowledge and beliefs about the matter in hand.
- Express these as a (prior) probability distribution.
- Collect some data (likelihood).
- Use Bayes theorem to combine prior knowledge and new information to find a new (posterior) probability distribution.


- Today's posterior is tomorrow's prior.
- Although different people will start with different priors, with enough data opinions will converge.
- One coherent paradigm for all problems.
- Can handle more realistic, complex models.
- Makes predictions.
- The interpretation of interval estimates is more natural.
- Nuisance parameters and constraints on parameters can be easily dealt with.


Bayes theorem

- Bayes Theorem is named after the Rev. Thomas Bayes, a nonconformist minister who lived in England in the first half of the eighteenth century.
- The theorem was published posthumously in 1763 in 'An essay towards solving a problem in the doctrine of chances'.


Bayes theorem

- Let Ω be a sample space and B_1, B_2, ..., B_k be mutually exclusive and exhaustive events in Ω (i.e. B_i ∩ B_j = ∅ for i ≠ j and ∪_{i=1}^k B_i = Ω; the B_i form a partition of Ω).
- Let A be any event with Pr[A] > 0. Then

\[
\Pr[B_i \mid A] = \frac{\Pr[B_i]\,\Pr[A \mid B_i]}{\Pr[A]} = \frac{\Pr[B_i]\,\Pr[A \mid B_i]}{\sum_{j=1}^{k} \Pr[B_j]\,\Pr[A \mid B_j]}
\]


Example

- A diagnostic test for a disease gives a correct result 99% of the time. Suppose 2% of the population have the disease. If a person selected at random from the population is given the test and produces a positive result, what is the probability that the person has the disease? Suppose they are given a second independent test and it is also positive; what is the probability that they have the disease now?
- Let '+' and '−' denote the events that a test is positive or negative, respectively. Let D denote the event that the person has the disease.
- We require p(D|+) and p(D|++).


Solution

- Let '+' and '−' denote the events that a test is positive or negative, respectively. Let D denote the event that the person has the disease and D̄ its complement.
- We require p(D|+) and p(D|++). We are told that

\[
p(+ \mid D) = 0.99, \quad p(- \mid \bar{D}) = 0.99, \quad p(D) = 0.02.
\]

- By Bayes theorem

\[
p(D \mid +) = \frac{p(+ \mid D)\,p(D)}{p(+ \mid D)\,p(D) + p(+ \mid \bar{D})\,p(\bar{D})}
= \frac{0.99 \times 0.02}{0.99 \times 0.02 + 0.01 \times 0.98} = 0.6689
\]


Solution for a second test

- Suppose that we have a second positive test result. Then

\[
p(D \mid ++) = \frac{p(++ \mid D)\,p(D)}{p(++ \mid D)\,p(D) + p(++ \mid \bar{D})\,p(\bar{D})}
= \frac{0.99^2 \times 0.02}{0.99^2 \times 0.02 + 0.01^2 \times 0.98} = 0.9950
\]

- Thus although the test is very accurate we need two positive results before we can say with confidence that the person has the disease (a numerical check is sketched below).
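To make the arithmetic concrete, here is a minimal Python sketch of the same calculation. The function name and parameterisation are illustrative, not from the talk, and it assumes the tests are conditionally independent given disease status, as the slides do.

```python
def posterior_disease(prior, accuracy, n_positive):
    """P(D | n independent positive results), test correct with prob. `accuracy`."""
    p_pos_given_d = accuracy ** n_positive            # p(++...|D)
    p_pos_given_not_d = (1 - accuracy) ** n_positive  # p(++...|D-bar)
    numerator = p_pos_given_d * prior
    return numerator / (numerator + p_pos_given_not_d * (1 - prior))

print(posterior_disease(0.02, 0.99, 1))  # 0.6689...
print(posterior_disease(0.02, 0.99, 2))  # 0.9950...
```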


Density form of Bayes Theorem

- Let X, θ be two continuous r.v.'s (possibly multivariate). Then

\[
f(\theta \mid x) = \frac{f(\theta)\,f(x \mid \theta)}{f(x)} = \frac{f(\theta)\,f(x \mid \theta)}{\int f(\theta')\,f(x \mid \theta')\,d\theta'}
\]

- Posterior ∝ Likelihood × Prior:

\[
p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)
\]


The Likelihood Principle

- To illustrate the difference between the classical and Bayesian approaches we start with an example.
- Suppose we toss a drawing pin and get 9 'ups' and 3 'downs'. We denote 'up' by U and 'down' by D. Is the pin unbiased?
- Classically we might test H_0: p = 1/2 versus H_1: p > 1/2, where p = p(U). The probability of the observed result or something more extreme (tail area) if H_0 is true is

\[
\left[\binom{12}{3} + \binom{12}{2} + \binom{12}{1} + \binom{12}{0}\right]\left(\frac{1}{2}\right)^{12} = \frac{299}{4096} \approx 7.3\%.
\]

Thus we would accept H_0 at the 5% level (this tail area is sketched in code below).
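A minimal sketch (not from the talk) of this tail-area computation in Python:

```python
from math import comb

# P(9 or more 'ups' in 12 tosses) under H0: p = 1/2.
p_value = sum(comb(12, k) for k in range(9, 13)) / 2**12
print(p_value)  # 299/4096 = 0.0730...
```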


- However this assumes that we did the experiment by deciding to toss the drawing pin 12 times.
- What if we decided to toss the pin until we achieved 3 D's?
- Now the probability of the observed result or something more extreme if H_0 is true is

\[
\binom{11}{2}\left(\frac{1}{2}\right)^{12} + \binom{12}{2}\left(\frac{1}{2}\right)^{13} + \binom{13}{2}\left(\frac{1}{2}\right)^{14} + \cdots
\]

We may calculate this from the probability of the complement of this event,

\[
\binom{10}{2}\left(\frac{1}{2}\right)^{11} + \binom{9}{2}\left(\frac{1}{2}\right)^{10} + \cdots + \binom{2}{2}\left(\frac{1}{2}\right)^{3}.
\]

It follows that the P-value is 134/4096 ≈ 3.27%. So we would reject H_0 at the 5% level (see the sketch below).
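The same number from a short Python sketch (not from the talk): under this stopping rule the observed result is N = 12 tosses to reach the third 'down', and "more extreme" means N ≥ 12.

```python
from math import comb

# P(N <= 11), where P(N = n) = C(n-1, 2) * (1/2)**n, then take the complement.
p_leq_11 = sum(comb(n - 1, 2) * 0.5**n for n in range(3, 12))
print(1 - p_leq_11)  # 134/4096 = 0.0327...
```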


- In order to perform a significance test we are required to specify a sample space, i.e. the space of all possible outcomes.
- Possibilities for the drawing pin are:
  (i) {(u, d) : u + d = 12},
  (ii) {(u, d) : d = 3},
  or, if I carry on tossing the pin until my coffee is ready, so that there is a random stopping point,
  (iii) all (u, d).


- The Bayesian analysis of this problem is somewhat different. Let θ be the chance that the pin lands up.
- θ is a "long run frequency" of U's. It is an objective property of the pin. It does not depend on You.
- You have beliefs about θ which you express in the form of a probability density function (pdf) p(θ). You use Bayes theorem to update your beliefs:

\[
p(\theta \mid \text{data}) \propto p(\text{data} \mid \theta)\,p(\theta) \propto \theta^{9}(1-\theta)^{3}\,p(\theta).
\]

The sampling rule is irrelevant (the sketch below checks this).
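A small Python sketch (not from the talk) of why the sampling rule drops out: the binomial and negative-binomial likelihoods for 9 ups and 3 downs differ only by a constant factor in θ, so they lead to the same posterior.

```python
from math import comb

def binomial_lik(theta):      # number of tosses fixed at 12
    return comb(12, 9) * theta**9 * (1 - theta)**3

def neg_binomial_lik(theta):  # toss until the 3rd 'down'
    return comb(11, 2) * theta**9 * (1 - theta)**3

for theta in (0.3, 0.5, 0.7):
    print(binomial_lik(theta) / neg_binomial_lik(theta))  # always 4.0
```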


- In deriving the posterior distribution the only contribution from the data is through the likelihood p(data|θ). Thus a Bayesian inference, which will depend only on the posterior distribution, obeys the likelihood principle.
- This roughly says that if two experiments give the same likelihoods then the inferences we make should be the same in each case.
- Classical hypothesis tests or confidence intervals violate the likelihood principle.


- We need to give a prior distribution for θ.
- Suppose we take

\[
p(\theta) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}, \qquad a, b > 0,
\]

i.e. a beta distribution with mean a/(a + b) and variance

\[
\frac{a}{a+b}\cdot\frac{b}{a+b}\cdot\frac{1}{a+b+1}.
\]

We shall write this distribution as Be(a, b).
- It then follows that

\[
p(\theta \mid \text{data}) \propto \theta^{9+a-1}(1-\theta)^{3+b-1},
\]

that is, Be(9 + a, 3 + b). Thus if we take a beta prior for θ we shall obtain a beta posterior (the update is sketched below).
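As a minimal sketch (not from the talk), the conjugate update is just addition of counts:

```python
def beta_update(a, b, ups, downs):
    """Be(a, b) prior + (ups, downs) Bernoulli data -> Be(a + ups, b + downs)."""
    return a + ups, b + downs

print(beta_update(1, 1, 9, 3))  # uniform prior -> Be(10, 4)
```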


- The choice a = b = 1 gives a uniform distribution as the prior, i.e. we think any value of θ is equally likely. More realistically for a pin we might take a = b = 2, reflecting a belief that we think θ is more likely to be near 0.5 than 0 or 1, but not being very sure.
- Others might choose an asymmetric prior, perhaps arguing that a pin with a very long point would very likely land down, so a pin with any point would land down more often than up. I am not convinced by this argument, but it shows that different people do have different beliefs, and one of the advantages of the Bayesian approach is that the analysis can reflect these.


- If we were throwing a coin rather than a pin then we would almost certainly choose a different prior. We have much more experience throwing coins than pins and are much more sure that θ, the chance the coin will land heads, is close to 0.5. Thus we might take a = b = 50, say.
- The posteriors we get for the pin and the coin will be very different. For the pin the posterior mean will be 11/16 = 0.6875 and the posterior variance 11/16 × 5/16 × 1/17 = 0.0126. For the coin the posterior mean will be 59/112 = 0.527, which is close to 0.5.
- The classical unbiased estimate is 9/12 = 0.75 if the number of throws is fixed, or 9/11 = 0.818 if we continue until we have three 'failures'. The classical answers are the same whether for pins or coins, ignoring the extra information that we have. (The sketch below reproduces these numbers.)
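A short sketch (not from the talk) reproducing the posterior summaries for both priors:

```python
def beta_mean_var(a, b):
    mean = a / (a + b)
    return mean, mean * (b / (a + b)) / (a + b + 1)

print(beta_mean_var(2 + 9, 2 + 3))    # pin,  Be(11, 5):  (0.6875, 0.0126...)
print(beta_mean_var(50 + 9, 50 + 3))  # coin, Be(59, 53): (0.5267..., 0.0022...)
print(9 / 12, 9 / 11)                 # classical estimates: 0.75, 0.8181...
```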


[Figure: Plot for pins]

[Figure: Plot for coins]

- By taking mixtures of conjugate priors we can represent more realistic beliefs.
- As an example consider the result when a coin is spun on its edge. Experience has shown that when spinning a coin the proportion of heads is more likely to be 1/3 or 2/3 than 1/2. Therefore a bimodal prior seems appropriate. Since spinning coins will be Bernoulli trials, the beta distribution will be conjugate.
- Therefore we take the following prior,

\[
p(\theta) = \frac{1}{2}\,\mathrm{Be}(10, 20) + \frac{1}{2}\,\mathrm{Be}(20, 10),
\]

i.e.

\[
p(\theta) = \frac{1}{2}\,\frac{\Gamma(30)}{\Gamma(10)\,\Gamma(20)}\,\theta^{10-1}(1-\theta)^{20-1} + \frac{1}{2}\,\frac{\Gamma(30)}{\Gamma(20)\,\Gamma(10)}\,\theta^{20-1}(1-\theta)^{10-1},
\]

where θ is the chance that the coin comes down heads.


- If we observe 3 heads and 7 tails in 10 spins, the posterior based on the first prior is Be(13, 27) and on the second prior is Be(23, 17).
- The weight on the first posterior, α′, is

\[
\alpha' = \frac{0.5\,p_1(x)}{0.5\,p_1(x) + 0.5\,p_2(x)}.
\]

Now

\[
p_1(x) = \frac{\Gamma(30)}{\Gamma(10)\,\Gamma(20)}\,\frac{\Gamma(13)\,\Gamma(27)}{\Gamma(40)}
\]

and similarly

\[
p_2(x) = \frac{\Gamma(30)}{\Gamma(20)\,\Gamma(10)}\,\frac{\Gamma(23)\,\Gamma(17)}{\Gamma(40)}.
\]

A little calculation shows that α′ = 0.89 and β′ = 0.11 (see the sketch below). Hence the posterior is

\[
p(\theta \mid x) = 0.89\,\mathrm{Be}(13, 27) + 0.11\,\mathrm{Be}(23, 17).
\]
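The "little calculation" done numerically (a sketch, not from the talk), working with log-gammas for stability; the shared binomial coefficient cancels between components, so it is omitted:

```python
from math import lgamma, exp

def log_marginal(a, b, heads, tails):
    """Log marginal likelihood of the data under a Be(a, b) prior."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + heads) + lgamma(b + tails)
            - lgamma(a + b + heads + tails))

l1 = log_marginal(10, 20, 3, 7)  # first mixture component
l2 = log_marginal(20, 10, 3, 7)  # second mixture component
alpha = 1 / (1 + exp(l2 - l1))   # equal prior weights of 0.5 cancel
print(alpha, 1 - alpha)          # ~0.89, ~0.11
```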


[Figure: Plot for mixture prior]

Example

- Suppose that we are about to introduce a new insurance policy. We are interested in the mean number of claims per month, θ.
- Prior?
- We may expect that on average there would be 2 claims per month, with a variance of 1.
- The conjugate prior for θ is Gamma(4, 2), since a Gamma(a, b) density has mean a/b and variance a/b², and a/b = 2, a/b² = 1 give a = 4, b = 2:

\[
p(\theta) \propto \theta^{4-1}\exp(-2\theta)
\]
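As a sketch (not from the talk), the moment-matching that picks out Gamma(4, 2):

```python
# Solve mean = a/b and variance = a/b**2 for (a, b).
prior_mean, prior_var = 2.0, 1.0
b = prior_mean / prior_var  # 2.0
a = prior_mean * b          # 4.0
print(a, b)                 # Gamma(4, 2)
```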


Page 66: Bayesian Statistics - an Introduction - QMUL Mathsfjw/goldsmiths/2008/LP/bayestalk.pdf · Bayesian Statistics Stochastic Simulation - Gibbs sampling Bayesian Statistics - an Introduction

Bayesian StatisticsStochastic Simulation - Gibbs sampling

What is Bayesian Statistics?Bayes TheoremThe Likelihood PrincipleMixtures of conjugate priorsPoisson examplePredictive distributionsA little decision theory

Prior density - Gamma(4, 2) [figure not reproduced]

Calculation of the posterior

▶ After 6 months there have been 18 claims.

▶ Assume that the data follow a Poisson distribution with mean θ:

p(x|θ) = exp(−θ) θ^x / x!

▶ The likelihood is given by

p(x|θ) ∝ θ^{Σ x_i} exp(−nθ)

where n = 6 and Σ x_i = 18.

▶ Posterior:

p(θ|x) ∝ θ^{22−1} exp(−8θ)

which we can recognise as a Ga(22, 8).
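The update is mechanical: the Gamma shape gains the total count and the rate gains the number of observations. A minimal sketch (the helper name poisson_gamma_update is ours, not the slides'):

```python
def poisson_gamma_update(alpha, beta, counts):
    """Gamma(alpha, beta) prior + Poisson counts -> Gamma posterior.

    Rate parametrisation as in the slides: the posterior shape gains the
    total count, the rate gains the number of observations.
    """
    return alpha + sum(counts), beta + len(counts)

# 6 months, 18 claims in total (any monthly split summing to 18 gives the same answer).
alpha_post, beta_post = poisson_gamma_update(4, 2, [3, 2, 4, 3, 3, 3])
print(alpha_post, beta_post)  # 22 8
```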

Calculation of the posterior

▶ The posterior mean (22/8 = 2.75) is a weighted average of the prior mean (2) and the data mean (3).

▶ The weights are related to the prior and data variances.

▶ This holds for other common distributions.
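To make the weighting explicit, rearrange the posterior mean in the slides' notation:

E[θ|x] = (α + Σ x_i)/(β + n) = [β/(β + n)]·(α/β) + [n/(β + n)]·x̄ = (2/8)·2 + (6/8)·3 = 22/8 = 2.75

Here β/(β + n) = 1/4, so the data mean receives three times the weight of the prior mean.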

Prior/Posterior density [figure not reproduced]

Predictive distributions

▶ Distribution of a future observation y?

▶ x = (x_1, . . . , x_n) is a random sample from p(x_i|θ).

▶ Prior p(θ). Posterior p(θ|x).

▶ y - an independent observation from the same distribution. We want p(y|x), the predictive distribution of y.

Predictive density

Now by extension of the argument

p(y|x) = ∫ p(y|θ, x) p(θ|x) dθ = ∫ p(y|θ) p(θ|x) dθ

since y is independent of x given θ.
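The integral also gives a simulation recipe (composition sampling): draw θ from the posterior, then y from p(y|θ). A sketch for the claims example, whose posterior is the Ga(22, 8) found above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Composition sampling from the predictive: theta ~ posterior, y ~ p(y|theta).
# The posterior is the Ga(22, 8) from the claims example (rate parametrisation).
theta = rng.gamma(shape=22, scale=1/8, size=100_000)
y = rng.poisson(theta)

print(y.mean())   # close to the predictive mean 22/8 = 2.75
```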

Poisson predictive example

▶ x_1, . . . , x_n ∼ Poisson(θ), prior for θ is Gamma(4, 2) ⇒ posterior is Gamma(22, 8).

▶ Suppose y ∼ Poisson(θ). Then

p(y|x) = ∫ p(y|θ) p(θ|x) dθ
       = ∫ [θ^y exp(−θ)/y!] [8^{22} θ^{21} exp(−8θ)/Γ(22)] dθ
       = [8^{22}/(Γ(22) y!)] ∫ θ^{21+y} exp(−9θ) dθ
       = [8^{22}/(Γ(22) y!)] Γ(y + 22)/9^{22+y},   y = 0, 1, 2, 3, . . .
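This pmf is that of a negative binomial distribution (reading off r = 22 and success probability 8/9 is our interpretation of the formula, not stated on the slide), so the result is easy to check numerically:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

def predictive_pmf(y, a=22, b=8):
    """p(y|x) for a Ga(a, b) posterior and Poisson likelihood (log-safe)."""
    logp = a*np.log(b) - gammaln(a) - gammaln(y + 1) \
         + gammaln(y + a) - (y + a)*np.log(b + 1)
    return np.exp(logp)

y = np.arange(10)
print(np.allclose(predictive_pmf(y), nbinom.pmf(y, 22, 8/9)))  # True
```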

Predictive density of y [figure not reproduced]

A little decision theory

▶ The posterior distribution represents all our knowledge after seeing the data.

▶ Why quote the posterior mean and variance so much?

▶ Point estimates are justified by decision theory.

▶ As an extra ingredient we specify a loss function and choose an estimate to minimise expected loss.

▶ For example, quadratic loss

l(d, θ) = (d − θ)^2

leads to the posterior mean as the estimator and the posterior variance as the expected loss.

▶ Absolute error loss

l(d, θ) = |d − θ|

leads to the posterior median as the estimator.
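Both facts are easy to see by simulation on the Ga(22, 8) posterior from the claims example; a sketch (ours, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.gamma(shape=22, scale=1/8, size=200_000)  # Ga(22, 8) posterior draws

# Expected quadratic loss E[(d - theta)^2] is minimised near the posterior mean,
# expected absolute loss E[|d - theta|] near the posterior median.
grid = np.linspace(2.0, 3.5, 301)
quad = [np.mean((d - theta)**2) for d in grid]
absl = [np.mean(np.abs(d - theta)) for d in grid]

print(grid[np.argmin(quad)], theta.mean())       # both near 22/8 = 2.75
print(grid[np.argmin(absl)], np.median(theta))   # both near the posterior median
```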

Stochastic Simulation - Gibbs sampling

Stochastic Simulation - basic idea:

▶ simulate from the distribution we are interested in

▶ use the simulated sample to make inferences.

▶ Gibbs sampling is one of these methods.

Illustration of Gibbs

▶ Suppose we have three parameters θ, δ and φ.

▶ Likelihood - p(x|θ, δ, φ).

▶ Prior - p(θ, δ, φ).

▶ Suppose we can find the conditional distributions

p(θ|δ, φ, x), p(δ|φ, θ, x), p(φ|θ, δ, x)

that we can simulate from.

Gibbs Sampling

▶ The Gibbs sampler is an iterative scheme:

Step 1 Choose initial estimates θ^0, δ^0, φ^0.
Step 2 Given current estimates θ^i, δ^i, φ^i, simulate new values
  θ^{i+1} from p(θ|δ^i, φ^i, x)
  δ^{i+1} from p(δ|φ^i, θ^{i+1}, x)
  φ^{i+1} from p(φ|θ^{i+1}, δ^{i+1}, x)
Step 3 Return to Step 2.

▶ The sequence (θ, δ, φ)^i, i = 1, 2, . . ., is a realisation of a Markov chain which, under mild regularity conditions, has equilibrium distribution p(θ, δ, φ|x), the joint posterior distribution of θ, δ, φ.
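In code the scheme is a short loop; a minimal sketch where the sample_* callables stand in for whatever full conditionals the model supplies (all names here are ours):

```python
def gibbs(sample_theta, sample_delta, sample_phi, init, n_iter, x):
    """Generic three-parameter Gibbs sampler.

    Each sample_* argument draws from the corresponding full conditional,
    e.g. sample_theta(delta, phi, x); init = (theta0, delta0, phi0).
    """
    theta, delta, phi = init
    draws = []
    for _ in range(n_iter):
        theta = sample_theta(delta, phi, x)   # p(theta | delta, phi, x)
        delta = sample_delta(phi, theta, x)   # p(delta | phi, theta, x)
        phi = sample_phi(theta, delta, x)     # p(phi | theta, delta, x)
        draws.append((theta, delta, phi))
    return draws
```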

Estimation from Gibbs Output

▶ Burn-in period of length L.

▶ The posterior mean of θ can be estimated, for large t, by

(1/t) Σ_{i=L+1}^{L+t} θ^i

▶ To see a picture of the posterior density of θ, use kernel density estimation.

▶ The predictive density of a new observation y can be estimated as

p(y|x) = (1/t) Σ_{i=L+1}^{L+t} p(y|θ^i, δ^i, φ^i)
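A sketch of these estimators applied to Gibbs output, assuming for illustration that y|θ ∼ Poisson(θ); the fake chain stands in for the draws returned by the gibbs() sketch above so the snippet runs on its own:

```python
import numpy as np
from scipy.stats import gaussian_kde, poisson

rng = np.random.default_rng(3)

# Placeholder Gibbs output: in practice `draws` would come from gibbs() above;
# here we fake a (theta, delta, phi) chain so the snippet is self-contained.
draws = [(rng.gamma(22, 1/8), rng.normal(), rng.normal()) for _ in range(6_000)]

L, t = 1_000, 5_000                       # burn-in length and retained draws
theta = np.array([d[0] for d in draws[L:L + t]])

post_mean = theta.mean()                  # (1/t) * sum_{i=L+1}^{L+t} theta^i
kde = gaussian_kde(theta)                 # smooth picture of p(theta|x)

# Predictive estimate, assuming for illustration that y|theta ~ Poisson(theta):
y = np.arange(15)
pred = poisson.pmf(y[:, None], theta[None, :]).mean(axis=1)
print(post_mean, pred.sum())              # mean near 2.75; pmf sums to ~1
```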

Poisson hierarchical model

▶ 1st stage - observations s_j ∼ Poisson with mean t_j λ_j for j = 1, 2, . . . , p, t_j known.

▶ 2nd stage - prior λ_j ∼ Ga(α, β) iid, α known.

▶ 3rd stage - hyperprior β ∼ Ga(γ, δ), where γ and δ are known.

▶ p + 1 unknown parameters - the λ's and β.

▶ Joint posterior of all the parameters:

∏_{j=1}^{p} [(λ_j t_j)^{s_j} exp(−λ_j t_j)/s_j!] × ∏_{j=1}^{p} [Γ(α)^{−1} β^α λ_j^{α−1} exp(−β λ_j)] × Γ(γ)^{−1} δ^γ β^{γ−1} exp(−δβ)

Conditional distributions

▶ Conditional distributions of the λ's and β - pick out the relevant terms from this posterior.

▶ For λ_k:

p(λ_k|β, λ_j (j ≠ k), data) ∝ λ_k^{s_k} exp(−λ_k t_k) λ_k^{α−1} exp(−λ_k β) ⇒ Ga(α + s_k, β + t_k)

▶ For β:

p(β|λ, data) ∝ β^{pα} exp(−β Σ λ_j) β^{γ−1} exp(−δβ) ⇒ Ga(pα + γ, Σ λ_j + δ)

Gibbs runs

▶ For suitable starting values of the unknown parameters the Gibbs sampler proceeds by drawing

λ_j^{i+1} ∼ Ga(α + s_j, β^i + t_j),   j = 1, . . . , p

β^{i+1} ∼ Ga(pα + γ, Σ λ_j^{i+1} + δ)

▶ Use the large sample of λ's and β's to estimate posterior and predictive quantities of interest.
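Putting the two draws in a loop gives the whole sampler; a sketch with synthetic exposures t_j, counts s_j and hyperparameter values chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data and hyperparameters (not from the slides): p units with
# known exposures t_j and counts s_j; alpha, gamma, delta fixed and known.
t = np.array([94.3, 15.7, 62.9, 126.0, 5.24])
s = np.array([5, 1, 5, 14, 3])
p = len(s)
alpha, gamma_hp, delta_hp = 1.8, 0.01, 1.0

n_iter, burn = 6_000, 1_000
beta = 1.0                                   # starting value
lam_draws, beta_draws = [], []
for _ in range(n_iter):
    # lambda_j | beta, data ~ Ga(alpha + s_j, beta + t_j)  (rate parametrisation)
    lam = rng.gamma(shape=alpha + s, scale=1.0 / (beta + t))
    # beta | lambda, data ~ Ga(p*alpha + gamma, sum(lambda) + delta)
    beta = rng.gamma(shape=p * alpha + gamma_hp, scale=1.0 / (lam.sum() + delta_hp))
    lam_draws.append(lam)
    beta_draws.append(beta)

lam_post = np.mean(lam_draws[burn:], axis=0)  # posterior mean of each lambda_j
print(lam_post, np.mean(beta_draws[burn:]))
```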
