st5219: bayesian hierarchical modelling lecture 2.1
TRANSCRIPT
PRIORS, NORMAL MODELS, COMPUTING POSTERIORS
st5219: Bayesian hierarchical modellinglecture 2.1
Plan for lecture
Priors: how to choose them, different types
The normal distribution in Bayesianism Tutorial 1: over to you Computing posteriors:
Monte Carlo Importance Sampling Markov chain Monte Carlo
What is random?
FREQUENTISM BAYESIANISM
Something with a long run frequency distribution
E.g. coin tosses Patients in a clinical
trial “Measurement”
errors?
Everything What you don’t know
is random Unobserved data,
parameters, unknown states, hypotheses
Observed data still arise from probability modelKnock on effects on how to estimate things
and assess hypotheses
This week: practical issues
CHOOSING A PRIOR DOING COMPUTATIONS
Very misunderstood “How did you choose
your priors?” Please never answer
“Oh, I just made them up”
For data analysis, you need strong rationale for choice of prior
(later)
An example to illustrate priors
Following infection: body creates antibodies
These target pathogen and remain in the blood
Antibodies can provide data on historic disease exposure
H1N1 in Singapore
Cook, Chen, Lim (2010) Emerg Inf Dis DOI:10.3201/EID.1610.100840
Serology of H1N1
Singapore study longitudinal
Chen et al (2010) J Am Med Assoc 303:1383--91
Measurements
Observation in (xij,2xij) for individual i, observation j
Define “seroconversion” to be a “four-fold” rise in antibody levels, i.e.
yi = 1 if xi2 ≥ 4xi1 and 0 otherwise Out of 727 participants with follow up, we
have 98 seroconversions
Q: what proportion were infected?
AIDS ≠ influenza A H1N1
Seroconversion “test” not perfect: something about 80%
Infection rate should be higher than seroconversion rate
Board
work
Bayesian approach
Need some priors Last time: “U(0,1) good way to represent
lack of knowledge of a probability” Before we collected the JAMA data, we
didn’t know what p would be, and a prior p~U(0,1) makes sense
But there are data out there on σ!
Other data
Zambon et al (2001) Arch Intern Med 161:2116--22
Other data
m = 791 y = 629
This can give you
a prior!!!
σ~Be(630,163)
Board
work
Kinds of priors
NON-INFORMATIVE INFORMATIVE
p~U(0,1) σ²~U(0,∞) μ~U(- ∞, ∞) β~N(0,1000²) Should give you no
information about that parameter except what is in the data
σ ~Be(630,163) μ ~N(15.2,6.8²) Lets you supplement
natural information content of the data when not enough information on that aspect
Can give information on other parameters indirectly
How to choose?Scenario 1. You are trying to reach an optimal decision in the presence of uncertainty: use whatever information you can, even if subjective, via informative priors
Scenario 2. You are trying to estimate parameters for a scientific data analysis (you cannot or don’t want to use external data): use non-informative prior
Scenario 3. You are trying to estimate parameters for a scientific data analysis (you have good external data): use non-informative priors for those bits you have no data for or in which you want your own data to speak for themselves; use informative priors elsewhere
Whence came that Be(630,163)?
Step 1: uniform prior for σ
Step 2: fit model to Zambon data
Step 3: posterior for that becomes prior for main analysis Boar
d work
Conjugacy
The beta distribution is conjugate to a binomial model, in that if you start with a beta prior and use it in a binomial model for p and x, you end with a beta posterior of known form
I.e. if p~Be(a,b) and x~Bin(n,p), p|x~Be(a+x,b+n-x)
Other conjugate priors exist forsimple models, e.g. ...
Board
work
Why is it ok to take posteriors and turn them into priors?
It’s the incremental nature of accumulated knowledge
Eg Zambon study:
Stage Prior Data (y,m)=
Posterior
0 Be(1,1) (0,0) Be(1,1)
1 Be(1,1) (1,1) Be(2,1)
2 Be(2,1) (1,2) Be(2,2)
3 Be(2,2) (1,3) Be(2,3)
4 Be(2,3) (2,4) Be(3,4)
Effective sample sizes
You can think of the parameters of the beta(a,b) as representing a best guess of the proportion, a/(a+b) a “sample size” that the prior is equivalent to
(a+b) This is an easy way to transform published
results into beta priors: take the point estimate (MLE, say) and the sample size and transform to get a and b.
(So a uniform prior is like adding one positive and one negative value to your data set: is this fair???)
Other converting methods Take a point estimate and CI and convert
to 2 parameters to represent your prior. Eg the infectious period is a popular
parameter in infectious disease epidemiology: the average time from infection to recovery
For no good reason, often assumed to be exponential with mean λ, say
Fraser et al (2009) Science 324:1557--61 suggest estimate ofgeneration period of 1.91 with95%CI (1.3,2.71)
Board
work
Two final thoughts on priors
I mentioned U(-∞, ∞) as a non-informative prior.What’s the density function for U(- ∞, ∞)?
Board
work
Two final thoughts on priors
A prior such as U(-∞, ∞) is called an improper prior as it does not have a proper density function.
Improper priors sometimes give proper posteriors: depending on the integral of the likelihood.
Not an improper prior is a proper one
Two final thoughts on priors
Just because a prior is flat in one representation does not mean it is flat in another
Eg for an exponential model (for survival analysis say)
Board
work