reading "bayesian measures of model complexity and fit"
First talk of the Reading Classics Seminar 2013-2014 at Université Paris-Dauphine, by Ilaria Masiani
Introduction | Complexity | Forms for pD | Diagnostics for fit | Model comparison criterion | Examples | Conclusion
Bayesian measures of model complexity and fit
by D. J. Spiegelhalter, N. G. Best, B. P. Carlin and A. van der Linde, 2002
presented by Ilaria Masiani
TSI-EuroBayes student, Université Paris Dauphine
Reading seminar on Classics, October 21, 2013
Ilaria Masiani October 21, 2013
Presentation of the paper
Bayesian measures of model complexity and fit, by David J. Spiegelhalter, Nicola G. Best, Bradley P. Carlin and Angelika van der Linde
Published in 2002 in J. Royal Statistical Society, Series B, vol. 64, Part 4, pp. 583-639
Outline
1 Introduction
2 Complexity of a Bayesian model
3 Forms for pD
4 Diagnostics for fit
5 Model comparison criterion
6 Examples
7 Conclusion
Introduction
Model comparison:
measure of fit (e.g. deviance statistic)
complexity (number of free parameters in the model)
=⇒ Trade-off between these two quantities
Some usual model comparison criteria:
Akaike information criterion: $\mathrm{AIC} = -2\log\{p(y \mid \hat{\theta})\} + 2p$
Bayesian information criterion: $\mathrm{BIC} = -2\log\{p(y \mid \hat{\theta})\} + p\log(n)$
The problem: both require knowing $p$, the number of free parameters.
It is sometimes not clearly defined, e.g., in complex hierarchical models.
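To make the role of p concrete, here is a minimal sketch that evaluates both criteria for an i.i.d. normal model with unknown mean and variance, so that p = 2. The data values and the helper names (`normal_loglik`, `aic`, `bic`) are made up for illustration and do not come from the talk.

```python
import math

def normal_loglik(y, mu, sigma2):
    """Log-likelihood of i.i.d. N(mu, sigma2) observations."""
    n = len(y)
    rss = sum((v - mu) ** 2 for v in y)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

def aic(loglik_at_mle, p):
    """AIC = -2 log p(y | theta_hat) + 2p."""
    return -2.0 * loglik_at_mle + 2.0 * p

def bic(loglik_at_mle, p, n):
    """BIC = -2 log p(y | theta_hat) + p log(n)."""
    return -2.0 * loglik_at_mle + p * math.log(n)

y = [1.2, 0.7, 1.9, 1.1, 0.6]                        # toy data (assumption)
n = len(y)
mu_hat = sum(y) / n                                  # MLE of the mean
sigma2_hat = sum((v - mu_hat) ** 2 for v in y) / n   # MLE of the variance
ll = normal_loglik(y, mu_hat, sigma2_hat)

aic_value = aic(ll, p=2)
bic_value = bic(ll, p=2, n=n)
```

Both functions need p as an explicit input, which is exactly what becomes ambiguous once parameters are tied together by a hierarchical prior.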
=⇒ This paper suggests Bayesian measures of complexity and fit that can be combined to compare complex models.
2 Complexity of a Bayesian model
Complexity reflects the ’difficulty in estimation’.
A measure of complexity may depend on:
prior information
observed data
True model
’All models are wrong, but some are useful’ (Box, 1976)
$p_t(Y)$: ’true’ distribution of unobserved future data $Y$
$\theta_t$: ’pseudotrue’ parameter value
$p(Y \mid \theta_t)$: likelihood specified by $\theta_t$
Residual information
Residual information in data $y$ conditional on $\theta$:
$-2\log\{p(y \mid \theta)\}$
up to a multiplicative constant (Kullback and Leibler, 1951)
Estimator $\tilde{\theta}(y)$ of $\theta_t$
Excess of the true over the estimated residual information:
$d_\Theta\{y, \theta_t, \tilde{\theta}(y)\} = -2\log\{p(y \mid \theta_t)\} + 2\log[p\{y \mid \tilde{\theta}(y)\}]$
Bayesian measure of model complexity
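Unknown $\theta_t$ replaced by a random variable $\theta$
$d_\Theta\{y, \theta, \tilde{\theta}(y)\}$ estimated by its posterior expectation w.r.t. $p(\theta \mid y)$:
$p_D\{y, \Theta, \tilde{\theta}(y)\} = E_{\theta \mid y}[d_\Theta\{y, \theta, \tilde{\theta}(y)\}] = E_{\theta \mid y}[-2\log\{p(y \mid \theta)\}] + 2\log[p\{y \mid \tilde{\theta}(y)\}]$
$p_D$ is proposed as the effective number of parameters w.r.t. a model with focus $\Theta$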
Effective number of parameters
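Typically $\tilde{\theta}(y) = E(\theta \mid y) = \bar{\theta}$.
$f(y)$: fully specified standardizing term, a function of the data only
Then
Definition
$p_D = \overline{D(\theta)} - D(\bar{\theta})$ (1)
where
$D(\theta) = -2\log\{p(y \mid \theta)\} + 2\log\{f(y)\}$
is the ’Bayesian deviance’.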
Observations on pD
1 (1) can be rewritten as $\overline{D(\theta)} = D(\bar{\theta}) + p_D$ =⇒ measure of ’adequacy’
2 pD depends on: data, choice of focus $\Theta$, prior information, choice of $\tilde{\theta}(y)$ =⇒ lack of invariance to transformations
3 using $\tilde{\theta}(y) = E(\theta \mid y)$, $p_D \geq 0$ for any likelihood that is log-concave in $\theta$ (Jensen’s inequality) =⇒ negative pDs indicate conflict between prior and data
4 pD is easily calculated after an MCMC run
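The last observation can be sketched in a few lines: pD is the mean of the deviance over the posterior draws minus the deviance at the posterior mean. The example below is an illustration under stated assumptions, not the talk's code: the model is y_i ~ N(θ, 1) with a N(0, 100) prior, exact posterior draws stand in for MCMC output, and the function names are hypothetical.

```python
import random
import statistics

def deviance(theta, y):
    """Bayesian deviance for y_i ~ N(theta, 1); constants are absorbed into f(y)."""
    return sum((v - theta) ** 2 for v in y)

def effective_parameters(theta_samples, y):
    """pD = posterior mean of the deviance minus deviance at the posterior mean."""
    mean_dev = statistics.fmean(deviance(t, y) for t in theta_samples)
    dev_at_mean = deviance(statistics.fmean(theta_samples), y)
    return mean_dev - dev_at_mean

rng = random.Random(0)
y = [rng.gauss(2.0, 1.0) for _ in range(30)]      # simulated data (assumption)

# Conjugate posterior for theta under a N(0, prior_var) prior and unit data variance.
prior_var = 100.0
post_var = 1.0 / (len(y) + 1.0 / prior_var)
post_mean = post_var * sum(y)

# Exact posterior draws used here in place of an MCMC run.
theta_samples = [rng.gauss(post_mean, post_var ** 0.5) for _ in range(20000)]

p_d = effective_parameters(theta_samples, y)
```

With a single mean parameter and a vague prior, `p_d` should come out close to 1. For this quadratic deviance it equals n times the variance of the draws, which gives a quick sanity check.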
3 Forms for pD: pD for approximately normal likelihoods
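Negligible prior information
Assume $\theta \mid y \sim N(\hat{\theta}, (-L''_{\hat{\theta}})^{-1})$, where $\hat{\theta}$ is the maximum likelihood estimate and $L''_{\hat{\theta}}$ is the matrix of second derivatives of the log-likelihood at $\hat{\theta}$. Then, expanding $D(\theta)$ around $\hat{\theta}$:
$D(\theta) \approx D(\hat{\theta}) - (\theta - \hat{\theta})^T L''_{\hat{\theta}} (\theta - \hat{\theta}) \approx D(\hat{\theta}) + \chi^2_p$
=⇒ $p_D = E_{\theta \mid y}\{D(\theta)\} - D(\bar{\theta}) \approx p$ (2)
since $\bar{\theta} \approx \hat{\theta}$ when prior information is negligible.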
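The step from the quadratic form to p can be spelled out. This is a sketch assuming the approximate posterior covariance is $V = (-L''_{\hat{\theta}})^{-1}$ and that $\bar{\theta} \approx \hat{\theta}$; the symbol $V$ is introduced here for convenience.

```latex
\begin{aligned}
E_{\theta \mid y}\{D(\theta)\}
  &\approx D(\hat{\theta})
   + E_{\theta \mid y}\left[(\theta - \hat{\theta})^T (-L''_{\hat{\theta}}) (\theta - \hat{\theta})\right] \\
  &= D(\hat{\theta}) + \operatorname{tr}\{(-L''_{\hat{\theta}})\, V\} \\
  &= D(\hat{\theta}) + \operatorname{tr}(I_p)
   = D(\hat{\theta}) + p .
\end{aligned}
```

Subtracting $D(\bar{\theta}) \approx D(\hat{\theta})$ leaves $p_D \approx p$, which is also the mean of the $\chi^2_p$ term.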
3 Forms for pD: pD for normal likelihoods
General hierarchical normal model (known variance)
$y \sim N(A_1\theta, C_1)$
$\theta \sim N(A_2\phi, C_2)$
Then $\theta \mid y$ is normal with mean $\bar{\theta} = Vb$ and covariance $V$.
=⇒ $p_D = \mathrm{tr}(-L''V)$
where $-L'' = A_1^T C_1^{-1} A_1$ is the Fisher information.
In this case, pD is invariant to affine transformations of $\theta$.
In normal models:
$\hat{y} = Hy$, with $H$ the hat matrix (which projects the data onto the fitted values) =⇒ $H = A_1 V A_1^T C_1^{-1}$
Then
$p_D = \mathrm{tr}(H)$
$\mathrm{tr}(H)$ = sum of leverages (influence of each observation on its fitted value)
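As a small worked case of pD = tr(H), consider the simplest instance of the hierarchical normal model: $A_1 = I$, $C_1 = \sigma^2 I$ and $C_2 = \tau^2 I$, with $\phi$ and both variances known. The sketch below assumes the standard conjugate result $V^{-1} = A_1^T C_1^{-1} A_1 + C_2^{-1}$, which the slide does not state. The function name is hypothetical.

```python
def p_d_one_way(n, sigma2, tau2):
    """pD = tr(H) for y_i ~ N(theta_i, sigma2), theta_i ~ N(phi, tau2), i = 1..n.

    With A1 = I all matrices are diagonal, so the trace is n times one entry.
    """
    fisher = 1.0 / sigma2                           # diagonal entry of -L''
    post_var = 1.0 / (1.0 / sigma2 + 1.0 / tau2)    # diagonal entry of V (assumed form)
    leverage = fisher * post_var                    # diagonal entry of H
    return n * leverage

p_d_equal = p_d_one_way(10, sigma2=1.0, tau2=1.0)    # prior as informative as the data
p_d_vague = p_d_one_way(10, sigma2=1.0, tau2=1e8)    # nearly flat prior
p_d_tight = p_d_one_way(10, sigma2=1.0, tau2=1e-8)   # prior pins theta_i to phi
```

Each leverage equals $\tau^2/(\sigma^2 + \tau^2)$, so pD moves from about 0 (very tight prior) to about n (very vague prior), matching the reading of pD as an effective number of parameters.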
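Conjugate normal-gamma model (unknown precision $\tau$)
$y \sim N(A_1\theta, \tau^{-1}C_1)$
$\theta \sim N(A_2\phi, \tau^{-1}C_2)$
$p_D = \mathrm{tr}(H) + q(\bar{\theta})(\bar{\tau} - \hat{\tau}) - n\{\overline{\log(\tau)} - \log(\hat{\tau})\}$
where $q(\theta) = (y - A_1\theta)^T C_1^{-1} (y - A_1\theta)$ and $\hat{\tau}$ denotes the plug-in estimate of $\tau$.
It can be shown that for large $n$ the choice of parameterization of $\tau$ will make little difference to pD.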
3 Forms for pD: pD for exponential family likelihoods
One-parameter exponential family
Definition
Assume $p$ groups of observations, where each of the $n_i$ observations in group $i$ has the same distribution.
For the $j$th observation in the $i$th group:
$\log\{p(y_{ij} \mid \theta_i, \phi)\} = w_i\{y_{ij}\theta_i - b(\theta_i)\}/\phi + c(y_{ij}, \phi)$
where
$\mu_i = E(Y_{ij} \mid \theta_i, \phi) = b'(\theta_i)$
$V(Y_{ij} \mid \theta_i, \phi) = b''(\theta_i)\phi/w_i$
$w_i$ constant.
If $\Theta$ is the focus and $\bar{b}_i = E_{\theta_i \mid y}\{b(\theta_i)\}$, then the contribution of the $i$th group to the effective number of parameters is:
$p^{\Theta}_{D_i} = 2 n_i w_i \{\bar{b}_i - b(\bar{\theta}_i)\}/\phi$
=⇒ lack of invariance of pD to reparametrization
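The group contribution can be evaluated directly from posterior draws of the canonical parameter. The sketch below uses the Poisson case, where $b(\theta) = \exp(\theta)$ and $w_i = \phi = 1$. The two posterior draws are made up for illustration, and `p_d_group` is a hypothetical helper name.

```python
import math
import statistics

def p_d_group(theta_samples, n_i, b, w_i=1.0, phi=1.0):
    """Contribution 2 * n_i * w_i * {mean of b(theta) - b(mean of theta)} / phi."""
    b_bar = statistics.fmean(b(t) for t in theta_samples)
    b_at_mean = b(statistics.fmean(theta_samples))
    return 2.0 * n_i * w_i * (b_bar - b_at_mean) / phi

theta_samples = [0.0, math.log(3.0)]      # two made-up posterior draws
p_d_poisson = p_d_group(theta_samples, n_i=1, b=math.exp)
```

Because $b$ is convex, the contribution is non-negative by Jensen's inequality. It depends on the scale on which the posterior mean is taken, which is where the lack of invariance comes from.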
4 Diagnostics for fit
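Sampling theory diagnostics for lack of Bayesian fit
$E_{\theta \mid y}\{D(\theta)\} = \bar{D}$: measure of fit or ’adequacy’
If the model is true:
$E_Y(\bar{D}) = E_Y[E_{\theta \mid y}\{D(\theta)\}] = E_\theta\left(E_{Y \mid \theta}\left[-2\log\frac{p(Y \mid \theta)}{p\{Y \mid \hat{\theta}(Y)\}}\right]\right) \approx E_\theta[E_{Y \mid \theta}(\chi^2_p)] = E_\theta(p) = p$
For the one-parameter exponential family $p = n$, so $E_Y(\bar{D}) \approx n$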
5 Model comparison criterion: Definition of the problem
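Model comparison: the problem
$Y_{rep}$ = independent replicate data set
$L(Y, \tilde{\theta})$ = loss in assigning to data $Y$ a probability $p(Y \mid \tilde{\theta})$
$L\{y, \tilde{\theta}(y)\}$ = ’apparent’ loss in repredicting the observed $y$
$E_{Y_{rep} \mid \theta_t}[L\{Y_{rep}, \tilde{\theta}(y)\}] = L\{y, \tilde{\theta}(y)\} + c_\Theta\{y, \theta_t, \tilde{\theta}(y)\}$
where $c_\Theta$ is the ’optimism’ associated with the estimator $\tilde{\theta}(y)$ (Efron, 1986)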
Assuming $L(Y, \tilde{\theta}) = -2\log\{p(Y \mid \tilde{\theta})\}$, to estimate $c_\Theta$:
1 Classical approach: attempts to estimate the sampling expectation of $c_\Theta$
2 Bayesian approach: direct calculation of the posterior expectation of $c_\Theta$
5 Model comparison criterion: Classical criteria for model comparison
Expected optimism: $\pi(\theta_t) = E_{Y \mid \theta_t}[c_\Theta\{Y, \theta_t, \tilde{\theta}(Y)\}]$
All criteria for model comparison are based on minimizing
$E_{Y_{rep} \mid \theta_t}[L\{Y_{rep}, \tilde{\theta}(y)\}] = L\{y, \tilde{\theta}(y)\} + \pi(\theta_t)$
Efron (1986): $\pi(\theta_t)$ for the log-loss function: $\pi_E(\theta_t) \approx 2p$
Considered as corresponding to a plug-in estimate of fit + twice the effective number of parameters in the model
5 Model comparison criterion: Bayesian criteria for model comparison
Aim: identify models that best explain the observed data, but with the expectation that they minimize uncertainty about observations generated in the same way
Deviance information criterion (DIC)
Definition
$\mathrm{DIC} = D(\bar{\theta}) + 2p_D = \bar{D} + p_D$
Classical estimate of fit + twice the effective number of parameters
Also a Bayesian measure of fit, penalized by complexity pD
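The two equivalent forms of the definition can be checked with a short sketch. The deviance summaries below are made-up numbers for two hypothetical models, not results from the paper or the talk.

```python
def dic(mean_deviance, deviance_at_mean):
    """Return (DIC, pD), with pD = D_bar - D(theta_bar) and DIC = D(theta_bar) + 2 pD."""
    p_d = mean_deviance - deviance_at_mean
    return deviance_at_mean + 2.0 * p_d, p_d

# (D_bar, D(theta_bar)) for two candidate models; values are illustrative only.
summaries = {"model A": (120.0, 115.0), "model B": (118.0, 105.0)}

results = {name: dic(d_bar, d_hat) for name, (d_bar, d_hat) in summaries.items()}
best = min(results, key=lambda name: results[name][0])   # smallest DIC wins
```

In this toy comparison model B has the smaller mean deviance but uses many more effective parameters, so the penalty reverses the ranking and model A has the lower DIC.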
DIC and AIC
Akaike information criterion =⇒ $\mathrm{AIC} = 2p - 2\log\{p(y \mid \hat{\theta})\}$, with $\hat{\theta}$ = MLE
From result (2): $p_D \approx p$ in models with negligible prior information =⇒ $\mathrm{DIC} \approx 2p + D(\bar{\theta})$
6 Examples: Spatial distribution of lip cancer in Scotland
Data on the rates of lip cancer in 56 districts in Scotland (Clayton and Kaldor, 1987; Breslow and Clayton, 1993)
$y_i$: observed number of cases for each county $i$
$E_i$: expected number of cases for each county $i$
$A_i$: list, for each county, of its $n_i$ adjacent counties
$y_i \sim \mathrm{Pois}(\exp\{\theta_i\}E_i)$
$\exp\{\theta_i\}$: underlying true area-specific relative risk of lip cancer
Candidate models for $\theta_i$
Model 1: $\theta_i = \alpha_0$ (pooled)
Model 2: $\theta_i = \alpha_0 + \gamma_i$ (exchangeable random effect)
Model 3: $\theta_i = \alpha_0 + \delta_i$ (spatial random effect)
Model 4: $\theta_i = \alpha_0 + \gamma_i + \delta_i$ (exchangeable + spatial effects)
Model 5: $\theta_i = \alpha_i$ (saturated)
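Priors
$\alpha_0$: improper uniform prior
$\alpha_i$ ($i = 1, \ldots, 56$): normal priors with large variance
$\gamma_i \sim N(0, \lambda_\gamma^{-1})$
$\delta_i \mid \delta_{\setminus i} \sim N\left(\frac{1}{n_i}\sum_{j \in A_i}\delta_j, \frac{1}{n_i\lambda_\delta}\right)$ with $\sum_{i=1}^{56}\delta_i = 0$: conditional autoregressive prior (Besag, 1974)
$\lambda_\gamma, \lambda_\delta \sim \mathrm{Gamma}(0.5, 0.0005)$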
Saturated deviance
$D(\theta) = 2\sum_i [y_i\log\{y_i/(\exp(\theta_i)E_i)\} - \{y_i - \exp(\theta_i)E_i\}]$
(McCullagh and Nelder, 1989, p. 34)
obtained by taking as standardizing factor:
$-2\log\{f(y)\} = -2\sum_i \log\{p(y_i \mid \hat{\theta}_i)\} = 208.0$
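The saturated deviance formula translates into a short function. This is a sketch on made-up counts, not the Scottish data; it takes the fitted means $\mu_i = \exp(\theta_i)E_i$ as input and treats $y_i \log y_i$ as 0 when $y_i = 0$.

```python
import math

def saturated_poisson_deviance(y, mu):
    """2 * sum_i [ y_i log(y_i / mu_i) - (y_i - mu_i) ] for Poisson counts."""
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        # y * log(y / mu) is taken as 0 when y = 0 (its limit as y -> 0).
        log_term = y_i * math.log(y_i / mu_i) if y_i > 0 else 0.0
        total += log_term - (y_i - mu_i)
    return 2.0 * total

y = [3, 0, 7]                 # made-up observed counts
E = [2.0, 1.5, 4.0]           # made-up expected counts

# Saturated fit: fitted means equal the observations, so the deviance is zero.
d_saturated = saturated_poisson_deviance(y, [float(v) for v in y])

# Pooled fit: one common relative risk sum(y) / sum(E) for all areas.
pooled_risk = sum(y) / sum(E)
d_pooled = saturated_poisson_deviance(y, [pooled_risk * e for e in E])
```

The deviance is zero at the saturated fit and grows as the fitted means move away from the data, which is what makes it usable as an absolute measure of fit.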
Results
For each model, two independent MCMC chains (WinBUGS) were run for 15000 iterations each, following a burn-in of 5000 iterations.
Deviance summaries using three alternative parameterizations (mean, canonical, median).
Deviance calculations
$\bar{D}$: mean of the posterior samples of the saturated deviance
$D(\bar{\mu})$: by plugging the posterior mean of $\mu_i = \exp(\theta_i)E_i$ into the saturated deviance
$D(\bar{\theta})$: by plugging the posterior means of $\alpha_0, \alpha_i, \gamma_i, \delta_i$ into the linear predictor $\theta_i$
$D(\mathrm{med})$: by plugging the posterior median of $\theta_i$ into the saturated deviance
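The three plug-in choices listed above can be compared on a single area. The sketch below is only an illustration: the count, the expected count and the five posterior draws of $\theta$ are made up, and the names are hypothetical.

```python
import math
import statistics

def dev(y, mu):
    """Saturated Poisson deviance contribution of one area (requires y > 0)."""
    return 2.0 * (y * math.log(y / mu) - (y - mu))

y, E = 5, 2.0                              # made-up count and expected count
theta_draws = [0.6, 0.8, 0.9, 1.0, 1.3]    # made-up posterior draws of theta

# Posterior mean of the deviance over the draws.
d_bar = statistics.fmean(dev(y, math.exp(t) * E) for t in theta_draws)

# Deviance at three different plug-in estimates.
mu_bar = statistics.fmean(math.exp(t) * E for t in theta_draws)
plug_ins = {
    "mean": dev(y, mu_bar),
    "canonical": dev(y, math.exp(statistics.fmean(theta_draws)) * E),
    "median": dev(y, math.exp(statistics.median(theta_draws)) * E),
}

# One pD per parameterization: same d_bar, different plug-in deviance.
p_d = {name: d_bar - d_hat for name, d_hat in plug_ins.items()}
```

The mean deviance is shared, so the three pD values differ only through the plug-in deviance. The mean and canonical versions are non-negative here because this deviance is convex in both $\mu$ and $\theta$.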
Observations on pDs results
Ilaria Masiani October 21, 2013
IntroductionComplexity
Forms for pDDiagnostics for fit
Model comparison criterionExamples
Conclusion Spatial distribution of lip cancer Six-cities study
Observations on the $p_D$ results
From result (2): $p_D \approx p$
pooled model 1: $p_D = 1.0$
saturated model 5: $p_D$ from 52.8 to 55.9
models 3-4 with spatial random effects: $p_D$ around 31
model 2 with only exchangeable random effects: $p_D$ around 43
Comparison of DIC
DIC is subject to Monte Carlo sampling error (it is a function of stochastic quantities)
Either of models 3 or 4 is superior to the others
Models 2 and 5 are superior to model 1
Absolute measure of fit: compare $\bar{D}$ with $n = 56$
All models (except pooled model 1) show an adequate overall fit to the data $\Rightarrow$ comparison essentially based on the $p_D$ values
Six-cities study
Subset of data from the six-cities study: longitudinal study of health effects of air pollution (Fitzmaurice and Laird, 1993)
$y_{ij}$: repeated binary measurement of the wheezing status of child $i$ at time $j$ (1, yes; 0, no), $i = 1, \dots, I$, $j = 1, \dots, J$
$I = 537$ children living in Steubenville, Ohio
$J = 4$ time points
$a_{ij}$: age of child $i$ in years at measurement point $j$ (7, 8, 9, 10 years)
$s_i$: smoking status of child $i$'s mother (1, yes; 0, no)
Conditional response model
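$Y_{ij} \sim \mathrm{Bernoulli}(p_{ij})$
$p_{ij} = \Pr(Y_{ij} = 1) = g^{-1}(\mu_{ij})$
$\mu_{ij} = \beta_0 + \beta_1 z_{ij1} + \beta_2 z_{ij2} + \beta_3 z_{ij3} + b_i$
$z_{ijk} = x_{ijk} - \bar{x}_{\cdot\cdot k}$, $k = 1, 2, 3$
$x_{ij1} = a_{ij}$, $x_{ij2} = s_i$, $x_{ij3} = a_{ij} s_i$
$b_i$: individual-specific random effects, $b_i \sim N(0, \lambda^{-1})$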
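The covariate centering and the linear predictor above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the ages, smoking indicators, coefficients and random effects are made-up numbers, and the mean is taken over all (i, j) pairs as the centering formula states.

```python
def centered_covariates(ages, smoking):
    """Build z_ijk = x_ijk - mean over all (i, j) of x_ijk, for k = 1, 2, 3.

    ages[i][j] is a_ij; smoking[i] is s_i.
    """
    # x_ij1 = a_ij, x_ij2 = s_i, x_ij3 = a_ij * s_i
    x = [[(a, s, a * s) for a in row] for row, s in zip(ages, smoking)]
    n_obs = sum(len(row) for row in x)
    means = [sum(obs[k] for row in x for obs in row) / n_obs for k in range(3)]
    return [[tuple(obs[k] - means[k] for k in range(3)) for obs in row] for row in x]


def linear_predictor(z, beta, b):
    """mu_ij = beta_0 + beta_1*z_ij1 + beta_2*z_ij2 + beta_3*z_ij3 + b_i."""
    return [
        [beta[0] + sum(bk * zk for bk, zk in zip(beta[1:], obs)) + b[i] for obs in row]
        for i, row in enumerate(z)
    ]


# Made-up toy inputs: 3 children, 4 time points each.
ages = [[7, 8, 9, 10]] * 3
smoking = [1, 0, 0]
z = centered_covariates(ages, smoking)
mu = linear_predictor(z, beta=[-2.0, -0.1, 0.4, 0.05], b=[0.3, -0.1, 0.0])
```

With centered covariates, β0 is the value of the linear predictor for a child with b_i = 0 whose covariates sit at their overall means.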
Model choice: link function g(·)
Model 1: $g(p_{ij}) = \mathrm{logit}(p_{ij}) = \log\{p_{ij}/(1 - p_{ij})\}$
Model 2: $g(p_{ij}) = \mathrm{probit}(p_{ij}) = \Phi^{-1}(p_{ij})$
Model 3: $g(p_{ij}) = \mathrm{cloglog}(p_{ij}) = \log\{-\log(1 - p_{ij})\}$
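The three link functions and their inverses can be written with the Python standard library alone. The sketch below is illustrative: the dictionary names are hypothetical, and the plain formulas are used without any numerical safeguards for extreme values of µ.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# g(p): maps a probability to the linear-predictor scale.
LINKS = {
    "logit": lambda p: math.log(p / (1.0 - p)),
    "probit": lambda p: _STD_NORMAL.inv_cdf(p),
    "cloglog": lambda p: math.log(-math.log(1.0 - p)),
}

# g^{-1}(mu): maps the linear predictor back to a probability.
INVERSE_LINKS = {
    "logit": lambda mu: 1.0 / (1.0 + math.exp(-mu)),
    "probit": lambda mu: _STD_NORMAL.cdf(mu),
    "cloglog": lambda mu: 1.0 - math.exp(-math.exp(mu)),
}

for name in LINKS:
    print(name, round(INVERSE_LINKS[name](0.0), 4))
```

At µ = 0 the logit and probit links both give p = 0.5, while the complementary log-log link gives 1 − e^{−1} ≈ 0.632, which shows its asymmetry.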
Priors and deviance form
$\beta_k$: flat priors
$\lambda \sim \mathrm{Gamma}(0.001, 0.001)$
$D = -2 \sum_{i,j} \{ y_{ij} \log(p_{ij}) + (1 - y_{ij}) \log(1 - p_{ij}) \}$
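The deviance formula above translates directly into code. The following is a minimal sketch with a hypothetical function name and made-up inputs; it assumes every p_ij lies strictly between 0 and 1 so that both logarithms are defined.

```python
import math


def bernoulli_deviance(y, p):
    """D = -2 * sum over i, j of [y_ij*log(p_ij) + (1 - y_ij)*log(1 - p_ij)]."""
    log_lik = 0.0
    for y_row, p_row in zip(y, p):
        for y_ij, p_ij in zip(y_row, p_row):
            log_lik += y_ij * math.log(p_ij) + (1 - y_ij) * math.log(1.0 - p_ij)
    return -2.0 * log_lik


# Made-up toy inputs: 2 children, 4 time points, all probabilities set to 0.5.
y = [[1, 0, 0, 0], [0, 0, 1, 1]]
p = [[0.5] * 4, [0.5] * 4]
d = bernoulli_deviance(y, p)  # equals 16 * log(2) for these inputs
```

Averaging this quantity over posterior draws of p_ij gives the posterior mean deviance, and evaluating it at a plug-in estimate gives the second term needed for pD.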
Results
Gibbs sampler run for 5000 iterations (with a burn-in of 1000 iterations)
Deviance summaries for canonical and mean parameterizations.
Conclusion
$p_D$ may not be invariant to the chosen parametrization
Similarities to frequentist measures, but based on expectations w.r.t. parameters in place of sampling expectations
DIC viewed as a Bayesian analogue of AIC: similar justification but wider applicability
Involves Monte Carlo sampling and negligible analytic work
References
McCullagh, P. and Nelder, J. Generalized Linear Models. 2nd edn. London: Chapman and Hall, 1989.
Besag, J. Spatial interaction and the statistical analysis of lattice systems. J. R. Statist. Soc., series B, 36, 192-236, 1974.
Clayton, D.G. and Kaldor, J. Empirical Bayes estimates of age-standardised relative risk for use in disease mapping. Biometrics, 43, 671-681, 1987.
Efron, B. How biased is the apparent error rate of a prediction rule? J. Am. Statist. Ass., 81, 461-470, 1986.
Fitzmaurice, G. and Laird, N. A likelihood-based method for analysing longitudinal binary responses. Biometrika, 80, 141-151, 1993.
Kullback, S. and Leibler, R.A. On information and sufficiency. Ann. Math. Statist., 22, 79-86, 1951.
Spiegelhalter, D.J., Best, N.G., Carlin, B.P. and van der Linde, A. Bayesian measures of model complexity and fit. J. R. Statist. Soc., series B, 64, 583-639, 2002.
Thank you.
Questions?