Bayesian Computation with INLA

DESCRIPTION
Short course about Bayesian computation with INLA, given at the AS 2013 conference in Ribno, Slovenia.

TRANSCRIPT
Bayesian computation using INLA
Thiago G. Martins
Norwegian University of Science and Technology, Trondheim, Norway
AS 2013, Ribno, Slovenia
September, 2013
Part I
Latent Gaussian models and INLA methodology
Outline
Latent Gaussian models
Are latent Gaussian models important?
Bayesian computing
INLA method
Hierarchical Bayesian models
Hierarchical models are an extremely useful tool in Bayesian model building.
Three parts:
- Observations (y): encode information about the observed data, including design and collection issues.
- The latent process (x): the unobserved process. May be the focus of the study, or may be included to reduce autocorrelation, e.g. to encode spatial and/or temporal dependence.
- The parameter model (θ): models for all of the parameters in the observation and latent processes.
Latent Gaussian models
A latent Gaussian model is a Bayesian hierarchical model of the following form:
- Observed data y, with y_i | x_i ∼ π(y_i | x_i, θ)
- Latent Gaussian field x ∼ N(·, Σ(θ))
- Hyperparameters θ ∼ π(θ), controlling
  - variability
  - length/strength of dependence
  - parameters in the likelihood

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i ∈ I} π(y_i | x_i, θ)
Precision matrix
The precision matrix of the latent field,
Q(θ) = Σ(θ)⁻¹,
plays a key role! Two reasons:
- Building models through conditioning (“hierarchical models”)
- Computational benefits
Building models through conditioning
If
- x ∼ N(0, Q_x⁻¹)
- y | x ∼ N(x, Q_y⁻¹)
then
Q(x, y) = [ Q_x + Q_y   −Q_y
              −Q_y        Q_y ]
The corresponding expressions in terms of covariance matrices are not nearly as nice.
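As a quick numerical sanity check of this block structure (my own illustration; the small precisions Qx and Qy below are arbitrary choices):

## x ~ N(0, Qx^-1) and y | x ~ N(x, Qy^-1) imply Cov(x) = Qx^-1,
## Cov(x, y) = Qx^-1 and Cov(y) = Qx^-1 + Qy^-1.
Qx <- matrix(c(2, -1, -1, 2), 2, 2)   # arbitrary SPD precision for x
Qy <- diag(3, 2)                      # arbitrary SPD precision for y | x
Sx <- solve(Qx)
S  <- rbind(cbind(Sx, Sx),
            cbind(Sx, Sx + solve(Qy)))
## Inverting the joint covariance recovers the stated block form
## [Qx + Qy, -Qy; -Qy, Qy]:
round(solve(S), 10)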
Computational benefits
- Precision matrices encode conditional independence:
  x_i ⊥ x_j | x_{−ij}  ⟺  Q_ij = 0
  We are interested in models with sparse precision matrices:
  x ∼ N(·, Σ(θ)) with sparse Q(θ) = Σ(θ)⁻¹.
  Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs).
- Good computational properties through numerical algorithms for sparse matrices.
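A tiny plain-R illustration of this equivalence, using a stationary AR(1) process as an assumed example: its one-step Markov structure shows up as a tridiagonal precision matrix.

n   <- 6
phi <- 0.8
## Covariance of a stationary AR(1) with unit innovation variance:
## Sigma_ij = phi^|i - j| / (1 - phi^2); note that it is completely dense.
Sigma <- outer(1:n, 1:n, function(i, j) phi^abs(i - j)) / (1 - phi^2)
## The precision matrix is tridiagonal: Q_ij = 0 whenever |i - j| > 1,
## i.e. x_i and x_j are conditionally independent given the rest
## unless they are neighbours in time.
round(solve(Sigma), 3)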
Numerical algorithms for sparse matrices: scaling properties
Cost of factorizing the n × n sparse precision matrix Q:
- Temporal (1D) models: O(n)
- Spatial (2D) models: O(n^{3/2})
- Spatio-temporal (3D) models: O(n²)
This is to be compared with general O(n³) algorithms for dense matrices.
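These rates can be felt even in a toy experiment with the Matrix package (the size n and the tridiagonal "temporal" Q below are illustrative choices):

library(Matrix)
n <- 3000
## Tridiagonal precision matrix, stored as a sparse symmetric matrix.
Q <- bandSparse(n, k = 0:1,
                diagonals = list(rep(2.01, n), rep(-1, n - 1)),
                symmetric = TRUE)
system.time(Cholesky(Q))          # sparse factorization: fast, roughly O(n)
system.time(chol(as.matrix(Q)))   # dense factorization: O(n^3)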
Outline (current section: Are latent Gaussian models important?)
Example (I): Mixed-effect model
y_ij | η_ij, θ₁ ∼ π(y_ij | η_ij, θ₁),  i = 1, …, N,  j = 1, …, M
η_ij = µ + c_ij β + u_i + v_j + w_ij
where u, v and w are “random effects”.
If we assign Gaussian priors to µ, β, u and v, then
x | θ₂ = (µ, β, u, v, η) | θ₂
is jointly Gaussian. The hyperparameters are θ = (θ₁, θ₂).
Example (I) - cont.
We can reinterpret the model as
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)
- dim(x) can be large, 10²–10⁵
- dim(θ) is small, 1–5
Example (I) - cont.
[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β) for N = 100, M = 5.]
Example (II): Time-series model
Smoothing of a binary time series:
- Data is a sequence of 0s and 1s
- The probability of a 1 at time t, p_t, depends on time:
  p_t = exp(η_t) / (1 + exp(η_t))
- Linear predictor:
  η_t = µ + β c_t + u_t + v_t,  t = 1, …, n
Example (II) - cont.
Prior models:
- µ and β are Normal
- u is an AR model, e.g. u_t = φ u_{t−1} + ε_t, with parameters (φ, σ²_ε)
- v is an unstructured term, a “random effect”
This gives that
x | θ = (µ, β, u, v, η) | θ
is jointly Gaussian, with hyperparameters θ = (φ, σ²_ε, σ²_v).
Example (II) - cont.
We can reinterpret the model as
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)
- dim(x) can be large, 10²–10⁵
- dim(θ) is small, 1–5
Example (II) - cont.
[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β) for n = 100.]
Example (III): Disease mapping
- Data y_i ∼ Poisson(E_i exp(η_i))
- Log-relative risk η_i = µ + u_i + v_i + f(c_i)
- Structured component u
- Unstructured component v
- Smooth effect of a covariate c
[Map: estimated log-relative risk, scale from −0.63 to 0.98.]
Yet Another Example (III)
We can reinterpret the model as
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)
- dim(x) can be large, 10²–10⁵
- dim(θ) is small, 1–5
Example (III) - cont.
[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, f).]
What we have learned so far
The latent Gaussian model construct
θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q⁻¹(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)
occurs in many, seemingly unrelated, statistical models:
GLM / GAM / GLMM / GAMM / and more.
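To make the three-stage construct concrete, here is a minimal simulation from one instance of it, assuming (purely for illustration) an AR(1) latent field and a Poisson likelihood:

set.seed(1)
n     <- 50
theta <- c(phi = 0.9, sigma = 0.3)       # hyperparameters
## Latent Gaussian field x | theta: a stationary AR(1) path.
x <- as.numeric(arima.sim(n = n, model = list(ar = unname(theta["phi"])),
                          sd = theta["sigma"]))
## Conditionally independent observations y_i | x_i, theta.
y <- rpois(n, lambda = exp(x))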
Further Examples
- Dynamic linear models
- Stochastic volatility
- Generalized linear (mixed) models
- Generalized additive (mixed) models
- Spline smoothing
- Semi-parametric regression
- Space-varying (semi-parametric) regression models
- Disease mapping
- Log-Gaussian Cox processes
- Model-based geostatistics (*)
- Spatio-temporal models
- Survival analysis
- and more
Outline (current section: Bayesian computing)
Bayesian computing
We are interested in posterior marginal quantities like π(x_i | y) and π(θ_i | y).
This requires the evaluation of integrals of the form
π(x_i | y) ∝ ∫_{x_{−i}} ∫_θ π(y | x, θ) π(x | θ) π(θ) dθ dx_{−i}
The computation of such very high-dimensional integrals is at the core of Bayesian computing.
But surely we can already do this
- Markov chain Monte Carlo (MCMC) is widely used by the applied community.
- There are generic tools available for MCMC (OpenBUGS, JAGS, Stan) and others for specific model classes, like BayesX.
- Still, the issue of Bayesian computing is not “solved” just because MCMC is available:
  - hierarchical models are more difficult for MCMC,
  - strong dependencies, bad mixing.
- A main obstacle to Bayesian modeling is still the issue of “Bayesian computing”.
So what’s wrong with MCMC?
This is actually a problem with any Monte Carlo scheme.
Error in expectations
The Monte Carlo error when estimating E(f(X)) by the sample average (1/N) ∑_{i=1}^{N} f(x_i) is of order O(1/√N).
In practical terms, to reduce the error to O(10^{−p}) you need O(10^{2p}) samples!
And this can be optimistic!
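The O(1/√N) rate is easy to verify empirically. A small sketch, estimating E(X²) = 1 for X ∼ N(0, 1) (a toy target chosen only for this illustration):

set.seed(1)
for (N in 10^(2:5)) {
  ## 200 independent Monte Carlo estimates of E(X^2), each based on N draws.
  err <- replicate(200, abs(mean(rnorm(N)^2) - 1))
  cat(sprintf("N = %6d  mean absolute error = %.5f\n", N, mean(err)))
}
## Each tenfold increase in N shrinks the error by only about sqrt(10).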
Be more narrow
MCMC
- MCMC ‘works’ for everything, but it is usually not optimal when we focus on a specific class of models.
- It works for latent Gaussian models, but it is too slow.
- (Unfortunately) sometimes it is the only thing we can do.
INLA
- Integrated nested Laplace approximations
- A deterministic algorithm, rather than a stochastic one like MCMC.
- Specially designed for latent Gaussian models.
- Accurate results in a small fraction of the computational time of MCMC.
Comparing results with MCMC
- When comparing the results of R-INLA with MCMC, it is important to use the same model.
- Here we compare the EPIL example results with those obtained using JAGS via the rjags package.
[Figures: posterior marginals of a0 (the intercept), alpha.Age, log(tau.b1) and log(tau.b) for the EPIL example, with the R-INLA densities overlaid on JAGS results after 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64 and 120 minutes of MCMC run time.]
Outline (current section: INLA method)
Main aim
Posterior:
π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i ∈ I} π(y_i | x_i, θ)
Compute the posterior marginals:
π(x_i | y) = ∫ π(θ | y) π(x_i | θ, y) dθ
π(θ_j | y) = ∫ π(θ | y) dθ_{−j}
Tasks
1. Build an approximation to π(θ | y): π̃(θ | y)
2. Build an approximation to π(x_i | θ, y): π̃(x_i | θ, y)
These plug into
π̃(x_i | y) = ∫ π̃(θ | y) π̃(x_i | θ, y) dθ
π̃(θ_j | y) = ∫ π̃(θ | y) dθ_{−j}
3. Do the integration with respect to θ numerically.
Task 1: π̃(θ | y)
The Laplace approximation for π(θ | y) is
π(θ | y) = π(x, θ | y) / π(x | θ, y)
         ∝ π(θ) π(x | θ) π(y | x, θ) / π(x | θ, y)
         ≈ π(θ) π(x | θ) π(y | x, θ) / π_G(x | θ, y), evaluated at x = x*(θ),
where π_G(x | θ, y) is the Gaussian approximation of π(x | θ, y) and x*(θ) is its mode.
The GMRF approximation
π(x | y) ∝ exp( −(1/2) xᵀ Q x + ∑_i log π(y_i | x_i) )
        ≈ exp( −(1/2) (x − µ)ᵀ (Q + diag(c_i)) (x − µ) ) = π̃(x | y)
Constructed as follows:
- Locate the mode x*.
- Expand to second order.
Markov and computational properties are preserved.
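A bare-bones sketch of how this Gaussian approximation can be computed by Newton iteration; the Poisson likelihood and the fixed prior precision Q are my own assumptions for illustration:

gmrf_approx <- function(y, Q, niter = 20) {
  mu <- rep(0, length(y))
  for (k in 1:niter) {
    ## Second-order expansion of sum_i log pi(y_i | x_i) around mu:
    ## for a Poisson log-likelihood, gradient_i = y_i - exp(mu_i)
    ## and curvature c_i = exp(mu_i).
    cc   <- exp(mu)
    grad <- y - cc
    ## Newton step: (Q + diag(c)) mu_new = grad + c * mu
    mu <- solve(Q + diag(cc), grad + cc * mu)
  }
  list(mean = mu, precision = Q + diag(exp(mu)))
}

The returned mean and precision define π̃(x | y); in a real implementation Q is sparse and the solve is a sparse Cholesky, which is what preserves the Markov and computational properties.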
Remarks
The Laplace approximation π̃(θ | y) turns out to be accurate, since x | y, θ appears almost Gaussian in most cases:
- x is a priori Gaussian.
- y is typically not very informative.
- The observational model is usually ‘well-behaved’.
Note: π̃(θ | y) itself does not look Gaussian!
Task 2: π̃(x_i | y, θ)
This task is more challenging, since
- the dimension n of x is large, and
- there are potentially n marginals to compute, or at least O(n).
Here we present three options:
1. Gaussian approximation
2. Laplace approximation
3. Simplified Laplace approximation
There is a trade-off between accuracy and complexity.
π̃(x_i | y, θ) - 1. Gaussian approximation
An obvious, simple and fast alternative is to use the GMRF approximation π_G(x | y, θ):
π̃(x_i | θ, y) = N(x_i; µ_i(θ), σ²_i(θ))
- It is the fastest option; we only need to compute the diagonal of Q(θ)⁻¹.
- It can have errors in location and cannot capture asymmetry.
π̃(x_i | y, θ) - 2. Laplace approximation
- The Laplace approximation:
  π̃(x_i | y, θ) ≈ π(x, θ | y) / π_GG(x_{−i} | x_i, y, θ), evaluated at x_{−i} = x*_{−i}(x_i, θ)
- Again, the approximation is very good, as x_{−i} | x_i, θ is ‘almost Gaussian’,
- but it is expensive. To get all n marginals we must
  - perform n optimizations, and
  - n factorizations of (n − 1) × (n − 1) matrices.
π̃(x_i | y, θ) - 3. Simplified Laplace approximation
Taylor expansions of the Laplace approximation for π(x_i | θ, y):
- computationally much faster;
- corrects the Gaussian approximation for errors in location and skewness:
  log π̃(x_i | θ, y) = −(1/2) x_i² + b x_i + (1/6) d x_i³ + ···
- Fit a skew-Normal density 2 φ(x) Φ(a x).
- Sufficiently accurate for most applications.
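For intuition, the skew-Normal density 2φ(x)Φ(ax) is just a Gaussian tilted by its skewness parameter a; a quick plot (the value a = 3 is an arbitrary choice):

dskewnorm <- function(x, a) 2 * dnorm(x) * pnorm(a * x)
curve(dskewnorm(x, a = 3), from = -4, to = 4, ylab = "density")
curve(dnorm(x), add = TRUE, lty = 2)   # symmetric Gaussian for comparison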
Task 3: Numerical integration with respect to θ
Now we know how to compute:
- π̃(θ | y): Laplace approximation
- π̃(x_i | θ, y): 1. Gaussian, 2. Laplace, or 3. simplified Laplace
Let’s see how INLA puts these together.
The integrated nested Laplace approximation (INLA) I
Step I: Explore π̃(θ | y)
- Locate the mode.
- Use the Hessian to construct new variables.
- Grid-search.
The integrated nested Laplace approximation (INLA) II
Step II: For each θ_j,
- for each i, evaluate the Laplace approximation for selected values of x_i;
- build a skew-Normal or log-spline corrected Gaussian,
  N(x_i; µ_i, σ²_i) × exp(spline),
  to represent the conditional marginal density.
The integrated nested Laplace approximation (INLA) III
Step III: Sum out θ_j
- For each i, sum out θ:
  π̃(x_i | y) ∝ ∑_j π̃(x_i | y, θ_j) × π̃(θ_j | y)
- Build a log-spline corrected Gaussian,
  N(x_i; µ_i, σ²_i) × exp(spline),
  to represent π̃(x_i | y).
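Step III is nothing more than a finite weighted mixture over the hyperparameter grid. A toy numerical sketch, where all the densities are stand-ins (Gaussians chosen for illustration, not INLA output):

theta <- seq(-2, 2, by = 0.5)             # grid of integration points theta_j
w     <- dnorm(theta); w <- w / sum(w)    # stand-in for pi~(theta_j | y)
xg    <- seq(-6, 6, length.out = 200)
## Stand-in conditional marginals pi~(x_i | y, theta_j): here N(theta_j, 1).
cond  <- sapply(theta, function(t) dnorm(xg, mean = t))
marg  <- as.vector(cond %*% w)            # weighted sum over theta_j
plot(xg, marg, type = "l")                # the resulting mixture density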
Computing posterior marginals for θ_j (I)
Main idea:
- Use the integration points and build an interpolant.
- Use numerical integration on that interpolant.
How can we assess the error in the approximations?
Tool 1: Compare a sequence of increasingly accurate approximations:
1. Gaussian approximation
2. Simplified Laplace
3. Laplace
How can we assess the error in the approximations?
Tool 2: Estimate the “effective” number of parameters, as defined for the Deviance Information Criterion,
p_D(θ) = E[D(x; θ) | y, θ] − D(E[x | y, θ]; θ),
i.e. the posterior mean deviance minus the deviance at the posterior mean, and compare it with the number of observations.
A low ratio is good.
This criterion has theoretical justification.
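In R-INLA (introduced in Part II) this quantity is reported when the DIC is requested. A hedged sketch, reusing the Epil model that appears later in the course:

result <- inla(formula, family = "poisson", data = Epil,
               control.compute = list(dic = TRUE))
result$dic$p.eff   # effective number of parameters
result$dic$dic     # the DIC itself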
Part II
R-INLA package
Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
Implementing INLA
All procedures required to perform INLA need to be carefully implemented to achieve good speed; it is easy to implement a slow version of INLA. The implementation has three layers:
- The GMRFLib library
  - Basic library written in C for fast computations with GMRFs.
- The inla program
  - Defines latent Gaussian models and interfaces with the GMRFLib library.
  - Models are defined using .ini files.
  - The inla program writes all the results (E/Var/marginals) to files.
- The INLA package for R
  - R interface to the inla program. (That’s why it’s not on CRAN.)
  - Converts “formula” statements into “.ini”-file definitions.
  - Runs the inla program.
  - Gets the results back into R.
Happily, the R package is all we need to learn!
The INLA package for R
[Diagram: a data frame and a formula are the input to the INLA package, which (1) produces the .ini file and input files, (2) runs the inla program, and (3) collects the results into an R object of type list, from which you can get summaries, plots, etc.]
R-INLA
- Visit the web site www.r-inla.org and follow the instructions.
- The web site contains source code, examples, reports, and more.
- The first time, do
  > source("http://www.math.ntnu.no/inla/givmeINLA.R")
  Later, you can upgrade the package with
  > inla.upgrade()
  or, if you want the test version (which you do),
  > inla.upgrade(testing=TRUE)
- Available for Linux, Windows and Mac.
Outline (current section: R-INLA - Model specification)
The structure of an R program using INLA
There are essentially three parts to an INLA program:
1. The data organization.
2. The formula - notation inherited from R’s native glm function.
3. The call to the INLA program.
The inla function
This is all that’s needed for a basic call:

result <- inla(
    formula = y ~ 1 + x,       # describes your latent field
    family = "gaussian",       # the likelihood distribution
    data = data.frame(y, x)    # a list or data frame
)
The simplest case: Linear regression

library(INLA)

n = 100
x = sort(runif(n))
y = 1 + x + rnorm(n, sd = 0.1)
plot(x, y)

formula = y ~ 1 + x
result = inla(formula,
              data = data.frame(x, y),
              family = "gaussian")
summary(result)
plot(result)
Call:
c("inla(formula = formula, family = \"gaussian\", data = data.frame(x, ", " y))")
Time used:
Pre-processing Running inla Post-processing Total
0.08050394 0.03020334 0.01916695 0.12987423
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant kld
(Intercept) 0.9690533 0.01849785 0.9327319 0.9690531 1.005387 0
x 1.0426582 0.03126996 0.9812582 1.0426580 1.104079 0
The model has no random effects
Model hyperparameters:
mean sd 0.025quant 0.5quant
Precision for the Gaussian observations 127.45 18.10 95.14 126.37
0.975quant
Precision for the Gaussian observations 166.11
Expected number of effective parameters(std dev): 2.209(0.02362)
Number of equivalent replicates : 45.27
Marginal Likelihood: 88.01
Likelihood functions - the family argument

result = inla(formula,
              data = data.frame(x, y),
              family = "gaussian")

- "binomial"
- "coxph"
- "Exponential"
- "gaussian"
- "gev"
- "laplace"
- "sn" (skew Normal)
- "stochvol", "stochvol.nig", "stochvol.t"
- "T"
- "weibull"
- Many others: go to http://r-inla.org/
A more general model
Assume the following model:
y ∼ π(y | η)
η = g(λ) = β₀ + β₁x₁ + β₂x₂ + f(x₃)
where
- x₁, x₂ are covariates with a linear effect, β_i ∼ N(0, τ₁⁻¹);
- x₃ can be an index for a spatial effect, a random effect, etc., with {f₁, f₂, …} ∼ N(0, Q_f⁻¹(τ₂)).
A more general model (cont.)
Assume the following model:
y ∼ π(y | η)
η = g(λ) = β₀ + β₁x₁ + β₂x₂ + f(x₃)

> formula = y ~ x1 + x2 + f(x3, ...)

Stacked into vectors, the response y = (y₁, …, y_n)ᵀ is linked through g to the linear predictor η = (η₁, …, η_n)ᵀ, with components
η_i = β₀ + β₁ x_{1i} + β₂ x_{2i} + f_{x_{3i}},  i = 1, …, n.
Model specification - INLA package
The model is specified in R through a formula, similar to glm:

> formula = y ~ x1 + x2 + f(x3, ...)

- y is the name of the response variable in your data frame.
- An intercept is fitted automatically! Use -1 in your formula to avoid it.
- The fixed effects (β₀, β₁ and β₂) are taken as i.i.d. Normal with zero mean and small precision. (This can be changed.)
- The f() function contains the random-effect specifications.
Some models:
- iid, iid1d, iid2d, iid3d: random effects
- rw1, rw2, ar1: smooth effect of covariates or time effect
- seasonal: seasonal effect
- besag: spatial effect (CAR model)
- generic: user-defined precision matrix
Specifying random effects
Random effects are added to the formula through the function

f(name, model="...", hyper = ...,
  replicate = ..., constr = FALSE, cyclic = FALSE)

- name: the name of the random effect. Also refers to the values in data which are used for various things, usually indexes, e.g. for space or time.
- model: the latent model, e.g. "iid", "rw2", "ar1", etc.
- hyper: specifies the prior on the hyperparameters.
- constr: sum-to-zero constraint?
- cyclic: is the effect cyclic? (rw1, rw2 and ar1)
- There are more advanced options, which we will see later.
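As an illustration (my own example, not from the slides): a smooth, sum-to-zero constrained rw2 effect of a time index, with an explicit Gamma prior on the log-precision, would be specified as

formula <- y ~ x1 +
    f(time, model = "rw2", constr = TRUE, cyclic = FALSE,
      hyper = list(prec = list(prior = "loggamma", param = c(1, 0.01))))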
Outline (current section: Some examples)
EPIL example
Seizure counts in a randomized trial of anti-convulsant therapy in epilepsy. From the WinBUGS manual.

Patient  y1  y2  y3  y4  Trt  Base  Age
      1   5   3   3   3    0    11   31
      2   3   5   3   3    0    11   30
    ...
     59   1   4   3   2    1    12   37
EPIL example (cont.)
Mixed model with repeated Poisson counts:
y_jk ∼ Poisson(µ_jk);  j = 1, …, 59;  k = 1, …, 4
log(µ_jk) = α₀ + α₁ log(Base_j/4) + α₂ Trt_j + α₃ Trt_j log(Base_j/4) + α₄ log(Age_j) + α₅ V4 + Ind_j + β_jk
α_i ∼ N(0, τ_α), with τ_α known
Ind_j ∼ N(0, τ_Ind), with τ_Ind ∼ Gamma(a₁, b₁)
β_jk ∼ N(0, τ_β), with τ_β ∼ Gamma(a₂, b₂)
EPIL example (cont.)
The Epil data frame:

y  Trt  Base  Age  V4  rand  Ind
5    0    11   31   0     1    1
3    0    11   31   0     2    1
...

Specifying the model:
formula = y ~ log(Base/4) + Trt + I(Trt * log(Base/4)) + log(Age) + V4 +
    f(Ind, model = "iid") + f(rand, model = "iid")

Each of the 4 × 59 entries of the linear predictor η = (η₁, …, η_{4·59})ᵀ decomposes as
η_jk = β₀ + … + f_{Ind,j} + f_{rand,jk},
where the Ind effect is shared within a patient and the rand effect is unique to each observation.
data(Epil)
my.center = function(x) (x - mean(x))
Epil$CTrt = my.center(Epil$Trt)
Epil$ClBase4 = my.center(log(Epil$Base/4))
Epil$CV4 = my.center(Epil$V4)
Epil$ClAge = my.center(log(Epil$Age))
formula = y ~ ClBase4*CTrt + ClAge + CV4 +
f(Ind, model="iid") + f(rand, model="iid")
result = inla(formula,family="poisson", data = Epil)
summary(result)
plot(result)
Epil example from Win/OpenBUGS
[Figures: posterior marginals for α₀ and for τ_β.]
EPIL example (cont.)
Access the results:
- Summaries (mean, sd, [0.025, 0.5, 0.975] quantiles, kld):
  - result$summary.fixed
  - result$summary.random$Ind
  - result$summary.random$rand
  - result$summary.hyperpar
- Posterior marginals (a matrix with x and y columns):
  - result$marginals.fixed
  - result$marginals.random$Ind
  - result$marginals.random$rand
  - result$marginals.hyperpar
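For example, assuming the result object from the Epil fit above, the marginals can be plotted or summarized with the built-in inla.*marginal utilities:

result$summary.fixed                          # posterior summaries of fixed effects
m <- result$marginals.fixed[["(Intercept)"]]  # two-column matrix (x, y)
plot(m, type = "l")                           # marginal density of the intercept
inla.qmarginal(c(0.025, 0.5, 0.975), m)       # quantiles from that marginal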
Smoothing binary time series
[Figure: number of days in Tokyo with rainfall above 1 mm in 1983-84, by calendar day.]
We want to estimate the probability of rain p_t for calendar day t = 1, …, 366.
Smoothing binary time series
Model with a time-series component:
y_t ∼ Binomial(n_t, p_t);  t = 1, …, 366
p_t = exp(η_t) / (1 + exp(η_t))
η_t = f(t)
f = {f₁, …, f₃₆₆} ∼ cyclic RW2(τ)
τ ∼ Gamma(1, 0.0001)
Smoothing binary time series
The Tokyo data frame:

y  n  time
0  2     1
0  2     2
1  2     3
...

Specifying the model:
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1

Here the linear predictor is just the temporal effect: η_t = f_{time,t} for t = 1, …, 366.
data(Tokyo)
formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1
result = inla(formula, family="binomial", Ntrials=n,
data=Tokyo)
Posterior for the temporal effect
[Figure: posterior mean and 0.025/0.5/0.975 quantiles of the temporal effect over time.]
Posterior for the precision
[Figure: posterior density of the precision for time.]
Disease mapping in Germany
Larynx cancer mortality counts are observed in the 544 districts of Germany from 1986 to 1990, together with the level of smoking consumption (100 possible values).
[Maps: two choropleth maps of the 544 districts, with scales 0.63-2.55 and 26.22-97 respectively.]
- y_i, i = 1, …, 544: counts of cancer mortality in region i
- E_i, i = 1, …, 544: known variable accounting for demographic variation in region i
- c_i, i = 1, …, 544: level of smoking consumption registered in region i
[Maps: as on the previous slide.]
The model
y_i ∼ Poisson{E_i exp(η_i)};  i = 1, …, 544
η_i = µ + f(c_i) + f_s(s_i) + f_u(s_i)
where:
- f(c_i) is a smooth effect of the covariate, f = {f₁, …, f₁₀₀} ∼ RW2(τ_f)
- f_s(s_i) is a spatial effect modeled as an intrinsic GMRF,
  f_s(s) | f_s(s′), s ≠ s′, τ_{f_s} ∼ N( (1/n_s) ∑_{s′∼s} f_s(s′), τ_{f_s}⁻¹/n_s ),
  where n_s is the number of neighbours of region s
- f_u(s_i) is a random effect, f_u = {f_u(s₁), …, f_u(s₅₄₄)} ∼ N(0, τ_{f_u}⁻¹ I)
- µ is an intercept term, µ ∼ N(0, 0.0001)
For identifiability we impose a sum-to-zero constraint on all intrinsic models, so that

Σ_s f_s(s) = 0 and Σ_i f_i = 0
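In R-INLA this constraint is controlled per term through the constr argument of f(); a minimal sketch (formula fragment for illustration only):

# constr=TRUE imposes the sum-to-zero constraint; it is the default
# for intrinsic models such as "besag" and "rw2"
f(region, model="besag", graph="germany.graph", constr=TRUE)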
78 / 140
The Germany data frame:
region E Y x
0 7.965008 8 56
1 22.836219 22 65
The model is:
ηi = µ+ f (ci ) + fs(si ) + fu(si )
I The data set has to contain one separate column for each term specified through f(), so in this case we have to add one column:
> Germany = cbind(Germany, region.struct=Germany$region)
I We also need the graph file where the neighborhood structure is specified: germany.graph
79 / 140
The new data set is:
region E Y x region.struct
0 7.965008 8 56 0
1 22.836219 22 65 1
Then the formula is
formula <- Y ~ f(region.struct, model="besag", graph="germany.graph") +
    f(x, model="rw2") + f(region)

The sum-to-zero constraint is the default in the inla function for all intrinsic models.

The location of the graph file has to be provided here (the graph file cannot be loaded into R).
80 / 140
The graph file
The germany.graph file:
544
1 1 12
2 2 10 11
3 4 6 8 15 387
...
I Total number of nodes in the graph
I Identifier for the node
I Number of neighbors
I Identifiers for the neighbors
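One way to inspect such a file from R (a sketch; inla.read.graph is part of the INLA package, and the demo file path matches the one used in the code below):

# read the graph and summarize the neighborhood structure
g = inla.read.graph(system.file("demodata/germany.graph", package="INLA"))
summary(g)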
81 / 140
data(Germany)
g = system.file("demodata/germany.graph", package="INLA")
source(system.file("demodata/Bym-map.R", package="INLA"))
Germany = cbind(Germany, region.struct=Germany$region)
# standard BYM model
formula1 = Y ~ f(region.struct,model="besag",graph=g) +
f(region,model="iid")
# with linear covariate
formula2 = Y ~ f(region.struct,model="besag",graph=g) +
f(region,model="iid") + x
# with smooth covariate
formula3 = Y ~ f(region.struct,model="besag",graph=g) +
f(region,model="iid") + f(x, model="rw2")
82 / 140
result1 = inla(formula1,family="poisson",data=Germany,E=E,
control.compute=list(dic=TRUE))
result2 = inla(formula2,family="poisson",data=Germany,E=E,
control.compute=list(dic=TRUE))
result3 = inla(formula3,family="poisson",data=Germany,E=E,
control.compute=list(dic=TRUE))
83 / 140
Other graph specification
- It is also possible to define the graph structure of your model using:
I A symmetric (dense or sparse) matrix, where the non-zero pattern of the matrix defines the graph.
I An inla.graph object.
See FAQ on the webpage for more information.
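A minimal sketch of the matrix route (assuming inla.read.graph also accepts a symmetric matrix, as described above; the 3-node chain is made up for illustration):

library(Matrix)
# the non-zero pattern of a symmetric matrix defines the graph: a chain 1-2-3
A = sparseMatrix(i = c(1, 2, 2, 3), j = c(2, 1, 3, 2), x = 1, dims = c(3, 3))
g = inla.read.graph(A)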
84 / 140
Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
85 / 140
Model evaluation

I Deviance Information Criterion (DIC):

result = inla(..., control.compute = list(dic = TRUE))
result$dic$dic

I Conditional predictive ordinate (CPO) and probability integral transform (PIT):

CPO_i = π(y_i | y_{-i})
PIT_i = Prob(Y_i ≤ y_i^obs | y_{-i})

result = inla(..., control.compute = list(cpo = TRUE))
result$cpo$cpo
result$cpo$pit
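A common way to condense the CPO values into a single score (a sketch, not from the slides; smaller is better):

# cross-validated log-score based on CPO
-mean(log(result$cpo$cpo), na.rm = TRUE)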
86 / 140
Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
87 / 140
Controlling θ
I We often need to set our own priors and use our own parameters in these.
I These can be set in two ways
Old style using prior=.., param=..., initial=...,
fixed=...
New style using hyper = list(prec =
list(initial=2, fixed=TRUE, ....))
The old style is there for backward compatibility only. The two styles can also be mixed.
88 / 140
Example: New style
hyper = list(
prec = list(
prior = "loggamma",
param = c(2,0.1),
initial = 3,
fixed = FALSE
)
)
formula = y ~ f(i, model="iid", hyper = hyper) + ...
- Old style
formula = y ~ f(i, model="iid", prior = "loggamma",
param = c(2,0.1), initial = 3,
fixed = FALSE) + ...
89 / 140
Internal and external scale
Hyperparameters, like the precision τ, are represented internally using a “good” transformation, like

θ_1 = log(τ)

I Initial values are given on the internal scale
I The to.theta and from.theta functions can be used to map between the external and internal scales.
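For example, a sketch (the target precision value 20 is made up): an initial value for a precision must be supplied as log(precision),

# initial = log(20) starts the precision at tau = 20 on the internal scale
hyper = list(prec = list(initial = log(20), fixed = FALSE))
formula = y ~ f(i, model = "iid", hyper = hyper)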
90 / 140
Example: AR1 model
hyper
  theta1
    name        log precision
    short.name  prec
    prior       loggamma
    param       1 5e-05
    initial     4
    fixed       FALSE
    to.theta
    from.theta
  theta2
    name        logit lag one correlation
    short.name  rho
    prior       normal
    param       0 0.15
    initial     2
    fixed       FALSE
    to.theta
    from.theta
constr             FALSE
nrow.ncol          FALSE
augmented          FALSE
aug.factor         1
aug.constr
n.div.by
n.required         FALSE
set.default.values FALSE
pdf                ar1
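This listing can be reproduced from R (a sketch; inla.models() is part of the INLA package):

# default hyperparameter specification of the latent AR1 model
str(inla.models()$latent$ar1$hyper)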
91 / 140
Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
92 / 140
Feature: replicate
“replicate” generates iid replicates from the same model with the same hyperparameters.

If x | θ ∼ AR(1), then nrep=3 makes

x = (x_1, x_2, x_3)

with mutually independent x_i’s from the AR(1) model with the same θ.
Most f()-models can be replicated
93 / 140
Example: replicate
n = 100
# two AR(1) series with different means
x1 = arima.sim(n, model=list(ar=0.9)) + 1
x2 = arima.sim(n, model=list(ar=0.9)) - 1
y1 = rpois(n, exp(x1))
y2 = rpois(n, exp(x2))
y = c(y1, y2)
i = rep(1:n, 2)        # index within each replicate
r = rep(1:2, each=n)   # replicate indicator
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept - 1
result = inla(formula, family = "poisson",
              data = data.frame(y=y, i=i, r=r))
94 / 140
Example: replicate
i = rep(1:n,2)
r = rep(1:2,each=n)
intercept = as.factor(r)
formula = y ~ f(i, model="ar1", replicate=r) + intercept -1
(y_{1,1}, . . . , y_{n,1}, y_{1,2}, . . . , y_{n,2})ᵀ −g→ (η_{1,1}, . . . , η_{n,1}, η_{1,2}, . . . , η_{n,2})ᵀ
= (f_{i_1,1}, . . . , f_{i_n,1}, f_{i_1,2}, . . . , f_{i_n,2})ᵀ
+ β_{0,1} (1, . . . , 1, 0, . . . , 0)ᵀ + β_{0,2} (0, . . . , 0, 1, . . . , 1)ᵀ
95 / 140
Feature: More than one family
Every observation could have its own likelihood!
I Response is a matrix or list
I Each “column” defines a separate “family”
I Each “family” has its own hyperparameters
96 / 140
n=100
phi = 0.9
x1 = 1 + arima.sim(n, model=list(ar=phi))
x2 = 0.5 + arima.sim(n, model=list(ar=phi))
y1 = rbinom(n,size=1, prob=exp(x1)/(1+exp(x1)))
y2 = rpois(n,exp(x2))
y = matrix(NA, 2*n, 2)
y[ 1:n, 1] = y1
y[n+1:n, 2] = y2
i = rep(1:n,2)
r = rep(1:2,each=n)
intercept = as.factor(r)
Ntrials = c(rep(1,n), rep(NA,n))
formula = y ~ f(i, model="ar1", replicate=r) + intercept -1
result = inla(formula, family = c("binomial", "poisson"),
Ntrials = Ntrials, data = data.frame(y,i,r))
97 / 140
y = matrix(NA, 2*n, 2)
y[ 1:n, 1] = y1
y[n+1:n, 2] = y2
i = rep(1:n,2)
r = rep(1:2,each=n)
intercept = as.factor(r)
Ntrials = c(rep(1,n), rep(NA,n))
formula = y ~ f(i, model="ar1", replicate=r) + intercept -1
result = inla(formula, family = c("binomial", "poisson"),
Ntrials = Ntrials, data = data.frame(y,i,r))
(y_{1,1}, NA), . . . , (y_{n,1}, NA), (NA, y_{1,2}), . . . , (NA, y_{n,2}) −g→ (η_{1,1}, . . . , η_{n,1}, η_{1,2}, . . . , η_{n,2})ᵀ
= (f_{i_1,1}, . . . , f_{i_n,1}, f_{i_1,2}, . . . , f_{i_n,2})ᵀ
+ β_{0,1} (1, . . . , 1, 0, . . . , 0)ᵀ + β_{0,2} (0, . . . , 0, 1, . . . , 1)ᵀ
98 / 140
More than one family - More examples
There are some rather advanced examples on www.r-inla.org using this feature:
I Preferential sampling, geostatistics (marked point process)
I Weibull-survival data and “longitudinal” data
99 / 140
Feature: copy
The model
formula = y ~ f(i, ...) + ...
allows only ONE element from each sub-model to contribute to the linear predictor of each observation.
Sometimes this is not sufficient.
100 / 140
Feature: copy
Suppose
η_i = u_i + u_{i+1} + . . .
Then we can code this as
formula = f(i, model="iid") + f(i.plus, copy="i")
I The copy-feature creates an additional sub-model which is ε-close to the target.
I Many copies allowed
I Copy with unknown scaling (default scaling is fixed to 1).
(η_1, . . . , η_n)ᵀ = (u_1, . . . , u_n)ᵀ + (u_2, . . . , u_{n+1})ᵀ
101 / 140
Feature: copy

Suppose that
η_i = a_i + b_i z_i + . . .
where
(a_i, b_i) ∼ N_2(0, Σ), independently over i
- Simulate data

library(mvtnorm)   # for rmvnorm
n = 100
Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2)
z = runif(n)
ab = rmvnorm(n, sigma = Sigma)
a = ab[, 1]
b = ab[, 2]
eta = a + b * z
s = 0.1
y = eta + rnorm(n, sd=s)
102 / 140
i = 1:n
j = 1:n + n
formula = y ~ f(i, model="iid2d", n = 2*n) + f(j, z, copy="i") -1
r = inla(formula, data = data.frame(y, i, j))
(η_1, . . . , η_n)ᵀ = (a_1, . . . , a_n)ᵀ + (b_1 z_1, . . . , b_n z_n)ᵀ

where the iid2d field is stored as the stacked vector (a_1, . . . , a_n, b_1, . . . , b_n), indexed by i and j.
103 / 140
Feature: Linear-combinations
It is possible to extract extra information from the model through linear combinations of the latent field, say
v = Bx
for a k × n matrix B.
104 / 140
Feature: Linear-combinations (cont.)
Two different approaches.
1. The most “correct” approach is to do the computations on the enlarged field

x̃ = (x, v)

but this often leads to a denser precision matrix.

2. The second option is to compute these “offline”, as (conditionally on θ)

Var(v_1) = Var(b_1ᵀ x) ≈ b_1ᵀ Q⁻¹_GMRFapprox b_1

and

E(v_1) = b_1ᵀ E(x)

and approximate the density of v_1 with a Normal.
105 / 140
formula = y ~ ClBase4*CTrt + ClAge + CV4 +
f(Ind, model="iid") + f(rand, model="iid")
## Now I want the posterior for
##
## 1) 2*CTrt - CV4
## 2) Ind[2] - rand[2]
##
lc1 = inla.make.lincomb( CTrt = 2, CV4 = -1)
names(lc1) = "lc1"
lc2 = inla.make.lincomb( Ind = c(NA,1), rand = c(NA,-1))
names(lc2) = "lc2"
## default is to derive the marginals from lc’s without changing the
## latent field
result1 = inla(formula,family="poisson", data = Epil,
lincomb = c(lc1, lc2))
## but the lincombs can also be additionally included into the latent
## field for increased accuracy...
result2 = inla(formula,family="poisson", data = Epil,
lincomb = c(lc1, lc2),
control.inla = list(lincomb.derived.only = FALSE))
106 / 140
- Get the results

result$summary.lincomb.derived
result$marginals.lincomb.derived  # results of the default method

result$summary.lincomb
result$marginals.lincomb          # alternative method

- Posterior correlation matrix between all the linear combinations

control.inla = list(lincomb.derived.correlation.matrix = TRUE)
result$misc$lincomb.derived.correlation.matrix

- Many linear combinations at once: use inla.make.lincombs()
107 / 140
A-matrix in the linear predictor (I)
Usual formula
η = . . .
and
y_i ∼ π(y_i | η_i, . . .)
108 / 140
A-matrix in the linear predictor (II)
Extended formula
η = . . .
η∗ = Aη
and
y_i ∼ π(y_i | η∗_i, . . .)
Implemented as
A = matrix(...)
A = sparseMatrix(...)
result = inla(formula, ...,
control.predictor = list(A = A))
109 / 140
A-matrix in the linear predictor (III)
I Can really simplify model formulations
I Duplicates to some extent the “copy” feature
I Really useful for some models; the A-matrix need not be a square matrix...
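A minimal sketch of setting up such a matrix (the sizes and averaging structure are made up for illustration):

library(Matrix)
n = 10
# each of the n-1 observations is the average of two neighboring
# elements of the linear predictor eta
A = sparseMatrix(i = rep(1:(n-1), 2),
                 j = c(1:(n-1), 2:n),
                 x = 0.5, dims = c(n-1, n))
# result = inla(formula, ..., control.predictor = list(A = A))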
110 / 140
Feature: remote computing
For large/huge models, it is more convenient to run the computations on a remote (Linux/Mac) computational server:

inla(...., inla.call="remote")

using ssh (and Cygwin on Windows).
111 / 140
Control statements
The control.xxx statements control various parts of the INLA program

I control.predictor
  I A: the “A matrix” or “observation matrix” linking the latent field to the data.
I control.mode
  I x, theta, result: gives modes to INLA.
  I restart = TRUE: tells INLA to try to improve on the supplied mode.
I control.compute
  I dic, mlik, cpo: compute measures of fit.
I control.inla
  I strategy and int.strategy contain useful advanced features.

Various others; see the help pages!
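A sketch combining several of these (the formula and data frame d are placeholders):

result = inla(formula, family="poisson", data=d,
              control.compute = list(dic=TRUE, cpo=TRUE, mlik=TRUE),
              control.inla = list(strategy="laplace",
                                  int.strategy="grid"))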
112 / 140
Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
113 / 140
Space-varying regression
Number of (insurance-type) losses N_kt in 431 municipalities/regions of Norway, in relation to one weather covariate W_kt.

The likelihood is

N_kt ∼ Poisson(A_kt p_kt), k = 1, . . . , 431, t = 1, . . . , 10

The model for log p_kt is:

log p_kt = β_0 + β_k W_kt

where β_k is the regression coefficient for municipality k.
114 / 140
Borrow strength..
Few losses in each region; high variability in the estimates.

Borrow strength by letting {β_1, . . . , β_431} be smooth in space:

{β_1, . . . , β_431} ∼ CAR(τ_β)
115 / 140
The data set:
y region W
1 0 1 0.4
2 0 1 0.4
10 0 1 0.4
11 1 2 0.2
12 0 2 0.2
20 0 2 0.2
116 / 140
The second argument in f() is the weight, which defaults to 1:
ηi = ...+ wi fi + ...
is represented as
f(i, w, ...)
No need for sum-to-zero constraint!
norway = read.table("norway.dat", header=TRUE)
formula = y ~ 1 + f(region, W, model="besag",
graph.file="norway.graph",
constr=FALSE)
result = inla(formula, family="poisson", data=norway)
117 / 140
Survival models
patient  time    event  age     sex
1        8, 16   1, 1   28, 28  0
2        23, 13  1, 0   48, 48  1
3        22, 18  1, 1   32, 32  0
I Times to infection from the time of insertion of the catheter for 38 kidney patients using portable dialysis equipment.
I 2 observations for each patient (38 patients).
I Each time can be an event (infection) or a censoring (no infection).
118 / 140
The Kidney data
The Kidney data frame
time event age sex ID
8 1 28 0 1
16 1 28 0 1
23 1 48 1 2
13 0 48 1 2
22 1 32 0 3
28 1 32 0 3
119 / 140
data(Kidney)
formula = inla.surv(time,event) ~ age + sex + f(ID,model="iid")
result1 = inla(formula, family="coxph", data=Kidney)
result2 = inla(formula, family="weibull", data=Kidney)
result3 = inla(formula, family="exponential", data=Kidney)
120 / 140
Outline
INLA implementation
R-INLA - Model specification
Some examples
Model evaluation
Controlling hyperparameters and priors
Some more advanced features
More examples
Extras
121 / 140
A toy-example using copy
State-space model
y_t = x_t + v_t
x_t = 2x_{t−1} − x_{t−2} + w_t

Rewrite this as
y_t = x_t + v_t
0 = x_t − 2x_{t−1} + x_{t−2} + w_t
and implement this as two families
1. Observations yt with precision Prec(vt)
2. Observations 0 with precision Prec(wt), or Prec=HIGH.
122 / 140
n = 100
m = n-2
y = sin((1:n)*0.2) + rnorm(n, sd=0.1)
formula = Y ~ f(i, model="iid", initial=-10, fixed=TRUE) +
f(j, w, copy="i") + f(k, copy="i") +
f(l, model ="iid") -1
Y = matrix(NA, n+m, 2)
Y[1:n, 1] = y
Y[1:m + n, 2] = 0
i = c(1:n, 3:n) # x_t
j = c(rep(NA,n), 3:n -1) # x_t-1
w = c(rep(NA,n), rep(-2,m)) # weights for j
k = c(rep(NA,n), 3:n -2) # x_t-2
l = c(rep(NA,n), 1:m) # index for the model noise w_t
r = inla(formula, data = data.frame(i,j,w,k,l,Y),
family = c("gaussian", "gaussian"),
control.data = list(list(), list(initial=10, fixed=TRUE)))
123 / 140
Stochastic Volatility model
[Figure: log of the daily difference of the pound-dollar exchange rate from October 1st, 1981, to June 28th, 1985]
124 / 140
Stochastic Volatility model
Simple model
x_t | x_1, . . . , x_{t−1}, τ, φ ∼ N(φ x_{t−1}, 1/τ)

where |φ| < 1 to ensure a stationary process.

Observations are taken to be

y_t | x_1, . . . , x_t, µ ∼ N(0, exp(µ + x_t))
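A sketch of fitting this in R-INLA (assuming the "stochvol" likelihood name; the data frame is made up):

time = 1:length(y)
formula = y ~ f(time, model="ar1")
result = inla(formula, family="stochvol",
              data = data.frame(y, time))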
125 / 140
Results
Using only the first 50 data points, which makes the problem much harder.
126 / 140
Results
[Figure: posterior marginal for ν = logit(2φ − 1)]
126 / 140
Results
[Figure: posterior marginal for log(κ_x)]
126 / 140
Using the full dataset
[Figure: the Pound-Dollar data]
127 / 140
Using the full dataset
[Figure: posterior mean of x_t + µ]
128 / 140
Using the full dataset
[Figure: the posterior marginal for the precision]
129 / 140
Using the full dataset
[Figure: the posterior marginal for the lag-1 correlation]
130 / 140
Using the full dataset
[Figure: predictions for µ + x_{t+k}]
131 / 140
New data-model: Student-tν
Now extend the model to use a Student-t_ν distribution:

y_t | x_1, . . . , x_t ∼ exp(µ/2 + x_t/2) × Student-t_ν / √(ν/(ν − 2))
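A sketch of the corresponding call (assuming the Student-t stochastic volatility likelihood is named "stochvol.t" in R-INLA):

result.t = inla(formula, family="stochvol.t",
                data = data.frame(y, time))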
132 / 140
Student-tν
[Figure: posterior marginal for ν]
133 / 140
Student-tν
[Figure: predictions]
134 / 140
Student-tν
[Figure: comparing predictions with Student-t_ν and Gaussian]
135 / 140
Student-tν
However,
I No support for Student-t_ν in the data
I Bayes factor
I Deviance Information Criterion
136 / 140
Disease mapping: The BYM-model
I Data y_i ∼ Poisson(E_i exp(η_i))
I Log-relative risk η_i = u_i + v_i
I Structured component u
I Unstructured component v
I Log-precisions log κ_u and log κ_v

[Map of the 366 districts; color scale from −0.63 to 0.98]

I A hard case: Insulin Dependent Diabetes Mellitus in 366 districts of Sardinia. Few counts.
I dim(θ) = 2.
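A sketch of the corresponding R-INLA call (the data frame d and graph file name are placeholders):

# "bym" combines the structured (besag) and unstructured (iid) components
formula = y ~ f(region, model="bym", graph="sardinia.graph")
result = inla(formula, family="poisson", E=E, data=d)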
137 / 140
Marginals for θ|y
138 / 140
Marginals for xi |y
139 / 140
THANK YOU
140 / 140