Correlated Data: Linear Mixed Models with Random Intercepts
Mixed Effects Models
- This lecture introduces linear mixed effects models.
- Linear mixed models are a type of regression model that generalises the linear regression model.
- The generalisation allows us to relax the assumption of linear regression that the errors are independent and therefore uncorrelated.
Outline
- Recap of multivariate linear regression.
- Motivating example: growth curves, and why linear regression isn't the best model.
- General linear mixed model formulation.
- The random intercept model.
- Prediction using BLUP.
Recap of Linear Regression
- The linear regression model has equation:

  E(Y) = α + β1 X1 + β2 X2 + ... + βp Xp

- The statistical model for observation i is:

  Yi = E(Yi) + εi
     = α + β1 Xi1 + β2 Xi2 + ... + βp Xip + εi

  with εi ∼ N(0, σ²).

- An assumption of linear regression is that the observed errors are independent and so uncorrelated.
- That is, for any pair of observations i1 and i2:

  Cor(εi1, εi2) = 0.
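As a quick illustration of this setup, here is a minimal sketch (not part of the original slides; the parameter values are arbitrary) that simulates data satisfying the independence assumption and fits the model by least squares:

set.seed(1)
n <- 50
x <- runif(n, 8, 14)                  # a single predictor
y <- 17 + 0.5 * x + rnorm(n, 0, 1)    # independent N(0, sigma^2) errors
fit <- lm(y ~ x)
coef(fit)                             # estimates of alpha and beta1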
Motivating Example
- Potthoff and Roy (1964) Biometrika 51:313-326.
- Measured distance from the pituitary gland (at the base of the brain) to the pterygomaxillary fissure (a point in the skull bone) for a sample of 16 males and 11 females.
- Measurements were taken from X-rays every 2 years, between the ages of 8 and 14.
- We are interested in modelling growth, i.e. in the relationship between distance and age.
Growth Curve Example
- The data are available in the R package nlme.
- nlme also contains routines that allow us to fit linear mixed effects models.
- This package has been superseded and we will instead use lme4 for the analysis.
- To install the packages (just once):
  > install.packages("nlme")
  > install.packages("lme4")
- To load a package (each time we start R):
  > library(nlme)
  > library(lme4)
Growth Curve Example
- The data are in the dataframe Orthodont.
- List the variables in the dataframe:
  > names(Orthodont)
  [1] "distance" "age"      "Subject"  "Sex"
- The variables are as follows:

  R name    Description
  distance  distance (in mm) from the pituitary gland to the pterygomaxillary fissure
  age       age of subject at time of measurement
  Subject   factor variable identifying the subject
  Sex       factor variable identifying the sex of the subject (Male/Female)
Growth Curve Example
- Look at the first few rows of the dataframe:
  > head(Orthodont)
    distance age Subject  Sex
  1     26.0   8     M01 Male
  2     25.0  10     M01 Male
  3     29.0  12     M01 Male
  4     31.0  14     M01 Male
  5     21.5   8     M02 Male
  6     22.5  10     M02 Male
- Note: there are multiple observations on each subject.
- The data have a natural grouping structure. There is a group of observations (age and distance measurements) for each study subject.
- There are some variables which vary at the group level but not the observation level, e.g. the sex of a subject is fixed across the age and distance measurements.
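One quick way to see this grouping structure (a sketch, not from the original slides) is to count the rows per subject:

table(Orthodont$Subject)   # each of the 27 subjects contributes 4 rows, one per age (8, 10, 12, 14)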
Growth Curve Example
- Because adolescent growth patterns depend strongly on sex, to simplify we will restrict the analysis to females.
- Create a new dataframe, just containing the observations on female subjects:
  > OrthoFem <- Orthodont[Orthodont$Sex == "Female", ]
Growth Curve Example: Linear Regression
- Suppose we fit a linear regression, regressing distance on age:
  > lm.obj <- lm(distance ~ age, data=OrthoFem)
  > summary(lm.obj)

  Call:
  lm(formula = distance ~ age, data = OrthoFem)

  Residuals:
      Min      1Q  Median      3Q     Max
  -4.7091 -1.1784 -0.1068  1.5080  4.8727

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  17.3727     1.6378  10.608 1.87e-13 ***
  age           0.4795     0.1459   3.287  0.00205 **
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Each year of age corresponds to about 0.5 mm of growth.
Growth Curve Example: Linear Regression
- Plot distance against age:
  > plot(distance ~ age, data=OrthoFem)

  [Figure: scatter plot of distance (18 to 28 mm) against age (8 to 14 years)]
Growth Curve Example: Linear Regression
- Add a line of best fit from the regression of distance on age:
  > abline(lm.obj$coef)

  [Figure: scatter plot of distance against age with the fitted regression line added]
Assumptions about Errors
- The key assumption of linear regression is that the errors

  εi = Yi − E(Yi)
     = Yi − α − β1 Xi1 − β2 Xi2 − ... − βp Xip

  are independent and identically distributed.
- Is this assumption realistic for the skull growth data?
- One way to examine the errors is to look at the residuals, which are estimates of the errors:

  ε̂i = Yi − Ŷi
     = Yi − α̂ − β̂1 Xi1 − β̂2 Xi2 − ... − β̂p Xip

- If the assumptions of linear regression hold, the residuals should show no systematic pattern when plotted against the fitted values Ŷi.
Assumptions about Errors
- Plot the residuals ε̂i against the fitted values Ŷi:
  > plot(lm.obj$fitted, lm.obj$residuals,
         xlab="Fitted Value", ylab="Residual")

  [Figure: residuals (−4 to 4) plotted against fitted values (21.5 to 24)]
Correlated Errors
- Each coloured line corresponds to a study subject; residuals from the same subject are correlated.

  [Figure: residuals against fitted values, with the residuals from each subject joined by a coloured line]
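The slides do not show the code for this plot; a minimal sketch that produces something similar, assuming lm.obj and OrthoFem from above, is:

subs <- unique(OrthoFem$Subject)
plot(fitted(lm.obj), resid(lm.obj), xlab = "Fitted Value", ylab = "Residual")
for (i in seq_along(subs)) {
  idx <- OrthoFem$Subject == subs[i]
  lines(fitted(lm.obj)[idx], resid(lm.obj)[idx], col = i)  # one colour per subject
}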
Growth Plots for the 11 Females in the Sample
> plot(OrthoFem)
[Figure: trellis of distance-against-age growth curves, one panel per female subject (F01-F11)]
Correlated Errors
- Overlay the line of best fit on each curve:

  [Figure: the same trellis of growth curves with the single fitted regression line overlaid on each panel]
Correlated Errors
- Why are the errors from the same subject correlated?
- Because children who are large at 8 tend to be large at 14, and so on.
- How can we remove this correlation?
- What if we were to allow the intercept to vary across subjects?
Growth Plots for the 11 Females in the Sample
- Fix the slope (growth rate) across subjects but allow different intercepts (red lines):

  [Figure: the trellis of growth curves with a common slope but a subject-specific intercept line (in red) in each panel]
The Linear Mixed Model
- Correlated errors violate the assumption of the linear model.
- A model that accounts for the error correlation within study subjects should provide a better fit.
- The linear mixed model (LMM) is a generalisation of linear regression which allows us to model correlated errors.
The Linear Mixed Model
- Similar to multivariate linear regression:

  Yi = Σ_{j=1}^{p} βj Xij + Σ_{k=1}^{q} uk Zik + εi

- The q new terms u1, ..., uq are called the random effects.
- Like the error term εi, the uk are random terms.
- The noise terms εi must be independent, as in ordinary linear regression.
- The random effects terms uk can have any correlation structure we choose.
- The Zik are predictor variables which depend on the structure of the experiment. (We will see examples.)
The Random Intercept Model
- A simple hierarchical LMM.
- Applies when the data are grouped and the intercept varies between groups (as in the growth example).
- We need two indices for the observations, one for each level of the hierarchy.
- i indexes observations while k indexes the group, so k(i) is the group of the ith observation:

  Yi = α + Σ_{j=1}^{p} βj Xij + u_{k(i)} + εi,   uk ∼ N(0, ψ²),   εi ∼ N(0, σ²)

  This model allows the intercept to shift randomly between the groups.
The Random Intercept Model is a LMM
- The random intercept model:

  Yi = α + Σ_{j=1}^{p} βj Xij + u_{k(i)} + εi,   uk ∼ N(0, ψ²),   εi ∼ N(0, σ²)

- The general LMM:

  Yi = Σ_{j=1}^{p} βj Xij + Σ_{k=1}^{q} uk Zik + εi

- We can get the random intercept model as a specialisation of the general LMM by letting q be the number of groups and setting Zik = 1 if observation i is in group k, and Zik = 0 otherwise.
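To make the connection concrete, here is a minimal sketch (not in the original slides) that builds the indicator matrix Z for the female growth data by hand; lmer constructs the equivalent matrix internally:

subs <- unique(as.character(OrthoFem$Subject))                  # the q = 11 groups
Z <- sapply(subs, function(s) as.numeric(OrthoFem$Subject == s))
dim(Z)     # 44 observations by 11 groups
head(Z)    # each row has a single 1, in the column of that observation's subject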
Fitting The Random Intercept Model to the Growth Data
We use the R function lmer, which stands for linear mixed effects regression. It is in the lme4 package.

> lmer(distance ~ age + (1 | Subject), data=OrthoFem)

The syntax for lmer is similar to that for lm. The first two parameters are the same:
1. a formula giving the fixed effects: distance ~ age
2. a parameter specifying the data frame: OrthoFem
The part in brackets specifies the random effects:
- (1 | Subject) indicates a random intercept.
- | Subject indicates that there should be a different random effect for each level of the Subject variable.
Fitting The Random Intercept Model to the Growth Data
> rand.int.obj <- lmer(distance ~ age + (1 | Subject), data=OrthoFem)
> rand.int.obj
Linear mixed model fit by REML ['lmerMod']
Formula: distance ~ age + (1 | Subject)
   Data: OrthoFem
REML criterion at convergence: 141.2183
Random effects:
 Groups   Name        Std.Dev.
 Subject  (Intercept) 2.068
 Residual             0.780
Number of obs: 44, groups:  Subject, 11
Fixed Effects:
(Intercept)          age
    17.3727       0.4795
Fitting The Random Intercept Model to the Growth Data
Like lm, lmer reports an estimate of the fixed effect parameters β:

(Intercept)          age
    17.3727       0.4795

It also reports estimates of the variance components parameters ψ and σ:

Random effects:
 Groups   Name        Std.Dev.
 Subject  (Intercept) 2.068
 Residual             0.780

ψ̂ = 2.068 and σ̂ = 0.780, so ψ̂²/(ψ̂² + σ̂²) × 100% ≈ 88% of the residual variation is explained by the random intercept terms.
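This proportion (the intraclass correlation) can be computed directly from the fitted object; a minimal sketch, assuming rand.int.obj from above:

vc     <- as.data.frame(VarCorr(rand.int.obj))   # variance components: columns grp, vcov, sdcor
psi2   <- vc$vcov[vc$grp == "Subject"]           # between-subject variance, psi^2
sigma2 <- vc$vcov[vc$grp == "Residual"]          # residual variance, sigma^2
psi2 / (psi2 + sigma2)                           # about 0.88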
Confidence Intervals
- The confint command reports confidence intervals:
  > confint(rand.int.obj)
  Computing profile confidence intervals ...
                   2.5 %     97.5 %
  .sig01       1.3356536  3.2373519
  .sigma       0.6143927  0.9985162
  (Intercept) 15.6891967 19.0562579
  age          0.3750181  0.5840728
- The bottom two lines give the usual 95% confidence intervals for the fixed effects (the intercept and the age slope).
- The top two lines are approximate confidence intervals for the variance components parameters ψ and σ.
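If the profile intervals are slow or fail to converge, confint also offers Wald and parametric bootstrap methods; a sketch (the nsim value is just illustrative):

confint(rand.int.obj, method = "Wald")               # quick Wald intervals (fixed effects only)
confint(rand.int.obj, method = "boot", nsim = 500)   # parametric bootstrap intervals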
Best Linear Unbiased Predictor (BLUP)
- The random effects u are not model parameters but unobserved latent variables.
- Consequently, they cannot be estimated by maximum likelihood; instead a method called BLUP is used.
- BLUP estimates of the random intercepts can be obtained with the random.effects command:
  > random.effects(rand.int.obj)
      (Intercept)
  F10 -4.00532866
  F09 -1.47044943
  F06 -1.47044943
  F01 -1.22903236
  F05 -0.02194701
  F07  0.34017860
  F02  0.34017860
  F08  0.70230420
  F03  1.06442981
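Relatedly (a sketch, not from the slides), coef() adds the fixed intercept to each BLUP, giving the subject-specific intercepts alongside the common age slope:

coef(rand.int.obj)$Subject   # per-subject intercept (alpha + u_k) and the shared age coefficient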
Predicting Yi for New Observations
- As for other regression models, we can use the predict function to predict Yi for a new observation, for which we know the predictor values of the fixed effects Xi1, ..., Xip.
- There are two cases:
  1. We want to predict a new value of Yi for a new observation in a specified group (i.e. subject, in the growth example).
  2. We want to predict a new value of Yi for a new observation in a new group, not so far observed. In this case, the intercept for the new group must be predicted before we predict Yi.
Predicting Yi for a New Observation within a Group
- For an example of the first case, predict the value of the distance for a new measurement on subject F10 in the growth example, taken when she is 16:
  > predict(rand.int.obj, newdata=data.frame(age=16, Subject="F10"))
         1
  21.04013
Predicting Yi for a New Observation in a New Group
- For an example of the second case, predict the value of the distance for a new measurement on a new subject, taken when she is 16:
  > predict(rand.int.obj, allow.new.levels=TRUE,
            newdata=data.frame(age=16, Subject="F12"))
         1
  25.04545
- allow.new.levels=TRUE indicates that we need to introduce a new group (Subject), which we have called F12, and a new observation within that group.
- Note that the within-subject prediction for F10 is lower than the prediction based on a newly predicted subject. This makes sense as F10 is smaller than the average study subject (cf. the prediction curves).
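To see how these two numbers arise, here is a minimal sketch (assuming the objects above) assembling them by hand from the fixed effects and the BLUPs:

b <- fixef(rand.int.obj)                    # (Intercept) 17.3727, age 0.4795
u <- ranef(rand.int.obj)$Subject            # BLUPs of the random intercepts
unname(b["(Intercept)"] + b["age"] * 16 + u["F10", "(Intercept)"])  # about 21.04
unname(b["(Intercept)"] + b["age"] * 16)    # about 25.05: for a new subject the BLUP is 0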
Multi-Level Modelling Example: Split-Plot Experiment
- An agricultural field experiment.
- 3 varieties of oats: Golden Rain, Victory and Marvellous.
- The aim is to investigate the relationship between yield, variety and the dosage of a nitrogen fertiliser.
Multi-Level Modelling Example: Split-Plot Experiment

[Diagram: the field, divided into 6 blocks; a single block, divided into 3 plots, one for each variety]

Experimental Design:
- The field is divided into 6 replicate blocks.
- Each block is divided into 3 plots and a variety of oats is planted in each.
- Each plot is divided into four sub-plots which are fertilised with different doses of nitrogen.
- The observations can be grouped at two levels: by block and by plot within block (see the cross-tabulation sketch below).
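A quick way to check this two-level grouping in the data (a sketch, not from the original slides) is to cross-tabulate blocks and varieties; each cell is one plot with its four nitrogen sub-plots:

library(nlme)                       # the Oats data frame is supplied by nlme
with(Oats, table(Block, Variety))   # 6 blocks x 3 varieties, 4 observations per plot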
The Data for Block I
> Oats[1:12, ]
   Block     Variety nitro yield
1      I     Victory   0.0   111
2      I     Victory   0.2   130
3      I     Victory   0.4   157
4      I     Victory   0.6   174
5      I Golden Rain   0.0   117
6      I Golden Rain   0.2   114
7      I Golden Rain   0.4   161
8      I Golden Rain   0.6   141
9      I  Marvellous   0.0   105
10     I  Marvellous   0.2   140
11     I  Marvellous   0.4   118
12     I  Marvellous   0.6   156
A Multi-Level Random Intercept Model
- The hierarchical experimental design suggests a model with two levels of random effect.
- Let Yi be the observed yield of oats from the ith subplot, from the k(i)th plot in the l(k)th block. We use the model

  Yi = nitroi β1 + gldk(i) β2 + vick(i) β3 + marvk(i) β4 + ul(k(i)) + vk(i) + εi

  ul ∼ N(0, ψ1²),   vk ∼ N(0, ψ2²),   εi ∼ N(0, σ²)
A Multi-Level Random Intercept Model
The fixed effects:
- nitroi is the fertiliser dose for the ith subplot. Question: does fertiliser affect yield, i.e. can we reject β1 = 0?
- gldk, vick, marvk are dummy variables for the variety planted in the kth plot. Question: does variety affect yield, i.e. can we reject β2 = β3 = β4?
A Multi-Level Random Intercept Model
The random effects, two levels of random intercept:
- ul allows for random yield fluctuations between blocks.
- vk allows for random yield fluctuations between plots (within each block).
These random fluctuations in yield might be due, for example, to different soil or drainage conditions in different blocks/plots.
Fitting the Multi-Level Model
We can fit the multilevel model in R using the lmer command:

> Mod2Levels <- lmer(yield ~ nitro + Variety + (1 | Block/Variety), data=Oats)

Here Block/Variety specifies random effects for blocks and random effects for plots within blocks. (We can use Variety as a synonym for plot here because, within a block, each plot corresponds to one and only one variety.)
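In lme4 the nested syntax is shorthand for two separate random-intercept terms, so the following explicit form (the object name Mod2Levels_explicit is just illustrative) fits the same model:

Mod2Levels_explicit <- lmer(yield ~ nitro + Variety + (1 | Block) + (1 | Block:Variety),
                            data = Oats)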
Is the Multi-Level Model Parsimonious?
The multilevel model has 18 + 6 = 24 random effects (a random intercept for each of the 18 plots and each of the 6 blocks). Would it be as effective to use a simpler model?

A model with a single set of random effects at the Block level:

> Mod1Level <- lmer(yield ~ nitro + Variety + (1 | Block), data=Oats)

An ordinary linear model with no random effects:

> Mod0Levels <- lm(yield ~ nitro + Variety, data=Oats)
Is the Multi-Level Model Parsimonious?
Compare using AIC:
> anova(Mod2Levels, Mod1Level, Mod0Levels)
refitting model(s) with ML (instead of REML)
Data: Oats
Models:
Mod0Levels: yield ~ nitro + Variety
Mod1Level: yield ~ nitro + Variety + (1 | Block)
Mod2Levels: yield ~ nitro + Variety + (1 | Block/Variety)
           Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)
Mod0Levels  5 650.23 661.61 -320.11   640.23
Mod1Level   6 620.80 634.46 -304.40   608.80 31.428      1  2.069e-08 ***
Mod2Levels  7 615.11 631.04 -300.55   601.11  7.690      1   0.005553 **
---

The two level model has the lowest AIC.
Comparing Models with Different Fixed Effects
> Mod2Levels_no_nitro<-lmer(yield~Variety+(1|Block/Variety), data=Oats)
> Mod2Levels_no_variety<-lmer(yield~nitro+(1|Block/Variety), data=Oats)
> Mod2Levels_no_nitro_no_variety<-lmer(yield~+(1|Block/Variety), data=Oats)
Comparing Models with Different Fixed Effects
> anova(Mod2Levels_no_nitro_no_variety, Mod2Levels_no_variety, Mod2Levels_no_nitro, Mod2Levels)
refitting model(s) with ML (instead of REML)
Data: Oats
Models:
Mod2Levels_no_nitro_no_variety: yield ~ +(1 | Block/Variety)
Mod2Levels_no_variety: yield ~ nitro + (1 | Block/Variety)
Mod2Levels_no_nitro: yield ~ Variety + (1 | Block/Variety)
Mod2Levels: yield ~ nitro + Variety + (1 | Block/Variety)
                               Df    AIC    BIC  logLik deviance
Mod2Levels_no_nitro_no_variety  4 675.48 684.59 -333.74   667.48
Mod2Levels_no_variety           5 614.23 625.61 -302.11   604.23
Mod2Levels_no_nitro             6 676.37 690.03 -332.19   664.37
Mod2Levels                      7 615.11 631.04 -300.55   601.11
- Dropping nitro raises the AIC by about 60, whereas dropping Variety leaves the AIC essentially unchanged (in fact slightly lower).
- There is good evidence that nitrogen fertiliser has an effect on yield.
- There does not seem to be enough evidence to imply that any one variety produces different yields from the other two.
Summary
- We've had an initial look at linear mixed effects models (LMMs).
- These allow us to model data with correlated errors.
- We've looked at the specific case of random intercept models.
- These are models where the error correlation is induced by a grouping of the observations.
- Within each group there is a slightly different linear regression intercept.