
Introduction to Mixed Models in R

Galin Jones

School of Statistics, University of Minnesota

http://www.stat.umn.edu/~galin

March 2011

Second in a Series

• Sponsored by Quantitative Methods Collaborative.

• Previous workshop “An Introduction to R” was given on 3/7 by Professor Sanford Weisberg.

• Upcoming talk on 4/11 by Chris Winship of Harvard University on causal inference.

Goals

• Brief review of first workshop.

• Definition of mixed models and why they may be useful.

• Using mixed models in R through two simple case studies.

Review

• R is “free” software and can be downloaded at http://cran.r-project.org

• A package is a collection of functions designed for specific tasks. For example, the package lme4 fits many mixed models.

• When you install R on your computer you get the “base” distribution. This includes many useful packages, but others, such as lme4, you need to install separately (see the commands below).

• The R search engine is extremely useful: http://www.rseek.org

• Getting data into R (or any other software package) can be challenging.

• Many R packages include built-in data sets and we will use two of these today.
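For example, installing and loading lme4, and finding a package's built-in data sets, look like this (a minimal sketch; substitute any package name):

> install.packages("lme4")        # one-time download from CRAN
> library(lme4)                   # load the package in each session
> data(package = "nlme")          # list the data sets shipped with nlme
> data(Rail, package = "nlme")    # load one of them by name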

Mixed Models

Mixed models are a large and complex topic; we will only barely get started with them today.

There are many varieties of mixed models:

• Linear mixed models (LMM)

• Nonlinear mixed models (NLM)

• Generalized linear mixed models (GLMM)

Our focus will be on linear mixed models. Much more discussion of this material can be found in the following books.

Extending the Linear Model with R by Julian Faraway

Mixed-Effects Models in S and S-PLUS by Jose Pinheiro and Douglas Bates

Factors, Effects and Treatments

Suppose apple slices are treated with five preservative compounds (A, B, C, D, E) with the goal of extending shelf life.

Response: Shelf life

Factor: Preservative

Treatments = Levels of a factor: A, B, C, D, E

Effect: Impact of compound on shelf life

µ = population mean shelf life

µA = population mean shelf life with treatment A

effect of A = µA − µ

Fixed or Random?

Factor effects are either fixed or random.

• Fixed: The levels in the study represent all levels of interest

• Random: The levels in the study represent only a sample ofthe levels of interest.

Mixed models have both fixed and random effects.

In our example, preservative is fixed since A, B, C, D, E are theonly levels of interest.

Blocking

Block:

• A group of units formed so that units within the group are as homogeneous as possible.

• Reduces the effects of variation among experimental units.

• Treatments are randomly assigned within blocks.

Let's add to the example: suppose 10 individual fruit are randomly chosen from a population of fruit and the 5 preservatives are randomly assigned to 5 portions of each fruit (a model for this design is sketched below).

• Preservative is a fixed effect.

• Fruit is a (random) block effect.
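In lme4 formula notation this design could be written roughly as follows. This is a sketch only; the data frame apples and its columns shelflife, preservative, and fruit are hypothetical names, not data used in this workshop.

> library(lme4)
> # preservative is the fixed factor, fruit is the random block
> fit <- lmer(shelflife ~ preservative + (1 | fruit), data = apples)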

Mixed models

Mixed models contain both fixed and random effects. This has several ramifications:

• Using random effects broadens the scope of inference. That is, inferences can be made on a statistical basis to the population from which the levels of the random factor have been drawn.

• Naturally incorporates dependence in the model. Observations that share the same level of the random effects are being modeled as correlated (see the small simulation after this list).

• Using random factors often gives more accurate estimates.

• Sophisticated estimation and fitting methods must be used.
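A quick simulation (added here for illustration, not part of the original slides) shows the correlation induced between two observations that share a random effect:

> set.seed(1)
> b  <- rnorm(5000, mean = 0, sd = 3)      # shared random effect for each group
> y1 <- b + rnorm(5000, mean = 0, sd = 1)  # two observations per group
> y2 <- b + rnorm(5000, mean = 0, sd = 1)
> cor(y1, y2)                              # close to 9/(9 + 1) = 0.9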

Rail Data

The “Rail” data is a built-in R data set. It can be found in the MEMSS and nlme packages.

The commands

> library(nlme)

> data(Rail)

> ?Rail

will make the data available to us and produce a description.

Rail Data

Evaluation of Stress in Railway Rails

Description

The Rail data frame has 18 rows and 2 columns.

Format

This data frame contains the following columns:

Rail

an ordered factor identifying the rail on which

the measurement was made.

travel

a numeric vector giving the travel time for ultrasonic

head-waves in the rail (nanoseconds). The value given

is the original travel time minus 36,100 nanoseconds.

Rail Data

Summary

• Six rails were chosen from a group of rails.

• Each rail was tested 3 times.

• Measured the time it takes an ultrasonic wave to travel the length of a rail.

• Rail is a factor (fixed or random?) and travel is the response.

Rail Data

> Rail

Grouped Data: travel ~ 1 | Rail

Rail travel

1 1 55

2 1 53

3 1 54

4 2 26

5 2 37

.

.

.

15 5 50

16 6 80

17 6 85

18 6 83

Rail Data

> summary(Rail)

Rail travel

2:3 Min. : 26.00

5:3 1st Qu.: 50.25

1:3 Median : 66.50

6:3 Mean : 66.50

3:3 3rd Qu.: 85.00

4:3 Max. :100.00

> with(Rail, tapply(travel, Rail, mean))

2 5 1 6 3 4

31.66667 50.00000 54.00000 82.66667 84.66667 96.00000

> pdf(file="RailPlot1.pdf")

> with(Rail, plot(travel, Rail, xlab="Travel time"))

> dev.off()

Rail Data

[Dot plot of travel time by rail; x-axis "Travel time" (roughly 40 to 100), y-axis "Rail" (1 to 6), three points per rail.]

Rail Data

• Between-rail variability is greater than within-rail variability.

• Within-rail variability is not constant.

• Mean travel time appears different for some rails (ordered by mean: 2, 5, 1, 6, 3, 4).

We clearly need to account for the classification factor (Rail) in the analysis.

Rail Data: Rail as a fixed effect

yij = βi + eij,   i = 1, . . . , 6,   j = 1, 2, 3

where

• yij is the observed travel time for observation j on rail i .

• βi is the population mean travel time of rail i

• eij are independent and identically normally distributed with mean 0 and variance σ2, or

eij ∼ N(0, σ2), iid.

This assumption means yij ∼ N(βi, σ2), independently.

> #Rail as a fixed effect

> r1.lm<-lm(travel ~ Rail - 1, data=Rail)

Rail Data

> r1.lm

Call:

lm(formula = travel ~ Rail - 1, data = Rail)

Coefficients:

Rail2 Rail5 Rail1 Rail6 Rail3 Rail4

31.67 50.00 54.00 82.67 84.67 96.00

This means that the ordered estimates of the βi are

β2 = 31.67, . . . , β4 = 96.00

Rail Data

> summary(r1.lm)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

Rail2 31.667 2.321 13.64 1.15e-08 ***

.

.

Rail4 96.000 2.321 41.35 2.59e-14 ***

Residual standard error: 4.021 on 12 degrees of freedom

Multiple R-squared: 0.9978,Adjusted R-squared: 0.9967

F-statistic: 916.6 on 6 and 12 DF, p-value: 2.971e-15

Note that, because the intercept was removed from the model, the reported F-statistic and p-value test whether all of the rail means are zero rather than whether they differ. Also, the residual standard error (the estimate of σ) is 4.021 on 12 degrees of freedom.
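To test for differences among the rails within the fixed-effects framework, one option (not shown in the slides) is to refit with an intercept and use the usual ANOVA F-test:

> anova(lm(travel ~ Rail, data = Rail))   # F-test on 5 and 12 df for rail differences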

Rail Data

> library(lattice)   # bwplot() comes from the lattice package

> with(Rail, bwplot(Rail ~ residuals(r1.lm)))

[Box plots of residuals(r1.lm) for each rail (ordered 2, 5, 1, 6, 3, 4); residuals range roughly from −5 to 5.]

Rail Data

The fixed effects model gives a good summary of the data but the main interest is in the population of rails. Also, we don’t believe the model is accurate since the 3 observations on each rail are clearly not independent.

Rail as a random effect

yij = β + bi + eij

where

• yij is the observed travel time for observation j on rail i .

• β is the population mean travel time

• bi is the deviation from β for the ith rail

Rail Data

• eij is the deviation for observation j on rail i from the mean travel time for rail i

bi ∼ N(0, σ2b), iid    and    eij ∼ N(0, σ2), iid.

• Our assumptions imply that observations on the same rail are correlated; in fact,

corr = σ2b / (σ2b + σ2).

Rail Data

> #Rail as random effect

> library(lme4)   # lmer() comes from the lme4 package, not nlme

> r2.lme<-lmer(travel ~ 1 + (1 | Rail),

REML=FALSE, data=Rail)

This notation takes some getting used to. Specifically,

(1 | Rail)

means that there is a single random factor which is constant within each level and its levels are given by the grouping variable Rail.
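For comparison, since the nlme package was loaded above, an equivalent fit can be obtained with lme(); a sketch, not part of the original slides:

> r2.nlme <- lme(travel ~ 1, random = ~ 1 | Rail, data = Rail, method = "ML")
> summary(r2.nlme)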

Rail Data

> summary(r2.lme)

Linear mixed model fit by maximum likelihood

Formula: travel ~ 1 + (1 | Rail)

Random effects:

Groups Name Variance Std.Dev.

Rail (Intercept) 511.861 22.6243

Residual 16.167 4.0208

Number of obs: 18, groups: Rail, 6

Our best guess at σ2 and σ2b are

σ2b = 511.861 σ2 = 16.167

and the estimated correlation between observations on a rail is

corr = 511.861 / (511.861 + 16.167) = 0.969.
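These quantities can also be pulled out of the fitted object; a minimal sketch, assuming a reasonably recent version of lme4:

> vc <- VarCorr(r2.lme)
> sigma2.b <- as.numeric(vc$Rail)   # between-rail variance component
> sigma2 <- sigma(r2.lme)^2         # residual variance
> sigma2.b / (sigma2.b + sigma2)    # estimated within-rail correlation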

Rail Data

Fixed effects:

Estimate Std. Error t value

(Intercept) 66.500 9.285 7.162

The estimate of β is 66.5. For a new rail drawn from the population this is our best guess at the travel time, while the predicted values for a new observation on each of the same 6 rails are

> fixef(r2.lme)+ranef(r2.lme)$Rail

(Intercept)

2 32.02957

5 50.17190

1 54.13023

6 82.49824

3 84.47740

4 95.69266
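Equivalently, these per-rail predictions can be read off with coef(), which combines the fixed intercept with each rail's random effect and should reproduce the same table:

> coef(r2.lme)$Rail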

Rail Data

> qqnorm(resid(r2.lme), main="")

[Normal Q-Q plot of resid(r2.lme); x-axis "Theoretical Quantiles" (−2 to 2), y-axis "Sample Quantiles" (−6 to 6).]

Rail Data

> plot(fitted(r2.lme), resid(r2.lme), xlab="Fitted",

ylab="Residuals")

[Plot of resid(r2.lme) against fitted(r2.lme); x-axis "Fitted" (30 to 90), y-axis "Residuals" (−6 to 6).]

Multilevel Models

Multilevel models

• are special cases of mixed models,

• are useful for data with a hierarchical structure, and

• can be implemented in a variety of R packages.
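A note on lme4 notation for nested grouping (for example, classes within schools): the terms (1 | school) + (1 | school:class) and the shorthand (1 | school/class) specify the same pair of random intercepts. The names below are generic placeholders, not a fit performed in this workshop:

> fit <- lmer(y ~ x + (1 | school/class), data = dat)   # same as (1 | school) + (1 | school:class)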

Joint Schools Project

> library(faraway)

> data(jsp)

> ?jsp

Description

Example Dataset from "Practical Regression and Anova"

Format

See for yourself

Source

See Reference

References

Reference details may be found in "Practical Regression

and Anova" by Julian Faraway

Joint Schools Project

> str(jsp)

’data.frame’: 3236 obs. of 9 variables:

$ school : Factor w/ 49 levels "1","2","3","4",..: 1 1

1 1 1 1 1 1 1 1 ...

$ class : Factor w/ 4 levels "1","2","3","4": 1 1 1 1

1 1 1 1 1 1 ...

$ gender : Factor w/ 2 levels "boy","girl": 2 2 2 1 1

1 1 1 1 1 ...

$ social : Factor w/ 9 levels "1","2","3","4",..: 9 9

9 2 2 2 2 2 9 9 ...

$ raven : num 23 23 23 15 15 22 22 22 14 14 ...

$ id : Factor w/ 1192 levels "1","2","3","4",..: 1

1 1 2 2 3 3 3 4 4 ...

$ english: num 72 80 39 7 17 88 89 83 12 25 ...

$ math : num 23 24 23 14 11 36 32 39 24 26 ...

$ year : num 0 1 2 0 1 0 1 2 0 1 ...

Joint Schools Project

> summary(jsp)

school class gender social raven

48 : 206 1:1949 boy :1551 4 :1225 Min. : 4.00

33 : 131 2: 987 girl:1685 9 : 484 1st Qu.:21.00

4 : 131 3: 169 2 : 424 Median :25.00

31 : 107 4: 131 5 : 288 Mean :25.13

47 : 102 3 : 270 3rd Qu.:29.00

50 : 101 6 : 221 Max. :36.00

(Other):2458 (Other): 324

id english math year

1 : 3 Min. : 0.00 Min. : 1.00 Min. :0.0000

3 : 3 1st Qu.:31.00 1st Qu.:22.00 1st Qu.:0.0000

4 : 3 Median :54.00 Median :28.00 Median :1.0000

6 : 3 Mean :52.49 Mean :26.66 Mean :0.9379

7 : 3 3rd Qu.:75.00 3rd Qu.:33.00 3rd Qu.:2.0000

8 : 3 Max. :98.00 Max. :40.00 Max. :2.0000

(Other):3218

Joint Schools Project

#Subset data to focus on Year=2

> jsp.y2<-jsp[jsp$year==2,]

> plot(jitter(math) ~ jitter(raven), xlab="Raven Score",

ylab="Math Score", data=jsp.y2)

> boxplot(math ~ social, xlab="Social Class",

ylab="Math Score", data=jsp.y2)

> boxplot(math ~ gender, xlab="Gender", ylab="Math Score",

data=jsp.y2)

Joint Schools Project

[Scatter plot of math score (jittered) against Raven score (jittered); x-axis "Raven Score" (5 to 35), y-axis "Math Score" (5 to 40).]

There is clearly correlation between math score and raven score.

Joint Schools Project

[Box plots of math score by social class (1 to 9); x-axis "Social Class", y-axis "Math Score".]

There are differences in math scores between the levels of social class.

Joint Schools Project

[Box plots of math score by gender (boy, girl); x-axis "Gender", y-axis "Math Score".]

Math scores are not different between the levels of gender.

Joint Schools Project

Center raven since we would otherwise be comparing effects at a Raven score of zero.

> jsp.y2$ctrraven<-jsp.y2$raven-mean(jsp.y2$raven)

> jsp1.lme<-lmer(math ~ ctrraven*social*gender +

(1 | school) + (1 | school:class), data=jsp.y2)

> qqnorm(resid(jsp1.lme),main="")

> plot(resid(jsp1.lme) ~ fitted(jsp1.lme), xlab="Fitted",

ylab="Residuals")

Joint Schools Project

[Normal Q-Q plot of resid(jsp1.lme); x-axis "Theoretical Quantiles" (−3 to 3), y-axis "Sample Quantiles" (−20 to 15).]

There isn’t much of concern here.

Joint Schools Project

[Plot of residuals against fitted values for jsp1.lme; residuals range roughly from −20 to 15, fitted values roughly 10 to 40.]

There is some evidence of non-constant variance. Transformation?

Joint Schools Project

> anova(jsp1.lme)

Analysis of Variance Table

Df Sum Sq Mean Sq F value

ctrraven 1 10218.0 10218.0 374.4037

social 8 615.7 77.0 2.8201

gender 1 21.6 21.6 0.7917

ctrraven:social 8 577.4 72.2 2.6445

ctrraven:gender 1 2.5 2.5 0.0905

social:gender 8 275.2 34.4 1.2605

ctrraven:social:gender 8 187.2 23.4 0.8573

Joint Schools Project

The p-values associated with the F-statistics are approximate and can be too small when the number of cases is small. However, this data set is probably large enough to overcome this limitation. The parametric bootstrap can be used if this is a concern.

> nrow(jsp.y2) - sum(anova(jsp1.lme)[,1]) - 1

[1] 917

> round(pf(anova(jsp1.lme)[,4], anova(jsp1.lme)[,1], 917,

lower.tail=FALSE),4)

[1] 0.0000 0.0043 0.3738 0.0072 0.7636 0.2607 0.5524
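If the approximation is a concern, one version of the parametric bootstrap for the gender terms might look like the following sketch (models refit by maximum likelihood; assumes a current version of lme4; not run in the original slides):

> full <- lmer(math ~ ctrraven*social*gender + (1 | school) +
    (1 | school:class), REML=FALSE, data=jsp.y2)
> null <- lmer(math ~ ctrraven*social + (1 | school) +
    (1 | school:class), REML=FALSE, data=jsp.y2)
> lrt.obs <- as.numeric(2*(logLik(full) - logLik(null)))
> lrt.sim <- replicate(500, {
    y <- unlist(simulate(null))   # responses generated under the null model
    as.numeric(2*(logLik(refit(full, y)) - logLik(refit(null, y))))
  })
> mean(lrt.sim >= lrt.obs)        # bootstrap p-value for all terms involving gender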

Joint Schools Project

> #Remove Gender

> jsp2.lme<-lmer(math ~ ctrraven*social + (1 | school) +

(1 | school:class), data=jsp.y2)

> qqnorm(resid(jsp2.lme),main="")

> plot(resid(jsp2.lme) ~ fitted(jsp2.lme), xlab="Fitted",

ylab="Residuals")

> qqnorm(ranef(jsp2.lme)$school[[1]],

main="School Effects")

> qqnorm(ranef(jsp2.lme)$"school:class"[[1]],

main="Class Effects")

Joint Schools Project

[Normal Q-Q plot of resid(jsp2.lme); x-axis "Theoretical Quantiles" (−3 to 3), y-axis "Sample Quantiles" (−20 to 15).]

Still not much of concern.

Joint Schools Project

[Plot of residuals against fitted values for jsp2.lme; residuals range roughly from −20 to 15, fitted values roughly 15 to 40.]

Non-constant variance is still a problem.

Joint Schools Project

[Normal Q-Q plot titled "School Effects"; x-axis "Theoretical Quantiles" (−2 to 2), y-axis "Sample Quantiles" (−1.5 to 1.0).]

Joint Schools Project

[Normal Q-Q plot titled "Class Effects"; x-axis "Theoretical Quantiles" (−2 to 2), y-axis "Sample Quantiles" (−1.5 to 1.0).]

Joint Schools Project

> anova(jsp2.lme)

Analysis of Variance Table

Df Sum Sq Mean Sq F value

ctrraven 1 10159.7 10159.7 374.3263

social 8 609.5 76.2 2.8073

ctrraven:social 8 564.6 70.6 2.6005

> nrow(jsp.y2) - sum(anova(jsp2.lme)[,1]) - 1

[1] 935

> round(pf(anova(jsp2.lme)[,4], anova(jsp2.lme)[,1], 935,

lower.tail=FALSE),4)

[1] 0.0000 0.0045 0.0082

Joint Schools Project

> sch.effects<-ranef(jsp2.lme)$school[[1]]

> summary(sch.effects)

Min. 1st Qu. Median Mean 3rd Qu. Max.

-2.45200 -0.69480 0.01107 0.00000 0.71840 2.44900

> raw.sch.effects<-coef(lm(math ~ school-1,jsp.y2))

> raw.sch.effects<-raw.sch.effects-mean(raw.sch.effects)

> plot(raw.sch.effects,sch.effects)

> sint<-c(9,14,29)

> text(raw.sch.effects[sint],sch.effects[sint]+0.2,

c("9","15","30"))

Joint Schools Project

[Scatter plot of sch.effects against raw.sch.effects; raw school effects range roughly from −6 to 4, shrunken effects roughly −2 to 2; schools 9, 15, and 30 are labeled.]
