©CSCAR, 2010: Proc Mixed

Introduction to SAS® Proc Mixed

CSCAR Workshop
May 19 & 21, 2010

Kathy Welch, Instructor
kwelch@umich.edu

Workshop Goals

• Introduce the Linear Mixed Model (LMM) and some key concepts (theory)
• Learn what types of data are appropriate for a LMM analysis
• Learn how to set up data for analysis using SAS Proc Mixed
• Learn how to set up Proc Mixed syntax to fit LMMs for different types of data
• Interpret output from Proc Mixed
• Get diagnostic plots from Proc Mixed to check LMM assumptions


Lab Examples

1. Randomized block design

2. Two-level clustered data

3. Three-level clustered data

4. Repeated measures data

5. Longitudinal data

What is a Linear Mixed Model (LMM)?

• A parametric linear model for a normally distributed response, appropriate for non-independent data
  – Clustered data
    • Responses for units in the same cluster may be correlated
  – Repeated Measures / Longitudinal data
    • Residuals for the same subject may be correlated
• Differs from ordinary linear regression and ANOVA, where we assume independent observations

What is a Linear Mixed Model? (Cont)

• Predictors may be
  – Fixed
  – Random
• Fixed predictors have fixed-effects parameters and specify the mean structure
• Random effects are associated with individual subjects or clusters and determine the covariance structure
• Variances and covariances can differ by group
• Differs from the general linear model, where we assume constant variance

Data Appropriate for a LMM Analysis

• Clustered data
  – Subjects are nested in clusters, such as classrooms, families, litters, neighborhoods
• Repeated Measures data
  – Multiple measurements for the same subjects over time or another dimension
• Longitudinal data
  – Multiple measures for the same subject over a long period of time
• Observations for the same cluster/subject are likely to be correlated (non-independent)

General Linear Model vs Linear Mixed Model

General Linear Model (independent observations with constant variance):

$$Y_i = \underbrace{X_i \beta}_{\text{fixed}} + \underbrace{\varepsilon_i}_{\text{random}}, \qquad \varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$$

Linear Mixed Model (the variance-covariance matrices capture non-independence):

$$Y_i = \underbrace{X_i \beta}_{\text{fixed}} + \underbrace{Z_i u_i + \varepsilon_i}_{\text{random}}, \qquad u_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, R_i)$$

• The parameters in a GLM are β and σ²
• The parameters in a LMM are β and the variances and covariances in D and Ri

Linear Mixed Model for the ith Subject

$$Y_i = \underbrace{X_i \beta}_{\text{fixed}} + \underbrace{Z_i u_i + \varepsilon_i}_{\text{random}}$$

$$u_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, R_i)$$

• where β are fixed-effects parameters
• ui are random variables, with a normal distribution and variance-covariance matrix D
• εi are random residuals, with a normal distribution and variance-covariance matrix Ri
• ui and εi are independent

Yi : Response for Subject i

• Yi is a vector of responses for the ith cluster/subject
• The ni responses for cluster/subject i are set up in long format, with each response on a separate row of data, along with all of the covariates (predictors) for that response
• Each subject/cluster may have a different number of responses
• Yi is approximately normally distributed (strictly, it is the residuals that must be normal)
• If residuals are not normally distributed, we may consider a transformation to improve normality

Yi for Clustered Data

• There are ni units within cluster i
  – Response measured once for each unit in the cluster
  – Number of units per cluster can vary
  – Some clusters may have only one unit
• Example 1
  – Mathgain score measured once for each sampled student in 130 classrooms
  – Number of sampled students per classroom can vary
  – What is the cluster?
  – What is the unit?

Clustered Data Examples

• Example 2
  – First, schools are sampled
  – Next, classrooms within schools are sampled
  – Finally, students within classrooms are sampled
  – Mathgain score for each student is measured
  – What is the cluster?
  – What is the unit?
• Example 3
  – Birth weights of litters of rat pups are measured
  – What is the cluster?
  – What is the unit?
• More examples?

Clustered Data Table

Y1 : response vector for the 3 students in classid 160

Yi for Repeated Measures Data

• The classic repeated measures design is multiple measures of a response over time
  – Multiple measures for the same subject
  – ni measures of the response for subject i
  – ni can vary across subjects
• Data can vary over time, space, or other dimension/s
• Example 1
  – Insulin levels at 1 min, 5 min, 20 min, and 60 min after injection of a drug for diabetes
  – What is the subject?
  – What is the repeated measures factor?

Repeated Measures Data Examples

• Example 2
  – Chemical concentration in 5 different brain regions in rats
  – What is the response?
  – What is the subject?
  – What is the repeated measures factor?
• More examples?

Repeated Measures Data Table

Y1 : response vector for animal R111097

Yi for Longitudinal Data

• Longitudinal data are measures made on the same subject over a longer period of time
  – ni measures of the response for a given subject
  – Number per subject can vary due to attrition
• Example 1
  – Socialization score for autistic children measured at ages 2, 3, 5, 9, and 13
  – What is the subject?
  – What is the time frame?
  – Do you think attrition will be a problem?
• Other examples?

Longitudinal Data Table

Y1 : response vector for subject 1

Fixed Predictors

• Fixed predictors can be categorical (factors) or continuous
• For factors, all levels of interest are included
  – Treatment
  – Gender
• Levels of fixed factors can be defined to represent contrasts of interest
  – High Dose vs. Control, Medium Dose vs. Control
  – Female vs. Male
• Fixed continuous predictors can be included as linear, quadratic or other terms

Fixed Predictor Examples

• Age, Age²
• Income
• Gender
• Drug Treatment
• Region
• Examples from your work…

Xi : Design Matrix for Fixed Effects

• Xi contains values of the fixed predictor variables (X variables) for subject i
  – e.g. Age, Sex, Treatment
• The X matrix can include continuous and/or indicator variables for categorical predictors
• Xi has one row for each of the ni observations for the ith subject, and one column for each of the predictors
  – We implicitly include an intercept (a column of ones) for most models
  – We do not need to have an intercept variable in the data

X Matrix Example

• The X matrix is formed from variables in the dataset.
• We usually don’t include a variable for the intercept in the dataset.
• The intercept is included in the model by default by Proc Mixed.
• The X matrix variables must be present for each row of data for that observation to be included in the analysis.

Figure: X Matrix; X Matrix for subject 1

β : Fixed Effects Parameters

• β are fixed-effects parameters or regression coefficients
  – unknown fixed quantities
• β describe how the mean of Y depends on the predictor variables for an entire population or subpopulation of subjects
• The value of β does not vary across individual subjects
• We usually include an intercept (β0) as one of the components of β

Random Factors

• Random Factor: A classification variable
• Random factors do not represent conditions chosen to meet the needs of the study, but arise from sampling a larger population
• Variation in the dependent variable across levels of a random factor can be estimated and assessed
• Results can be generalized to the greater population
• A random factor may have different random effects associated with it (e.g. random intercept and random slope)

Random Factor Examples

• Clustered Data:
  – Classroom
  – Hospital
  – Neighborhood
  – More examples…
• Repeated Measures/Longitudinal Data:
  – Person (subject)
  – More examples…
• The random factor is the cluster in clustered data, and the subject in repeated measures/longitudinal data

ui : Random Effects

• ui are unobserved random variables (not parameters)
• ui are specific to the ith subject (random factor) in a LMM
• ui vary across clusters/subjects
• ui are random deviations in the relationships described by fixed effects
• ui are assumed to have a normal distribution
  – mean = 0 and variance-covariance matrix D
• The parameters associated with the ui are the variances and covariances of these random variables

Normal Distribution Refresher

• E(Y) = μ = Mean, or Expected value
  – Balance point; center of a symmetric distribution
• -∞ < Y < ∞
• Var(Y) = σ² = Variance
  – Measure of variability, or spread
  – σ² must be 0 or positive (σ² ≥ 0)
• σ is the standard deviation

Normal Distribution Refresher II

Figure panels: same μ, different σ; same σ, different μ

Covariance Refresher

• Covariance (denoted by σY1,Y2) is a measure of how much two random variables change together.
• It can be positive or negative.
• Jointly normal random variables with zero covariance are independent.

Variance-Covariance Matrix

• A variance-covariance matrix contains the variances of, and covariances between, a set of random variables.
• The dimension of the var-covar matrix is the same as the number of random variables.
  – If there is only one r.v., the dimension would be 1 by 1 (just a single value).
  – If there are two r.v.s, the matrix would be 2 by 2 (or 2x2).
• The variances and covariances together are called the covariance parameters.

The D matrix

• D is the variance-covariance matrix for the random effects (ui) for subject i
• D contains the covariance parameters for the random effects
• The dimension of D is based on the number of random effects per subject, not the number of observations per subject
  – If there is one random effect (e.g., a random intercept) per subject, D would be 1x1
  – If there are two random effects (e.g., a random intercept and random slope) per subject, D would be 2x2

Form of the D Matrix

• Variances of random effects are on the diagonal
• Covariances between different random effects within the same subject are on the off-diagonal
• Symmetric, positive-definite matrix
  – Variances must all be positive
• SAS calls this G.
  – We use D and G interchangeably

$$
D = \mathrm{Var}(u_i) =
\begin{pmatrix}
\mathrm{Var}(u_{1i}) & \mathrm{cov}(u_{1i}, u_{2i}) & \cdots & \mathrm{cov}(u_{1i}, u_{qi}) \\
\mathrm{cov}(u_{1i}, u_{2i}) & \mathrm{Var}(u_{2i}) & \cdots & \mathrm{cov}(u_{2i}, u_{qi}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(u_{1i}, u_{qi}) & \mathrm{cov}(u_{2i}, u_{qi}) & \cdots & \mathrm{Var}(u_{qi})
\end{pmatrix}
$$

Two Common Structures for D

• Variance components, type=vc (Independent)

$$D_{vc} = \begin{pmatrix} \sigma^2_{u1} & 0 \\ 0 & \sigma^2_{u2} \end{pmatrix}$$

• Unstructured, type=un

$$D_{un} = \begin{pmatrix} \sigma^2_{u1} & \sigma_{u1,u2} \\ \sigma_{u1,u2} & \sigma^2_{u2} \end{pmatrix}$$

Although there are many other possible structures for D, one of these two structures is almost always used.
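As a rough sketch of how these two structures might be requested, the type= option on the random statement selects the structure for D, and the g option prints the estimated matrix. The data set name (long) and the variables (y, time, id) are hypothetical and are not part of the workshop data; time is assumed to be continuous.

```sas
/* Hypothetical data set and variable names: long, y, time, id */

/* Variance components: random intercept and slope are assumed independent */
proc mixed data = long;
  class id;
  model y = time / solution;
  random intercept time / subject = id type = vc g;
run;

/* Unstructured: also estimates the intercept-slope covariance */
proc mixed data = long;
  class id;
  model y = time / solution;
  random intercept time / subject = id type = un g;
run;
```

The only difference between the two fits is type=vc versus type=un, so the unstructured model estimates one additional covariance parameter.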

Covariance Matrices

• How would you fill these in, if you had a random intercept and random slope per subject and 10 obs per subject?

Scenario A: σ²intercepts = .282, σ²slopes = .10

$$D_{vc} = \begin{pmatrix} \_\_\_\_\_ & \_\_\_\_\_ \\ \_\_\_\_\_ & \_\_\_\_\_ \end{pmatrix}$$

Scenario B: σ²intercepts = .282, σ²slopes = .10, σintercepts,slopes = -.01

$$D_{un} = \begin{pmatrix} \_\_\_\_\_ & \_\_\_\_\_ \\ \_\_\_\_\_ & \_\_\_\_\_ \end{pmatrix}$$
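One possible way to fill in the blanks, offered as a sketch: it assumes Scenario A corresponds to the variance components structure and Scenario B to the unstructured one (because only Scenario B supplies a covariance), and that the random intercept is listed before the random slope.

```latex
D_{vc} = \begin{pmatrix} 0.282 & 0 \\ 0 & 0.10 \end{pmatrix},
\qquad
D_{un} = \begin{pmatrix} 0.282 & -0.01 \\ -0.01 & 0.10 \end{pmatrix}
```

Both matrices are 2 x 2 because there are two random effects per subject. The 10 observations per subject determine the dimension of Ri (10 x 10), not the dimension of D.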

Zi : Design Matrix for Random Effects

• The Zi matrix can include both continuous and indicator variables for subject i
• Zi has one row for each observation for the ith subject
• Number of columns in Zi depends on the number of random effects in the model
• We often include a random intercept for each subject
  – In a model with one random intercept per subject, Zi would have one column per subject
  – In a model with a random intercept and random slope for each subject, Zi would have two columns per subject

Z Matrix Example

Figure: Z Matrix for Subject 1; Z Matrix for Subject 3; Z Matrix for Subject 4

We don’t include variables for Z in our dataset. Note that Zi has all zero values for other subjects.


Random Residuals: εi

• The εi vector contains the residuals for the ith subject

• There is one value of εi for each observation for the ith subject

• We assume that the εi are normally distributed, with mean = 0 and variance-covariance matrix, Ri

• There are a large number of possible structures for Ri, some of which we will examine later

• For example, we can allow the variances of the residuals at different time points to differ.

The Ri matrix

• Ri contains the variances and covariances of residuals for the same subject
  – residual covariance parameters
• The dimension of Ri depends on the number of observations (ni) for subject i.
  – For a subject with 5 repeated measures, the Ri matrix would be 5 x 5.
  – For a subject with only one measure, the Ri matrix would be 1 x 1.
• The default assumption in Proc Mixed for the Ri matrix is that the variance of all residuals is the same and that the covariances are all zero.

Form of the R Matrix

• The diagonal elements are variances of residuals for the same subject
• The off-diagonal elements are covariances between two residuals for the same subject
• Symmetric, positive-definite matrix
  – The variances must all be > 0

$$
R_i = \mathrm{Var}(\varepsilon_i) =
\begin{pmatrix}
\mathrm{Var}(\varepsilon_{1i}) & \mathrm{cov}(\varepsilon_{1i}, \varepsilon_{2i}) & \cdots & \mathrm{cov}(\varepsilon_{1i}, \varepsilon_{n_i i}) \\
\mathrm{cov}(\varepsilon_{1i}, \varepsilon_{2i}) & \mathrm{Var}(\varepsilon_{2i}) & \cdots & \mathrm{cov}(\varepsilon_{2i}, \varepsilon_{n_i i}) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(\varepsilon_{1i}, \varepsilon_{n_i i}) & \mathrm{cov}(\varepsilon_{2i}, \varepsilon_{n_i i}) & \cdots & \mathrm{Var}(\varepsilon_{n_i i})
\end{pmatrix}
$$

Form of the R Matrix II

• Proc Mixed has many possibilities for the structure of the R matrix
• We will discuss some of these later in the workshop.
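As an illustrative sketch only, a structure for Ri is requested with the type= option on the repeated statement. The data set name (long) and the variables (y, time, id) are hypothetical. Here time is treated as a class variable so that it indexes the repeated measures, and type=un is just one of the available structures.

```sas
/* Hypothetical data set and variable names: long, y, time, id */
proc mixed data = long;
  class id time;
  model y = time / solution;
  /* type=un lets each time point have its own residual variance  */
  /* and each pair of time points its own covariance;             */
  /* the r option prints the estimated R matrix for the first subject */
  repeated time / subject = id type = un r;
run;
```

An unstructured Ri places no constraints on the residual variances and covariances, at the cost of estimating many parameters when there are many time points.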

Covariance Parameters

• We estimate a set of covariance parameters θ, which are the variances and covariances for the D and R matrices
  – For D we estimate θD (variances and covariances of the random effects)
  – For R we estimate θR (variances and covariances of the random residuals)
• The number of covariance parameters that we estimate depends on the number of random effects, and the structure we specify for D and R

Covariance Summary

• We use Proc Mixed to estimate the variance of random effects, and the covariance between pairs of random effects in a LMM.
• We also use Proc Mixed to estimate the variances and covariances of the random residuals in a LMM.
• We assume that the random effects and the random residuals are independent.

Implied Marginal Model

• A LMM uses random effects explicitly to model between-subject variance
  – Subject-specific model
  – Includes D matrix and R matrix
• Implied marginal model
  – Marginal model that results from fitting a LMM
  – The marginal variance-covariance matrix is called V
  – V is derived from D and R
• We can get the distribution of the population mean using the implied marginal model

Implied Marginal Distribution of Yi Based on a LMM

$$
\begin{aligned}
Y_i &= X_i \beta + Z_i u_i + \varepsilon_i \\
\text{Mean of } Y_i = E(Y_i) &= E(X_i \beta + Z_i u_i + \varepsilon_i) = X_i \beta \\
\mathrm{Var}(Y_i) = V_i &= \mathrm{Var}(X_i \beta + Z_i u_i + \varepsilon_i) \\
&= \mathrm{Var}(Z_i u_i) + \mathrm{Var}(\varepsilon_i) = Z_i \,\mathrm{Var}(u_i)\, Z_i' + \mathrm{Var}(\varepsilon_i) \\
&= Z_i D Z_i' + R_i
\end{aligned}
$$

Vi, the marginal variance-covariance matrix, is derived from D and Ri
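To make the formula for Vi concrete, here is a small worked sketch under assumed conditions that are not taken from the slides: one subject with two observations at times 0 and 1, a random intercept and a random slope for time, an unstructured D with elements σ²u1, σ²u2 and σu1,u2, and the default Ri = σ²I.

```latex
Z_i = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix},
\quad
D = \begin{pmatrix} \sigma^2_{u1} & \sigma_{u1,u2} \\ \sigma_{u1,u2} & \sigma^2_{u2} \end{pmatrix},
\quad
R_i = \sigma^2 I_2

V_i = Z_i D Z_i' + R_i =
\begin{pmatrix}
\sigma^2_{u1} + \sigma^2 & \sigma^2_{u1} + \sigma_{u1,u2} \\
\sigma^2_{u1} + \sigma_{u1,u2} & \sigma^2_{u1} + 2\sigma_{u1,u2} + \sigma^2_{u2} + \sigma^2
\end{pmatrix}
```

Even though Ri is diagonal here, the off-diagonal element of Vi is nonzero: the shared random effects are what induce correlation between observations on the same subject.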

Proc Mixed in SAS

• Is an appropriate tool to fit models for clustered or repeated measures / longitudinal data
• Allows users to fit LMMs with both fixed and random effects
• Accommodates models with a wide variety of correlation (covariance) structures
• Can be used when there are unequal numbers of observations per cluster/subject
• Can be used when there are unequal variances for different subgroups of observations
• Has a rich array of graphical and analytic tools to assess the fit of LMMs

Data Structure for Proc Mixed

• We structure the data in “long” form, so multiple observations for the same subject/cluster are on separate rows of data
• Each row contains all information specific to the cluster or subject
  – Some variables vary across clusters/subjects but are constant within a given cluster/subject; they will be repeated for all rows for the same subject/cluster
    • In repeated measures, these are time-invariant
  – Some variables change for different subjects within a cluster
    • In repeated measures, these are time-varying
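If the data start out in "wide" form (one row per subject), a data step along the following lines is one way to restructure them. This is only a sketch: the data set names (wide, long), the variables (id, sex, y1-y3, time, y) and the two data rows are made up for illustration.

```sas
/* Hypothetical wide data: one row per subject, three responses y1-y3 */
data wide;
  input id sex $ y1 y2 y3;
  datalines;
1 F 8.9 9.1 9.1
2 M 9.3 9.4 9.7
;
run;

/* Long form: one row per response */
data long;
  set wide;
  array ys{3} y1-y3;
  do time = 1 to 3;
    y = ys{time};   /* response at this time point (time-varying) */
    output;         /* write one row per time point               */
  end;
  keep id sex time y;  /* sex is time-invariant, so it repeats on every row */
run;
```

Each subject contributes three rows to long; id and sex are repeated on all three, while time and y change from row to row.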

Randomized Block Design

• A block is a group of relatively homogeneous experimental units
• The use of blocks reduces variability (within-block variability should be low, between-block variability can be high)
• Individual blocks are independent
• Observations within a block are correlated
• Blocks are usually random factors
  – They represent a random sample from a population
  – We wish to make inferences to the population, not to the individual blocks
• Examples of blocks include batches, machines, plots, mice, people, clinics, and bananas (the next example)

Lab Example 1

Randomized Block Design: Banana Data

(Hypothetical) Banana Example

• Purpose: To compare shelf life of bananas, when treated with three different food preservatives (A, B, C)
• Experimental material: 5 bananas
• Experimental design, bananas are blocks:
  – Cut each banana into three pieces.
  – Randomly assign one of the three preservatives to each piece

                        Treatment
  Banana (Block)      A       B       C
        1            8.9     9.1     9.1
        2            9.3     9.4     9.7
        3            9.4     9.3     9.6
        4            9.6     9.8    10.0
        5           10.0     9.9    10.2

Fixed and Random Factors

• Treatment is a fixed factor; contrasts between treatments are of interest
• Bananas are a random sample from a population of bananas
  – We want conclusions for the study to apply to the whole population of bananas, not just these particular bananas
  – Banana is a random factor that will have random effects
• We will fit a LMM with fixed effects for treatment and a random effect for each banana

Model for Banana Data

$$Y_{ti} = \beta_0 + \beta_t \,\mathrm{Treat}_{ti} + b_i + \varepsilon_{ti}$$

$$b_i \sim N(0, \sigma_b^2), \qquad \varepsilon_{ti} \sim N(0, \sigma^2)$$

• where
  – Yti = shelf life of banana i, treated with preservative t
  – β0 = intercept
  – βt = fixed effect of treatment t, t = 1, 2, 3
  – bi = random effect (intercept) for banana i
  – εti = residual for banana i, treatment t
• We estimate five parameters: β0, β1, β2, σb², σ²
  – β3 = 0 (set-to-zero restriction)

Note: The D matrix is 1 by 1, because we have only 1 random effect per banana

D matrix for Banana Data

$$D = \mathrm{Var}(b_i) = \sigma_b^2, \qquad b_i \sim N(0, \sigma_b^2)$$

• The bi are random intercepts, one for each banana
• σb² is the variance of the random banana intercepts, and captures the between-banana variance
• In this case (1 random effect per subject), D is 1 x 1
• σb² is the only random effects parameter we need to estimate
• The covariance of observations on the same banana depends on the variance of the random effects, σb²

Ri Matrix for Banana Data

• There are 3 observations per banana
• The Ri matrix will be a 3 x 3 matrix for each banana
• We assume the default structure (σ²I) for Ri
  – The residual variance is constant: Var(εti) = σ²

$$
R_i = \mathrm{Var}(\varepsilon_i) = \sigma^2 I
= \sigma^2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{pmatrix}
$$

$$\varepsilon_i \sim N(0, \sigma^2 I)$$

Data (long form) and Model for each Banana

     y    treat   banana     Model
    8.9     A       1        y11 = β0 + β1 + b1 + ε11
    9.1     B       1        y21 = β0 + β2 + b1 + ε21
    9.1     C       1        y31 = β0 + β3 + b1 + ε31
    9.3     A       2        y12 = β0 + β1 + b2 + ε12
    9.4     B       2        y22 = β0 + β2 + b2 + ε22
    9.7     C       2        y32 = β0 + β3 + b2 + ε32
     …      …       …        yti = β0 + βt + bi + εti
   10.0     A       5        y15 = β0 + β1 + b5 + ε15
    9.9     B       5        y25 = β0 + β2 + b5 + ε25
   10.2     C       5        y35 = β0 + β3 + b5 + ε35

β = treat --- β1 β2 β3 (Fixed Effects)
b = banana --- b1 b2 b3 b4 b5 (Random Effects)

Example 1: SAS Code

    data fruit;
      input shelf treat $ banana;
      cards;
     8.9 A 1
     9.1 B 1
     9.1 C 1
     9.3 A 2
     9.4 B 2
     9.7 C 2
     9.4 A 3
     9.3 B 3
     9.6 C 3
     9.6 A 4
     9.8 B 4
    10.0 C 4
    10.0 A 5
     9.9 B 5
    10.2 C 5
    ;
    proc mixed data = fruit;
      class treat banana;
      model shelf = treat / solution;
      random banana;
    run;

Proc Mixed Syntax

• Class statement sets up categorical factors for both fixed and random effects
• Model statement specifies the fixed factors in the model
• Random statement specifies the random factors to be included in the model, and specifies the structure for the D matrix (called G matrix by SAS)
• Repeated statement specifies the structure of the R matrix of residual variances and covariances

Example 1: Proc Mixed Syntax

    proc mixed data = fruit;
      class treat banana;
      model shelf = treat / solution;
      random banana;
    run;

Note: Proc Mixed will automatically include a dummy variable for each level of a class variable. The highest level of the class variable is given a coefficient of 0 for the dummy variable by default. This makes the highest level the reference.

Ex 1: Proc Mixed Output Part 1

    The Mixed Procedure

    Model Information

    Data Set                     WORK.FRUIT
    Dependent Variable           shelf
    Covariance Structure         Variance Components
    Estimation Method            REML
    Residual Variance Method     Profile
    Fixed Effects SE Method      Model-Based
    Degrees of Freedom Method    Containment

    Class Level Information

    Class     Levels    Values
    treat          3    A B C
    banana         5    1 2 3 4 5

Ex 1: Proc Mixed Output (Cont)

    Dimensions

    Covariance Parameters        2
    Columns in X                 4
    Columns in Z                 5
    Subjects                     1
    Max Obs Per Subject         15

    Number of Observations

    Number of Observations Read        15
    Number of Observations Used        15
    Number of Observations Not Used     0

    Iteration History

    Iteration   Evaluations   -2 Res Log Like    Criterion
        0            1          16.24999675
        1            1          -2.40852048      0.00000000

    Convergence criteria met.

Note: SAS assumes all obs are for the same subject, because we did not specify a subject in the random statement
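As a sketch of an alternative specification (not shown in the workshop slides), the subject can be named explicitly with the subject= option. A random intercept for each banana is the same model as `random banana;`, so the estimates should agree, but SAS then treats each banana as a separate subject.

```sas
/* Same random-intercept model for the fruit data, with an explicit subject */
proc mixed data = fruit;
  class treat banana;
  model shelf = treat / solution;
  random intercept / subject = banana;
run;
```

With this form, the Dimensions table would be expected to report 5 subjects with at most 3 observations per subject, rather than 1 subject with 15 observations.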

Ex 1: Proc Mixed Output (Cont)

    Covariance Parameter Estimates

    Cov Parm     Estimate
    banana         0.1430
    Residual     0.008667

    Fit Statistics

    -2 Res Log Likelihood          -2.4
    AIC (smaller is better)         1.6
    AICC (smaller is better)        2.9
    BIC (smaller is better)         0.8

    Solution for Fixed Effects

                                   Standard
    Effect       treat   Estimate     Error    DF   t Value   Pr > |t|
    Intercept              9.7200    0.1742     4     55.81     <.0001
    treat        A        -0.2800   0.05888     8     -4.76     0.0014
    treat        B        -0.2200   0.05888     8     -3.74     0.0057
    treat        C              0         .     .         .          .

    Type 3 Tests of Fixed Effects

               Num    Den
    Effect      DF     DF    F Value    Pr > F
    treat        2      8      12.54    0.0034

• banana (0.1430): estimated between-banana variance, σb²
• Residual (0.008667): estimated within-banana variance, σ²
• Intercept: estimated mean for Treat C
• treat A: estimated diff in mean of Treat A vs. C
• treat B: estimated diff in mean of Treat B vs. C
• Type 3 test for treat: significance of the overall Treat effect

Ex 1: Interpreting Fixed Effects Estimates

• The estimated effect of each treatment represents a contrast between that level of treatment and the last level
• The intercept represents the estimated mean for the last level of treatment
  – The estimated shelf life for treatment C = 9.72 days
• The effect of treatment A = -0.28
  – Treatment A reduces shelf life by 0.28 days, as compared to treatment C (the reference group), p = 0.0014
• The effect of treatment B = -0.22
  – Treatment B reduces shelf life by 0.22 days, as compared to treatment C, p = 0.0057


Ex 1: Fixed Effects Estimates (Cont)

• We can substitute the estimated fixed effects parameters into the model equation to get the predicted mean for each treatment

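$$
\begin{aligned}
\hat{\mu}_A &= \hat{\beta}_0 + \hat{\beta}_1 = 9.72 - .28 = 9.44 \\
\hat{\mu}_B &= \hat{\beta}_0 + \hat{\beta}_2 = 9.72 - .22 = 9.50 \\
\hat{\mu}_C &= \; ?
\end{aligned}
$$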


Ex 1: Covariance Parameter Estimates

• There are two covariance parameters in this model, the estimated between-banana variance:

$$\hat{D} = \hat{\sigma}_b^2 = 0.1430$$

• And the estimated within-banana, or residual variance:

$$\hat{\sigma}^2 = 0.008667$$


Ex 1: Estimated G Matrix

• We can view the G matrix (what we call the D matrix) by using the g option:

random banana / g;

Estimated G Matrix
  Row  Effect  banana    Col1    Col2    Col3    Col4    Col5
    1  banana       1  0.1430
    2  banana       2          0.1430
    3  banana       3                  0.1430
    4  banana       4                          0.1430
    5  banana       5                                  0.1430

The D matrix for each banana is 1x1


Ex 1: Estimated Ri Matrix

• We can view the estimated 3 x 3 Ri matrix for the three observations for the first banana by adding a repeated statement:

repeated / subject=banana r;

Estimated R Matrix for Banana 1

  Row      Col1      Col2      Col3
    1  0.008667
    2            0.008667
    3                      0.008667


Ex 1: Estimated V matrix

• We can view the estimated V matrix of marginal variances and covariances for all bananas by using the v option in the random statement (note V is block-diagonal, with obs from different bananas being indep.):

random banana / v;

Estimated V Matrix for Subject 1 (blank cells are zero)
 Row   Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9  Col10  Col11  Col12  Col13  Col14  Col15
   1  0.152  0.143  0.143
   2  0.143  0.152  0.143
   3  0.143  0.143  0.152
   4                       0.152  0.143  0.143
   5                       0.143  0.152  0.143
   6                       0.143  0.143  0.152
   7                                            0.152  0.143  0.143
   8                                            0.143  0.152  0.143
   9                                            0.143  0.143  0.152
  10                                                                 0.152  0.143  0.143
  11                                                                 0.143  0.152  0.143
  12                                                                 0.143  0.143  0.152
  13                                                                                      0.152  0.143  0.143
  14                                                                                      0.143  0.152  0.143
  15                                                                                      0.143  0.143  0.152


Ex 1: Calculation of V matrix from D and R

• The V matrix is derived from the D and R matrices:

$$Var(Y_i) = V_i = Z_i D Z_i' + R_i$$

• The covariance, and hence correlation, among observations within the same banana is due to the between-banana variation

• If there is zero between-banana variation, there is no correlation among obs for the same banana


Ex 1: Calculation of V Matrix (Cont)

• We first illustrate these calculations for the ith banana

• We then show how these calculations work for the entire data set, assuming we have only two bananas in the study

• This can then be generalized to any number of bananas

• We will then have tools to help understand more complicated models


Ex 1: Step 1 of Calculation of V Matrix for the ith banana

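$$\text{Est. } Var(Y_i) = \hat{V}_i = Z_i \hat{D} Z_i' + \hat{R}_i$$

$$
Z_i \hat{D} Z_i' =
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
(.143)
\begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
=
\begin{pmatrix}
.143 & .143 & .143 \\
.143 & .143 & .143 \\
.143 & .143 & .143
\end{pmatrix}
$$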


Ex 1: Step 2 of Calculation of V Matrix

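$$
\hat{R}_i = \hat{\sigma}^2 I = .008667
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix}
.0087 & 0 & 0 \\
0 & .0087 & 0 \\
0 & 0 & .0087
\end{pmatrix}
$$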


Ex 1: Step 3 of Calculation of V Matrix

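$$
\hat{V}_i = Z_i \hat{D} Z_i' + \hat{R}_i
=
\begin{pmatrix}
.1430 & .1430 & .1430 \\
.1430 & .1430 & .1430 \\
.1430 & .1430 & .1430
\end{pmatrix}
+
\begin{pmatrix}
.0087 & 0 & 0 \\
0 & .0087 & 0 \\
0 & 0 & .0087
\end{pmatrix}
$$

$$
=
\begin{pmatrix}
\hat{\sigma}_b^2 + \hat{\sigma}^2 & \hat{\sigma}_b^2 & \hat{\sigma}_b^2 \\
\hat{\sigma}_b^2 & \hat{\sigma}_b^2 + \hat{\sigma}^2 & \hat{\sigma}_b^2 \\
\hat{\sigma}_b^2 & \hat{\sigma}_b^2 & \hat{\sigma}_b^2 + \hat{\sigma}^2
\end{pmatrix}
=
\begin{pmatrix}
.1517 & .1430 & .1430 \\
.1430 & .1517 & .1430 \\
.1430 & .1430 & .1517
\end{pmatrix}
$$

• Off-diagonal entries: covariance of obs on same banana is the between-banana variance

• Diagonal entries: marginal variance of each obs is the between-banana variance plus within-banana variance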
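The three steps above can be checked numerically. The following is a minimal sketch in plain Python (not SAS, and not part of the workshop materials) that rebuilds the 3 x 3 marginal matrix for one banana from the two covariance parameter estimates printed by Proc Mixed; the helper names `matmul` and `transpose` are illustrative only.

```python
# Sketch: V_i = Z_i D Z_i' + R_i for one banana with three observations.
sigma2_b = 0.1430     # estimated between-banana variance (Proc Mixed output)
sigma2 = 0.008667     # estimated within-banana (residual) variance

n_i = 3
Z = [[1.0] for _ in range(n_i)]   # n_i x 1 design column for the random intercept
D = [[sigma2_b]]                  # 1 x 1 D matrix

def matmul(A, B):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

ZDZt = matmul(matmul(Z, D), transpose(Z))                        # step 1
R = [[sigma2 if i == j else 0.0 for j in range(n_i)]
     for i in range(n_i)]                                        # step 2
V = [[ZDZt[i][j] + R[i][j] for j in range(n_i)]
     for i in range(n_i)]                                        # step 3

for row in V:
    print(["%.4f" % v for v in row])
```

The diagonal entries come out as the sum of the two variance components and every off-diagonal entry equals the between-banana variance, which is the pattern shown in the matrices above.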


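Ex 1: Calculation of V Matrix for two bananas

$$
\hat{R} = \hat{\sigma}^2 I = .008667
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
=
\begin{pmatrix}
.0087 & 0 & 0 & 0 & 0 & 0 \\
0 & .0087 & 0 & 0 & 0 & 0 \\
0 & 0 & .0087 & 0 & 0 & 0 \\
0 & 0 & 0 & .0087 & 0 & 0 \\
0 & 0 & 0 & 0 & .0087 & 0 \\
0 & 0 & 0 & 0 & 0 & .0087
\end{pmatrix}
$$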


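Ex 1: Calculation of V Matrix for two bananas (Cont)

$$
Z \hat{G} Z' =
\begin{pmatrix}
.143 & .143 & .143 & 0 & 0 & 0 \\
.143 & .143 & .143 & 0 & 0 & 0 \\
.143 & .143 & .143 & 0 & 0 & 0 \\
0 & 0 & 0 & .143 & .143 & .143 \\
0 & 0 & 0 & .143 & .143 & .143 \\
0 & 0 & 0 & .143 & .143 & .143
\end{pmatrix}
\qquad
\hat{R} =
\begin{pmatrix}
.0087 & 0 & 0 & 0 & 0 & 0 \\
0 & .0087 & 0 & 0 & 0 & 0 \\
0 & 0 & .0087 & 0 & 0 & 0 \\
0 & 0 & 0 & .0087 & 0 & 0 \\
0 & 0 & 0 & 0 & .0087 & 0 \\
0 & 0 & 0 & 0 & 0 & .0087
\end{pmatrix}
$$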


Ex 1: Calculation of V for two bananas (Cont)

• The V matrix is block-diagonal
• Observations within the same banana are correlated
• Observations for different bananas are independent
• This can be extended for more bananas

$$
\hat{V} = Z \hat{G} Z' + \hat{R} =
\begin{pmatrix}
.1517 & .143 & .143 & 0 & 0 & 0 \\
.143 & .1517 & .143 & 0 & 0 & 0 \\
.143 & .143 & .1517 & 0 & 0 & 0 \\
0 & 0 & 0 & .1517 & .143 & .143 \\
0 & 0 & 0 & .143 & .1517 & .143 \\
0 & 0 & 0 & .143 & .143 & .1517
\end{pmatrix}
$$
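The extension to more bananas can be sketched in a few lines. This is an illustrative plain-Python function (the name `block_diagonal_v` is hypothetical, and it assumes every subject has the same number of observations, stored contiguously); it fills in one block per banana and leaves zeros everywhere else.

```python
def block_diagonal_v(n_subjects, n_per_subject, sigma2_b, sigma2):
    """Marginal V for a random-intercept model with equal-sized subjects.

    Rows and columns are assumed to be ordered subject by subject.
    """
    n = n_subjects * n_per_subject
    V = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            same_subject = (r // n_per_subject) == (c // n_per_subject)
            if same_subject:
                # between-subject variance everywhere in the block,
                # plus the residual variance on the diagonal
                V[r][c] = sigma2_b + (sigma2 if r == c else 0.0)
    return V

# Two bananas with three observations each, using the Ex 1 estimates
V2 = block_diagonal_v(2, 3, 0.1430, 0.008667)
for row in V2:
    print(["%.4f" % v for v in row])
```

Calling the same function with `n_subjects=5` gives a 15 x 15 matrix with the layout that Proc Mixed prints when no subject is specified in the random statement.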


Ex 1: Intraclass Correlation ICC

• The ICC estimates the correlation of observations within the same subject

• It is very high for this example

• Because it is based on variances, the ICC can only be positive or zero (more on this later)

$$ICC = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}$$

$$\text{Estimated } ICC \text{ (banana example)} = \frac{.143}{.1517} = .9426$$
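As a quick check of the ICC arithmetic, here is a small plain-Python sketch (the function name `icc` is illustrative). It uses the two covariance parameter estimates as printed, so the last digits can differ slightly from the slide's .9426, which divides by the rounded value .1517.

```python
def icc(sigma2_b, sigma2):
    """Intraclass correlation for a random-intercept model."""
    return sigma2_b / (sigma2_b + sigma2)

# Ex 1 estimates from the Covariance Parameter Estimates table
icc_banana = icc(0.1430, 0.008667)
print("Estimated ICC: %.3f" % icc_banana)

# With no between-subject variation the ICC is zero
icc_none = icc(0.0, 0.008667)
```

Because both inputs are variances, the ratio cannot be negative, which is the point made in the last bullet above.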


Ex 1: ICC for the Banana Data

• You can get an estimate of the marginal correlation matrix by adding the vcorr option:

random banana / v vcorr;

Estimated V Correlation Matrix for Subject 1 (blank cells are zero)
 Row   Col1   Col2   Col3   Col4   Col5   Col6   Col7   Col8   Col9  Col10  Col11  Col12  Col13  Col14  Col15
   1  1.000  0.942  0.942
   2  0.942  1.000  0.942
   3  0.942  0.942  1.000
   4                       1.000  0.942  0.942
   5                       0.942  1.000  0.942
   6                       0.942  0.942  1.000
   7                                            1.000  0.942  0.942
   8                                            0.942  1.000  0.942
   9                                            0.942  0.942  1.000
  10                                                                 1.000  0.942  0.942
  11                                                                 0.942  1.000  0.942
  12                                                                 0.942  0.942  1.000
  13                                                                                      1.000  0.942  0.942
  14                                                                                      0.942  1.000  0.942
  15                                                                                      0.942  0.942  1.000


Ex 1: Using Subject in the Random Statement

• The subject option tells SAS the V matrix is block diagonal, allowing computationally efficient methods of estimation.

random intercept / subject=banana;

• SAS will now say that there are 5 subjects, instead of one.

• With the subject option, if you ask SAS to print out the G, V, or Vcorr matrices, you will only get one block of the matrix, for a single subject


Ex 1: Using Subject in the Random Statement (Cont)

• Without the subject option, the V matrix will still be block diagonal, but SAS won’t know that in advance and will use less efficient methods of computation

• If you have a large sample size, using the subject option may be crucial for efficiency.


Ex 1: Summary

• Random statement allows us to fit a model with correlated observations for the same banana
  – Including the subject option is more efficient

• We get estimates of the between-banana variation, the within-banana variation and the intraclass correlation

• SAS will print estimates of the marginal variance-covariance matrices and the marginal correlation matrices for the model

• We estimate the fixed effects of treatment after adjusting for the random effects of banana


Model-Building Strategies

• Top-Down:
  – Start with a well-defined mean structure
  – Select a structure for the random effects
  – Select a covariance structure for residuals
  – Reduce the fixed effects in the model, removing non-significant effects

• Step-Up:
  – Often used when fitting HLM models
  – Starts with “unconditional” or means-only model
  – Add fixed effects and random effects to the different levels of the model


Estimation in LMMs

• Use either ML (Maximum Likelihood) or REML (Residual or Restricted Max Likelihood) to estimate covariance parameters in V

• Then use Generalized Least Squares (GLS) to estimate the fixed-effect parameters, β


REML Estimation

• REML is a way of estimating covariance parameters

• Produces unbiased estimates of covariance parameters

• Takes into account loss of df resulting from estimating fixed effects

• Used when carrying out hypothesis tests about covariance parameters

• Less important to use REML when sample size is large


ML Estimation

• Maximize a profile log likelihood function l_ML(θ):

$$l_{ML}(\theta) = -0.5\, n \ln(2\pi) - 0.5 \sum_i \ln(\det(V_i)) - 0.5 \sum_i r_i' V_i^{-1} r_i$$

• Nonlinear optimization
• Constraints applied to the covariance parameters in D and R
• Iterates to a solution
• ML estimation used for hypothesis tests (LRT) about fixed effects parameters, β


Hypothesis tests

• We may want to test hypotheses about the fixed effects in β
  – Test whether a particular fixed effect is zero
  – Compare the fixed effects of two (or more) treatments or groups
  – Get an overall test of the fixed effects for a categorical predictor

• We may want to test hypotheses about the covariance parameters in θ
  – Is the variance of a particular random effect zero?
  – Should we include covariances between random effects, or specify them to be zero?


F-tests

• Often approximate in LMMs, except for balanced designs or partially balanced designs
• We do not base F-tests on sums of squares as in traditional ANOVA models
• Denominator df for F-tests may be estimated in a number of ways

H0: Lβ = 0    HA: Lβ ≠ 0

$$F = \frac{(L\hat{\beta})' \left( L \left( \sum_i X_i' \hat{V}_i^{-1} X_i \right)^{-1} L' \right)^{-1} (L\hat{\beta})}{rank(L)}$$


Denominator df in F-tests

• df method can be specified as an option in Model statement (ddfm= )

• Default method for model with random statement is contain: ddfm=contain (SAS uses syntax rules to figure out correct error term for fixed effects)

• Satterthwaite has good small sample properties: ddfm=sat

• Kenward-Roger tries to correct for the fact that we do not know (and so must estimate) variance of random effects: ddfm=kr

• Between-within divides df into between and within components: ddfm=bw (default for repeated measures)

• With a large number of subjects, the method used doesn’t make much difference


t-tests

• t-tests are also usually approximate in a LMM

• Degrees of freedom (df) for the t-test may also be approximated as for F-test

H0: β = 0    HA: β ≠ 0

$$t = \frac{\hat{\beta}}{se(\hat{\beta})}$$
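The t statistics in the Example 1 fixed-effects table can be reproduced from the printed estimates and standard errors. The sketch below is plain Python for illustration only; it computes the ratio and leaves the p-value to SAS, since the reference t distribution depends on the (approximate) denominator df.

```python
# (estimate, standard error) pairs copied from the Solution for Fixed Effects table
estimates = {
    "treat A": (-0.2800, 0.05888),
    "treat B": (-0.2200, 0.05888),
}

# t = estimate / standard error
t_values = {name: est / se for name, (est, se) in estimates.items()}

for name, t in t_values.items():
    print("%s: t = %.2f" % (name, t))
```

Small differences from the SAS listing are possible for other rows because the printed estimates and standard errors are themselves rounded.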


Likelihood Ratio Tests (LRT)

• Likelihood Ratio Tests compare the likelihood for a nested (reduced) model to that for a reference (full) model.

• One or more parameters in the nested model are constrained (i.e., certain parameters may be set to zero)

• The df for the test are derived by subtracting the number of parameters in the nested model from the number in the reference model

$$-2 \log\left(\frac{L_{nested}}{L_{reference}}\right) = -2 \log(L_{nested}) - \left(-2 \log(L_{reference})\right) \sim \chi^2_{df}$$


Likelihood Ratio Tests for Fixed Effects

• Use Maximum Likelihood (ML) as the estimation method
  – Fit the reference model
  – Fit the nested model
  – Subtract -2 log likelihood of reference model from that of nested model
  – Calculate the p-value of the test, using appropriate df

• This can be done using SAS code, illustrated in Lab example 2
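The arithmetic behind these steps is short enough to sketch outside SAS. The following plain-Python example is illustrative only: the two -2 log likelihood values are hypothetical (they would be read off the Fit Statistics tables of two ML fits), and `chi2_sf` is a hand-written upper-tail chi-square probability for integer df, based on the standard closed-form series, so that no extra library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with integer df >= 1 exceeds x)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # odd df: erfc(sqrt(x/2)) plus a finite correction series
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return total

def lrt(neg2ll_nested, neg2ll_reference, df):
    """Likelihood ratio statistic and its chi-square p-value."""
    stat = neg2ll_nested - neg2ll_reference
    return stat, chi2_sf(stat, df)

# Hypothetical -2 log likelihoods from two ML fits; the nested model drops 2 parameters
stat, p_value = lrt(neg2ll_nested=412.7, neg2ll_reference=405.3, df=2)
print("LRT statistic = %.2f, p = %.4f" % (stat, p_value))
```

The df argument is the difference in the number of parameters between the reference and nested models, as described on the previous slide.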


Likelihood Ratio Tests for Covariance Parameters

• Use Restricted Maximum Likelihood (REML) estimation to get unbiased estimates of covariance parameters

• Fit the reference model
• Fit the nested model
• Calculate the p-value using the appropriate χ² distribution or mixture of χ² distributions, and appropriate df

• This can be done using SAS code, illustrated later
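As one illustration of a mixture reference distribution: when the null hypothesis sets a single random-effect variance to zero (a value on the boundary of the parameter space), a commonly used reference distribution is an equal-weight mixture of a χ² with 0 df and a χ² with 1 df. The slide does not say which mixture applies to which test, so the sketch below is an assumption for that one case, written in plain Python with a hypothetical function name.

```python
import math

def mixture_p_value(lrt_stat):
    """p-value under a 50:50 mixture of chi-square(0) and chi-square(1).

    Assumes the test is of a single variance component equal to zero.
    """
    if lrt_stat <= 0:
        return 1.0
    # P(chi-square with 1 df > stat); chi-square(0) is a point mass at zero,
    # so it contributes nothing to the upper tail when stat > 0
    p_chi2_1 = math.erfc(math.sqrt(lrt_stat / 2.0))
    return 0.5 * p_chi2_1

# Hypothetical REML-based statistic: difference in -2 Res Log Likelihood
p_mix = mixture_p_value(2.705543454095404)
print("mixture p-value = %.3f" % p_mix)
```

In effect the naive χ² with 1 df p-value is halved, so ignoring the boundary issue makes the test conservative.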


Residual Diagnostics

• Assess model residuals for
  – Normality
  – Constant variance
  – Outliers

• Use histograms, normal q-q plots, residual vs. predicted plots, and other diagnostic plots

• Two basic kinds of residuals
  – Conditional residuals / Studentized conditional residuals (conditional on random effects): better for model diagnostics
  – Unconditional residuals (not conditional on random effects): not as good for model diagnostics
  – We will use conditional / studentized conditional residuals for this workshop


Conditional Residuals

• Difference between the observed value and the conditional predicted value (conditional on random effects):

$$\hat{\varepsilon}_i = y_i - X_i \hat{\beta} - Z_i \hat{u}_i$$

• May not be as well suited for verifying model assumptions and detecting outliers as studentized conditional residuals

• Variances may be different for different subgroups


Studentized Conditional Residuals

• Scaled residuals, where each conditional residual is divided by its estimated standard deviation

• Because residuals are scaled, different residuals for subgroups with unequal variances will have similar scales

• Two types of studentization:
  – Internal studentization
    • the observation itself is included in calculation of its standard deviation (studentized residuals)
  – External studentization
    • the observation itself is excluded when calculating its standard deviation (studentized deleted residuals)

• Well suited for verifying model assumptions and detecting outliers


Influence Diagnostics

• Can identify observations that are influential in estimation of β (fixed effects) or θ (covariance parameters)

• Examine effect of omission of each observation (or cluster) on analysis of the entire data set

• Proc Mixed includes many ways to study influence diagnostics for LMMs

• Active area of research


Hierarchical Data Structure

• A nice way to visualize data sets appropriate for an LMM analysis, is to think about them in a hierarchical sense.

• This way of thinking about the data is largely due to the HLM program of Bryk and Raudenbush.

• Each level of the data represents a different degree of summarization.


Clustered Data Hierarchy

• Dependent variable measured once for each unit of analysis
  – Units of analysis are nested within clusters of units

• Observations for units in same cluster may be correlated
  – Students in classrooms
  – Rat pups in litters
  – Patients in clinics
  – Students in classrooms and classrooms within schools


Two-level Clustered Data Structure (Rat Pup Data)

[Diagram]
Level 2 (Litters):    Litter 1                             Litter 2
Level 1 (Rat pups):   Rat pup 1, Rat pup 2, Rat pup 3      Rat pup 1, Rat pup 2

• Dependent variable is measured once for each rat pup


Clustered Data Setup

• Data in long format
  – One row for each unit within a cluster

• Unit-specific information
  – Response variable
  – Unit-specific covariates to be included in the model

• Cluster-specific information
  – Cluster ID
  – Cluster-specific covariates to be included in the model
  – These values are repeated for each row for a cluster

• All rows with complete data will be used in fitting the model


Lab Example 2

Two-Level Clustered Data: Rat Pup Data


Rat pup data Setup for SAS


Ex 2: Summary

• We can use various tests (F-tests, t-tests, and likelihood ratio tests) for the LMM that we fit.

• The appropriate test depends on the hypothesis that we are testing

• Model diagnostics in Proc Mixed are very extensive and provide helpful information about influential cases/clusters


Three-Level Clustered Data Structure

[Diagram]
Level 3 (Clusters of Clusters):   School 1
Level 2 (Clusters of Units):      Classroom 1                          Classroom 2
Level 1 (Units of Analysis):      Student 1, Student 2, Student 3      Student 1, Student 2

• Dependent variable is measured once for each student


Levels of Data: Level 1

• The most detailed level of the data
  – Response is always measured at Level 1

• For a Clustered data set:
  – Level 1 represents the units of analysis (e.g., students in classroom)
  – Unit-specific covariates (e.g., SES, minority status, sex of each child) are measured at Level 1


Levels of Data: Level 2

• The next level of the hierarchy in the data

• For a Clustered data set:
  – Level 2 represents clusters of units (e.g., classrooms)
  – Includes cluster-specific covariates (e.g., classroom size, teacher experience)


Levels of Data: Level 3

• The third level of the hierarchy in the data, if it exists

• For a Clustered data set:
  – Level 3 represents clusters of level 2 units / clusters of clusters (e.g., schools, which are clusters of classrooms)
  – Level 3 specific covariates (e.g., neighborhood characteristics of school, such as household poverty in school neighborhood)

• Other examples of three-level clustered data?


Predicting Random Effects: BLUPs

• Even though classrooms and schools are a random sample from some larger population, we may still want to estimate the classroom or school-specific means.

• For example, we may want to identify classrooms with poor math scores that might be candidates for an intervention, or identify, post-hoc, attributes of poorly performing schools.

• Classroom is an element of the Z vector. We can interpret ui as the “effect” of classroom i (random classroom effect, or random intercept per classroom).

$$Y_i = \underbrace{X_i \beta}_{\text{fixed}} + \underbrace{Z_i u_i + \varepsilon_i}_{\text{random}}$$


Predicting Random Effects: BLUPs (Cont)

• If V is known, the estimates of u are Best Linear Unbiased Predictors (BLUPs).

• When V is unknown, the estimates of u are referred to as Empirical Best Linear Unbiased Predictors (EBLUPs).

• Use the solution option added to the random statement to produce EBLUPs for the u’s (predictions for each ui, i.e., the effect of each classroom):

random int / subject=classid solution;

or

random classid / solution;


Predicting Random Effects: BLUPs (Cont)

• The ui are the conditional expectations of the random effects, given the observed response values, yi:

$$\hat{u}_i = E(u_i \mid Y_i = y_i) = \hat{D} Z_i' \hat{V}_i^{-1} (y_i - X_i \hat{\beta})$$

• We predict, rather than estimate, the values of the EBLUPs

• Recall that the assumed distribution of the random effects is normal, and we can check that assumption
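For a single random intercept, the prediction formula can be worked through by hand. The sketch below is plain Python for illustration, with hypothetical variance components and hypothetical marginal residuals (yi minus the fixed-effects prediction); `solve` is a small Gaussian-elimination helper written here so that no extra library is needed.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def eblup_random_intercept(marginal_resid, sigma2_b, sigma2):
    """u_hat = D Z' V^{-1} (y - X beta_hat) for one cluster, random intercept only."""
    n = len(marginal_resid)
    V = [[sigma2_b + (sigma2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    w = solve(V, marginal_resid)      # V^{-1} (y - X beta_hat)
    return sigma2_b * sum(w)          # D Z' w, with Z a column of ones

# Hypothetical classroom with three students
resid = [0.30, 0.10, 0.20]            # y_i - X_i beta_hat (assumed values)
u_hat = eblup_random_intercept(resid, sigma2_b=0.5, sigma2=1.0)
print("predicted classroom effect: %.4f" % u_hat)
```

In this special case the result equals the mean residual (0.20) multiplied by a shrinkage factor n·σb²/(n·σb² + σ²) = 0.6, which is why EBLUPs are pulled toward zero relative to the raw cluster means.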


Lab Example 3

Three-Level Clustered Data: Classroom Data


Ex 3: Summary

• We can fit models for clustered data with three or more levels

• We can check the distribution of the EBLUPs (predicted random effects) to look for outliers (schools or classrooms that are doing particularly well or poorly)


Repeated Measures Data

• Dependent variable measured multiple times for each unit of analysis
  – Repeated measures factor may be time or other observational or experimental factor
  – May be more than one repeated measures factor

• Examples
  – Regions/Treatments within rat brain
  – Insulin levels measured at various time points within patient after injection of drug

• Observations made for same subject may be correlated


Repeated Measures Data Structure

[Diagram]
Level 2 (Units of Analysis):     Rat 1                                             Rat 2 …
Level 1 (Repeated Measures):     Brain Region 1, Brain Region 2, Brain Region 3    Brain Region 1, Brain Region 2, Brain Region 3

• Dependent variable is measured more than once for each rat


Data Setup: Repeated Measures / Longitudinal Data

• Data are in long format
• One row for each time point for each subject
• Each row contains
  – Time-varying information
    • Dependent variable
    • Time-varying covariates to be included in the model
  – Time-invariant information
    • Unit / subject ID
    • Time-invariant covariates to be included in the model
    • Values repeated for each row for a subject
• All rows with complete data will be used in fitting the model
• Number of rows per subject can vary
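To make the long-format layout concrete, here is a small plain-Python sketch that reshapes a wide table (one row per subject) into long format (one row per subject per time point). The data values, the column names (`subject`, `group`, `y_t1`, ...), and the function name are all hypothetical; in the workshop the reshaping would be done in SAS.

```python
# Hypothetical wide data: one row per subject, one column per time point
wide_rows = [
    {"subject": 1, "group": "A", "y_t1": 5.1, "y_t2": 5.9, "y_t3": 6.4},
    {"subject": 2, "group": "B", "y_t1": 4.8, "y_t2": None, "y_t3": 5.5},
]

def wide_to_long(rows, id_vars, value_prefix):
    """Return one record per subject per time point."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(value_prefix):
                record = {k: row[k] for k in id_vars}       # time-invariant values repeat
                record["time"] = int(key[len(value_prefix):])
                record["y"] = value                         # time-varying response
                long_rows.append(record)
    return long_rows

long_rows = wide_to_long(wide_rows, id_vars=["subject", "group"], value_prefix="y_t")

# Rows with a missing response drop out of the fit, so subjects can
# contribute different numbers of rows
complete = [r for r in long_rows if r["y"] is not None]
for r in complete:
    print(r)
```

Subject 2 keeps two usable rows rather than being dropped entirely, which is the practical advantage of the long layout noted in the last two bullets above.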


Recall: Form of the Ri Matrix

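• Variance-covariance matrix of the residuals
• Many different possible structures for R

$$
R_i = Var(\varepsilon_i) =
\begin{pmatrix}
Var(\varepsilon_{1i}) & cov(\varepsilon_{1i}, \varepsilon_{2i}) & \cdots & cov(\varepsilon_{1i}, \varepsilon_{n_i i}) \\
cov(\varepsilon_{1i}, \varepsilon_{2i}) & Var(\varepsilon_{2i}) & \cdots & cov(\varepsilon_{2i}, \varepsilon_{n_i i}) \\
\vdots & \vdots & \ddots & \vdots \\
cov(\varepsilon_{1i}, \varepsilon_{n_i i}) & cov(\varepsilon_{2i}, \varepsilon_{n_i i}) & \cdots & Var(\varepsilon_{n_i i})
\end{pmatrix}
$$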


Commonly Used Structures for R

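Unstructured (type = UN)

$$
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2
\end{pmatrix}
$$

Variance Components (type = VC)

$$
\begin{pmatrix}
\sigma_1^2 & 0 & 0 \\
0 & \sigma_2^2 & 0 \\
0 & 0 & \sigma_3^2
\end{pmatrix}
$$

Compound Symmetry (type = CS)

$$
\begin{pmatrix}
\sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\
\sigma_1 & \sigma_1 & \sigma^2 + \sigma_1
\end{pmatrix}
$$

Banded (type = UN(2))

$$
\begin{pmatrix}
\sigma_1^2 & \sigma_{12} & 0 \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} \\
0 & \sigma_{23} & \sigma_3^2
\end{pmatrix}
$$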


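More structures for R

First-order Autoregressive (type = AR(1))

$$
\begin{pmatrix}
\sigma^2 & \sigma^2\rho & \sigma^2\rho^2 \\
\sigma^2\rho & \sigma^2 & \sigma^2\rho \\
\sigma^2\rho^2 & \sigma^2\rho & \sigma^2
\end{pmatrix}
$$

Toeplitz (type = Toep)

$$
\begin{pmatrix}
\sigma^2 & \sigma_1 & \sigma_2 \\
\sigma_1 & \sigma^2 & \sigma_1 \\
\sigma_2 & \sigma_1 & \sigma^2
\end{pmatrix}
$$

Toeplitz (2) (type = Toep(2))

$$
\begin{pmatrix}
\sigma^2 & \sigma_1 & 0 \\
\sigma_1 & \sigma^2 & \sigma_1 \\
0 & \sigma_1 & \sigma^2
\end{pmatrix}
$$

Heterogeneous Compound Symmetry (type = CSH)

$$
\begin{pmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho \\
\sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho \\
\sigma_3\sigma_1\rho & \sigma_3\sigma_2\rho & \sigma_3^2
\end{pmatrix}
$$

Heterogeneous 1st-order Autoregressive (type = ARH(1))

$$
\begin{pmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho & \sigma_1\sigma_3\rho^2 \\
\sigma_2\sigma_1\rho & \sigma_2^2 & \sigma_2\sigma_3\rho \\
\sigma_3\sigma_1\rho^2 & \sigma_3\sigma_2\rho & \sigma_3^2
\end{pmatrix}
$$

Heterogeneous Toeplitz (type = Toeph)

$$
\begin{pmatrix}
\sigma_1^2 & \sigma_1\sigma_2\rho_1 & \sigma_1\sigma_3\rho_2 \\
\sigma_2\sigma_1\rho_1 & \sigma_2^2 & \sigma_2\sigma_3\rho_1 \\
\sigma_3\sigma_1\rho_2 & \sigma_3\sigma_2\rho_1 & \sigma_3^2
\end{pmatrix}
$$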
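To see how few parameters some of these structures need, the following plain-Python sketch builds three of the homogeneous patterns for an arbitrary number of time points. The function names and the numeric inputs are illustrative assumptions; the parameterization follows the usual definitions (compound symmetry as a common covariance plus residual variance on the diagonal, AR(1) as σ²ρ^|i-j|, and a banded Toeplitz with one covariance per lag).

```python
def compound_symmetry(n, sigma2, sigma1):
    """CS: common covariance sigma1, diagonal sigma2 + sigma1 (2 parameters)."""
    return [[sigma1 + (sigma2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def ar1(n, sigma2, rho):
    """AR(1): covariance sigma2 * rho^|i - j| (2 parameters)."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]

def toeplitz(n, sigma2, band_covs):
    """Banded Toeplitz: band_covs[k-1] is the covariance at lag k.

    Lags beyond len(band_covs) are set to zero, so a one-element list gives
    the single-band pattern labeled Toep(2) above.
    """
    def entry(lag):
        if lag == 0:
            return sigma2
        return band_covs[lag - 1] if lag <= len(band_covs) else 0.0
    return [[entry(abs(i - j)) for j in range(n)] for i in range(n)]

# Hypothetical parameter values, three time points
for name, R in [("CS", compound_symmetry(3, 1.0, 0.5)),
                ("AR(1)", ar1(3, 2.0, 0.5)),
                ("Toep(2)", toeplitz(3, 2.0, [0.7]))]:
    print(name)
    for row in R:
        print("  ", row)
```

An unstructured 3 x 3 matrix needs six parameters, whereas each of these needs two, which is one reason the choice of structure is usually guided by fit criteria rather than always taking the most general form.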


Model Fit: Akaike Information Criterion

• SAS calculates the AIC based on the (ML or REML) log likelihood:

$$AIC = -2\, l(\hat{\beta}, \hat{\theta}) + 2p$$

• The penalty is 2p, where in SAS, p represents the total number of parameters being estimated for both fixed and random effects.

• Can be used to compare any two models fit for the same observations; they need not be nested.

• Smaller is better.

• Often used to help choose an appropriate structure for the R matrix


Model Fit: Bayes Information Criterion

• BIC applies a greater penalty for models with more parameters than does AIC:

$$BIC = -2\, l(\hat{\beta}, \hat{\theta}) + p \ln(n)$$

• The penalty is number of parameters, p, times ln(n), where n is the total number of observations in the data set.

• Can be used to compare two models for the same observations; they need not be nested models.

• Smaller is better.

• Often used, in conjunction with AIC, to help choose an appropriate structure for the R matrix.
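The two formulas can be checked against the Fit Statistics printed for Example 1. In the plain-Python sketch below, the -2 Res Log Like value is the final one from the Ex 1 iteration history; the choices p = 2 and n = 5 are assumptions made here because they reproduce the printed AIC and BIC under REML (two covariance parameters, five bananas), not values stated on the slides. How SAS counts p and n in a given fit should be confirmed in the Proc Mixed documentation.

```python
import math

def aic(neg2_loglik, p):
    """AIC = -2 log likelihood + 2p."""
    return neg2_loglik + 2 * p

def bic(neg2_loglik, p, n):
    """BIC = -2 log likelihood + p ln(n)."""
    return neg2_loglik + p * math.log(n)

neg2_res_ll = -2.40852048            # final -2 Res Log Like, Ex 1

# Assumed counts: p = 2 covariance parameters, n = 5 bananas
aic_ex1 = aic(neg2_res_ll, p=2)
bic_ex1 = bic(neg2_res_ll, p=2, n=5)
print("AIC = %.1f, BIC = %.1f" % (aic_ex1, bic_ex1))
```

Whatever the counting convention, both criteria are only comparable across models fit to the same observations with the same estimation method.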


Marginal Model vs. LMM

• LMM uses random effects explicitly to model between-subject variance
  – Subject-specific model
  – Includes D matrix and R matrix

• Implied marginal model (discussed earlier)
  – Marginal model that results from fitting a LMM
  – The marginal variance-covariance matrix is called V
  – V is derived from D and R

• Marginal model does not use random effects in its specification
  – Population-averaged model
  – Uses only the R matrix, no random effects, so no D.


A Marginal Model With No Random Effects

$$
\begin{aligned}
Y_i &= X_i \beta + \varepsilon_i \\
\varepsilon_i &\sim N(0, V_i) \\
V_i &= R_i
\end{aligned}
$$

• We do not include random effects in this model.
• Therefore, D is zero
• Covariances, and hence correlations, among residuals are specified directly through the Ri matrix
• Vi (the marginal variance-covariance matrix for Yi) = Ri


LSMEANS

• Lsmeans (least squares means) give estimates of the mean of Y for each level of fixed predictors (that are in the class statement), after adjusting for all other fixed covariates in the model
  – Assumes all groups based on categorical predictors are balanced
  – Assumes continuous covariates are fixed at their mean

• Post-hoc tests can be carried out on lsmeans, using different adjustments for multiple comparisons, e.g., Bonferroni, Tukey, Dunnett, Scheffe, etc.

• Slices can be used to get simple effects for interactions


Lab Example 4

Repeated Measures Design: The Rat Brain Data


Ex 4: Summary

• There are many possible ways to fit a model for repeated measures data

• A LMM with a single random intercept is equivalent to a marginal model with a compound symmetric variance-covariance structure, but only if the between-subject variance is > 0.

• Post-hoc comparisons can be easily carried out using lsmeans statements

• Many different post-hoc methods are available in SAS
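As a rough illustration of the equivalence noted above, the two specifications below should give the same fit when the estimated between-subject variance is positive. This is a sketch with hypothetical names (`mydata`, `subject_id`, `group`, `time`, `y`), not the lab code.

```sas
/* Model A: LMM with a single random intercept per subject.           */
/* V and VCORR print the implied marginal matrix for the first subject */
proc mixed data=mydata;
  class subject_id group time;
  model y = group time group*time / solution;
  random intercept / subject=subject_id v vcorr;
run;

/* Model B: marginal model with compound symmetric Ri, no random effects */
proc mixed data=mydata;
  class subject_id group time;
  model y = group time group*time / solution;
  repeated time / subject=subject_id type=cs r rcorr;
run;
```

Comparing the V matrix from Model A with the R matrix from Model B is one way to check the equivalence. The two can differ when the common covariance is estimated to be negative, because by default the random-intercept variance in Model A is constrained to be nonnegative.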


Types of Data: Longitudinal Data

• Dependent variable measured multiple times for each unit of analysis
– Repeated measures factor is time
– May be over an extended period of time (e.g. years)

• Examples
– Autistic children measured at different ages
• Observations made on same child may be correlated


Longitudinal Data Structure

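[Diagram: Level 2 (Subjects: Units of Analysis) shows Child ID 1 and Child ID 2. Level 1 (Repeated Measures) shows the measurement occasions nested within each child: one child is measured at ages 2, 3, and 9 years, the other at ages 2 and 5 years.]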

• Dependent variable is measured more than once for each child

• Number of measurements does not need to be equal for all subjects

• Spacing of intervals not required to be equal for all measurement times

• Measurement times do not have to be the same for all subjects
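A small sketch of the "long" layout these points imply, with one row per child per measurement occasion. The data set `toy_long`, the variable names, and every value are invented purely for illustration; which child has three occasions is an arbitrary choice.

```sas
/* Toy long-format data: one row per child per occasion.              */
/* All names and values are made up for illustration only.            */
data toy_long;
  input child_id age score;
  datalines;
1 2 10
1 3 14
1 9 40
2 2 8
2 5 21
;
run;

proc print data=toy_long noobs;
run;
```

Child 1 contributes three rows and child 2 contributes two, with different ages and unequal spacing, which is the kind of unbalanced structure Proc Mixed can use directly.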


Missing Data

• Assume data are Missing at Random (MAR)
– Probability of having missing data on a given variable may depend on other observed information
– Does not depend on the values that would have been observed but are missing

• Include in model other covariates that are predictive of missingness
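The MAR assumption above is often stated formally as follows. The notation is an assumption introduced here: $M_i$ is the vector of missingness indicators for subject $i$ (written $M$ rather than $R$ to avoid confusion with the residual covariance matrix), and $Y_i^{\mathrm{obs}}$ and $Y_i^{\mathrm{mis}}$ are the observed and missing parts of the response.

```latex
% Missing at Random: missingness may depend on observed data and
% covariates, but not on the unobserved values themselves
P\left(M_i \mid Y_i^{\mathrm{obs}}, Y_i^{\mathrm{mis}}, X_i\right)
  = P\left(M_i \mid Y_i^{\mathrm{obs}}, X_i\right)
```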


Lab Example 5

Random Coefficients Models for Longitudinal Data: The Autism Data


Ex 5: Summary

• Random coefficient models can be used to model both the trajectory of change and the variance of the trajectories

• Variance of random intercepts may not be estimable in all situations

• If a problem comes up, it is best to investigate it thoroughly
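A minimal sketch of a random coefficient model of the kind summarized above, with a random intercept and a random slope for age in each child. The names `mydata`, `child_id`, `age`, and `score` are hypothetical, and this is not the model fitted in the lab.

```sas
/* Random coefficient model: child-specific intercepts and age slopes. */
/* Names (mydata, child_id, age, score) are hypothetical.              */
proc mixed data=mydata covtest;
  class child_id;
  model score = age / solution;
  /* TYPE=UN lets the intercept and slope covary;                      */
  /* G and GCORR print the estimated D matrix and its correlations     */
  random intercept age / subject=child_id type=un g gcorr;
run;
```

If a variance such as that of the random intercepts cannot be estimated, one symptom to look for is a note in the SAS log that the estimated G matrix is not positive definite. The G and GCORR output is a reasonable place to start investigating.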


References I

• Pinheiro, J. C. and Bates, D. M., Mixed-Effects Models in S and S-PLUS, Springer-Verlag, Berlin, 2000.

• Laird, N.M. and Ware, J.H., Random-effects models for longitudinal data. Biometrics, 38, 963, 1982.

• Oti, R., Anderson, D., and Lord, C. (submitted) Social trajectories among individuals with autism spectrum disorders, Journal of Developmental Psychopathology.


References II

• West, Brady T., Welch, Kathleen B., Galecki, Andrzej T., Linear Mixed Models: A Practical Guide Using Statistical Software, Chapman & Hall/CRC, 2006.

• Verbeke, G. and Molenberghs, G., Linear Mixed Models for Longitudinal Data, Springer, 2000.

• Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., Schabenberger, O., SAS for Mixed Models, 2nd Edition, Cary, NC: SAS Institute Inc.


References III

• Little, R.J.A., and Rubin, D.B., Statistical Analysis with Missing Data: 2nd Edition, Wiley, 2002.
