

Analyzing Longitudinal Data With Multilevel Models: An Example With Individuals Living With Lower Extremity Intra-Articular Fractures

Oi-Man Kwok
Texas A&M University

Andrea T. Underhill and Jack W. Berry
University of Alabama at Birmingham

Wen Luo
University of Wisconsin–Milwaukee

Timothy R. Elliott and Myeongsun Yoon
Texas A&M University

Objective: The use and quality of longitudinal research designs have increased over the past 2 decades, and new approaches for analyzing longitudinal data, including multilevel modeling (MLM) and latent growth modeling (LGM), have been developed. The purpose of this article is to demonstrate the use of MLM and its advantages in analyzing longitudinal data. Research Method: Data from a sample of individuals with intra-articular fractures of the lower extremity from the University of Alabama at Birmingham’s Injury Control Research Center are analyzed using both SAS PROC MIXED and SPSS MIXED. Results: The authors begin their presentation with a discussion of data preparation for MLM analyses. The authors then provide example analyses of different growth models, including a simple linear growth model and a model with a time-invariant covariate, with interpretation for all the parameters in the models. Implications: More complicated growth models with different between- and within-individual covariance structures and nonlinear models are discussed. Finally, information related to MLM analysis, such as online resources, is provided at the end of the article.

Keywords: multilevel model, growth model, trajectory analysis, hierarchical linear model, rehabilitation

Longitudinal designs have recently received more attention in a variety of different disciplines of psychology, including clinical, developmental, personality, and health psychology (West, Biesanz, & Kwok, 2003). In some areas, such as developmental psychology and personality psychology, a substantial number of recently published studies have been longitudinal (Biesanz, West, & Kwok, 2003; Khoo, West, Wu, & Kwok, 2006). For example, Khoo et al. (2006) found that slightly more than one third of articles published in Developmental Psychology in 2002 included at least one longitudinal study, defined as having at least two measurement occasions. This proportion is double the proportion of longitudinal studies published in the same journal in 1990. Furthermore, more than 70% of the longitudinal studies published in Developmental Psychology in 2002 included three or more measurement waves. In this article, we focus on the analyses of multiwave longitudinal data, in which multiwave is defined as more than two waves.

With the growing use of longitudinal research, a number of methodological and statistical sources on the analysis of multiwave longitudinal data have appeared in the past decade (e.g., Bollen & Curran, 2006; Collins & Sayer, 2001; Singer & Willett, 2003), including discussions of traditional approaches, such as repeated measures univariate analyses of variance (UANOVA) and multivariate analyses of variance. Multilevel models (MLMs), also known as hierarchical linear models (HLMs; Raudenbush & Bryk, 2002), random coefficient models (Longford, 1993), and mixed-effect models (Littell, Milliken, Stroup, Wolfinger, & Schabenberger, 2006), have become an increasingly important approach for analyzing multiwave longitudinal data.

Oi-Man Kwok, Timothy R. Elliott, and Myeongsun Yoon, Department of Educational Psychology, Texas A&M University; Andrea T. Underhill and Jack W. Berry, Injury Control Research Center, University of Alabama at Birmingham; Wen Luo, Department of Educational Psychology, University of Wisconsin–Milwaukee.

This study was supported in part by National Institute of Child Health and Human Development Grant R01HD039367 awarded to Oi-Man Kwok and U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Injury Prevention and Control Grant R49-CE000191 to the University of Alabama at Birmingham, Injury Control Research Center. The contents of this study are solely the responsibility of the authors and do not necessarily represent the official views of the funding agencies.

Correspondence concerning this article should be addressed to Oi-Man Kwok, Department of Educational Psychology, 4225 TAMU, College Station, TX 77843-4225. E-mail: [email protected]

Rehabilitation Psychology, 2008, Vol. 53, No. 3, 370–386

Copyright 2008 by the American Psychological Association. 0090-5550/08/$12.00 DOI: 10.1037/a0012765

Although MLMs have been widely adopted in educational research for more than two decades (Raudenbush, 1988), these models are still relatively new to researchers in rehabilitation psychology. Growth models first appeared in the rehabilitation psychology literature over a decade ago: Clay, Wood, Frank, Hagglund, and Johnson (1995) reported a compelling (yet circumscribed) demonstration of using growth modeling of the development of emotional distress and behavioral problems of children with juvenile arthritis, juvenile diabetes, and no diagnosed health problems. An expanded report from this database appeared 3 years later in the Journal of Consulting and Clinical Psychology (Frank et al., 1998). Subsequent applications of HLM examined the dynamic trajectory of adjustment among family members over the 1st year of their caring for a person with a spinal cord injury (Shewchuk, Richards, & Elliott, 1998) and later identified salient predictors of these trajectories (Elliott, Shewchuk, & Richards, 2001). Another study used HLM to study the growth curve trajectory of the functional abilities of persons receiving inpatient spinal cord injury rehabilitation (Warschausky, Kay, & Kewman, 2001). A study of problem-solving training with caregivers of stroke survivors was among the first to use HLM in a randomized clinical trial (Grant, Elliott, Weaver, Bartolucci, & Giger, 2002), which heightened expectations about the utility of MLM in intervention research. The initial enthusiasm for MLM was expressed by one author, who opined that these techniques could have “an immense impact on program development and evaluation” and help the field identify “who responds best to what and why, who is at risk, and who responds optimally regardless of treatment options” (Elliott, 2002, p. 138).

Unfortunately, very few studies using MLMs have recently appeared in the rehabilitation psychology literature, implying that the field has yet to realize the potential and possibilities of these approaches. This is particularly unfortunate in light of the informative and extensive longitudinal databases that have been collected over the years to help with the understanding of certain high-cost, high-impact disabilities (e.g., spinal cord injury and traumatic brain injuries in the Model Systems Projects).

Therefore, the present article is intended to be an elementary introduction to MLM for researchers in rehabilitation psychology who have interest in and access to longitudinal data that could be analyzed with these techniques.

Comparison of MLM with Repeated Measures UANOVA

Most researchers in rehabilitation psychology will be familiar with the use of repeated measures UANOVA for analyzing multiwave data. In this section, therefore, we provide an overview of some of the similarities and differences between repeated measures analyses of variance (ANOVA) and MLM. We provide a summary of our comparison in Table 1. In the simplest multiwave design, a purely within-subjects design, the researcher will have collected repeated measurements on the same sample of research participants over time, and the primary question is whether there is within-subjects change in the sample on a particular outcome variable. With repeated measures UANOVA, time is thought of as a categorical factor, and the researcher can conduct a variety of contrasts to compare differences among time periods. For example, each postbaseline time period can be compared to the baseline, or all adjacent time periods can be compared. An alternative strategy is polynomial trend analysis. In ANOVA trend analysis, the question is whether there is an average linear, quadratic, or higher order polynomial trend in mean levels of the outcome variable over time.

The repeated measures ANOVA-based analyses can be viewed as special cases of MLMs (Kwok, West, & Green, 2007). Hence, MLM can employ these same analytic strategies for simple within-subjects designs but, as we describe in more detail below, MLM can provide several advantages over ANOVA in terms of handling missing data and flexible modeling of variance–covariance structures. MLM also offers a unique data analytic strategy for within-subjects designs that is not possible with UANOVA. Namely, MLM can be used to model individual-level trends over time, in which polynomial trends (rather than simply average trends) can be estimated for each participant. This approach is referred to as individual growth models. In UANOVA, individual growth models are not estimated; rather, an average growth model is estimated in a single analysis of all participants, and individual variation around the average model is treated as “unexplained” error. In MLM, regression parameters from all the individual growth models, including intercepts, slopes, or both, can be treated as random effects for estimation. Advantages of modeling these random effects have been reviewed in Kwok et al. (2007).
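The contrast between individual growth lines and a single average trend can be illustrated with a small sketch. The scores below are hypothetical, and the per-participant ordinary least squares fits are only a stand-in for the idea of individual trajectories: an MLM estimates the intercepts and slopes jointly as random effects rather than fitting each participant separately.

```python
from statistics import mean

def ols_line(times, scores):
    """Return (intercept, slope) of the least-squares line through the points."""
    t_bar, y_bar = mean(times), mean(scores)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

# Hypothetical scores for three participants at four waves (time coded 0-3).
times = [0, 1, 2, 3]
scores = {
    "p1": [80, 78, 76, 74],  # steady decline
    "p2": [70, 70, 70, 70],  # no change
    "p3": [60, 63, 66, 69],  # steady improvement
}

# One growth line per participant: each has its own intercept and slope.
individual = {pid: ols_line(times, y) for pid, y in scores.items()}

# The single average line that an average-trend analysis would summarize.
wave_means = [mean(col) for col in zip(*scores.values())]
average = ols_line(times, wave_means)

for pid, (b0, b1) in individual.items():
    print(f"{pid}: intercept={b0:.1f}, slope={b1:+.1f}")
print(f"average: intercept={average[0]:.1f}, slope={average[1]:+.1f}")
```

In this made-up example the average slope is small and positive even though one participant declines, one is flat, and one improves, which is the kind of individual variation that an average-only analysis folds into error.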

Table 1
Comparison of Analysis of Variance (ANOVA)-Based Analyses and Multilevel Model (MLM) Analyses

| Point of comparison | Repeated measures univariate ANOVA | MLMs |
| --- | --- | --- |
| Analysis strategies | A. Simple within-subjects designs: Time contrasts and average polynomial trends (time as a categorical factor; balanced data and equal time spacing assumed) | A. Simple within-subjects designs: Time contrasts and average polynomial trends (time as a categorical factor; balanced data and equal time spacing assumed) |
| | B. Mixed within/between designs (e.g., randomized clinical trials): Interaction effects between time and other between-subjects factors | B. Mixed within/between designs (e.g., randomized clinical trials): 1. Interaction effects between time and other between-subjects factors; 2. Cross-level interactions (e.g., effects of between-subjects factors/variables on individual growth trajectories) |
| | | C. Regression parameters from the individual growth models including intercepts, slopes, or both can be treated as random effects for estimation (time as a continuous variable; unbalanced data and unequal time spacing accommodated) |
| Estimation method | Least squares | Maximum likelihood |
| Missing data | Complete cases only | All available data |
| Variance–covariance structure | Compound symmetry (or Huynh-Feldt, which is a more general form of the compound symmetry structure) to meet the sphericity assumption | Flexible structure |
| Covariates | Time-invariant covariates only | Both time-invariant and time-varying covariates |

There are different estimation methods for UANOVA and MLM. In repeated measures UANOVA, least squares (LS) estimation is generally used, whereas in MLM, maximum likelihood (ML) is one of the commonly used estimation methods. A detailed discussion of LS and ML methods is beyond the scope of this article. One additional note on ML is given because there are two commonly used ML methods¹ available in several major MLM programs (e.g., HLM, SPSS, SAS, and STATA) when the outcome variable is continuous and normally distributed. The default estimation method in both SPSS (MIXED) and SAS (PROC MIXED) for analyzing multilevel data with continuous outcomes is the restricted maximum likelihood (REML) estimation. The alternative estimation is the full information maximum likelihood estimation. REML can provide more accurate results when sample size (especially the number of higher level units) is small (Hox, 2002; Raudenbush & Bryk, 2002). However, full information maximum likelihood can compare the goodness of fit for both fixed and random parts between nested models using likelihood ratio tests, whereas REML can only compare the goodness of fit for the random part between nested models. The results presented in this article are based on the default REML estimation method in both SAS and SPSS. More detailed information on estimation can be found in the text by Raudenbush and Bryk (2002).
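The arithmetic of a likelihood ratio test between two nested models can be sketched as follows. This is an illustration under stated assumptions, not output from SAS or SPSS: the deviance values are made up, the helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical, and both models are assumed to have been fit to the same data with the same estimation method (and, under REML, with identical fixed effects).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, then the standard recurrence
    # Q(df + 2) = Q(df) + half**(df/2) * exp(-half) / gamma(df/2 + 1).
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1
    else:
        q, nu = math.exp(-half), 2
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q

def likelihood_ratio_test(deviance_reduced, deviance_full, df_diff):
    """Compare nested models via the drop in deviance (-2 log-likelihood)."""
    statistic = deviance_reduced - deviance_full
    return statistic, chi2_sf(statistic, df_diff)

# Hypothetical deviances (made-up numbers, not results from the article).
stat, p = likelihood_ratio_test(3950.4, 3941.1, df_diff=2)
print(f"LRT statistic = {stat:.1f}, p = {p:.4f}")
```

Here `df_diff` is the number of additional parameters estimated in the fuller model; in practice the deviances would be read from the fit statistics reported by the software.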

Another major difference between UANOVA and MLM is in the treatment of the time predictor. In individual growth models, time can be treated as a continuous variable in MLM. Because of this, MLM can accommodate unequal spacing between time intervals and unbalanced data. Observations may be collected at unequally spaced intervals (e.g., measurements collected 0 months, 3 months, 6 months, 1 year, and 5 years following treatment). Observations may also be collected at different time points for different participants (e.g., for the first participant 0, 3, 6, and 9 months following treatment; for the second participant 1, 5, 10, and 12 months following treatment).² Such patterns of observations may occur because of practical problems in implementing the original data collection design. Unbalanced data and unequal spacing conditions can be flexibly handled under MLM through adequate specification of the time predictor. On the other hand, all participants are assumed to have the same number of assessments (balanced data), and the intervals between time periods are assumed to be equal (equal spacing) when using UANOVA.
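One way to specify a continuous time predictor is sketched below. The function name `code_time` and its defaults (time zero at 12 months, one unit per 12 months) are illustrative assumptions, not necessarily the coding adopted later in this article; the point is only that each observation carries its own time value, so unequal spacing and participant-specific schedules need no special handling.

```python
def code_time(months, origin=12, unit=12):
    """Convert assessment months to a continuous time predictor.

    origin: month treated as time zero (the intercept is the expected score there).
    unit:   months per time unit (12 means the slope is change per year).
    """
    return [(m - origin) / unit for m in months]

# Unequally spaced waves at 12, 24, 48, and 60 months.
print(code_time([12, 24, 48, 60]))        # [0.0, 1.0, 3.0, 4.0]

# Unbalanced schedules from the text: each participant keeps their own times.
print(code_time([0, 3, 6, 9], origin=0))  # [0.0, 0.25, 0.5, 0.75]
print(code_time([1, 5, 10, 12], origin=0))
```

The choice of origin changes the meaning of the intercept, and the choice of unit changes the scale of the slope, so both should be chosen to match the research question.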

In the same vein, missing data can be handled flexibly in MLM but not in UANOVA. Missing data can arise for many reasons in multiwave longitudinal research: missed appointments, participant incapacity, dropout, or loss to follow-up for a variety of reasons. However, only complete cases can be included in an analysis when using UANOVA. If a research participant is missing for even a single time period, all of that participant’s data are removed from the analysis. The capacity of MLM (using likelihood-based estimation) to incorporate all available data in an analysis can be especially useful in conducting intention-to-treat (ITT) analyses in controlled clinical trials. Recent reviews of ITT studies have suggested that inappropriate handling of missing data is the chief problem with published reports of ITT-based clinical trials (Gravel, Opatrny, & Shapiro, 2007; Hollis & Campbell, 1999). The requirement of complete data in UANOVA can lead to substantial losses of statistical power and precision in longitudinal research. We discuss the possibility of imputing missing data values in a later section of this article.

An advantage of MLM is that it can make use of all available data in the estimation of model parameters due to its flexible treatment of the time predictor. A research participant with only baseline data can be included in an analysis and contribute to the estimation of model parameters. The validity of using all available data does depend on whether missing data are missing completely at random or missing at random, which is a less restrictive missing data assumption, and methods of assessing this requirement are available. In addition, the treatment of time as a continuous rather than a discrete variable in MLM can increase the statistical power for detecting the growth effects (Muthén & Curran, 1997). For these reasons, MLM is a preferred option for ITT analyses in clinical trials and other intervention studies, particularly when the theoretical model anticipates a gradual response to the intervention over time (as typified in most psychological theories of therapeutic response).
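The practical difference between complete-case analysis and the use of all available data can be seen in a small sketch. The records are hypothetical, and the field names (`id`, `scores`, `month`, `score`) are chosen only for illustration; the second function also shows the one-row-per-observation layout that MLM software typically expects.

```python
# Hypothetical wide-format records: one row per participant, None = missing wave.
WAVES = [12, 24, 48, 60]  # assessment months
wide = [
    {"id": 1, "scores": [85, 84, 80, 79]},
    {"id": 2, "scores": [78, None, 75, None]},
    {"id": 3, "scores": [90, None, None, None]},  # first wave only
]

def complete_cases(rows):
    """Listwise deletion: keep only participants observed at every wave."""
    return [r for r in rows if all(s is not None for s in r["scores"])]

def to_long(rows):
    """One row per observed (participant, wave) pair; missing waves are skipped."""
    return [
        {"id": r["id"], "month": m, "score": s}
        for r in rows
        for m, s in zip(WAVES, r["scores"])
        if s is not None
    ]

kept = complete_cases(wide)
long_rows = to_long(wide)
print(len(kept), "participant(s) survive listwise deletion")
print(len(long_rows), "observations from",
      len({r["id"] for r in long_rows}), "participants remain in long format")
```

Whether the retained partial records yield valid estimates still depends on the missing data assumptions discussed above; the sketch only shows how much information each approach keeps.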

UANOVA and MLM also differ on the statistical assumptions related to the variance–covariance structure in analyses of longitudinal data. In repeated measures UANOVA, the variance–covariance matrix of observations taken over time is assumed to meet the requirements of sphericity, for which compound symmetry is a sufficient condition. Compound symmetry implies that the variances of measures at each time period are equal, and also that the covariances between all pairs of time periods are equal. This is a strong assumption and is likely to be unrealistic for many (if not most) longitudinal studies (Kwok et al., 2007). For example, in some studies, research participants might show greater variability in an outcome measure over time, whereas in others there might be growing convergence over time. It is also possible that covariances between variables over time will be smaller with greater distances between time intervals. Violations of the assumption of sphericity can lead to incorrect decisions in ANOVA-based analyses. In MLM, there is great flexibility in specifying the variance–covariance structure of longitudinal data (Chi & Reinsel, 1988; Diggle, 1988; Jones & Boadi-Boateng, 1991; Laird & Ware, 1982; Wolfinger, 1993). The most flexible option in MLM analysis is a general “unstructured” variance–covariance assumption in which every variance and covariance is free to be estimated from the data. Other, more restrictive assumptions include autocorrelated structures, in which covariances are a function of distances between time periods, and variances can be modeled as either homogeneous or variable.
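To make these structures concrete, the sketch below builds a compound symmetry matrix and a first-order autoregressive matrix for four waves and counts the free parameters of an unstructured matrix. The numeric values are hypothetical, the function names are illustrative, and the autoregressive version assumes equally spaced waves with homogeneous variances.

```python
def compound_symmetry(n, variance, covariance):
    """Equal variances on the diagonal, one common covariance elsewhere."""
    return [[variance if i == j else covariance for j in range(n)]
            for i in range(n)]

def ar1(n, variance, rho):
    """First-order autoregressive: covariance decays as rho**lag."""
    return [[variance * rho ** abs(i - j) for j in range(n)]
            for i in range(n)]

def n_unstructured_params(n):
    """An unstructured matrix frees every variance and covariance."""
    return n * (n + 1) // 2

# Hypothetical values for four waves.
for row in compound_symmetry(4, variance=100, covariance=50):
    print(row)
for row in ar1(4, variance=100, rho=0.5):
    print(row)
print(n_unstructured_params(4))  # 10
```

Compound symmetry and this autoregressive structure each use only two parameters, whereas the unstructured matrix for four waves uses ten, which is the trade-off between parsimony and flexibility described above.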

¹ The Bayesian methods (e.g., the empirical Bayesian and fully Bayesian estimators) are also widely used estimation methods in MLM. The discussion of these methods is beyond the focus of this article. More information on these Bayesian estimation methods can be found in Raudenbush and Bryk’s (2002) text.

² One alternative way to handle the unequal space issue in UANOVA is to create a covariate capturing the unequal space and include the covariate in the analysis (i.e., a univariate analysis of covariance).

One more difference between repeated measures UANOVA and MLM concerns the use of covariates in statistical analyses. Covariates can be used in within-subjects research for many reasons: to reduce error variance, to statistically “equalize” participants on some variable of interest, or to find mediators of the relationship between time and an outcome variable. In repeated measures UANOVA, all covariates in a model must be time-invariant. In other words, individual measures on the covariates do not change with time and therefore have a constant effect across all measurement occasions. Examples of such time-invariant covariates would be the age of a participant at the beginning of a study or a participant’s standing on stable trait variables. Using MLM, in contrast, the researcher can include time-varying covariates in an analysis. Time-varying covariates are often assessed concurrently with major outcome variables and can change over time for each participant. The inclusion of time-varying covariates can provide a much more sensitive and realistic assessment of covariate effects for unstable, state-like variables that might influence primary outcome variables.

We have thus far described potential advantages of using MLM in the analysis of strictly within-subjects research designs. However, a common use of multiwave research combines within-subjects repeated measures with one or more between-subjects variables. An example of such a design is the randomized clinical trial with multiple outcome assessments over time. In such clinical trials, treatment condition is a between-subjects factor, and time (assessment occasion) is a within-subjects factor. In using ANOVA to assess clinical trial data, the focus is on the statistical significance of the Treatment × Time interaction. If the interaction is significant, simple main effects analyses can be conducted to determine the nature of the differential change between treatment conditions. Differences in average time contrasts and polynomial trends can be used for this purpose.

The same strategies can be used in MLM, but MLM can also combine the advantages of individual growth curve analysis with the examination of interactions of treatment with time. Specifically, because MLM separates the random effects into two parts (between-subjects random effects and within-subject random errors), MLM allows for the examination of new effects of interest such as cross-level interaction effects. For example, researchers can examine how treatment condition (and other between-subjects-level predictors) influences the individual growth trajectories (within-subjects repeated measures) of research participants over time. Even if individual growth curves are not estimated, the MLM approach to assessing treatment effects over time offers clear advantages over the repeated measures approach. For example, in a recent randomized, controlled trial of problem-solving training for family caregivers of a loved one with a traumatic brain injury (Rivera, Elliott, Berry, & Grant, 2008), caregivers were assessed on a variety of psychological and health-related outcomes over four time periods. In this study, individual growth curves were not estimated (because of convergence problems due to relatively small sample sizes in the control and treatment groups). Nonetheless, the MLM analysis permitted the inclusion of data from all available caregivers, allowed for the estimation of an unstructured variance–covariance structure, and provided the opportunity to include potentially mediating time-varying covariates in the models (such as problem-solving abilities at each assessment).

The major focus of this article is to demonstrate how to analyze longitudinal rehabilitation data using MLM. We have already provided a sketch of some of the advantages of MLM for such data. In what follows, we provide a more detailed discussion of both the theoretical underpinnings of MLM and some practical guidance for how to conduct MLM. Our analyses will highlight the use of individual growth models, as these models are uniquely provided by the MLM approach. Data from a sample of individuals with intra-articular fractures of the lower extremity from the University of Alabama at Birmingham’s Injury Control Research Center (UAB-ICRC) are analyzed using both SAS PROC MIXED and SPSS MIXED, and the corresponding annotated SAS and SPSS syntaxes for different growth models are presented. For more information on how to analyze longitudinal data using SAS and SPSS, readers can consult Singer’s (1998) article on using SAS PROC MIXED and Peugh and Enders’s (2005) article on using SPSS MIXED. We start our presentation with data preparation, followed by the analyses of different growth models, including a simple linear growth model and the model with a time-invariant covariate, with interpretation for all the parameters in the models.

Data Description

A sample of individuals with intra-articular fractures (IAFs) of the lower extremity from a large-scale, ongoing project, A Longitudinal Study of Rehabilitation Outcomes, by the UAB-ICRC, is used here for the demonstration. The UAB-ICRC longitudinal study included persons with at least one of four potentially disabling injuries (i.e., spinal cord injury, traumatic brain injury, severe burns, or IAFs of the lower extremity) who were discharged from a sample of hospitals representing a cross-section of individuals in north-central Alabama. The criteria for inclusion in the longitudinal study were as follows: (a) had an acute care length of stay of 3 or more days; (b) resided and was injured in Alabama; (c) was discharged alive from an acute care hospital between October 1, 1989, and September 30, 1992; (d) was more than 17 years old when injured; and (e) could be contacted at prespecified intervals after discharge. Data have been collected approximately annually from 12 months postdischarge to 180 months postdischarge. For the purposes of this article, data collected at 12, 24, 48, and 60 months postdischarge were used (no data were collected 36 months postdischarge). The details of the data collection procedure can be found elsewhere (Underhill, Lobello, & Fine, 2004; Underhill et al., 2003).

There were 251 individuals with IAFs who had data for at least one time point. Among these 251 individuals, 131 had complete data for all four time points (i.e., 12, 24, 48, and 60 months following acute care). The descriptive statistics for these two samples (N = 131 and N = 251) are presented in Table 2. We conducted attrition analyses by comparing the two samples on all demographic variables and found no significant differences between the two samples on any of the demographic variables. For pedagogical reasons, we start the demonstration with the 131 participants with complete data. We discuss the analyses with the full data set (with N = 251) in a later section.

We used the physical domain of the Functional Independence Measure (FIM) as the outcome variable for the demonstration. FIM is a widely used self-report scale of functional status (Keith, Granger, Hamilton, & Sherwin, 1987) that contains 18 seven-point, Likert-type items with responses ranging from 1 (total assistance) to 7 (complete independence). The FIM has been shown to have adequate reliability and validity (Putzke, Barrett, Richards, Underhill, & LoBello, 2004). In this study, the reliabilities of the FIM scale at different time points ranged from .91 to .98. The FIM scale can be further divided into two subdomains, namely, a physical (or motor) domain (13 items) and a cognitive domain (5 items; Greenspan, Wrigley, Kresnow, Branche-Dorsey, & Fine, 1996; Hall, Hamilton, Gordon, & Zasler, 1993; Heinemann, Linacre, Wright, Hamilton, & Granger, 1993). The physical domain includes items such as eating, grooming, bathing, dressing, toileting, managing bladder and bowels, walking, and transferring to bed/toilet/tub. The cognitive domain contains items such as comprehension, expression, social interaction, problem solving, and memory. Higher scores on these two domains indicate higher functional ability. In our demonstration, we focus on the change in the physical domain (FIM__P) of the participants over time. The means and standard deviations of the four FIM__P scores for the two samples (i.e., N = 131 and N = 251) are presented in Table 3. Based on the information from Table 3, a potential negative trend of the FIM__P scores and an increase in score variability over time may be found in both samples.

Data Preparation

In general, data are in multivariate (sometimes called “wide”) format as shown in Figure 1A; that is, each row represents a participant, and each column represents a specific variable. To analyze the data using MLM, we need to transform the multivariate data (Figure 1A) into the univariate (sometimes called “long”) format (Figure 1B), in which each row represents a specific time point rather than a participant. Table 4 presents the corresponding annotated SAS and SPSS syntaxes for converting data from multivariate format to univariate format. In SPSS, menu-driven assistance is also available using Restructure under the main Data menu.

Table 2
Descriptive Statistics of Participants With Intra-Articular Fractures

| Characteristic | N = 131 | N = 251 |
|---|---|---|
| Age, M (SD) | 45.82 (17.02) | 43.89 (17.59) |
| Gender, n (%) | | |
| Men | 75 (57.3) | 142 (56.6) |
| Women | 56 (42.7) | 109 (43.4) |
| Ethnicity, n (%) | | |
| African American | 38 (29.0) | 74 (29.5) |
| Caucasian | 93 (71.0) | 175 (69.7) |
| Other | 0 (0.0) | 2 (0.8) |
| Employment status at 12-month follow-up, n (%) | | |
| Employed, full time | 41 (31.3) | 72 (28.7) |
| Employed, part time | 3 (2.3) | 9 (3.6) |
| Self-employed | 4 (3.1) | 10 (4.0) |
| Unemployed | 18 (13.7) | 35 (13.9) |
| Student | 4 (3.1) | 6 (2.4) |
| Retired | 20 (15.3) | 37 (14.7) |
| Not working because of a previous disability | 35 (26.7) | 67 (26.7) |
| Other | 5 (3.8) | 11 (4.4) |
| Unknown | 1 (0.8) | 4 (1.6) |
| Marital status at 12-month follow-up, n (%) | | |
| Single | 28 (21.4) | 61 (24.3) |
| Married | 74 (56.5) | 126 (50.2) |
| Divorced | 16 (12.2) | 31 (12.4) |
| Separated | 2 (1.5) | 8 (3.2) |
| Widowed | 10 (7.6) | 21 (8.4) |
| Other/unknown | 1 (0.8) | 4 (1.6) |
| Education status at 12-month follow-up, n (%) | | |
| 8th grade or less | 15 (11.5) | 23 (9.2) |
| 9th–11th grade | 27 (20.6) | 59 (23.5) |
| High school diploma/GED | 48 (36.6) | 86 (34.3) |
| Trade school | 2 (1.5) | 3 (1.2) |
| Some college, no degree | 19 (14.5) | 41 (16.3) |
| Associate degree | 1 (0.8) | 4 (1.6) |
| Bachelor’s degree | 11 (8.4) | 21 (8.4) |
| Master’s degree | 6 (4.6) | 6 (2.4) |
| Doctorate | 0 (0.0) | 1 (0.4) |
| Other | 2 (1.5) | 3 (1.2) |
| Unknown | 0 (0.0) | 4 (1.6) |

Table 3
Descriptive Statistics of the Functional Independence Measure of the Physical Domain (FIM__P) Over Time

| FIM__P time period | M (SD) for N = 131 sample | M (SD) for N = 251 sample |
|---|---|---|
| FIM__P at 1st (12-month) follow-up | 87.21 (5.21) | 86.37 (8.37) |
| FIM__P at 2nd (24-month) follow-up | 86.79 (5.68) | 86.42 (7.44) |
| FIM__P at 3rd (48-month) follow-up | 85.41 (11.52) | 85.96 (10.03) |
| FIM__P at 4th (60-month) follow-up | 84.40 (13.06) | 84.64 (12.64) |

In multivariate format, each row represents a participant, and each column represents a variable. All time-varying variables (i.e., allowing different variable values at different time points, such as the FIM__P score) and time-invariant variables (i.e., no change in the variable value over time, such as gender or age at the first time measure) are presented in the columns of the data. In other words, each time measure is a variable in the multivariate format data (e.g., the four different time measures for FIM__P are represented by four different variables, namely, FIM__P12, FIM__P24, FIM__P48, and FIM__P60). However, in the univariate format data, each row represents a specific time observation, and each individual can have multiple rows of observations. For participants with complete data (N = 131), each individual has four rows of data lines to represent the four different time measures (i.e., the 12-month, 24-month, 48-month, and 60-month postdischarge measures). A new variable, time, called an index variable, is created as the indicator for each row of time measures. In addition, researchers should also screen their data to check for potential errors and outliers through the steps recommended by Fidell and Tabachnick (2003; Tabachnick & Fidell, 2006). Plotting individual-level data (e.g., spaghetti plots as shown in Figure 2) before conducting any data analyses is always recommended. These plots can be useful for determining possible polynomial trends that might fit the data and can also be used to flag unusual levels or trajectories for individual participants.
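For readers working outside SAS or SPSS, the same restructuring can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the procedure used in the article: the function name wide_to_long and the two example records are hypothetical, the FIM__P values are made up, and the time coding (0, 12, 36, 48) follows the recoding used for the growth models.

```python
# Column name in the wide file paired with the recoded time value (months since first follow-up).
WAVES = [("FIM__P12", 0), ("FIM__P24", 12), ("FIM__P48", 36), ("FIM__P60", 48)]


def wide_to_long(rows, keep=("id", "age")):
    """Return one record per participant per follow-up (univariate format)."""
    long_rows = []
    for row in rows:
        for column, time in WAVES:
            # Copy the time-invariant variables onto every data line.
            record = {name: row[name] for name in keep}
            record["time"] = time
            # A missing wave is kept as None, so every participant still has four lines.
            record["FIM__P"] = row.get(column)
            long_rows.append(record)
    return long_rows


# Hypothetical wide-format records (values are invented for illustration).
wide = [
    {"id": 1, "age": 22, "FIM__P12": 88, "FIM__P24": 87, "FIM__P48": 85, "FIM__P60": 84},
    {"id": 2, "age": 51, "FIM__P12": 90, "FIM__P24": 89, "FIM__P48": None, "FIM__P60": None},
]

long_data = wide_to_long(wide)
for record in long_data:
    print(record)
```

Keeping the missing waves as explicit rows mirrors the behavior described for the SPSS conversion; dropping them instead would be an equally reasonable choice for MLM software that accepts unbalanced data.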

Model Specification and Analysis

In this section, we start with a simple random intercept model (for calculating the intraclass correlation [ICC]) and a simple linear growth model, followed by a model with a time-invariant covariate. We then move beyond the default growth models and discuss more complicated models, including models with different between- and within-individual covariance structures and models with nonlinear growth patterns. The annotated SPSS and SAS syntaxes for fitting all the growth models in this article are presented in Table 5.

Random Intercept Model and Simple Linear Growth Model

Figure 1. A: Data in multivariate format (SPSS). B: Data in univariate format (SPSS).

In MLMs for longitudinal data, the lowest level of data is the specific measurement at a particular time. This lowest level is referred to as Level 1 data. Each Level 1 measurement is nested within a particular research participant. The individual, then, constitutes Level 2 data. If there is another level of nesting (e.g., if participants are nested within schools), this would be Level 3 data, and so on. The simplest two-level model with a total of 131 participants and four repeated measures of FIM__P per individual over time can be presented as follows:

For the Level 1 (repeated measures-level) model,

$$\text{FIM\_\_P}_{ti} = \pi_{0i} + e_{ti}, \tag{1}$$

where t represents the four different measurement occasions (i.e., 12-, 24-, 48-, and 60-month follow-ups) and i represents the 131 participants (i.e., i = 1, ..., 131). $\pi_{0i}$ is the estimated average FIM__P score (over the four FIM__P scores) for the ith individual. $e_{ti}$ is the within-individual random error, which captures the difference between the observed FIM__P score at time t and the predicted (average) score of the ith participant. $e_{ti}$ is generally assumed to be normally distributed with variance equal to $\sigma^2$, that is, $e_{ti} \sim N(0, \sigma^2)$, which captures the within-individual variation. Equation 1 shows the average FIM__P score for the ith patient. In this example, we have 131 patients, so that we have 131 average FIM__P scores. We can further summarize these 131 average FIM__P scores by the following equation:

For Level 2 (individual-level) models,

$$\pi_{0i} = \beta_{00} + U_{0i}, \tag{2}$$

where $\beta_{00}$ is the grand mean of the 131 average FIM__P scores and $U_{0i}$ is the difference between the ith average FIM__P score and the grand mean. As shown in Table 6 (Model ICC), the grand mean FIM__P score was 85.954. $U_{0i}$ is assumed to be normally distributed with variance equal to $\tau_{00}$, that is, $U_{0i} \sim N(0, \tau_{00})$. In addition, the within-individual random errors ($e_{ti}$) are assumed to be independent from the between-individual random effects ($U_{0i}$). The combination of Equations 1 and 2 is called the random-intercept model, in which no predictor is included in these two equations. ICC, which measures the magnitude of dependency between observations, can be calculated by using the within- and between-individual variances (i.e., $\sigma^2$ and $\tau_{00}$) from Equations 1 and 2:

$$\text{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2}. \tag{3}$$

The ICC for this example was 43.915/(43.915 + 47.753) = .479, based on the variance estimates presented in Table 6 (Model ICC). ICC is the proportion of the between-individual variance to the sum of the between- and within-individual variances of an outcome variable and generally ranges between 0 and 1. Hox (2002) interpreted ICC as “the proportion of the variance explained by the grouping structure in the population” (p. 15). ICC can also be (roughly) viewed as the average relation between any pair of observations (i.e., the FIM__P scores) within a cluster (i.e., a patient in our example). Barcikowski (1981) showed that the Type I error rate could be inflated (e.g., from the nominal .05 level to .06) when a very small ICC (e.g., .01) occurred. ICC in educational research with cross-sectional design generally ranges between .05 and .20 (Snijders & Bosker, 1999). The relatively high ICC from this example (.479) is probably due to the longitudinal nature of the data, given that the same measure was assessed repeatedly from the same patient over time.
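The ICC arithmetic can be checked with a short Python sketch. The function name intraclass_correlation is a hypothetical helper introduced here for illustration; the two variance estimates are the Model ICC values reported in Table 6.

```python
def intraclass_correlation(tau00, sigma2):
    """ICC = between-individual variance / (between- + within-individual variance)."""
    return tau00 / (tau00 + sigma2)


# Variance estimates from Table 6 (Model ICC).
icc = intraclass_correlation(tau00=43.915, sigma2=47.753)
print(round(icc, 3))  # 0.479
```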

Table 4
SPSS and SAS Syntax for Transforming the Data From Multivariate to Univariate Format

SPSS syntax:

    VARSTOCASES
      /MAKE FIM__P FROM FIM__P12 FIM__P24 FIM__P48 FIM__P60
      /INDEX = Time(4)
      /KEEP = id age c__age
      /NULL = KEEP.
    RECODE Time (1 = 0) (2 = 12) (3 = 36) (4 = 48).

SAS syntax:

    data uni; set mult;
      FIM__P = FIM__P12; time = 0; output;
      FIM__P = FIM__P24; time = 12; output;
      FIM__P = FIM__P48; time = 36; output;
      FIM__P = FIM__P60; time = 48; output;
      keep id age c__age time FIM__P;
    run;

Note. For SPSS syntax, VARSTOCASES = call the data transformation procedure to convert the multivariate data format to univariate data format; MAKE FIM__P FROM FIM__P12 FIM__P24 FIM__P48 FIM__P60 = create new (univariate) outcome variable FIM__P from FIM__P12 to FIM__P60; INDEX = Time(4) = create an index variable to represent different data lines within each participant (e.g., individual). In this example, we create a new index variable (time) with values 1, 2, 3, and 4 to represent the four different data lines within each individual in the univariate dataset; KEEP = id age c__age = keep the time-invariant variables (e.g., id and age at the first time measure); NULL = KEEP = keep the missing data as a separate data line (e.g., if an individual has no data at both Times 3 and 4, this individual will still have four data lines in the new converted univariate dataset, with missing data shown for both the third and fourth data lines); RECODE Time (1 = 0) (2 = 12) (3 = 36) (4 = 48) = recode the values in the newly created Time variable (i.e., 1, 2, 3, and 4) to the corresponding time values we used in the example (i.e., 0, 12, 36, and 48 months). For SAS syntax, data = name the new converted univariate format data as “uni”; set = read in the original multivariate format data named “mult”; FIM__P = FIM__P12 = create new variable named FIM__P using the original variable FIM__P12; time = 0 = create new variable “time” and set the value for the first time measure as 0; output = output to the new converted univariate dataset (the same commands apply to the next three command lines for different time measures); keep id age c__age time FIM__P = include all these variables (i.e., id age c__age time FIM__P) in the new converted univariate dataset named “uni”; run = execute the data step. FIM__P = Functional Independence Measure of the physical domain.

Figure 2. Spaghetti plots of a random sample with 20 participants.

A simple linear growth model can be presented by the following equation:

For the Level 1 (repeated measures-level) model,

$$\text{FIM\_\_P}_{ti} = \pi_{0i} + \pi_{1i}\,\text{Time}_{ti} + e_{ti}. \tag{4}$$

To simplify the illustration, we coded $\text{Time}_{ti}$ as 0 for the first (i.e., 12-month) follow-up, 12 for the second (i.e., 24-month) follow-up, 36 for the third (i.e., 48-month) follow-up, and 48 for the fourth (i.e., 60-month) follow-up. Unlike in Equation 1 of the random intercept model, $\pi_{0i}$ in Equation 4 is the estimated FIM__P score for the ith individual at the first (i.e., 12-month) follow-up, when $\text{Time}_{ti}$ is equal to 0. $\pi_{1i}$ is the average monthly change in FIM__P score for the ith individual over time. $e_{ti}$ is still the within-individual random error with variance equal to $\sigma^2$, which captures the within-individual variation, that is, $e_{ti} \sim N(0, \sigma^2)$. More discussion on modeling the within-individual variation is given in a later section. Equation 4 shows the regression model based on the four FIM__P scores for the ith participant.

Indeed, we can fit the same model to the 131 participants separately. Hence, we can have 131 different sets of regression coefficients (i.e., the intercept, $\pi_0$, and the average monthly change, $\pi_1$). We can summarize these 131 sets of parameter estimates by the following two equations:

For Level 2 (individual-level) models,

$$\pi_{0i} = \beta_{00} + U_{0i} \tag{5}$$

$$\pi_{1i} = \beta_{10} + U_{1i}, \tag{6}$$

where $\beta_{00}$ is the average score of FIM__P at the initial time point (i.e., $\text{Time}_{ti} = 0$) and $\beta_{10}$ is the average monthly change in FIM__P over the 131 participants. Both $U_{0i}$ and $U_{1i}$ are the between-individual random effects and are assumed to be normally distributed, that is,

$$\begin{pmatrix} U_{0i} \\ U_{1i} \end{pmatrix} \sim N(\mathbf{0}, \mathbf{T}), \quad \text{where} \quad \mathbf{T} = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{pmatrix}.$$

$U_{0i}$ captures the difference between the intercept ($\pi_{0i}$) of the ith participant and the average intercept $\beta_{00}$, and $U_{1i}$ captures the difference between the estimated monthly change in FIM__P ($\pi_{1i}$) of the ith participant and the average monthly change in FIM__P ($\beta_{10}$) across the 131 participants. The variances of $U_{0i}$ and $U_{1i}$ are $\tau_{00}$ and $\tau_{11}$, respectively, which capture the between-individual variation.

Table 5
Syntaxes for Analyzing Different Models in SAS and SPSS

Model ICC

SPSS syntax:

    Mixed fim__p
      /fixed = intercept
      /random = intercept | subject(individual__id) covtype(UN)
      /print = solution testcov.

SAS syntax:

    proc mixed data = uni covtest;
      class individual__id;
      model fim__p = /solution;
      random intercept / type = un subject = individual__id;
    run;

Model A

SPSS syntax:

    Mixed fim__p with time
      /fixed = intercept time
      /random = intercept time | subject(individual__id) covtype(UN)
      /print = solution testcov.

SAS syntax:

    proc mixed data = uni covtest;
      class individual__id;
      model fim__p = time /solution;
      random intercept time / type = un subject = individual__id;
    run;

Model B

SPSS syntax:

    Mixed fim__p with time c__age
      /fixed = intercept time c__age time*c__age
      /random = intercept time | subject(individual__id) covtype(UN)
      /print = solution testcov.

SAS syntax:

    proc mixed data = uni covtest;
      class individual__id;
      model fim__p = time c__age time*c__age /solution;
      random intercept time / type = un subject = individual__id;
    run;

Model C1

SPSS syntax:

    Mixed fim__p with time
      /fixed = intercept time
      /random = intercept time | subject(individual__id) covtype(diag)
      /print = solution testcov.

SAS syntax:

    proc mixed data = uni covtest;
      class individual__id;
      model fim__p = time /solution;
      random intercept time / type = un(1) subject = individual__id;
    run;

Model C2

SPSS syntax:

    Mixed fim__p with time c__age
      /fixed = intercept time c__age time*c__age
      /random = intercept time | subject(individual__id) covtype(diag)
      /print = solution testcov.

SAS syntax:

    proc mixed data = uni covtest;
      class individual__id;
      model fim__p = time c__age time*c__age /solution;
      random intercept time / type = un(1) subject = individual__id;
    run;

Model C3

SPSS syntax:

    Mixed fim__p with time c__age by index1
      /fixed = intercept time c__age time*c__age
      /random = intercept time | subject(individual__id) covtype(diag)
      /repeated = index1 | subject(individual__id) covtype(ar1)
      /print = solution testcov.

SAS syntax:

    proc mixed data = uni covtest;
      class individual__id;
      model fim__p = time c__age time*c__age /solution;
      random intercept time / type = un(1) subject = individual__id;
      repeated / type = ar(1) subject = individual__id;
    run;

Model D

SPSS syntax: same as Model C3. SAS syntax: same as Model C3.

Note. For SPSS syntax (Model C3 as example), Mixed fim__p with time c__age by index1 = call the Mixed procedure in SPSS and identify the dependent variable (fim__p) with the continuous predictors (time and c__age) by the categorical predictor or predictors (index1: label for each data line within each individual); fixed = specify the average growth (or fixed-effect) model; random = specify the random effects (or request for the estimation of the between-individual covariance matrix); subject(individual__id) = specify the Level 2 cluster ID (i.e., individual id); covtype(diag) (in the “random” command line) = specify the structure of the between-individual covariance matrix as diagonal (diag) structure; repeated = index1 = request for the estimation of the within-individual covariance matrix; covtype(ar1) (in the “repeated” command line) = specify the structure of the within-individual covariance matrix as the first-order autoregressive structure; print = solution = request for the growth parameter estimates (e.g., the intercept and the slope for time) and their corresponding standard errors; testcov = request for the tests of the parameter estimates in the random effects and errors.
For SAS syntax (Model C3 as example), proc mixed = call the proc mixed procedure in SAS; data = the data set for the analysis; covtest = request for the tests of the parameter estimates in the random effects; class = specify the categorical variable individual__id; model = specify the average growth (or fixed-effect) model; solution = request for the growth parameter estimates (e.g., the intercept and the slope for time) and their corresponding standard errors; random = specify the random effects (or request for the estimation of the between-individual covariance matrix); type = un(1) (next to the “random” command) = specify the structure of the between-individual covariance matrix as un(1) structure (same as the diagonal structure in SPSS); subject = specify the Level 2 cluster ID (i.e., individual__id); repeated = request for the estimation of the within-individual covariance matrix; type = ar(1) (next to the “repeated” command) = specify the structure of the within-individual covariance matrix as the first-order autoregressive structure; run = execute the procedure. ICC = intraclass correlation; fim__p = Functional Independence Measure of the physical domain.

The meaning of $\tau_{00}$, $\tau_{11}$, and the covariance, $\tau_{01}$ (or $\tau_{10}$), between the two random effects (i.e., $U_{0i}$ and $U_{1i}$) can be further explained through visualization. As shown in Figure 3A, the average model is the bolded straight line with the average intercept $\beta_{00}$ (i.e., the average FIM__P score at the first time measure) and the average monthly change $\beta_{10}$, and the other straight lines are the individual predicted models for different participants. The variation between the 131 intercepts (of the 131 regression models) and the average intercept $\beta_{00}$ is captured by $\tau_{00}$, and the variation between the 131 monthly changes and the average monthly change $\beta_{10}$ is captured by $\tau_{11}$. When $\tau_{00}$ is equal to zero, as shown in Figure 3B, all the intercepts of the 131 regression models are the same as the average intercept $\beta_{00}$, and all 131 regression models pass through $\beta_{00}$. However, when $\tau_{11}$ is equal to zero, all the predicted monthly changes of the 131 regression models are the same as the average monthly change $\beta_{10}$, and all 131 regression models are parallel to the average model (i.e., the bolded straight line), as shown in Figure 3C. Figure 3D shows a possible pattern for a positive covariance $\tau_{01}$. That is, individuals who have a higher FIM__P score at the first time measure are more likely to have a larger predicted monthly change in FIM__P than are those who score lower on the FIM__P at the first time point.

Consistent with the mean pattern shown in Table 3, the FIM__P score decreased over time (see Model A in Table 6). The average FIM__P score at the first time measure (i.e., 12-month follow-up) was 87.351, and it decreased 0.058 point per month (or 0.70 point per year). On average, the patients reported worse functional abilities as time passed. The variances of the two random effects (i.e., $\tau_{00}$ and $\tau_{11}$) were statistically significant, whereas the covariance $\tau_{01}$ was not, which indicated a significant amount of variation between the 131 individual regression models and the average model. The significance of these two random-effect variances also implied that some potential individual-related variables might be able to explain the variation between individual regression models and the average model. The significant variance of the within-individual random error, $\sigma^2$, indicated a significant amount of variation between the observations at different time points and the individual regression model within each person.
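One way to see what the Model A estimates imply is to compute the model-implied mean and standard deviation of FIM__P at each follow-up. The sketch below is not part of the original analysis; it uses the standard result that, for a linear growth model with random intercept and slope, the variance of the outcome at time t is $\tau_{00} + 2t\tau_{01} + t^2\tau_{11} + \sigma^2$. The function names are hypothetical, and the estimates are taken from Table 6 (Model A).

```python
import math

# Model A estimates from Table 6.
BETA00, BETA10 = 87.351, -0.058              # average intercept and average monthly change
TAU00, TAU11, TAU01 = 15.253, 0.061, 0.024   # random-effect variances and their covariance
SIGMA2 = 17.073                              # within-individual error variance


def implied_mean(time):
    """Average predicted FIM__P score at a given time (months since first follow-up)."""
    return BETA00 + BETA10 * time


def implied_variance(time):
    """Var(U0 + U1 * t + e) = tau00 + 2 * t * tau01 + t^2 * tau11 + sigma^2."""
    return TAU00 + 2 * time * TAU01 + time ** 2 * TAU11 + SIGMA2


for time in (0, 12, 36, 48):
    mean = implied_mean(time)
    sd = math.sqrt(implied_variance(time))
    print(f"time = {time:2d}: implied M = {mean:.2f}, implied SD = {sd:.2f}")
```

Under these estimates the implied standard deviation grows from roughly 5.7 at the first follow-up to roughly 13.2 at the last, which is in line with the increasing observed standard deviations in Table 3 (5.21 and 13.06 for the N = 131 sample).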

Table 6
Parameter Estimates for Different Models

| Parameter | Model ICC | Model A | Model B | Model C1 | Model C2 | Model C3 | Model D |
|---|---|---|---|---|---|---|---|
| Fixed effects | | | | | | | |
| Intercept ($\beta_{00}$) | 85.954 | 87.351 | 87.351 | 87.351 | 87.351 | 88.378 | 86.476 |
| SE | .653 | .448 | .448 | .450 | .448 | .447 | .492 |
| p | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 | <.001 |
| Time ($\beta_{10}$) | | −.058 | −.058 | −.058 | −.058 | −.058 | −.035 |
| SE | | .024 | .023 | .024 | .023 | .023 | .017 |
| p | | .015 | .014 | .015 | .014 | .014 | .040 |
| c__Age ($\beta_{01}$) | | | −.028 | | −.028 | −.022 | −.116 |
| SE | | | .026 | | .026 | .026 | .028 |
| p | | | .294 | | .294 | .411 | <.001 |
| Time×c__Age ($\beta_{11}$) | | | −.003 | | −.003 | −.003 | −.003 |
| SE | | | .001 | | .001 | .001 | .001 |
| p | | | .043 | | .042 | .037 | .008 |
| Random effects | | | | | | | |
| $\tau_{00}$ | 43.915 | 15.253 | 15.231 | 15.459 | 15.222 | 22.004 | 55.384 |
| $\tau_{11}$ | | .061 | .059 | .062 | .059 | .067 | .051 |
| $\tau_{01}$ | | .024 | −.001 | | | | |
| $\sigma^2$ | 47.753 | 17.073 | 17.073 | 17.019 | 17.075 | 11.662 | 10.302 |
| AR(1) autocorrelation | | | | | | −.450 | −.403 |
| Overall model test | | | | | | | |
| −2LL | 3,713.001 | 3,449.097 | 3,459.544 | 3,449.133 | 3,459.544 | 3,445.827 | 5,295.452 |
| AIC | 3,717.001 | 3,457.097 | 3,467.544 | 3,455.133 | 3,465.544 | 3,453.827 | 5,303.452 |
| BIC | 3,725.520 | 3,474.128 | 3,484.559 | 3,467.906 | 3,478.306 | 3,470.842 | 5,322.125 |

Note. N = 131 for Models ICC, A, B, C1, C2, and C3; N = 251 for Model D. Values in bold and italics are not statistically significant (p > .05). Model ICC contains no predictor; Model A has time as a predictor, and Model B has both time and $\text{c\_\_Age}_i$ as predictors. The model specification between Models A and C1 is exactly the same, except that $\tau_{01}$ is estimated in Model A but not in Model C1. Similarly, the model specification between Models B and C2 is exactly the same, except that $\tau_{01}$ is estimated in Model B but not in Model C2. The only difference between Models C2 and C3 is in the specification of the within-individual variance–covariance matrix: C2 is specified with the default identity (ID) structure, whereas C3 is specified with the first-order autoregressive, or AR(1), structure. Model specification for Models C3 and D is exactly the same, except that Model C3 includes patients with all four repeated measures (N = 131), whereas Model D includes all patients (N = 251). ICC = intraclass correlation; −2LL = −2 log likelihood; AIC = the Akaike information criterion; BIC = the Bayesian information criterion.


Model With Time-Invariant Covariate

Because of the two significant random-effect variances, we can further examine some potential individual-related predictors that may be able to account for the variation between individual regression models and the average model. Many individual-related variables are time-invariant variables because the values of these variables are the same over time; examples include a participant’s gender and ethnicity. We use the grand mean-centered participant’s age at the first time measure (i.e., $\text{c\_\_Age}_i = \text{Age}_i - \text{m\_\_Age}$) as the person-related predictor for the demonstration. $\text{Age}_i$ is the initial age or age at the first time measure of the ith participant, and m__Age is the mean age at the first time measure over the 131 participants. For example, as shown in Figures 1A and 1B, the initial age for the first participant (i.e., patient__id = 1100067) was 22, and the mean-centered age for this individual was −23.82 years, given that the mean initial age for the 131 participants was 45.82 years (i.e., −23.82 = 22 − 45.82). A major reason for using the centered age is to have a meaningful interpretation for the intercept. Biesanz, Deeb-Sossa, Papadakis, Bollen, and Curran (2004) discussed the centering issue (especially on centering the time variable) in longitudinal analysis. More discussion of centering in the general MLM framework can be found in Kreft, DeLeeuw, and Aiken (1995) and Enders and Tofighi (2007).

In our example, the Level 1 (repeated measures-level) model is the same as shown in Equation 4, whereas the Level 2 (individual-level) models are as follows:

$$\pi_{0i} = \beta_{00} + \beta_{01}\,\text{c\_\_Age}_i + U_{0i} \tag{7}$$

$$\pi_{1i} = \beta_{10} + \beta_{11}\,\text{c\_\_Age}_i + U_{1i}. \tag{8}$$

By substituting Equations 7 and 8 back into Equation 4, we have the following:

$$\text{FIM\_\_P}_{ti} = \beta_{00} + \beta_{01}\,\text{c\_\_Age}_i + \beta_{10}\,\text{Time}_{ti} + \beta_{11}\,\text{c\_\_Age}_i \times \text{Time}_{ti} + U_{0i} + U_{1i} \times \text{Time}_{ti} + e_{ti}, \tag{9}$$

where the first four terms (i.e., $\beta_{00}$, $\beta_{01}\,\text{c\_\_Age}_i$, $\beta_{10}\,\text{Time}_{ti}$, and $\beta_{11}\,\text{c\_\_Age}_i \times \text{Time}_{ti}$) are the fixed effects that capture the average model. The last three terms (i.e., $U_{0i}$, $U_{1i} \times \text{Time}_{ti}$, and $e_{ti}$) are the random effects that capture the variation between individual regression models and the average model (i.e., $U_{0i}$ and $U_{1i} \times \text{Time}_{ti}$) and the variation between individual observations and the regression model within each person (i.e., $e_{ti}$). The results are presented in Table 6 (Model B).

Because of the added cross-level interaction effect (i.e., $\text{c\_\_Age}_i \times \text{Time}_{ti}$ in Equation 9), the regression coefficients of the lower order terms (i.e., $\beta_{00}$ of the intercept, $\beta_{01}$ of $\text{c\_\_Age}_i$, and $\beta_{10}$ of $\text{Time}_{ti}$) are conditional terms and have to be interpreted along with the interaction term. For example, the regression coefficient of $\text{Time}_{ti}$, −.058, was the average monthly change in FIM__P score not for all 131 participants but for those participants whose initial age or age at the first time measure was equal to 45.82 years, because the values of $\text{c\_\_Age}_i$ for these individuals were equal to zero. Similarly, the intercept, 87.351, was the average FIM__P score at the first time measure not for all 131 participants but for those participants whose initial age was equal to 45.82 years. The nonsignificant coefficient of $\text{c\_\_Age}_i$, $\beta_{01}$, indicated that there was no relation between the mean-centered age and the FIM__P score at $\text{Time}_{ti} = 0$.

Figure 3. A: The visual demonstration of $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ ($\tau_{00} > 0$ and $\tau_{11} > 0$). B: The visual demonstration of $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ ($\tau_{00} = 0$ and $\tau_{11} > 0$). C: The visual demonstration of $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ ($\tau_{00} > 0$ and $\tau_{11} = 0$). D: The visual demonstration of $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ ($\tau_{00} > 0$, $\tau_{11} > 0$, and $\tau_{10} = \tau_{01} > 0$).

To understand the meaning of the interaction effect, $\text{c\_\_Age}_i \times \text{Time}_{ti}$, one can use the steps suggested by Aiken and West (1991) to decompose the interaction effect. The general idea of the Aiken and West (1991) procedure is that one can use a two-dimensional figure to present a three-dimensional relationship (i.e., two predictors, $\text{c\_\_Age}_i$ and $\text{Time}_{ti}$, and the outcome variable, $\text{FIM\_\_P}_{ti}$). It is quite straightforward to decompose the interaction effect in a longitudinal analysis, in which the y-axis is always the outcome variable ($\text{FIM\_\_P}_{ti}$) and the x-axis is always the $\text{Time}_{ti}$ predictor. Hence, one only needs to substitute some meaningful values for the second predictor (i.e., the predictor other than $\text{Time}_{ti}$, which is the time-invariant covariate, $\text{c\_\_Age}_i$). The three commonly used values for substitution are the mean of the predictor, one standard deviation above the mean value of the predictor, and one standard deviation below the mean value of the predictor. Hence, the three values of $\text{c\_\_Age}_i$ we used for decomposing and plotting the interaction effect were −17.02 (one standard deviation below the mean of $\text{c\_\_Age}_i$), 0 (mean of $\text{c\_\_Age}_i$), and 17.02 (one standard deviation above the mean of $\text{c\_\_Age}_i$). The corresponding predicted model for each specific $\text{c\_\_Age}_i$ value is shown below:

For younger participants (with $\text{c\_\_Age}_i = -17.02$ years or original $\text{Age}_i = 45.82 - 17.02 = 28.80$ years),

$$\begin{aligned} \text{FIM\_\_P}_{ti} &= 87.351 - 0.058\,\text{Time}_{ti} - 0.028\,\text{c\_\_Age}_i - 0.003\,\text{Time}_{ti} \times \text{c\_\_Age}_i \\ &= 87.351 - 0.058\,\text{Time}_{ti} - 0.028(-17.02) - 0.003\,\text{Time}_{ti}(-17.02) \\ &= 87.828 - 0.007\,\text{Time}_{ti} \end{aligned}$$

For mean age participants (with $\text{c\_\_Age}_i = 0$ years or $\text{Age}_i = 45.82$ years),

$$\begin{aligned} \text{FIM\_\_P}_{ti} &= 87.351 - 0.058\,\text{Time}_{ti} - 0.028(0) - 0.003\,\text{Time}_{ti}(0) \\ &= 87.351 - 0.058\,\text{Time}_{ti} \end{aligned}$$

For older participants (with $\text{c\_\_Age}_i = 17.02$ years or $\text{Age}_i = 45.82 + 17.02 = 62.84$ years),

$$\begin{aligned} \text{FIM\_\_P}_{ti} &= 87.351 - 0.058\,\text{Time}_{ti} - 0.028(17.02) - 0.003\,\text{Time}_{ti}(17.02) \\ &= 86.874 - 0.109\,\text{Time}_{ti} \end{aligned}$$

Figure 4 presents the three predicted models for each of the three age groups. On one hand, the younger participants had slightly higher FIM__P scores at the first time measure, even though these scores were not significantly different from those in other age groups. On the other hand, the rate of decline of the FIM__P scores was significantly slower in the younger participant group (i.e., −0.007 point/month) than in the older participant group (i.e., −0.109 point/month). In other words, older individuals experienced a steeper rate of decline in functional abilities over time than did younger individuals.

Figure 4. Decomposing the $\text{c\_\_Age}_i \times \text{Time}_{ti}$ interaction effect.
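The decomposition of the cross-level interaction can be reproduced with a small Python sketch. The function name simple_trajectory is a hypothetical helper; the coefficients are the rounded Model B fixed effects from Table 6, and 17.02 is the standard deviation of age in the N = 131 sample (Table 2).

```python
# Model B fixed-effect estimates from Table 6 (rounded as reported).
B00, B10, B01, B11 = 87.351, -0.058, -0.028, -0.003
SD_AGE = 17.02  # standard deviation of age in the N = 131 sample


def simple_trajectory(c_age):
    """Return (intercept, monthly slope) of the predicted line at a given centered age."""
    intercept = B00 + B01 * c_age   # predicted FIM__P at Time = 0
    slope = B10 + B11 * c_age       # predicted monthly change in FIM__P
    return intercept, slope


for label, c_age in (("younger", -SD_AGE), ("mean age", 0.0), ("older", SD_AGE)):
    intercept, slope = simple_trajectory(c_age)
    print(f"{label}: FIM__P = {intercept:.3f} + ({slope:.3f}) * Time")
```

Evaluating the function at other centered ages (for example, two standard deviations from the mean) gives additional lines for a plot such as Figure 4 without any further model fitting.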

We can also evaluate the effectiveness of the time-invariant covariate, $\text{c\_\_Age}_i$, in explaining the between-individual variation by using the Pseudo-$R^2$ statistic (Raudenbush & Bryk, 2002; Singer & Willett, 2003), as shown below:

$$\text{Pseudo-}R^2_{\tau_{00}} = \frac{\tau_{00\_\text{Unconditional}} - \tau_{00\_\text{Conditional}}}{\tau_{00\_\text{Unconditional}}}$$

and

$$\text{Pseudo-}R^2_{\tau_{11}} = \frac{\tau_{11\_\text{Unconditional}} - \tau_{11\_\text{Conditional}}}{\tau_{11\_\text{Unconditional}}}$$

for $\tau_{00}$ and $\tau_{11}$, respectively. $\tau_{00\_\text{Unconditional}}$ and $\tau_{11\_\text{Unconditional}}$ are the variances of the random effects for the model without the time-invariant covariate $\text{c\_\_Age}_i$, whereas $\tau_{00\_\text{Conditional}}$ and $\tau_{11\_\text{Conditional}}$ are the variances of the random effects for the model with $\text{c\_\_Age}_i$. Hence, the Pseudo-$R^2$ statistic is the proportion of variance in the random effect explained by the time-invariant covariate. Based on the information in Table 6, Models A and B, the Pseudo-$R^2$ statistics for $\tau_{00}$ and $\tau_{11}$ are as follows:

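$$\text{Pseudo-}R^2_{\tau_{00}} = \frac{\tau_{00\_\text{Unconditional}} - \tau_{00\_\text{Conditional}}}{\tau_{00\_\text{Unconditional}}} = \frac{15.253 - 15.231}{15.253} = .001 \text{ or } .1\%$$

and

$$\text{Pseudo-}R^2_{\tau_{11}} = \frac{\tau_{11\_\text{Unconditional}} - \tau_{11\_\text{Conditional}}}{\tau_{11\_\text{Unconditional}}} = \frac{.061 - .059}{.061} = .033 \text{ or } 3.3\%.$$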

That is, the initial age or age at the first follow-up measure (i.e., $\text{c\_\_Age}_i$) could only explain 0.1% of the variance in $\tau_{00}$ (i.e., the variation of the FIM__P score over the 131 participants at the first time point) but 3.3% of the variance in $\tau_{11}$ (i.e., the variation of the monthly change in FIM__P score over the 131 participants). Given that the explained variance is the analog of the squared multiple correlation change in the OLS regression, we can adopt Cohen’s (1988) guideline (i.e., .02, .13, and .26 in squared multiple correlation change representing small, medium, and large effects, respectively) and conclude that the initial age has no effect on predicting the functional abilities at the initial time point (i.e., at the 12-month follow-up) but a small effect on predicting the linear rate of change in the functional abilities over time. These findings are consistent with the tests of significance for the individual parameter estimates as shown in Table 6, Model B (i.e., the p value of $\beta_{01}$ was .29, whereas the p value of $\beta_{11}$ was less than .05). In addition, these small explained variances imply the omission of other important time-invariant covariates in the model, and further examination of the model is needed.
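The Pseudo-$R^2$ arithmetic can be verified with the following Python sketch. The function name pseudo_r2 is a hypothetical helper; the variance estimates are the Model A (unconditional) and Model B (conditional) values from Table 6.

```python
def pseudo_r2(unconditional, conditional):
    """Proportional reduction in a random-effect variance after adding a covariate."""
    return (unconditional - conditional) / unconditional


# tau00 and tau11 from Table 6: Model A (without c__Age) versus Model B (with c__Age).
r2_tau00 = pseudo_r2(unconditional=15.253, conditional=15.231)
r2_tau11 = pseudo_r2(unconditional=0.061, conditional=0.059)

print(round(r2_tau00, 3))  # 0.001
print(round(r2_tau11, 3))  # 0.033
```

Note that the function returns a negative value whenever the conditional variance is larger than the unconditional one, which is how a negative Pseudo-$R^2$ arises.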

The advantage of using Pseudo-R² is that it provides an easy-to-use and understandable measure of effect size. However, unlike the regular R², a negative Pseudo-R² can be obtained, especially when Level 1 (repeated-measures/within-individual-level) predictors contribute only to the within-individual variation and not to the between-individual variation. This can also be explained by the compensatory relation between the within-individual variance and the between-individual variance (Kwok et al., 2007; Snijders & Bosker, 1994, 1999). Snijders and Bosker (1994, 1999) provided more discussion on the use of Pseudo-R² and suggested an alternative way (see Footnote 3) of calculating the R² for different levels to prevent the negative explained variance.

In addition to the explained variance, researchers can compare different MLMs using the information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The general guideline for using these information criteria is to select the model with the smallest value on either the AIC or the BIC. Additional guidelines on comparing models using the BIC are available (Raftery, 1996). There is no substantial difference between two models if the BIC difference is less than 2. However, there is a substantial difference between two models if the difference between the two BIC values is larger than 10. Nevertheless, a few studies have shown that the effectiveness of using these information criteria on model selection (especially on selecting the correct variance–covariance matrix of the random effects and errors) was relatively low (Keselman, Algina, Kowalchuk, & Wolfinger, 1998). Hence, these information criteria should be used with caution and should not be used as the sole criterion for selecting models without considering both theoretical explanation and application of the model.
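The two BIC cutoffs mentioned above can be written as a small decision rule. The sketch below is only an illustration: the function name `compare_bic` and the BIC values passed to it are hypothetical (they are not taken from Table 6), and differences between 2 and 10 are left unlabeled because the text does not characterize them.

```python
def compare_bic(bic_a: float, bic_b: float) -> str:
    """Label the absolute BIC difference using the two cutoffs cited in the text."""
    difference = abs(bic_a - bic_b)
    if difference < 2:
        return "no substantial difference"
    if difference > 10:
        return "substantial difference"
    # The text gives no label for differences between 2 and 10.
    return "between the two cutoffs"


# Hypothetical BIC values, for illustration only.
print(compare_bic(3470.2, 3471.5))  # difference of about 1.3
print(compare_bic(3470.2, 3485.9))  # difference of about 15.7
```

As the paragraph above cautions, such a rule is at most one input to model selection and should be weighed against theory and the intended use of the model.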

Beyond the Default Models

In previous sections, we discussed models that are the common/default models in most statistical programs. These models always come with some default assumptions, specifically on the random-effect part of the model. For example, a specific covariance structure, an identity structure (i.e., σ²I), is always assumed for the covariance structure of the within-individual variation (i.e., e_ti), which may not be applicable, especially for longitudinal analysis (Kwok et al., 2007). The misspecification of random effect covariance structure may result in biased estimation of the variances of the random effects, which in turn may affect the estimation of the standard errors and the test of significance of the fixed effects.

One of the advantages of using MLM for analyzing longitudinal data is that MLM offers great flexibility in modeling the covariance structure for both between-individual random effects and within-individual random errors. Researchers can search for the optimal covariance structure, which theoretically results in the highest statistical power and increases the precision of estimates of the fixed effects (Davis, 2002; Diggle, Heagerty, Liang, & Zeger, 2002; Keselman, Algina, & Kowalchuk, 2001; Singer & Willett, 2003; Wolfinger, 1996).

Footnote 3. By fitting an MLM with a random effect associated only with the intercept (i.e., not with a random effect associated with any coefficients other than the intercept), Snijders and Bosker (1994, 1999) provided an alternative way to calculate the explained variance by redefining the within-individual variance as σ² and the between-individual variance as σ²/n + τ₀₀, in which n is the cluster size (i.e., the number of observations per cluster). For unbalanced design, n can be the harmonic mean of the cluster size across all clusters.

Modeling the covariance structure for the between-individual random effects. Recall that we examined two models, a simple linear growth model (Model A) and a model with a time-invariant covariate, c_Age_i (Model B). In these two models, we estimated three elements (i.e., τ₀₀, τ₁₁, and τ₀₁) in the covariance structure for the between-individual random effects. As shown in Table 6 (Models A and B), the covariance, τ₀₁, was not significant in either one of the models. The covariance structure used for these two models is called unstructured (UN), in which all unique elements in the covariance structure are free for estimation:

$$V\begin{bmatrix} U_{0i} \\ U_{1i} \end{bmatrix} = \begin{bmatrix} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{bmatrix}.$$

Because of the nonsignificance of the covariance (i.e., τ₀₁ not significantly different from zero), we have fitted the same two models (i.e., Models C1 and C2 in Table 6) with a simpler covariance structure, namely, UN(1) structure (in SAS) or DIAG structure (in SPSS) as presented below:

$$V\begin{bmatrix} U_{0i} \\ U_{1i} \end{bmatrix} = \begin{bmatrix} \tau_{00} & 0 \\ 0 & \tau_{11} \end{bmatrix},$$

in which only the variances but not the covariance of the random effects are estimated. In other words, τ₀₁ is constrained to zero, which implies that there is no relation between the functional abilities at the initial time point (i.e., the 12-month follow-up) and the rate of change in the functional abilities over time across individuals.

By using the likelihood ratio test, we can compare Models A and C1 with their −2 log likelihood (−2LL) values. Because the only difference between these two models is the covariance τ₀₁, which has been estimated in Model A but not in Model C1 (i.e., τ₀₁ = 0 in Model C1), the difference in the −2LL values of these two nested models follows a chi-square distribution with 1 degree of freedom. The likelihood ratio test, χ²(1) = (−2LL_Model C1) − (−2LL_Model A) = 3,449.133 − 3,449.097 = 0.036, is not statistically significant (p = .850), which means that τ₀₁ is not significantly different from zero. (The constrained model, Model C1, cannot have the smaller −2LL value, so the statistic is the larger value minus the smaller one.) The estimated τ₀₀ and τ₁₁ in Model C1 are slightly larger than the ones in Model A due to the redistribution of the variance between the random effects (Luo & Kwok, 2006; Meyers & Beretvas, 2006). Then, we added the time-invariant covariate, c_Age_i, back in Model C1, and the results of this model (Model C2) are presented in Table 6. Basically, the only difference on the model specification between Models B and C2 is that τ₀₁ is estimated in Model B but constrained to zero in Model C2. We can use the same equation presented previously to obtain the Pseudo-R² statistics for the changes in τ₀₀ and τ₁₁ after including c_Age_i in the model. By constraining τ₀₁ to zero in Models C1 and C2, we found that the Pseudo-R² statistics for both τ₀₀ and τ₁₁ are larger (i.e., R² = .015 for τ₀₀ and R² = .048 for τ₁₁) than the ones based on Models A and B (i.e., R² = .001 for τ₀₀ and R² = .033 for τ₁₁). Instead of using the default setting, one might find that modeling the variance–covariance matrix of the between-individual random effects may result in higher explained variances.
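The likelihood ratio test described above can be checked by hand. The following Python sketch is an illustration under stated assumptions rather than the procedure used by SAS or SPSS: the helper name `lrt_df1` is hypothetical, it covers only the 1-degree-of-freedom case, and it relies on the identity that the upper-tail probability of a chi-square variable with 1 degree of freedom equals erfc(√(x/2)). The statistic is taken as the absolute difference of the two −2LL values, because the constrained model cannot have the smaller −2LL.

```python
import math
from typing import Tuple


def lrt_df1(deviance_1: float, deviance_2: float) -> Tuple[float, float]:
    """Likelihood ratio test for two nested models that differ by one parameter.

    Both arguments are -2 log likelihood values; the order does not matter.
    Returns the chi-square statistic and its p value for 1 degree of freedom.
    """
    chi_square = abs(deviance_1 - deviance_2)
    # Upper-tail probability of a chi-square(1) variable: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# -2LL values quoted in the text for Models A and C1.
chi_square, p_value = lrt_df1(3449.133, 3449.097)
print(f"chi-square(1) = {chi_square:.3f}, p = {p_value:.3f}")
```

With the two −2LL values quoted above, this reproduces the reported χ²(1) = 0.036 and p ≈ .850. The same helper applies to any other pair of nested models that differ by a single covariance parameter.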

Modeling the covariance structure for the within-individual random errors. As Kwok et al. (2007) pointed out, when analyzing longitudinal data under the MLM framework, researchers typically assume the within-individual errors to be independently and identically distributed, with mean zero and homogeneous variance σ² for all participants, that is, e ~ N(0, σ²I). The simplification of the within-individual covariance structure (i.e., σ²I, also named identity structure) may bias the estimation of the standard errors of the fixed effects, which in turn may lead to incorrect statistical inferences for the fixed effects. Singer and Willett (2003) also emphasized that obtaining an adequate within-individual covariance structure is a key element in estimating the proper effect size and properly accounting for missing values in multilevel data. Thus, choosing the optimal error structure is an important task in MLM. As Campbell and Kenny (1999) stated,

The correlational structure of longitudinal data almost always has a proximally autocorrelated structure: adjacent waves of measurement correlate more highly than nonadjacent waves, and more remote in time, the lower the correlation (Campbell & Reichardt, 1991; Kenny & Campbell, 1989). . . . except for data that are highly cyclical (Warner, 1998), proximal autocorrelation is the norm. (p. 121)

Given that the proximal autocorrelation is a common phenomenon in longitudinal data, the first-order autoregression, or AR(1), structure was also fitted to the within-individual covariance of our example data. AR(1) is one of the covariance structures commonly used by researchers when analyzing longitudinal data, and it has been widely applied in the latent growth models (Bollen & Curran, 2004, 2006; Curran & Bollen, 2001). AR(1) contains two parameters (the error variance σ² and the autocorrelation coefficient ρ). An example of AR(1) with four repeated measures is shown below (compared with the default identity [ID] structure in Model C2):

$$\text{AR1}_{\text{Model\_C3}} = \sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix} \quad \text{vs.} \quad \text{ID}_{\text{Model\_C2}} = \sigma^2 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

The results based on the AR(1) structure are shown in Table 6 (Model C3). The only difference between Models C2 and C3 is in the within-individual covariance, in which Model C2 has the default identity structure (i.e., σ²I), whereas Model C3 has the AR(1) structure. The difference between these two models on fitting the within-individual covariance matrix can be examined by the likelihood ratio test, given that the default identity structure is nested within the AR(1) structure. The significant difference between the −2LL values of these two models, χ²(1) = 3,459.544 − 3,445.827 = 13.717, p < .001, indicated that the AR(1) structure (Model C3) fitted the within-individual covariance matrix better than did the default identity structure (Model C2). Nevertheless, most of the parameter estimates of Model C3 were very similar to the ones in Model C2. One noticeable difference between the two models is that the p value of the Time × c_Age interaction effect decreased (i.e., from p = .042 in Model C2 to p = .037 in Model C3), which implicitly showed the increment in statistical power after modeling the within-individual covariance as the AR(1) structure. In addition, the negative autocorrelation coefficient (ρ = −.450) indicated a tendency of the FIM_P score to oscillate within patients over time (after controlling for the average linear growth trend). That is, a negative relation (ρ = −.450) between the adjacent time points was found, whereas a smaller positive relation (ρ² = .202) between the nonadjacent time points (i.e., the first and third time points and the second and fourth time points) was found.
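To make the AR(1) pattern concrete, the correlation part of the structure can be generated directly from ρ. The Python sketch below is an illustration only; the function name `ar1_correlation` is a hypothetical helper, and the matrix must still be multiplied by the error variance σ² (whose estimate is reported in Table 6, not here) to obtain the within-individual covariance matrix.

```python
from typing import List


def ar1_correlation(n_waves: int, rho: float) -> List[List[float]]:
    """Return the n_waves x n_waves AR(1) correlation matrix.

    The entry in row i and column j is rho raised to the power |i - j|,
    so the correlation depends only on the lag between two waves.
    """
    return [[rho ** abs(i - j) for j in range(n_waves)] for i in range(n_waves)]


rho = -0.450  # autocorrelation estimate quoted in the text (Model C3)
for row in ar1_correlation(4, rho):
    print("  ".join(f"{value:7.3f}" for value in row))
```

With a negative ρ, the sign alternates with the lag: lag-1 entries equal ρ, lag-2 entries equal ρ² (about .2025, reported in the text as .202), and lag-3 entries equal ρ³, which is the oscillating pattern described above.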

Handling Missing Data

Missing data may occur when individuals are absent from one or more data collection occasions. MLM addresses this issue by taking all observations into account regardless of the design of the study. In our example data set, there were 251 individuals who had at least one response in the four time measures. Among these 251 individuals, 27 responded at only one time point, 39 responded at two time points, 54 responded at three time points, and 131 responded at all four time points. Because of the required multivariate data format (see Figure 1A) for a repeated measures UANOVA analysis, listwise deletion would be adopted, and only the 131 individuals with complete data could be included (see Footnote 4). However, all 251 individuals can be included in the analysis when MLM is used. The results based on all 251 individuals are presented in Table 6, Model D. Models C3 and D are the same in the model setup, except in terms of the sample size (i.e., Model C3 only included 131 individuals, whereas Model D included 251 individuals).
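The gain from retaining incomplete cases can be tallied from the response counts given above. The short Python sketch below is only a bookkeeping illustration; it assumes that every completed wave contributes exactly one usable record in the univariate (long) format, and the variable names are hypothetical.

```python
# Participants by number of completed waves, as reported in the text.
participants_by_waves = {1: 27, 2: 39, 3: 54, 4: 131}
n_waves = 4

total_participants = sum(participants_by_waves.values())
complete_cases = participants_by_waves[n_waves]

# One record per completed wave in the univariate (long) format.
records_univariate = sum(waves * n for waves, n in participants_by_waves.items())
# Listwise deletion keeps only participants with all four waves.
records_listwise = n_waves * complete_cases

print(f"Participants usable in MLM: {total_participants}")
print(f"Participants after listwise deletion: {complete_cases}")
print(f"Person-wave records in MLM: {records_univariate}")
print(f"Person-wave records after listwise deletion: {records_listwise}")
```

Under this assumption, MLM can draw on 791 person-wave records from 251 participants, compared with 524 records from 131 participants after listwise deletion.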

In comparison with Model C3, Model D produced a very similar pattern of results. There are two major differences, however, between the two models: (a) all regression coefficients are significant in Model D, and (b) the variance of the random intercept τ₀₀ is substantially larger in Model D than in Model C3. The significance of all regression coefficients is the result of the increased sample size (i.e., increased statistical power). The increment in τ₀₀ is the result of the inclusion of more individuals, especially the ones with only a single response, which could only contribute to the estimation of between-individual parameters and variations (Muthen, 2002).

Required Sample Size for Longitudinal Analysis Using MLMs

There are many rules of thumb on the required sample size for MLMs (e.g., 15 units per cluster by Bryk & Raudenbush, 1992; the 30 clusters/30 units per cluster rule by Kreft, 1996; and the 50 clusters/20 units per cluster rule for detecting cross-level interaction effect by Hox, 1998). However, none of these rules of thumb can provide an accurate estimation of the sample size (in terms of a desired level of statistical power along with a specific size of the target effect). Moreover, all these rules of thumb require a relatively large number of Level 1 units (i.e., the number of repeated measures), which may not be applicable for longitudinal studies with educational or psychological data, given that these data often contain a smaller number of waves/repeated measures. In general, more reliable estimates for the individual growth models can be obtained with a relatively large number of measurement waves (e.g., eight or more). Moreover, a larger number of higher level units (i.e., the number of patients in our example) can increase the statistical power for detecting the effects of the higher level predictors and the cross-level interaction effects between the within- and between-individual predictors. More accurate sample size estimation for longitudinal analysis in MLM can be obtained through freeware, such as PINT (Version 2.1; Bosker, Snijders, & Guldemond, 2003), which can be downloaded from http://stat.gamma.rug.nl/snijders/, and Optimal Design (Version 1.76; Spybrook, Raudenbush, Liu, Congdon, & Martinez, 2008), which can be downloaded from http://sitemaker.umich.edu/group-based/optimal_design_software. In addition, a freeware application for determining sample sizes for MLM for two-group repeated measures designs (e.g., clinical trials) can be obtained from http://tigger.uic.edu/~hedeker/ml.html. This application (RMASS2) is based on the work of Hedeker, Gibbons, and Waternaux (1999) and allows for attrition, both fixed and random effects models, and several variance–covariance structures for repeated measures.

Beyond Linear Growth Models

In this article, we only focused on linear growth models (i.e., the change in the outcome variable FIM_P is in a linear fashion over time). In fact, the great majority of applications of growth models to date have used linear models. The advantages of using linear growth models include the following: (a) these models are simple and easy to interpret, and (b) they can adequately represent the growth process when the number of measurement waves is small, the study is short, or both. However, some phenomena, such as the development of crystallized and fluid intelligence over the entire life span (Finkel, Reynolds, McArdle, Gatz, & Pedersen, 2003), cannot be adequately captured by linear growth models. Other forms of growth models, such as a quadratic growth model (i.e., adding a quadratic term, time², into the model) or piecewise model (i.e., segmenting a nonlinear process into multiple linear growth models; Khoo, 2001; Raudenbush & Bryk, 2002), are commonly used for representing nonlinear growth processes. In addition to allowing examination of a single developmental process, parallel process models (PPM) provide the opportunity for studying the relation between multiple developmental processes simultaneously. More information on the setup and interpretation of PPM can be found in the studies by Cheong, MacKinnon, and Khoo (2003) and by Kwok, West, and Sousa (2006).
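Both nonlinear alternatives mentioned above amount to recoding the time variable before the model is fitted. The Python sketch below illustrates one common coding scheme; it is not taken from the analyses in this article. The wave times (0, 12, 24, and 36 months), the knot at 12 months, and the helper names `quadratic_terms` and `piecewise_terms` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def quadratic_terms(times: List[float]) -> List[Tuple[float, float]]:
    """Return (time, time squared) pairs for a quadratic growth model."""
    return [(t, t ** 2) for t in times]


def piecewise_terms(times: List[float], knot: float) -> List[Tuple[float, float]]:
    """Return two time variables for a two-piece linear growth model.

    The first variable increases until the knot and then stays constant;
    the second is zero until the knot and increases afterward.
    """
    return [(min(t, knot), max(t - knot, 0.0)) for t in times]


# Hypothetical wave times in months since the first measurement occasion.
waves = [0.0, 12.0, 24.0, 36.0]
print(quadratic_terms(waves))
print(piecewise_terms(waves, knot=12.0))
```

In the piecewise coding, each of the two new variables receives its own slope, so the rate of change before the knot can differ from the rate after it while the trajectory remains continuous at the knot.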

In the current example, we only examined a two-level model with repeated measures nested within patients. In some other settings, patients can also be nested within some higher level units/clusters, such as different wards or hospitals. The variables associated with these higher level units are contextual variables (e.g., the total number of patients in a hospital, the ratio of number of doctors to number of patients, and the hospital type). These contextual variables may be able to further explain/account for the variation in the initial FIM_P scores as well as the variation in the change in the FIM_P scores over time across all patients. More information on fitting higher level (e.g., three-level) models and interpreting the contextual effects can be found in both Raudenbush and Bryk's (2002) text and in Snijders and Bosker's (1999) text.

Footnote 4. An alternative way for researchers to analyze incomplete data using a UANOVA is for them to incorporate the multiple imputation procedure (Schafer, 1997) if the data are missing completely at random or missing at random (Little & Rubin, 2002). Multilevel missing data can be imputed using either the PAN routine provided by Schafer (http://www.stat.psu.edu/~jls/misoftwa.html) or the SAS PROC MI routine (i.e., to impute the multivariate data then convert the imputed data to univariate format for analysis). HLM, SAS (PROC MIANALYZE), and SPSS (missing values analysis module) have the missing data routine that can analyze the imputed data. Nevertheless, as pointed out by Twisk (2006), "it has even been shown that applying multilevel analysis to an incomplete dataset is even better than applying imputation methods (Twisk & de Vente, 2002; Twisk, 2003)" (p. 107).

Conclusion

In this article, we illustrated how to analyze longitudinal data under the MLM framework. There are two major phases for analyzing longitudinal data: the preparation phase and the analysis phase. In the preparation phase, researchers should carefully inspect their data to screen for errors or outliers in the data and obtain some basic descriptive statistics for their data, such as means, variances, skewness, and kurtosis. As described in the data preparation section, data have to be converted into univariate format before one can analyze them in MLM. Moreover, researchers are encouraged to plot their data with spaghetti plots to see whether there is any potential trend in the data over time.

In the analysis phase, Wallace and Green (2002) suggested four steps for analyzing longitudinal data in MLM:

1. Review the past literature to formulate the initial model.

2. Examine the initial model and evaluate the fixed part of the model (i.e., shape of the average growth model plus covariates).

3. Evaluate the random part of the model (i.e., the covariance structure of the random effects) with the same fixed part based on Step 2.

4. Fine-tune the fixed part of the model. An iterative process between Steps 3 and 4 is recommended until a stable and interpretable model is obtained.

Similar steps have been adopted in our example. We first analyzed the data with a simple linear model (i.e., Model A). We then added the time-invariant covariate (c_Age_i) to see whether it could account for the variations in both intercepts and slopes over the 131 participants (i.e., Model B). After we confirmed the fixed part of the model (i.e., the shape of the average model plus covariates), we examined the random part of the model (i.e., the covariance structures of both between-individual variations and within-individual random errors). We first fitted a simpler between-individual covariance structure (i.e., without estimating τ₀₁; Models C1 and C2). Then, we examined the AR(1) structure for the within-individual covariance structure (i.e., Model C3), and the likelihood ratio test showed that the AR(1) structure fitted better to the data than did the default identity structure.

In this article, we provided only a simple overview of models and procedures for analyzing longitudinal data under the MLM framework. There are many texts, such as those by Singer and Willett (2003), Hedeker and Gibbons (2006), and Weiss (2005), that have provided more in-depth treatments on analyzing longitudinal data in MLM. Readers can also find more information on using the latent growth model to analyze longitudinal data in the texts by Duncan, Duncan, Strycker, Li, and Alpert (1999) and Bollen and Curran (2006). In addition, there are many useful resources from the Internet, including the University of Bristol (Bristol, United Kingdom) Centre for Multilevel Modelling Web site (http://www.cmm.bristol.ac.uk/index.shtml) and the University of California, Los Angeles MLM portal (http://statcomp.ats.ucla.edu/mlm/), which contain useful information, such as links to other MLM-related online resources and reviews and links to different MLM software, including HLM, MLwiN, Mplus, and Stata.

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interaction. Newbury Park, CA: Sage.

Barcikowski, R. S. (1981). Statistical power with group mean as the unit of analysis. Journal of Educational Statistics, 6, 267–285.

Biesanz, J. C., Deeb-Sossa, N., Papadakis, A. A., Bollen, K. A., & Curran, P. J. (2004). The role of coding time in estimating and interpreting growth curve models. Psychological Methods, 9, 30–52.

Biesanz, J. C., West, S. G., & Kwok, O. (2003). Personality over time: Methodological approaches to the study of short-term and long-term development and change. Journal of Personality, 71, 905–941.

Bollen, K. A., & Curran, P. J. (2004). Autoregressive latent trajectory (ALT) models: A synthesis of two traditions. Sociological Methods and Research, 32, 336–383.

Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.

Bosker, R. J., Snijders, T. A. B., & Guldemond, H. (2003). PINT (Power IN Two-level designs): Estimating standard errors of regression coefficients in hierarchical linear models for power calculations. Retrieved April 19, 2008, from http://stat.gamma.rug.nl/Pint21_UsersManual.pdf

Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Newbury Park, CA: Sage.

Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford Press.

Campbell, D. T., & Reichardt, C. S. (1991). Problems in assuming the comparability of pretest and posttest in autoregressive and growth models. In R. E. Snow & E. Wiley (Eds.), Improving inquiry in social science: A volume in honor of Lee J. Cronbach (pp. 201–219). Mahwah, NJ: Erlbaum.

Cheong, J., MacKinnon, D. P., & Khoo, S. (2003). Investigation of mediational processes using latent growth curve modeling. Structural Equation Modeling, 10, 238–262.

Chi, E. M., & Reinsel, G. C. (1988). Models for longitudinal data with random effects and AR(1) errors. Journal of the American Statistical Association, 84, 452–459.

Clay, D. L., Wood, P. K., Frank, R. G., Hagglund, K. J., & Johnson, J. C. (1995). Examining systematic differences in adaptation to chronic illness: A growth modeling approach. Rehabilitation Psychology, 40, 61–70.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Erlbaum.

Collins, L. M., & Sayer, A. G. (2001). New methods for the analysis of change. Washington, DC: American Psychological Association.

Curran, P. J., & Bollen, K. A. (2001). The best of both worlds: Combining autoregressive and latent curve models. In L. M. Collins & A. G. Sayer (Eds.), New methods for the analysis of change (pp. 107–135). Washington, DC: American Psychological Association.

Davis, C. S. (2002). Statistical methods for the analysis of repeated measurements. San Diego, CA: Springer.

Diggle, P. J. (1988). An approach to the analysis of repeated measurements. Biometrics, 44, 959–971.

Diggle, P. J., Heagerty, P., Liang, K., & Zeger, S. L. (2002). Analysis of longitudinal data (2nd ed.). New York: Oxford University Press.

Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An introduction to latent variable growth curve modeling: Concepts, issues, and applications. Mahwah, NJ: Erlbaum.

Elliott, T. (2002). Presidential address: Defining our common ground to reach new horizons. Rehabilitation Psychology, 47, 131–143.

Elliott, T., Shewchuk, R., & Richards, J. S. (2001). Family caregiver social problem-solving abilities and adjustment during the initial year of the caregiving role. Journal of Counseling Psychology, 48, 223–232.

Enders, C. K., & Tofighi, D. (2007). Centering predictor variables incross-sectional multilevel models: A new look at an old issue. Psycho-logical Methods, 12, 121–138.

Fidell, L. S., & Tabachnick, B. G. (2003). Preparatory data analysis. InJ. A. Schinka & W. F. Velicer (Eds.), Comprehensive handbook ofpsychology: Research methods in psychology (Vol. 2, pp. 115–141).New York: Wiley.

Finkel, D., Reynolds, C. A., McArdle, J. J., Gatz, M., & Pedersen, N. L.(2003). Latent growth curve analyses of accelerating decline in cognitiveabilities in adulthood. Developmental Psychology, 39, 535–550.

Frank, R. G., Thayer, J. F., Hagglund, K. J., Veith, A. Z., Schopp, L. H.,Beck, N. C., et al. (1998). Trajectories of adaptation in pediatric chronicillness: The importance of the individual. Journal of Consulting andClinical Psychology, 66, 521–532.

Grant, J. S., Elliott, T. R., Weaver, M., Bartolucci, A. A., & Giger, J. N.(2002). Telephone intervention with family caregivers of stroke survi-vors after rehabilitation. Stroke, 33, 2060–2065.

Gravel, J., Opatrny, L., & Shapiro, S. (2007). The intention-to-treat ap-proach in randomized controlled trials: Are authors saying what they doand doing what they say? Clinical Trials, 4, 350–356.

Greenspan, A. I., Wrigley, J. M., Kresnow, M., Branche-Dorsey, C. M., &Fine, P. R. (1996). Factors influencing failure to return to work due totraumatic brain injury. Brain Injury, 10, 207–218.

Hall, K. M., Hamilton, B. B., Gordon, W. A., & Zasler, N. D. (1993).Characteristics and comparisons of functional assessment indices: Dis-ability rating scale, functional independence measure, and functionalassessment measure. Journal of Head Trauma Rehabilitation, 8, 60–74.

Hedeker, D., & Gibbons, R. D. (2006). Longitudinal data analysis. Hobo-ken, NJ: Wiley.

Hedeker, D., Gibbons, R. D., & Waternaux, C. (1999). Sample sizeestimation for longitudinal designs with attrition: Comparing time-re-lated contrasts between two groups. Journal of Educational and Behav-ioral Statistics, 24, 70–93.

Heinemann, A. W., Linacre, J. M., Wright, B. D., Hamilton, B. B., &Granger, C. (1993). Relationships between impairment and physicaldisability as measured by the functional independence measure. Archivesof Physical Medicine and Rehabilitation, 74, 566–573.

Hollis, S., & Campbell, F. (1999). What is meant by intention to treatanalysis? Survey of published randomized controlled trials. British Med-ical Journal, 319, 670–674.

Hox, J. (1998). Multilevel modeling: When and why. In I. Balderjahn, R.Mathar, & M. Schader (Eds.), Classification, data analysis, and datahighways (pp. 147–154). New York: Springer-Verlag.

Hox, J. (2002). Multilevel analysis: Techniques and applications. Mahwah,NJ: Erlbaum.

Jones, R. H., & Boadi-Boateng, F. (1991). Unequally spaced longitudinaldata with AR(1) serial correlation. Biometrics, 47, 161–175.

Keith, R. A., Granger, C. V., Hamilton, B. B., & Sherwin, F. S. (1987). Thefunctional independence measure: A new tool for rehabilitation. Ad-vances in Clinical Rehabilitation, 1, 6–18.

Kenny, D. A., & Campbell,. D T. (1989). On the measurement of stabilityin over-time data. Journal of Personality, 57, 445–481.

Keselman, H. J., Algina, J., & Kowalchuk, R. K. (2001). The analysis ofrepeated measures designs: A review. British Journal of Mathematicaland Statistical Psychology, 54, 1–20.

Keselman, H. J., Algina, J., Kowalchuk, R. K., & Wolfinger, R. D. (1998).A comparison of two approaches for selecting covariance structures inthe analysis of repeated measurements. Communications in Statistics:Simulation and Computation, 27(3), 591–604.

Khoo, S. (2001). Assessing program effects in the presence of treatment–baseline interactions: A latent curve approach. Psychological Methods,6, 234–257.

Khoo, S., West, S. G., Wu, W., & Kwok, O. (2006). Longitudinal methods.In M. Eid & E. Diener (Eds.), Handbook of psychological measurement:A multimethod perspective (pp. 301–317). Washington, DC: AmericanPsychological Association.

Kreft, I. G. G. (1996). Are multilevel techniques necessary? An overview,including simulation studies. Unpublished manuscript.

Kreft, I. G. G., DeLeeuw, J., & Aiken, L. S. (1995). Variable centering inhierarchical linear models: Model parameterization, estimation, and in-terpretation. Multivariate Behavioral Research, 30, 1–21.

Kwok, O., West, S. G., & Green, S. B. (2007). The impact of misspecifyingthe within-subject covariance structure in multiwave longitudinal mul-tilevel models: A Monte Carlo study. Multivariate Behavioral Research,42, 557–592.

Kwok, O., West, S. G., & Sousa, K. H. (2006). Analyzing longitudinalparallel processing (LPP) model under structural equation modeling(SEM) framework: An example from the AIDS Time-Oriented HealthOutcome Study (ATHOS). Paper presented at the 14th annual meeting ofthe Society for Prevention Research, San Antonio, TX.

Laird, N., & Ware, J. H. (1982). Random-effects models for longitudinaldata. Biometrics, 38, 963–974.

Littell, R. C., Milliken, G. A., Stroup, W. W., Wolfinger, R. D., &Schabenberber, O. (2006). SAS system for linear mixed models (2nd ed.).Cary, NC: SAS Institute.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missingdata (2nd ed.). New York: Wiley.

Longford, N. T. (1993). Random coefficient models. New York: OxfordUniversity Press.

Luo, W., & Kwok, O. (2006). Impacts of ignoring a crossed factor inanalyzing cross-classified multilevel data: A Monte Carlo study. Paperpresented at the 71st annual meeting of the Psychometric Society,Montreal, Quebec, Canada.

Meyers, J., & Beretvas, S. N. (2006). The impact of inappropriate modelingof cross-classified data structures. Multivariate Behavioral Research, 41,473–497.

Muthen, B. (2002, February 2). Multilevel Data/Complex Sample: ClusterSize. Message posted to Mplus Discussion, archived at http://www.stat-model.com/discussion/messages/12/164.html

Muthen, B., & Curran, P. (1997). General longitudinal modeling of indi-vidual differences in experimental designs: A latent variable frameworkfor analysis and power estimation. Psychological Methods, 2, 371–402.

Peugh, J. L., & Enders, C. K. (2005). Using the SPSS Mixed procedure tofit hierarchical linear and growth trajectory models. Educational andPsychological Measurement, 65, 811–835.

Putzke, J. D., Barrett, J. J., Richards, J. S., Underhill, A. T., & LoBello,S. G. (2004). Life satisfaction following spinal cord injury: Long-termfollow-up. The Journal of Spinal Cord Medicine, 27, 106–110.

Raftery, A. E. (1996). Bayesian model selection in social research. In P. V.Marsden (Ed.)., Sociological methodology (pp. 111–163). Oxford, En-gland: Basil Blackwell.

Raudenbush, S. W. (1988). Educational applications of hierarchical mod-els: A review. Journal of Educational Statistics, 13, 85–116.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models:Applications and data analysis methods (2nd ed.). Thousand Oaks, CA:Sage.

Rivera, P., Elliott, T., Berry, J., & Grant, J. (2008). Problem-solving training for family caregivers of persons with traumatic brain injuries: A randomized controlled trial. Archives of Physical Medicine and Rehabilitation, 89, 931–941.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. New York: Chapman & Hall.

Shewchuk, R., Richards, J. S., & Elliott, T. (1998). Dynamic processes in health outcomes among caregivers of individuals with spinal cord injuries. Health Psychology, 17, 125–129.

Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 24, 323–355.

Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press.

Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods & Research, 22, 342–363.

Snijders, T. A. B., & Bosker, R. J. (1999). Multilevel analysis: An introduction to basic and advanced multilevel modeling. Thousand Oaks, CA: Sage.

Spybrook, J., Raudenbush, S. W., Liu, X., Congdon, R., & Martinez, A. (2008). Optimal design for longitudinal and multilevel research: Documentation for the “Optimal Design” software. Retrieved April 19, 2008, from http://sitemaker.umich.edu/group-based/files/od-manual-20080312-v176.pdf

Tabachnick, B. G., & Fidell, L. S. (2006). Using multivariate statistics (5th ed.). Boston, MA: Allyn & Bacon.

Twisk, J. W. R. (2003). Applied longitudinal data analysis for epidemiology: A practical guide. New York: Cambridge University Press.

Twisk, J. W. R. (2006). Applied multilevel analysis. New York: Cambridge University Press.

Twisk, J. W. R., & de Vente, W. (2002). Attrition in longitudinal studies: How to deal with missing data. Journal of Clinical Epidemiology, 55, 329–337.

Underhill, A. T., Lobello, S. G., & Fine, P. R. (2004). Reliability and validity of the Family Satisfaction Scale with survivors of traumatic brain injury. Journal of Rehabilitation Research and Development, 41, 603–610.

Underhill, A. T., Lobello, S. G., Stroud, T. P., Terry, K. S., Devivo, M. J., & Fine, P. R. (2003). Depression and life satisfaction in patients with traumatic brain injury: A longitudinal study. Brain Injury, 17, 973–982.

Wallace, D., & Green, S. B. (2002). Analysis of repeated-measures designs with linear mixed models. In D. S. Moskowitz & S. L. Hershberger (Eds.), Modeling intraindividual variability with repeated measures data: Method and applications (pp. 103–134). Englewood Cliffs, NJ: Erlbaum.

Warner, R. (1998). Spectral analysis of time-series data. New York: Guilford Press.

Warschausky, S., Kay, J., & Kewman, D. (2001). Hierarchical linear modeling of FIM instrument growth curve characteristics after spinal cord injury. Archives of Physical Medicine and Rehabilitation, 82, 329–334.

Weiss, R. E. (2005). Modeling longitudinal data. New York: Springer.

West, S. G., Biesanz, J. C., & Kwok, O. (2003). Within-subject and longitudinal experiments: Design and analysis issues. In C. Sansone, C. C. Morf, & A. T. Panter (Eds.), Handbook of methods in social psychology (pp. 287–312). Thousand Oaks, CA: Sage.

Wolfinger, R. D. (1993). Covariance structure selection in general mixed models. Communications in Statistics-Simulation and Computation, 22, 1079–1106.

Wolfinger, R. D. (1996). Heterogeneous variance-covariance structures for repeated measures. Journal of Agricultural, Biological, and Environmental Statistics, 1, 205–230.

Received November 5, 2007
Revision received April 19, 2008
Accepted May 13, 2008
