University Rennes 2, CRPCC, EA 1285

June 2-4, 2010 - Saint-Raphaël, INSERM workshop: Mixture modelling for longitudinal data

Latent variable modeling of psychological longitudinal data:

taking into account the unobserved heterogeneity using Mplus

Jacques Juhel, University Rennes 2, CRPCC, EA 1285


Studying individual differences in learning, change and development

A double compromise:
• random effect model,
• classification techniques.

Introduction


Among other methods: the growth mixture modeling (GMM) approach of Muthén and colleagues

A technique for longitudinal data that:
• combines categorical and continuous latent variables in the same model (“beyond SEM”),
• accommodates unobserved heterogeneity in the sample,
• allows the latent growth parameters within each class to be influenced by time-varying covariates and time-invariant predictors,
• incorporates distal outcomes predicted by the latent class variable.

Introduction


The LGM for a continuous outcome : the multivariate latent variable approach

Factor analysis measurement model (level 1):

$Y_i = \nu + \Lambda \eta_i + \varepsilon_i$   (1)

$Y_i$ (m×1): repeated measures over fixed time points,
$\nu$ (m×1): intercepts in the regression of $Y_i$ on $\eta_i$,
$\eta_i$ (p×1): latent growth factors,
$\Lambda$ (m×p): design matrix of factor loadings,
$\varepsilon_i$ (m×1): residuals in the regression of $Y_i$ on $\eta_i$ (covariance matrix $\Theta$).


LGM specifications



Structural regression model (level 2):

$\eta_i = \alpha + B \eta_i + \zeta_i$   (2)

$\alpha$ (p×1): means of $\eta_i$, or intercepts in the regression of $\eta_i$ on $\eta_i$,
$B$ (p×p): regression coefficients in the regression of $\eta_i$ on $\eta_i$,
$\zeta_i$ (p×1): residuals in the regression of $\eta_i$ on $\eta_i$ (covariance matrix $\Psi$).


LGM specifications



The covariance and mean structures are derived for the population under the assumptions that:

• the residuals $\varepsilon_i$ and $\zeta_i$ are mutually uncorrelated,
• $E[\varepsilon_i]$ and $E[\zeta_i]$ equal 0.
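Under these assumptions, and assuming in addition that $I - B$ is invertible, the model-implied moments of $Y_i$ take the usual SEM form. The sketch below writes them out, with $\Theta$ and $\Psi$ denoting the covariance matrices of $\varepsilon_i$ and $\zeta_i$ (standard Mplus notation, assumed here).

```latex
\begin{align*}
\eta_i &= (I - B)^{-1}(\alpha + \zeta_i), \\
\mu_Y = E[Y_i] &= \nu + \Lambda (I - B)^{-1} \alpha, \\
\Sigma_Y = \operatorname{Cov}(Y_i) &= \Lambda (I - B)^{-1} \Psi \left[(I - B)^{-1}\right]^{\top} \Lambda^{\top} + \Theta .
\end{align*}
```

With $B = 0$, as in the unconditional linear LGM, these reduce to $\mu_Y = \nu + \Lambda\alpha$ and $\Sigma_Y = \Lambda \Psi \Lambda^{\top} + \Theta$.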

LGM assumptions


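The unconditional linear LGM: SEM representation

[Path diagram: y1-y4 load on the growth factors $\eta_0$ (intercept) and $\eta_1$ (slope).]

$Y_i = \Lambda \eta_i + \varepsilon_i$   (1)
$\eta_i = \alpha + \zeta_i$   (2)

Fixed: $\Lambda$ = [1 0; 1 1; 1 2; 1 3], $B = 0$, $\nu = 0$.

Free parameters (Mplus output): means of $\eta_0$ and $\eta_1$, var($\eta_0$), var($\eta_1$), cov($\eta_0$, $\eta_1$), res. var(y).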

The LGM with time-varying covariates

Factor analysis measurement model (level 1):

$Y_i = \nu + \Lambda \eta_i + K a_i + \varepsilon_i$   (1bis)

$Y_i$ (m×1): repeated measures over fixed time points,
$\nu$ (m×1): intercepts in the regression of $Y_i$ on $\eta_i$,
$\Lambda$ (m×p): design matrix of factor loadings,
$K$ (m×r): coefficients in the regression of $Y_i$ on the time-varying covariates $a_i$,
$\varepsilon_i$ (m×1): residuals in the regression of $Y_i$ on $\eta_i$ (covariance matrix $\Theta$).


LGM specifications


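Linear LGM with time-varying covariates: SEM representation

[Path diagram: y1-y4 load on the growth factors $\eta_0$ and $\eta_1$; y1-y4 are regressed on the time-varying covariates a1-a4.]

Fixed: $\Lambda$ = [1 0; 1 1; 1 2; 1 3], $\nu = 0$.

Free parameters (Mplus output): means of $\eta_0$ and $\eta_1$, var($\eta_0$), var($\eta_1$), cov($\eta_0$, $\eta_1$), res. var(y), cov(a, $\eta_0$), cov(a, $\eta_1$), regression coefficients of y on a.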


The LGM with time-invariant covariates

Structural regression model (level 2), with vector of predictors X:

$\eta_i = \alpha + B \eta_i + \Gamma X_i + \zeta_i$   (3)

$\alpha$ (p×1): means of $\eta_i$, or intercepts in the regression of $\eta_i$ on $\eta_i$,
$B$ (p×p): regression coefficients in the regression of $\eta_i$ on $\eta_i$,
$X_i$ (q×1): time-invariant covariates, predictors of change,
$\Gamma$ (p×q): regression coefficients in the regression of $\eta$ on $X$,
$\zeta_i$ (p×1): residuals in the regression of $\eta_i$ on $\eta_i$ (covariance matrix $\Psi$).

LGM specifications


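The linear LGM with time-varying and time-invariant covariates: SEM representation

[Path diagram: y1-y4 load on the growth factors $\eta_0$ and $\eta_1$; y1-y4 are regressed on the time-varying covariates a1-a4; $\eta_0$ and $\eta_1$ are regressed on the time-invariant covariates x1-x3.]

Fixed: $\Lambda$ = [1 0; 1 1; 1 2; 1 3], $\nu = 0$.

Free parameters (Mplus output): intercepts of $\eta_0$ and $\eta_1$, means of a1-a4, regression coefficients of $\eta_0$ and $\eta_1$ on X, res. var($\eta_0$), res. var($\eta_1$), res. cov($\eta_0$, $\eta_1$), res. var(y), cov(a, $\eta_0$), cov(a, $\eta_1$), cov(a, x).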


The linear LGM with time-varying, time-invariant covariates and a distal outcome

Consequences of change can be modeled as outcomes predicted by the latent growth factors:

$Z_i = \omega + B \eta_i + \delta_i$   (4)

$Z_i$ (d×1): vector of distal outcomes of change,
$B$ (d×p): matrix of regression coefficients of $Z$ on $\eta$,
$\omega$ (d×1): vector of regression intercepts for $Z$,
$\delta_i$ (d×1): residuals in the regression of $Z_i$ on $\eta_i$ (covariance matrix $\Psi$).

LGM specifications


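The linear LGM with time-varying, time-invariant covariates and a distal outcome: SEM representation

[Path diagram: y1-y4 load on the growth factors $\eta_0$ and $\eta_1$; y1-y4 are regressed on a1-a4; $\eta_0$ and $\eta_1$ are regressed on x1-x3; the distal outcome z is regressed on $\eta_0$ and $\eta_1$.]

Fixed: $\Lambda$ = [1 0; 1 1; 1 2; 1 3], $\nu = 0$.

Free parameters (Mplus output): intercepts of $\eta_0$ and $\eta_1$, means of a1-a4, intercept of z, regression coefficients of $\eta_0$ and $\eta_1$ on x, regression coefficients of z on $\eta_0$ and $\eta_1$, res. var($\eta_0$), res. var($\eta_1$), res. cov($\eta_0$, $\eta_1$), res. var(y), cov(a, $\eta_0$), cov(a, $\eta_1$), cov(a, x), res. var(z).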

Illustration : data set 1


Clinical symptomatology, performance on the Trail Making Test (TMT) and consciousness disorders in schizophrenia

• 130 stabilized patients with schizophrenia (mean age 31.0 years, IQ > 90, all on neuroleptic medication).
• Time to complete TMT parts A and B, recorded separately at 4 equally spaced time points (t = 0, 2, 4 and 6 months).
• t = -1: scores on the Positive and Negative Syndrome Scale.


[Figure: Trail Making Test response times from t0 to t3 (complete cases only, N = 102).]

Illustration: data set 1



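Fitting a linear LGM with time-varying and time-invariant covariates to TMT data (N=102)

[Path diagram: TMT form B times (B1-B4) load on the growth factors i and s; each B is regressed on the TMT form A time of the same occasion (A1-A4); i and s are regressed on Dis, Pos, Neg, Host and Anx.]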


Is the linear growth model tenable?


Growth shape

Fit indices      linear   quadratic   piecewise
#par                 21          27          27
chi-square       44.676      44.049      42.489
df                   29          23          23
p-value          0.0316      0.0052      0.0080
CFI               0.957       0.943       0.947
TLI               0.938       0.886       0.903
AIC                9139        9151        9149
BIC                9194        9221        9220
SSABIC             9128        9136        9135
RMSEA             0.073       0.095       0.091
SRMR              0.046       0.048       0.064

Loading matrices Λ (one row per time point):
linear:     [1 0; 1 1; 1 2; 1 3]
quadratic:  [1 0 0; 1 1 1; 1 2 4; 1 3 9]
piecewise:  [1 0 0; 1 1 0; 1 2 0; 1 2 1]
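As a rough check on the fit table, the RMSEA point estimates can be recomputed from the chi-square values and degrees of freedom. The sketch below assumes the common formula sqrt(max(chi2 - df, 0) / (df * N)) with N = 102; some programs divide by N - 1 instead, which gives the same values at three decimals here. The function name `rmsea` is illustrative only.

```python
import math

def rmsea(chi_square, df, n):
    """RMSEA point estimate, assuming the form sqrt(max(chi2 - df, 0) / (df * n))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))

N = 102  # complete cases in the TMT data
# (chi-square, df) pairs copied from the fit table
models = {
    "linear": (44.676, 29),
    "quadratic": (44.049, 23),
    "piecewise": (42.489, 23),
}

rmsea_values = {name: rmsea(chi2, df, N) for name, (chi2, df) in models.items()}
for name, value in rmsea_values.items():
    print(f"{name}: RMSEA = {value:.3f}")
```

The linear model has the lowest RMSEA together with the lowest AIC and BIC, which is the pattern the table uses to argue that the linear shape is tenable.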


Conditional LGM: results (ML estimation)

Estimate S.E. Est./S.E. Two-Tailed P-Value

I ON

DISORG 5.075 2.666 1.904 0.057

POS 2.983 2.536 1.176 0.240

NEG 0.089 2.562 0.035 0.972

HOST -3.696 2.875 -1.285 0.199

ANX 4.272 2.817 1.516 0.129

S ON

DISORG -2.006 1.034 -1.940 0.052

POS -1.376 0.984 -1.400 0.162

NEG 1.408 0.991 1.421 0.155

HOST 1.222 1.115 1.095 0.273

ANX -0.360 1.092 -0.330 0.742
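The last two columns of this output follow from the first two: Est./S.E. is a z ratio and the p-value is its two-tailed normal tail probability. A minimal sketch, using only the standard library and an illustrative helper name `two_tailed_p`, applied to the I ON DISORG row:

```python
import math

def two_tailed_p(estimate, std_error):
    """Return (z, p) where z = estimate / S.E. and p is the two-tailed normal p-value."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

z, p = two_tailed_p(5.075, 2.666)  # I ON DISORG, values copied from the output
print(f"Est./S.E. = {z:.3f}, two-tailed p = {p:.3f}")
```

Up to rounding, this reproduces the 1.904 and 0.057 reported for that row.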



Conditional LGM: results (ML estimation)

Estimate S.E. Est./S.E. Two-Tailed P-Value


B1 ON

A1 1.674 0.226 7.394 0.000

B2 ON

A2 1.703 0.166 10.274 0.000

B3 ON

A3 1.511 0.115 13.110 0.000

B4 ON

A4 1.797 0.156 11.516 0.000




Estimate S.E. Est./S.E. Two-Tailed P-Value


Intercepts

B1 0.000 0.000 999.000 999.000

B2 0.000 0.000 999.000 999.000

B3 0.000 0.000 999.000 999.000

B4 0.000 0.000 999.000 999.000

I -39.325 27.652 -1.422 0.155

S 4.543 10.730 0.423 0.672




Estimate S.E. Est./S.E. Two-Tailed P-Value


Residual Variances

B1 3172.312 461.870 6.868 0.000

B2 1034.587 164.132 6.303 0.000

B3 387.629 75.508 5.134 0.000

B4 378.444 72.855 5.194 0.000

I 265.423 61.838 4.292 0.000

S 0.000 0.000 999.000 999.000

R-SQUARE

B1 0.395 0.061 6.427 0.000

B2 0.584 0.055 10.594 0.000

B3 0.801 0.041 19.526 0.000

B4 0.770 0.045 17.118 0.000

I 0.468 0.144 3.240 0.001

S 1.000 999.000 999.000 999.000



Representing heterogeneity with respect to the growth factors and covariates

GMM specifies a separate LGM for each of the K latent classes simultaneously:

$Y_{ik} = \nu_k + \Lambda_k \eta_{ik} + K_k X_{ik} + \varepsilon_{ik}$   (5)

and

$\eta_{ik} = \alpha_k + B_k \eta_{ik} + \Gamma_k X_{ik} + \zeta_{ik}$   (6)

GMM specification


Modeling predictive effects of time-invariant covariates on latent class membership

Mixture components (c) are related to covariates through a multinomial logistic regression model:

$\Pr(C_i = k \mid X_i) = \dfrac{e^{\gamma_{0k} + \Gamma_k^{(C)} X_i}}{\sum_{h=1}^{K} e^{\gamma_{0h} + \Gamma_h^{(C)} X_i}}$   (7)

with the reference class K (its intercept and coefficients are fixed at 0),
$\Gamma_k^{(C)}$ (1×q): vector of logistic regression coefficients of C on X,
$\gamma_{0k}$: logistic regression intercept for class k relative to class K,
$X_i$ (q×1): vector of time-invariant covariate predictors.

GMM specification

Indices for determining the “best” GMM

- Information-based criteria: BIC, SABIC
- Nested-model likelihood ratio tests: LMR (Lo-Mendell-Rubin) LRT, bootstrapped LRT
- Latent classification accuracy: entropy, average latent class probabilities for most likely latent class membership


GMM selection



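Mplus representation of a linear GMM fitted to TMT data (N=102)

[Path diagram: B1-B4 load on the growth factors i and s; each B is regressed on the A of the same occasion (A1-A4); the diagram also includes the latent class variable c and the covariates x (Disorg, Pos, Neg, Host, Anx).]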


Determining the “best” two-class growth model

Restrictions (overall): var(i)=0, res. var(i)=0, var(s)=0, var(s)=0, res. var(s)=0, res. var(s)=0, res. var(s)=0
Covariate effects: x -> c (models 1-3); x -> c, i, s (models 4-5); x -> c, with x -> i within class 1 and class 2 (model 6)

Model                    1       2       3       4       5       6
#par                    18      19      21      28      29      39
starts (2000 20)        OK      OK      OK      OK      OK      OK
BIC                   4083    4083    4079    4102    4102    4095
SSABIC                4026    4023    4012    4014    4010    4004
Entropy              0.841   0.787       1   0.991   0.987   0.801
LMR LRT p-value       0.14    0.78    0.03    0.01   0.036    0.20
Nc1 (%)              28.50   76.17   87.25   93.03   93.03   25.41
Nc2 (%)              71.49   23.85   12.75    6.97    6.97   74.59

[Diagrams: three schematic models linking x, c, i and s; the last one shows differences between classes.]


GMM results : TMT data (N=102)

Information Criteria

Number of Free Parameters 29

Akaike (AIC) 4025.603

Bayesian (BIC) 4101.727

Sample-Size Adjusted BIC 4010.126

(n* = (n + 2) / 24)
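The three criteria are simple functions of the loglikelihood, the number of free parameters and the sample size. The sketch below recomputes them for the 29-parameter two-class model. The H1 loglikelihood is not printed in this excerpt, so it is derived here, as an assumption, from the H0 value (-2001.982) and the "2 Times the Loglikelihood Difference" (36.361) given in the LMR output further on; the function name is illustrative.

```python
import math

def information_criteria(loglik, n_params, n):
    """Return (AIC, BIC, sample-size adjusted BIC) for a fitted model."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n)
    # The adjusted BIC replaces n by n* = (n + 2) / 24
    sabic = deviance + n_params * math.log((n + 2) / 24)
    return aic, bic, sabic

# Assumed H1 loglikelihood: H0 value plus half of the reported 2*LL difference
loglik_two_class = -2001.982 + 36.361 / 2
aic, bic, sabic = information_criteria(loglik_two_class, n_params=29, n=102)
print(f"AIC = {aic:.3f}, BIC = {bic:.3f}, SABIC = {sabic:.3f}")
```

Under that assumption the values agree with the reported 4025.603, 4101.727 and 4010.126 to within rounding.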

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS

BASED ON ESTIMATED POSTERIOR PROBABILITIES

Latent

Classes

1 7.10321 0.06964

2 94.89679 0.93036



GMM results : TMT data (N=102)

CLASSIFICATION QUALITY

Entropy 0.987

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

Latent classes

1 7 0.06863

2 95 0.93137

Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)

by Latent Class (Column)

1 2

1 0.994 0.006

2 0.002 0.998
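The entropy value summarizes how sharply the posterior class probabilities separate individuals. A minimal sketch of the usual relative entropy formula, 1 - sum_i sum_k(-p_ik ln p_ik) / (n ln K), is given below. The posterior probabilities are hypothetical, since the individual posteriors for the TMT data are not shown in the slides, and `relative_entropy` is an illustrative name.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; 1 means perfectly separated classes."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # the limit of p * ln(p) as p -> 0 is 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for three individuals and two classes
hypothetical = [[0.99, 0.01], [0.02, 0.98], [0.50, 0.50]]
print(f"entropy = {relative_entropy(hypothetical):.3f}")
```

Values close to 1, such as the 0.987 reported here, indicate that almost every individual is assigned to one class with near certainty, which matches the average class probabilities of 0.994 and 0.998.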



Growth Mixture model results : TMT data (N=102)

VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 1 (H0) VERSUS 2 CLASSES

H0 Loglikelihood Value -2001.982

2 Times the Loglikelihood Difference 36.361

Difference in the Number of Parameters 8

Mean -7.722

Standard Deviation 35.246

P-Value 0.0355

LO-MENDELL-RUBIN ADJUSTED LRT TEST

Value 35.404

P-Value 0.0383




Categorical Latent Variables

Estimate S.E. Est./S.E. Two-Tailed P-Value

C#1 ON

DISORG 1.478 0.550 2.689 0.007
POS 1.967 0.603 3.260 0.001
NEG -1.250 0.397 -3.150 0.002
HOST -2.240 0.869 -2.579 0.010
ANX -0.282 0.399 -0.706 0.480

Intercepts

C#1 -1.700 3.014 -0.564 0.573



GMM: probability of class membership as a function of the value of each covariate: TMT data (N=102)

Coefficients (c#1 on): disorg 1.478, pos 1.967, neg -1.250, host -2.240, anx -0.282; intercept c#1 -1.700

Value of each covariate
disorg               1      2      1      1      1      1      1      1      1
pos                  1      1      2      3      4      1      1      1      1
neg                  1      1      1      1      1      2      3      1      1
host                 1      1      1      1      1      1      1      2      3
anx                  0      0      0      0      0      0      0      0      0

log odds (c=1)   -1.75  -0.27   0.22   2.19   4.16  -3.00  -4.25  -3.99  -6.23
log odds (c=2)    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00

Prob(c=1)         0.15   0.43   0.56   0.90   0.98   0.05   0.01   0.02   0.00
Prob(c=2)         0.85   0.57   0.44   0.10   0.02   0.95   0.99   0.98   1.00
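The table can be reproduced directly from the C#1 ON estimates: the log odds are the intercept plus the weighted covariate values, and with two classes the probability of class 1 is the logistic transform of the log odds. A minimal sketch, with illustrative names and the first and fourth covariate profiles of the table:

```python
import math

# Estimates copied from the "C#1 ON" output (class 2 is the reference class)
COEF = {"disorg": 1.478, "pos": 1.967, "neg": -1.250, "host": -2.240, "anx": -0.282}
INTERCEPT = -1.700

def prob_class1(x):
    """Return (log odds, probability) of membership in class 1 for covariate values x."""
    log_odds = INTERCEPT + sum(COEF[name] * x[name] for name in COEF)
    return log_odds, 1.0 / (1.0 + math.exp(-log_odds))

baseline = {"disorg": 1, "pos": 1, "neg": 1, "host": 1, "anx": 0}
log_odds_base, p_base = prob_class1(baseline)
log_odds_pos3, p_pos3 = prob_class1({**baseline, "pos": 3})

print(f"baseline: log odds = {log_odds_base:.3f}, Prob(c=1) = {p_base:.2f}")
print(f"pos = 3:  log odds = {log_odds_pos3:.3f}, Prob(c=1) = {p_pos3:.2f}")
```

Raising pos from 1 to 3 while holding the other covariates fixed moves the probability of class 1 from about 0.15 to about 0.90, as in the first and fourth columns.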



Latent class 1 = Latent class 2

Estimate S.E. Est./S.E. Two-Tailed P-Value


I ON

DISORG 1.335 2.595 0.514 0.607

POS -1.365 2.703 -0.505 0.613

NEG 4.387 2.412 1.819 0.069

HOST 0.264 3.270 0.081 0.936

ANX 5.051 2.659 1.900 0.057

S ON

DISORG -1.617 1.090 -1.483 0.138

POS -0.892 1.196 -0.746 0.456

NEG 0.917 0.899 1.019 0.308

HOST 0.780 1.585 0.492 0.622

ANX -0.434 1.206 -0.360 0.719



Nc#1= 7

Nc#2= 95


Data set 2 : Learning to read and development of phonological and morphological processing

• 344 children (6-7 years) tested 6 times (6 weeks between measurement occasions)
• t1-1: Raven Matrix (int)
• t1 to t6: 4 observed variables: Syllables Implicit Processing, Phonemes Implicit Processing, Syllables Explicit Processing, Phonemes Explicit Processing
• t6 + 1 week: Word reading (frequent words, rare words, pseudo-words)



Data set 2: descriptive statistics

[Figure: four panels, each plotted over the six measurement occasions t0 to t5.]



SEM representation of a quadratic GGMM with time-invariant antecedents of change and a distal outcome (N=344)

[Path diagram: at each occasion t = 1 to 6, the indicators sip, pip, sep and pep load on a first-order factor (f1-f6); f1-f6 load on the growth factors i, s and q; the diagram also includes the covariate Int, the latent class variable c, and the distal outcome Lect. measured by freq., rare and pseudowords.]



Multiple indicator LGM

First-order factor scores: measurement model with (strong) invariance constraints

$Y_i = \nu + \Lambda \eta_i + \varepsilon_i$,

Second-order growth factors:

$\eta_i = \Gamma \xi_i + \zeta_i$,

Factor scores as deviations from the group mean:

$\xi_i = \kappa + \upsilon_i$,

Second-order growth model:

$Y_i = \nu + \Lambda \left( \Gamma (\kappa + \upsilon_i) + \zeta_i \right) + \varepsilon_i$.

Multiple indicators GMM


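Multiple indicator GMM

$Y_{ik} = \nu_k + \Lambda_k \left( \Gamma_k (\kappa_k + \upsilon_{ik}) + \zeta_{ik} \right) + \varepsilon_{ik}$.

First-order constraints:

$\nu_k = \nu$, $\Lambda_k = \Lambda$, $\Psi_k = \Psi$, $\Theta_k = \Theta$,

Differences between latent classes:

- means $\kappa_k$,
- covariances $\Phi_k$,
- parameters for representing growth $\Gamma_k$.

Multiple indicators GMM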


Unconditional GMM: 2 classes vs 3 classes

                   Two-class GMM                          Three-class GMM
Model              1       2       3       4       5       6       7       8       9
var(i)             0                                       0
var(s)             0       0                               0       0
var(q)             0       0       0                       0       0       0
Parameters        96      98     100     103     106     100     102     104     107
BIC            29953   29120   29080   29058   29015   29459   29062   29057   29028
SABIC          29648   28809   28763   28732   28679   29141   28738   28727   28688
Entropy         0.94   0.697   0.794   0.804   0.754   0.944   0.718   0.762   0.858
LMR-LRT        0.000   0.016   0.015   0.000   0.000   0.000   0.190   0.540   0.140
Nc1 (%)        82.46   39.25   74.80   23.37   66.39   32.78   36.74   67.89    8.71
Nc2 (%)        17.51   60.75   25.20   76.63   33.61   66.93   35.77   11.72   61.08
Nc3 (%)                                                 0.29   27.49   20.69   30.21

Between-class differences (model 5): var(i), var(s), cov(i,s)



Three-class GMM with int as covariate, without (overall) and with (between) class differences

Model              1        2        3        4        5        6        7        8
Type         overall  overall  between  overall  between  overall  between  between
var(i)             0
var(s)             0        0        0
var(q)             0        0        0        0        0
Parameters       111      114      116      117      121      121      127      139
BIC            33330    32473    32622    32259    32313    32263    32243    32286
SABIC          32978    32112    32254    31888    31930    31879    31841    31845
Entropy        0.916    0.991    0.820    0.987    0.986    0.990    0.986    0.987
LMR-LRT        0.023    0.000    0.204    0.004     0.07    0.001    0.013    0.050
Nc1 (%)        56.16    11.69    49.46     7.48    11.44    81.22    11.46     7.66
Nc2 (%)        34.90    81.28    36.64    11.73    81.10    11.65    80.99    80.99
Nc3 (%)         8.94     7.03    13.90    80.80     7.46     7.13     7.55    11.36

Covariate effects:
Model 1: c on x
Model 2: c, i on x
Model 3: c on x; within each class, i on x
Model 4: c, i, s on x
Model 5: c on x; within each class, i, s on x
Model 6: c, i, s, q on x
Model 7: c on x; within each class, i, s, q on x
Model 8: c on x; within each class, i, s, q on x and cov. of i, s, q



Conditional GMM: estimated means



GMM results: information criteria and quality of classification

Information Criteria

Number of Free Parameters 127
Akaike (AIC) 31755.780
Bayesian (BIC) 32243.542
Sample-Size Adjusted BIC 31840.665
(n* = (n + 2) / 24)

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

Latent Classes

1 278.61914 0.80994
2 39.41000 0.11456
3 25.97086 0.07550



GMM results: information criteria and quality of classification

Entropy 0.986

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

Latent Classes

1 280 0.81395
2 38 0.11047
3 26 0.07558

Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
by Latent Class (Column)

  1 2 3
1 0.995 0.005 0.000
2 0.003 0.990 0.007
3 0.000 0.011 0.989



GMM results: intercepts of i, s and q

Estimate S.E. Est./S.E. Two-Tailed P-Value

Class 1

Intercepts
I 3.693 0.275 13.451 0.000
S 1.103 0.145 7.632 0.000
Q -0.095 0.027 -3.559 0.000

Residual Variances
I 0.961 0.106 9.084 0.000
S 0.152 0.031 4.924 0.000
Q 0.005 0.001 5.221 0.000

Class 2

Intercepts
I 2.616 0.420 6.223 0.000
S 1.907 0.284 6.725 0.000
Q -0.254 0.055 -4.617 0.000

Residual Variances
I 0.961 0.106 9.084 0.000
S 0.152 0.031 4.924 0.000
Q 0.005 0.001 5.221 0.000

Class 3

Intercepts
I 0.000 0.000 999.000 999.000
S 1.127 0.354 3.187 0.001
Q 0.077 0.068 -1.137 0.256
(a linear trend is obtained in class 3 by fixing q@0)

Residual Variances
I 0.961 0.106 9.084 0.000
S 0.152 0.031 4.924 0.000
Q 0.005 0.001 5.221 0.000
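To see what these growth-factor intercepts imply, the class-specific quadratic curve i + s*t + q*t^2 can be evaluated over the measurement occasions. The sketch below does this for classes 1 and 2 only. It rests on two assumptions that the slides do not state: time scores of 0 to 5 for the six occasions, and a covariate value of 0 (the intercepts are conditional on INTNV). Names are illustrative.

```python
# (i, s, q) intercepts copied from the output for classes 1 and 2
GROWTH_INTERCEPTS = {
    "class 1": (3.693, 1.103, -0.095),
    "class 2": (2.616, 1.907, -0.254),
}
TIME_SCORES = [0, 1, 2, 3, 4, 5]  # assumed coding of the six occasions

def quadratic_trajectory(i, s, q, times):
    """Expected first-order factor level at each time score: i + s*t + q*t^2."""
    return [i + s * t + q * t * t for t in times]

trajectories = {
    name: quadratic_trajectory(i, s, q, TIME_SCORES)
    for name, (i, s, q) in GROWTH_INTERCEPTS.items()
}
for name, values in trajectories.items():
    print(name, [round(v, 3) for v in values])
```

Under these assumptions class 2 starts lower than class 1 but rises faster at first and then decelerates more strongly, because of its larger negative quadratic term.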



GMM results: regression coefficients of the categorical latent variable c on the covariate

Categorical Latent Variables

Estimate S.E. Est./S.E. Two-Tailed P-Value

C#1 ON
INTNV 0.172 0.058 2.969 0.003

C#2 ON
INTNV 0.044 0.076 0.575 0.565

Intercepts
C#1 0.392 0.709 0.553 0.580
C#2 -0.052 0.925 -0.056 0.955



GMM results: probability of class membership

Coefficients: c#1 on int 0.172; c#2 on int 0.044; intercept c#1 0.392; intercept c#2 -0.052

Value of int        0.5       1       2       5      10

log odds (c=1)    0.478   0.564   0.736   1.252   2.112
log odds (c=2)    -0.03  -0.008   0.036   0.168   0.388
log odds (c=3)        0       0       0       0       0

Prob(c=1)          0.45    0.47    0.51    0.62    0.77
Prob(c=2)          0.27    0.26    0.25    0.21    0.14
Prob(c=3)          0.28    0.27    0.24    0.18    0.09
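With three classes, the probabilities come from the multinomial logistic form: exponentiate each class's log odds (the reference class 3 has log odds 0) and normalize. A minimal sketch using the estimates above, with illustrative names:

```python
import math

# Intercepts and slopes on int for classes 1, 2 and 3 (class 3 is the reference)
INTERCEPTS = [0.392, -0.052, 0.0]
SLOPES = [0.172, 0.044, 0.0]

def class_probabilities(int_value):
    """Multinomial logistic class probabilities for a given value of int."""
    log_odds = [a + b * int_value for a, b in zip(INTERCEPTS, SLOPES)]
    exps = [math.exp(v) for v in log_odds]
    total = sum(exps)
    return [e / total for e in exps]

probs_low = class_probabilities(0.5)
probs_high = class_probabilities(10)
print([round(p, 2) for p in probs_low])
print([round(p, 2) for p in probs_high])
```

The two calls correspond to the first and last columns of the table: as int increases, membership in class 1 becomes more likely at the expense of classes 2 and 3.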



Estimated probabilities for c as a function of int level



GMM results: regressions of i, s and q on the covariate

Estimate S.E. Est./S.E. Two-Tailed P-Value

Class 1

I ON
INTNV 0.122 0.020 6.050 0.000

S ON
INTNV -0.033 0.011 -2.939 0.003

Q ON
INTNV 0.003 0.002 1.567 0.117

S WITH
I -0.008 0.040 -0.206 0.837

Q WITH
I -0.015 0.007 -2.309 0.021
S -0.026 0.005 -4.943 0.000



GMM results : reading proficiency level for each class

Class 1 Means LECT 7.508 0.434 17.288 0.000

Class 2 Means LECT 4.430 0.287 15.455 0.000

Class 3 Means LECT 0.000 0.000 999.000 999.000


Concluding remarks

Interest, limitations, cautions

GMM is a promising approach for modeling heterogeneous latent change across unobserved population subgroups.

But:
- GMM usually requires large samples.
- The search for heterogeneity should be conducted in a principled and disciplined way; model selection is best guided by comparing models derived from theory.
- GMM always identifies groups.
- The role that covariates play in the class enumeration process has to be clarified.
- An important question: how to model missing data on x variables?
