
Upload: fastford14

Post on 02-Jun-2018


  • 8/11/2019 Measurement in Variance

    1/26

    Project for Introduction to Multivariate Statistics:

    Measurement Invariance

    Lian Hortensius

    May 10, 2012

Contents

1 Abstract 2

2 Introduction 2

3 Structural Equation Modeling 3
  3.1 Parameters in SEM 5
  3.2 Data as modeled by SEM 6

4 Measurement Invariance 7
  4.1 Introduction to Measurement Invariance 7
  4.2 Configural Invariance 7
  4.3 Structural Invariance 8
    4.3.1 Invariance of factor loadings 8
    4.3.2 Invariance of intercepts 8
    4.3.3 Invariance of error terms 8

5 Fit statistics 9
  5.1 The χ² test statistic 9
    5.1.1 Specify two models 9
    5.1.2 Unspecified parameters are estimated with Maximum Likelihood Estimation 10
    5.1.3 Calculate the Likelihood of both models 10
    5.1.4 Compare the Likelihood Ratio using the (approximate) χ² distribution 11
    5.1.5 Detour: why is the Likelihood ratio test statistic distributed χ²_df? 11
    5.1.6 Using the Likelihood ratio test for testing Measurement Invariance 12
  5.2 Other GoF measures 13
    5.2.1 CFI 13
    5.2.2 Gamma hat 14
    5.2.3 RMSEA 14
    5.2.4 McDonald's Non-Centrality Index 14
  5.3 Comparing GoF measures 15

6 Limitations and other approaches 16
  6.1 Limitations of Measurement Invariance Testing 16
  6.2 Comparing SEM and IRT methods for testing Measurement Invariance 16

7 My program in OpenMx 17
  7.1 What it does 17
  7.2 Lovely Little Demo 17

8 Conclusion 18

9 Appendix - Measurement Invariance OpenMx program 21

1 Abstract

Structural Equation Modeling (SEM) is a popular technique, and its users often want to test for Measurement Invariance: whether the model holds equally across groups. Measurement Invariance is tested by comparing a model without certain restrictions to a model with those restrictions (e.g. restricting the factor loadings to be equal across groups) and checking whether the restrictions significantly decrease the fit of the model. There are different types of Measurement Invariance (configural and structural). Testing for invariance involves a series of comparisons, of which the tests of invariance of the shape of the model, the factor loadings, the intercepts, and the residual variances are used most often. There are different ways to determine the fit of models. A commonly used fit statistic is the χ² test statistic based on the Likelihood Ratio test, and the mechanics of this statistic are explained. Other fit statistics include the CFI, RMSEA, Gamma hat, and McDonald's NCI. Monte Carlo studies suggest using the CFI and McDonald's NCI. Limitations of the SEM approach to Measurement Invariance are discussed, and it is contrasted with the IRT approach to invariance.

    2 Introduction

Structural Equation Modeling (SEM) has become a popular way to model psychological data. One of the issues that users of SEM face is the problem of Measurement Invariance: does the model fit different groups equally well, or is there some difference in the pathways between variables across groups? Initially, Measurement Invariance was seen as a requirement for a good model, or at least as the preferred outcome. More recently, however, some researchers have searched for a lack of Measurement Invariance: for example, Spaan (2012) tested Measurement Invariance of a battery of neuropsychological tests in elders with and without early-stage Alzheimer's disease. She found that the factor loadings were not invariant across groups and used that to support the hypothesis that cognitive decline in elders with Alzheimer's is substantively different from cognitive decline in elders without it.

This paper will start with an introduction to Structural Equation Modeling and an explanation of the concept of Measurement Invariance, including the different types of Measurement Invariance that can be tested for. The second part of the paper focuses on an explanation of different fit statistics, most notably the χ² statistic based on the Likelihood Ratio test. It includes a literature review of which fit statistics to use when testing for invariance, and how to decide whether there is significant misfit. The fourth part focuses on limitations of the SEM approach to Measurement Invariance and contrasts it with the IRT approach. The final part of the paper is a demonstration of the OpenMx program I have written to test for Measurement Invariance.

    3 Structural Equation Modeling

Structural Equation Modeling is a method for modeling the relationships between variables. Several equations representing relationships are specified, and the coefficients in those equations are then estimated simultaneously. SEM is a generalization of Factor Analysis (FA): in FA the common variance of the observed variables is explained by underlying latent traits (common factors and unique factors), and the factor loadings of those traits on the observed variables are estimated. In SEM, one can additionally specify connections between the latent traits. For example, in Figure 1 the left panel is a path diagram representing a simple FA, and the right panel is a path diagram representing a simple structural equation model with a relationship between common factors.

[Path diagrams omitted. Panel (a): a simple FA model with one common latent trait and indicators X1, X2, X3. Panel (b): a simple SEM model with two related common latent variables, with exogenous indicators X1-X3 and endogenous indicators Y1-Y3.]

Figure 1: Two Path Diagrams

Usually, SEM is used to model the influence of some predictor variables on some dependent variables through latent traits. An example of such a model is given in Figure 1 on the right. In this example, as in the general SEM literature, let η represent the endogenous common latent trait(s), where endogenous means that this variable is predicted by other variables. Let ξ represent the exogenous common latent trait(s), where exogenous means that this variable is external to the model; it is a predictor and is not predicted by any other variable. Let γ represent the structural coefficients/loadings of ξ on η, let y be the vector of observed endogenous indicators, let Λ_y be the structural coefficients/loadings of η on y, let ε be the errors (or uniquenesses) associated with y, let Λ_x be the structural coefficients/loadings of ξ on x, let x be the vector of observed exogenous indicators, let δ be the errors (or uniquenesses) associated with x, and let ζ be the error (or uniqueness) associated with η. The triangle is used to model the means: for simplification we have set the latent trait means equal to 0 and use τ to represent the indicator intercepts, which because of this simplification will also be the indicator means.

These models can be expressed with path diagrams, as we have seen, but also with equations. The example given earlier can be expressed as such:

η = γξ + ζ

The endogenous latent trait η is a function of the exogenous latent trait ξ (with structural coefficient γ) and the error ζ.

y = (y_1, y_2, y_3)ᵀ = (τ_y1, τ_y2, τ_y3)ᵀ + (λ_y1, λ_y2, λ_y3)ᵀ η + (ε_1, ε_2, ε_3)ᵀ,   i.e.   y = τ_y + Λ_y η + ε

The observed endogenous indicators y are a function of the endogenous latent trait η (with factor loadings Λ_y) and the error ε. By estimating the coefficients, we can get information about the unobserved latent traits (common and unique) using the observed indicators.

x = (x_1, x_2, x_3)ᵀ = (τ_x1, τ_x2, τ_x3)ᵀ + (λ_x1, λ_x2, λ_x3)ᵀ ξ + (δ_1, δ_2, δ_3)ᵀ,   i.e.   x = τ_x + Λ_x ξ + δ

The observed exogenous indicators x are a function of the exogenous latent trait ξ (with factor loadings Λ_x) and the error δ.

The final generalization of this model involves expanding the single endogenous latent trait to a vector of traits, and expanding the single exogenous latent trait to a vector of traits. Figure 2 shows an example. In this example there are two exogenous common latent traits, each with three exogenous observed indicators, and two endogenous common latent traits, each with three endogenous observed indicators. The two endogenous latent traits are correlated, one endogenous latent trait is predicted using both exogenous latent traits, and one endogenous latent trait is predicted using only one latent trait. To keep the picture readable, the triangle with a 1 and the pathways representing the means have been omitted. This example will help to show the final Structural Equation Modeling notation.

(η_1, η_2)ᵀ = [0, β_12; β_21, 0] (η_1, η_2)ᵀ + [γ_11, γ_12; γ_21, γ_22] (ξ_1, ξ_2)ᵀ + (ζ_1, ζ_2)ᵀ

η = Bη + Γξ + ζ
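As a concrete illustration of this notation, the structural equation η = Bη + Γξ + ζ can be solved for η once B, Γ, ξ, and ζ are given. The sketch below uses made-up parameter values (all numbers are hypothetical illustrations, not taken from the paper):

```python
# Hypothetical parameter values for the two-eta, two-xi model above.
B = [[0.0, 0.3],   # beta_12: path from eta_2 to eta_1
     [0.2, 0.0]]   # beta_21: path from eta_1 to eta_2
Gamma = [[0.5, 0.4],   # gamma_11, gamma_12: eta_1 uses both xi's
         [0.6, 0.0]]   # gamma_21, gamma_22: eta_2 uses only xi_1
xi = [1.0, -0.5]       # exogenous latent trait scores
zeta = [0.1, -0.2]     # structural errors

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# eta = B eta + Gamma xi + zeta  rearranges to  (I - B) eta = Gamma xi + zeta.
rhs = [g + z for g, z in zip(matvec(Gamma, xi), zeta)]

# Solve the 2x2 system (I - B) eta = rhs by Cramer's rule.
a, b = 1 - B[0][0], -B[0][1]
c, d = -B[1][0], 1 - B[1][1]
det = a * d - b * c
eta = [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]

# The solution satisfies the original structural equation.
check = [bv + gv + z for bv, gv, z in zip(matvec(B, eta), matvec(Gamma, xi), zeta)]
assert all(abs(e - v) < 1e-12 for e, v in zip(eta, check))
```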


[Path diagram omitted: two exogenous latent traits (ξ_1, ξ_2) with indicators X1-X6 and two endogenous latent traits (η_1, η_2) with indicators Y1-Y6, connected by structural paths.]

Figure 2: An expanded SEM model

• Error/uniqueness variances of the observed exogenous and endogenous variables, given by Θ_δ and Θ_ε respectively

• Error variances in the conceptual model, i.e. error variances of the common latent variables, given by Ψ

    3.2 Data as modeled by SEM

The previously described Structural Equation Models are a way to model data. It is possible to fit a Structural Equation Model based solely on a covariance matrix (or SSCP matrix or correlation matrix), on such a matrix plus a mean vector, or on raw data. When testing for Measurement Invariance, it is important to use the covariance matrix rather than a correlation matrix, because by standardizing the data a difference in scale between groups might be lost. The sample covariance matrix is denoted by S.
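The point about covariance versus correlation matrices can be made concrete with a small numerical check (made-up data): rescaling one group's scores changes its covariance matrix but leaves its correlation matrix untouched, so the scale difference would be invisible to a model fit on correlations.

```python
import math

# Two hypothetical groups measured on the same two indicators; group 2's scores
# are group 1's multiplied by 3, i.e. the groups differ only in scale.
group1 = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0), (5.0, 10.0)]
group2 = [(3 * a, 3 * b) for a, b in group1]

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x1, y1 = zip(*group1)
x2, y2 = zip(*group2)

# Covariances preserve the scale difference (factor 3^2 = 9) ...
assert abs(covariance(x2, y2) - 9 * covariance(x1, y1)) < 1e-9
# ... but the correlations are identical, so the difference is lost.
assert abs(correlation(x1, y1) - correlation(x2, y2)) < 1e-9
```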

In SEM, the predicted covariance matrix Σ is given by the equation

Σ = ΛΨΛᵀ + Θ.

The sample means are denoted by m, and the predicted intercepts, which are themselves parameters in the model, are denoted by τ. In this way, using the estimated parameters one can calculate the predicted covariance matrix and mean structure, and compare those to the observed covariance matrix and mean structure.
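The predicted covariance equation can be sketched numerically for a one-factor model with three indicators. The loadings and error variances below are hypothetical, chosen only to illustrate the structure of Σ:

```python
# Hypothetical values for a one-factor, three-indicator model; the implied
# covariance matrix is Sigma = Lambda * Psi * Lambda^T + Theta.
loadings = [0.8, 0.7, 0.6]   # Lambda
psi = 1.0                    # variance of the latent trait
theta = [0.36, 0.51, 0.64]   # diagonal of Theta (error variances)

sigma = [[loadings[i] * psi * loadings[j] + (theta[i] if i == j else 0.0)
          for j in range(3)] for i in range(3)]

# Diagonal: lambda_i^2 * psi + theta_i; off-diagonal: lambda_i * lambda_j * psi.
assert abs(sigma[0][0] - 1.0) < 1e-12    # 0.8^2 + 0.36
assert abs(sigma[0][1] - 0.56) < 1e-12   # 0.8 * 0.7
assert sigma[0][1] == sigma[1][0]        # Sigma is symmetric
```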


    4 Measurement Invariance

    4.1 Introduction to Measurement Invariance

So far I have talked about single Structural Equation Models and the parameters that are estimated for them. Now I want to introduce Measurement Invariance and when you would want to test for it. Meredith (1993) defined Measurement Invariance in terms of whether the parameters of a model depend on group membership. Say you have two samples, from two groups that might be different (male and female, or French and Australian), and you want to know whether the test you have created can be used equivalently and compared across groups, so you can compare these groups on some latent trait(s) (e.g. MacCallum & Austin, 2000; Kim, Cramond, & Bandalos, 2006). Or say you want to know whether the underlying factor structure for data from a neuropsychological test battery is the same for healthy elders and Alzheimer patients, to find support for either the continuity hypothesis (cognitive decline in Alzheimer patients is just a more extreme form of the cognitive decline in normal aging) or the discontinuity hypothesis (there is a substantive difference in the form cognitive decline takes across groups) (Spaan, 2012).

In all these examples testing for Measurement Invariance can tell you whether the same factor structure holds in both groups, and also whether the factor loadings, the means, and the error distributions of the groups are the same. By testing for Measurement Invariance, we test for equivalence of the structural equation model across groups. Measurement Invariance is often tested for when using tests, where the indicators are individual test items, but it can also be used with, for example, subtest scores as indicator variables.

When comparing models, the typical approach is to compare a model to a more restricted version of the same model. When testing Measurement Invariance, typically the fit of the model for the whole data with specific parameters fixed to be equal across groups is compared to the fit of the model for the whole data with those parameters free to vary. If the free-to-vary condition fits significantly better than the fixed-to-be-equal condition, the parameter vector in question is not invariant across groups. It is important to mention again that one must use the covariance matrix, not the correlation matrix, when testing for Measurement Invariance in SEM: because in the correlation matrix the metric has been standardized, you cannot test whether the metrics were different initially.

There is not one single test that can be done; rather, one must do a series of tests, each designed to test for a specific level of Measurement Invariance. In the next sections I will explain the comparisons that are done most commonly: invariance of configuration, factor loadings, means, and error terms (Chen, 2007). It is possible (and might be desirable) to test for more levels of invariance (a nice overview is given in Cheung & Rensvold, 2002), but I have decided to stick to those that are at least somewhat commonly performed.

    4.2 Configural Invariance

In our earlier example of SEM, some parameters were presumed to be larger than 0 and some were set to 0: not all endogenous observed indicators were indicators of each endogenous latent trait, for example. The first test of Measurement Invariance tests whether the same configuration holds across groups: whether in both groups the same pathways can be said to be 0. If a model exhibits configural noninvariance, it is not possible to compare across groups at all: the observed variables are indicators of different traits in different groups.

    4.3 Structural Invariance

Once configural invariance has been established, the next step is to look at different aspects of structural invariance: invariance of factor loadings, then intercepts, and finally error terms.

    4.3.1 Invariance of factor loadings

The researcher will want to know whether the factor loadings are equal across groups: do the latent traits have the same loadings on the indicators that they are associated with, and are the relationships between latent traits equal across groups? This is a test of equality of structure coefficients. If factor loading invariance is established, the groups can be said to have the same unit of measurement. This is considered weak measurement invariance. If factor loading noninvariance is found, there are several things one could do, but if one wants to discover which specific factor loading(s) are noninvariant, the review by Millsap and Meredith (2007) suggests using backwards elimination: one releases the factor loadings one by one and checks the change in fit for each released constraint, to determine which factor loading(s) are noninvariant.

    4.3.2 Invariance of intercepts

Now that an equal unit of measurement has been established, the researcher will also want to know whether the intercepts (or mean structure) are the same across groups: whether the scores from different groups have the same origin. If intercept noninvariance is found, the different groups have different means on the indicators, and thus individual scores cannot easily be compared across groups. On the other hand, intercept noninvariance can be a substantive research finding. Presence of both factor loading invariance and intercept invariance is considered strong measurement invariance. As with factor loadings, intercept noninvariance can be explored more deeply by releasing the constraints on the intercepts one by one.

    4.3.3 Invariance of error terms

The final test is that of residual invariance. This answers the question of whether group differences on the items are due only to group differences on the common factors. It can be difficult to achieve residual invariance. Presence of all four types of invariance is considered strict measurement invariance and indicates that group differences in the covariances, variances, and means of the indicator variables (demonstrated by factor loading invariance, intercept invariance, and error term invariance respectively) are due to group differences on the common factors (Millsap & Meredith, 2007).


    5 Fit statistics

In the previous section I explained what Measurement Invariance is and for which parameters we want to establish invariance. There are different ways of testing for invariance, all of which involve model fit. The most widely used of them is the χ² test statistic based on the Likelihood Ratio test. I will explain this statistic in detail below.

5.1 The χ² test statistic

The most widely used test statistic for model fit in SEM is the classical likelihood ratio statistic χ²_t.s. (which assumes a normal distribution of the data). If the data are truly normally distributed and the model structure is correctly specified, χ²_t.s. approaches a chi-square distribution χ²_df as the sample size N increases (Yuan, 2005).

When assessing model fit, you calculate how similar the predicted covariance matrix (based on the model; Σ) is to the covariance matrix containing the relationships in the actual data (S), as well as the similarity of the predicted mean vector (τ) and the observed mean vector (m). This is calculated using the aforementioned Likelihood ratio statistic. The χ²_t.s. statistic is a function of the sample size and of how different the model covariance matrix and mean vector are from the observed data. A more extreme χ²_t.s. means the predicted values are more different from the observed data, i.e. the model has a worse fit. A significant p-value thus indicates bad model fit.

I will now first explain this procedure in general terms and then specify its use for SEM. The steps taken when calculating the χ² test statistic are as follows:

1. Specify two models
2. Estimate the unspecified parameters with Maximum Likelihood Estimation
3. Calculate the Likelihood of both models
4. Compare the Likelihood Ratio using the (approximate) χ² distribution
5. Detour: why is the Likelihood ratio test statistic distributed χ²_df?

I will now explain each of these steps.

    5.1.1 Specify two models

Say you have a parameter vector θ = (θ_1, θ_2, ..., θ_p) of length p, where θ ∈ Ω (the parameter space). In other words, the parameter vector can take values that are in the parameter space. The parameter space Ω is a subspace of p-dimensional Euclidean space ℝᵖ.

In order to specify two models, you can restrict some of the parameter values. For example, in Model 1 do not restrict any of the parameter values, and in Model 2 specify θ_1 = 1 and θ_2 = 0 (but leave θ_3, ..., θ_p free to vary). Model 2 (the more restricted model) is nested in Model 1 (the more general model).


    5.1.2 Unspecified parameters are estimated with Maximum Likelihood Estimation

In the next step we will want to calculate the Likelihood of both models, and we will want to have the highest possible Likelihood given the constraints on the parameter values. I will first explain what Likelihood is and how to maximize the Likelihood for a single parameter. The distribution for the data x, with length n, given the parameter θ, is a PDF given by

f(x|θ) = ∏_{i=1}^n f(x_i|θ) = L_n(θ).

The last term is the Likelihood function of the parameter θ given the data x. It is a function of the parameter and represents how likely each possible parameter value is, given the data. When calculating the Likelihood, you can drop any terms that do not contain the parameter. The highest Likelihood is at the value θ̂ that is the Maximum Likelihood Estimate (MLE). In order to calculate the MLE, you take the (natural) log of the Likelihood:

l_n(θ) = log L_n(θ).

The next step is to take the derivative of the log likelihood function with respect to θ:

l_n′(θ) = d l_n(θ) / dθ.

Finally, in order to find the MLE you set l_n′(θ) equal to 0 and solve for θ:

l_n′(θ) = 0.

Under certain conditions (e.g. concavity of the likelihood function) the resulting value is the value that globally maximizes the Likelihood. When finding a set of MLEs for multiple parameters, the distribution for the data x, with length n, given the parameter vector θ, is a PDF given by f(x|θ) = ∏_{i=1}^n f(x_i|θ). The MLE vector θ̂ is a vector of parameter estimates which, combined, gives the highest possible likelihood for the model.
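The recipe above can be checked numerically for the simplest case, the mean of a normal sample with known variance (the data below are made up). The analytic MLE from setting the derivative to zero is the sample mean, and no other candidate value yields a higher log likelihood:

```python
import math

# Made-up data; the variance is known and fixed at sigma = 1.
data = [2.1, 1.9, 2.4, 2.6, 2.0]
n = len(data)

def log_likelihood(mu):
    # log L_n(mu) = sum_i log f(x_i | mu) for N(mu, 1)
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in data)

# Setting the derivative n * (xbar - mu) to zero gives mu_hat = xbar.
mu_hat = sum(data) / n

# No candidate value beats the MLE.
candidates = [0.0, 1.0, 2.0, 2.2, 2.5, 3.0]
assert all(log_likelihood(mu_hat) >= log_likelihood(m) for m in candidates)
```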

    5.1.3 Calculate the Likelihood of both models

We have seen how to obtain the Likelihood function, and how to find the parameter value(s) that maximize the Likelihood. Now we calculate the Likelihood given those parameter values. In our earlier example, Model 1 had no constraints on parameter values and Model 2 had constraints on the first two parameters but not on the others. So for Model 1 the Likelihood is calculated with the MLEs for every parameter, and for Model 2 the Likelihood is calculated with the fixed values for parameters one and two, and the MLEs for the other parameters.

Obviously, the Likelihood for Model 1 will be at least as high as that for Model 2, since for Model 2 not every parameter was free to take the value that maximizes the Likelihood.


5.1.4 Compare the Likelihood Ratio using the (approximate) χ² distribution

We have calculated the Likelihood of Model 1 (L(Model 1)) and the Likelihood of Model 2 (L(Model 2)). We want to know whether Model 2 fits significantly worse than Model 1: we know it fits worse, and we want to determine whether it fits so much worse that we can say the constraints on the parameters were incorrect.

When comparing two (nested) models, the difference in (log) Likelihood test statistics, given by

χ²_t.s. = −2 ln [ L(Model 2) / L(Model 1) ] = −2 [ ln L(Model 2) − ln L(Model 1) ],

is asymptotically distributed as a χ² with degrees of freedom equal to the difference between the number of free parameters of Model 1 and Model 2. This was demonstrated by Wilks in 1938. His proof is beyond the scope of this paper, but I will demonstrate why a simpler likelihood ratio test statistic is distributed as χ²; the more complicated tests are an expansion of this principle. It is therefore possible to use χ²_t.s. as a test statistic and compare it to a χ² distribution with the appropriate degrees of freedom to find a p-value. I will now demonstrate why the Likelihood ratio test statistic is distributed χ²_df for an example with only one free parameter.
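A minimal sketch of this comparison with hypothetical log-likelihoods: for a single restricted parameter (df = 1) the χ² survival function has the closed form P(χ²_1 > x) = erfc(√(x/2)), so no statistics library is needed.

```python
import math

# Hypothetical maximized log-likelihoods; Model 2 is nested in Model 1 with
# one fewer free parameter (df = 1).
loglik_model1 = -1250.3   # more general model
loglik_model2 = -1253.8   # restricted model

chi2_ts = -2 * (loglik_model2 - loglik_model1)       # likelihood ratio statistic
p_value = math.erfc(math.sqrt(chi2_ts / 2))          # P(chi2_1 > chi2_ts)

assert abs(chi2_ts - 7.0) < 1e-9
assert p_value < 0.05   # here the restriction significantly worsens fit
```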

5.1.5 Detour: why is the Likelihood ratio test statistic distributed χ²_df?

Say we have a sample X_1, ..., X_n from a normal distribution with known variance σ². We want to compare Model 1, where μ is estimated freely, with Model 2, where μ is fixed at μ_0 = 0.

We want to estimate the unspecified parameter μ in Model 1 using Maximum Likelihood Estimation. First we find the PDF:

f(x|μ) = ∏_{i=1}^n (1/√(2πσ²)) e^{−(x_i − μ)²/(2σ²)}

The PDF is also the Likelihood function:

L_n(μ) = (1/√(2πσ²))ⁿ e^{−Σ_{i=1}^n (x_i − μ)²/(2σ²)} = (1/√(2πσ²))ⁿ e^{−n(x̄_n − μ)²/(2σ²)} e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)}

In order to calculate the MLE, you take the (natural) log of the Likelihood, dropping the terms that do not contain μ for simplicity:

l_n(μ) = −n(x̄_n − μ)²/(2σ²)

The next step is to take the derivative of the log likelihood function with respect to μ:

l_n′(μ) = n(x̄_n − μ)/σ²

Then we set the derivative equal to zero and solve for μ:

l_n′(μ) = 0  ⟹  μ̂ = x̄_n

And so we see that the MLE is μ̂ = x̄_n. Now that we know the MLE, we can calculate the Likelihood of Model 1 (where μ was estimated freely, with the MLE):

L_n(μ̂) = (1/√(2πσ²))ⁿ e^{−n(x̄_n − μ̂)²/(2σ²)} e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)} = (1/√(2πσ²))ⁿ e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)}

because μ̂ = x̄_n. And the Likelihood for Model 2, with μ = μ_0:

L_n(μ_0) = (1/√(2πσ²))ⁿ e^{−n(x̄_n − μ_0)²/(2σ²)} e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)}.

Now we have the Likelihood for both models, so we can calculate the Likelihood ratio:

L(Model 2)/L(Model 1) = [ L_n(μ_0) ] / [ L_n(μ̂) ] = e^{−n(x̄_n − μ_0)²/(2σ²)}

So the Likelihood ratio test statistic χ²_t.s. is:

−2 ln [L(Model 2)/L(Model 1)] = −2 ln ( e^{−n(x̄_n − μ_0)²/(2σ²)} ) = n(x̄_n − μ_0)²/σ² = ( (x̄_n − μ_0) / (σ/√n) )²

Under the hypothesis that μ = μ_0, we know (x̄_n − μ_0)/(σ/√n) ~ N(0, 1), and therefore ((x̄_n − μ_0)/(σ/√n))² is a squared standard normal score, which is a score from a χ² distribution with 1 degree of freedom. This is what we claimed the Likelihood ratio test statistic is distributed as in this example with one free parameter.
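The detour can be verified numerically (made-up data): computing −2 ln[L(Model 2)/L(Model 1)] directly from the log likelihoods reproduces the squared z-score of the sample mean exactly.

```python
import math

# Made-up sample from a normal distribution with known sigma; mu_0 = 0.
sigma, mu0 = 1.0, 0.0
data = [0.3, -0.5, 1.2, 0.8, -0.1, 0.6]
n = len(data)
xbar = sum(data) / n

def loglik(mu):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

# -2 ln[L(mu_0)/L(mu_hat)] computed directly ...
lr_stat = -2 * (loglik(mu0) - loglik(xbar))
# ... equals the squared z-score of the sample mean.
z_squared = ((xbar - mu0) / (sigma / math.sqrt(n))) ** 2

assert abs(lr_stat - z_squared) < 1e-9
```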

    5.1.6 Using the Likelihood ratio test for testing Measurement Invariance

So far I have talked about fairly simple models, comparing fit with one free parameter. In SEM there are many more free parameters, but the idea is the same: calculate how likely the observed covariance matrix is given the parameters (some fixed and some estimated freely to maximize the likelihood) and then compare that to the likelihood of the observed covariance matrix when using only Maximum Likelihood estimates. Conceptually, this is the same as comparing the observed covariance matrix S to a model-specified covariance matrix Σ. So when using the Likelihood ratio test for model fit in SEM, the calculation of the Likelihood is more complicated than in the earlier examples (because there are many more parameters), but the test statistic can be calculated in the same way.

When testing for Configural and Structural Invariance, four consecutive tests are done, comparing four models (and a baseline model with no constraints). See Table 1 for the models and their constraints. Each additional model is nested in the previous one. Table 2 shows the four tests that are done. Each additional model is compared with the previous one, so that each test covers only the newly added set of constraints (e.g. equal factor loadings).


The χ² test statistic has disadvantages that have been acknowledged in recent times. As mentioned previously, it is sensitive to sample size: when the sample size is large, the test statistic will be significant even for small differences between the observed covariance matrix and the model-specified covariance matrix, i.e. it will reject models that fit reasonably well. Hayduk (1987) mentions that in his experience this is a problem for sample sizes over 500. In addition, the χ² test is sensitive to non-normality, specifically kurtosis. Studies have shown that skewness does not impact the test too much, but if there is kurtosis in the data the resulting χ² test statistics are not distributed as a χ² (Yuan, 2005). Because of these issues, additional Goodness of Fit (GoF) measures have been proposed for testing how well a specific model fits data from one group, and recently they have also been applied to testing for measurement invariance.

Table 1: Models used in testing for Measurement Invariance

  Model name   Constraints
  Model 1      Equal configuration
  Model 2      Equal factor loadings
  Model 3      Equal factor loadings and intercepts
  Model 4      Equal factor loadings, intercepts, and error variances

Table 2: Consecutive tests of Measurement Invariance

  Models compared            Type of invariance tested for   Testing equality of
  Model 1 vs Baseline model  Configural invariance           Configuration
  Model 1 vs Model 2         Weak invariance                 Factor loadings
  Model 2 vs Model 3         Strong invariance               Intercepts
  Model 3 vs Model 4         Strict invariance               Error variances
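The sequence in Table 2 can be sketched as a loop of χ² difference tests. The fit values below are invented for illustration; in this made-up example the first three steps hold and strict invariance (equal error variances) is rejected.

```python
import math

fits = [  # (model name, chi-square, df) -- hypothetical numbers
    ("Baseline", 40.2, 30),
    ("Model 1 (equal configuration)", 48.9, 36),
    ("Model 2 (+ equal loadings)", 55.1, 40),
    ("Model 3 (+ equal intercepts)", 62.0, 44),
    ("Model 4 (+ equal residuals)", 83.5, 48),
]

def chi2_sf(x, df):
    # Exact P(chi2_df > x) for even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
    assert df % 2 == 0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= (x / 2) / k
        total += term
    return math.exp(-x / 2) * total

# Each step tests only the newly added constraints (df = difference in model df).
results = []
for (_, chi2_a, df_a), (name_b, chi2_b, df_b) in zip(fits, fits[1:]):
    p = chi2_sf(chi2_b - chi2_a, df_b - df_a)
    results.append((name_b, p))

# In this invented example only the last step (equal error variances) is rejected.
assert all(p > 0.05 for _, p in results[:-1]) and results[-1][1] < 0.05
```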

    5.2 Other GoF measures

In this section I will explain some of the common Goodness of Fit indices. As can be seen in the next subsection, simulation studies have focused on the Comparative Fit Index (CFI), Gamma hat, Root mean square error of approximation (RMSEA), and/or McDonald's Non-Centrality Index (McD NCI) as potential GoF indices to use.

    5.2.1 CFI

The Comparative Fit Index is calculated by

CFI = 1 − (χ²_t − df_t) / (χ²_n − df_n)

where χ²_t is the chi-square for the tested model, df_t is its associated degrees of freedom, χ²_n is the chi-square for the null model, and df_n is its associated degrees of freedom (Bentler, 1990).
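In code, the CFI formula is a one-liner. The fit values in the check are hypothetical; note that some implementations additionally clamp the noncentralities at zero, which this basic form omits.

```python
def cfi(chi2_t, df_t, chi2_n, df_n):
    # Comparative Fit Index: 1 - (tested noncentrality) / (null noncentrality)
    return 1 - (chi2_t - df_t) / (chi2_n - df_n)

# e.g. tested model chi2 = 48.9 on 36 df against a null model chi2 = 480.0 on 45 df
assert round(cfi(48.9, 36, 480.0, 45), 3) == 0.970
```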

    5.2.2 Gamma hat

This fit index was proposed by Steiger (1989), although he actually proposed two Gamma indices and the later Monte Carlo studies do not specify which one they used. They do reference Hu and Bentler (1998), who only discuss one of the two Gamma hat indices:

Gamma hat = p / ( p + 2[(T_T − df_T)/(N − 1)] )

where T_T is the T-statistic for the target model (with df_T its associated degrees of freedom), p is the number of parameters, and N is the sample size. Hu and Bentler recommended against using Gamma hat in practice.

    5.2.3 RMSEA

    The Root mean square error of approximation is calculated such:

    RMSEA =

    2t dftdft(N 1)

where χ²_t is the chi-square for the tested model, df_t is its associated degrees of freedom, and N is the sample size. When using the RMSEA for testing for Measurement Invariance, it is adjusted by taking the square root of the overall population discrepancy function divided by the average number of degrees of freedom per sample. The population discrepancy function is the value of the discrepancy function we would get if we fit the model to the (actually unknown) true population covariance matrix (Steiger, 1998).
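The single-sample formula can be sketched as follows (hypothetical input values; the multiple-sample adjustment described above is not implemented here):

```python
import math

def rmsea(chisq_t, df_t, n):
    """Single-sample RMSEA: sqrt((chi2 - df) / (df * (N - 1))), floored at 0."""
    return math.sqrt(max(chisq_t - df_t, 0.0) / (df_t * (n - 1)))

# Hypothetical example: chi2 = 85, df = 40, N = 500.
print(round(rmsea(85, 40, 500), 4))  # -> 0.0475
```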

5.2.4 McDonald's Non-Centrality Index

In 1989 McDonald proposed his Non-Centrality Index (McD NCI) as an alternative to other GoF indices, which are usually sensitive to sample size. It is an adaptation of the Akaike Information Criterion, which depends on sample size as well: for a small sample size the simplest model will result in the smallest AIC, whereas as sample size increases eventually the saturated model will result in the smallest AIC. As an alternative to this, McDonald proposed using the Non-Centrality Parameter λ rescaled so that it is not a function of sample size: d = λ/n. This value can be estimated by d̂ = f - (p/n), where f = (-2 log L)/n and p is the order of the sample covariance matrix. This results in a measure of misfit, so to change that to a goodness of fit index McDonald proposed the final transformation:

    mc = exp(-d̂/2).
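In applied work the noncentrality estimate is usually computed as (χ² - df)/N; a sketch under that assumption (the input values are hypothetical):

```python
import math

def mcdonald_nci(chisq_t, df_t, n):
    """McDonald's Non-Centrality Index, exp(-d/2).

    d is the noncentrality estimate rescaled by sample size, computed
    here in the commonly used form (chi2 - df) / N, floored at 0.
    """
    d_hat = max(chisq_t - df_t, 0.0) / n
    return math.exp(-0.5 * d_hat)

# Hypothetical example: chi2 = 85, df = 40, N = 500.
print(round(mcdonald_nci(85, 40, 500), 4))  # -> 0.956
```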


    6 Limitations and other approaches

    6.1 Limitations of Measurement Invariance Testing

    In previous sections I explained how, using a SEM approach, one can test for Measurement Invariance.

    In this section I will present a critical view of the SEM approach to testing for Invariance.

A problem that has rarely been discussed in the literature is that of sequential testing. In testing for invariance one performs a sequence of tests, each designed to test for a specific type of invariance. In other words, one considers a set of statistical tests based on one dataset. This leads to an inflated Type I error rate, which should be corrected for (e.g. Benjamini & Hochberg, 1995). However, no correction for multiple comparisons is discussed in the Measurement Invariance literature. Not correcting for multiple testing inflates the Type I error rate, i.e. too often one concludes there is a significant difference when in truth the samples are from the same population, which in this context means one too often incorrectly diagnoses Measurement Noninvariance. Therefore, not correcting for multiple comparisons makes the procedure conservative with respect to concluding Measurement Invariance.
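As an illustration of the kind of correction meant here, a minimal sketch of the Benjamini-Hochberg step-up procedure; the three p-values are hypothetical results from a sequence of invariance tests:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Step-up FDR control: reject all hypotheses up to the largest sorted
    rank k whose p-value satisfies p_(k) <= (k/m) * q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject

# Hypothetical p-values from three sequential invariance tests:
print(benjamini_hochberg([0.42, 0.36, 0.0013]))  # -> [False, False, True]
```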

Reise, Waller, and Comrey (2000) discuss testing Measurement Invariance in the context of scale revision. They point out that using the standard Confirmatory Factor Analysis method to compare the factor loadings of dichotomous items can be inappropriate, because these methods are designed for continuous, interval-level measurement of the indicators. In addition, they warned that a lack of simple structure can lead to an incorrect conclusion of Measurement Noninvariance.

    6.2 Comparing SEM and IRT methods for testing Measurement Invariance

A different approach than the SEM approach to invariance, i.e. the multiple group CFA approach, is the Item Response Theory (IRT) approach. This is appropriate when testing the factor loadings of individual items, so when looking at the structure of a scale/test. In IRT the item response is modeled as a logistic function of item discrimination (analogous to factor loadings), item location (the difficulty of the item), and potentially a lower and upper asymptote. It is useful for modeling responses to dichotomous or polytomous items. Measurement Invariance in the IRT context can be conceptualized as invariance of the item parameters: are the same item parameters applicable to the two groups, and thus do the items have the same relation to the latent trait in both groups (Reise, Widaman & Pugh, 1993)? In IRT, Measurement Noninvariance is often called differential item functioning: one or more items function differently in the two groups.
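The logistic item response function described above can be sketched as follows; the parameter names are generic illustrations, not tied to any particular IRT package:

```python
import math

def irt_prob(theta, a, b, c=0.0, d=1.0):
    """Probability of endorsing an item under a logistic IRT model.

    theta: latent trait level
    a: item discrimination (analogous to a factor loading)
    b: item location/difficulty
    c, d: optional lower and upper asymptotes (c=0, d=1 gives the 2PL)
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta equal to the item location, the 2PL probability is exactly 0.5;
# differential item functioning would mean a or b differ between groups.
print(irt_prob(theta=0.0, a=1.5, b=0.0))  # -> 0.5
```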

Reise, Widaman, and Pugh (1993) compared multiple group CFA (i.e. SEM) and Item Response Theory (IRT) as methods to test for Measurement Invariance. They concluded that both methods have advantages. Multiple group CFA is more user friendly, i.e. simpler to implement, and has more varied options to determine model fit, i.e. a combination of the likelihood ratio test and additional goodness of fit indices, which the IRT model lacks. The issue they address of multiple group CFA disregarding the difficulty parameter of IRT can be circumvented by testing for intercept invariance in addition to structural coefficients invariance. However, the more serious criticism of the SEM approach


    is that it is a linear model which is generally not appropriate for dichotomous or polytomous responses,

    unlike IRT which uses a logistic model.

Glockner-Rist and Hoijtink (2003) took an in-depth look at the similarities and differences of SEM and IRT approaches to Measurement Invariance. They proposed an integrated framework where IRT properties are added to a SEM approach. They first demonstrate that the normal ogive two-parameter IRT model is a type of (nonlinear) factor analysis, and because this model is equivalent to the logistic IRT model, the latter is also comparable to factor analysis. They run a non-linear confirmatory factor analysis using the normal ogive structure; hence they run an IRT model in a SEM framework. This approach can be helpful for bringing the IRT and SEM literatures closer together, and for using the SEM approach with dichotomous or polytomous data.

    7 My program in OpenMX

    7.1 What it does

In order to write a program to test Measurement Invariance, I used OpenMx (Boker et al., 2011), a package in R (R Development Core Team, 2011). My program, called measInvTest, takes two raw datasets as its arguments (with the option to specify the number of common latent factors, set to 1 by default). It specifies four models in succession (Model 1 through 4 from table 1), with no constraints on which factor loadings should be 0. It would be possible to add this, but the details would depend on the number of indicators, common factors, and the model that one wishes to specify.

Each model is tested using the Likelihood ratio test against the model with one fewer set of constraints on the parameters. The first comparison tests a model with no constraints (other than the basic constraints of common factor means being 0 and common factor variances being 1, to identify the model) versus a model with the constraint that factor loadings must be equal across groups. The second comparison tests the latter model against a model with the additional constraint that indicator means must be equal across groups. The third comparison tests the latter model against a model with the additional constraint that uniqueness factor variances must be equal across groups.

The output of the program is a table showing the results from these three comparisons, with (among other, less relevant output) the diffLL (i.e. the χ² test statistic), the associated degrees of freedom (diffdf) and the p-value.

    7.2 Lovely Little Demo

For this demo of the program I will compare two raw datasets used as an example by Michael Neale, one of the creators of OpenMx. Here I will just use the function; in the Appendix I provide the full code for the function.

    First, I load the relevant packages and read in the data:

    > ## First the necessary preamble:

    > .libPaths(new="C:\\Lian\\R Packages")


    > require(OpenMx)

    > require("mvtnorm")

> ## Let's use Michael Neale's data:

    > data(myFADataRaw)

    > group1 group2


Noninvariance). Another limitation of the (linear) SEM based approach to Measurement Invariance is that often this will be used with dichotomous items as indicators, which leads to decidedly non-continuous data, a violation of the assumptions. A combination of IRT and SEM might be more appropriate for testing Measurement Invariance in the future.

References

[1] Benjamini, Y. & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289-300.

[2] Bentler, P.M. (1990). Comparative Fit Indexes in Structural Models. Psychological Bulletin, 107(2), 238-246.

[3] Boker, S.M., Neale, M.C., Maes, H.H., Wilde, M.J., Spiegel, M., Brick, T.R., Spies, J., Estabrook, R., Kenny, S., Bates, T.C., Mehta, P., & Fox, J. (2011). OpenMx: An Open Source Extended Structural Equation Modeling Framework. Psychometrika, 76(2), 306-317.

[4] Byrne, B.M. & Watkins, D. (2003). The Issue of Measurement Invariance Revisited. Journal of Cross-Cultural Psychology, 34, 155-175.

[5] Chen, F.F. (2007). Sensitivity of Goodness of Fit Indexes to Lack of Measurement Invariance. Structural Equation Modeling, 14(3), 464-504.

[6] Cheung, G.W. & Rensvold, R.B. (2002). Evaluating Goodness-of-Fit Indexes for Testing Measurement Invariance. Structural Equation Modeling, 9(2), 233-255.

[7] Glockner-Rist, A. & Hoijtink, H. (2003). The Best of Both Worlds: Factor Analysis of Dichotomous Data Using Item Response Theory and Structural Equation Modeling. Structural Equation Modeling, 10(4), 544-565.

[8] Hayduk, L.A. (1987). Structural Equation Modeling with LISREL: Essentials and Advances. The Johns Hopkins University Press: Baltimore, Maryland.

[9] Kim, K.H., Cramond, B., & Bandalos, B.L. (2006). The Latent Structure and Measurement Invariance of Scores on the Torrance Tests of Creative Thinking-Figural. Educational and Psychological Measurement, 66(3), 459-477.

[10] MacCallum, R.C. & Austin, J.T. (2000). Applications of Structural Equation Modeling in Psychological Research. Annual Review of Psychology, 51, 201-226.

[11] McDonald, R.P. (1989). An Index of Goodness-of-Fit Based on Noncentrality. Journal of Classification, 6, 97-103.

[12] Meredith, W. (1993). Measurement Invariance, Factor Analysis and Factorial Invariance. Psychometrika, 58(4), 525-543.

[13] Millsap, R.E. & Meredith, W. (2007). Factorial Invariance: Historical Perspectives and New Problems. In Cudeck, R. & MacCallum, R.C. (Eds.), Factor Analysis at 100: Historical Developments and Future Directions. Lawrence Erlbaum Associates: Mahwah, New Jersey.

[14] R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing: Vienna, Austria.

[15] Reise, S.P., Waller, N.G. & Comrey, A.L. (2000). Factor Analysis and Scale Revision. Psychological Assessment, 12(3), 287-297.

[16] Reise, S.P., Widaman, K.F., & Pugh, R.H. (1993). Confirmatory Factor Analysis and Item Response Theory: Two Approaches for Exploring Measurement Invariance. Psychological Bulletin, 114(3), 552-566.

[17] Spaan, P.E.J. (2012). Cognitieve achteruitgang bij normale veroudering en de ziekte van Alzheimer: Een continue of discontinue overgang? [Cognitive decline in normal aging and Alzheimer's disease: A continuous or discontinuous transition?] Tijdschrift voor Neuropsychologie, 7(1), 3-15.

[18] Steiger, J.H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH. SYSTAT: Evanston, IL.

[19] Steiger, J.H. (1998). A note on multiple sample extensions of the RMSEA fit index. Structural Equation Modeling, 5(4), 411-419.

[20] Wilks, S.S. (1938). The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Annals of Mathematical Statistics, 9, 60-62.

[21] Yuan, K. (2005). Fit Indices Versus Test Statistics. Multivariate Behavioral Research, 40(1), 115-148.


    9 Appendix - Measurement Invariance OpenMx program

    Here I will provide the entire program used to test for Measurement Invariance:

    > ## First the necessary preamble:

> .libPaths(new="C:\\Lian\\R Packages")
> require(OpenMx)

    > require("mvtnorm")

    > options(continue=" ")

    > ############

    > ## Data ##

    > ############

    >

> ## Let's use Michael Neale's data:

    > data(myFADataRaw)

    > group1 group2


    mxModel("group2",

    mxData(observed=group2, type="raw"),

    mxMatrix("Full", s.nrvar, s.nrlatent, values=0.2,

    free=TRUE, labels=paste("l2",1:(s.nrvar*s.nrlatent),sep=""),

    name="A"),

# A matrix of factor loadings (free)
mxMatrix("Symm", s.nrlatent, s.nrlatent, values=diag(s.nrlatent),

    free=FALSE, name="L"),

    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e2",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean2",

    1:s.nrvar, sep=""), values=rep(1, s.nrvar), name="meanMatrix"),

    # A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group2))

    # The objective

    ),

    mxAlgebra(group1.objective + group2.objective, name="h12"),

    mxAlgebraObjective("h12")

    )

    noConstraintFit


    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group1))

    # The objective

    ),

    mxModel("group2",

mxData(observed=group2, type="raw"),
mxMatrix("Full", s.nrvar, s.nrlatent, values=0.2,

    free=TRUE, labels=paste("l",1:(s.nrvar*s.nrlatent),sep=""),

    name="A"),

    # A matrix of factor loadings (free)

    mxMatrix("Symm", s.nrlatent, s.nrlatent, values=diag(s.nrlatent),

    free=FALSE, name="L"),

    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e2",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean2",

    1:s.nrvar, sep=""), values=rep(1, s.nrvar), name="meanMatrix"),

    # A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group2))

    # The objective

    ),

    mxAlgebra(group1.objective + group2.objective, name="h12"),

    mxAlgebraObjective("h12")

    )

    facLoadFit


    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean",

1:s.nrvar, sep=""), name="meanMatrix"),
# A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group1))

    # The objective

    ),

    mxModel("group2",

    mxData(observed=group2, type="raw"),

    mxMatrix("Full", s.nrvar, s.nrlatent, values=0.2,

    free=TRUE, labels=paste("l",1:(s.nrvar*s.nrlatent),sep=""),

    name="A"),

    # A matrix of factor loadings (free)

    mxMatrix("Symm", s.nrlatent, s.nrlatent, values=diag(s.nrlatent),

    free=FALSE, name="L"),

    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean",

    1:s.nrvar, sep=""), name="meanMatrix"),

    # A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=colnames(group2))

    # The objective

    ),

    mxAlgebra(group1.objective + group2.objective, name="h12"),

    mxAlgebraObjective("h12")

    )

errorsFit
> ##########################

    > ## Running the function ##

    > ##########################

    >

    > measInvTest(group1, group2)


    Running Model1

    Running Model2

    Running Model3

    Running Model4

    base comparison ep minus2LL df AIC diffLL diffdf p

    1 Model1 36 13737.65 5964 1809.652 NA NA NA

    2 Model1 Model2 30 13743.66 5970 1803.661 6.009412 6 0.422136584

    3 Model2 30 13743.66 5970 1803.661 NA NA NA

    4 Model2 Model3 24 13750.24 5976 1798.237 6.575737 6 0.361868610

    5 Model3 24 13750.24 5976 1798.237 NA NA NA

    6 Model3 Model4 18 13772.02 5982 1808.019 21.782247 6 0.001325933
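The p-values in the last column can be verified by hand: each comparison adds six equality constraints, and for an even number of degrees of freedom the chi-square survival function has a closed form. A small check (not part of measInvTest, written in Python for illustration):

```python
import math

def chisq_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Third comparison above: diffLL = 21.782247 on diffdf = 6.
print(round(chisq_sf_even_df(21.782247, 6), 6))  # -> 0.001326
```

This reproduces the reported p-value of 0.001325933 for the Model 3 vs Model 4 comparison, the only one of the three that is significant.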