
Upload: fastford14

Post on 02-Jun-2018


  • 8/11/2019 Measurement in Variance

    1/26

    Project for Introduction to Multivariate Statistics:

    Measurement Invariance

    Lian Hortensius

    May 10, 2012

Contents

1 Abstract 2

2 Introduction 2

3 Structural Equation Modeling 3
  3.1 Parameters in SEM 5
  3.2 Data as modeled by SEM 6

4 Measurement Invariance 7
  4.1 Introduction to Measurement Invariance 7
  4.2 Configural Invariance 7
  4.3 Structural Invariance 8
    4.3.1 Invariance of factor loadings 8
    4.3.2 Invariance of intercepts 8
    4.3.3 Invariance of error terms 8

5 Fit statistics 9
  5.1 The χ² test statistic 9
    5.1.1 Specify two models 9
    5.1.2 Unspecified parameters are estimated with Maximum Likelihood Estimation 10
    5.1.3 Calculate the Likelihood of both models 10
    5.1.4 Compare the Likelihood Ratio using the (approximate) χ² distribution 11
    5.1.5 Detour: why is the Likelihood ratio test statistic distributed χ²_df? 11
    5.1.6 Using the Likelihood ratio test for testing Measurement Invariance 12
  5.2 Other GoF measures 13
    5.2.1 CFI 13
    5.2.2 Gamma hat 14
    5.2.3 RMSEA 14
    5.2.4 McDonald's Non-Centrality Index 14
  5.3 Comparing GoF measures 15

6 Limitations and other approaches 16
  6.1 Limitations of Measurement Invariance Testing 16
  6.2 Comparing SEM and IRT methods for testing Measurement Invariance 16

7 My program in OpenMx 17
  7.1 What it does 17
  7.2 Lovely Little Demo 17

8 Conclusion 18

9 Appendix - Measurement Invariance OpenMx program 21

1 Abstract

Structural Equation Modeling (SEM) is a popular technique, and its users often want to test for Measurement Invariance: whether the model holds equally across groups. Measurement Invariance is tested by comparing a model without certain restrictions to a model with those restrictions (e.g. restricting the factor loadings to be equal across groups) and checking whether the restrictions significantly decrease the fit of the model. There are different types of Measurement Invariance (configural and structural). Testing for invariance involves a series of comparisons, of which the tests of invariance of the shape of the model, the factor loadings, the intercepts, and the residual variances are used most often. There are different ways to determine the fit of models. A commonly used fit statistic is the χ² test statistic based on the Likelihood Ratio test, and the mechanics of this statistic are explained. Other fit statistics include the CFI, RMSEA, Gamma hat, and McDonald's NCI. Monte Carlo studies suggest using the CFI and McDonald's NCI. Limitations of the SEM approach to Measurement Invariance are discussed, and it is contrasted with the IRT approach to invariance.

    2 Introduction

Structural Equation Modeling (SEM) has become a popular way to model psychological data. One of the issues that users of SEM face is the problem of Measurement Invariance: does the model fit different groups equally well, or is there some difference in the pathways between variables across groups? Initially, Measurement Invariance was seen as a requirement for a good model, or at least as the preferred outcome. More recently, however, some researchers have searched for a lack of Measurement Invariance: for example, Spaan (2012) tested Measurement Invariance of a battery of neuropsychological tests in elders with and without early-stage Alzheimer's disease. She found that the factor loadings were not invariant across groups and used that to support the hypothesis that cognitive decline in elders with Alzheimer's is substantively different from cognitive decline in elders without it.

This paper will start with an introduction to Structural Equation Modeling and an explanation of the concept of Measurement Invariance, including the different types of Measurement Invariance that can be tested for. The second part of the paper focuses on an explanation of different fit statistics, most notably the χ² statistic based on the Likelihood Ratio test. It includes a literature review of which fit statistics to use when testing for invariance, and how to decide whether there is significant misfit. The fourth part focuses on limitations of the SEM approach to Measurement Invariance and contrasts it with the IRT approach. The final part of the paper is a demonstration of the OpenMx program I have written to test for Measurement Invariance.

    3 Structural Equation Modeling

Structural Equation Modeling is a method for modeling the relationships between variables. Several equations representing relationships are specified, and the coefficients in those equations are then estimated simultaneously. SEM is a generalization of Factor Analysis (FA): in FA the common variance of the observed variables is explained by underlying latent traits (common factors and unique factors), and the factor loadings of those traits on the observed variables are estimated. In SEM, one can additionally specify connections between the latent traits. For example, in Figure 1 the left panel is a path diagram representing a simple FA, and the right panel is a path diagram representing a simple structural equation model with a relationship between common factors.

[Path diagrams omitted. Panel (a): a simple FA model with one common latent trait and indicators X1, X2, X3. Panel (b): a simple SEM model with two related common latent variables, with exogenous indicators X1-X3 and endogenous indicators Y1-Y3.]

Figure 1: Two Path Diagrams

Usually, SEM is used to model the influence of some predictor variables on some dependent variables through latent traits. An example of such a model is given in Figure 1 on the right. In this example, as in the general SEM literature, let η represent the endogenous common latent trait(s), where endogenous means that this variable is predicted by other variables. Let ξ represent the exogenous common latent trait(s), where exogenous means that this variable is external to the model; it is a predictor and is not predicted by any other variable. Let γ represent the structural coefficients/loadings of ξ on η, let y be the vector of observed endogenous indicators, let Λ_y be the structural coefficients/loadings of η on y, let ε be the errors (or uniquenesses) associated with y, let Λ_x be the structural coefficients/loadings of ξ on x, let x be the vector of observed exogenous indicators, let δ be the errors (or uniquenesses) associated with x, and let ζ be the error (or uniqueness) associated with η. The triangle is used to model the means: for simplification we have set the latent trait means equal to 0 and use τ to represent the indicator intercepts, which because of this simplification will also be the indicator means.

These models can be expressed with path diagrams, as we have seen, but also with equations. The example given earlier can be expressed as such:

η = γξ + ζ

The endogenous latent trait η is a function of the exogenous latent trait ξ (with structural coefficient γ) and the error ζ.

y = (y_1, y_2, y_3)ᵀ = (τ_y1, τ_y2, τ_y3)ᵀ + (λ_y1, λ_y2, λ_y3)ᵀ η + (ε_1, ε_2, ε_3)ᵀ,   i.e.   y = τ_y + Λ_y η + ε

The observed endogenous indicators y are a function of the endogenous latent trait η (with factor loadings Λ_y) and the error ε. By estimating the coefficients, we can get information about the unobserved latent traits (common and unique) using the observed indicators.

x = (x_1, x_2, x_3)ᵀ = (τ_x1, τ_x2, τ_x3)ᵀ + (λ_x1, λ_x2, λ_x3)ᵀ ξ + (δ_1, δ_2, δ_3)ᵀ,   i.e.   x = τ_x + Λ_x ξ + δ

The observed exogenous indicators x are a function of the exogenous latent trait ξ (with factor loadings Λ_x) and the error δ.

The final generalization of this model involves expanding the single endogenous latent trait to a vector of traits, and expanding the single exogenous latent trait to a vector of traits. Figure 2 shows an example. In this example there are two exogenous common latent traits, each with three exogenous observed indicators, and two endogenous common latent traits, each with three endogenous observed indicators. The two endogenous latent traits are correlated, one endogenous latent trait is predicted using both exogenous latent traits, and one endogenous latent trait is predicted using only one latent trait. To keep the picture readable, the triangle with a 1 and the pathways representing the means have been omitted. This example will help to show the final Structural Equation Modeling notation.

(η_1, η_2)ᵀ = [0, β_12; β_21, 0] (η_1, η_2)ᵀ + [γ_11, γ_12; γ_21, γ_22] (ξ_1, ξ_2)ᵀ + (ζ_1, ζ_2)ᵀ

η = Bη + Γξ + ζ
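As a concrete illustration of this notation, the structural equation η = Bη + Γξ + ζ can be solved for η once B, Γ, ξ, and ζ are given. The sketch below uses made-up parameter values (all numbers are hypothetical illustrations, not taken from the paper):

```python
# Hypothetical parameter values for the two-eta, two-xi model above.
B = [[0.0, 0.3],   # beta_12: path from eta_2 to eta_1
     [0.2, 0.0]]   # beta_21: path from eta_1 to eta_2
Gamma = [[0.5, 0.4],   # gamma_11, gamma_12: eta_1 uses both xi's
         [0.6, 0.0]]   # gamma_21, gamma_22: eta_2 uses only xi_1
xi = [1.0, -0.5]       # exogenous latent trait scores
zeta = [0.1, -0.2]     # structural errors

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# eta = B eta + Gamma xi + zeta  rearranges to  (I - B) eta = Gamma xi + zeta.
rhs = [g + z for g, z in zip(matvec(Gamma, xi), zeta)]

# Solve the 2x2 system (I - B) eta = rhs by Cramer's rule.
a, b = 1 - B[0][0], -B[0][1]
c, d = -B[1][0], 1 - B[1][1]
det = a * d - b * c
eta = [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]

# The solution satisfies the original structural equation.
check = [bv + gv + z for bv, gv, z in zip(matvec(B, eta), matvec(Gamma, xi), zeta)]
assert all(abs(e - v) < 1e-12 for e, v in zip(eta, check))
```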


[Path diagram omitted: two exogenous latent traits (ξ_1, ξ_2) with indicators X1-X6 and two endogenous latent traits (η_1, η_2) with indicators Y1-Y6, connected by structural paths.]

Figure 2: An expanded SEM model

• Error/uniqueness variances of the observed exogenous and endogenous variables, given by Θ_δ and Θ_ε respectively

• Error variances in the conceptual model, i.e. error variances of the common latent variables, given by Ψ

    3.2 Data as modeled by SEM

The previously described Structural Equation Models are a way to model data. It is possible to fit a Structural Equation Model based solely on a covariance matrix (or SSCP matrix or correlation matrix), on such a matrix plus a mean vector, or on raw data. When testing for Measurement Invariance, it is important to use the covariance matrix rather than a correlation matrix, because by standardizing the data a difference in scale between groups might be lost. The sample covariance matrix is denoted by S.
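The point about covariance versus correlation matrices can be made concrete with a small numerical check (made-up data): rescaling one group's scores changes its covariance matrix but leaves its correlation matrix untouched, so the scale difference would be invisible to a model fit on correlations.

```python
import math

# Two hypothetical groups measured on the same two indicators; group 2's scores
# are group 1's multiplied by 3, i.e. the groups differ only in scale.
group1 = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0), (5.0, 10.0)]
group2 = [(3 * a, 3 * b) for a, b in group1]

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x1, y1 = zip(*group1)
x2, y2 = zip(*group2)

# Covariances preserve the scale difference (factor 3^2 = 9) ...
assert abs(covariance(x2, y2) - 9 * covariance(x1, y1)) < 1e-9
# ... but the correlations are identical, so the difference is lost.
assert abs(correlation(x1, y1) - correlation(x2, y2)) < 1e-9
```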

In SEM, the predicted covariance matrix Σ is given by the equation

Σ = ΛΨΛᵀ + Θ.

The sample means are denoted by m, and the predicted intercepts, which are themselves parameters in the model, are denoted by τ. In this way, using the estimated parameters one can calculate the predicted covariance matrix and mean structure, and compare those to the observed covariance matrix and mean structure.
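The predicted covariance equation can be sketched numerically for a one-factor model with three indicators. The loadings and error variances below are hypothetical, chosen only to illustrate the structure of Σ:

```python
# Hypothetical values for a one-factor, three-indicator model; the implied
# covariance matrix is Sigma = Lambda * Psi * Lambda^T + Theta.
loadings = [0.8, 0.7, 0.6]   # Lambda
psi = 1.0                    # variance of the latent trait
theta = [0.36, 0.51, 0.64]   # diagonal of Theta (error variances)

sigma = [[loadings[i] * psi * loadings[j] + (theta[i] if i == j else 0.0)
          for j in range(3)] for i in range(3)]

# Diagonal: lambda_i^2 * psi + theta_i; off-diagonal: lambda_i * lambda_j * psi.
assert abs(sigma[0][0] - 1.0) < 1e-12    # 0.8^2 + 0.36
assert abs(sigma[0][1] - 0.56) < 1e-12   # 0.8 * 0.7
assert sigma[0][1] == sigma[1][0]        # Sigma is symmetric
```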


    4 Measurement Invariance

    4.1 Introduction to Measurement Invariance

So far I have talked about single Structural Equation Models and the parameters that are estimated for them. Now I want to introduce Measurement Invariance and when you would want to test for it. Meredith (1993) defined Measurement Invariance in terms of whether the parameters of a model depend on group membership. Say you have two samples, from two groups that might be different (male and female, or French and Australian), and you want to know whether the test you have created can be used equivalently and compared across groups, so you can compare these groups on some latent trait(s) (e.g. MacCallum & Austin, 2000; Kim, Cramond, & Bandalos, 2006). Or say you want to know whether the underlying factor structure for data from a neuropsychological test battery is the same for healthy elders and Alzheimer patients, to find support for either the continuity hypothesis (cognitive decline in Alzheimer patients is just a more extreme form of the cognitive decline in normal aging) or the discontinuity hypothesis (there is a substantive difference in the form cognitive decline takes across groups) (Spaan, 2012).

In all these examples testing for Measurement Invariance can tell you whether the same factor structure holds in both groups, and also whether the factor loadings, the means, and the error distributions of the groups are the same. By testing for Measurement Invariance, we test for equivalence of the structural equation model across groups. Measurement Invariance is often tested for when using tests, where the indicators are individual test items, but it can also be used with, for example, subtest scores as indicator variables.

When comparing models, the typical approach is to compare a model to a more restricted version of the same model. When testing Measurement Invariance, typically the fit of the model for the whole data with specific parameters fixed to be equal across groups is compared to the fit of the model for the whole data with those parameters free to vary. If the free-to-vary condition fits significantly better than the fixed-to-be-equal condition, the parameter vector in question is not invariant across groups. It is important to mention again that one must use the covariance matrix, not the correlation matrix, when testing for Measurement Invariance in SEM: because in the correlation matrix the metric has been standardized, you cannot test whether the metrics were different initially.

There is not one single test that can be done; rather, one must do a series of tests, each designed to test for a specific level of Measurement Invariance. In the next sections I will explain the comparisons that are done most commonly: invariance of configuration, factor loadings, means, and error terms (Chen, 2007). It is possible (and might be desirable) to test for more levels of invariance (a nice overview is given in Cheung & Rensvold, 2002), but I have decided to stick to those that are at least somewhat commonly performed.

    4.2 Configural Invariance

In our earlier example of SEM, some parameters were presumed to be larger than 0 and some were set to 0: not all endogenous observed indicators were indicators of each endogenous latent trait, for example. The first test of Measurement Invariance tests whether the same configuration holds across groups: whether in both groups the same pathways can be said to be 0. If a model exhibits configural noninvariance, it is not possible to compare across groups at all: the observed variables are indicators of different traits in different groups.

    4.3 Structural Invariance

Once configural invariance has been established, the next step is to look at different aspects of structural invariance: invariance of factor loadings, then intercepts, and finally error terms.

    4.3.1 Invariance of factor loadings

The researcher will want to know whether the factor loadings are equal across groups: do the latent traits have the same loadings on the indicators that they are associated with, and are the relationships between latent traits equal across groups? This is a test of equality of structure coefficients. If factor loading invariance is established, the groups can be said to have the same unit of measurement. This is considered weak measurement invariance. If factor loading noninvariance is found, there are several things one could do, but if one wants to discover which specific factor loading(s) are noninvariant, the review by Millsap and Meredith (2007) suggests using backwards elimination: one releases the factor loadings one by one and checks the change in fit for each released constraint, to determine which factor loading(s) are noninvariant.

    4.3.2 Invariance of intercepts

Now that an equal unit of measurement has been established, the researcher will also want to know whether the intercepts (or mean structure) are the same across groups: whether the scores from different groups have the same origin. If intercept noninvariance is found, the different groups have different means on the indicators, and thus individual scores cannot easily be compared across groups. On the other hand, intercept noninvariance can be a substantive research finding. Presence of both factor loading invariance and intercept invariance is considered strong measurement invariance. As with factor loadings, intercept noninvariance can be explored more deeply by releasing the constraints on the intercepts one by one.

    4.3.3 Invariance of error terms

The final test is that of residual invariance. This answers the question of whether group differences on the items are due only to group differences on the common factors. It can be difficult to achieve residual invariance. Presence of all four types of invariance is considered strict measurement invariance and indicates that group differences in the covariances, variances, and means of the indicator variables (demonstrated by factor loading invariance, intercept invariance, and error term invariance respectively) are due to group differences on the common factors (Millsap & Meredith, 2007).


    5 Fit statistics

In the previous section I explained what Measurement Invariance is and for which parameters we want to establish invariance. There are different ways of testing for invariance, all of which involve model fit. The most widely used of them is the χ² test statistic based on the Likelihood Ratio test. I will explain this statistic in detail below.

5.1 The χ² test statistic

The most widely used test statistic for model fit in SEM is the classical likelihood ratio statistic χ²_t.s. (which assumes a normal distribution of the data). If the data are truly normally distributed and the model structure is correctly specified, χ²_t.s. approaches a chi-square distribution χ²_df as the sample size N increases (Yuan, 2005).

When assessing model fit, you calculate how similar the predicted covariance matrix (based on the model; Σ) is to the covariance matrix containing the relationships in the actual data (S), as well as the similarity of the predicted mean vector (τ) and the observed mean vector (m). This is calculated using the aforementioned Likelihood ratio statistic. The χ²_t.s. statistic is a function of the sample size and of how different the model covariance matrix and mean vector are from the observed data. A more extreme χ²_t.s. means the predicted values are more different from the observed data, i.e. the model has a worse fit. A significant p-value thus indicates bad model fit.

I will now first explain this procedure in general terms and then specify its use for SEM. The steps taken when calculating the χ² test statistic are as follows:

1. Specify two models
2. Estimate the unspecified parameters with Maximum Likelihood Estimation
3. Calculate the Likelihood of both models
4. Compare the Likelihood Ratio using the (approximate) χ² distribution
5. Detour: why is the Likelihood ratio test statistic distributed χ²_df?

I will now explain each of these steps.

    5.1.1 Specify two models

Say you have a parameter vector θ = (θ_1, θ_2, ..., θ_p) of length p, where θ ∈ Ω (the parameter space). In other words, the parameter vector can take values that are in the parameter space. The parameter space Ω is a subspace of p-dimensional Euclidean space ℝᵖ.

In order to specify two models, you can restrict some of the parameter values. For example, in Model 1 do not restrict any of the parameter values, and in Model 2 specify θ_1 = 1 and θ_2 = 0 (but leave θ_3, ..., θ_p free to vary). Model 2 (the more restricted model) is nested in Model 1 (the more general model).


    5.1.2 Unspecified parameters are estimated with Maximum Likelihood Estimation

In the next step we will want to calculate the Likelihood of both models, and we will want to have the highest possible Likelihood given the constraints on the parameter values. I will first explain what Likelihood is and how to maximize the Likelihood for a single parameter. The distribution for the data x, with length n, given the parameter θ, is a PDF given by

f(x|θ) = ∏_{i=1}^n f(x_i|θ) = L_n(θ).

The last term is the Likelihood function of the parameter θ given the data x. It is a function of the parameter and represents how likely each possible parameter value is, given the data. When calculating the Likelihood, you can drop any terms that do not contain the parameter. The highest Likelihood is at the value θ̂ that is the Maximum Likelihood Estimate (MLE). In order to calculate the MLE, you take the (natural) log of the Likelihood:

l_n(θ) = log L_n(θ).

The next step is to take the derivative of the log likelihood function with respect to θ:

l_n′(θ) = d l_n(θ) / dθ.

Finally, in order to find the MLE you set l_n′(θ) equal to 0 and solve for θ:

l_n′(θ) = 0.

Under certain conditions (e.g. concavity of the likelihood function) the resulting value is the value that globally maximizes the Likelihood. When finding a set of MLEs for multiple parameters, the distribution for the data x, with length n, given the parameter vector θ, is a PDF given by f(x|θ) = ∏_{i=1}^n f(x_i|θ). The MLE vector θ̂ is a vector of parameter estimates which, combined, gives the highest possible likelihood for the model.
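The recipe above can be checked numerically for the simplest case, the mean of a normal sample with known variance (the data below are made up). The analytic MLE from setting the derivative to zero is the sample mean, and no other candidate value yields a higher log likelihood:

```python
import math

# Made-up data; the variance is known and fixed at sigma = 1.
data = [2.1, 1.9, 2.4, 2.6, 2.0]
n = len(data)

def log_likelihood(mu):
    # log L_n(mu) = sum_i log f(x_i | mu) for N(mu, 1)
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2 for x in data)

# Setting the derivative n * (xbar - mu) to zero gives mu_hat = xbar.
mu_hat = sum(data) / n

# No candidate value beats the MLE.
candidates = [0.0, 1.0, 2.0, 2.2, 2.5, 3.0]
assert all(log_likelihood(mu_hat) >= log_likelihood(m) for m in candidates)
```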

    5.1.3 Calculate the Likelihood of both models

We have seen how to obtain the Likelihood function, and how to find the parameter value(s) that maximize the Likelihood. Now we calculate the Likelihood given those parameter values. In our earlier example, Model 1 had no constraints on parameter values and Model 2 had constraints on the first two parameters but not on the others. So for Model 1 the Likelihood is calculated with the MLEs for every parameter, and for Model 2 the Likelihood is calculated with the fixed values for parameters one and two, and the MLEs for the other parameters.

Obviously, the Likelihood for Model 1 will be at least as high as that for Model 2, since for Model 2 not every parameter was free to take the value that maximizes the Likelihood.


5.1.4 Compare the Likelihood Ratio using the (approximate) χ² distribution

We have calculated the Likelihood of Model 1 (L(Model 1)) and the Likelihood of Model 2 (L(Model 2)). We want to know whether Model 2 fits significantly worse than Model 1: we know it fits worse, and we want to determine whether it fits so much worse that we can say the constraints on the parameters were incorrect.

When comparing two (nested) models, the difference in (log) Likelihood test statistics, given by

χ²_t.s. = −2 ln [ L(Model 2) / L(Model 1) ] = −2 [ ln L(Model 2) − ln L(Model 1) ],

is asymptotically distributed as a χ² with degrees of freedom equal to the difference between the number of free parameters of Model 1 and Model 2. This was demonstrated by Wilks in 1938. His proof is beyond the scope of this paper, but I will demonstrate why a simpler likelihood ratio test statistic is distributed as χ²; the more complicated tests are an expansion of this principle. It is therefore possible to use χ²_t.s. as a test statistic and compare it to a χ² distribution with the appropriate degrees of freedom to find a p-value. I will now demonstrate why the Likelihood ratio test statistic is distributed χ²_df for an example with only one free parameter.
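A minimal sketch of this comparison with hypothetical log-likelihoods: for a single restricted parameter (df = 1) the χ² survival function has the closed form P(χ²_1 > x) = erfc(√(x/2)), so no statistics library is needed.

```python
import math

# Hypothetical maximized log-likelihoods; Model 2 is nested in Model 1 with
# one fewer free parameter (df = 1).
loglik_model1 = -1250.3   # more general model
loglik_model2 = -1253.8   # restricted model

chi2_ts = -2 * (loglik_model2 - loglik_model1)       # likelihood ratio statistic
p_value = math.erfc(math.sqrt(chi2_ts / 2))          # P(chi2_1 > chi2_ts)

assert abs(chi2_ts - 7.0) < 1e-9
assert p_value < 0.05   # here the restriction significantly worsens fit
```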

5.1.5 Detour: why is the Likelihood ratio test statistic distributed χ²_df?

Say we have a sample X_1, ..., X_n from a normal distribution with known variance σ². We want to compare Model 1, where μ is estimated freely, with Model 2, where μ is fixed at μ_0 = 0.

We want to estimate the unspecified parameter μ in Model 1 using Maximum Likelihood Estimation. First we find the PDF:

f(x|μ) = ∏_{i=1}^n (1/√(2πσ²)) e^{−(x_i − μ)²/(2σ²)}

The PDF is also the Likelihood function:

L_n(μ) = (1/√(2πσ²))ⁿ e^{−Σ_{i=1}^n (x_i − μ)²/(2σ²)} = (1/√(2πσ²))ⁿ e^{−n(x̄_n − μ)²/(2σ²)} e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)}

In order to calculate the MLE, you take the (natural) log of the Likelihood, dropping the terms that do not contain μ for simplicity:

l_n(μ) = −n(x̄_n − μ)²/(2σ²)

The next step is to take the derivative of the log likelihood function with respect to μ:

l_n′(μ) = n(x̄_n − μ)/σ²

Then we set the derivative equal to zero and solve for μ:

l_n′(μ) = 0  ⟹  μ̂ = x̄_n

And so we see that the MLE is μ̂ = x̄_n. Now that we know the MLE, we can calculate the Likelihood of Model 1 (where μ was estimated freely, with the MLE):

L_n(μ̂) = (1/√(2πσ²))ⁿ e^{−n(x̄_n − μ̂)²/(2σ²)} e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)} = (1/√(2πσ²))ⁿ e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)}

because μ̂ = x̄_n. And the Likelihood for Model 2, with μ = μ_0:

L_n(μ_0) = (1/√(2πσ²))ⁿ e^{−n(x̄_n − μ_0)²/(2σ²)} e^{−Σ_{i=1}^n (x_i − x̄_n)²/(2σ²)}.

Now we have the Likelihood for both models, so we can calculate the Likelihood ratio:

L(Model 2)/L(Model 1) = [ L_n(μ_0) ] / [ L_n(μ̂) ] = e^{−n(x̄_n − μ_0)²/(2σ²)}

So the Likelihood ratio test statistic χ²_t.s. is:

−2 ln [L(Model 2)/L(Model 1)] = −2 ln ( e^{−n(x̄_n − μ_0)²/(2σ²)} ) = n(x̄_n − μ_0)²/σ² = ( (x̄_n − μ_0) / (σ/√n) )²

Under the hypothesis that μ = μ_0, we know (x̄_n − μ_0)/(σ/√n) ~ N(0, 1), and therefore ((x̄_n − μ_0)/(σ/√n))² is a squared standard normal score, which is a score from a χ² distribution with 1 degree of freedom. This is what we claimed the Likelihood ratio test statistic is distributed as in this example with one free parameter.
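The detour can be verified numerically (made-up data): computing −2 ln[L(Model 2)/L(Model 1)] directly from the log likelihoods reproduces the squared z-score of the sample mean exactly.

```python
import math

# Made-up sample from a normal distribution with known sigma; mu_0 = 0.
sigma, mu0 = 1.0, 0.0
data = [0.3, -0.5, 1.2, 0.8, -0.1, 0.6]
n = len(data)
xbar = sum(data) / n

def loglik(mu):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

# -2 ln[L(mu_0)/L(mu_hat)] computed directly ...
lr_stat = -2 * (loglik(mu0) - loglik(xbar))
# ... equals the squared z-score of the sample mean.
z_squared = ((xbar - mu0) / (sigma / math.sqrt(n))) ** 2

assert abs(lr_stat - z_squared) < 1e-9
```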

    5.1.6 Using the Likelihood ratio test for testing Measurement Invariance

So far I have talked about fairly simple models, comparing fit with one free parameter. In SEM there are many more free parameters, but the idea is the same: calculate how likely the observed covariance matrix is given the parameters (some fixed and some estimated freely to maximize the likelihood) and then compare that to the likelihood of the observed covariance matrix when using only Maximum Likelihood estimates. Conceptually, this is the same as comparing the observed covariance matrix S to a model-specified covariance matrix Σ. So when using the Likelihood ratio test for model fit in SEM, the calculation of the Likelihood is more complicated than in the earlier examples (because there are many more parameters), but the test statistic can be calculated in the same way.

When testing for Configural and Structural Invariance, four consecutive tests are done, comparing four models (and a baseline model with no constraints). See Table 1 for the models and their constraints. Each additional model is nested in the previous one. Table 2 shows the four tests that are done. Each additional model is compared with the previous one, so that each test covers only the newly added set of constraints (e.g. equal factor loadings).


The χ² test statistic has disadvantages that have been acknowledged in recent times. As mentioned previously, it is sensitive to sample size: when the sample size is large, the test statistic will be significant even for small differences between the observed covariance matrix and the model-specified covariance matrix, i.e. it will reject models that fit reasonably well. Hayduk (1987) mentions that in his experience this is a problem for sample sizes over 500. In addition, the χ² test is sensitive to non-normality, specifically kurtosis. Studies have shown that skewness does not impact the test too much, but if there is kurtosis in the data the resulting χ² test statistics are not distributed as a χ² (Yuan, 2005). Because of these issues, additional Goodness of Fit (GoF) measures have been proposed for testing how well a specific model fits data from one group, and recently they have also been applied to testing for measurement invariance.

Table 1: Models used in testing for Measurement Invariance

  Model name   Constraints
  Model 1      Equal configuration
  Model 2      Equal factor loadings
  Model 3      Equal factor loadings and intercepts
  Model 4      Equal factor loadings, intercepts, and error variances

Table 2: Consecutive tests of Measurement Invariance

  Models compared            Type of invariance tested for   Testing equality of
  Model 1 vs Baseline model  Configural invariance           Configuration
  Model 1 vs Model 2         Weak invariance                 Factor loadings
  Model 2 vs Model 3         Strong invariance               Intercepts
  Model 3 vs Model 4         Strict invariance               Error variances
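The sequence in Table 2 can be sketched as a loop of χ² difference tests. The fit values below are invented for illustration; in this made-up example the first three steps hold and strict invariance (equal error variances) is rejected.

```python
import math

fits = [  # (model name, chi-square, df) -- hypothetical numbers
    ("Baseline", 40.2, 30),
    ("Model 1 (equal configuration)", 48.9, 36),
    ("Model 2 (+ equal loadings)", 55.1, 40),
    ("Model 3 (+ equal intercepts)", 62.0, 44),
    ("Model 4 (+ equal residuals)", 83.5, 48),
]

def chi2_sf(x, df):
    # Exact P(chi2_df > x) for even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
    assert df % 2 == 0
    term, total = 1.0, 0.0
    for k in range(df // 2):
        if k > 0:
            term *= (x / 2) / k
        total += term
    return math.exp(-x / 2) * total

# Each step tests only the newly added constraints (df = difference in model df).
results = []
for (_, chi2_a, df_a), (name_b, chi2_b, df_b) in zip(fits, fits[1:]):
    p = chi2_sf(chi2_b - chi2_a, df_b - df_a)
    results.append((name_b, p))

# In this invented example only the last step (equal error variances) is rejected.
assert all(p > 0.05 for _, p in results[:-1]) and results[-1][1] < 0.05
```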

    5.2 Other GoF measures

In this section I will explain some of the common Goodness of Fit indices. As can be seen in the next subsection, simulation studies have focused on the Comparative Fit Index (CFI), Gamma hat, Root mean square error of approximation (RMSEA), and/or McDonald's Non-Centrality Index (McD NCI) as potential GoF indices to use.

    5.2.1 CFI

The Comparative Fit Index is calculated by

CFI = 1 − (χ²_t − df_t) / (χ²_n − df_n)

where χ²_t is the chi-square for the tested model, df_t is its associated degrees of freedom, χ²_n is the chi-square for the null model, and df_n is its associated degrees of freedom (Bentler, 1990).
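In code, the CFI formula is a one-liner. The fit values in the check are hypothetical; note that some implementations additionally clamp the noncentralities at zero, which this basic form omits.

```python
def cfi(chi2_t, df_t, chi2_n, df_n):
    # Comparative Fit Index: 1 - (tested noncentrality) / (null noncentrality)
    return 1 - (chi2_t - df_t) / (chi2_n - df_n)

# e.g. tested model chi2 = 48.9 on 36 df against a null model chi2 = 480.0 on 45 df
assert round(cfi(48.9, 36, 480.0, 45), 3) == 0.970
```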

    5.2.2 Gamma hat

This fit index was proposed by Steiger (1989), although he actually proposed two Gamma indices and the later Monte Carlo studies do not specify which one they used. They do reference Hu and Bentler (1998), who only discuss one of the two Gamma hat indices:

Gamma hat = p / ( p + 2[(T_T − df_T)/(N − 1)] )

where T_T is the T-statistic for the target model (with df_T its associated degrees of freedom), p is the number of parameters, and N is the sample size. Hu and Bentler recommended against using Gamma hat in practice.

    5.2.3 RMSEA

    The Root mean square error of approximation is calculated such:

    RMSEA =

    2t dftdft(N 1)

where χ²_t is the chi-square for the tested model, df_t is its associated degrees of freedom, and N is the sample size. When using the RMSEA for testing for Measurement Invariance, it is adjusted by taking the square root of the overall population discrepancy function divided by the average number of degrees of freedom per sample. The population discrepancy function is the value of the discrepancy function we would get if we fit the model to the (actually unknown) true population covariance matrix (Steiger, 1998).
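The single-sample formula can be sketched as follows (hypothetical input values; the multiple-sample adjustment described above is not implemented here):

```python
import math

def rmsea(chisq_t, df_t, n):
    """Single-sample RMSEA: sqrt((chi2 - df) / (df * (N - 1))), floored at 0."""
    return math.sqrt(max(chisq_t - df_t, 0.0) / (df_t * (n - 1)))

# Hypothetical example: chi2 = 85, df = 40, N = 500.
print(round(rmsea(85, 40, 500), 4))  # -> 0.0475
```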

5.2.4 McDonald's Non-Centrality Index

In 1989 McDonald proposed his Non-Centrality Index (McD NCI) as an alternative to other GoF indices, which are usually sensitive to sample size. It is an adaptation of the Akaike Information Criterion, which depends on sample size as well: for a small sample size the simplest model will result in the smallest AIC, whereas as sample size increases eventually the saturated model will result in the smallest AIC. As an alternative to this, McDonald proposed using the Non-Centrality Parameter λ rescaled so that it is not a function of sample size: d = λ/n. This value can be estimated by d̂ = f - (p/n), where f = (-2 log L)/n and p is the order of the sample covariance matrix. This results in a measure of misfit, so to change that to a goodness of fit index McDonald proposed the final transformation:

    mc = exp(-d̂/2).
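In applied work the noncentrality estimate is usually computed as (χ² - df)/N; a sketch under that assumption (the input values are hypothetical):

```python
import math

def mcdonald_nci(chisq_t, df_t, n):
    """McDonald's Non-Centrality Index, exp(-d/2).

    d is the noncentrality estimate rescaled by sample size, computed
    here in the commonly used form (chi2 - df) / N, floored at 0.
    """
    d_hat = max(chisq_t - df_t, 0.0) / n
    return math.exp(-0.5 * d_hat)

# Hypothetical example: chi2 = 85, df = 40, N = 500.
print(round(mcdonald_nci(85, 40, 500), 4))  # -> 0.956
```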


    6 Limitations and other approaches

    6.1 Limitations of Measurement Invariance Testing

    In previous sections I explained how, using a SEM approach, one can test for Measurement Invariance.

    In this section I will present a critical view of the SEM approach to testing for Invariance.

A problem that has rarely been discussed in the literature is that of sequential testing. In testing for invariance one performs a sequence of tests, each designed to test for a specific type of invariance. In other words, one considers a set of statistical tests based on one dataset. This leads to an inflated Type I error rate, which should be corrected for (e.g. Benjamini & Hochberg, 1995). However, no correction for multiple comparisons is discussed in the Measurement Invariance literature. Not correcting for multiple testing inflates the Type I error rate, i.e. too often one concludes there is a significant difference when in truth the samples are from the same population, which in this context means one too often incorrectly diagnoses Measurement Noninvariance. Therefore, not correcting for multiple comparisons makes the procedure conservative with respect to concluding Measurement Invariance.
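As an illustration of the kind of correction meant here, a minimal sketch of the Benjamini-Hochberg step-up procedure; the three p-values are hypothetical results from a sequence of invariance tests:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Step-up FDR control: reject all hypotheses up to the largest sorted
    rank k whose p-value satisfies p_(k) <= (k/m) * q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject

# Hypothetical p-values from three sequential invariance tests:
print(benjamini_hochberg([0.42, 0.36, 0.0013]))  # -> [False, False, True]
```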

Reise, Waller, and Comrey (2000) discuss testing Measurement Invariance in the context of scale revision. They point out that using the standard Confirmatory Factor Analysis method to compare the factor loadings of dichotomous items can be inappropriate, because these methods are designed for continuous, interval-level measurement of the indicators. In addition, they warned that a lack of simple structure can lead to an incorrect conclusion of Measurement Noninvariance.

    6.2 Comparing SEM and IRT methods for testing Measurement Invariance

A different approach than the SEM approach to invariance, i.e. the multiple group CFA approach, is the Item Response Theory (IRT) approach. This is appropriate when testing the factor loadings of individual items, so when looking at the structure of a scale/test. In IRT the item response is modeled as a logistic function of item discrimination (analogous to factor loadings), item location (the difficulty of the item), and potentially a lower and upper asymptote. It is useful for modeling responses to dichotomous or polytomous items. Measurement Invariance in the IRT context can be conceptualized as invariance of the item parameters: are the same item parameters applicable to the two groups, and thus do the items have the same relation to the latent trait in both groups (Reise, Widaman & Pugh, 1993)? In IRT, Measurement Noninvariance is often called differential item functioning: one or more items function differently in the two groups.
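The logistic item response function described above can be sketched as follows; the parameter names are generic illustrations, not tied to any particular IRT package:

```python
import math

def irt_prob(theta, a, b, c=0.0, d=1.0):
    """Probability of endorsing an item under a logistic IRT model.

    theta: latent trait level
    a: item discrimination (analogous to a factor loading)
    b: item location/difficulty
    c, d: optional lower and upper asymptotes (c=0, d=1 gives the 2PL)
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta equal to the item location, the 2PL probability is exactly 0.5;
# differential item functioning would mean a or b differ between groups.
print(irt_prob(theta=0.0, a=1.5, b=0.0))  # -> 0.5
```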

Reise, Widaman, and Pugh (1993) compared multiple group CFA (i.e. SEM) and Item Response Theory (IRT) as methods to test for Measurement Invariance. They concluded that both methods have advantages. Multiple group CFA is more user friendly, i.e. simpler to implement, and has more varied options to determine model fit, i.e. a combination of the likelihood ratio test and additional goodness of fit indices, which the IRT model lacks. The issue they address of multiple group CFA disregarding the difficulty parameter of IRT can be circumvented by testing for intercept invariance in addition to structural coefficients invariance. However, the more serious criticism of the SEM approach


    is that it is a linear model which is generally not appropriate for dichotomous or polytomous responses,

    unlike IRT which uses a logistic model.

Glockner-Rist and Hoijtink (2003) took an in-depth look at the similarities and differences of SEM and IRT approaches to Measurement Invariance. They proposed an integrated framework where IRT properties are added to a SEM approach. They first demonstrate that the normal ogive two-parameter IRT model is a type of (nonlinear) factor analysis, and because this model is equivalent to the logistic IRT model, the latter is also comparable to factor analysis. They run a non-linear confirmatory factor analysis using the normal ogive structure; hence they run an IRT model in a SEM framework. This approach can be helpful for bringing the IRT and SEM literatures closer together, and for using the SEM approach with dichotomous or polytomous data.

    7 My program in OpenMX

    7.1 What it does

In order to write a program to test Measurement Invariance, I used OpenMx (Boker et al., 2011), a package in R (R Development Core Team, 2011). My program, called measInvTest, takes two raw datasets as its arguments (with the option to specify the number of common latent factors, set to 1 by default). It specifies four models in succession (Model 1 through 4 from table 1), with no constraints on which factor loadings should be 0. It would be possible to add this, but the details would depend on the number of indicators, common factors, and the model that one wishes to specify.

Each model is tested using the Likelihood ratio test against the model with one fewer set of constraints on the parameters. The first comparison tests a model with no constraints (other than the basic constraints of common factor means being 0 and common factor variances being 1, to identify the model) versus a model with the constraint that factor loadings must be equal across groups. The second comparison tests the latter model against a model with the additional constraint that indicator means must be equal across groups. The third comparison tests the latter model against a model with the additional constraint that uniqueness factor variances must be equal across groups.

The output of the program is a table showing the results from these three comparisons, with (among other, less relevant output) the diffLL (i.e. the χ² test statistic), the associated degrees of freedom (diffdf) and the p-value.

    7.2 Lovely Little Demo

For this demo of the program I will compare two raw datasets used as an example by Michael Neale, one of the creators of OpenMx. Here I will just use the function; in the Appendix I provide the full code for the function.

    First, I load the relevant packages and read in the data:

    > ## First the necessary preamble:

    > .libPaths(new="C:\\Lian\\R Packages")


    > require(OpenMx)

    > require("mvtnorm")

> ## Let's use Michael Neale's data:

    > data(myFADataRaw)

    > group1 group2


Noninvariance). Another limitation of the (linear) SEM based approach to Measurement Invariance is that often this will be used with dichotomous items as indicators, which leads to decidedly non-continuous data, a violation of the assumptions. A combination of IRT and SEM might be more appropriate for testing Measurement Invariance in the future.

References

[1] Benjamini, Y. & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289-300.

[2] Bentler, P.M. (1990). Comparative Fit Indexes in Structural Models. Psychological Bulletin, 107(2), 238-246.

[3] Boker, S.M., Neale, M.C., Maes, H.H., Wilde, M.J., Spiegel, M., Brick, T.R., Spies, J., Estabrook, R., Kenny, S., Bates, T.C., Mehta, P., & Fox, J. (2011). OpenMx: An Open Source Extended Structural Equation Modeling Framework. Psychometrika, 76(2), 306-317.

[4] Byrne, B.M. & Watkins, D. (2003). The Issue of Measurement Invariance Revisited. Journal of Cross-Cultural Psychology, 34, 155-175.

[5] Chen, F.F. (2007). Sensitivity of Goodness of Fit Indexes to Lack of Measurement Invariance. Structural Equation Modeling, 14(3), 464-504.

[6] Cheung, G.W. & Rensvold, R.B. (2002). Evaluating Goodness-of-Fit Indexes for Testing Measurement Invariance. Structural Equation Modeling, 9(2), 233-255.

[7] Glockner-Rist, A. & Hoijtink, H. (2003). The Best of Both Worlds: Factor Analysis of Dichotomous Data Using Item Response Theory and Structural Equation Modeling. Structural Equation Modeling, 10(4), 544-565.

[8] Hayduk, L.A. (1987). Structural Equation Modeling with LISREL: Essentials and Advances. The Johns Hopkins University Press: Baltimore, Maryland.

[9] Kim, K.H., Cramond, B., & Bandalos, B.L. (2006). The Latent Structure and Measurement Invariance of Scores on the Torrance Tests of Creative Thinking-Figural. Educational and Psychological Measurement, 66(3), 459-477.

[10] MacCallum, R.C. & Austin, J.T. (2000). Applications of Structural Equation Modeling in Psychological Research. Annual Review of Psychology, 51, 201-226.

[11] McDonald, R.P. (1989). An Index of Goodness-of-Fit Based on Noncentrality. Journal of Classification, 6, 97-103.

[12] Meredith, W. (1993). Measurement Invariance, Factor Analysis and Factorial Invariance. Psychometrika, 58(4), 525-543.

[13] Millsap, R.E. & Meredith, W. (2007). Factorial Invariance: Historical Perspectives and New Problems. In Cudeck, R. & MacCallum, R.C. (Eds.), Factor Analysis at 100: Historical Developments and Future Directions. Lawrence Erlbaum Associates: Mahwah, New Jersey.

[14] R Development Core Team (2011). R: A language and environment for statistical computing. R Foundation for Statistical Computing: Vienna, Austria.

[15] Reise, S.P., Waller, N.G. & Comrey, A.L. (2000). Factor Analysis and Scale Revision. Psychological Assessment, 12(3), 287-297.

[16] Reise, S.P., Widaman, K.F., & Pugh, R.H. (1993). Confirmatory Factor Analysis and Item Response Theory: Two Approaches for Exploring Measurement Invariance. Psychological Bulletin, 114(3), 552-566.

[17] Spaan, P.E.J. (2012). Cognitieve achteruitgang bij normale veroudering en de ziekte van Alzheimer: Een continue of discontinue overgang? [Cognitive decline in normal aging and Alzheimer's disease: A continuous or discontinuous transition?] Tijdschrift voor Neuropsychologie, 7(1), 3-15.

[18] Steiger, J.H. (1989). EzPATH: A supplementary module for SYSTAT and SYGRAPH. SYSTAT: Evanston, IL.

[19] Steiger, J.H. (1998). A note on multiple sample extensions of the RMSEA fit index. Structural Equation Modeling, 5(4), 411-419.

[20] Wilks, S.S. (1938). The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Annals of Mathematical Statistics, 9, 60-62.

[21] Yuan, K. (2005). Fit Indices Versus Test Statistics. Multivariate Behavioral Research, 40(1), 115-148.


    9 Appendix - Measurement Invariance OpenMx program

    Here I will provide the entire program used to test for Measurement Invariance:

    > ## First the necessary preamble:

> .libPaths(new="C:\\Lian\\R Packages")
> require(OpenMx)

    > require("mvtnorm")

    > options(continue=" ")

    > ############

    > ## Data ##

    > ############

    >

> ## Let's use Michael Neale's data:

    > data(myFADataRaw)

    > group1 group2


    mxModel("group2",

    mxData(observed=group2, type="raw"),

    mxMatrix("Full", s.nrvar, s.nrlatent, values=0.2,

    free=TRUE, labels=paste("l2",1:(s.nrvar*s.nrlatent),sep=""),

    name="A"),

# A matrix of factor loadings (free)
mxMatrix("Symm", s.nrlatent, s.nrlatent, values=diag(s.nrlatent),

    free=FALSE, name="L"),

    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e2",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean2",

    1:s.nrvar, sep=""), values=rep(1, s.nrvar), name="meanMatrix"),

    # A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group2))

    # The objective

    ),

    mxAlgebra(group1.objective + group2.objective, name="h12"),

    mxAlgebraObjective("h12")

    )

    noConstraintFit


    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group1))

    # The objective

    ),

    mxModel("group2",

mxData(observed=group2, type="raw"),
mxMatrix("Full", s.nrvar, s.nrlatent, values=0.2,

    free=TRUE, labels=paste("l",1:(s.nrvar*s.nrlatent),sep=""),

    name="A"),

    # A matrix of factor loadings (free)

    mxMatrix("Symm", s.nrlatent, s.nrlatent, values=diag(s.nrlatent),

    free=FALSE, name="L"),

    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e2",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean2",

    1:s.nrvar, sep=""), values=rep(1, s.nrvar), name="meanMatrix"),

    # A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group2))

    # The objective

    ),

    mxAlgebra(group1.objective + group2.objective, name="h12"),

    mxAlgebraObjective("h12")

    )

    facLoadFit


    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean",

1:s.nrvar, sep=""), name="meanMatrix"),
# A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=

    colnames(group1))

    # The objective

    ),

    mxModel("group2",

    mxData(observed=group2, type="raw"),

    mxMatrix("Full", s.nrvar, s.nrlatent, values=0.2,

    free=TRUE, labels=paste("l",1:(s.nrvar*s.nrlatent),sep=""),

    name="A"),

    # A matrix of factor loadings (free)

    mxMatrix("Symm", s.nrlatent, s.nrlatent, values=diag(s.nrlatent),

    free=FALSE, name="L"),

    # A matrix of factor variances (orthogonal model)(fixed to 1)

    mxMatrix("Diag", s.nrvar, s.nrvar, values=1, free=TRUE,

    labels=paste("e",1:s.nrvar, sep=""), name="U"),

    # A matrix of Uniqueness variances (free)

    mxMatrix("Full", 1, s.nrvar, free=TRUE, labels=paste("mean",

    1:s.nrvar, sep=""), name="meanMatrix"),

    # A vector of means on observed variables (free)

    mxAlgebra(A %*% L %*% t(A) + U, name="R"),

    # The algebra to calculate Sigma

    mxFIMLObjective(covariance="R", means="meanMatrix", dimnames=colnames(group2))

    # The objective

    ),

    mxAlgebra(group1.objective + group2.objective, name="h12"),

    mxAlgebraObjective("h12")

    )

errorsFit
> ##########################

    > ## Running the function ##

    > ##########################

    >

    > measInvTest(group1, group2)


    Running Model1

    Running Model2

    Running Model3

    Running Model4

    base comparison ep minus2LL df AIC diffLL diffdf p

    1 Model1 36 13737.65 5964 1809.652 NA NA NA

    2 Model1 Model2 30 13743.66 5970 1803.661 6.009412 6 0.422136584

    3 Model2 30 13743.66 5970 1803.661 NA NA NA

    4 Model2 Model3 24 13750.24 5976 1798.237 6.575737 6 0.361868610

    5 Model3 24 13750.24 5976 1798.237 NA NA NA

    6 Model3 Model4 18 13772.02 5982 1808.019 21.782247 6 0.001325933
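The p-values in the last column can be verified by hand: each comparison adds six equality constraints, and for an even number of degrees of freedom the chi-square survival function has a closed form. A small check (not part of measInvTest, written in Python for illustration):

```python
import math

def chisq_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    assert df % 2 == 0 and df > 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Third comparison above: diffLL = 21.782247 on diffdf = 6.
print(round(chisq_sf_even_df(21.782247, 6), 6))  # -> 0.001326
```

This reproduces the reported p-value of 0.001325933 for the Model 3 vs Model 4 comparison, the only one of the three that is significant.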