A Primer on Structural Equation Models:
Part 1. Confirmatory Factor Analysis.
Michael A. Babyak, PhD1 and Samuel B. Green, PhD2
1 Department of Psychiatry and Behavioral Sciences, Duke University Medical Center,
Durham, NC
2 Division of Psychology in Education, Arizona State University, Tempe, Arizona
Correspondence to: Dr. Michael Babyak, Box 3119, DUMC, Durham, NC 27710;
email: [email protected]; fax: (919) 684-8629
Word Count: 6,098
Tables: 1
Figures: 3
Confirmatory Factor Analysis 1
Abstract
In the first of a two-part didactic series on structural equation modeling, we present an
introduction to the basic concepts underlying confirmatory factor analysis. We use
examples with simplified fictitious data to demonstrate the underlying mathematical
model and the conventions for nomenclature and graphical representation of the model.
We then show how parameter estimates are generated for the model with the maximum
likelihood function. Finally, we discuss several ways in which model fit is evaluated and
also briefly introduce the concept of model identification. Sample code in the EQS and
Mplus software language is provided in an online appendix. A list of resources for
further study also is included.
Structural equation modeling (SEM) is a general data-analytic method for the
assessment of models that specify relationships among variables. SEM involves
investigating two primary models: the measurement model that links measures to factors
and the structural model that links factors to each other. In the first installment of this
two-part series, we will discuss confirmatory factor analysis (CFA), which is the method
for specifying, estimating, and assessing measurement models. The second installment
will be published at a later date and will focus on the full structural equation model,
which includes both measurement and structural components. SEM is highly flexible–it
can be used to carry out a large variety of analytic procedures. With flexibility, of
course, comes complexity, and even two overly long papers would not provide sufficient
space to address the many facets of modern SEM. Nevertheless, we hope to provide at
least a glimpse and intuitive understanding of the basic concepts of SEM. In the
following pages, we will offer an introduction to confirmatory factor analysis, including
the purpose of CFA, the specification of models, computation of estimates of the model
parameters, and assessment of model fit.
SEM in psychosomatic and medical research
Although SEM is used quite frequently in some fields, such as psychology,
education, sociology, and genetics, research using SEM appears comparatively
infrequently in psychosomatic and medical journals. There is at least a small irony to the
relative scarcity of SEM in medical and psychosomatic research in that the technique
actually has its direct roots in biology. In the 1920s, geneticist Sewall Wright first
developed an important component of SEM, path analysis, in an attempt to better
understand the complex relations among variables that might determine the birth weight
of guinea pig offspring (1). In the late 1970s, Joreskog and Goldberger (2) developed
and championed the full SEM model, successfully integrating factor analytic technology
with path analysis. SEM has begun to appear more often in psychosomatic research, and
has even begun to make an occasional foray into high-profile medical journals. For
example, in a recent paper published in the New England Journal of Medicine, Calis et al.
(3) used the path modeling portion of SEM to estimate a set of complex associations
among malaria, HIV, and various nutritional deficiencies. In a recent commentary on
post-traumatic stress disorder (PTSD) that appeared in JAMA, Bell and Orcutt (2009) explicitly point out the potential utility of SEM in their area of study:
“Structural equation modeling is particularly well suited for examining complex
associations between multiple constructs; such constructs are often represented as latent
constructs and are assumed to be free of measurement error.” Nevertheless, very few
papers using SEM to address any topic have graced the pages of JAMA. Of course, we
tend to agree with Bell and Orcutt. SEM is particularly useful as an aid in understanding how variables might interrelate in a system. It is especially useful when those variables are composed of several facets that overlap to some extent. In the
psychosomatic medicine domain, for example, Rosen et al. (4) used SEM to estimate the association of global subjective health with psychological distress, social support, and physical function. Rosen et al. operationalized these four study variables as latent variables, that is, as factors instantiated by sets of observed measures that were thought to reflect the respective factors.
Our fictionalized example
In order to demonstrate the CFA part of SEM, we will focus on a fictionalized
example based on a real question in psychosomatic research. Our example bears some
similarity to the model studied by Rosen et al., although ours is, of necessity, much
simpler. We draw on the literature that has accumulated in our field suggesting that a variety
of psychosocial constructs, including trait hostility, anger, anxiety, and depressive
symptoms, appear to be risk factors for the development of coronary artery disease
(CAD) (for a review, see Suls and Bunde (5)). Despite the large number of papers
published about the topic, however, a number of interesting fundamental questions
remain. For example, do depression, hostility, anger, and anxiety each uniquely pose a
risk for heart disease? Or, because these variables tend to overlap, are they really just
manifestations of a broader underlying trait of “negative affect”? And is it really that underlying trait which confers the risk?
In this first installment, we will focus on how SEM can be used to study the
question of whether these variables might be manifestations of one or more broader
dimensions. Understanding the measurement properties of the variables under study is a
critical first step in carrying out any empirical research study—we have to know what we
are measuring and how well we are doing it before drawing any robust conclusions about
findings that concern those variables. In the case of our example, in this preliminary
‘measurement model’ phase we can address questions about the measurement properties
of the variables under study, such as: are hostility, depression, and anxiety really distinct
constructs? Or are they really just slightly different manifestations of the same negative
affectivity phenomenon? Understanding the measurement properties of the negative
affect variables will later help illuminate the “specific versus general” risk question. Of
course, in the world outside the confines of this tutorial, we also would draw from
substantive theory to make such arguments work. Since our focus here is within the more
narrow borders of the SEM technique itself, we will not delve very deeply into the
substantive aspects of the analyses we will present. Our aim here is simply to get your feet wet in understanding how the technique might be applied to such questions.
Purpose of Confirmatory Factor Analysis
CFA, like exploratory factor analysis (EFA), defines factors that account for covariability, or shared variance, among measured variables and ignores the variance that is unique to each of the measures. Broadly speaking, either can be a useful technique for
(a) understanding the structure underlying a set of measures, (b) reducing redundancy
among a set of measured variables by representing them with a smaller number of factors,
and (c) exploiting that redundancy and hence improving the reliability and validity of
measures. However, the purposes of EFA and CFA, and accordingly the methods
associated with them, are different. The goal of EFA is to discover a set of as-yet-
unknown or unverified factors based on the data, although a priori hypotheses based on
the literature may help guide some decisions in the EFA process. In other words, we start
with correlations among some set of variables, and although we may have some reason to
believe they will cluster following a certain pattern, we generally submit the correlations
to the software and allow the algorithm to tell us how many factors there may be and
what particular variables belong to those factors. In contrast, in CFA, we have to start
with one or more explicit hypotheses about the number of factors and how the variables
are related to those factors. CFA accomplishes this by assessing constraints imposed on
factor models based on the a priori hypotheses about measures. If these constraints are
inconsistent with the pattern of relationships among measured variables, the model with
its imposed constraints is rejected. Given that the focus of CFA is on hypothesized models, let’s first describe how these models are specified before considering how the parameters of models are estimated and how the fit of models to the data is assessed.
Model Specification
With CFA, we hypothesize a model that specifies the relationship between
measured variables and presumed underlying factors.1 The model includes parameters we
want to estimate based on the data (i.e., freely estimated parameters) and parameters we
constrain to particular values based on our understanding of our data and the literature
(i.e., constrained or fixed parameters). It is the constraints on model parameters that
produce lack of fit.
In this section we will consider three prototypical CFA models, each with a
different substantive interpretation. We will present each prototypical model and discuss
it in the context of our negative affect example. The example for the first two prototypes
involves postulating a factor structure underlying four measures: a trait hostility measure,
an anger measure, an anxiety measure, and a depressive symptoms score. Each of the
measures is derived from summing the items on a self-report instrument designed to
measure that construct. For the third prototypical model, we extend this example by
breaking the depressive symptoms measure out into three domains of symptoms: affective, somatic, and cognitive.

1 We use the terms factor and latent variable interchangeably throughout this paper.
Single Factor Model
The single factor model is the simplest CFA model. Nevertheless, we devote
considerable attention to it in order to introduce the basic concept of CFA and
conventional SEM terminology.
A single factor model specifies a unidimensional model, hypothesizing that a
single dimension underlies a set of measures. Unidimensionality is often seen as a
desirable characteristic of a set of measures, in particular because multiple measures can
be reduced to a single measure, thus improving parsimony, and also because the single
dimension is typically more reliable than a given individual component. As with any
structural equation model, a single factor model can be presented pictorially as a path
diagram or in equation form. Figure 1 is a graphical representation of a model with a
single factor (F1) underlying four measures, X1 (hostility), X2 (anger), X3 (anxiety), and
X4 (depressive symptoms). By convention, the factor is depicted as a circle, which
represents a latent variable, while the observed measures are squares, which represent
observable or indicator variables. A single-headed arrow between two variables indicates
the direction of the effect of the one variable on the other. Within the context of our
example, we are postulating that a factor called negative affect (F1) underlies or
determines the observed scores on the hostility, anger, anxiety, and depressive symptom
measures. Statistically, we believe these four measures are correlated because they have a
common underlying factor, negative affect. In other words, the model reflects the belief
that changes in the unobserved latent variable, negative affect, are presumed to result in
changes in the four variables that we have actually measured.
Continuing with Figure 1, a variable with arrows pointing only away from it is
called exogenous. A variable with one or more arrows pointing to it, even if one or more
arrows are pointing away from it, is called endogenous. One equation is associated with
each endogenous variable. Accordingly, the model in Figure 1 involves four endogenous
variables and therefore four equations:
X1 = λ11F1 + E1
X2 = λ21F1 + E2
X3 = λ31F1 + E3
X4 = λ41F1 + E4.
The lambdas (λ) are factor weights or loadings, which can be interpreted essentially like
regression coefficients. For example, for every one unit increase in the negative affect
factor, F1, the expected change in hostility, X1, will be λ11.
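These four equations can be illustrated with a short simulation. In the Python sketch below (not part of the original paper), the loadings of .6 and the standard-normal factor are assumed values chosen so that each measure has unit variance:

```python
import numpy as np

# Hypothetical illustration of X_i = lambda_i1 * F1 + E_i for four measures.
# The loadings (.6) and error variances (.64) are assumed values, not estimates.
rng = np.random.default_rng(0)
n = 100_000

lambdas = np.array([0.6, 0.6, 0.6, 0.6])   # lambda_11 .. lambda_41
err_sd = np.sqrt(1 - lambdas**2)           # chosen so each Var(X_i) = 1

F1 = rng.standard_normal(n)                # negative affect factor, variance 1
E = rng.standard_normal((n, 4)) * err_sd   # unique components, mutually uncorrelated
X = F1[:, None] * lambdas + E              # hostility, anger, anxiety, depression

# The model implies Cov(X_i, X_j) = lambda_i * lambda_j = .36 for i != j
print(np.round(np.cov(X, rowvar=False), 2))
```

With loadings of .6, the model implies a covariance of .6 × .6 = .36 between any two measures, which is the value the sampled covariances should approach as n grows.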
Observed measures are not likely to be pure indicators of a factor, but almost
certainly contain unique components, frequently referred to as residuals or errors (E). A
unique component for a measure includes two kinds of variability: reliable information
that is specific to that measure but not related to the factor, and unreliable information,
otherwise known as measurement error. Because errors are not directly observable, they
are also latent variables and are represented in our path diagram as circles. For the
hostility measure in our example, the unique component might include the specific
component of agitation as well as measurement error due to inattentiveness of
respondents and ambiguity of the items on this measure.
Finally, our path diagram also includes double-headed curved arrows. If an arrow
begins and returns to the same exogenous variable, it represents the variance of that
variable. A double-headed arrow could also be drawn between any two errors to represent
a covariance between them, but we chose not to include error covariances in our model to
avoid unnecessary complexity.
The model parameters, or unknowns, which we seek to estimate or constrain
based on our understanding of a study, are associated with the single-headed and double-
headed arrows in our diagrams and, by convention, are shown as Greek letters. In
addition to the lambdas, the parameters for the model in Figure 1 are the variance of the
factor (σ²F1) and the variances of the errors (σ²E1 through σ²E4). As shown at the bottom of the figure, the model parameters can also be presented in three matrices: the phi matrix (Φ), containing the variances and covariances among factors; the lambda matrix (Λ), which includes all factor weights; and the theta matrix (Θ), which includes the variances and covariances among the errors.
We now turn to a concept that concerns all SEM, including our present CFA
models, the idea of “free” or “fixed” (also referred to as constrained) parameters. When
we conduct a conventional multiple regression, we typically deal directly only with free
parameters, those we wish to estimate.2 In a multiple regression model, the free
parameters are the regression coefficients. We tell the software or algorithm ahead of
time to calculate those values. In SEM, however, we specify not only which parameters
are to be estimated, but most critically, we also specify which parameters are constrained
or fixed, that is, which parameters are not to be estimated. In SEM, parameters can be
constrained in a number of ways, including fixing them to a specific value or to be equal
2 There are in fact a number of constraints even in a conventional multiple regression, but these are typically just part of the underlying assumptions of the simple form of the model. For example, unless we explicitly specify the model differently, the relations between the predictors and the response variable are all assumed to be linear, which is, in a broad sense, a constraint. There are a number of other such constraints in most statistical models.
to each other. For example, in our negative affect CFA model, we could constrain the loading (λ) for the anxiety variable to be some value that we had obtained in a prior study. Or, we could constrain the loading for anxiety to equal the loading of anger. These
constraints are generally used when we have a very specific theoretical question about
those parameters. A constraint that is far more frequently used is one in which the
parameter is constrained to a value of zero. For example, our single factor model includes
no covariances among errors (i.e., all zeros in the off-diagonal positions of the theta
matrix). Substantively, these constraints on the error covariances reflect our belief that
there are no other factors other than the negative affect factor that systematically
influence the variability in the measured variables. Also reflecting that there are no
additional factors underlying the measures, we don’t specify any additional parameters in
the phi and lambda matrices. These constraints are less obvious in that they are made by
simply omitting any reference to additional factors; the net result, however, is that of
constraining the variances and weights of additional factors to zero. This latter point will
become clearer when we present our second example below. If one or more of these
constraints are incorrect, the model is likely to fit poorly and be rejected.
Another important class of constraints consists of those used to define an arbitrary metric
for a factor. The metric constraint is often a bit mysterious to SEM novices, and while
the precise mathematical details are not critical to our purposes here, we will describe
how this constraint is applied and what it accomplishes. Factor metrics are arbitrary
because factors are latent variables: being unobservable, they do not have an inherent metric. We can assign a metric for a factor by either fixing its variance to 1,
as we did, or fixing one of its weights to 1. The choice should have no effect on the fit
for the relatively simple models considered in this article. Fixing the factor variance to 1
in effect standardizes the factor, and the resultant loadings can be interpreted as
standardized. In more complex models, however, it may be necessary to fix one of the
factor weights (lambdas) to 1 rather than to fix the variance of the factor to 1.
Researchers often select the weight associated with the measured variable that is believed
to have the strongest relationship with the underlying factor. In our negative affect
example, we might choose to fix the weight of the best developed depressive symptom
measure to 1. Fixing a factor weight to 1 then puts all the other loadings on a scale that is relative to the depressive symptom measure. Very broadly speaking,
constraining one of the loadings to equal 1 is a distant cousin to selecting a reference
category for a set of dummy variables in a linear model, in that it provides a point of
comparison for the other effects. In addition to setting the metric of a latent variable, the
constraint also helps the algorithm estimate the remaining free parameters by making it
more likely that the model is identified. We will discuss the concept of identification in
more detail later in this tutorial, but the central idea of identification is that we cannot
have fewer data points than unknowns (free parameters) in the model. Finally, as we
noted above, the constraints to define a factor’s metric do not influence model fit. All
other model constraints, however, have potential effects on the fit of the model to the
data.
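The claim that the metric-setting constraint does not affect fit can be checked numerically. In this hypothetical Python sketch (all parameter values are our own assumptions), version (a) fixes the factor variance to 1, while version (b) instead fixes the first loading to 1 and rescales the remaining parameters; both imply exactly the same covariance matrix:

```python
import numpy as np

# Hypothetical values: equal loadings of .6 and error variances of .64.
theta = np.diag([0.64] * 4)                 # same error variances in both versions

# (a) fix the factor variance to 1; freely estimate all four loadings (each .6 here)
lam_a = np.full((4, 1), 0.6)
sigma_a = lam_a @ np.array([[1.0]]) @ lam_a.T + theta

# (b) fix the first loading to 1; the factor variance becomes .6^2 = .36, and the
# other loadings are rescaled relative to the first measure (here, all equal 1)
lam_b = np.full((4, 1), 1.0)
sigma_b = lam_b @ np.array([[0.36]]) @ lam_b.T + theta

# Both parameterizations imply the same covariance matrix, so they fit any
# data set equally well.
print(np.allclose(sigma_a, sigma_b))
```

The two versions are simply different labelings of the same implied covariance structure, which is why the choice of metric constraint leaves model fit unchanged.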
All of the free parameters in our model (i.e., those not constrained to 1 or 0) are
estimated based on the data. If the model fits, we interpret these estimated parameters to
evaluate, for example, which measure is the best indicator of the factor. As in the case of
regression analysis, interpretation is best performed by examining the standardized
weights (i.e., when the factor and measures are transformed to z-scores). If the model
fails to fit, we should not interpret the estimated parameters because their values may
have been adversely affected by the potential misspecification of the model. If our model
1 provides adequate fit (which we will define later) and each of the factor loadings is
substantial, we would conclude that there is evidence to support the idea that there may
be a single latent variable underlying the four observed measures. However, this result
does not mean that the one factor model is the only structure that might produce good fit.
The good fit only means that it is one of the possible models that fits well. Apart from
the obvious theoretical implications of a well-fitting one factor model with high factor
loadings, the result also suggests that we might feel fairly safe using just the single
negative affect latent variable as a predictor (or as an outcome or mediator) in a more
extensive structural model. Of course, if the model does not fit, we will have to test
alternative models in order to understand whether there is a structure that might fit the
observed data better.
Correlated Factors Model
Our second model is a correlated factors model, which specifies that two or more
factors underlie a set of measured variables and also that the factors are correlated. For
simplicity, we will consider a two-factor model, but our discussion is relevant to models
with more than two factors.
In Figure 2 we present a model for our four measures but now with two correlated
factors. As with our path diagram for a single factor model, we have circles for latent
variables (i.e., factors and errors), squares for measured variables, single-headed arrows
for effects of one variable on another, double-headed curved arrows for variances of
exogenous variables, and a double-headed arrow for the covariance between the two
factors. Within the context of our negative affect example, we might speculate that the
hostility and anger measures are related to one another by the shared characteristic of agitation and are also, to some degree, distinct from the anxiety and depressive symptom measures.
In other words, the model should include a factor (F1) affecting the hostility and anger
measures (X1 and X2), and another factor (F2) affecting the anxiety and depressive
symptom measures (X3 and X4).
Model parameters are associated with all single-headed and double-headed arrows
and are presented in matrix form at the bottom of Figure 2. Constraints can be imposed
on the model parameters. As previously presented, we can define the metric for factors by
constraining their variances to 1 or one of their weights to 1. In this instance, we
arbitrarily chose to set the factor variances to 1 (i.e., σ²F1 = 1 and σ²F2 = 1).
All constraints besides those to determine the metric of factors can produce lack
of fit and are evaluated in assessing the quality of a model. For example, the effects of
factors on measures, as shown by arrows between factors and measures in the path
diagram, can be represented as equations,
X1 = λ11F1 + 0·F2 + E1
X2 = λ21F1 + 0·F2 + E2
X3 = 0·F1 + λ32F2 + E3
X4 = 0·F1 + λ42F2 + E4.
As shown, the equations indicate that a number of factor loadings are constrained to zero
such that each measured variable is associated with one and only one factor. The
specified structure is consistent with the idea of simple structure, an objective frequently
felt to be desirable with EFA. In addition, a measure is less likely to be misinterpreted if
it is a function of only one factor. Given the advantages of this structure, researchers
frequently begin by specifying models that constrain the factor loadings so that each measure is associated with one and only one factor. In other words, each measure has one weight
that is freely estimated, and all other weights (potential crossloadings) between that
measure and other factors are constrained to 0.
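The loading pattern just described can be written out directly in matrix form. The following Python sketch uses made-up parameter values (not estimates from the paper) to show the two-factor Λ, Φ, and Θ matrices with their zero constraints, and the covariance matrix they imply:

```python
import numpy as np

# Sketch of the correlated two-factor specification; all numeric values are
# illustrative assumptions, not results from the paper.
Lambda = np.array([[0.7, 0.0],    # hostility  loads only on F1
                   [0.7, 0.0],    # anger      loads only on F1
                   [0.0, 0.7],    # anxiety    loads only on F2
                   [0.0, 0.7]])   # depression loads only on F2

Phi = np.array([[1.0, 0.4],       # factor variances fixed to 1;
                [0.4, 1.0]])      # factor covariance freely estimated (.4 here)

Theta = np.diag([0.51, 0.51, 0.51, 0.51])   # error variances; covariances fixed to 0

Sigma = Lambda @ Phi @ Lambda.T + Theta     # model-implied covariance matrix
print(np.round(Sigma, 3))
```

Note how the zeros in Λ encode the "one measure, one factor" constraint, while the off-diagonal element of Φ lets measures on different factors correlate (here, .7 × .4 × .7 = .196).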
Other parameters in our model that may be freely estimated or constrained are the
covariance between the factors and the variances and covariances among errors. (a) With
CFA, we typically allow the factors to be correlated by freely estimating the covariances
between factors. If we constrained all factor covariances to be equal to zero (i.e.,
orthogonal factors) and also constrained many of the factor loadings to be equal to zero
(e.g., each measure being associated with only one factor), we would be hypothesizing a
model that does not allow for correlations among measures associated with different
factors. This model is likely to conflict with reality and be rejected empirically. In
addition, this model would be inconsistent with many psychological theories that suggest
underlying correlated dimensions. The decision to allow for correlated factors is in stark
contrast with practice in EFA, where researchers routinely choose varimax rotation
resulting in orthogonal factors. However, in EFA, we can still obtain good fit to data in
that all factor loadings are freely estimated (i.e., all measured variables are a function of
all factors), permitting correlations among all measured variables. (b) We usually think of
our measured variables as being unreliable to some degree and thus must freely estimate
the error variances. In most CFA models, we begin by constraining all covariance
between errors to be 0. By imposing these constraints, we are implying that the
correlations among measures are purely a function of the specified factors.
If this model fits our data, we again have a structure that is consistent with the
data, but still cannot rule out other specifications that also might fit. A good fit for the
two factor model suggests that although all four measures may share variability, anger
and hostility may represent an underlying construct that is relatively distinct from the
construct that underlies the anxiety and depression measures. We might interpret the
anger and hostility factor as something like ‘opposition’ or perhaps ‘aggression,’ while
the depression and anxiety might be interpreted as something like ‘withdrawal.’ Again,
this result would have implications regarding whether we would be better off using these
two separate latent variables rather than the single negative affect variable.
In practice, we would typically compare the unidimensional model with the two
correlated factor model. We can do this formally in SEM by comparing the difference
between the fit of the models. We pointed out earlier that a well-fitting model does not
guarantee that it is the correct model. For this reason, SEM procedures such as CFA are
at their scientific best when there are several theoretically plausible models available to
compare. We will discuss fit a bit later, and model comparison in the second installment.
For now, we turn to one more type of model structure, just to further illustrate the kinds
of models that can be represented.
Bifactor Model
A bifactor model may include a general factor associated with all measures and
one or more group factors associated with a limited number of measures (6, 7). In Figure
3 we present a model for six measures with one general factor and one group factor.
Keeping with our negative affect example, we replace the single depressive symptom
measure with three separate scores of affective, cognitive, and somatic symptoms of
depression. Let’s say that X1, X2, and X3 are the hostility, anger, and anxiety measures,
and X4, X5, and X6 are the affective, cognitive, and somatic depressive symptom
measures. Due to space limitations, we will only briefly describe the specification of this
model.
As typically applied, we are unlikely to obtain a bifactor model with EFA in that
an objective of this method (with rotation) is to obtain simple structure, which is
generally intolerant of a general factor. In contrast, in CFA, we choose which parameters
to estimate freely and which to constrain to 0. Thus, we can allow for a general factor as
well as group factors. Most frequently, bifactor models have been suggested as appropriate for item measures associated with psychological scales (see (8)). Although
interesting measures are likely to assess a general trait or factor, they are also likely to
include more specific aspects of that trait, that is, group factors. In contrast with the
previous model, factors for a bifactor model are typically specified to be uncorrelated
(i.e., the factor covariances are constrained to 0). In our example, this model suggests that
the three depressive symptom measures are to some extent distinct from the other three
measures, but that a broader general factor, which might be called negative affect, also
underlies all six measures.3

3 See Reise, Morizot, and Hays (2007) for a discussion of bifactor models. They suggest, for example, that items on an appropriately developed scale of depression would assess not only the general factor of depression but that subsets of items would also assess group factors representing such aspects as somatization and feelings of hopelessness.
Estimation of Free Parameters
Next we will consider how free parameters are estimated. We will discuss the
estimation using the model presented in Figure 1 with four measures and a single factor.
Hats (^) are placed on top of model parameters in recognition that we are now working
with sample data as opposed to model parameters at the population level.
SEM software typically allows a variety of input data formats, including raw
case-level data, the observed covariances among the study measures, or the correlations
and standard deviations of the measures. Regardless of the form of the data that you enter
into the software, the standard maximum likelihood estimation algorithm ultimately uses
the variances and covariances among the measured variables. If you input data as a
covariance matrix, the software will use this matrix directly; if you input data as raw
cases or correlations and standard deviations among measures, the software will convert
them to a covariance matrix before conducting the SEM analyses. These variances and
covariances are elements in the sample covariance matrix, S. The specified model with its freely estimated parameters tries to reproduce this covariance matrix. The reproduced matrix based on the model (also called the model-implied covariance matrix) is Σ̂Model, and the equation linking it to the model parameters is
Σ̂Model = Λ̂ Φ̂ Λ̂′ + Θ̂,  (1)

where, for the single factor model, Λ̂ is the column vector of estimated factor loadings (λ̂11, λ̂21, λ̂31, λ̂41)′, Φ̂ = [σ̂²F1] contains the estimated factor variance, and Θ̂ is the diagonal matrix holding the estimated error variances σ̂²E1 through σ̂²E4, with all error covariances fixed at 0.
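Equation (1) is straightforward to evaluate numerically. In this Python sketch the parameter estimates (loadings of .6, error variances of .64) are placeholders we supply for illustration:

```python
import numpy as np

# Numerical sketch of equation (1) for the single factor model; the
# parameter estimates below are made-up placeholders, not fitted values.
lam = np.array([[0.6], [0.6], [0.6], [0.6]])   # Lambda-hat: 4 x 1 loadings
phi = np.array([[1.0]])                        # Phi-hat: factor variance fixed to 1
theta = np.diag([0.64, 0.64, 0.64, 0.64])      # Theta-hat: error variances only

sigma_model = lam @ phi @ lam.T + theta        # Sigma-hat_Model
print(np.round(sigma_model, 2))
```

Each off-diagonal element of the implied matrix is λ̂i × λ̂j × σ̂²F1 = .36, and each diagonal element is λ̂i² × σ̂²F1 + σ̂²Ei = 1.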
The details of the equation are not important; rather, it is crucial to understand that the
values of the model parameters dictate the quantities in the reproduced covariance matrix
among the measured variables. The objective of the estimation procedure is to have the
variances and covariances based on the model parameters (i.e., the values in Σ̂Model) be as close as possible to the variances and covariances among measures in our sample data (i.e., the values in S). Stepping back from the technical details for a minute, we are assuming that
some knowable process in the population exists that has generated the set of variances
and covariances that we have observed among our study variables. In SEM, we use our
substantive knowledge of the field to hypothesize what that process might have been by
specifying what we think the model looks like. The model structure we specify
ultimately corresponds to a model-implied variance-covariance matrix. To the extent that
we have specified something like a plausible model, the values in that model-implied
matrix ought to be similar to the observed matrix. In practice, the constraints imposed on
the model do not permit perfect reproduction of S; that is, $S \neq \hat{\Sigma}_{\text{Model}}$. To summarize the
steps of estimation so far, we postulate a model that specifies how we believe the latent
and measured variables are related, and more importantly, how they are not related (the
constraints). The algorithm generates estimates of the parameter values, which in turn
determine the implied matrix among measures. The implied matrix should be similar to
the sample covariance matrix if the model is to be considered a good fit. Below we
consider how the parameter estimates are actually generated and then how the implied
matrix is used to evaluate the fit of the model.
In contrast to regression analysis and many other statistical methods, equations
are not available for directly computing the freely estimated parameters. The estimates
are instead computed by an iterative process, initially making arbitrary guesses about the
values of the model parameters and then repeatedly modifying these values in an attempt
to make S and $\hat{\Sigma}_{\text{Model}}$ as similar as possible. The process stops
when a criterion is met suggesting that the differences between S and
$\hat{\Sigma}_{\text{Model}}$ cannot be made any smaller.
A very simple example might be helpful at this point, plugging some fictional
values into our one-factor model. (The code and input data for this example can
be found at http://www.duke.edu/web/behavioralmed). The reported results are based on
an SEM analysis using the software program EQS. Let’s say that the variances for the
hostility, anger, anxiety, and depressive symptom measures are 1 and all covariances
among measures are .36. A variance-covariance matrix of the 4 observed measures would
look like this, with the values in the main diagonal being the variances of each of our
variables, and the off-diagonal elements the covariances between any two given
variables:
$$
S = \begin{bmatrix} 1 & .36 & .36 & .36 \\ .36 & 1 & .36 & .36 \\ .36 & .36 & 1 & .36 \\ .36 & .36 & .36 & 1 \end{bmatrix}
$$
(This set of values is highly improbable in the real world, but for convenience we have
created a covariance matrix S as a correlation matrix.) In addition, in specifying the
model, let’s say we fix the variance of our underlying factor to 1 to define its metric. The
SEM software package begins with very rough estimates of 1 for all factor loadings and
all error variances. For these estimated parameters, the reproduced covariance matrix
among the measured variables based on Equation 1 is
$$
\hat{\Sigma}_{\text{Model}} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} [1] \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}
+ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 & 1 & 1 \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix} .
$$
Clearly, the reproduced variances and covariances (the rightmost matrix, with 2s along
the main diagonal and 1s in the off-diagonal positions) are not very similar to the 1s and
.36s in S. The software then takes another guess, revising its estimates so that the factor
loadings are all .68 and the error variances are .64:
$$
\hat{\Sigma}_{\text{Model}} = \begin{bmatrix} .68 \\ .68 \\ .68 \\ .68 \end{bmatrix} [1] \begin{bmatrix} .68 & .68 & .68 & .68 \end{bmatrix}
+ \begin{bmatrix} .64 & 0 & 0 & 0 \\ 0 & .64 & 0 & 0 \\ 0 & 0 & .64 & 0 \\ 0 & 0 & 0 & .64 \end{bmatrix}
= \begin{bmatrix} 1.102 & .462 & .462 & .462 \\ .462 & 1.102 & .462 & .462 \\ .462 & .462 & 1.102 & .462 \\ .462 & .462 & .462 & 1.102 \end{bmatrix} .
$$
Now the values in $\hat{\Sigma}_{\text{Model}}$ are more similar to the values in S, but still not exactly the
same. In the next iteration, all factor loadings are estimated to be .605, while the error
variances are estimated to be .640. With two additional iterations, the final estimates are
.600 for all factor loadings and .640 for all error variances:
$$
\hat{\Sigma}_{\text{Model}} = \begin{bmatrix} .60 \\ .60 \\ .60 \\ .60 \end{bmatrix} [1] \begin{bmatrix} .60 & .60 & .60 & .60 \end{bmatrix}
+ \begin{bmatrix} .64 & 0 & 0 & 0 \\ 0 & .64 & 0 & 0 \\ 0 & 0 & .64 & 0 \\ 0 & 0 & 0 & .64 \end{bmatrix}
= \begin{bmatrix} 1 & .36 & .36 & .36 \\ .36 & 1 & .36 & .36 \\ .36 & .36 & 1 & .36 \\ .36 & .36 & .36 & 1 \end{bmatrix} .
$$
For this artificial example, the model parameters reproduce perfectly the sample
covariance matrix among the measures. In other words, the fit of the model to the data is
perfect—a highly unlikely result in practice. We also note here that we interpret the
loadings in the conventional way for a statistical equation with estimated weights. In this
case, for every one-unit increase in the underlying factor, the score of each observed
measure is expected to increase .60 units.
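Equation 1 can be checked numerically against these final estimates. The sketch below is ours, written in Python with NumPy for illustration (the paper's own code, in EQS and Mplus, is in the online appendix); it builds the model-implied matrix from the final estimates and confirms that it reproduces S:

```python
import numpy as np

# Final maximum likelihood estimates from the worked example
lam = np.full((4, 1), 0.60)          # Lambda: the four factor loadings
phi = np.array([[1.0]])              # Phi: factor variance fixed to 1
theta = np.diag(np.full(4, 0.64))    # Theta: the four error variances

# Model-implied covariance matrix (Equation 1)
sigma_model = lam @ phi @ lam.T + theta

# Sample covariance matrix S from the example
S = np.full((4, 4), 0.36)
np.fill_diagonal(S, 1.0)

print(np.allclose(sigma_model, S))   # True: the model reproduces S perfectly
```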
How does the algorithm know when S and $\hat{\Sigma}_{\text{Model}}$ are similar enough to stop
iterating? Mathematically, it is necessary to specify a function to define the similarity.
The most popular estimation approach is maximum likelihood (ML), and, with this
approach, the iterative estimation procedure is designed to minimize the following
function:
$$
F_{ML} = \log \left| \hat{\Sigma}_{\text{Model}} \right| - \log |S| + \operatorname{trace}\!\left( S \, \hat{\Sigma}_{\text{Model}}^{-1} \right) - p , \quad (2)
$$
where p is the number of measured variables. It is not crucial to understand the details of
the equation. What is important to know is that each iteration (set of parameter guesses)
produces a value for $F_{ML}$ and that $F_{ML}$ is a mathematical reflection of the difference
between S and $\hat{\Sigma}_{\text{Model}}$ for a given set of parameters. When $F_{ML}$ is at its smallest value, S
and $\hat{\Sigma}_{\text{Model}}$ are as similar as they can be, given the data and the hypothesized model. The
values of the parameter estimates at this point in the iterative process are the maximum
likelihood estimates for the CFA model.
For our example above, $F_{ML}$ becomes smaller with each iteration, as shown in
Table 1. At step 5, the algorithm recognizes that neither $F_{ML}$ nor the parameter
estimates changed from step 4 to step 5, so the process stops, and the estimates at the
last step are the maximum likelihood estimates.
-----------------------------------
Table 1 About Here
-----------------------------------
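To make the fit function concrete, here is a small Python/NumPy illustration of ours (not code from the paper) that evaluates Equation 2 at the first and final sets of estimates in Table 1:

```python
import numpy as np

def f_ml(S, sigma_model):
    """Maximum likelihood fit function (Equation 2)."""
    p = S.shape[0]
    return (np.log(np.linalg.det(sigma_model)) - np.log(np.linalg.det(S))
            + np.trace(S @ np.linalg.inv(sigma_model)) - p)

def implied(loading, err_var, p=4):
    """Model-implied covariance (Equation 1) when all loadings are equal,
    all error variances are equal, and the factor variance is fixed to 1."""
    lam = np.full((p, 1), loading)
    return lam @ lam.T + np.diag(np.full(p, err_var))

S = np.full((4, 4), 0.36)
np.fill_diagonal(S, 1.0)

print(f_ml(S, implied(1.0, 1.0)))    # step 1 of Table 1, approximately .5519
print(f_ml(S, implied(0.60, 0.64)))  # final step: effectively zero
```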
Although researchers most frequently minimize FML to obtain estimates in SEM,
it is sometimes preferable to choose other functions to minimize. For example, a different
function—the full information maximum likelihood (FIML) function—is preferable if
some data on measures are missing. When modeling item-level data (such as Likert-type
items), a weighted least squares (WLS) function is generally preferred for estimating
model parameters.
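The iterative search described above can be sketched with a crude numerical minimizer. The Python/NumPy code below is our illustration, not the algorithm EQS actually uses (SEM programs rely on far more efficient iteration schemes); it simply zooms a grid of guesses in on the (loading, error variance) pair that minimizes $F_{ML}$:

```python
import numpy as np

S = np.full((4, 4), 0.36)
np.fill_diagonal(S, 1.0)

def f_ml(loading, err_var):
    """F_ML (Equation 2) for the one-factor model with equal loadings and
    equal error variances; the factor variance is fixed to 1."""
    lam = np.full((4, 1), loading)
    sigma = lam @ lam.T + np.diag(np.full(4, err_var))
    return (np.log(np.linalg.det(sigma)) - np.log(np.linalg.det(S))
            + np.trace(S @ np.linalg.inv(sigma)) - 4)

# Crude stand-in for the program's iterative search: repeatedly refine a
# grid around the best (loading, error variance) pair found so far.
best = (1.0, 1.0)            # arbitrary start values, as in the example
width = 1.0
for _ in range(20):
    ls = np.linspace(best[0] - width, best[0] + width, 21)
    es = np.linspace(max(best[1] - width, 0.01), best[1] + width, 21)
    best = min(((l, e) for l in ls for e in es), key=lambda p: f_ml(*p))
    width /= 4               # zoom in around the current best guess

print(round(best[0], 3), round(best[1], 3))  # approaches .600 and .640
```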
To summarize our steps so far: we specify a model, and the algorithm then generates a
series of guesses for the parameters, trying to find parameters that imply a covariance
matrix ($\hat{\Sigma}_{\text{Model}}$) that is as similar as possible to the observed covariance matrix (S). When
changes in the parameters can no longer make the implied matrix any more similar to
the observed matrix, the algorithm stops, and the final parameter estimates are reported.
We now turn to how fit is formally evaluated.
Assessment of Global Fit
We must assess the quality of a model by examining the output from SEM
software to determine if the model and its estimated parameters are interpretable. We first
scan the output for warning messages and rerun analyses when appropriate. Second, we
assess local fit. Examples include evaluating individual estimated parameters to ensure
they are within mathematical bounds (e.g., no negative variances or correlations above
1.0), are within interpretational bounds (i.e., no parameter estimates with values that defy
interpretation), and are significantly different from zero based on hypothesis tests. Third,
we examine global fit to determine if the constrained parameters of a model allow for
overall good fit to the data. We will concentrate our attention on global judgments of fit
here.
As previously described, the fit function is used during estimation to gauge how
closely the model-implied matrix matches the data. Given that role, it is not surprising
that the fit function is also a central component of all global fit indices, as we describe
next.
Testing the Hypothesis of Perfect Fit in Population
We can assess the hypothesis that the researcher’s model is correct in the
population. More specifically, we can ask whether the reproduced covariance matrix
based on the model ($\Sigma_{\text{Model}}$) is equal to the population covariance matrix among the
measures ($\Sigma$). We can state this question in the more familiar form below: the null
hypothesis of equality between the model-implied and population covariance matrices is
given as $H_0$, and the alternative hypothesis that the two matrices differ is given as $H_A$:

$$
H_0 : \Sigma - \Sigma_{\text{Model}} = 0 \qquad\qquad H_A : \Sigma - \Sigma_{\text{Model}} \neq 0 \quad (3)
$$
Two comments are worth noting about how this question is posed in SEM. First, in most
non-SEM applications of hypothesis testing, rejection of the null hypothesis implies
support for the researcher’s hypothesis. In contrast, in SEM rejection of the null
hypothesis indicates that the researcher’s hypothesized model does not hold in the
population—the model-implied and population matrices are “significantly different.”
Second, no model is likely to fit perfectly in the population, and thus we know, a priori,
that the null hypothesis concerning the researcher’s hypothesis is false.
The test of the null hypothesis is straightforward. The test statistic, T, is a simple
function of sample size (N) and the fit function:

$$
T = (N - 1)\, F_{ML} \quad (4)
$$

(or $T = N F_{ML}$, as computed in some SEM software packages). In large samples, and
assuming the p measured variables are normally distributed in the population, T is
distributed approximately as a chi square. The degrees of freedom for the chi square are
equal to the number of unique variances and covariances in the covariance matrix among
measured variables (i.e., $p(p+1)/2$) minus the number of freely estimated model
parameters (q), that is,

$$
df = \frac{p(p+1)}{2} - q . \quad (5)
$$
In most applications with some degree of model complexity, a sample size of 200 or
greater is recommended for T to be distributed approximately as a chi square. However, a
greater sample size may be required to have sufficient power to reject hypotheses of
interest, including hypotheses about particular parameters or sets of parameters.
Unfortunately, this test of global fit suffers from the same problems that a
conventional hypothesis test does. If the null is not rejected, it may be due to insufficient
sample size, that is, a lack of power. In addition, non-rejection does not imply that the
researcher’s model is correct—it is incorrect to “accept the null hypothesis.” In fact, it is
likely that a number of alternative models produce similar T values. If the hypothesis is
rejected, we can only conclude what we knew initially: the model is imperfect. In
addition, note that the formula for T is highly dependent on sample size. If the sample
size is large, the T value will necessarily be large, and even small and possibly
unimportant discrepancies between the model-implied and observed covariance matrix
will yield significance. It is our observation that tests of models are routinely
significant—meaning that we conclude our model does not fit—when sample size
exceeds 200.
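The dependence of T on sample size is easy to see with a little arithmetic. In the hypothetical Python illustration below (ours, not from the paper), the same small misfit ($F_{ML}$ = .01, with the df of 2 from our one-factor, four-measure example) is non-significant in small samples but significant once N is large enough:

```python
# Hypothetical illustration: a fixed, small misfit (F_ML = .01) evaluated
# at different sample sizes, using T = (N - 1) * F_ML (Equation 4).
F_ML = 0.01
df = 2                      # df for the one-factor, four-measure example
CRITICAL = 5.991            # chi-square critical value for df = 2, alpha = .05

for N in (100, 200, 601, 1000):
    T = (N - 1) * F_ML
    # Prints N, the test statistic, and whether the misfit is "significant"
    print(N, round(T, 2), T > CRITICAL)
```

The identical degree of misfit is retained throughout; only N changes, yet the conclusion of the test flips from non-significant to significant.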
Fit Indices: Assessing Degree of Fit
Because the chi-square fit test is affected by sample size, a wide variety of other
measures of fit have been proposed. Two indices that are used frequently are Bentler’s
comparative fit index (CFI) and the root mean square error of approximation (RMSEA).
CFI. The CFI is a comparison of the fit of the researcher’s model to the fit of a
null model. The null model is highly constrained and unrealistic. More specifically, the
model parameters are constrained such that all covariances among measured variables are
equal to zero (implying all correlations are equal to zero). Accordingly, we expect a
researcher’s model to fit much better than a null model.
In the population, CFI is defined as

$$
\text{CFI}_{\text{pop}} = \frac{\lambda_{\text{null model}} - \lambda_{\text{researcher's model}}}{\lambda_{\text{null model}}} . \quad (7)
$$
λ is a non-centrality parameter that is an index of lack of fit of a model to a population
covariance matrix. λ is zero if a model is correct and becomes larger to the degree that
the model is misspecified. We would expect the null model to be a badly misspecified
model in most applications of SEM; therefore, $\lambda_{\text{null model}}$ would be large. In comparison,
$\lambda_{\text{researcher's model}}$ should be much smaller. Accordingly, we expect to obtain high CFI values
to the extent that the researcher’s model is superior to a null model that implies
uncorrelated measured variables.
In the formula for $\text{CFI}_{\text{pop}}$, we can substitute $T - df$ for $\lambda$ to obtain a sample
estimate of CFI:

$$
\text{CFI} = \frac{(T_{\text{null model}} - df_{\text{null model}}) - (T_{\text{researcher's model}} - df_{\text{researcher's model}})}{T_{\text{null model}} - df_{\text{null model}}} \quad (8)
$$
According to Hu and Bentler(9), a value of .95 or higher indicates good fit. This cutoff is
consistent with the belief that a researcher’s model should fit much better than the
unrealistic null model. It must be noted that cutoffs for fit indices are problematic and a
preferable approach is to use these indices to compare fits for various alternative models.
RMSEA. The RMSEA is a fit index that assesses lack of fit, but does not use the
unrealistic comparison of a null model. The sample estimate is also a function of T and
df:
$$
\text{RMSEA} = \sqrt{ \frac{T_{\text{researcher's model}}}{df_{\text{researcher's model}}\,(N-1)} - \frac{1}{N-1} } . \quad (9)
$$

To the extent that the model fits [i.e., small $T_{\text{researcher's model}}/(N-1)$] and the model involves
estimating few model parameters (large $df_{\text{researcher's model}}$), RMSEA should approach zero.
RMSEAs of less than .06 indicate good fit according to Hu and Bentler(9), but again this
cutoff should be treated as a rough-and-ready rule of thumb.
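Both indices reduce to simple arithmetic once T and df are in hand. The fit values below are hypothetical, chosen only to illustrate Equations 8 and 9 in a short Python sketch of ours:

```python
import math

def cfi(T_null, df_null, T_model, df_model):
    """Comparative fit index (Equation 8)."""
    return ((T_null - df_null) - (T_model - df_model)) / (T_null - df_null)

def rmsea(T_model, df_model, N):
    """Root mean square error of approximation (Equation 9), floored at
    zero for models that fit better than their degrees of freedom."""
    return math.sqrt(max(T_model / (df_model * (N - 1)) - 1 / (N - 1), 0.0))

# Hypothetical chi-square results for a sample of N = 201
print(round(cfi(T_null=500.0, df_null=6, T_model=10.0, df_model=2), 3))  # 0.984
print(round(rmsea(T_model=10.0, df_model=2, N=201), 3))                  # 0.141
```

Note that this hypothetical model would pass the CFI cutoff of .95 yet fail the RMSEA cutoff of .06, which is one reason cutoffs are best treated as rough guides rather than firm rules.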
Underidentification and Other Problems in Estimation
Having presented the example, let's return to one technical issue. A
requirement for estimating the parameters of a model is that the model must be identified.
Identification means simply that the information available from your sample must equal
or exceed what is needed to estimate your model parameters. Information about your
sample is captured in the variances and the covariances among the measured variables.
This information is used to estimate the unconstrained model parameters—potentially,
the factor loadings, the variances and the covariance among the factors, and variances
and covariances among the errors. The t-rule states that the number of freely estimated
parameters (q) must be less than or equal to the number of unique variances and
covariances among the measured variables, which is equal to $p(p+1)/2$. Another way to
express the t-rule is that the degrees of freedom for the chi square test cannot be negative
(see Equation 5).
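The t-rule itself is one line of arithmetic. Here is a quick Python check of ours for the one-factor example, where q = 8 (four loadings plus four error variances, with the factor variance fixed to 1):

```python
def t_rule_df(p, q):
    """Degrees of freedom (Equation 5); the t-rule requires df >= 0."""
    return p * (p + 1) // 2 - q

# One-factor model with 4 measures: 4 loadings + 4 error variances = 8 parameters
df = t_rule_df(p=4, q=8)
print(df, df >= 0)   # 2 True: the t-rule is satisfied
```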
The bad news is that even if your model passes the t-rule, the model may still be
underidentified (i.e., not identified). This occurs if the number of parameters for a portion
of the model exceeds the available sample information. For example, a model might
include one factor with freely estimated loadings on the first 2 measures and a second
factor with freely estimated loadings on the remaining 2 measures. All other loadings are
constrained to 0; the factor covariance and error covariances are constrained to 0; and the
factor variances are fixed to 1 to set their metric. In this example, the variances and
covariances for each pair of measures are available to estimate only the model parameters
for these measures. Because each pair of measured variables is linked to one and only
one factor, it is as if two CFAs are being conducted—one for the first pair of measures
and another for the second pair. The consequence is that the model cannot be estimated
because the number of freely estimated parameters for any pair of measures (2 loadings
and 2 error variances) exceeds the amount of sample information (2 variances and 1
covariance between these measures).
Additional identification rules are available. The 3-indicator rule may be applied
for the example just described. If each measure has only one estimated factor loading
(others constrained to 0), the covariances among factors are constrained to 0, and the
covariances among the errors are constrained to 0, then a model is identified if each
factor has estimated loadings on at least 3 measures (as opposed to 2 measures as
described in our previous example). In most SEM applications, factors are allowed to be
correlated, and then a model is identified if each factor has estimated loadings on at least
2 measures. For the 2-indicator rule, the same conditions must hold as with the
3-indicator rule, except that the covariances among factors are freely estimated.
There is both bad and good news about the use of the 2- and 3-indicator rules. The
bad news is that they are not applicable for many CFA models. For example, they are not
helpful in determining if a bifactor model with both group and general factors is
identified. The good news is that available software is likely to give warning messages if
the model is underidentified. More bad news is that it is not always obvious what the
warning messages mean and what, if anything, should be done to remedy the problem.
In fact, the messages might suggest other estimation problems such as empirical
underidentification or bad start values. With empirical underidentification, the model is
identified mathematically, but nevertheless the parameters of the model cannot be
estimated because of the data. For example, a CFA model with two factors might meet all
the requirements of the 2-indicator rule, but may still not be able to be estimated if the
freely estimated covariance between factors is 0 (or close to 0). In this case, because the
factors are uncorrelated, 3 measures are required per factor. Alternatively, for the same
example, if the estimated factor loading for a measure is 0, it cannot be counted as one of
the indicators for a factor.
The other estimation problem is bad start values. As described earlier, the
estimation process in CFA is iterative and requires start values that are created by the
SEM software. With more complex models, the start values created by the program may
be bad in that they do not produce adequate estimates. In this instance, the researcher
may ask the program to conduct more iterations to get a good solution or may be forced
to supply their own start values for parameter estimation. Researchers might use
estimates from exploratory factor analysis or other CFA models to supply start values.
In conducting CFA, no researcher wants to see warning messages about parameter
estimates. Our advice is not to ignore warning messages, but rather to acknowledge
them and work through them with someone you trust to help you (i.e., your local SEM
expert).
Conclusions
In many applications, researchers who apply exploratory factor analysis could use
confirmatory factor analysis. To the extent that researchers have some knowledge about
the measures that they are analyzing, they should be conducting CFA. There are real
benefits to specifying rigorously one’s beliefs about measures, assessing those beliefs
with indices that allow for disconfirmation of these beliefs, and at the end being able to
specify which alternative model produces the best fit. It may require more thoughtfulness
upfront than EFA, but the outcome is likely to be more informative if the methods of
CFA are applied skillfully.
As we noted at the beginning of this piece, SEM is capable of carrying out a fairly
staggering variety of analytic procedures. We have presented one of the two
procedures that are fundamental to the full structural equation model. In our next
installment we will present the second fundamental procedure, path analysis, and also
how CFA and path analysis combine to form the full structural model. In the coming
installment, we also will present several important concepts that we had to omit in the
present paper. These concepts include how to compare competing models, approaches to
modifying models, indirect effects and mediation, and more general considerations, such
as sample size and power.
References
1. Wright S. Correlation and causation. Journal of Agricultural Research
1921;20:557-85.
2. Joreskog KG, Goldberger AS. Estimation of a model with multiple indicators and
multiple causes of a single latent variable. Journal of the American Statistical
Association 1975;70:631-9.
3. Calis JCJ, Phiri KS, Faragher EB, Brabin BJ, Bates I, Cuevas LE, de Haan RJ,
Phiri AI, Malange P, Khoka M, Hulshof PJM, van Lieshout L, Beld MGHM, Teo
YY, Rockett KA, Richardson A, Kwiatkowski DP, Molyneux ME, van
Hensbroek MB. Severe Anemia in Malawian Children. N Engl J Med
2008;358:888-99.
4. Rosen R, Contrada R, Gorkin L, Kostis J. Determinants of perceived health in
patients with left ventricular dysfunction: a structural modeling analysis.
Psychosom Med 1997;59:193-200.
5. Suls J, Bunde J. Anger, anxiety, and depression as risk factors for
cardiovascular disease: the problems and implications of overlapping affective
dispositions. Psychological Bulletin 2005;131:260-300.
6. Rindskopf D, Rose T. Some theory and applications of confirmatory second-order
factor analysis. Multivariate Behavioral Research 1988;23:51-67.
7. Yung YF, Thissen D, McLeod LD. On the relationship between the higher-order
factor model and the hierarchical factor model. Psychometrika 1999;64:113-28.
8. Reise SP, Waller NG, Comrey AL. Factor analysis and scale revision.
Psychological Assessment 2000;12:287-297.
9. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling
1999;6:1-55.
Suggested Readings
Introductory Reading:
Brown TA. Confirmatory factor analysis for applied research. New York: Guilford
Press; 2006.
Green SB, Thompson MS. Structural equation modeling in clinical research. In: Roberts
MC, Illardi SS, editors, Methods of Research in Clinical Psychology: A Handbook.
London: Blackwell; 2003. p 138-175.
Kline RB. Principles and practice of structural equation modeling (2nd ed). New York:
Guilford Press; 2005.
Glaser D. Structural Equation Modeling Texts: A primer for the beginner. Journal of
Clinical Child Psychology 2002; 31: 573-578.
More Advanced Reading:
Bollen KA. Structural Equations with Latent Variables. New York: Wiley; 1989.
Edwards JR, Bagozzi RP. On the nature and direction of relationships between
constructs and measures. Psychol Methods 2000; 5:155-174.
MacCallum RC, Roznowski M, Necowitz LB. Model modifications in covariance
structure analysis: The problem of capitalization on chance. Psychol Bull 1992; 111:
490-504.
McDonald R, Ho M-HR. Principles and practice in reporting structural equation analyses.
Psychol Methods 2002; 7: 64-82.
Wirth RJ, Edwards MC. Item factor analysis: Current approaches and future directions.
Psychol Methods 2007;12:58-79.
Table 1. Estimation of Model Parameters by Minimizing F_ML for Our Example

Step in iterative process | Estimated factor loadings | Estimated error variances | F_ML
--------------------------|---------------------------|---------------------------|--------
            1             |          1.000            |          1.000            | .55194
            2             |           .680            |           .640            | .01523
            3             |           .605            |           .640            | .00006
            4             |           .600            |           .640            | .00000
            5             |           .600            |           .640            | .00000
Figure Captions
Figure 1. A Single Factor Model
Figure 2. A Correlated Factors Model
Figure 3. A Bifactor Model
Figure 1. A Single Factor Model

Path diagram: one factor F1 (variance fixed, $\sigma^2_{F_1} = 1$) with loadings $\lambda_{11}$, $\lambda_{21}$, $\lambda_{31}$, $\lambda_{41}$ on measures X1–X4; each measure has an error term E1–E4 with variance $\sigma^2_{E_1}, \ldots, \sigma^2_{E_4}$.

Matrices:

$$
\Lambda = \begin{bmatrix} \lambda_{11} \\ \lambda_{21} \\ \lambda_{31} \\ \lambda_{41} \end{bmatrix}, \qquad
\Theta = \begin{bmatrix} \sigma^2_{E_1} & 0 & 0 & 0 \\ 0 & \sigma^2_{E_2} & 0 & 0 \\ 0 & 0 & \sigma^2_{E_3} & 0 \\ 0 & 0 & 0 & \sigma^2_{E_4} \end{bmatrix}, \qquad
\Phi = \left[ \sigma^2_{F_1} \right] = [1]
$$
Figure 2. A Correlated Factors Model

Path diagram: factor F1 (variance fixed to 1) with loadings $\lambda_{11}$, $\lambda_{21}$ on X1–X2, and factor F2 (variance fixed to 1) with loadings $\lambda_{32}$, $\lambda_{42}$ on X3–X4; the factors covary ($\sigma_{F_1 F_2}$), and each measure has an error E1–E4 with variance $\sigma^2_{E_1}, \ldots, \sigma^2_{E_4}$.

Matrices:

$$
\Lambda = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ 0 & \lambda_{32} \\ 0 & \lambda_{42} \end{bmatrix}, \qquad
\Theta = \begin{bmatrix} \sigma^2_{E_1} & 0 & 0 & 0 \\ 0 & \sigma^2_{E_2} & 0 & 0 \\ 0 & 0 & \sigma^2_{E_3} & 0 \\ 0 & 0 & 0 & \sigma^2_{E_4} \end{bmatrix}, \qquad
\Phi = \begin{bmatrix} \sigma^2_{F_1} & \sigma_{F_1 F_2} \\ \sigma_{F_1 F_2} & \sigma^2_{F_2} \end{bmatrix}
$$
Figure 3. A Bifactor Model

Path diagram: a general factor F1 (variance fixed to 1) with loadings $\lambda_{11}, \ldots, \lambda_{61}$ on all six measures X1–X6, and a group factor F2 (variance fixed to 1) with loadings $\lambda_{42}$, $\lambda_{52}$, $\lambda_{62}$ on X4–X6; the factors are uncorrelated, and each measure has an error E1–E6 with variance $\sigma^2_{E_1}, \ldots, \sigma^2_{E_6}$.

Matrices:

$$
\Lambda = \begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ \lambda_{41} & \lambda_{42} \\ \lambda_{51} & \lambda_{52} \\ \lambda_{61} & \lambda_{62} \end{bmatrix}, \qquad
\Theta = \operatorname{diag}\!\left( \sigma^2_{E_1}, \ldots, \sigma^2_{E_6} \right), \qquad
\Phi = \begin{bmatrix} \sigma^2_{F_1} & 0 \\ 0 & \sigma^2_{F_2} \end{bmatrix}
$$