Latent Structural Equation Modeling
Seppo Pynnonen
Department of Mathematics and Statistics, University of Vaasa, Finland
As of March 7, 2016
Part I
Introduction
Contents
1 Introduction
Measurement Models
Covariance and Correlation
Missing Data
On the Sample Size for SEM
Measurement Models
Latent and Observable Variables
Latent variables: phenomena of theoretical interest that cannot be observed directly. They are assessed by manifest measures that are observable.
The device used in the assessment is a measurement model.
Simple Measurement Model
y = η + ε, (1)
where y is the observed value, η is the "true" value and ε is a measurement error.
Schematically this is typically depicted as
Figure: Simple measurement model (η → y ← ε)
Multiple Indicators
Typically one manifest variable (indicator) is not sufficient to fully reflect the underlying unobservable construct.
If indicators y1, y2, and y3 measure the same latent construct, the above model can be generalized to
Figure: Measurement model with three indicators (η → y1, y2, y3 with loadings λ1, λ2, λ3; each yi has its own error εi).
Mathematical representation
Mathematically:
y1 = λ1η + ε1
y2 = λ2η + ε2 (2)
y3 = λ3η + ε3
The λ-coefficients are called loadings.
The measurement errors, εi, are assumed to be independent of each other and, in particular, of η.
Coefficient of Reliability
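In terms of model (2), the variances, $\sigma_i^2 = \mathrm{var}[y_i]$, of the observable variables become
\[
\sigma_i^2 = \lambda_i^2 \sigma_\eta^2 + \sigma_{\varepsilon_i}^2, \tag{3}
\]
where $\sigma_\eta^2 = \mathrm{var}[\eta]$ and $\sigma_{\varepsilon_i}^2 = \mathrm{var}[\varepsilon_i]$.
A coefficient of reliability of $y_i$ in measuring $\eta$ is
\[
\rho_i^2 = \frac{\lambda_i^2 \sigma_\eta^2}{\sigma_i^2} = 1 - \frac{\sigma_{\varepsilon_i}^2}{\sigma_i^2} \tag{4}
\]
(cf. the coefficient of determination, R-square, in regression).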
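As a numerical illustration of (3) and (4), the following minimal Python sketch computes the implied variance and the reliability of each of three indicators. The loadings and variances are made-up values chosen only for illustration; they are not estimates from any data set in these lectures.

```python
# Hypothetical parameter values, chosen only for illustration.
loadings = [1.0, 0.8, 0.5]      # lambda_1, lambda_2, lambda_3
var_eta = 1.0                   # var[eta]
var_eps = [0.5, 0.36, 0.75]     # var[eps_1], var[eps_2], var[eps_3]


def indicator_variance(lam, var_eta, var_e):
    """Equation (3): var[y_i] = lambda_i^2 * var[eta] + var[eps_i]."""
    return lam ** 2 * var_eta + var_e


def reliability(lam, var_eta, var_e):
    """Equation (4): share of var[y_i] that is due to eta."""
    return lam ** 2 * var_eta / indicator_variance(lam, var_eta, var_e)


variances = [indicator_variance(l, var_eta, e)
             for l, e in zip(loadings, var_eps)]
reliabilities = [reliability(l, var_eta, e)
                 for l, e in zip(loadings, var_eps)]

for i, (v, r) in enumerate(zip(variances, reliabilities), start=1):
    print(f"y{i}: variance = {v:.2f}, reliability = {r:.2f}")
```

With a fixed error variance, a smaller loading gives a lower reliability, in the same way that a weaker regressor gives a lower R-square.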
Structural Model
The ultimate interest in latent variable analysis is usually in the relationships between different constructs.
A measurement model describes relationships between a construct and its measures (items, indicators).
A structural model specifies relationships between different constructs.
A proper specification of the measurement model is necessary before a meaningful empirical analysis of the structural model.
Reflective and Formative
Reflective and formative measurements.
The distinction is in the direction of the effect.
Reflective measurement
construct → measurement
y = λη + ε (5)
Formative measurement
construct ← measurement
η = γy + ζ (6)
Reflective: Dependent variable observable.
Formative: Dependent variable latent.
Figure: Measurement models (Source: Diamantopoulos et al. 2008, JBR)
Reflective model:
A change in the latent variable causes simultaneous variation in the
measurements.
All measures (indicators) must be positively correlated (Bollen, 1984,
Quality and Quantity, 377–385).
Typical example: ability
Formative model:
Indicators determine the latent variable
Typical examples: Socio-economic status, quality of life, career success (for more details, see Diamantopoulos, Riefler, and Roth 2008, JBR, 1203–1218).
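The difference in direction can be illustrated with a small simulation. The sketch below generates data from a reflective model as in (5), where a common η drives all indicators, and from a formative model as in (6), where independently generated indicators are combined into η. Extending (6) to several indicators, and all numerical values (loadings, weights, error standard deviations, sample size, seed), are assumptions made for this illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)   # fixed seed so the run is reproducible
n = 5000

# Reflective: construct -> measurement, y_i = lambda_i * eta + eps_i.
lam = [0.9, 0.8, 0.7]                        # hypothetical loadings
eta = [rng.gauss(0, 1) for _ in range(n)]
y_reflective = [[l * e + rng.gauss(0, 0.5) for e in eta] for l in lam]

# Formative: construct <- measurement, eta = sum_i gamma_i * y_i + zeta.
# The indicators are generated independently of each other here.
gamma = [0.5, 0.3, 0.2]                      # hypothetical weights
y_formative = [[rng.gauss(0, 1) for _ in range(n)] for _ in gamma]
eta_formative = [
    sum(g * y[k] for g, y in zip(gamma, y_formative)) + rng.gauss(0, 0.3)
    for k in range(n)
]

r_reflective = pearson(y_reflective[0], y_reflective[1])
r_formative = pearson(y_formative[0], y_formative[1])
print(f"reflective indicators y1, y2: r = {r_reflective:.3f}")
print(f"formative indicators  y1, y2: r = {r_formative:.3f}")
```

In the reflective case the shared η induces a clearly positive correlation between the indicators. In the formative case nothing in the model requires the indicators to be correlated at all.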
Covariance and Correlation
Modeling Relationships
Modeling of the relationships is based on dependencies between the
variables. The dependencies are measured in terms of covariances or
correlations.
Correlations
Table: Correlation types
Correlation                    Level of measurement
Pearson product-moment         Interval–Interval
Spearman rank, Kendall’s tau   Ordinal–Ordinal
Phi                            Nominal–Nominal
Point biserial                 Interval–Dichotomous
Rank biserial                  Ordinal–Dichotomous
Biserial                       Interval–Dichotomous, with underlying continuity
Polyserial                     Interval–Ordinal, with underlying continuity
Tetrachoric                    Dichotomous–Dichotomous, with underlying continuity in both variables
Polychoric                     Ordinal–Ordinal, with underlying continuity in both variables
Pearson product-moment: the "usual" correlation.
Rank correlation (Spearman’s rho): in fact the usual correlation computed on the ranked observations (differences arise if there are ties in the ranks).
Kendall’s tau: with computerization, it has become preferred over Spearman’s rank correlation.
Point biserial correlation: again the same as the usual correlation when one of the variables is dichotomous (assumes values 0 and 1); no special program is needed for the calculations.
Biserial correlation: rarely used; use polyserial instead.
Polyserial correlation: Assumes bivariate normality between the interval variable and the underlying continuous variable of the ordinal (or dichotomous) variable.
Polychoric correlation: Assumes that both of the ordinal variables are measurements from underlying continuous normal variables. Popular in SEM (SAS FREQ with option PLCORR [2]).
Tetrachoric correlation: Polychoric correlation with dichotomous variables (SAS FREQ with option PLCORR).
[2] For an example, see e.g. UCLA: Statistical Computing Group, http://www.ats.ucla.edu/stat/sas/faq/tetrac.htm
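The remarks above that Spearman's rho and the point biserial correlation are just the usual correlation in disguise can be checked numerically. The following is a minimal sketch in plain Python; the small data vectors are invented for illustration and contain no ties.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


# Invented illustration data without ties.
x = [2.1, 3.4, 1.7, 5.0, 4.2, 2.9]
y = [1.0, 2.5, 1.2, 4.8, 3.9, 3.1]
n = len(x)

# Spearman: textbook formula versus Pearson correlation of the ranks.
rx, ry = ranks(x), ranks(y)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
spearman_formula = 1 - 6 * d2 / (n * (n ** 2 - 1))
spearman_via_pearson = pearson(rx, ry)

# Point biserial: textbook formula versus Pearson with a 0/1 variable.
group = [0, 1, 0, 1, 1, 0]
x1 = [v for v, g in zip(x, group) if g == 1]
x0 = [v for v, g in zip(x, group) if g == 0]
p = len(x1) / n
mean_x = sum(x) / n
sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)   # divisor n
point_biserial = (sum(x1) / len(x1) - sum(x0) / len(x0)) / sd_x \
    * math.sqrt(p * (1 - p))
pearson_dichotomous = pearson(x, group)

print(spearman_formula, spearman_via_pearson)
print(point_biserial, pearson_dichotomous)
```

Each printed pair agrees up to floating-point rounding, so no special routine is needed for either coefficient when the data have no ties.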
The general idea behind polyserial and polychoric (and tetrachoric) correlations is that the ordinal variable z may be regarded as a crude measurement of the underlying unobservable continuous variable z∗.
For example, a four point ordinal scale may be conceived as:
if z∗ ≤ τ1, z is scored 1,
if τ1 < z∗ ≤ τ2, z is scored 2,
if τ2 < z∗ ≤ τ3, z is scored 3,
if τ3 < z∗, z is scored 4,
where τ1 < τ2 < τ3 are threshold values of z∗.
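The scoring rule can be written as a short function. This is a minimal sketch: the threshold values are hypothetical, and a latent value exactly equal to a threshold is assigned to the lower category, as in the first three rules.

```python
from bisect import bisect_left


def score(z_star, thresholds):
    """Map a latent value z* to an ordinal score 1..len(thresholds)+1.

    bisect_left counts the thresholds strictly below z*, so a value
    with tau_(k-1) < z* <= tau_k receives the score k.
    """
    return bisect_left(thresholds, z_star) + 1


thresholds = [-1.0, 0.0, 1.0]   # hypothetical tau_1 < tau_2 < tau_3
latent_values = [-1.5, -1.0, -0.2, 0.0, 0.7, 1.0, 2.3]
scores = [score(z, thresholds) for z in latent_values]
print(scores)   # [1, 1, 2, 2, 3, 3, 4]
```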
Remark 1.1: Ordinal measurements do not have a scale. As a consequence, the underlying continuous variable is fixed to have mean zero and unit variance.
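Once the underlying variable is fixed to be standard normal, the thresholds can be recovered from the observed category proportions: τk is the standard normal quantile of the cumulative proportion of responses in categories 1 to k. The sketch below illustrates this with hypothetical category counts (they are not taken from the example data); it covers only the thresholds, not the estimation of the correlation itself.

```python
from statistics import NormalDist

# Hypothetical counts of responses in categories 1, 2, 3, 4.
counts = [30, 120, 110, 40]
n = sum(counts)

std_normal = NormalDist(mu=0.0, sigma=1.0)   # z* ~ N(0, 1) by convention

thresholds = []
cumulative = 0
for c in counts[:-1]:            # the last category has no upper threshold
    cumulative += c
    thresholds.append(std_normal.inv_cdf(cumulative / n))

for k, tau in enumerate(thresholds, start=1):
    print(f"tau_{k} = {tau:.3f}")
```

Four categories give three thresholds, and the thresholds are increasing because the cumulative proportions are.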
Example 1
Political efficacy data. The data [a] are responses to the following statements (1 = agree strongly, 2 = agree, 3 = disagree, 4 = disagree strongly):
nosay: People like me have no say in what government does
voting: Voting is the only way that people like me can have any say about how the government runs things
complex: Sometimes politics and government seem so complicated that a person like me cannot really understand what is really going on
nocare: I don’t think that public officials care much about what people like me think
touch: Generally speaking, those we elect to Congress in Washington lose touch with the people pretty quickly
interest: Parties are only interested in people’s vote but not in their opinions.
[a] Aish, A.M. and K.G. Jöreskog (1990). A panel model for political efficacy and responsiveness: An application of LISREL 7 with weighted least squares. Quality and Quantity 24, 405–426.
The data looks like (n = 312):
nosay  voting  complex  nocare  touch  interest
    2       2        1       1      1         1
    2       3        3       3      2         3
    3       2        2       3      3         3
    3       3        2       3      2         3
. . .
Polychoric correlations:
=========================================================
          nosay  voting  complex  nocare  touch  interest
---------------------------------------------------------
nosay     1.000
voting    0.295   1.000
complex   0.274   0.300    1.000
nocare    0.466   0.279    0.449   1.000
touch     0.389   0.181    0.295   0.672  1.000
interest  0.421   0.238    0.363   0.702  0.639     1.000
=========================================================
Spearman rank correlations:
=========================================================
          nosay  voting  complex  nocare  touch  interest
---------------------------------------------------------
nosay     1.000
voting    0.310   1.000
complex   0.283   0.255    1.000
nocare    0.434   0.275    0.374   1.000
touch     0.337   0.183    0.261   0.566  1.000
interest  0.364   0.247    0.325   0.623  0.537     1.000
=========================================================
Pearson product moment correlations:
=========================================================
          nosay  voting  complex  nocare  touch  interest
---------------------------------------------------------
nosay     1.000
voting    0.276   1.000
complex   0.221   0.254    1.000
nocare    0.413   0.250    0.387   1.000
touch     0.298   0.153    0.247   0.578  1.000
interest  0.353   0.216    0.320   0.620  0.550     1.000
=========================================================
More technical details related to polychoric etc correlations can be found
at: http://www.ssicentral.com/lisrel/techdocs/orfiml.pdf
Missing Data
Imputation
In multivariate data sets, list-wise deletion of cases with missing values may result in discarding a large proportion of the data.
Replacing missing values is called imputation.
SAS (PROC MI) offers several options for imputation [3]:
Mean imputation: Substitute the mean for the missing value.
Regression imputation: Substitute the predicted value for the missing value.
Matching response: Match variables with incomplete data to variables with complete data to determine the missing values.
Sophisticated methods: EM, MCMC.
[3] http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_mi_gettingstarted.htm
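The two simplest options, mean imputation and regression imputation, can be sketched in a few lines of plain Python. This is an illustration of the ideas only, not of what PROC MI does internally; the data are invented, and the regression uses a single fully observed predictor x as an assumption.

```python
from statistics import mean

# Invented data: x is fully observed, y has two missing values (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.0]

complete = [(a, b) for a, b in zip(x, y) if b is not None]

# Mean imputation: replace each missing y by the mean of the observed y.
y_obs_mean = mean(b for _, b in complete)
y_mean_imputed = [y_obs_mean if b is None else b for b in y]

# Regression imputation: fit y = intercept + slope * x by least squares
# on the complete cases, then replace each missing y by its prediction.
x_bar = mean(a for a, _ in complete)
slope = (sum((a - x_bar) * (b - y_obs_mean) for a, b in complete)
         / sum((a - x_bar) ** 2 for a, _ in complete))
intercept = y_obs_mean - slope * x_bar
y_reg_imputed = [intercept + slope * a if b is None else b
                 for a, b in zip(x, y)]

print("mean imputation:      ", [round(v, 2) for v in y_mean_imputed])
print("regression imputation:", [round(v, 2) for v in y_reg_imputed])
```

Mean imputation puts every missing case at the same value and therefore shrinks the variance of y, whereas regression imputation uses the information in x.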
Example 1.2: In the previous example there are missing values as follows:
Number of Missing Values per Variable
===============================================
nosay  voting  complex  nocare  touch  interest
-----------------------------------------------
    5       8        3       7     14        14
===============================================
Using MCMC imputation gives the polychoric correlations:
Polychoric correlation after MCMC imputation of missing observations
=========================================================
          nosay  voting  complex  nocare  touch  interest
---------------------------------------------------------
nosay     1.000
voting    0.294   1.000
complex   0.279   0.295    1.000
nocare    0.475   0.265    0.464   1.000
touch     0.374   0.176    0.310   0.681  1.000
interest  0.424   0.266    0.383   0.714  0.650     1.000
=========================================================
In this case the correlations do not change much.
Generally when estimating SEM it is highly recommended to compare the
results before and after the imputation of missing values!
Model Based Method
The specified model is the starting point.
Partition the data into subsets with the same pattern of missing observations
Estimate the relevant statistics from each subset
Estimate the model parameters by combining the subset information
In SAS SEM this approach can be utilized by selecting the ML method (also called FIML, Full Information Maximum Likelihood).
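The first step of the model-based method, partitioning the cases by their pattern of missing observations, can be sketched as follows. Only this grouping step is shown; the subsequent likelihood computation is not. The rows are invented for illustration, and the variable names are borrowed from the example only as labels.

```python
from collections import defaultdict

variables = ["nosay", "voting", "complex"]

# Invented rows; None marks a missing value.
rows = [
    (2, 2, 1),
    (2, None, 3),
    (3, 2, None),
    (3, 3, 2),
    (None, 3, 2),
    (1, None, 4),
]


def pattern(row):
    """Missingness pattern: True where the value is observed."""
    return tuple(value is not None for value in row)


# Group the cases so that each subset shares one missingness pattern.
subsets = defaultdict(list)
for row in rows:
    subsets[pattern(row)].append(row)

for pat, members in subsets.items():
    observed = [name for name, seen in zip(variables, pat) if seen]
    print(f"observed: {observed}, cases: {len(members)}")
```

Within each subset the same variables are observed, so statistics for those variables can be computed from all of its cases without discarding any of them.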
On the Sample Size for SEM
Sample Size
Typically SEM requires large samples!
What is large enough?
No single answer.
More complex model, more data [4]
Low score reliability, larger sample needed
Journals routinely reject papers if N < 200 [5]
Bottom line: Analyzing small samples with SEM is problematic.
[4] E.g. Wolf E.J., et al., 2013, Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety, Educational and Psychological Measurement 73, 913–934.
[5] Barrett P., 2007, Structural equation modeling, Personality and Individual Differences 42, 815–824.