Latent Structural Equation Modeling

Seppo Pynnonen

Department of Mathematics and Statistics, University of Vaasa, Finland

As of March 7, 2016

Part I

Introduction

Contents

1 Introduction

Measurement Models

Covariance and Correlation

Missing Data

On the Sample Size for SEM


Measurement Models

Latent and Observable Variables

Latent variables

Phenomena of theoretical interest that cannot be directly observed; they are assessed by manifest measures that are observable.

The device used in the assessment is a measurement model.


Simple Measurement Model

y = η + ε, (1)

where y is the observed value, η is the "true" value and ε is a measurement error.

Schematically this is typically depicted as a path diagram in which both η and ε point to y:

η → y ← ε

Figure: Simple measurement model


Multiple Indicators

Typically one manifest variable (indicator) is not sufficient to fully reflect the underlying unobservable construct.

If indicators y1, y2, and y3 measure the same latent construct, the above model can be generalized to the following path diagram:

η → y1 (loading λ1), y1 ← ε1
η → y2 (loading λ2), y2 ← ε2
η → y3 (loading λ3), y3 ← ε3

Figure: Measurement model with three indicators.


Mathematical representation

Mathematically:

y1 = λ1η + ε1

y2 = λ2η + ε2 (2)

y3 = λ3η + ε3

The λ-coefficients are called loadings.

The measurement errors, εi, are assumed to be independent of each other and, in particular, of η.
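Under these independence assumptions, model (2) implies cov(yi, yj) = λi λj var[η] for i ≠ j, while each variance is the sum of a part due to η and the error variance. A minimal Python sketch of the implied covariance matrix follows; the function name and all parameter values are hypothetical and chosen only for illustration.

```python
def implied_covariance(loadings, var_eta, error_vars):
    """Covariance matrix of (y1, ..., yk) implied by y_i = lambda_i * eta + eps_i,
    assuming the errors are independent of each other and of eta."""
    k = len(loadings)
    # Common part: lambda_i * lambda_j * var(eta) for every pair (i, j).
    sigma = [[loadings[i] * loadings[j] * var_eta for j in range(k)]
             for i in range(k)]
    # Error variances enter only on the diagonal.
    for i in range(k):
        sigma[i][i] += error_vars[i]
    return sigma


# Hypothetical parameter values, for illustration only.
loadings = [1.0, 0.8, 0.6]
var_eta = 1.0
error_vars = [0.5, 0.4, 0.3]

sigma = implied_covariance(loadings, var_eta, error_vars)
for row in sigma:
    print(["%.2f" % value for value in row])
```

The off-diagonal elements depend only on the loadings and var[η], which is why the covariances between indicators carry the information about the latent variable.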


Coefficient of Reliability

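In terms of model (2) the variances, $\sigma_i^2 = \mathrm{var}[y_i]$, of the observable variables become

$$\sigma_i^2 = \lambda_i^2 \sigma_\eta^2 + \sigma_{\varepsilon_i}^2, \qquad (3)$$

where $\sigma_\eta^2 = \mathrm{var}[\eta]$ and $\sigma_{\varepsilon_i}^2 = \mathrm{var}[\varepsilon_i]$.

A coefficient of reliability of $y_i$ in measuring $\eta$ is

$$\rho_i^2 = \frac{\lambda_i^2 \sigma_\eta^2}{\sigma_i^2} = 1 - \frac{\sigma_{\varepsilon_i}^2}{\sigma_i^2} \qquad (4)$$

(cf. the coefficient of determination, R-square, in regression).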
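As a small numerical illustration of equations (3) and (4), the following Python sketch computes the reliability in both equivalent forms. The function name and the parameter values are hypothetical.

```python
def reliability(loading, var_eta, var_eps):
    """Reliability of an indicator y = loading * eta + eps, following (3) and (4)."""
    var_y = loading ** 2 * var_eta + var_eps          # equation (3)
    explained_form = loading ** 2 * var_eta / var_y   # first form of (4)
    error_form = 1 - var_eps / var_y                  # second form of (4)
    return explained_form, error_form


# Hypothetical values: loading 0.8, var[eta] = 1, error variance 0.36.
rho2_a, rho2_b = reliability(loading=0.8, var_eta=1.0, var_eps=0.36)
print(rho2_a, rho2_b)
```

With these values the indicator variance is 0.64 + 0.36 = 1, so both forms give a reliability of 0.64 up to floating-point rounding.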


Structural Model

The ultimate interest in latent variable analysis usually lies in the relationships between different constructs.

A measurement model describes relationships between a construct and its measures (items, indicators).

A structural model specifies relationships between different constructs.

A proper specification of the measurement model is necessary before a meaningful empirical analysis of the structural model.


Reflective and Formative

Reflective and formative measurements.

The distinction is in the direction of the effect.

Reflective measurement

construct → measurement

y = λη + ε (5)

Formative measurement

construct ← measurement

η = γy + ζ (6)

Reflective: Dependent variable observable.

Formative: Dependent variable latent.

Figure: Measurement models (Source: Diamantopoulos et al. 2008, JBR)


Reflective model:

A change in the latent variable causes simultaneous variation in the measurements.

All measures (indicators) must be positively correlated (Bollen, 1984, Quality and Quantity, 377–385).

Typical example: ability

Formative model:

Indicators determine the latent variable.

Typical examples: Socio-economic status, quality of life, career success (for more details, Diamantopoulos, Riefler, and Roth 2008, JBR, 1203–1218).
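The difference in the direction of the effect can be illustrated with a small simulation. This is only a sketch under assumed (hypothetical) loadings, weights, and error variances: in the reflective case the indicators share a common cause and are therefore correlated, whereas formative indicators may be generated independently of each other and still determine the construct.

```python
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 5000

# Reflective: construct -> measurements, y = lambda * eta + eps (equation (5)).
eta = [rng.gauss(0, 1) for _ in range(n)]
y1 = [0.8 * e + rng.gauss(0, 0.6) for e in eta]  # hypothetical loading 0.8
y2 = [0.7 * e + rng.gauss(0, 0.7) for e in eta]  # hypothetical loading 0.7
r_reflective = pearson(y1, y2)

# Formative: measurements -> construct, eta = gamma * y + zeta (equation (6)).
# The two indicators are drawn independently here, which is an assumption
# of this sketch, not a requirement of formative models.
f1 = [rng.gauss(0, 1) for _ in range(n)]
f2 = [rng.gauss(0, 1) for _ in range(n)]
eta_f = [0.5 * a + 0.5 * b + rng.gauss(0, 0.5) for a, b in zip(f1, f2)]
r_formative = pearson(f1, f2)

print("reflective indicators:", round(r_reflective, 3))
print("formative indicators: ", round(r_formative, 3))
print("formative indicator vs construct:", round(pearson(f1, eta_f), 3))
```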


Covariance and Correlation

Modeling Relationships

Modeling of the relationships is based on dependencies between the variables. The dependencies are measured in terms of covariances or correlations.


Correlations

Table: Correlation types

===========================================================================
Correlation                     Level of measurement
---------------------------------------------------------------------------
Pearson product-moment          Interval–Interval
Spearman rank, Kendall's tau    Ordinal–Ordinal
Phi                             Nominal–Nominal
Point biserial                  Interval–Dichotomous
Rank biserial                   Ordinal–Dichotomous
Biserial                        Interval–Dichotomous,
                                with underlying continuity
Polyserial                      Interval–Ordinal,
                                with underlying continuity
Tetrachoric                     Dichotomous–Dichotomous,
                                with underlying continuity in both variables
Polychoric                      Ordinal–Ordinal,
                                with underlying continuity in both variables
===========================================================================


Pearson product-moment: the "usual" correlation.

Rank correlation (Spearman's rho) is in fact the usual correlation computed on the ranked observations (differences arise if there are ties in the ranks).

Kendall's tau: With modern computing it has become preferred over Spearman's rank correlation.

Point biserial correlation: again the same as the usual correlation when one of the variables is dichotomous (assumes values 0 and 1); no special program is needed for the calculations.

Biserial correlation: Rarely used, use polyserial instead.
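The two "same as the usual correlation" statements above can be checked numerically. The sketch below uses small made-up data without ties: it compares Spearman's rho computed as a Pearson correlation of ranks with the classic shortcut formula 1 − 6Σd²/(n(n² − 1)), and the textbook point-biserial formula with the Pearson correlation on a 0/1 variable. All function names and data are hypothetical. With ties, the shortcut formula generally no longer agrees with the correlation of average ranks.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks 1..n; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def point_biserial(x, d):
    """Textbook formula (m1 - m0) / s_x * sqrt(p * q) for a 0/1 variable d."""
    n = len(x)
    group1 = [a for a, flag in zip(x, d) if flag == 1]
    group0 = [a for a, flag in zip(x, d) if flag == 0]
    p = len(group1) / n
    mean_x = sum(x) / n
    sd_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5  # population SD
    m1, m0 = sum(group1) / len(group1), sum(group0) / len(group0)
    return (m1 - m0) / sd_x * (p * (1 - p)) ** 0.5


# Made-up data without ties, for illustration only.
x = [2.1, 3.4, 1.9, 4.2, 3.8, 2.7]
y = [10, 14, 12, 20, 15, 11]
d = [0, 1, 0, 1, 1, 0]

n = len(x)
rank_diffs = [rx - ry for rx, ry in zip(average_ranks(x), average_ranks(y))]
classic_rho = 1 - 6 * sum(diff ** 2 for diff in rank_diffs) / (n * (n ** 2 - 1))

print("Spearman via ranks:", spearman(x, y), " shortcut formula:", classic_rho)
print("Point biserial:", point_biserial(x, d), " Pearson on 0/1:", pearson(x, d))
```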


Polyserial correlation: Assumes bivariate normality between the interval variable and the underlying continuous variable of the ordinal (or dichotomous) variable.

Polychoric correlation: Assumes that both of the ordinal variables are measurements from underlying continuous normal variables. Popular in SEM (SAS FREQ with option PLCORR [2]).

Tetrachoric correlation: Polychoric correlation with dichotomous variables (SAS FREQ with option PLCORR).

[2] For an example, see UCLA: Statistical Computing Group, http://www.ats.ucla.edu/stat/sas/faq/tetrac.htm


The general idea behind polyserial and polychoric (and tetrachoric) correlations is that the ordinal variable z may be regarded as a crude measurement of the underlying unobservable continuous variable z∗.

For example, a four-point ordinal scale may be conceived as:

if z∗ ≤ τ1, z is scored 1,

if τ1 < z∗ ≤ τ2, z is scored 2,

if τ2 < z∗ ≤ τ3, z is scored 3,

if τ3 < z∗, z is scored 4,

where τ1 < τ2 < τ3 are threshold values of z∗.
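The scoring rule can be written compactly in Python. In this minimal sketch the function name and the threshold values are hypothetical; `bisect_left` counts the thresholds that lie strictly below z∗, which reproduces the "τ(k−1) < z∗ ≤ τk" convention used above.

```python
from bisect import bisect_left


def ordinal_score(z_star, thresholds):
    """Ordinal category (1-based) of the latent value z_star.

    thresholds must be sorted in increasing order (tau1 < tau2 < ...).
    A value exactly equal to a threshold falls into the lower category.
    """
    return bisect_left(thresholds, z_star) + 1


# Hypothetical thresholds tau1 < tau2 < tau3 for a four-point scale.
taus = [-1.0, 0.0, 1.0]

for z_star in [-1.5, -1.0, -0.5, 0.0, 0.5, 2.0]:
    print(z_star, "->", ordinal_score(z_star, taus))
```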


Remark 1.1: Ordinal measurements do not have a scale. As a consequence, the underlying continuous variable is fixed to have mean zero and unit variance.
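Once the underlying variable is fixed to be standard normal, thresholds can be recovered from the observed category proportions: τk is the standard normal quantile of the cumulative proportion of categories 1, ..., k. The sketch below shows this one common approach; it is not necessarily the exact procedure of any particular SEM program, and the function name and category counts are hypothetical.

```python
from statistics import NormalDist


def estimate_thresholds(category_counts):
    """Thresholds of a standard normal latent variable from category counts.

    Returns one threshold fewer than the number of categories. Assumes the
    cumulative proportions are strictly between 0 and 1, since inv_cdf
    raises an error at 0 or 1 (e.g. when the first category is empty).
    """
    total = sum(category_counts)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    thresholds = []
    cumulative = 0
    for count in category_counts[:-1]:
        cumulative += count
        thresholds.append(standard_normal.inv_cdf(cumulative / total))
    return thresholds


# Hypothetical counts for categories 1..4 of a four-point item.
counts = [50, 100, 100, 50]
taus = estimate_thresholds(counts)
print([round(t, 3) for t in taus])
```

Because the hypothetical counts are symmetric, the middle threshold is zero and the outer thresholds are mirror images of each other.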


Example 1

Political efficacy data. The data [a] are responses to the following statements (1 = agree strongly, 2 = agree, 3 = disagree, 4 = disagree strongly):

nosay: People like me have no say in what government does

voting: Voting is the only way that people like me can have any say about how the government runs things

complex: Sometimes politics and government seem so complicated that a person like me cannot really understand what is really going on

nocare: I don't think that public officials care much about what people like me think

touch: Generally speaking, those we elect to Congress in Washington lose touch with the people pretty quickly

interest: Parties are only interested in people's vote but not in their opinions.

[a] Aish, A.M. and K.G. Jöreskog (1990). A panel model for political efficacy and responsiveness: An application of LISREL 7 with weighted least squares. Quality and Quantity 24, 405–426.


The data look like (n = 312):

   nosay   voting  complex   nocare    touch interest
       2        2        1        1        1        1
       2        3        3        3        2        3
       3        2        2        3        3        3
       3        3        2        3        2        3
     . . .

Polychoric correlations:

==============================================================
            nosay   voting  complex   nocare    touch interest
--------------------------------------------------------------
nosay       1.000
voting      0.295    1.000
complex     0.274    0.300    1.000
nocare      0.466    0.279    0.449    1.000
touch       0.389    0.181    0.295    0.672    1.000
interest    0.421    0.238    0.363    0.702    0.639    1.000
==============================================================


Spearman rank correlations:

==============================================================
            nosay   voting  complex   nocare    touch interest
--------------------------------------------------------------
nosay       1.000
voting      0.310    1.000
complex     0.283    0.255    1.000
nocare      0.434    0.275    0.374    1.000
touch       0.337    0.183    0.261    0.566    1.000
interest    0.364    0.247    0.325    0.623    0.537    1.000
==============================================================

Pearson product-moment correlations:

==============================================================
            nosay   voting  complex   nocare    touch interest
--------------------------------------------------------------
nosay       1.000
voting      0.276    1.000
complex     0.221    0.254    1.000
nocare      0.413    0.250    0.387    1.000
touch       0.298    0.153    0.247    0.578    1.000
interest    0.353    0.216    0.320    0.620    0.550    1.000
==============================================================

More technical details related to polychoric and related correlations can be found at: http://www.ssicentral.com/lisrel/techdocs/orfiml.pdf
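In the tables above, most Pearson correlations of the ordinal scores are somewhat smaller than the corresponding polychoric correlations. A small simulation suggests why: coarsening correlated normal variables into a few categories tends to attenuate the product-moment correlation. This is only a sketch with a hypothetical latent correlation and hypothetical thresholds; it does not estimate a polychoric correlation and does not use the political efficacy data.

```python
import random
from bisect import bisect_left


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)       # fixed seed for reproducibility
rho = 0.7                     # hypothetical latent correlation
taus = [-1.0, 0.0, 1.0]       # hypothetical thresholds, four categories
n = 20000

# Bivariate standard normal pair with correlation rho.
z1, z2 = [], []
for _ in range(n):
    a, b = rng.gauss(0, 1), rng.gauss(0, 1)
    z1.append(a)
    z2.append(rho * a + (1 - rho ** 2) ** 0.5 * b)

# Observed four-point ordinal scores obtained by thresholding.
s1 = [bisect_left(taus, v) + 1 for v in z1]
s2 = [bisect_left(taus, v) + 1 for v in z2]

r_latent = pearson(z1, z2)
r_ordinal = pearson(s1, s2)
print("latent variables:", round(r_latent, 3))
print("ordinal scores:  ", round(r_ordinal, 3))
```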


Missing Data

Imputation

In multivariate data sets, list-wise deletion of cases with missing values may result in discarding a large proportion of the data.

Replacing missing values is called imputation.

SAS (PROC MI) offers several options for imputation [3]:

Mean imputation: Substitute the mean for the missing value

Regression imputation: Substitute the predicted value for the missing value

Matching response: Match variables with incomplete data to variables with complete data to determine missing values.

Sophisticated methods: EM, MCMC.

[3] http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_mi_gettingstarted.htm
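To make the contrast between list-wise deletion and the simplest imputation method concrete, here is a minimal Python sketch on a tiny made-up data set, with `None` marking a missing value. The function names and data are hypothetical, and this is not what PROC MI does internally; note also that a mean-imputed value need not be a valid category of an ordinal item.

```python
def listwise_delete(rows):
    """Keep only the cases (rows) that have no missing values."""
    return [row for row in rows if None not in row]


def mean_impute(rows):
    """Replace each missing value by the mean of the observed values
    of the same variable (column)."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if value is None else value
             for j, value in enumerate(row)]
            for row in rows]


# Made-up data: 4 cases, 3 variables, two missing values.
data = [
    [2, 2, 1],
    [2, None, 3],
    [3, 2, None],
    [3, 3, 2],
]

complete_cases = listwise_delete(data)
imputed = mean_impute(data)
print(len(complete_cases), "of", len(data), "cases survive list-wise deletion")
print(imputed)
```

Here two missing values out of twelve already remove half of the cases under list-wise deletion, while mean imputation keeps all four.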


Example 1.2: In the previous example there are missing values as follows:

Number of Missing Values per Variable
=======================================================
   nosay   voting  complex   nocare    touch interest
-------------------------------------------------------
       5        8        3        7       14       14
=======================================================


Using MCMC imputation gives the polychoric correlations:

Polychoric correlations after MCMC imputation of missing observations
==============================================================
            nosay   voting  complex   nocare    touch interest
--------------------------------------------------------------
nosay       1.000
voting      0.294    1.000
complex     0.279    0.295    1.000
nocare      0.475    0.265    0.464    1.000
touch       0.374    0.176    0.310    0.681    1.000
interest    0.424    0.266    0.383    0.714    0.650    1.000
==============================================================

In this case the correlations do not change much.

Generally, when estimating SEM, it is highly recommended to compare the results before and after the imputation of missing values!


Model Based Method

The specified model is taken as the starting point.

Partition the data into subsets with the same pattern of missing observations

Estimate the relevant statistics from each subset

Estimate the model parameters by combining the subset information

In SAS SEM this approach can be utilized by selecting the ML method (also called FIML, Full Information Maximum Likelihood).
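The first step, partitioning the cases by their pattern of missing observations, can be sketched in a few lines of Python. The function name and the data are hypothetical, and the sketch covers only the partitioning; the subsequent likelihood computation per pattern is not shown.

```python
from collections import defaultdict


def partition_by_pattern(rows):
    """Group cases by their missing-data pattern.

    The pattern is a tuple of booleans, True where the variable is observed
    and False where it is missing (None).
    """
    groups = defaultdict(list)
    for row in rows:
        pattern = tuple(value is not None for value in row)
        groups[pattern].append(row)
    return dict(groups)


# Made-up data: 5 cases, 3 variables, None marks a missing value.
data = [
    [2, 2, 1],
    [2, None, 3],
    [3, 2, None],
    [3, 3, 2],
    [1, None, 2],
]

subsets = partition_by_pattern(data)
for pattern, cases in subsets.items():
    print(pattern, "->", len(cases), "case(s)")
```

Each subset contributes information only on the variables it observes, which is what allows all cases to be used without imputing values.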


Sample Size

Typically SEM requires large samples!

What is large enough?

No single answer.

The more complex the model, the more data are needed [4].

The lower the score reliability, the larger the sample needed.

Journals routinely reject papers if N < 200 [5].

Bottom line: Analyzing small samples with SEM is problematic.

[4] E.g., Wolf E.J., et al., 2013, Sample size requirements for structural equation models: An evaluation of power, bias, and solution propriety, Educational and Psychological Measurement 73, 913–934.

[5] Barrett P., 2007, Structural equation modeling, Personality and Individual Differences 42, 815–824.

