otis dudley duncan memorial lecture: the resurrection of duncanism
1
Otis Dudley Duncan Memorial Lecture:
THE RESURRECTION OF DUNCANISM
Judea Pearl
University of California, Los Angeles
(www.cs.ucla.edu/~judea/)
2
OUTLINE
1. Duncanism = Causally Assertive SEM
2. History: Oppression, Distortion, and Resurrection
3. The Old-New Logic of SEM
4. New Tools
4.1 Local testing
4.2 Non-parametric identification
4.3 Logic of counterfactuals
4.4 Nonlinear mediation analysis
3
Duncan book
4
Duncan book
5
FINDING INSTRUMENTAL VARIABLES
Can you find an instrument for identifying b34? (Duncan, 1975)
By inspection: X2 d-separates X1 from V. Therefore, X1 is a valid instrument.
6
DUNCANISM = ASSERTIVE SEM
SEM – A tool for deriving causal conclusions from data and assumptions:
• [y = bx + u] may be read as “a change in x or u produces a change in y” ... or “x and u are the causes of y” (Duncan, 1975, p. 1)
• “[the disturbance] u stands for all other sources of variation in y” (ibid)
• “doing the model consists largely in thinking about what kind of model one wants and can justify.” (ibid, p. viii)
• “Assuming a model for sake of argument, we can express its properties in terms of correlations and (sometimes) find one or more conditions that must hold if the model is true” (ibid, p. 20).
7
HISTORY: BIRTH, OPPRESSION, DISTORTION, RESURRECTION
• Birth and Development (1920-1980): Sewall Wright (1921), Haavelmo (1943), Simon (1950), Marschak (1950), Koopmans (1953), Wold and Strotz (1963), Goldberger (1973), Blalock (1964), and Duncan (1969, 1975)
• The regressional assault (1970-1990): Causality is meaningless; therefore, to be meaningful, SEM must be a regression technique, not “causal modeling.” Richard (1980), Cliff (1983), Dempster (1990), Wermuth (1992), and Muthen (1987)
• The potential-outcome assault (1985-present): Causality is meaningful but, since SEM is a regression technique, it could not possibly have a causal interpretation. Rubin (2004, 2010), Holland (1986), Imbens (2009), and Sobel (1996, 2008)
8
REGRESSION VS. STRUCTURAL EQUATIONS (THE CONFUSION OF THE CENTURY)
Regression (claimless, nonfalsifiable): $Y = ax + \epsilon_Y$
Structural (empirical, falsifiable): $Y = bx + u_Y$
Claim (regardless of distributions): $E(Y \mid do(x)) = E(Y \mid do(x), do(z)) = bx$
The mothers of all questions:
Q. When would b equal a?
A. When $(u_Y \perp\!\!\!\perp X)$, read from the diagram.
Q. When is b a partial regression coefficient, $b = \beta_{YX \cdot Z}$?
A. Shown in the diagram, Slide 40.
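The gap between the regression slope a and the structural coefficient b can be checked on a toy model. The sketch below is only an illustration under stated assumptions: the coefficient values, the four equiprobable units, and the way the disturbance of Y is made to depend on X are all hypothetical and do not come from the lecture.

```python
from itertools import product

B = 2.0  # structural coefficient b (hypothetical value)
C = 0.5  # how strongly u_Y depends on the natural value of X (hypothetical)

# Four equiprobable units; each unit fixes X's natural value and a noise term e.
UNITS = list(product([-1.0, 1.0], repeat=2))

def u_y(x_natural, e, confounded):
    """Disturbance of Y; it is correlated with X only when `confounded` is True."""
    return (C * x_natural if confounded else 0.0) + e

def regression_slope(confounded):
    """Slope a of the regression of Y on X, cov(X, Y) / var(X), in observed data."""
    xs = [x for x, _ in UNITS]
    ys = [B * x + u_y(x, e, confounded) for x, e in UNITS]
    n = len(UNITS)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var = sum((x - mx) ** 2 for x in xs) / n
    return cov / var

def interventional_slope(confounded):
    """E(Y | do(x=1)) - E(Y | do(x=0)): X is set by fiat, u_Y keeps its distribution."""
    def mean_y(x_set):
        return sum(B * x_set + u_y(x, e, confounded) for x, e in UNITS) / len(UNITS)
    return mean_y(1.0) - mean_y(0.0)

print(regression_slope(False), interventional_slope(False))  # a equals b
print(regression_slope(True), interventional_slope(True))    # a differs from b
```

In this toy setup the interventional slope stays at b in both cases, while the regression slope matches b only when the disturbance of Y is independent of X.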
9
THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT)
• “I am speaking, of course, about the equation: $\{y = a + bx + \epsilon\}$. What does it mean? The only meaning I have ever determined for such an equation is that it is a shorthand way of describing the conditional distribution of {y} given {x}.” (Holland 1995)
• “The use of complicated causal-modeling software [read SEM] rarely yields any results that have any interpretation as causal effects.” (Wilkinson and Task Force on Statistical Inference, 1999, “Statistical Methods in Psychology Journals: Guidelines and Explanations”)
10
THE POTENTIAL-OUTCOME ASSAULT (1985-PRESENT) (Cont)
• “In general (even in randomized studies), the structural and causal parameters are not equal, implying that the structural parameters should not be interpreted as effect.” (Sobel 2008)
• “Using the observed outcome notation entangles the science...Bad! Yet this is exactly what regression approaches, path analyses, directed acyclic graphs, and so forth essentially compel one to do.” (Rubin 2010)
11
WHY SEM INTERPRETATION IS “SELF-CONTRADICTORY”
D. Freedman JES (1987), p. 114, Fig. 3
$X = aZ + bW + U$  (7.1)
$Y = cX + dZ + V$  (7.2)
“Now try the direct effect of Z on Y: We intervene by fixing W and X but increasing Z by one unit; this should increase Y by d units. However, this hypothetical intervention is self-contradictory, because fixing W and increasing Z causes an increase in X.”
The oversight: Fixing X DISABLES equation (7.1).
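The oversight can be made concrete by solving the two equations with and without the intervention on X. This is a minimal sketch with hypothetical integer coefficients; the function names are illustrative, not from Freedman or the lecture.

```python
A, B, C, D = 2, 3, 5, 7  # hypothetical coefficients a, b, c, d

def solve(z, w, u, v, fix_x=None):
    """Solve (7.1)-(7.2); when fix_x is given, (7.1) is replaced by X = fix_x."""
    x = A * z + B * w + u if fix_x is None else fix_x  # (7.1), disabled under intervention
    y = C * x + D * z + v                              # (7.2)
    return x, y

def direct_effect_of_z(w, u, v, x_fixed, z):
    """Change in Y when Z goes from z to z + 1 with W and X held fixed."""
    _, y0 = solve(z, w, u, v, fix_x=x_fixed)
    _, y1 = solve(z + 1, w, u, v, fix_x=x_fixed)
    return y1 - y0

def total_effect_of_z(w, u, v, z):
    """Change in Y when Z goes from z to z + 1 and X responds through (7.1)."""
    _, y0 = solve(z, w, u, v)
    _, y1 = solve(z + 1, w, u, v)
    return y1 - y0

print(direct_effect_of_z(w=1, u=0, v=0, x_fixed=4, z=0))  # d
print(total_effect_of_z(w=1, u=0, v=0, z=0))              # c*a + d
```

Once X is held fixed, equation (7.1) no longer operates, so raising Z changes Y by d alone and no contradiction arises.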
12
SEM REACTION TO FREEDMAN CRITICS
Total Surrender:
• It would be very healthy if more researchers abandoned thinking of and using terms such as cause and effect (Muthen, 1987).
• “Causal modeling” is an outdated misnomer (Kelloway, 1998).
• Causality-free, politically-safe vocabulary: “covariance structure,” “regression analysis,” or “simultaneous equations.”
• “[Causal Modeling] may be somewhat dated, however, as it seems to appear less often in the literature nowadays” (Kline, 2004, p. 9)
13
SEM REACTION TO THE STRUCTURE-PHOBIC ASSAULT
Galles, Pearl, Halpern (1998 - Logical equivalence)
Heckman-Sobel
Morgan Winship (1997)
Gelman Blog
NONE TO SPEAK OF!
14
THE RESURRECTION
Why non-parametric perspective?
What are the parameters all about and why do we labor to identify them? Can we do without them?
Consider:
$Y = \beta x + u$
$Y = f(x, u)$
Only he who lost the parameters and needs to find substitutes can begin to ask: Do I really need them? What do they really mean? What role do they play?
15
THE LOGIC OF SEM
[Flow diagram]
Inputs: A – causal assumptions; Q – queries of interest; Data (D)
CAUSAL MODEL ($M_A$)
Causal inference yields: A* – logical implications of A; Q(P) – identified estimands; $T(M_A)$ – testable implications
Statistical inference yields: $\hat{Q}$ – estimates of Q(P); $g(T)$ – goodness of fit
Outputs: conditional claims $\hat{Q}(Q \mid D, A)$; model testing
16
TRADITIONAL STATISTICAL INFERENCE PARADIGM
[Diagram: P (joint distribution) generates Data; inference from Data yields Q(P) (aspects of P)]
e.g., Infer whether customers who bought product A would also buy product B. Q = P(B | A)
17
FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
Probability and statistics deal with static relations
[Diagram: P (joint distribution) generates Data; P changes to P′ (joint distribution); inference from Data yields Q(P′) (aspects of P′)]
What happens when P changes? e.g., Infer whether customers who bought product A would still buy A if we were to double the price.
18
FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES
[Diagram: P (joint distribution) generates Data; P changes to P′ (joint distribution); inference from Data yields Q(P′) (aspects of P′)]
What remains invariant when P changes, say, to satisfy P′(price = 2) = 1?
Note: P′(v) ≠ P(v | price = 2)
P does not tell us how it ought to change.
Causal knowledge: what remains invariant
19
FROM STATISTICAL TO CAUSAL ANALYSIS: 1. THE DIFFERENCES (CONT)
1. Causal and statistical concepts do not mix.
CAUSAL: Spurious correlation; Randomization / Intervention; Confounding / Effect; Instrumental variable; Exogeneity / Ignorability; Mediation
STATISTICAL: Regression; Association / Independence; “Controlling for” / Conditioning; Odds and risk ratios; Collapsibility / Granger causality; Propensity score
20
FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS
1. Causal and statistical concepts do not mix.
CAUSAL: Spurious correlation; Randomization / Intervention; Confounding / Effect; Instrumental variable; Exogeneity / Ignorability; Mediation
STATISTICAL: Regression; Association / Independence; “Controlling for” / Conditioning; Odds and risk ratios; Collapsibility / Granger causality; Propensity score
2. No causes in – no causes out (Cartwright, 1989)
(statistical assumptions + data) + causal assumptions ⇒ causal conclusions
3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.
21
FROM STATISTICAL TO CAUSAL ANALYSIS: 2. MENTAL BARRIERS
1. Causal and statistical concepts do not mix.
CAUSAL: Spurious correlation; Randomization / Intervention; Confounding / Effect; Instrumental variable; Exogeneity / Ignorability; Mediation
STATISTICAL: Regression; Association / Independence; “Controlling for” / Conditioning; Odds and risk ratios; Collapsibility / Granger causality; Propensity score
2. No causes in – no causes out (Cartwright, 1989)
(statistical assumptions + data) + causal assumptions ⇒ causal conclusions
3. Causal assumptions cannot be expressed in the mathematical language of standard statistics.
4. Non-standard mathematics:
a) Structural equation models (Wright, 1920; Simon, 1960)
b) Counterfactuals (Neyman-Rubin ($Y_x$), Lewis (x □→ Y))
22
THE STRUCTURAL MODEL PARADIGM
[Diagram: M (data generating model) produces the joint distribution, which generates Data; inference from Data yields Q(M) (aspects of M)]
M – Invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to variables in the analysis.
• “Think Nature, not experiment!”
23
STRUCTURAL CAUSAL MODELS
Definition: A structural causal model is a 4-tuple $\langle V, U, F, P(u) \rangle$, where
• V = {V1,...,Vn} are endogenous variables
• U = {U1,...,Um} are background variables
• F = {f1,..., fn} are functions determining V, $v_i = f_i(v, u)$, e.g., $y = \alpha + \beta x + u_Y$
• P(u) is a distribution over U
P(u) and F induce a distribution P(v) over observable variables
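The definition maps directly onto a small data structure. The sketch below is one possible encoding, assuming a hypothetical binary model with V = {X, Y} and U = {U_X, U_Y}; the functions and probabilities are made up for illustration. It shows how P(u) and F together induce P(v).

```python
from fractions import Fraction as Fr
from itertools import product

# P(u): hypothetical independent background variables, P(U_X=1)=1/2, P(U_Y=1)=1/4.
P_U = {(ux, uy): Fr(1, 2) * (Fr(1, 4) if uy else Fr(3, 4))
       for ux, uy in product((0, 1), repeat=2)}

# F: one function per endogenous variable, v_i = f_i(v, u).
F = {
    "X": lambda v, u: u[0],
    "Y": lambda v, u: int(v["X"] == 1 or u[1] == 1),
}
ORDER = ["X", "Y"]  # evaluation order for this recursive (acyclic) model

def solve(u):
    """Solve the equations for a given unit u and return the values of V."""
    v = {}
    for name in ORDER:
        v[name] = F[name](v, u)
    return v

def induced_distribution():
    """P(v): push P(u) through the functions F."""
    p_v = {}
    for u, p in P_U.items():
        v = solve(u)
        key = (v["X"], v["Y"])
        p_v[key] = p_v.get(key, 0) + p
    return p_v

print(induced_distribution())
```

Keeping F as explicit functions, rather than storing only P(v), is what later allows the equations to be modified to represent interventions.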
24
CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence “Y would be y (in unit u), had X been x,” denoted $Y_x(u) = y$, means: The solution for Y in a mutilated model $M_x$ (i.e., the equations for X replaced by X = x) with input U = u is equal to y.
[Diagram: in model M, U determines X(u) and Y(u); in the mutilated model $M_x$, X is set to x and U determines $Y_x(u)$]
25
CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence “Y would be y (in unit u), had X been x,” denoted $Y_x(u) = y$, means: The solution for Y in a mutilated model $M_x$ (i.e., the equations for X replaced by X = x) with input U = u is equal to y.
The Fundamental Equation of Counterfactuals:
$Y_x(u) = Y_{M_x}(u)$
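The fundamental equation can be read as a recipe: replace the equation for X, then solve for Y with the same u. The sketch below applies it to a hypothetical two-variable binary model; the functional forms are assumptions chosen only for illustration.

```python
from itertools import product

def solve(u, do=None):
    """Solve model M for unit u, or the mutilated model M_x when do={'X': x}."""
    do = do or {}
    u_x, u_y = u
    x = do["X"] if "X" in do else u_x   # equation for X, replaced by X = x under do
    y = int(x == 1 or u_y == 1)         # equation for Y is left untouched
    return {"X": x, "Y": y}

def counterfactual_y(x, u):
    """Y_x(u): the solution for Y in M_x with input U = u."""
    return solve(u, do={"X": x})["Y"]

u = (1, 0)                      # a unit with U_X = 1, U_Y = 0
print(solve(u))                 # what actually happens to this unit
print(counterfactual_y(0, u))   # what Y would be had X been 0

# Consistency follows from the construction: if X(u) = x, then Y_x(u) = Y(u).
for unit in product((0, 1), repeat=2):
    actual = solve(unit)
    assert counterfactual_y(actual["X"], unit) == actual["Y"]
```

The consistency check at the end holds for every unit because setting X to the value it already takes leaves the solution unchanged.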
26
CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence “Y would be y (in situation u), had X been x,” denoted $Y_x(u) = y$, means: The solution for Y in a mutilated model $M_x$ (i.e., the equations for X replaced by X = x) with input U = u is equal to y.
• Joint probabilities of counterfactuals:
$P(Y_x = y, Z_w = z) = \sum_{u:\, Y_x(u) = y,\, Z_w(u) = z} P(u)$
In particular:
$P(y \mid do(x)) \triangleq P(Y_x = y) = \sum_{u:\, Y_x(u) = y} P(u)$
$PN(Y_{x'} = y' \mid x, y) = \sum_{u:\, Y_{x'}(u) = y'} P(u \mid x, y)$
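Both sums can be evaluated by enumerating units. The following sketch assumes a hypothetical binary model (X = U_X, Y = X or U_Y) with made-up probabilities for U; it is meant only to show where P(u) and P(u | x, y) enter.

```python
from fractions import Fraction as Fr
from itertools import product

P_UX1, P_UY1 = Fr(1, 2), Fr(1, 4)   # hypothetical P(U_X = 1), P(U_Y = 1)
P_U = {(ux, uy): (P_UX1 if ux else 1 - P_UX1) * (P_UY1 if uy else 1 - P_UY1)
       for ux, uy in product((0, 1), repeat=2)}

def solve(u, do_x=None):
    """Return (X, Y) for unit u in M, or in the mutilated model M_x when do_x is given."""
    u_x, u_y = u
    x = u_x if do_x is None else do_x
    y = int(x == 1 or u_y == 1)
    return x, y

def p_do(y, x):
    """P(y | do(x)) = P(Y_x = y): total P(u) of the units with Y_x(u) = y."""
    return sum(p for u, p in P_U.items() if solve(u, do_x=x)[1] == y)

def p_counterfactual_given_evidence(y_cf, x_cf, x_obs, y_obs):
    """P(Y_{x'} = y' | x, y): the same sum with P(u) replaced by P(u | x, y)."""
    evidence = {u: p for u, p in P_U.items() if solve(u) == (x_obs, y_obs)}
    total = sum(evidence.values())
    return sum(p for u, p in evidence.items() if solve(u, do_x=x_cf)[1] == y_cf) / total

print(p_do(1, 0))                                    # P(Y_0 = 1)
print(p_counterfactual_given_evidence(0, 0, 1, 1))   # P(Y_0 = 0 | X = 1, Y = 1)
```

The second quantity conditions on what was observed before asking the counterfactual question, which is why it needs the model and not just P(y | do(x)).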
27
3-LEVEL HIERARCHY OF CAUSAL MODELS
1. Probabilistic Knowledge P(y | x): Bayesian networks, graphical models
2. Interventional Knowledge P(y | do(x)): Causal Bayesian Networks (CBN) (Agnostic graphs, manipulation graphs)
3. Counterfactual Knowledge $P(Y_x = y, Y_{x'} = y')$: Structural equation models, physics, functional graphs, “Treatment assignment mechanism”
28
TWO PARADIGMS FOR CAUSAL INFERENCE
Observed: P(X, Y, Z,...)
Conclusions needed: $P(Y_x = y)$, $P(X_y = x \mid Z = z)$, ...
How do we connect observables, X, Y, Z,… to counterfactuals $Y_x, X_z, Z_y$,… ?
N-R model: Counterfactuals are primitives, new variables. Super-distribution $P^*(X, Y, \ldots, Y_x, X_z, \ldots)$; the observables X, Y, Z constrain $Y_x, Z_y, \ldots$
Structural model: Counterfactuals are derived quantities. Subscripts modify the model and distribution: $P(Y_x = y) = P_{M_x}(Y = y)$
29
ARE THE TWO PARADIGMS EQUIVALENT?
• Yes (Galles and Pearl, 1998; Halpern 1998)
• In the N-R paradigm, $Y_x$ is defined by consistency: $Y = xY_1 + (1 - x)Y_0$
• In SCM, consistency is a theorem.
• Moreover, a theorem in one approach is a theorem in the other.
30
THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS
Define: Express the target quantity Q as a function Q(M) that can be computed from any model M.
Assume: Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
Test: Test the testable implications of A (if any).
31
THE FIVE NECESSARY STEPS FOR EFFECT ESTIMATION
Define: Express the target quantity Q as a function Q(M) that can be computed from any model M, e.g., $ATE \triangleq E(Y \mid do(x_1)) - E(Y \mid do(x_0))$
Assume: Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
Test: Test the testable implications of A (if any).
32
FORMULATING ASSUMPTIONS: THREE LANGUAGES
1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)
2. Counterfactuals:
$Z_x(u) = Z_{yx}(u)$,
$X_y(u) = X_{zy}(u) = X_z(u) = X(u)$,
$Y_z(u) = Y_{zx}(u)$,
$Z_x \perp\!\!\!\perp \{Y_z, X\}$
Not too friendly: consistent? complete? redundant? arguable?
3. Structural:
[Graph: X → Z → Y, with unobserved U affecting X and Y]
33
IDENTIFYING CAUSAL EFFECTS IN POTENTIAL-OUTCOME FRAMEWORK
Define: Express the target quantity Q as a counterfactual formula, e.g., E(Y(1) – Y(0))
Assume: Formulate causal assumptions using the distribution: $P^* = P(X \mid Y, Z, Y(1), Y(0))$
Identify: Determine if Q is identifiable using P* and Y = x Y(1) + (1 – x) Y(0).
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
34
GRAPHICAL – COUNTERFACTUALS SYMBIOSIS
Every causal graph expresses counterfactual assumptions, e.g., X → Y → Z
1. Missing arrows (Y ← Z): $Y_{x,z}(u) = Y_x(u)$
2. Missing arcs (Y ↔ Z): $Y_x \perp\!\!\!\perp Z_y$
consistent, and readable from the graph.
• Express assumptions in graphs
• Derive estimands by graphical or algebraic methods
35
IDENTIFICATION IN SCM
Find the effect of X on Y, P(y | do(x)), given the causal assumptions shown in G, where Z1,..., Zk are auxiliary variables.
[Graph G over X, Y, Z1, Z2, Z3, Z4, Z5, Z6]
Can P(y | do(x)) be estimated if only a subset, Z, can be measured?
36
ELIMINATING CONFOUNDING BIAS: THE BACK-DOOR CRITERION
P(y | do(x)) is estimable if there is a set Z of variables such that Z d-separates X from Y in $G_{\underline{x}}$.
[Graphs: G and $G_{\underline{x}}$, each over X, Y, Z1, ..., Z6, with the set Z marked]
• Moreover, (“adjusting” for Z; ignorability)
$P(y \mid do(x)) = \sum_z P(y \mid x, z) P(z) = \sum_z \frac{P(x, y, z)}{P(x \mid z)}$
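The two forms of the adjustment formula can be compared numerically. The sketch below assumes a hypothetical three-variable binary model in which a single covariate Z satisfies the back-door condition (Z → X, Z → Y, X → Y); all probabilities are invented for illustration.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical conditional probabilities for Z -> X, Z -> Y, X -> Y.
P_Z1 = Fr(1, 2)
P_X1_GIVEN_Z = {0: Fr(1, 4), 1: Fr(3, 4)}
P_Y1_GIVEN_XZ = {(0, 0): Fr(1, 10), (0, 1): Fr(5, 10),
                 (1, 0): Fr(3, 10), (1, 1): Fr(7, 10)}

def bern(p1, value):
    """Probability that a binary variable with P(1) = p1 takes `value`."""
    return p1 if value == 1 else 1 - p1

# Observed joint distribution P(x, y, z).
JOINT = {(x, y, z): bern(P_Z1, z) * bern(P_X1_GIVEN_Z[z], x) * bern(P_Y1_GIVEN_XZ[(x, z)], y)
         for x, y, z in product((0, 1), repeat=3)}

def prob(**fixed):
    """Marginal probability of an event given as keyword arguments x=, y=, z=."""
    return sum(p for (x, y, z), p in JOINT.items()
               if all({"x": x, "y": y, "z": z}[k] == v for k, v in fixed.items()))

def backdoor(y, x):
    """sum_z P(y | x, z) P(z)."""
    return sum(prob(x=x, y=y, z=z) / prob(x=x, z=z) * prob(z=z) for z in (0, 1))

def backdoor_weighted(y, x):
    """sum_z P(x, y, z) / P(x | z), with P(x | z) = P(x, z) / P(z)."""
    return sum(prob(x=x, y=y, z=z) / (prob(x=x, z=z) / prob(z=z)) for z in (0, 1))

def naive(y, x):
    """P(y | x), which mixes the causal effect with confounding through Z."""
    return prob(x=x, y=y) / prob(x=x)

print(backdoor(1, 1), backdoor_weighted(1, 1), naive(1, 1))
```

In this example the two adjusted forms agree with each other and differ from the unadjusted conditional probability, which is the bias that adjustment removes.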
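37
IDENTIFYING TESTABLE IMPLICATIONS
Assumptions advertised in the missing edges are Z1 – Z2, Z1 – Y
[Graph over X, Y, Z1, Z2, Z3, W1, W2, W3]
Implying:
$Z_1 \perp\!\!\!\perp Z_2$
$Z_1 \perp\!\!\!\perp Y \mid \{X, Z_2, Z_3\}$
$Z_2 \perp\!\!\!\perp X \mid \{Z_1, Z_3\}$
$Z_1 = aZ_2 + \epsilon$
$Z_1 = b_1 Y + b_2 X + b_3 Z_2 + b_4 Z_3 + \epsilon'$
$Z_2 = c_1 X + c_3 Z_1 + c_4 Z_3 + \epsilon''$
The missing edges imply: a = 0, b1 = 0, and c1 = 0.
Software routines for automatic detection of all such tests reported in Kyono (2010)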
38
SEPARATION EQUIVALENCE ⇒ MODEL EQUIVALENCE
39
FINDING INSTRUMENTAL VARIABLES
Can you find an instrument for identifying b34? (Duncan, 1975)
By inspection: X2 d-separates X1 from V. Therefore, X1 is a valid instrument.
40
CONFOUNDING EQUIVALENCE: WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
[Graph over X, Y, V1, V2, W1, W2, W3, W4, with sets Z and T and a node L marked]
Z ≈ T?
41
CONFOUNDING EQUIVALENCE: WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
Definition: T and Z are c-equivalent if
$\sum_t P(y \mid x, t) P(t) = \sum_z P(y \mid x, z) P(z) \quad \forall x, y$
Definition (Markov boundary): The Markov boundary $S_m$ of S (relative to X) is the minimal subset of S that d-separates X from all other members of S.
Theorem (Pearl and Paz, 2009): Z and T are c-equivalent iff
1. $Z_m = T_m$, or
2. Z and T are admissible (i.e., satisfy the back-door condition)
42
CONFOUNDING EQUIVALENCE: WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
[Graph over X, Y, V1, V2, W1, W2, W3, W4, with sets Z and T and a node L marked]
43
CONFOUNDING EQUIVALENCE: WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
[Graph over X, Y, V1, V2, W1, W2, W3, W4, with sets Z and T marked]
44
CONFOUNDING EQUIVALENCE: WHEN TWO MEASUREMENTS ARE EQUALLY VALUABLE
[Graph over X, Y, V1, V2, W1, W2, W3, W4, with sets Z and T marked]
45
BIAS AMPLIFICATION BY INSTRUMENTAL VARIABLES
[Graph over X, Y, U, W1, W2; adjustment sets compared: W1 and {W1, W2}]
• Adding W2 to Propensity Score increases bias (if such exists) (Wooldridge, 2009)
• In linear systems – always
• In non-linear systems – almost always (Pearl, 2010)
• Outcome predictors are safer than treatment predictors
46
EFFECT DECOMPOSITION (direct vs. indirect effects)
1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect effects?
4. When can direct and indirect effects be estimated consistently from experimental and nonexperimental data?
47
WHY DECOMPOSE EFFECTS?
1. To understand how Nature works
2. To comply with legal requirements
3. To predict the effects of new types of interventions:
Signal routing, rather than variable fixing
48
LEGAL IMPLICATIONS OF DIRECT EFFECT
[Graph: X (Gender), Z (Qualifications), Y (Hiring)]
Can data prove an employer guilty of hiring discrimination?
What is the direct effect of X on Y?
$E(Y \mid do(x_1), do(z)) - E(Y \mid do(x_0), do(z))$ (averaged over z)
Adjust for Z? No! No!
49
NATURAL INTERPRETATION OF AVERAGE DIRECT EFFECTS
[Graph over X, Z, Y, with z = f(x, u), y = g(x, z, u)]
Robins and Greenland (1992) – “Pure”
Natural Direct Effect of X on Y, $DE(x_0, x_1; Y)$: The expected change in Y, when we change X from x0 to x1 and, for each u, we keep Z constant at whatever value it attained before the change.
$E[Y_{x_1 Z_{x_0}} - Y_{x_0}]$
In linear models, DE = Controlled Direct Effect $= \beta(x_1 - x_0)$
50
DEFINITION OF INDIRECT EFFECTS
[Graph over X, Z, Y, with z = f(x, u), y = g(x, z, u)]
Indirect Effect of X on Y, $IE(x_0, x_1; Y)$: The expected change in Y when we keep X constant, say at x0, and let Z change to whatever value it would have attained had X changed to x1.
$E[Y_{x_0 Z_{x_1}} - Y_{x_0}]$
In linear models, IE = TE - DE
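51
WHY TE ≠ DE + IE
[Graph: X → Z with coefficient m1, Z → Y with coefficient m2, and a direct path X → Y]
In linear systems:
$TE = DE + m_1 m_2$, where DE is the coefficient on the direct path X → Y
$IE = m_1 m_2 = TE - DE$
$TE = DE - IE(rev)$
$IE(rev) \neq -IE$
TE - DE = Effect prevented by disabling mediation (disabling mediation)
IE = Effect sustained by mediation alone (disabling direct path)
TE - DE is NOT equal to IE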
52
POLICY IMPLICATIONS OF INDIRECT EFFECTS
What is the indirect effect of X on Y?
The effect of Gender on Hiring if sex discrimination is eliminated.
[Graph: X (GENDER), Z (QUALIFICATION), Y (HIRING), with mechanism f; the direct link from X to Y is marked IGNORE]
Deactivating a link – a new type of intervention
53
MEDIATION FORMULAS
1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),
2. And are given by:
$DE = \sum_z [E(Y \mid do(x_1, z)) - E(Y \mid do(x_0, z))] P(z \mid do(x_0))$
$IE = \sum_z E(Y \mid do(x_0, z)) [P(z \mid do(x_1)) - P(z \mid do(x_0))]$
3. Applicable to linear and non-linear models, continuous and discrete variables, regardless of distributional form.
54
MEDIATION FORMULAS IN UNCONFOUNDED MODELS
[Graph over X, Z, Y]
$DE = \sum_z [E(Y \mid x_1, z) - E(Y \mid x_0, z)] P(z \mid x_0)$
$IE = \sum_z E(Y \mid x_0, z) [P(z \mid x_1) - P(z \mid x_0)]$
$TE = E(Y \mid x_1) - E(Y \mid x_0)$
IE = Fraction of responses explained by mediation
TE - DE = Fraction of responses owed to mediation
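55
COMPUTING THE MEDIATION FORMULA
[Graph over X, Z, Y]
n    X  Z  Y
n1   0  0  0
n2   0  0  1
n3   0  1  0
n4   0  1  1
n5   1  0  0
n6   1  0  1
n7   1  1  0
n8   1  1  1
$E(Y \mid x, z) = g_{xz}$:
$g_{00} = \frac{n_2}{n_1 + n_2}$, $g_{01} = \frac{n_4}{n_3 + n_4}$, $g_{10} = \frac{n_6}{n_5 + n_6}$, $g_{11} = \frac{n_8}{n_7 + n_8}$
$E(Z \mid x) = h_x$:
$h_0 = \frac{n_3 + n_4}{n_1 + n_2 + n_3 + n_4}$, $h_1 = \frac{n_7 + n_8}{n_5 + n_6 + n_7 + n_8}$
$DE = (g_{10} - g_{00})(1 - h_0) + (g_{11} - g_{01}) h_0$
$IE = (h_1 - h_0)(g_{01} - g_{00})$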
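The table-to-effects computation is short enough to write out. The sketch below follows the cell counts n1 through n8 in the order of the table above (X, Z, Y from 000 to 111); the function name and the example counts are hypothetical, and TE is computed as E(Y | x1) - E(Y | x0) for comparison.

```python
from fractions import Fraction as Fr

def mediation_from_counts(counts):
    """counts = [n1, ..., n8] for (X, Z, Y) = 000, 001, 010, 011, 100, 101, 110, 111."""
    n1, n2, n3, n4, n5, n6, n7, n8 = (Fr(k) for k in counts)
    # g[(x, z)] = E(Y | x, z): share of Y = 1 within each (x, z) cell pair.
    g = {(0, 0): n2 / (n1 + n2), (0, 1): n4 / (n3 + n4),
         (1, 0): n6 / (n5 + n6), (1, 1): n8 / (n7 + n8)}
    # h[x] = E(Z | x): share of Z = 1 within each level of X.
    h = {0: (n3 + n4) / (n1 + n2 + n3 + n4),
         1: (n7 + n8) / (n5 + n6 + n7 + n8)}
    de = (g[(1, 0)] - g[(0, 0)]) * (1 - h[0]) + (g[(1, 1)] - g[(0, 1)]) * h[0]
    ie = (h[1] - h[0]) * (g[(0, 1)] - g[(0, 0)])
    te = (g[(1, 0)] * (1 - h[1]) + g[(1, 1)] * h[1]) \
        - (g[(0, 0)] * (1 - h[0]) + g[(0, 1)] * h[0])
    return de, ie, te

# Hypothetical sample of 200 units.
de, ie, te = mediation_from_counts([48, 12, 28, 12, 15, 10, 15, 60])
print(de, ie, te, de + ie == te)
```

With these made-up counts the model has an X-by-Z interaction, so DE + IE falls short of TE, which illustrates why the two effects are reported separately.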
56
RAMIFICATION OF THE MEDIATION FORMULA
• DE should be averaged over mediator levels; IE should NOT be averaged over exposure levels.
• TE-DE need not equal IE
TE-DE = proportion for whom mediation is necessary
IE = proportion for whom mediation is sufficient
• TE-DE informs interventions on indirect pathways; IE informs intervention on direct pathways.
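57
MEASUREMENT BIAS AND EFFECT RESTORATION
[Graph over X, Y, unobserved Z, and its measurement W, linked to Z through P(w | z)]
P(y | do(x)) is identifiable from measurement of W, if P(w | z) is given (Selen, 1986; Greenland & Lash, 2008)
$P(w \mid x, y, z) = P(w \mid z)$ (local independence)
$P(x, y, w) = \sum_z P(x, y, z, w) = \sum_z P(w \mid x, y, z) P(x, y, z) = \sum_z P(w \mid z) P(x, y, z)$
$P(x, y, z) = \sum_w I(z, w) P(x, y, w)$, where I(z, w) is the inverse of the matrix P(w | z)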
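For a binary Z and W the restoration step is a 2-by-2 matrix inversion applied to each (x, y) cell. The sketch below assumes hypothetical error rates for P(w | z) and a made-up cell of the true distribution; the helper names are illustrative.

```python
from fractions import Fraction as Fr

# M[w][z] = P(W = w | Z = z), assumed known (hypothetical error rates).
M = [[Fr(9, 10), Fr(2, 10)],
     [Fr(1, 10), Fr(8, 10)]]

def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def distort(p_xyz):
    """P(x, y, w) = sum_z P(w | z) P(x, y, z) for one fixed (x, y) cell."""
    return [sum(M[w][z] * p_xyz[z] for z in (0, 1)) for w in (0, 1)]

def restore(p_xyw):
    """P(x, y, z) = sum_w I(z, w) P(x, y, w), with I the inverse of M."""
    inv = invert_2x2(M)
    return [sum(inv[z][w] * p_xyw[w] for w in (0, 1)) for z in (0, 1)]

true_cell = [Fr(3, 20), Fr(1, 10)]   # hypothetical P(x, y, Z=0), P(x, y, Z=1)
observed_cell = distort(true_cell)   # what the noisy measurement W reveals
print(observed_cell, restore(observed_cell))
```

Once every (x, y) cell has been restored in this way, P(x, y, z) can be fed into whatever adjustment formula would have been used had Z been observed directly.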
58
EFFECT RESTORATION IN BINARY MODELS
[Graph over W, Z, X, Y]
Error probabilities: P(W = 1 | Z = 0) and P(W = 0 | Z = 1)
[Equation: P(y | do(x)) expressed through the observed P(x, y, w), P(w | x, y), P(x), P(w) and the two error probabilities]
Weight distribution from cell (x, y, W = 1): to cell (x, y, Z = 0) and to cell (x, y, Z = 1)
[Plot: $P(z_1 \mid x, y)$ against $P(w_1 \mid x, y)$, with undefined regions at both ends]
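59
EFFECT RESTORATION IN LINEAR MODELS
(a) [Graph over Z, X, Y and proxy W, with path coefficients c0, c1, c2, c3]
$c_0 = \frac{\sigma_{xy} - \sigma_{xw}\sigma_{yw}/k}{\sigma_{xx} - \sigma_{xw}^2/k}, \quad k = c_3^2 \sigma_{zz}$
(b) [Graph over Z, X, Y and proxies W and V, with path coefficients c0, c1, c2, c3, c4]
$c_0 = \frac{cov(XY)\,cov(WV) - cov(YW)\,cov(XV)}{var(X)\,cov(WV) - cov(XW)\,cov(XV)}$
Correlated proxies (Cai & Kuroki, 2008)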
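A covariance-ratio estimand of this kind can be sanity-checked against a fully specified linear model. The sketch below makes several assumptions that are not stated on the slide: the structure X = c1 Z + e_x, Y = c0 X + c2 Z + e_y, W = c3 Z + e_w, V = c4 Z + e_v with independent errors, the roles of the coefficients, and all numeric values. It computes the implied covariances exactly and then applies a one-proxy and a two-proxy ratio.

```python
from fractions import Fraction as Fr

def covariances(c0, c1, c2, c3, c4, var_z, var_ex):
    """Exact covariances implied by the assumed linear model:
    X = c1*Z + e_x, Y = c0*X + c2*Z + e_y, W = c3*Z + e_w, V = c4*Z + e_v,
    with Z and all error terms mutually independent."""
    var_x = c1 * c1 * var_z + var_ex
    cov_xw = c1 * c3 * var_z
    return {
        "xx": var_x,
        "xy": c0 * var_x + c1 * c2 * var_z,
        "xw": cov_xw,
        "xv": c1 * c4 * var_z,
        "wv": c3 * c4 * var_z,
        "yw": c0 * cov_xw + c2 * c3 * var_z,
    }

def restore_two_proxies(s):
    """Ratio using two proxies W and V of the unobserved Z."""
    return (s["xy"] * s["wv"] - s["yw"] * s["xv"]) / (s["xx"] * s["wv"] - s["xw"] * s["xv"])

def restore_one_proxy(s, k):
    """Ratio using one proxy W and a known k = c3^2 * var(Z)."""
    return (s["xy"] - s["xw"] * s["yw"] / k) / (s["xx"] - s["xw"] ** 2 / k)

# Hypothetical parameter values; c0 is the effect of X on Y to be recovered.
c0, c1, c2, c3, c4 = Fr(2), Fr(1), Fr(3), Fr(1, 2), Fr(2)
s = covariances(c0, c1, c2, c3, c4, var_z=Fr(1), var_ex=Fr(1))
print(s["xy"] / s["xx"])                    # naive regression slope, biased
print(restore_two_proxies(s))               # recovers c0
print(restore_one_proxy(s, k=c3 * c3 * 1))  # recovers c0
```

Under these assumptions both ratios return c0 exactly while the naive slope does not; with real data the covariances would be sample estimates and the recovery only approximate.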
60
CONCLUSIONS
I TOLD YOU CAUSALITY IS SIMPLE
• Formal basis for causal and counterfactual inference (complete)
• Unification of the graphical, potential-outcome and structural equation approaches
• Friendly and formal solutions to century-old problems and confusions.
He is wise who bases causal inference on an explicit causal structure that is defensible on scientific grounds. (Aristotle 384-322 B.C.) From Charlie Poole
61
QUESTIONS???
They will be answered