![Page 1: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/1.jpg)
Single World Intervention Graphs (SWIGs): Unifying the Counterfactual and Graphical Approaches to Causality
Thomas Richardson
Department of Statistics
University of Washington
Joint work with James Robins (Harvard School of Public Health)
Therme Vals Causal Workshop
5 Aug 2013
Outline
Brief review of counterfactuals
A new unification of graphs and counterfactuals via node-splitting
- Factorization and Modularity
- Contrast with Twin Network approach
- Some Examples and Extensions
- Sequentially Randomized Experiments / Time Dependent Confounding
- Dynamic Regimes
Experimental Testability and Independence of Errors in NPSEMs
Thomas Richardson Therme Vals Workshop Slide 1
Counterfactuals, aka Potential Outcomes
The potential outcomes framework: philosophy
Hume (1748) An Enquiry Concerning Human Understanding:
We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second, . . .
. . . where, if the first object had not been, the second never had existed.
The potential outcomes framework: crop trials
Jerzy Neyman (1923): To compare v varieties [on m plots] we will consider numbers:

$$\begin{matrix} U_{11} & \cdots & U_{1m} \\ \vdots & & \vdots \\ U_{v1} & \cdots & U_{vm} \end{matrix}$$

Here U_ij is the crop yield that would be observed if variety i were planted in plot j.
Physical constraints only allow one variety to be planted in a given plot in any given growing season.
Popularized by Rubin (1974); sometimes called the ‘Rubin causal model’.
Potential outcomes with binary treatment
For binary treatment X and response Y, we define two potential outcome variables:
Y(x = 0): the value of Y that would be observed for a given unit if assigned X = 0;
Y(x = 1): the value of Y that would be observed for a given unit if assigned X = 1;
We will also write these as Y(x0) and Y(x1).
Implicit here is the assumption that these outcomes are well-defined. Specifically:
- Only one version of treatment X = x
- No interference between units (SUTVA).
Potential Outcomes
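| Unit | Potential outcome Y(x = 0) | Potential outcome Y(x = 1) | Observed X | Observed Y |
|---|---|---|---|---|
| 1 | 0 | 1 | | |
| 2 | 0 | 1 | | |
| 3 | 0 | 0 | | |
| 4 | 1 | 1 | | |
| 5 | 1 | 0 | | |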
Drug Response ‘Types’:
In the simplest case where Y is a binary outcome we have the following 4 types:

| Y(x0) | Y(x1) | Name |
|---|---|---|
| 0 | 0 | Never Recover |
| 0 | 1 | Helped |
| 1 | 0 | Hurt |
| 1 | 1 | Always Recover |
Assignment to Treatments
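| Unit | Potential outcome Y(x = 0) | Potential outcome Y(x = 1) | Observed X | Observed Y |
|---|---|---|---|---|
| 1 | 0 | 1 | 1 | |
| 2 | 0 | 1 | 0 | |
| 3 | 0 | 0 | 1 | |
| 4 | 1 | 1 | 1 | |
| 5 | 1 | 0 | 0 | |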
Observed Outcomes from Potential Outcomes
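| Unit | Potential outcome Y(x = 0) | Potential outcome Y(x = 1) | Observed X | Observed Y |
|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 1 |
| 2 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 1 | 0 |
| 4 | 1 | 1 | 1 | 1 |
| 5 | 1 | 0 | 0 | 1 |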
Potential Outcomes and Missing Data
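| Unit | Potential outcome Y(x = 0) | Potential outcome Y(x = 1) | Observed X | Observed Y |
|---|---|---|---|---|
| 1 | ? | 1 | 1 | 1 |
| 2 | 0 | ? | 0 | 0 |
| 3 | ? | 0 | 1 | 0 |
| 4 | ? | 1 | 1 | 1 |
| 5 | 1 | ? | 0 | 1 |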
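The step from potential outcomes to observed (and missing) data in the tables above can be sketched in a few lines of Python. This is only an illustration using the five units from the slides; the helper names `observe` and `mask` are made up for this sketch.

```python
# Five illustrative units from the slides: (Y(x=0), Y(x=1), X).
units = [(0, 1, 1), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 0, 0)]


def observe(y0, y1, x):
    """Consistency: the observed Y is the potential outcome for the assigned X."""
    return y1 if x == 1 else y0


def mask(y0, y1, x):
    """Hide the potential outcome that was not realized (the '?' entries)."""
    return (None, y1) if x == 1 else (y0, None)


observed_y = [observe(*u) for u in units]
masked = [mask(*u) for u in units]

print(observed_y)  # [1, 0, 0, 1, 1]
print(masked)      # [(None, 1), (0, None), (None, 0), (None, 1), (1, None)]
```

Each unit reveals exactly one of its two potential outcomes, which is the sense in which causal inference becomes a missing-data problem.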
Average Causal Effect (ACE) of X on Y
ACE(X→ Y) ≡ E[Y(x1) − Y(x0)]
= p(Helped) − p(Hurt) ∈ [−1, 1]
Thus ACE(X → Y) is the difference in % recovery if everyone is treated (X = 1) vs. if no one is treated (X = 0).
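As a small check of the identity ACE = p(Helped) − p(Hurt), the sketch below evaluates both sides on the five illustrative units from the earlier tables. It assumes the full potential-outcome table is known, which is never the case in practice.

```python
from fractions import Fraction

# (Y(x0), Y(x1)) for the five illustrative units from the slides.
potential = [(0, 1), (0, 1), (0, 0), (1, 1), (1, 0)]
n = len(potential)

# Left-hand side: the average of the unit-level differences Y(x1) - Y(x0).
ace = Fraction(sum(y1 - y0 for y0, y1 in potential), n)

# Right-hand side: proportion of 'Helped' minus proportion of 'Hurt'.
p_helped = Fraction(sum(1 for t in potential if t == (0, 1)), n)
p_hurt = Fraction(sum(1 for t in potential if t == (1, 0)), n)

print(ace, p_helped - p_hurt)  # 1/5 1/5
```

The 'Never Recover' and 'Always Recover' types contribute zero to the average, which is why only the two remaining types appear.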
Identification of the ACE under randomization
If X is assigned randomly then
X ⊥⊥ Y(x0) and X ⊥⊥ Y(x1) (1)
hence
E[Y(x1) − Y(x0)] = E[Y(x1)] − E[Y(x0)]
= E[Y(x1) | X = 1] − E[Y(x0) | X = 0]
= E[Y | X = 1] − E[Y | X = 0].
Thus if (1) holds then ACE(X→ Y) is identified from P(X, Y).
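The derivation can be checked exactly on a toy population. In the sketch below the type probabilities and the two assignment mechanisms are invented for illustration; `assoc_difference` computes E[Y | X = 1] − E[Y | X = 0] from the implied joint distribution.

```python
from fractions import Fraction as F

# Hypothetical population: P(type) for each response type (Y(x0), Y(x1)).
types = {(0, 0): F(4, 10), (0, 1): F(3, 10), (1, 0): F(1, 10), (1, 1): F(2, 10)}


def assoc_difference(p_treat):
    """E[Y | X=1] - E[Y | X=0], where p_treat[t] = P(X=1 | type t)."""
    p_x1 = sum(p * p_treat[t] for t, p in types.items())
    p_x0 = 1 - p_x1
    ey_x1 = sum(p * p_treat[t] * t[1] for t, p in types.items()) / p_x1
    ey_x0 = sum(p * (1 - p_treat[t]) * t[0] for t, p in types.items()) / p_x0
    return ey_x1 - ey_x0


ace = sum(p * (t[1] - t[0]) for t, p in types.items())

# Randomized: assignment ignores the type, so X is independent of Y(x0), Y(x1).
randomized = {t: F(1, 2) for t in types}
# Confounded (assumed): 'Helped' units are far more likely to take treatment.
confounded = {(0, 0): F(1, 5), (0, 1): F(9, 10), (1, 0): F(1, 5), (1, 1): F(1, 5)}

print(ace, assoc_difference(randomized))  # 1/5 1/5
print(assoc_difference(confounded) == ace)  # False
```

Under the randomized mechanism the observed contrast equals the ACE; under the confounded one it does not, even though the potential outcomes are unchanged.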
Inference for the ACE without randomization
Suppose that we do not know that X ⊥⊥ Y(x0) and X ⊥⊥ Y(x1). What can be inferred?

| | X = 0 (Placebo) | X = 1 (Drug) |
|---|---|---|
| Y = 0 | 200 | 600 |
| Y = 1 | 800 | 400 |

What is:
The largest number of people who could be Helped? 400 + 200
The smallest number of people who could be Hurt? 0
⇒ Max value of ACE: (200 + 400)/2000 − 0 = 0.3
Similar logic:
⇒ Min value of ACE: 0 − (600 + 800)/2000 = −0.7
Inference for the ACE without randomization
Suppose that we do not know that X ⊥⊥ Y(x0) and X ⊥⊥ Y(x1).
General case:
−(P(x=0, y=1) + P(x=1, y=0)) ≤ ACE(X → Y) ≤ P(x=0, y=0) + P(x=1, y=1)
⇒ Bounds will always cross zero.
⇒ X ⊥⊥ Y(x0) and X ⊥⊥ Y(x1) essential for drawing non-trivial causal inferences.
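The general bounds can be evaluated directly from a 2×2 table of counts. The sketch below applies them to the placebo/drug counts from the previous slide; the function name `ace_bounds` is an assumption of this example.

```python
from fractions import Fraction

# Counts from the placebo/drug example, indexed as counts[(x, y)].
counts = {(0, 0): 200, (0, 1): 800, (1, 0): 600, (1, 1): 400}


def ace_bounds(counts):
    """Assumption-free bounds on the ACE from the observed joint P(X, Y)."""
    n = sum(counts.values())
    p = {k: Fraction(v, n) for k, v in counts.items()}
    lower = -(p[(0, 1)] + p[(1, 0)])
    upper = p[(0, 0)] + p[(1, 1)]
    return lower, upper


lower, upper = ace_bounds(counts)
print(float(lower), float(upper))  # -0.7 0.3
print(upper - lower)               # 1
```

The width of the interval is always 1 because the four cell probabilities sum to one, which is why the bounds necessarily contain zero.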
Summary of Counterfactual Approach
In our observed data, for each unit one outcome will be ‘actual’; the others will be ‘counterfactual’.
The potential outcome framework allows Causation to be ‘reduced’ to Missing Data ⇒ Conceptual progress!
The ACE is identified if X ⊥⊥ Y(xi) for all values xi.
Randomization of treatment assignment implies X ⊥⊥ Y(xi).
Ideas are central to Fisher’s Exact Test; also many parts of experimental design.
The framework is the basis of many practical causal data analyses published in Biostatistics, Econometrics and Epidemiology.
Relating Counterfactuals and Structural Equations
Potential outcomes can be seen as a different notation for Non-Parametric Structural Equation Models (NPSEMs):
Example: X → Y.
NPSEM formulation: Y = f(X, εY)
Potential outcome formulation: Y(x) = f(x, εY)
Two important caveats:
NPSEMs typically assume all variables are seen as being subject to well-defined interventions (not so with potential outcomes)
Pearl associates NPSEMs with Independent Errors (NPSEM-IEs) with DAGs (more on this later).
Relating Counterfactuals and ‘do’ notation
Expressions in terms of ‘do’ can be expressed in terms of counterfactuals:
P(Y(x) = y) ≡ P(Y = y | do(X = x))
but counterfactual notation is more general. Example: the distribution of outcomes that would arise among those who took treatment (X = 1) had they, counter to fact, not received treatment:
P(Y(x = 0) = y | X = 1)
If treatment is randomized, so that X ⊥⊥ Y(x = 0), then this equals P(Y(x = 0) = y), but in an observational study these may be different.
Graphs
Recap: Graphical Approach to Causality
[Figure: two DAGs. No Confounding: X → Y. Confounding: X → Y together with an unobserved common cause H (X ← H → Y).]
Graph intended to represent direct causal relations.
Convention that confounding variables (e.g. H) are always included on the graph.
Approach originates in the path diagrams introduced by Sewall Wright in the 1920s.
If X → Y then X is said to be a parent of Y; Y is a child of X.
Edges are directed, but are they causal?
X → Y (No Confounding): P(X, Y) = P(X)P(Y | X)
X ← Y (No Confounding): P(X, Y) = P(Y)P(X | Y)
Neither factorization places any restriction on P(X, Y).
Linking the two approaches
X → Y (no confounding): X ⊥⊥ Y(x0) & X ⊥⊥ Y(x1)
X → Y with unobserved H (confounding): X ⊥̸⊥ Y(x0) & X ⊥̸⊥ Y(x1)
Elephant in the room: The variables Y(x0) and Y(x1) do not appear on these graphs!!
Node splitting: Setting X to 0
[Figure: DAG X → Y, with P(X=x, Y=y) = P(X=x)P(Y=y | X=x) ⇒ SWIG in which X is split into a random node X and a fixed node x = 0, with x = 0 → Y(x = 0).]
Can now ‘read’ the independence: X ⊥⊥ Y(x=0).
Also associate a new factorization:
P(X=x, Y(x=0)=y) = P(X=x)P(Y(x=0)=y)
where:
P(Y(x=0)=y) = P(Y=y | X=0).
This last equation links a term in the original factorization to the new factorization. We term this the ‘modularity assumption’.
Node splitting: Setting X to 1
[Figure: DAG X → Y, with P(X=x, Y=y) = P(X=x)P(Y=y | X=x) ⇒ SWIG in which X is split into a random node X and a fixed node x = 1, with x = 1 → Y(x = 1).]
Can now ‘read’ the independence: X ⊥⊥ Y(x=1).
Also associate a new factorization:
P(X=x, Y(x=1)=y) = P(X=x)P(Y(x=1)=y)
where:
P(Y(x=1)=y) = P(Y=y | X=1).
Crucial point: Y(x=0) and Y(x=1) are never on the same graph.
Although we have:
X ⊥⊥ Y(x=0) and X ⊥⊥ Y(x=1)
we do not have
X ⊥⊥ Y(x=0), Y(x=1)
Had we tried to construct a single graph containing both Y(x=0) and Y(x=1) this would have been impossible. (Why?)
⇒ Single-World Intervention Graphs (SWIGs).
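One way to see that the two single-world independences are strictly weaker than the joint statement is a small exact counterexample. The distribution below is invented for illustration (it is not from the talk): Y(x=0) and Y(x=1) are independent fair coins and X is their exclusive-or.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint law over (X, Y(x=0), Y(x=1)):
# the potential outcomes are independent fair coins and X = Y(x=0) XOR Y(x=1).
joint = {}
for y0, y1 in product((0, 1), repeat=2):
    joint[(y0 ^ y1, y0, y1)] = Fraction(1, 4)


def marginal(keep):
    """Marginal distribution over the coordinates listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out


def independent(a, b):
    """Exact check that coordinates `a` are independent of coordinates `b`."""
    pa, pb, pab = marginal(a), marginal(b), marginal(a + b)
    return all(pab.get(ka + kb, 0) == pa[ka] * pb[kb] for ka in pa for kb in pb)


print(independent((0,), (1,)))    # True:  X is independent of Y(x=0)
print(independent((0,), (2,)))    # True:  X is independent of Y(x=1)
print(independent((0,), (1, 2)))  # False: X depends on the pair
```

Each single-world margin P(X, Y(x)) satisfies the independence read from its SWIG, yet X is a deterministic function of the pair of potential outcomes.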
Representing both graphs via a ‘template’
[Figure: DAG G: X → Y ⇒ template G(x): X split into a random node X and a fixed node x, with x → Y(x).]
Represent both graphs via a template:
Formally this is a ‘graph valued function’:
Takes as input a specific value x∗
Returns as output a SWIG G(x∗).
Each instantiation of the template is a SWIG G(x∗) that represents a different margin: P(X, Y(x∗)) with red nodes x∗ becoming constants.
Intuition behind node splitting: (Robins, VanderWeele, Richardson 2007)
Q: How could we identify whether someone would choose to take treatment, i.e. have X = 1, and at the same time find out what happens to such a person if they don’t take treatment, Y(x = 0)?
A: Consider an experiment in which, whenever a patient is observed to swallow the drug (i.e. to have X = 1), we instantly intervene by administering a safe ‘emetic’ that causes the pill to be regurgitated before any drug can enter the bloodstream. Since we assume the emetic has no side effects, the patient’s recorded outcome is then Y(x = 0).
Harder Inferential problem
[Figure: causal DAG over X0, Z, H, X1, Y.]
Query: does this causal graph imply
Y(x0, x1) ⊥⊥ X1(x0) | Z(x0), X0 ?
Simple solution
[Figure: left, the causal DAG over X0, Z, H, X1, Y; right, the SWIG with nodes X0, x0, Z(x0), H, X1(x0), x1, Y(x0, x1).]
Query: does this graph imply
Y(x0, x1) ⊥⊥ X1(x0) | Z(x0), X0 ?
Answer: Yes. Applying d-separation to the SWIG on the right, we see that there is no d-connecting path from Y(x0, x1) to X1(x0) given Z(x0) and X0. More on this shortly...
Single World Intervention Template Construction (1)
Given a graph G and a subset of vertices A = {A1, . . . , Ak} to be intervened on, we form G(a) in two steps:
(1) (Node splitting): For every A ∈ A split the node into a random node A and a fixed node a:
[Figure: schematic illustrating the splitting of node A into a random half A and a fixed half a.]
The random half inherits all edges directed into A in G;
The fixed half inherits all edges directed out of A in G.
Single World Intervention Template Construction (2)
(2) Relabel descendants of fixed nodes:
[Figure: a graph with vertices A, B, C, D, E, F, X, T, Y, Z after splitting A. The descendants of the fixed node a are relabeled B(a, . . .), C(a, . . .), D(a, . . .), E(a, . . .), F(a, . . .); the remaining vertices appear as A(. . .), X(. . .), T(. . .), Y(. . .), Z(. . .).]
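The two construction steps can be sketched as a small function. This is an illustrative reading of the construction, not code from the talk: the DAG is given as a dictionary of parent lists, fixed nodes are named by lower-casing the vertex name (an assumption of this sketch), and each random node is labeled with the fixed nodes that are its ancestors in G(a).

```python
def swig_template(parents, intervened):
    """Build a SWIG template from a DAG.

    parents: dict mapping each vertex to the list of its parents in G.
    intervened: set of vertices to split (the set A).
    Returns (labels, edges): labels maps each vertex to the label of its
    random node; edges is a sorted list of (source, target) label pairs.
    """

    def fixed_ancestors(v):
        # Walk upwards from v. A split vertex contributes its fixed half,
        # which has no parents, so the walk stops there.
        found, seen, stack = set(), set(), list(parents[v])
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            if u in intervened:
                found.add(u.lower())
            else:
                stack.extend(parents[u])
        return sorted(found)

    # Step 2: relabel every random node that descends from a fixed node.
    labels = {}
    for v in parents:
        fixed = fixed_ancestors(v)
        labels[v] = f"{v}({', '.join(fixed)})" if fixed else v

    # Step 1: edges out of a split vertex now leave its fixed half.
    edges = []
    for v, pas in parents.items():
        for p in pas:
            source = p.lower() if p in intervened else labels[p]
            edges.append((source, labels[v]))
    return labels, sorted(edges)


# Example: Z -> X -> Y with Z -> Y, intervening on both Z and X.
dag = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}
labels, edges = swig_template(dag, {"Z", "X"})
print(labels)  # {'Z': 'Z', 'X': 'X(z)', 'Y': 'Y(x, z)'}
print(edges)   # [('x', 'Y(x, z)'), ('z', 'X(z)'), ('z', 'Y(x, z)')]
```

Dropping the Z → Y edge from `dag` gives the label `Y(x)` instead, since the walk stops at the fixed node x and never reaches z.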
Single World Intervention Graph
A Single World Intervention Graph (SWIG) G(a∗) is obtained from the Template G(a) by simply substituting specific values a∗ for the variables a in G(a);
For example, we replace G(x) with G(x=0).
Changing the value of a fixed variable corresponds to constructing a new graph and considering a different population, e.g. P(X, Y(x=0)) vs. P(X, Y(x=1))
It is only the instantiated graph G(x∗) that represents P(V(x∗)), not the template G(x).
Factorization and Modularity
Factorization: P(V(a)) over the counterfactual variables in G(a) factorizes with respect to G(a) (ignoring fixed nodes):

$$P(V(a)) = \prod_{Y(a) \in V(a)} P\big( Y(a) \,\big|\, \mathrm{pa}_{G(a)}(Y(a)) \setminus a \big).$$

Modularity: P(V(a)) and P(V) are linked as follows:

$$P\big( Y(a) = y \,\big|\, (\mathrm{pa}_{G(a)}(Y(a)) \setminus a) = q \big) = P\big( Y = y \,\big|\, (\mathrm{pa}_{G}(Y) \setminus A) = q,\ (\mathrm{pa}_{G}(Y) \cap A) = a_{\mathrm{pa}_{G}(Y) \cap A} \big).$$

So the conditional density associated with Y(a_Y) in G(a) is just the conditional density associated with Y in G after substituting a_i for any A_i ∈ A that is a parent of Y.
Applying d-separation to the graph G(a)
Counterfactual conditional independence relations may be obtained from the transformed graph by applying d-separation after adding fixed nodes to the conditioning set:
Given disjoint subsets B(a), C(a) and D(a) of random vertices (where D(a) may be empty),
if B(a) is d-separated from C(a) given D(a) ∪ a in G(a) (2)
then B(a) ⊥⊥ C(a) | D(a) [P(V(a))].
In words, if in G(a) two subsets B(a) and C(a) of random nodes are d-separated by D(a) in conjunction with the fixed nodes a, then B(a) and C(a) are conditionally independent given D(a) in the associated distribution P(V(a)).
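Rule (2) can be tried out with a minimal d-separation checker. The sketch below uses the standard ancestral moral graph criterion rather than any particular library; the example SWIG (an observed common cause L of X and Y, with X split) and its edges are assumptions chosen for illustration.

```python
def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `parents`.

    parents maps every vertex to the list of its parents.
    Uses the moralized ancestral graph criterion.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # 1. Keep only the ancestors of xs, ys and zs (including themselves).
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents[v])

    # 2. Moralize: connect each vertex to its parents and marry co-parents.
    nbrs = {v: set() for v in keep}
    for v in keep:
        pas = list(parents[v])
        for i, p in enumerate(pas):
            nbrs[v].add(p)
            nbrs[p].add(v)
            for q in pas[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)

    # 3. Remove zs and test whether any vertex of ys is reachable from xs.
    seen, stack = set(), list(xs)
    while stack:
        v = stack.pop()
        if v in seen or v in zs:
            continue
        seen.add(v)
        stack.extend(nbrs[v])
    return not (seen & ys)


# Assumed SWIG: L -> X, L -> Y(x), and the fixed node x -> Y(x).
swig = {"L": [], "X": ["L"], "x": [], "Y(x)": ["x", "L"]}

# Condition on D(a) together with the fixed node, as in rule (2).
print(d_separated(swig, {"X"}, {"Y(x)"}, {"L", "x"}))  # True
print(d_separated(swig, {"X"}, {"Y(x)"}, {"x"}))       # False
```

With L in the conditioning set the only path X ← L → Y(x) is blocked; without it the path is open, so no independence can be read off.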
Conditioning on fixed variables a
This is intuitive, since these are fixed constants in the SWIG.
Since vertices in a have no parents, no new paths become d-connecting due to also conditioning on a.
⇒ If a d-separation holds in G(a) without conditioning on the fixed nodes, then it will continue to hold if we also condition on fixed nodes.
An alternative is simply to restrict attention to paths that do not contain fixed vertices,
e.g. remove fixed nodes from the graph before checking d-separation.
Mediation graph (I)
Intervention on Z alone.
[Figure: DAG Z → X → Y with Z → Y ⇒ SWIG in which Z is split into Z and z, with z → X(z) → Y(z) and z → Y(z).]
factorization:
P(Z, X(z), Y(z)) = P(Z)P(X(z))P(Y(z) | X(z))
modularity:
P(X(z)=x) = P(X=x | Z=z),
P(Y(z)=y | X(z)=x) = P(Y=y | X=x, Z=z).
d-separation gives:
Z ⊥⊥ X(z), Y(z).
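Mediation graph (II)
Intervention on Z and X:
[Figure: DAG Z → X → Y with Z → Y ⇒ SWIG in which Z is split into Z and z and X into X(z) and x, with z → X(z), x → Y(x, z) and z → Y(x, z).]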
factorization:
P(Z,X(z), Y(x, z)) = P(Z)P(X(z))P(Y(x, z))
modularity:
P(X(z)=x) = P(X=x | Z= z),
P(Y(x, z)=y) = P(Y=y | X= x,Z= z).
d-separation gives:
Z ⊥⊥ X(z) ⊥⊥ Y(x, z)
No direct effect graph
[Figure: DAG Z → X → Y ⇒ SWIG in which Z is split into Z and z and X into X(z) and x, with z → X(z) and x → Y(x).]
factorization:
P(Z,X(z), Y(x)) = P(Z)P(X(z))P(Y(x))
modularity:
P(X(z)=x) = P(X=x | Z= z),
P(Y(x)=y) = P(Y=y | X= x).
d-separation gives:
Z ⊥⊥ X(z) ⊥⊥ Y(x)
Inferential Problem (II)
[Figure: left, the causal DAG over X0, Z, H, X1, Y; right, its SWIG with nodes X0, x0, Z(x0), H, X1(x0), x1, Y(x0, x1).]
Pearl (2009), Ex. 11.3.3, claims the causal DAG above does not imply:
Y(x0, x1) ⊥⊥ X1 | Z, X0 = x0. (3)
The SWIG shows that (3) does hold; Pearl is incorrect. Specifically, we see from the SWIG:
Y(x0, x1) ⊥⊥ X1(x0) | Z(x0), X0, (4)
⇒ Y(x0, x1) ⊥⊥ X1(x0) | Z(x0), X0 = x0. (5)
This last condition is then equivalent to (3) via consistency.
(Pearl infers a claim of Robins is false since if true then (3) would hold).
Pearl’s twin network for the same problem
[Figure: Pearl’s twin network: the factual graph over X0, Z, H, X1, Y alongside a counterfactual copy with fixed x0, x1 and nodes Z(x0, x1), H(x0, x1), Y(x0, x1), the two copies sharing the error terms UZ, UH, UY.]
The twin network fails to reveal that Y(x0, x1) ⊥⊥ X1 | Z, X0 = x0.
This ‘extra’ independence holds in spite of d-connection because (by consistency) when X0 = x0, then Z = Z(x0) = Z(x0, x1).
Note that Y(x0, x1) ⊥̸⊥ X1 | Z, X0 ≠ x0.
Shpitser & Pearl (2008) introduce a pre-processing step to address this.
Confounding Revisited
[Figure: SWIG template over X, x, Y(x) with the unobserved H a common parent of X and Y(x).]
Here we can read directly from the template that X ⊥̸⊥ Y(x) since there is a path:
X ← H → Y(x).
Adjusting for confounding
[Figure: DAG over X, Y and the observed confounder L ⇒ SWIG template over X, x, Y(x), L.]
Here we can read directly from the template that
X ⊥⊥ Y(x) | L.
It follows that:
P(Y(x)=y) = ∑_l P(Y=y | L=l, X=x) P(L=l). (6)
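Formula (6) uses only the observed joint distribution of (L, X, Y). The sketch below evaluates it exactly on a made-up joint distribution (all probabilities are assumptions of this example) and contrasts it with the unadjusted conditional P(Y=y | X=x).

```python
from fractions import Fraction as F

# Hypothetical observed law, built from P(L), P(X=1 | L) and P(Y=1 | L, X).
p_l1 = F(1, 2)
p_x1_given_l = {0: F(1, 4), 1: F(3, 4)}
p_y1_given_lx = {(0, 0): F(1, 10), (0, 1): F(3, 10),
                 (1, 0): F(6, 10), (1, 1): F(8, 10)}


def bern(p1, v):
    """P(V = v) for a binary variable with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1


joint = {(l, x, y): bern(p_l1, l) * bern(p_x1_given_l[l], x)
         * bern(p_y1_given_lx[(l, x)], y)
         for l in (0, 1) for x in (0, 1) for y in (0, 1)}


def prob(pred):
    """Probability of the event described by pred(l, x, y)."""
    return sum(p for k, p in joint.items() if pred(*k))


def adjusted(x, y):
    """Right-hand side of (6): sum over l of P(Y=y | L=l, X=x) P(L=l)."""
    total = 0
    for l in (0, 1):
        p_y = (prob(lambda L, X, Y: (L, X, Y) == (l, x, y))
               / prob(lambda L, X, Y: (L, X) == (l, x)))
        total += p_y * prob(lambda L, X, Y: L == l)
    return total


def unadjusted(x, y):
    """P(Y=y | X=x), ignoring L."""
    return prob(lambda L, X, Y: (X, Y) == (x, y)) / prob(lambda L, X, Y: X == x)


print(adjusted(1, 1), adjusted(0, 1))      # 11/20 7/20
print(unadjusted(1, 1), unadjusted(0, 1))  # 27/40 9/40
```

In this toy law the adjusted contrast is 1/5 while the unadjusted contrast is 9/20, so ignoring L would overstate the effect.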
Contrast with approach via removing edges
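[Figure: three graphs: the DAG over X, Y, L; the SWIG template over X, x, Y(x), L; and a third graph over X, Y, L illustrating the edge-removal approach.]
This ‘explains’ why L is sufficient to control confounding under the null (where X has no effect on Y) but not under the alternative.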
Adjusting for confounding
[Figure: DAG over X, Y, L ⇒ SWIG template over X, x, Y(x), L.]
X ⊥⊥ Y(x) | L.
Proof of identification:
P[Y(x) = y] = ∑_l P[Y(x) = y | L = l] P(L = l)
= ∑_l P[Y(x) = y | L = l, X = x] P(L = l) (independence)
= ∑_l P[Y = y | L = l, X = x] P(L = l) (modularity)
More Examples (I)
[Figure: (a-i) DAG over X, Y, L, H; (a-ii) the corresponding SWIG template over X, x, Y(x), L, H.]
Here we can read directly from the template that
X ⊥⊥ Y(x) | L.
More Examples (II)
[Figure: (b-i) DAG over X, Y, L, H; (b-ii) the corresponding SWIG template over X, x, Y(x), L, H.]
Here we can read directly from the template that
X ⊥⊥ Y(x) | L.
Sequentially randomized experiment (I)
[Figure: DAG over A, B, C, D and the unobserved H.]
A and C are treatments;
H is unobserved;
B is a time varying confounder;
D is the final response;
Treatment C is assigned randomly conditional on the observedhistory, A and B;
Want to know P(D(a, c)).
Sequentially randomized experiment (I)
[Figure: DAG over A, B, C, D and the unobserved H.]
If the following holds:
A ⊥⊥ D(a, c)
C(a) ⊥⊥ D(a, c) | B(a), A
General result of Robins (1986) then implies:
P(D(a, c)=d) = ∑_b P(B=b | A=a) P(D=d | A=a, B=b, C=c).
Does it??
Sequentially randomized experiment (II)
[Figure: SWIG with nodes A, a, B(a), C(a), c, D(a, c) and the unobserved H.]
d-separation:
A ⊥⊥ D(a, c)
C(a) ⊥⊥ D(a, c) | B(a), A
General result of Robins (1986) then implies:
P(D(a, c)=d) = ∑_b P(B=b | A=a) P(D=d | A=a, B=b, C=c).
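The identifying formula can be checked exactly against a structural model with a hidden variable. Everything in the sketch below is an assumption made for illustration, since the figure's edges are not recoverable from the text: H affects B and D, A is randomized, C depends only on the observed history (A, B), and all conditional probabilities are invented.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical structural model (all numbers invented for illustration).
p_h1 = F(1, 2)   # unobserved H
p_a1 = F(1, 2)   # A is randomized


def p_b1(a, h):
    return F(1 + 2 * a + 4 * h, 10)          # B depends on A and H


def p_c1(a, b):
    return F(2 + 3 * a + 4 * b, 10)          # C depends on observed history only


def p_d1(a, b, c, h):
    return F(1 + a + 2 * b + 2 * c + 3 * h, 10)  # D depends on A, B, C and H


def bern(p1, v):
    """P(V = v) for a binary variable with P(V = 1) = p1."""
    return p1 if v == 1 else 1 - p1


# Observed law over (A, B, C, D): H is summed out.
observed = {}
for h, a, b, c, d in product((0, 1), repeat=5):
    p = (bern(p_h1, h) * bern(p_a1, a) * bern(p_b1(a, h), b)
         * bern(p_c1(a, b), c) * bern(p_d1(a, b, c, h), d))
    observed[(a, b, c, d)] = observed.get((a, b, c, d), 0) + p

INDEX = {"a": 0, "b": 1, "c": 2, "d": 3}


def pr(**fixed):
    """Observed probability of the event that fixes the named coordinates."""
    return sum(p for k, p in observed.items()
               if all(k[INDEX[n]] == v for n, v in fixed.items()))


def g_formula(a, c, d):
    """Sum over b of P(B=b | A=a) P(D=d | A=a, B=b, C=c), from observed data."""
    return sum((pr(a=a, b=b) / pr(a=a))
               * (pr(a=a, b=b, c=c, d=d) / pr(a=a, b=b, c=c))
               for b in (0, 1))


def truth(a, c, d):
    """P(D(a, c) = d) obtained by setting A = a and C = c in the model."""
    return sum(bern(p_h1, h) * bern(p_b1(a, h), b) * bern(p_d1(a, b, c, h), d)
               for h, b in product((0, 1), repeat=2))


print(all(g_formula(a, c, d) == truth(a, c, d)
          for a, c, d in product((0, 1), repeat=3)))  # True
```

The agreement is exact here because C is assigned using only A and B, so conditioning on the observed history leaves the distribution of H given (A, B) unchanged.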
Multi-network approach
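[Figure: multi-network with three linked copies: the factual graph over A, B, C, D, H; a copy with fixed a and nodes B(a), C(a), D(a), H(a); and a copy with fixed a, c and nodes B(a, c), D(a, c), H(a, c). The copies share the error terms UH, UB, UC, UD.]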
Another example
[Figure: DAG on A, B, C, D with hidden variables H1 and H2.]
A ⊥⊥ D(a, c)
C(a) ⊥⊥ D(a, c) | B(a),A
General result of Robins (1986) then implies:
$$P(D(a,c)=d) = \sum_{b} P(B=b \mid A=a)\, P(D=d \mid A=a, B=b, C=c).$$
Does it??
![Page 52: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/52.jpg)
Another example
[Figure: DAG on A, B, C, D with hidden variables H1 and H2, and the corresponding SWIG with nodes A | a, B(a), C(a) | c, D(a, c), H1, H2.]
A ⊥⊥ D(a, c)
C(a) ⊥⊥ D(a, c) | B(a),A
General result of Robins (1986) then implies:
$$P(D(a,c)=d) = \sum_{b} P(B=b \mid A=a)\, P(D=d \mid A=a, B=b, C=c).$$
![Page 53: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/53.jpg)
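General result (Robins, 1986)
Observed data:
$$O \equiv \langle L_1, A_1, \ldots, L_K, A_K, Y \rangle.$$
If the following holds for k = 1, . . . , K:
$$Y(a^\dagger) \perp\!\!\!\perp A_k(a^\dagger) \mid \bar{L}_k(a^\dagger), \bar{A}_{k-1}(a^\dagger); \tag{7}$$
then (under positivity):
$$P\bigl(Y(a^\dagger)=y \mid \bar{L}_m(a^\dagger) = \bar{l}_m, \bar{A}_{m-1}(a^\dagger) = \bar{a}^\dagger_{m-1}\bigr) = \sum_{l_{m+1},\ldots,l_K} p(y \mid \bar{l}_K, \bar{a}^\dagger_K) \prod_{j=m+1}^{K} p(l_j \mid \bar{l}_{j-1}, \bar{a}^\dagger_{j-1}). \tag{8}$$
Here $\bar{A}_{j-1}(a^\dagger) \equiv \langle A_1, \ldots, A_{j-1}(\bar{a}^\dagger_{j-2}) \rangle$, similarly for $\bar{L}_{j-1}(a^\dagger)$.
The RHS of (8) is referred to as the ‘g-formula’.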
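As a sanity check, the g-formula can be specialized to the four-variable sequentially randomized experiment seen earlier. The labelling used here (K = 2, m = 0, L_1 empty, A_1 = A, L_2 = B, A_2 = C, Y = D) is an assumption made for illustration and is not stated on the slide:

```latex
\begin{align*}
P\bigl(Y(a_1, a_2) = y\bigr)
  &= \sum_{l_2} p(y \mid l_2, a_1, a_2)\, p(l_2 \mid a_1)
     && \text{(g-formula with } K = 2,\ m = 0,\ L_1 = \emptyset\text{)} \\
P\bigl(D(a, c) = d\bigr)
  &= \sum_{b} P(D = d \mid A = a, B = b, C = c)\, P(B = b \mid A = a)
     && \text{(relabelled)}
\end{align*}
```

Under this labelling the result agrees with the formula given for the sequentially randomized experiment.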
![Page 54: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/54.jpg)
Dynamic regimes
A dynamic regime g is a policy that assigns treatment (usually at multiple time points) on the basis of past history;
This includes regimes that condition on the ‘natural’ value of treatment in the absence of an intervention;
Example: exercise for as long as you would have done without intervention or twenty minutes, whichever is more.
See Young et al. (2012) for additional analysis.
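The exercise example can be written as a function of the natural value of treatment. This is a minimal sketch; the function names and the default of twenty minutes as a parameter are illustrative choices, not notation from the slides.

```python
def exercise_regime(natural_minutes: float, floor_minutes: float = 20.0) -> float:
    """Dynamic regime: assigned exercise time depends on the natural value.

    The subject exercises for as long as they would have done without
    intervention, or `floor_minutes`, whichever is more.
    """
    return max(natural_minutes, floor_minutes)

def static_regime(natural_minutes: float, fixed_minutes: float = 20.0) -> float:
    """Static regime, for contrast: the natural value is ignored."""
    return fixed_minutes

# Assigned treatment for subjects whose natural exercise times differ.
assigned = [exercise_regime(t) for t in (0.0, 10.0, 45.0)]
```

The dynamic regime changes treatment only for subjects whose natural value falls below the floor, whereas the static regime assigns the same value to everyone.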
![Page 55: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/55.jpg)
Dynamic regimes
[Figure: SWIG with nodes A1 | a1, L(a1), A2(a1) | a2, Y(a1, a2) and hidden variables H1, H2; and the dynamic-regime SWIG with nodes A1, A1+(g), L(g), A2(g), A2+(g), Y(g) and hidden variables H1, H2.]
P(Y(g)) is identified.
![Page 56: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/56.jpg)
Dynamic regimes
[Figure: SWIG with nodes A1 | a1, L(a1), A2(a1) | a2, Y(a1, a2) and hidden variables H1, H2; and the dynamic-regime SWIG with nodes A1, A1+(g), L(g), A2(g), A2+(g), Y(g) and hidden variables H1, H2.]
P(Y(g)) is not identified.
![Page 57: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/57.jpg)
Joint Independence
We saw earlier that the causal DAG X→ Y implied:
X ⊥⊥ Y(x0) and X ⊥⊥ Y(x1)
However, joint independence relations such as:
X ⊥⊥ Y(x0), Y(x1)
never follow from our SWIG transformation:
There is no way via node-splitting to construct a graph with both Y(x0) and Y(x1).
This has important consequences for the identification of direct effects.
![Page 58: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/58.jpg)
Assuming Independent Errors and
Cross-World Independence
![Page 59: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/59.jpg)
Mediation graph
Intervention on X and M:
[Figure: mediation DAG on X, M, Y ⇒ SWIG with nodes X | x, M(x) | m, Y(x, m).]
d-separation gives:
X ⊥⊥ M(x) ⊥⊥ Y(x, m)
Pearl associates additional independence relations with this graph
Y(x1,m) ⊥⊥ M(x0),X
Y(x0,m) ⊥⊥ M(x1),X
equivalent to assuming independent errors, εX ⊥⊥ εM ⊥⊥ εY.
![Page 60: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/60.jpg)
Pure Direct Effect
Pure (aka Natural) Direct Effect (PDE): Change in Y had X been different, but M fixed at the value it would have taken had X not been changed:
PDE ≡ Y(x1,M(x0)) − Y(x0,M(x0)).
Legal motivation [from Pearl (2000)]:
“The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of a different race (age, sex, religion, national origin etc.) and everything else had been the same.” (Carson versus Bethlehem Steel Corp., 70 FEP Cases 921, 7th Cir. (1996)).
![Page 61: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/61.jpg)
Decomposition
PDE also allows non-parametric decomposition of Total Effect (ACE) into direct (PDE) and indirect (TIE) pieces.
PDE ≡ E [Y(1,M(0))] − E [Y(0)]
TIE ≡ E [Y (1,M(1)) − Y (1,M(0))]
TIE + PDE ≡ E [Y(1)] − E [Y(0)] ≡ ACE(X → Y)
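The decomposition is a telescoping sum. The short derivation below spells it out, assuming the usual composition property Y(x) = Y(x, M(x)), which the slide uses implicitly when it writes E[Y(0)] in place of E[Y(0, M(0))]:

```latex
\begin{align*}
\mathrm{TIE} + \mathrm{PDE}
  &= E\bigl[Y(1, M(1)) - Y(1, M(0))\bigr] + E\bigl[Y(1, M(0))\bigr] - E\bigl[Y(0)\bigr] \\
  &= E\bigl[Y(1, M(1))\bigr] - E\bigl[Y(0)\bigr] \\
  &= E\bigl[Y(1)\bigr] - E\bigl[Y(0)\bigr]
   \qquad \text{since } Y(1) = Y(1, M(1)) \\
  &= \mathrm{ACE}(X \to Y).
\end{align*}
```

The cross-world term E[Y(1, M(0))] cancels, so the sum of the two pieces involves only single-world quantities even though each piece separately does not.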
![Page 62: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/62.jpg)
Pearl’s identification claim
Pearl and others claim that under “no confounding” the PDE is identified by the following mediation formula:
$$\mathrm{PDE}^{\mathrm{med}} = \sum_{m} \bigl[ E[Y \mid x_1, m] - E[Y \mid x_0, m] \bigr]\, P(m \mid x_0)$$
![Page 63: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/63.jpg)
Critique of PDE: Hypothetical Case Study
Observational data on three variables:
X - treatment: cigarette cessation
M - intermediate: blood pressure at 1 year, high or low
Y - outcome: say CHD by 2 years
Observed data (X,M, Y) on each of n subjects.
All binary
X randomly assigned
![Page 64: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/64.jpg)
Hypothetical Study (I): X randomized
| X | M | Y = 0 | Y = 1 | Total | P(Y=1 \| m, x) |
|---|---|---|---|---|---|
| X = 0 | M = 0 | 1500 | 500 | 2000 | 0.25 |
| X = 0 | M = 1 | 1200 | 800 | 2000 | 0.40 |
| X = 1 | M = 0 | 948 | 252 | 1200 | 0.21 |
| X = 1 | M = 1 | 1568 | 1232 | 2800 | 0.44 |

A researcher, Prof H, wishes to apply the mediation formula to estimate the PDE.
Prof H believes that there is no confounding, so that Pearl’s NPSEM-IE holds, but his post-doc, Dr L, is skeptical.
![Page 65: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/65.jpg)
Hypothetical Study (II): X and M Randomized
To try to address Dr L’s concerns, Prof H carries out animal intervention studies.

| X | M | Y = 0 | Y = 1 | Total | P(Y(m, x)=1) |
|---|---|---|---|---|---|
| X = 0 | M = 0 | 750 | 250 | 1000 | 0.25 |
| X = 0 | M = 1 | 600 | 400 | 1000 | 0.40 |
| X = 1 | M = 0 | 790 | 210 | 1000 | 0.21 |
| X = 1 | M = 1 | 560 | 440 | 1000 | 0.44 |

As we see: P(Y(m, x)=1) = P(Y=1 | m, x);
Prof H is now convinced: ‘What other experiment could I do?’
He applies the mediation formula, yielding PDEmed = 0.
Conclusion: No direct effect of X on Y.
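The value PDEmed = 0 can be reproduced from the counts of Hypothetical Study (I). The sketch below applies the mediation formula with x0 = 0 and x1 = 1; the variable and function names are illustrative only.

```python
# Counts from Hypothetical Study (I): counts[(x, m)] = (n with Y=0, n with Y=1).
counts = {
    (0, 0): (1500, 500),
    (0, 1): (1200, 800),
    (1, 0): (948, 252),
    (1, 1): (1568, 1232),
}

def p_y1_given(x, m):
    """Empirical P(Y=1 | M=m, X=x)."""
    n_y0, n_y1 = counts[(x, m)]
    return n_y1 / (n_y0 + n_y1)

def p_m_given_x(m, x):
    """Empirical P(M=m | X=x)."""
    n_xm = sum(counts[(x, m)])
    n_x = sum(sum(counts[(x, other_m)]) for other_m in (0, 1))
    return n_xm / n_x

def pde_med(x0=0, x1=1):
    """Mediation formula: sum_m [E[Y|x1,m] - E[Y|x0,m]] * P(m|x0)."""
    return sum(
        (p_y1_given(x1, m) - p_y1_given(x0, m)) * p_m_given_x(m, x0)
        for m in (0, 1)
    )

estimate = pde_med()
```

The two strata contribute (0.21 − 0.25) · 0.5 and (0.44 − 0.40) · 0.5, which cancel, so the estimate is zero up to floating-point rounding.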
![Page 66: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/66.jpg)
Failure of the mediation formula
Under the true generating process, the true value of the PDE is:
PDE = 0.153 ≠ PDEmed = 0
Prof H’s conclusion was completely wrong!
![Page 67: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/67.jpg)
Why did the mediation formula go wrong?
Dr L was right – there was a confounder:
[Figure: mediation DAG on X, M, Y with an added hidden variable H.]
but. . . it had a special structure so that:
Y ⊥⊥ H |M,X = 0 and M ⊥⊥ H | X = 1
![Page 68: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/68.jpg)
Why did the mediation formula go wrong?
[Figure: two context-specific graphs on M, Y, H, one for X = 0 and one for X = 1.]
The confounding is undetectable by any intervention on X and/or M.
Pearl: Onus is on the researcher to be sure there is no confounding. Causation should precede intervention.
![Page 69: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/69.jpg)
PDE identification cannot be checked via experiment
If our only interventions are on the variables X and M then we cannot do an experiment to learn the PDE.
We could learn E [Y{x = 1,M(x = 0)}] by intervention if we could:
- intervene and set X to 0 and observe M(0),
- then return each subject to their pre-intervention state,
- finally intervene to set X to 1 and M to M(0) and observe Y(1,M(0)).
Such an intervention strategy will usually not exist because it is not possible in a real-world intervention (e.g., suppose the outcome Y were death).
Because we cannot observe the same subject under both X = 1 and X = 0 (i.e. “across worlds”), no intervention will allow us to learn the distribution of mixed counterfactuals such as Y{x = 1,M(x = 0)}.
(In the story Dr L had to introduce a new node on the graph in order to check the value of the PDE via an experiment.)
![Page 70: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/70.jpg)
Summary of critique of Independent Error Assumption
The independent error assumption cannot be checked by any randomized experiment on the variables in the graph.
⇒ Connection between experimental interventions and potential outcomes, established by Neyman, has been severed;
⇒ Theories in Social and Medical sciences are not detailed enough to support the independent error assumption.
What about faithfulness and causal discovery procedures?
Such inferences are explicit that they rely on faithfulness, and are designed to guide hypothesis formation;
- Contrast: In Pearl’s NPSEM-IE approach the simple act of using a DAG is viewed as automatically committing you to making this untestable hypothesis.
Predictions (possibly derived assuming faithfulness) regarding intervention distributions P(Y(x)) = P(Y | do(x)) can be tested by randomized experiments.
![Page 71: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/71.jpg)
How many experimentally untestable assumptions?
Assumption of independent errors implies super-exponentially many ‘cross-world’ counterfactual independence assumptions:

| No. actual vars. | 2 | 3 | 4 | K |
|---|---|---|---|---|
| Dim. P(V) | 3 | 7 | 15 | $2^K - 1$ |
| No. counterfactual vars. | 3 | 7 | 15 | $2^K - 1$ |
| Dim. counterfactual dist. | 7 | 127 | 32767 | $2^{(2^K - 1)} - 1$ |
| Dim. SWIG | 5 | 113 | 32697 | $(2^{(2^K - 1)} - 1) - \sum_{j=1}^{K-1} (4^j - 2^j)$ |
| Dim. NPSEM-IE | 4 | 19 | 274 | $\sum_{j=0}^{K-1} (2^{2^j} - 1)$ |
| No. untestable indep. constraints in NPSEM-IE | 1 | 94 | 32423 | $O(2^{2^K - 2})$ |

Table: Dimensions of counterfactual models associated with complete graphs with binary variables.
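The entries of the table follow from the closed-form expressions in its last column. The sketch below evaluates them for small K; the function name is illustrative, and treating the last row as the difference between the SWIG and NPSEM-IE dimensions is an assumption that happens to match the tabulated values for K = 2, 3, 4.

```python
def model_dimensions(k: int) -> dict:
    """Dimensions of counterfactual models for a complete DAG on k binary variables."""
    n_counterfactual_vars = 2**k - 1
    dim_observed = 2**k - 1
    dim_counterfactual = 2**n_counterfactual_vars - 1
    dim_swig = dim_counterfactual - sum(4**j - 2**j for j in range(1, k))
    dim_npsem_ie = sum(2**(2**j) - 1 for j in range(k))
    return {
        "dim_observed": dim_observed,
        "n_counterfactual_vars": n_counterfactual_vars,
        "dim_counterfactual": dim_counterfactual,
        "dim_swig": dim_swig,
        "dim_npsem_ie": dim_npsem_ie,
        # Assumption: untestable constraints = SWIG dimension - NPSEM-IE dimension.
        "n_untestable": dim_swig - dim_npsem_ie,
    }

rows = {k: model_dimensions(k) for k in (2, 3, 4)}
```

Python integers have arbitrary precision, so the same function can be evaluated for larger K, where the counts grow doubly exponentially.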
![Page 72: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/72.jpg)
SWIG Completeness Conjecture
In an NPSEM we define a counterfactual independence to be logical if it holds regardless of the distribution over counterfactuals (equivalently error terms), e.g. for binary X
Y(x0) ⊥⊥ Y(x1) | X, Y
Completeness Conjecture: There exists a distribution over counterfactuals that is experimentally indistinguishable from the NPSEM that assumes independent errors but in which the only non-logical independencies are those that may be derived from the SWIG.
![Page 73: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/73.jpg)
Summary
A simple approach to unifying graphs and counterfactuals via node-splitting
The approach works via linking the factorizations associated with the two graphs
The approach provides a language that allows counterfactual and graphical people to communicate
The approach leads to many fewer untestable independence assumptions than in the NPSEM-IE approach of Pearl.
The approach also provides a way to combine information on the absence of individual and population level direct effects.
![Page 74: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/74.jpg)
Thank You!
![Page 75: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/75.jpg)
References
Pearl, J. Causality (Second ed.). Cambridge, UK: Cambridge University Press, 2009.
Richardson, TS, Robins, JM. Single World Intervention Graphs. CSSS Technical Report No. 128, http://www.csss.washington.edu/Papers/wp128.pdf, 2013.
Robins, JM. A new approach to causal inference in mortality studies with sustained exposure periods applications to control of the healthy worker survivor effect. Mathematical Modeling 7, 1393–1512, 1986.
Robins, JM, VanderWeele, TJ, Richardson, TS. Discussion of “Causal effects in the presence of non compliance a latent variable interpretation” by Forcina, A. Metron LXIV (3), 288–298, 2007.
Shpitser, I, Pearl, J. What counterfactuals can be tested. Journal of Machine Learning Research 9, 1941–1979, 2008.
Spirtes, P, Glymour, C, Scheines, R. Causation, Prediction and Search. Lecture Notes in Statistics 81, Springer-Verlag.
![Page 76: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/76.jpg)
Details on Pearl’s Error
Pearl correctly states that using his Twin Network method (next slide) it may be shown that
Y(x0, x1) is not independent of X1, given Z and X0.
However, he then goes on to say (incorrectly):
“In the twin network model there is a d-connected path from X1 to Y(x0, x1). . . Therefore, [(3)] is not satisfied for Y(x0, x1) and X1.” [Ex. 11.3.3, p.353]
This is actually incorrect in two ways:
$Y(x_0, x_1) \not\perp\!\!\!\perp X_1 \mid Z, X_0$ does not imply $Y(x_0, x_1) \not\perp\!\!\!\perp X_1 \mid Z, X_0 = x_0$
d-separation is not complete for Twin Networks, so the presence of a d-connected path does not imply that the corresponding independence fails to hold.
![Page 77: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/77.jpg)
[Figure: the DAG on T, X, Z, Y; the twin network with nodes T, X, Z, Y, error terms U_T, U_X, U_Z, U_Y and counterfactual nodes T(z), X(z), z, Y(z); and a third graph with nodes T, X, Z, z, Y(z).]
T and Y(z) are d-connected given X in the twin-network, but in spite of this T ⊥⊥ Y(z) | X under the associated NPSEM-IE because X(z) = X, and T and Y(z) are d-separated given X in the twin-network.
![Page 78: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/78.jpg)
Mediation graph (I)
Intervention on Z alone.
[Figure: DAG on Z, X, Y ⇒ SWIG with nodes Z | z, X(z), Y(z).]
factorization:
P(Z,X(z), Y(z)) = P(Z)P(X(z))P(Y(z) | X(z))
modularity:
P(X(z)=x) = P(X=x | Z= z),
P(Y(z)=y | X(z)=x) = P(Y=y | X=x,Z= z).
d-separation gives:
Z ⊥⊥ X(z), Y(z).
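Combining the factorization with the two modularity statements gives the marginal distribution of Y(z) in terms of observed quantities. The derivation below is a sketch that uses only the displayed properties (and positivity of the conditioning events):

```latex
\begin{align*}
P\bigl(Y(z) = y\bigr)
  &= \sum_{x} P\bigl(Y(z) = y \mid X(z) = x\bigr)\, P\bigl(X(z) = x\bigr)
     && \text{(law of total probability)} \\
  &= \sum_{x} P(Y = y \mid X = x, Z = z)\, P(X = x \mid Z = z)
     && \text{(modularity)} \\
  &= P(Y = y \mid Z = z).
\end{align*}
```

This agrees with what one expects when Z is effectively randomized: the effect of setting Z to z on Y is read off by conditioning on Z = z.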
![Page 79: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/79.jpg)
Mediation graph (II)
Intervention on Z and X:
[Figure: DAG on Z, X, Y ⇒ SWIG with nodes Z | z, X(z) | x, Y(x, z).]
factorization:
P(Z,X(z), Y(x, z)) = P(Z)P(X(z))P(Y(x, z))
modularity:
P(X(z)=x) = P(X=x | Z= z),
P(Y(x, z)=y) = P(Y=y | X= x,Z= z).
d-separation gives:
Z ⊥⊥ X(z) ⊥⊥ Y(x, z)
![Page 80: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/80.jpg)
Importance of fixed nodes
Compare:
[Figure: DAG on Z, X, Y ⇒ SWIG with nodes Z | z, X(z), Y(z).]
P(Z,X(z), Y(z)) = P(Z)P(X(z))P(Y(z) | X(z))
P(Y(z)=y | X(z) = x) = P(Y=y | X=x,Z= z).
versus
[Figure: a second DAG on Z, X, Y ⇒ SWIG with nodes Z | z, X(z), Y(z).]
P(Z,X(z), Y(z)) = P(Z)P(X(z))P(Y(z) | X(z)),
P(Y(z)=y | X(z) = x) = P(Y=y | X=x)
![Page 81: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/81.jpg)
Importance of fixed nodes: leaving them out causes problems!
[Figure: DAG on Z, X, Y ⇒ graph with nodes Z, X(z), Y(z), with the fixed node z omitted.]
P(Z,X(z), Y(z)) = P(Z)P(X(z))P(Y(z) | X(z))
P(Y(z)=y | X(z) = x) = P(Y=y | X=x,Z= z).
versus
[Figure: a second DAG on Z, X, Y ⇒ graph with nodes Z, X(z), Y(z), with the fixed node z omitted.]
P(Z,X(z), Y(z)) = P(Z)P(X(z))P(Y(z) | X(z)),
P(Y(z)=y | X(z) = x) = P(Y=y | X=x)
Red nodes are needed in order to read off modularity property from G(a).
![Page 82: Single World Intervention Graphs (SWIGs)people.tuebingen.mpg.de/p/causality-perspect/slides/Thomas_swig.pdf · Single World Intervention Graphs (SWIGs): Unifying the Counterfactual](https://reader033.vdocuments.us/reader033/viewer/2022042218/5ec3d0a4e154f754a04c49ff/html5/thumbnails/82.jpg)
No direct effect graph (I)
[Figure: DAG on Z, X, Y ⇒ SWIG with nodes Z | z, X(z), Y(z).]
factorization:
P(Z,X(z), Y(z)) = P(Z)P(X(z))P(Y(z) | X(z))
modularity:
P(X(z)=x) = P(X=x | Z= z),
P(Y(z)=y | X(z) = x) = P(Y=y | X=x).
d-separation gives:
Z ⊥⊥ X(z), Y(z)