-
Jilles Vreeken
But, why? An introduction to causal inference from observational data
13 September 2019
-
Questions of the day
What is causation, how can we measure it, and how can we discover it?
2
-
Causation
"the relationship between something that happens or exists and the thing that causes it"
(Merriam-Webster) 3
-
Correlation vs. Causation
4
Correlation does not tell us anything about causality
Instead, we should talk about dependence.
-
Dependence vs. Causation
5
-
What is causal inference?
"reasoning to the conclusion that something is, or is likely to be, the cause of something else"
There are a godzillian different definitions of "cause" and "effect", and equally many inference frameworks; all require (strong) assumptions, and many are highly specific.
6
-
Causal Inference
7
-
Naïve approach
If P(cause) P(effect | cause) > P(effect) P(cause | effect)
then cause → effect
8
-
Naïve approach fails
If P(cause) P(effect | cause) = P(effect) P(cause | effect),
then cause → effect does not follow
9
Both sides are always equal, as they are simply factorizations of P(cause, effect)
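This is easy to verify numerically. A minimal sketch with a made-up joint distribution over two binary variables (all numbers hypothetical):

```python
import numpy as np

# Hypothetical joint distribution p(cause, effect) over binary variables;
# rows index cause, columns index effect.
p_joint = np.array([[0.30, 0.10],
                    [0.20, 0.40]])

p_cause = p_joint.sum(axis=1)                        # p(cause)
p_effect = p_joint.sum(axis=0)                       # p(effect)
p_effect_given_cause = p_joint / p_cause[:, None]    # p(effect | cause)
p_cause_given_effect = p_joint / p_effect[None, :]   # p(cause | effect)

lhs = p_cause[:, None] * p_effect_given_cause        # p(cause) p(effect | cause)
rhs = p_effect[None, :] * p_cause_given_effect       # p(effect) p(cause | effect)
print(np.allclose(lhs, rhs))                         # True: both equal p(cause, effect)
```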
-
Naïve approach
If P(cause) P(effect | cause) > P(effect) P(cause | effect)
then cause → effect
10
-
Naïve approach fails
If P(cause) P(effect | cause) > P(effect) P(cause | effect),
then cause → effect does not follow
11
Whether the inequality holds depends on the distribution and the domain size of the data, not on the causal effect
-
Naïve approach
If P(cause) P(effect | cause) > P(effect) P(cause | effect)
then cause → effect
12
-
Naïve approach fails
If P(effect | cause) / P(effect) > P(cause | effect) / P(cause)
then cause → effect
13
But do we know for sure that the left-hand side is higher when cause → effect?
What about differences in domain sizes, complexities of the distributions, etc.?
-
The Ultimate Test
Randomized controlled trials are the de-facto standard for determining whether T causes Y: treatment T ∈ {0, 1, …}, potential effect Y, and co-variates Z.
Simply put, we
1. gather a large population of test subjects
2. randomly split the population into two equally sized groups A and B, making sure that Z is equally distributed between A and B
3. apply treatment T = 0 to group A, and treatment T = 1 to group B
4. determine whether T and Y are dependent
If T and Y are dependent, we conclude that T causes Y
14
-
The Ultimate Test
Randomized controlled trials are the de-facto standard for determining whether T causes Y: treatment T ∈ {0, 1, …}, potential effect Y, and co-variates Z.
Simply put, we
1. gather a large population of test subjects
2. randomly split the population into two equally sized groups A and B, making sure that Z is equally distributed between A and B
3. apply treatment T = 0 to group A, and treatment T = 1 to group B
4. determine whether T and Y are dependent
If T and Y are dependent, we conclude that T causes Y
15
Ultimate, but not ideal
• often impossible or unethical
• large populations needed
• difficult to control for Z
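To make steps 1–4 concrete, here is a minimal simulated trial; the population size, the covariate, and the linear treatment effect are all invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000                                   # step 1: a large population
z = rng.normal(size=n)                       # covariate Z
t = rng.integers(0, 2, size=n)               # steps 2-3: random assignment of T;
                                             # this balances Z across groups in expectation
y = 0.8 * t + 0.5 * z + rng.normal(size=n)   # hypothetical effect of T (and Z) on Y

# step 4: test whether T and Y are dependent, e.g. with a two-sample t-test
res = stats.ttest_ind(y[t == 1], y[t == 0])
print(f"effect estimate: {y[t == 1].mean() - y[t == 0].mean():.2f}, "
      f"p-value: {res.pvalue:.2g}")
```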
-
Do, or do not
Observational p(y | x):
the distribution of Y given that we observe that variable X takes value x;
what we usually estimate, e.g. in regression or classification;
can be inferred from data using Bayes' rule: p(y | x) = p(x, y) / p(x)
Interventional p(y | do(x)):
the distribution of Y given that we set the value of variable X to x;
describes the distribution of Y we would observe if we intervened by artificially forcing X to take value x, but otherwise used the original data-generating process; not simply a conditional of p(x, y, …)!
the conditional distribution of Y we would get through a randomized controlled trial!
(Pearl, 1982) 16
-
Same old, same old?
In general, p(y | do(x)) and p(y | x) are not the same.
Let's consider my espresso machine: y is the actual pressure in the boiler, x is the pressure measured by the front gauge.
Now, if the gauge works well, p(y | x) will be unimodal around x. Intervening on the gauge, e.g. moving its needle up or down, however, has no effect on the actual pressure, and hence p(y | do(x)) = p(y).
17
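A small simulation of the gauge example, with invented Gaussian structural equations: conditioning on a reading of 10 concentrates our belief about the boiler pressure around 10, while forcing the needle to 10 tells us nothing new.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y = rng.normal(9.0, 1.0, size=n)         # actual boiler pressure (the cause)
x = y + rng.normal(0.0, 0.1, size=n)     # gauge reading = pressure + small noise

# Observational: p(y | x ~ 10) -- select the cases where we *see* the gauge at 10
obs = y[np.abs(x - 10.0) < 0.05]
print(f"E[y | x = 10]     ~ {obs.mean():.2f}")   # close to 10

# Interventional: do(x = 10) -- force the needle; the mechanism for y is untouched
y_do = rng.normal(9.0, 1.0, size=n)
print(f"E[y | do(x = 10)] ~ {y_do.mean():.2f}")  # just E[y], about 9
```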
-
What do you want?
Before we go into a lot more detail: what do we want?
If you just want to predict, p(y | x) is great; e.g. when "interpolating" Y between its cause and its effects is fine; also boring, because lots of cool methods exist.
If you want to act on x, you really want p(y | do(x)); for example, for drug administration, or for discovery; also exciting, as not so many methods exist.
18
-
Observational Data
Even if we cannot directly access p(y | do(x)), e.g. through randomized trials, it does exist.
The main point of causal inference and do-calculus is:
if we cannot measure p(y | do(x)) directly in a randomized trial, can we estimate it based on data we observed outside of a controlled experiment?
19
-
Standard learning setup
20
[Diagram: training data is sampled from the observable joint; the trained model p(y | x; θ) approximates the observational conditional p(y | x)]
-
Causal learning setup
21
[Diagram: training data is sampled from the observable joint, and the trained model p(y | x; θ) approximates the observational conditional p(y | x); what we want, however, is the interventional conditional p(y | do(x)) of the intervention joint, to which there is no direct path from the data]
-
Causal learning goal
22
[Diagram: from training data we learn a causal model of the observable joint; mutilating that causal model yields an emulated intervention joint, whose interventional conditional p̂(y | do(x)) approximates the true interventional conditional p(y | do(x)) of the (inaccessible) intervention joint]
-
Causal learning goal
23
[Diagram: as on the previous slide, with the emulated intervention joint now summarized by an estimable formula, e.g. E_{x'~p(x)} E_{z~p(z|x)} [ p(y | x', z) ]]
If through do-calculus we can derive an equivalent of p(y | do(x)) without any do's, we can estimate it from observational data alone, and we call p(y | do(x)) identifiable.
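The estimable formula above has the shape of the front-door adjustment. As an illustration, a toy sketch with an invented generating process containing a hidden confounder U of X and Y and a mediator Z; we evaluate the formula from observational samples alone, and check it against an actual (simulated) intervention:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
# Hypothetical generating process: U confounds X and Y, Z mediates X -> Y
u = rng.integers(0, 2, size=n)
x = (rng.random(n) < 0.2 + 0.6 * u).astype(int)            # U -> X
z = (rng.random(n) < 0.1 + 0.7 * x).astype(int)            # X -> Z
y = (rng.random(n) < 0.1 + 0.5 * z + 0.3 * u).astype(int)  # Z -> Y <- U

# E_{x'~p(x)} E_{z~p(z|x=1)} p(y | x', z), from observational data only
est = 0.0
for zv in (0, 1):
    p_z_given_x = ((z == zv) & (x == 1)).sum() / (x == 1).sum()
    for xv in (0, 1):
        p_y = y[(x == xv) & (z == zv)].mean()              # p(y=1 | x', z)
        est += p_z_given_x * (x == xv).mean() * p_y        # weighted by p(x')
print(f"front-door estimate of P(y=1 | do(x=1)): {est:.3f}")

# Ground truth, by actually intervening in the simulation
z_do = (rng.random(n) < 0.1 + 0.7 * 1).astype(int)
y_do = (rng.random(n) < 0.1 + 0.5 * z_do + 0.3 * u).astype(int)
print(f"P(y=1 | do(x=1)) by simulated intervention:  {y_do.mean():.3f}")
```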
-
Causal Discovery
24
-
Causal Discovery
25
[Figure: a causal graph over the variables X, Y, Q, S, T, Z, V, W]
-
Choices…
26
[Figure: four DAGs over X, Y, Z: the chain X → Y → Z, the reversed chain X ← Y ← Z, the fork X ← Y → Z, and the collider X → Y ← Z]
For the first three, X and Z are dependent, while X ⊥ Z | Y holds.
For the collider, X ⊥ Z holds, while X and Z become dependent given Y.
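These patterns are easy to check numerically; a sketch with linear-Gaussian data, using (partial) correlation as the dependence measure:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

def corr(a, b):                        # dependence proxy, fine for Gaussian data
    return abs(np.corrcoef(a, b)[0, 1])

def corr_given(a, b, c):               # partial correlation of a and b given c
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return corr(ra, rb)

# Chain X -> Y -> Z: X and Z dependent, but independent given Y
x = rng.normal(size=n)
y = x + rng.normal(size=n)
z = y + rng.normal(size=n)
print(f"chain:    corr(X,Z) = {corr(x, z):.2f}, corr(X,Z | Y) = {corr_given(x, z, y):.2f}")

# Collider X -> Y <- Z: X and Z independent, but dependent given Y
x = rng.normal(size=n)
z = rng.normal(size=n)
y = x + z + rng.normal(size=n)
print(f"collider: corr(X,Z) = {corr(x, z):.2f}, corr(X,Z | Y) = {corr_given(x, z, y):.2f}")
```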
-
Statistical Causality
Reichenbach's common cause principle links causality and probability:
if X and Y are statistically dependent, then either X → Y, Y → X, or there exists a Z such that X ← Z → Y.
When Z screens X and Y from each other, then given Z, X and Y become independent.
27
-
Causal Markov Condition
Any distribution generated by a Markovian model M can be factorized as
P(X_1, X_2, …, X_n) = ∏_i P(X_i | PA_i)
where X_1, X_2, …, X_n are the endogenous variables in M, and PA_i are (the values of) the endogenous "parents" of X_i in the causal diagram associated with M.
(Spirtes, Glymour & Scheines 1993; Pearl 2009) 28
-
Types of Nodes
29
[Figure: the causal graph over X, Y, Q, S, T, Z, V, W, highlighting, for node T: its parents Y and Q, its descendants S and V, and its non-descendants X, Z, and W]
-
Causal Discovery
30
[Figure: the causal graph, with endogenous variables Q, S, T, V, Y, W and exogenous variables X, Z]
-
Causal Discovery
31
[Figure: the causal graph, with endogenous variables Q, S, T, V, Y, W and exogenous variables X, Z]
Exogenous variable: a factor in a causal model that is not determined by the other variables in the system.
Endogenous variable: a factor in a causal model that is determined by the other variables in the system.
-
Causal Markov Condition
Any distribution generated by a Markovian model M can be factorized as
P(X_1, X_2, …, X_n) = ∏_i P(X_i | PA_i)
where X_1, X_2, …, X_n are the endogenous variables in M, and PA_i are (the values of) the endogenous "parents" of X_i in the causal diagram associated with M.
(Spirtes, Glymour & Scheines 1993; Pearl 2009) 32
-
In other words…
For all distinct variables X and Y in the variable set V, if X does not cause Y, then P(Y | X, PA_Y) = P(Y | PA_Y).
That is, we can weed out edges from a causal graph; we can identify DAGs up to Markov equivalence classes.
Which is great, although we are unable to choose among the members of such a class.
33
[Figure: Markov-equivalent DAGs over X, Y, Z]
-
Constraint-Based Causal Discovery
The PC algorithm, proposed by Peter Spirtes and Clark Glymour, is one of the most well-known and most relied-upon causal discovery algorithms.
It assumes the following:
1) the data-generating distribution has the causal Markov property on graph G
2) the data-generating distribution is faithful to G
3) every member of the population has the same distribution
4) all relevant variables are in G
5) there is only one graph G to which the distribution is faithful
34
-
Constraint-Based Causal Discovery
The PC algorithm, proposed by Peter Spirtes and Clark Glymour, is one of the most well-known and most relied-upon causal discovery algorithms.
Two main steps:
1) use conditional independence tests to determine the undirected causal graph (aka the skeleton)
2) apply constraint-based rules to direct (some of) the edges
35
-
Step 1: Discover the Skeleton
36
[Figure: the working graph over X, Y, Q, S, T, Z, V, W]
for k = 0 to n
  for all X, Y ∈ V with (x, y) ∈ E
    for all A ⊆ V of k nodes with (x, a), (y, a) ∈ E for every a ∈ A
      if X ⊥ Y | A
        remove (x, y) from E
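A compact sketch of this loop; the conditional independence test is a Fisher z-test on partial correlations (a choice that assumes roughly Gaussian data, not something PC prescribes), and, unlike a full PC implementation, it draws the conditioning sets from all remaining variables rather than only from the current neighbours:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def indep(data, i, j, cond, alpha=0.01):
    """Test X_i _||_ X_j given X_cond via partial correlation (Fisher z)."""
    sub = data[:, [i, j] + list(cond)]
    prec = np.linalg.pinv(np.cov(sub, rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    zval = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(zval))) > alpha   # True if independent

def skeleton(data):
    n_vars = data.shape[1]
    edges = {frozenset(e) for e in combinations(range(n_vars), 2)}
    for k in range(n_vars - 1):                 # size of the conditioning set A
        for e in list(edges):
            i, j = tuple(e)
            rest = [v for v in range(n_vars) if v not in e]
            for cond in combinations(rest, k):
                if indep(data, i, j, cond):     # X_i _||_ X_j | A
                    edges.discard(e)            # remove the edge (i, j)
                    break
    return edges
```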
-
Step 1: Discover the Skeleton
37
[Figure: the working graph; currently testing the pair (X, Y)]
X ⊥ Y | Z ?
X ⊥ Y | Z → no causal edge between X and Y
for k = 0 to n
  for all X, Y ∈ V with (x, y) ∈ E
    for all A ⊆ V of k nodes with (x, a), (y, a) ∈ E for every a ∈ A
      if X ⊥ Y | A
        remove (x, y) from E
-
Step 1: Discover the Skeleton
38
[Figure: the working graph, with further edges removed]
for k = 0 to n
  for all X, Y ∈ V with (x, y) ∈ E
    for all A ⊆ V of k nodes with (x, a), (y, a) ∈ E for every a ∈ A
      if X ⊥ Y | A
        remove (x, y) from E
-
Step 1: Discover the Skeleton
We now have the causal skeleton.
39
[Figure: the undirected causal skeleton over X, Y, Q, S, T, Z, V, W]
-
Step 2: Orientation
40
[Figure: the causal skeleton over X, Y, Q, S, T, Z, V, W]
We now identify all colliders A → B ← C, considering all relevant pairs once.
Rule 1: orient the unshielded triple A - B - C as A → B ← C whenever A ⊥ C but not A ⊥ C | B.
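A matching sketch of this collider step; it assumes the skeleton search also recorded, for every removed edge, the separating set that rendered its endpoints independent (the sepset mapping below is that hypothetical record):

```python
from itertools import combinations

def orient_colliders(edges, sepset):
    """Orient each unshielded triple a - b - c as a -> b <- c whenever
    b is not in the set that separated a and c during skeleton search."""
    directed = set()                                  # (tail, head) pairs
    nodes = {v for e in edges for v in e}
    for b in nodes:
        nbrs = [v for v in nodes if frozenset((v, b)) in edges]
        for a, c in combinations(nbrs, 2):
            if frozenset((a, c)) in edges:
                continue                              # shielded: a and c adjacent
            if b not in sepset.get(frozenset((a, c)), set()):
                directed.update({(a, b), (c, b)})     # a -> b <- c
    return directed
```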
-
Step 2: Orientation
41
[Figure: the skeleton, now with the identified colliders oriented]
We identify all colliders A → B ← C, considering all relevant pairs once (Rule 1: orient A - B - C as A → B ← C whenever A ⊥ C but not A ⊥ C | B).
-
Step 2: Orientation
We then iteratively apply Rules 2–4 until we cannot orient any more edges.
42
[Figure: the partially directed graph, plus schematics of the orientation rules. Rule 1 orients unshielded triples A - B - C as colliders A → B ← C when A ⊥ C but not A ⊥ C | B; Rules 2–4 propagate directions along triples A, B, C (and, for Rule 4, a fourth node D) without introducing new colliders or cycles]
-
Causal Inference
43
-
Causal Inference
We can find the causal skeleton using conditional independence tests.
44
[Figure: the causal skeleton over X, Y, Q, S, T, Z, V, W]
-
Causal Inference
We can find the causal skeleton using conditional independence tests, but only few edge directions.
45
[Figure: the skeleton with only some edges oriented]
-
Causal Inference
We can find the causal skeleton using conditional independence tests, but only few edge directions.
46
[Figure: the partially oriented graph; the direction of the edge between X and Y remains unknown (marked '?')]
-
Three is a crowd
Traditional causal inference methods rely on conditional independence tests, and hence require at least three observed variables.
That is, they cannot distinguish between X → Y and Y → X, as P(x) P(y | x) = P(y) P(x | y) are just factorisations of P(x, y).
Can we infer the causal direction between pairs?
47
-
Wiggle Wiggle
Let's take another look at the definition of causality: "the relationship between something that happens or exists and the thing that causes it".
From the do-calculus it follows that if X causes Y, we can wiggle Y by wiggling X, while we cannot wiggle X by wiggling Y.
But… we only have observational data jointly over X, Y, and cannot do any wiggling ourselves…
48
-
May The Noise Be With You
(Janzing et al. 2012) 49
[Figure: a scatter plot of y versus x, highlighting a y-value with large H(X | y) and large density p(y)]
-
May The Noise Be With You
(Janzing et al. 2012) 50
[Figure: the input density p(x), the function f(x), and the induced density p(y)]
"If the structure of the density p(x) is not correlated with the slope of f, then the flat regions of f induce peaks in p(y). The causal hypothesis Y → X is thus implausible, because the causal mechanism f⁻¹ appears to be adjusted to the 'input' distribution p(y)."
-
Independence of Input and Mechanism
If X causes Y, the marginal distribution of the cause, P(X), and the conditional distribution of the effect given the cause, P(Y | X), are independent.
That is, if X → Y, then P(X) contains no information about P(Y | X).
(Sgouritsa et al. 2015) 51
-
Additive Noise Models
Whenever the joint distribution P(X, Y) admits a model in one direction, i.e. there exists an f and N such that
Y = f(X) + N with N ⊥ X,
but does not admit the reversed model, i.e. for all g and Ñ we have
X = g(Y) + Ñ with Ñ not independent of Y,
we can infer X → Y.
(Shimizu et al. 2006, Hoyer et al. 2009, Zhang & Hyvärinen 2009) 52
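A sketch of how this is put to work, in the spirit of RESIT: regress in both directions and prefer the direction whose residuals are independent of the input. The k-nearest-neighbour regressor and the distance-correlation dependence measure are choices made for illustration; proper implementations typically use Gaussian-process regression and HSIC.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def dist_corr(a, b):
    """Sample distance correlation: near zero only if a and b are independent."""
    def centred(m):
        d = np.abs(m[:, None] - m[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centred(a), centred(b)
    dcov2 = max((A * B).mean(), 0.0)
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

def anm_score(x, y):
    """Fit y = f(x) + N; return how dependent the residuals are on x
    (low score = the ANM fits well in this direction)."""
    f = KNeighborsRegressor(n_neighbors=30).fit(x[:, None], y)
    return dist_corr(x, y - f.predict(x[:, None]))

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 1000)
y = x ** 3 + rng.uniform(-1, 1, 1000)       # hypothetical ground truth: X -> Y
print("X -> Y" if anm_score(x, y) < anm_score(y, x) else "Y -> X")
```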
-
ANMs and Identifiability
When are ANMs identifiable? What do we need to assume about the data-generating process for ANM-based inference to make sense? For which functions f and which noise distributions N are ANMs identifiable from observational data?
• linear functions with Gaussian noise: not identifiable
• linear functions with non-Gaussian noise: identifiable
• most cases of non-linear functions, with any noise: identifiable
(Shimizu et al. 2006, Hoyer et al. 2009, Zhang & Hyvärinen 2009) 53
-
Additive Noise Models
Whenever the joint distribution P(X, Y) admits a model in one direction, i.e. there exists an f and N such that
Y = f(X) + N with N ⊥ X,
but does not admit the reversed model, i.e. for all g and Ñ we have
X = g(Y) + Ñ with Ñ not independent of Y…
how do we determine or use this in practice?
(Shimizu et al. 2006, Hoyer et al. 2009, Zhang & Hyvärinen 2009) 54
-
Independence of Input and Mechanism
If X causes Y, the marginal distribution of the cause, P(X), and the conditional distribution of the effect given the cause, P(Y | X), are independent.
That is, if X → Y, then P(X) contains no information about P(Y | X).
(Sgouritsa et al. 2015) 55
-
Plausible Markov Kernels
In other words, if we observe that
P(cause) P(effect | cause)
is simpler than
P(effect) P(cause | effect),
then it is likely that cause → effect.
How do we robustly measure "simpler"?
(Sun et al. 2006, Janzing et al. 2012) 56
-
Kolmogorov Complexity
The Kolmogorov complexity K(x) of a binary string x is the length of the shortest program p* for a universal Turing machine U that generates x and halts.
(Kolmogorov, 1963) 57
-
Algorithmic Markov Condition
If X → Y, we have, up to an additive constant,
K(P(X)) + K(P(Y | X)) ≤ K(P(Y)) + K(P(X | Y)).
That is, we can do causal inference by identifying the factorization of the joint with the lowest Kolmogorov complexity.
(Janzing & Schölkopf, IEEE TIT 2010) 58
-
Univariate and Numeric
(Marx & Vreeken, Telling Cause from Effect using MDL-based Local and Global Regression, ICDM'17) 59
-
Two-Part MDL
The Minimum Description Length (MDL) principle:
given a model class ℳ, the best model M ∈ ℳ is the one that minimises
L(M) + L(D | M),
in which
L(M) is the length, in bits, of the description of the model M, and
L(D | M) is the length, in bits, of the description of the data when encoded using M.
(see, e.g., Rissanen 1978, 1983; Grünwald 2007) 60
-
MDL and Regression
61
[Figure: the same data fit with a line a₁x + a₀ and with a degree-10 polynomial a₁₀x¹⁰ + a₉x⁹ + … + a₀; MDL scores each fit by L(M) + L(D | M), where the errors make up L(D | M)]
-
Modelling the Data
We model Y as
Y = f(X) + N.
As f we consider linear, quadratic, cubic, exponential, and reciprocal functions, and we model the noise N using a zero-mean Gaussian. We choose the f that minimizes
L(Y | X) = L(f) + L(N).
62
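As a toy version of such a two-part score (SLOPE's actual encoding is considerably more refined; the flat 32-bits-per-parameter model cost below is an assumption made purely for illustration), here selecting among polynomial degrees:

```python
import numpy as np

def two_part_score(x, y, degree):
    """Crude L(f) + L(N): bits for the polynomial coefficients plus
    bits for the residuals under a zero-mean Gaussian."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    l_model = len(coeffs) * 32                      # ~32 bits per parameter
    sigma2 = max(resid.var(), 1e-12)
    nll_nats = 0.5 * len(x) * (np.log(2 * np.pi * sigma2) + 1)
    return l_model + nll_nats / np.log(2)           # total bits

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, 500)
y = 1.5 * x ** 2 - x + rng.normal(0, 0.3, 500)      # hypothetical quadratic data

for d in (1, 2, 3, 10):
    print(f"degree {d:2d}: {two_part_score(x, y, d):8.1f} bits")
# the quadratic should win: higher degrees fit the noise but cost extra bits
```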
-
SLOPE: computing L(Y | X)
63
-
Confidence and Significance
How certain are we?
Δ = ( L(X) + L(Y | X) ) − ( L(Y) + L(X | Y) ) = L(X → Y) − L(Y → X)
The larger the gap between L(X → Y) and L(Y → X), the more certain we are.
64
-
Confidence Robustness
65
[Figure: confidence-based comparison of RESIT (HSIC independence), IGCI (entropy), and SLOPE (compression)]
-
Putting SLOPE to the test
We first evaluate using an ANM with linear, cubic, or reciprocal functions, sampling X and the noise from uniform, Gaussian, binomial, or Poisson distributions.
(RESIT by Peters et al. 2014; IGCI by Janzing et al. 2012) 66
-
Performance on Benchmark Data
(Tübingen 97 univariate numeric cause-effect pairs, weighted)
67
-
Performance on Benchmark Data
(Tübingen 97 univariate numeric cause-effect pairs, weighted)
68
[Figure: inferences of state-of-the-art algorithms, ordered by confidence]
SLOPE is 85% accurate at α = 0.001.
-
Detecting Confounding
(Kaltenpoth & Vreeken, Telling Causal from Confounded, SDM'19) 71
-
Does Chocolate Consumption cause Nobel Prizes?
72
-
Reichenbach
If X and Y are statistically dependent, then either X → Y, Y → X, or there is a common cause Z with X ← Z → Y.
How can we distinguish these cases?
(Reichenbach, 1956) 73
-
Conditional Independence Tests
If we have measured everything relevant, then testing X ⊥ Y | Z for all possible Z lets us decide whether X → Y or X ← Z → Y.
Problem: it is impossible to measure everything relevant.
74
-
Why not just find a confounder?
We would like to be able to infer a Ẑ such that
X ⊥ Y | Ẑ
if and only if X and Y are actually confounded.
Problem: finding such a Ẑ is too easy; Ẑ = X always works.
75
-
Kolmogorov Complexity
K(P) is the length of the shortest program computing P:
K(P) = min_p { |p| : p ∈ {0,1}*, |U(p, x, q) − P(x)| ≤ 1/q }
This shortest program p* is the best compression of P.
76
-
From the Markov Condition…
An admissible causal network for X_1, …, X_n is a graph G satisfying
P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | PA_i)
Problem: how do we find a simple factorization?
77
-
…to the Algorithmic Markov Condition
The simplest causal network for X_1, …, X_n is the graph G* satisfying
K(P(X_1, …, X_n)) = ∑_{i=1}^{n} K(P(X_i | PA_i*))
Postulate: G* corresponds to the true generating process.
(Janzing & Schölkopf, 2010) 78
-
AMC with Confounding
We can also include latent variables:
K(P(X, Z)) = ∑_{i=1}^{n} K(P(X_i | PA_i′)) + ∑_{j=1}^{m} K(P(Z_j))
79
-
We don't know P(·)
p(X, Z) = p(Z) ∏_{i=1}^{n} p(X_i | Z)
In particular, we will use probabilistic PCA.
80
-
Kolmogorov is not computable
For data D, the Minimum Description Length principle identifies the best model M ∈ ℳ by minimizing
L(D, M) = L(M) + L(D | M),
which gives a statistically sound approximation to K.
(Grünwald 2007) 81
-
Decisions, decisions
If
L(X, Y | ℳ_conf) < L(X, Y | ℳ_caus)
then we consider X, Y to be confounded.
82
-
Decisions, decisions
If
L(X, Y | ℳ_conf) > L(X, Y | ℳ_caus)
then we consider X, Y to be causal.
The difference can be interpreted as confidence.
83
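A loose sketch of this decision rule (a simplification for illustration, not the encoding of the SDM'19 paper): score the confounded model with the probabilistic-PCA log-likelihood of (X, Y) jointly, as implemented by sklearn's PCA, and the causal model with a naive Gaussian code for X plus a linear regression of Y on X:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def gauss_bits(M):
    """Bits to encode each column under its own Gaussian."""
    M = np.atleast_2d(M.T).T                         # make 1-D input a column
    var = M.var(axis=0) + 1e-12
    return 0.5 * (np.log(2 * np.pi * var) + 1).sum() * len(M) / np.log(2)

def bits_confounded(X, y):
    """L(X, Y | M_conf): one latent factor generates all variables jointly."""
    D = np.column_stack([X, y])
    pca = PCA(n_components=1).fit(D)                 # probabilistic PCA model
    return -pca.score(D) * len(D) / np.log(2)        # total bits from avg log-lik

def bits_causal(X, y):
    """L(X, Y | M_caus): code X naively, then Y via linear regression on X."""
    resid = y - LinearRegression().fit(X, y).predict(X)
    return gauss_bits(X) + gauss_bits(resid)

rng = np.random.default_rng(6)
z = rng.normal(size=1000)                            # hidden confounder
X = np.column_stack([z + 0.3 * rng.normal(size=1000) for _ in range(3)])
y = z + 0.3 * rng.normal(size=1000)                  # Z -> X and Z -> Y, no X -> Y
b_conf, b_caus = bits_confounded(X, y), bits_causal(X, y)
print(f"L(conf) = {b_conf:.0f} bits, L(caus) = {b_caus:.0f} bits:",
      "confounded" if b_conf < b_caus else "causal")
```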
-
Confounding in Synthetic Data
84
-
Synthetic Data: Results
There are only two other works directly related to ours:
SA: confounding strength in linear models, using spectral analysis
ICA: confounding strength, using independent component analysis
(Janzing & Schölkopf, 2017, 2018) 85
-
Confounding in Genetic Networks
More realistically, we consider gene regulation data.
86
-
Optical Data
(Janzing & Schölkopf, 2017) 87
-
Optical Data
88
-
Wait! What about…
(Messerli 2012) 89
-
Conclusions
Causal inference from observational data
• necessary when making decisions, and for evaluating what-if scenarios
• impossible without assumptions about the causal model
Constraint-based causal discovery
• the traditional approach, based on conditional independence testing
• the PC algorithm discovers the causal skeleton and orients (some) edges
The algorithmic Markov condition works very well in practice
• prefer simple explanations over complex ones
• consider the complexity of both the model and the data
There is no causality without assumptions
• early work exists on relaxing these, e.g. on causal sufficiency and on determining confounding
90
-
Thank you!
"No causal claim can be established by a purely statistical method, be it propensity scores, regression, stratification, or any other distribution-based design"
(Pearl) 91