-
Jilles Vreeken
But, why? An introduction to
causal inference from observational data
13 September 2019
-
Questions of the day
What is causation, how can we measure it, and how can we discover it?
2
-
Causation
"the relationship between something that happens or exists
and the thing that causes it"
(Merriam-Webster) 3
-
Correlation vs. Causation
4
Correlation does not tell us anything about causality
Instead, we should talk about dependence.
-
Dependence vs. Causation
5
-
What is causal inference?
"reasoning to the conclusion that something is, or is likely to be, the cause of something else"
There are a godzillian different definitions of "cause" and "effect", and equally many inference frameworks; all require (strong) assumptions, and many are highly specific.
6
-
Causal Inference
7
-
Naïve approach
If P(cause) P(effect | cause) > P(effect) P(cause | effect)
then cause → effect
8
-
Naïve approach fails
P(cause) P(effect | cause) = P(effect) P(cause | effect) always holds,
so this rule can never conclude cause → effect
9
Both are equal, as they are simply factorizations of P(cause, effect)
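To see this concretely, here is a tiny numeric sketch (the joint distribution is made up purely for illustration): both factorizations reproduce the same joint exactly, so comparing them can never reveal a direction.

```python
# Illustrative only: a made-up 2x2 joint distribution over (cause, effect).
import numpy as np

joint = np.array([[0.30, 0.10],   # cause = 0; columns: effect = 0, effect = 1
                  [0.20, 0.40]])  # cause = 1

p_cause = joint.sum(axis=1)                        # P(cause)
p_effect = joint.sum(axis=0)                       # P(effect)
p_effect_given_cause = joint / p_cause[:, None]    # P(effect | cause)
p_cause_given_effect = joint / p_effect[None, :]   # P(cause | effect)

# Both products equal the joint itself, for any joint whatsoever.
assert np.allclose(p_cause[:, None] * p_effect_given_cause, joint)
assert np.allclose(p_effect[None, :] * p_cause_given_effect, joint)
```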
-
Naïve approach
If P(effect | cause) > P(cause | effect)
then cause → effect
10
-
Naïve approach fails
If P(effect | cause) > P(cause | effect)
then cause → effect
11
Whether this inequality holds depends on the distribution and the domain size of the data, not on the causal effect
-
Naïve approach
If P(effect | cause) / P(effect) > P(cause | effect) / P(cause)
then cause → effect
12
-
Naïve approach fails
If P(effect | cause) / P(effect) > P(cause | effect) / P(cause)
then cause → effect
13
But do we know for sure that the left-hand side is higher when cause → effect?
What about differences in domain sizes, complexities of the distributions, etc.?
-
The Ultimate Test
Randomized controlled trials are the de-facto standard for determining whether T causes Y: treatment T ∈ {0, 1, …}, potential effect Y, and covariates X
Simply put, we
1. gather a large population of test subjects
2. randomly split the population into two equally sized groups A and B, making sure that X is equally distributed between A and B
3. apply treatment T = 0 to group A, and treatment T = 1 to group B
4. determine whether T and Y are dependent
If T and Y turn out to be dependent, we conclude that T causes Y
14
-
The Ultimate Test
Randomized controlled trials are the de-facto standard for determining whether T causes Y (recipe as on the previous slide)
15
Ultimate, but not ideal
• Often impossible or unethical
• Large populations needed
• Difficult to control for X
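For intuition, a minimal simulation of the recipe above (all structural equations and numbers are hypothetical): because the treatment is randomized, any dependence we detect between T and Y must be causal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                        # covariate X
t = rng.integers(0, 2, size=n)                # steps 2-3: randomized treatment T
y = 0.8 * t + 0.5 * x + rng.normal(size=n)    # outcome Y depends on T and X

# step 4: test whether T and Y are dependent, e.g. with a two-sample t-test
res = stats.ttest_ind(y[t == 1], y[t == 0])
print(res.pvalue)   # vanishingly small p-value: T and Y are dependent, so T causes Y
```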
-
Do, or do not
Observational P(y | x): the distribution of Y given that we observe that variable X takes value x. This is what we usually estimate, e.g. in regression or classification, and it can be inferred from data using Bayes' rule: P(y | x) = P(x, y) / P(x).
Interventional P(y | do(x)): the distribution of Y given that we set the value of variable X to x. It describes the distribution of Y we would observe if we intervened by artificially forcing X to take value x, but otherwise used the original data-generating process of P(x, y, …). This is the conditional distribution of Y we would get through a randomized controlled trial!
(Pearl, 1982) 16
-
Same old, same old?
In general, P(y | do(x)) and P(y | x) are not the same.
Let's consider my espresso machine: y is the actual pressure in the boiler, x is the pressure measured by the front gauge.
Now, if the gauge works well, P(y | x) will be unimodal around x. Intervening on the gauge, e.g. moving its needle up or down, however, has no effect on the actual pressure, and hence P(y | do(x)) = P(y).
17
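A small simulation of the gauge example (numbers made up) shows the difference: conditioning on the gauge reading is highly informative about the boiler pressure, whereas forcing the needle tells us nothing beyond the marginal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y = rng.normal(9.0, 1.0, size=n)            # actual boiler pressure
x = y + rng.normal(0.0, 0.1, size=n)        # gauge reading = pressure + small error

# Observational P(y | x ~ 11): concentrated around 11
obs = y[np.abs(x - 11.0) < 0.05]
print(obs.mean(), obs.std())                # roughly 11, small spread

# Interventional P(y | do(x = 11)): forcing the needle rewrites x but leaves
# the boiler untouched, so P(y | do(x)) is simply the marginal P(y)
print(y.mean(), y.std())                    # roughly 9, original spread
```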
-
What do you want?
Before we go into a lot more detail: what do we want?
If you just want to predict, P(y | x) is great, e.g. whenever 'interpolating' Y between its causes and its effects is fine. Also boring, because lots of cool methods already exist.
If you want to act on X, you really want P(y | do(x)), for example for drug administration or discovery. Also exciting, because not so many methods exist.
18
-
Observational Data
Even if we cannot directly access P(y | do(x)), e.g. through randomized trials, it does exist.
The main point of causal inference and do-calculus is:
If we cannot measure P(y | do(x)) directly in a randomized trial, can we estimate it
based on data we observed outside of a controlled experiment?
19
-
Standard learning setup
20
[Diagram: training data, sampled from the observable joint, is used to fit a model P(y | x; θ) that approximates the observational conditional P(y | x)]
-
Causal learning setup
21
[Diagram: from training data sampled from the observable joint we can fit a model P(y | x; θ) for the observational conditional P(y | x); what we actually want is the interventional conditional P(y | do(x)) of the intervention joint, and it is unclear how to get there from observational data alone]
-
Causal learning goal
22
[Diagram: from the observable joint we learn a causal model; mutilating that causal model emulates the intervention joint, which yields an emulated interventional conditional P̂(y | do(x)) that should match the true interventional conditional P(y | do(x))]
-
Causal learning goal
23
[Diagram as on the previous slide]
Estimable formula: E_{x' ∼ P(x)} E_{z ∼ P(z | x)} [ P(y | x', z) ]
If through do-calculus we can derive an equivalent of P(y | do(x)) without any do's, we can estimate it from observational data alone, and we call P(y | do(x)) identifiable.
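As a sanity check, here is a toy example (all structural equations invented for illustration) of a setting where this particular do-free expression is valid: an unobserved U confounds X and Y, while X acts on Y only through an observed mediator Z, a front-door-style graph. The plug-in estimate computed from observational samples of (X, Z, Y) alone closely matches the ground truth obtained by actually forcing X.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
u = rng.random(n) < 0.5                        # unobserved confounder
x = rng.random(n) < 0.2 + 0.6 * u              # treatment, confounded by U
z = rng.random(n) < 0.1 + 0.7 * x              # observed mediator
y = rng.random(n) < 0.1 + 0.5 * z + 0.3 * u    # outcome

p = lambda mask: mask.mean()                   # empirical probability of an event

# plug-in estimate of P(Y=1 | do(X=1)) from observational quantities only:
# sum_z P(z | x=1) sum_x' P(x') P(y=1 | x', z)
est = sum(
    p(z[x == 1] == zv) *
    sum(p(x == xv) * p(y[(x == xv) & (z == zv)]) for xv in (0, 1))
    for zv in (0, 1)
)

# ground truth by actually intervening: force X = 1, keep everything else
z_do = rng.random(n) < 0.1 + 0.7 * 1
y_do = rng.random(n) < 0.1 + 0.5 * z_do + 0.3 * u
print(est, y_do.mean())                        # the two values should be close
```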
-
Causal Discovery
24
-
Causal Discovery
25
[Example causal graph over the variables X, Y, Q, S, T, Z, V, W]
-
Choices…
26
[The four DAGs over X, Y, Z: the two chains X → Y → Z and X ← Y ← Z, the fork X ← Y → Z, and the collider X → Y ← Z]
For the first three, X and Z are dependent, while X ⊥ Z | Y holds.
For the collider, X ⊥ Z holds, while X and Z become dependent given Y.
-
Statistical Causality
Reichenbach's common cause principle links causality and probability:
if X and Y are statistically dependent, then either X causes Y, Y causes X, or there is a common cause Z of both.
When Z screens X and Y off from each other, then given Z, X and Y become independent.
27
[Diagrams of the three cases: X → Y, Y → X, and X ← Z → Y]
-
Causal Markov Condition
Any distribution generated by a Markovian model M can be factorized as
P(X1, X2, …, Xn) = ∏_j P(Xj | PAj)
where X1, X2, …, Xn are the endogenous variables in M, and PAj are (the values of) the endogenous 'parents' of Xj in the causal diagram associated with M
(Spirtes, Glymour, Scheines 1982; Pearl 2009) 28
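A minimal numeric sketch of the condition (probabilities made up) for the chain X → Y → Z: the joint is the product of the per-node conditionals, and as a consequence X and Z are independent given Y.

```python
import numpy as np

p_x = np.array([0.6, 0.4])                           # P(X)
p_y_x = np.array([[0.7, 0.3], [0.2, 0.8]])           # P(Y | X), row = value of X
p_z_y = np.array([[0.9, 0.1], [0.4, 0.6]])           # P(Z | Y), row = value of Y

# joint[x, y, z] = P(x) P(y | x) P(z | y)
joint = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]
assert np.isclose(joint.sum(), 1.0)

# conditional independence X _||_ Z | Y:  P(x, z | y) = P(x | y) P(z | y)
for yv in (0, 1):
    p_xz_y = joint[:, yv, :] / joint[:, yv, :].sum()
    assert np.allclose(p_xz_y,
                       np.outer(p_xz_y.sum(axis=1), p_xz_y.sum(axis=0)))
```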
-
Types of Nodes
29
[Example causal graph over X, Y, Q, S, T, Z, V, W; relative to the highlighted node:]
Parents: Y, Q
Descendants: S, V
Non-descendants: X, Z, W
-
Causal Discovery
30
[Example causal graph]
Endogenous variables: Q, S, T, V, Y, W
Exogenous variables: X, Z
-
Causal Discovery
31
[Example causal graph]
Endogenous variables: Q, S, T, V, Y, W
Exogenous variables: X, Z
Exogenous variable: a factor in a causal model that is not determined by other variables in the system
Endogenous variable: a factor in a causal model that is determined by other variables in the system
-
Causal Markov Condition (recap)
Any distribution generated by a Markovian model M can be factorized as P(X1, X2, …, Xn) = ∏_j P(Xj | PAj), where X1, …, Xn are the endogenous variables in M and PAj are the endogenous parents of Xj in the causal diagram associated with M
(Spirtes, Glymour, Scheines 1982; Pearl 2009) 32
-
In other words…
For all distinct variables X and Y in the variable set V, if X does not cause Y, then P(Y | X, PAY) = P(Y | PAY).
That is, we can weed out edges from a causal graph — we can identify DAGs up to their Markov equivalence class.
Which is great, although we are unable to choose among the members of such a class.
33
[Diagrams of Markov-equivalent DAGs]
-
Constraint-Based Causal Discovery
The PC algorithm, proposed by Peter Spirtes and Clark Glymour, is one of the best-known and most relied-upon causal discovery algorithms.
It assumes the following:
1) the data-generating distribution has the causal Markov property on graph G
2) the data-generating distribution is faithful to G
3) every member of the population has the same distribution
4) all relevant variables are in G
5) there is only one graph G to which the distribution is faithful
34
-
Constraint-Based Causal Discovery
The PC algorithm, proposed by Peter Spirtes and Clark Glymour, is one of the best-known and most relied-upon causal discovery algorithms.
It has two main steps:
1) use conditional independence tests to determine the undirected causal graph (aka the skeleton)
2) apply constraint-based rules to direct (some of) the edges
35
-
Step 1: Discover the Skeleton
36
[Example graph over Q, S, Z, V, W, X, Y, T]
for k = 0 to n
  for all X, Y ∈ V with (x, y) ∈ E
    for all A ⊆ V of k nodes with (x, a), (y, a) ∈ E
      if X ⊥ Y | A
        remove (x, y) from E
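Below is a compact, illustrative Python sketch of this skeleton phase. It assumes roughly Gaussian data, so that conditional independence can be tested with a Fisher-z partial-correlation test; the helper names (ci_test, skeleton) are ours, not from any particular library.

```python
from itertools import combinations

import numpy as np
from scipy import stats

def ci_test(data, i, j, cond, alpha=0.01):
    """Fisher-z partial-correlation test of X_i _||_ X_j | X_cond."""
    idx = [i, j] + list(cond)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])     # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(len(data) - len(cond) - 3)
    return 2 * (1 - stats.norm.cdf(abs(z))) > alpha        # True = independent

def skeleton(data):
    d = data.shape[1]
    edges = {frozenset(e) for e in combinations(range(d), 2)}
    for k in range(d - 1):                                  # size of conditioning set A
        for x, y in [tuple(e) for e in edges]:
            if frozenset((x, y)) not in edges:
                continue
            neigh = [z for z in range(d) if z != y and frozenset((x, z)) in edges]
            for cond in combinations(neigh, k):
                if ci_test(data, x, y, cond):
                    edges.discard(frozenset((x, y)))        # no causal edge x - y
                    break
    return edges
```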
-
Step 1: Discover the Skeleton
37
[Example graph]
X ⊥ Y | A ? If we find a conditioning set A with X ⊥ Y | A, there is no causal edge between X and Y, and (x, y) is removed from E (same loop as on the previous slide).
-
Step 1: Discover the Skeleton
38
[Example graph after more edges have been removed by the same loop]
-
We now have the causal skeleton
Step 1: Discover the Skeleton
39
[The resulting causal skeleton over X, Y, Q, S, T, Z, V, W]
-
Step 2: Orientation
40
[Causal skeleton]
We now identify all colliders A → B ← C, considering each relevant pair once.
Rule 1: for A − B − C with A and C non-adjacent, if A ⊥ C but A and C are dependent given B, orient it as the collider A → B ← C.
-
Step 2: Orientation
41
[Causal skeleton with the identified colliders oriented]
(Rule 1 as on the previous slide: orient A − B − C as A → B ← C when A ⊥ C but A and C are dependent given B)
-
We then iteratively apply Rules 2–4 until we cannot orient any more edges
Step 2: Orientation
42
[Example graph with further edges oriented]
[Diagrams of Rules 1–4: each matches a small pattern over three or four nodes and orients one additional edge; Rule 1 is the collider rule: A ⊥ C and A, C dependent given B ⇒ A → B ← C]
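For completeness, a matching sketch of the collider step (Rule 1). It assumes the skeleton phase also recorded, for every removed edge, the separating set that rendered the pair independent; function and variable names are ours.

```python
def orient_colliders(edges, sepsets, n_vars):
    """Rule 1: for every a - b - c with a and c non-adjacent, orient a -> b <- c
    whenever b is not in the set that separated a and c (equivalently: a and c are
    marginally independent but become dependent given b)."""
    adjacent = lambda u, v: frozenset((u, v)) in edges
    arrows = set()                                           # (u, v) means u -> v
    for b in range(n_vars):
        nbrs = [a for a in range(n_vars) if a != b and adjacent(a, b)]
        for i, a in enumerate(nbrs):
            for c in nbrs[i + 1:]:
                if not adjacent(a, c) and b not in sepsets.get(frozenset((a, c)), ()):
                    arrows.add((a, b))
                    arrows.add((c, b))
    return arrows
```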
-
Causal Inference
43
-
We can find the causal skeleton usingconditional independence tests
Causal Inference
44
[Causal skeleton over X, Y, Q, S, T, Z, V, W]
-
We can find the causal skeleton usingconditional independence tests,
but only few edge directions
Causal Inference
45
[Causal skeleton with only a few edges oriented]
-
We can find the causal skeleton usingconditional independence tests,
but only few edge directions
Causal Inference
46
[Causal skeleton in which, among others, the direction of the edge between X and Y remains unknown]
-
Three is a crowd
Traditional causal inference methods rely on conditional independence tests and hence require at least three observed variables.
That is, they cannot distinguish between X → Y and Y → X,
as P(x) P(y | x) = P(y) P(x | y) are just factorisations of P(x, y).
Can we infer the causal direction between pairs?
47
-
Wiggle Wiggle
Let's take another look at the definition of causality:
"the relationship between something that happens or exists and the thing that causes it"
From the do-calculus it follows that if X causes Y, we can wiggle Y by wiggling X,
while we cannot wiggle X by wiggling Y.
But… we only have observational data jointly over X, Y, and cannot do any wiggling ourselves…
48
-
May The Noise Be With You
(Janzing et al. 2012) 49
[Scatter plot of x versus y, highlighting a y-value with large H(X | y) and large density p(y)]
-
May The Noise Be With You
(Janzing et al. 2012) 50
[Figure showing p(x), the function f(x), and the induced p(y)]
"If the structure of the density p(x) is not correlated with the slope of f, then the flat regions of f induce peaks in p(y).
The causal hypothesis Y → X is thus implausible, because the causal mechanism f⁻¹ appears to be adjusted to the 'input' distribution p(y)."
-
Independence of Input and Mechanism
If X causes Y, the marginal distribution of the cause, P(X),
and the conditional distribution of the effect given the cause, P(Y | X),
are independent.
That is, if X → Y, then P(X) contains no information about P(Y | X).
(Sgouritsa et al 2015) 51
-
Additive Noise Models
Whenever the joint distribution P(X, Y) admits a model in one direction, i.e. there exist f and N such that
Y = f(X) + N with N ⊥ X,
but does not admit the reversed model, i.e. there exist no g and Ñ such that
X = g(Y) + Ñ with Ñ ⊥ Y,
we can infer X → Y.
(Shimizu et al. 2006, Hoyer et al. 2009, Zhang & HyvΓ€rinen 2009) 52
-
ANMs and Identifiability
When are ANMs identifiable? What do we need to assume about the data-generating process for ANM-based inference to make sense? For which functions f and which noise distributions N are ANMs identifiable from observational data?
• Linear functions with Gaussian noise: not identifiable
• Linear functions with non-Gaussian noise: identifiable
• Most cases of non-linear functions, with any noise: identifiable
(Shimizu et al. 2006, Hoyer et al. 2009, Zhang & HyvΓ€rinen 2009) 53
-
Additive Noise Models
Whenever the joint distribution P(X, Y) admits a model Y = f(X) + N with N ⊥ X in one direction, but does not admit the reversed model X = g(Y) + Ñ with Ñ ⊥ Y, we can infer X → Y.
How do we determine or use this in practice?
(Shimizu et al. 2006, Hoyer et al. 2009, Zhang & HyvΓ€rinen 2009) 54
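One way to use it in practice, sketched below with deliberately simple ingredients (not the exact estimators of the cited papers): regress in both directions with a flexible regressor and measure how strongly the residuals depend on the putative cause, here via a plain biased HSIC statistic with fixed-bandwidth RBF kernels. The direction whose residuals look more independent of the input is preferred.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def hsic(a, b, sigma=1.0):
    """Biased HSIC statistic with RBF kernels; larger means more dependent."""
    n = len(a)
    gram = lambda v: np.exp(-((v[:, None] - v[None, :]) ** 2) / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    K, L = H @ gram(a) @ H, H @ gram(b) @ H
    return np.trace(K @ L) / (n - 1) ** 2

def anm_score(cause, effect):
    """Dependence between the putative cause and the regression residuals."""
    f = GradientBoostingRegressor().fit(cause.reshape(-1, 1), effect)
    resid = effect - f.predict(cause.reshape(-1, 1))
    z = lambda v: (v - v.mean()) / v.std()
    return hsic(z(cause), z(resid))

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, 500)
y = x ** 3 + rng.normal(0, 1, size=500)          # generated with X as the cause
print("X -> Y" if anm_score(x, y) < anm_score(y, x) else "Y -> X")
```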
-
Independence of Input and Mechanism
If X causes Y, the marginal distribution of the cause, P(X),
and the conditional distribution of the effect given the cause, P(Y | X),
are independent.
That is, if X → Y, then P(X) contains no information about P(Y | X).
(Sgouritsa et al 2015) 55
-
Plausible Markov Kernels
In other words, if we observe that
P(cause) P(effect | cause)
is simpler than
P(effect) P(cause | effect),
then it is likely that cause → effect.
How do we robustly measure 'simpler'?
(Sun et al. 2006, Janzing et al. 2012) 56
-
Kolmogorov Complexity
K(x)
The Kolmogorov complexity of a binary string x is the length of the shortest program p*
for a universal Turing machine U that generates x and halts.
(Kolmogorov, 1963) 57
-
Algorithmic Markov Condition
If X → Y, we have, up to an additive constant,
K(P(X)) + K(P(Y | X)) ≤ K(P(Y)) + K(P(X | Y))
That is, we can do causal inference by identifying the factorization of the joint with the lowest Kolmogorov complexity.
(Janzing & SchΓΆlkopf, IEEE TIT 2012) 58
-
Univariate and Numeric
59 (Marx & Vreeken, Telling Cause from Effect using MDL-based Local and Global Regression, ICDM'17)
-
Two-Part MDL
The Minimum Description Length (MDL) principle: given a model class ℳ, the best model M ∈ ℳ is the M that minimises
L(M) + L(D | M)
in which
L(M) is the length, in bits, of the description of M, and
L(D | M) is the length, in bits, of the description of the data when encoded using M.
(see, e.g., Rissanen 1978, 1983, GrΓΌnwald, 2007) 60
-
MDL and Regression
61
[Figure: two candidate fits to the same data, a line a1·x + a0 and a degree-10 polynomial a10·x^10 + a9·x^9 + … + a0, compared by L(M) + L(D | M), where the data part encodes the errors]
-
Modelling the Data
We model Y as
Y = f(X) + N
As f we consider linear, quadratic, cubic, exponential, and reciprocal functions, and we model the noise as a zero-mean Gaussian. We choose the f that minimizes
L(Y | X) = L(f) + L(N)
62
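A rough sketch of such a two-part score (not the actual SLOPE encoding): charge a fixed number of bits per parameter for L(f), and take the Gaussian negative log-likelihood of the residuals, in bits, as L(N). Polynomials stand in here for the function classes listed above.

```python
import numpy as np

def mdl_score(x, y, degree, bits_per_param=32):
    coeffs = np.polyfit(x, y, degree)                  # fit the candidate function
    resid = y - np.polyval(coeffs, x)
    sigma = resid.std() + 1e-12
    loglik = -0.5 * np.log(2 * np.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    data_bits = -loglik.sum() / np.log(2)              # L(N): residuals under a Gaussian
    model_bits = bits_per_param * (degree + 2)         # L(f): coefficients plus sigma
    return model_bits + data_bits

rng = np.random.default_rng(4)
x = rng.uniform(-3, 3, 300)
y = 0.5 * x ** 3 - x + rng.normal(0, 1, size=300)      # cubic ground truth

for d in (1, 2, 3, 5):
    print(d, round(mdl_score(x, y, d)))                # the cubic should score best
```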
-
SLOPE — computing L(Y | X)
63
-
Confidence and Significance
How certain are we?
64
The larger the difference between L(X → Y) = L(X) + L(Y | X) and L(Y → X) = L(Y) + L(X | Y), the more certain we are.
-
Confidence Robustness
65
[Plots comparing the confidence scores of RESIT (HSIC independence), IGCI (entropy), and SLOPE (compression)]
-
Putting SLOPE to the test
We first evaluate using an ANM with linear, cubic, or reciprocal functions, sampling X and the noise as indicated.
(Resit by Peters et al, 2014; IGCI by Janzing et al 2012) 66
[Result plots for Uniform, Gaussian, Binomial, and Poisson sampling]
-
Performance on Benchmark Data (Tübingen benchmark: 97 univariate numeric cause-effect pairs, weighted)
67
-
Performance on Benchmark Data (Tübingen benchmark: 97 univariate numeric cause-effect pairs, weighted)
68
Inferences of state-of-the-art algorithms, ordered by confidence values.
SLOPE is 85% accurate at significance threshold α = 0.001.
-
Detecting Confounding
71 (Kaltenpoth & Vreeken, Telling Causal from Confounded, SDM'19)
-
Does Chocolate Consumption cause Nobel Prizes?
72
-
Reichenbach
If X and Y are statistically dependent, then either X causes Y, Y causes X, or a common cause Z causes both.
How can we distinguish these cases?
(Reichenbach, 1956) 73
[Diagrams of the three cases: X → Y, Y → X, and X ← Z → Y]
-
Conditional Independence Tests
If we have measured everything relevant, then testing X ⊥ Y | Z for all possible Z lets us decide whether X causes Y or whether a common cause Z confounds the two.
Problem: it is impossible to measure everything relevant.
74
[Diagrams: X → Y versus X ← Z → Y]
-
Why not just find a confounder?
We would like to be able to infer a Ẑ such that
X ⊥ Y | Ẑ
if and only if X and Y are actually confounded.
Problem: finding such a Ẑ is too easy — Ẑ = X always works.
75
-
Kolmogorov Complexity
K(P) is the length of the shortest program computing P:
K(P) = min_p { |p| : p ∈ {0,1}*, |U(p, x, q) − P(x)| ≤ 1/q }
This shortest program p* is the best compression of P.
76
-
From the Markov Condition…
An admissible causal network for X1, …, Xm is a DAG G satisfying
P(X1, …, Xm) = ∏_{i=1..m} P(Xi | PAi)
Problem: how do we find a simple factorization?
77
-
…to the Algorithmic Markov Condition
The simplest causal network for X1, …, Xm is the G* satisfying
K(P(X1, …, Xm)) = ∑_{i=1..m} K(P(Xi | PA*_i))
Postulate: G* corresponds to the true generating process.
(Janzing & SchΓΆlkopf, 2010) 78
-
AMC with Confounding
We can also include latent variables:
K(P(X, Y)) = ∑_{i=1..m} K(P(Xi | PA′_i)) + ∑_{j=1..k} K(P(Zj))
where the parent sets PA′ may now include the latent variables Zj.
79
-
We don't know P(·)
We therefore model
P(X, Y) = P(Z) ∏_{i=1..m} P(Xi | Z)
In particular, we will use probabilistic PCA.
80
-
Kolmogorov is not computable
For data D, the Minimum Description Length principle identifies the best model M ∈ ℳ by minimizing
L(D, M) = L(M) + L(D | M)
which gives a statistically sound approximation to K.
(GrΓΌnwald 2007) 81
-
Decisions, decisions
If
L(X, Y | ℳ_confounded) < L(X, Y | ℳ_causal)
then we consider X, Y to be confounded.
82
-
Decisions, decisions
If
L(X, Y | ℳ_confounded) > L(X, Y | ℳ_causal)
then we consider X, Y to be causal.
The difference can be interpreted as a confidence.
83
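A very rough sketch of this decision rule (not the encoding of the paper; every modelling choice below is a simplification): approximate L(X, Y | ℳ_causal) by a probabilistic PCA model of X plus a linear-Gaussian model of Y given X, approximate L(X, Y | ℳ_confounded) by a probabilistic PCA model in which a single latent factor generates X and Y jointly, and compare. Description lengths are replaced by negative log-likelihoods in bits plus a crude per-parameter cost; X is assumed to be multivariate.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

def nll_bits_gaussian(resid):
    sigma = resid.std() + 1e-12
    loglik = -0.5 * np.log(2 * np.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return -loglik.sum() / np.log(2)

def codelength_causal(X, y, bits_per_param=32):
    pca = PCA(n_components=1).fit(X)                 # cheap probabilistic model of P(X)
    lx = -pca.score(X) * len(X) / np.log(2)          # score() = mean log-likelihood
    reg = LinearRegression().fit(X, y)
    ly = nll_bits_gaussian(y - reg.predict(X))       # P(Y | X) as linear-Gaussian
    return lx + ly + bits_per_param * (2 * X.shape[1] + 3)   # crude parameter count

def codelength_confounded(X, y, bits_per_param=32):
    XY = np.column_stack([X, y])
    pca = PCA(n_components=1).fit(XY)                # one latent factor generates X and Y
    l_joint = -pca.score(XY) * len(XY) / np.log(2)
    return l_joint + bits_per_param * (2 * XY.shape[1] + 1)  # crude parameter count

def looks_confounded(X, y):
    return codelength_confounded(X, y) < codelength_causal(X, y)
```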
-
Confounding in Synthetic Data
84
-
Synthetic Data: Results
There are only two other works directly related to ours:
SA: confounding strength in linear models using spectral analysis
ICA: confounding strength using independent component analysis
(Janzing & SchΓΆlkopf, 2017, 2018) 85
-
Confounding in Genetic Networks
More realistically, we consider gene regulation data.
86
-
Optical Data
(Janzing & SchΓΆlkopf, 2017) 87
-
Optical Data
88
-
Wait! What about…
(Messerli 2012) 89
-
Conclusions
Causal inference from observational data: necessary when making decisions and to evaluate what-if scenarios; impossible without assumptions about the causal model.
Constraint-based causal discovery: the traditional approach, based on conditional independence testing; the PC algorithm discovers the causal skeleton and orients (some) edges.
Algorithmic Markov condition: works very well in practice; prefer simple explanations over complex ones; consider the complexity of both the model and the data.
There is no causality without assumptions: early work exists on relaxing e.g. causal sufficiency and on determining confounding.
90
-
Thank you!
"No causal claim can be established by a purely statistical method, be it propensity scores,
regression, stratification, or any other distribution-based design"
91 (Pearl)