![Page 1: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/1.jpg)
Challenges in causality: Results of the WCCI 2008 challenge
Isabelle Guyon, Clopinet
Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ.
André Elisseeff and Jean-Philippe Pellet, IBM Zürich
Gregory F. Cooper, University of Pittsburgh
Peter Spirtes, Carnegie Mellon
![Page 2: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/2.jpg)
Causal discovery
What affects…
…your health?
…climate changes?
…the economy?
Which actions will have beneficial effects?
![Page 3: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/3.jpg)
What is causality?
• Many definitions:
– Science
– Philosophy
– Law
– Psychology
– History
– Religion
– Engineering
• “Cause is the effect concealed, effect is the cause revealed” (Hindu philosophy)
![Page 4: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/4.jpg)
The system
Systemic causality
External agent
![Page 5: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/5.jpg)
Difficulty
• A lot of “observational” data.
Correlation ≠ Causality!
• Experiments are often needed, but:
– Costly
– Unethical
– Infeasible
![Page 6: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/6.jpg)
Causality workbench
http://clopinet.com/causality
![Page 7: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/7.jpg)
Our approach
What is the causal question?
Why should we care?
What is hard about it?
Is this solvable?
Is this a good benchmark?
![Page 8: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/8.jpg)
Four tasks
Toy datasets
Challenge datasets
![Page 9: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/9.jpg)
On-line feed-back
![Page 10: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/10.jpg)
Toy Examples
![Page 11: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/11.jpg)
Causality assessment with manipulations
LUCAS0: natural
[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day.]
![Page 12: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/12.jpg)
Causality assessment with manipulations
LUCAS1: manipulated
[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day.]
![Page 13: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/13.jpg)
Causality assessment with manipulations
LUCAS2: manipulated
[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day.]
![Page 14: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/14.jpg)
Goal driven causality
• We define: V = variables of interest (e.g. MB, direct causes, ...).
• Participants return: S = selected subset (ordered or not), e.g. 4 11 2 3 1.
• We assess causal relevance: Fscore = f(V, S).
[Figure: example causal graph with nodes numbered 0 to 11.]
![Page 15: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/15.jpg)
Causality assessment without manipulation?
![Page 16: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/16.jpg)
Using artificial “probes”
LUCAP0: natural
[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day, plus probes P1, P2, P3, …, PT.]
![Page 17: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/17.jpg)
Using artificial “probes”
LUCAP0: natural
[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day, plus probes P1, P2, P3, …, PT.]
![Page 18: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/18.jpg)
Using artificial “probes”
LUCAP1&2: manipulated
[Figure: causal graph with nodes Anxiety, Peer Pressure, Smoking, Genetics, Yellow Fingers, Lung Cancer, Attention Disorder, Allergy, Coughing, Fatigue, Car Accident, Born an Even Day, plus probes P1, P2, P3, …, PT.]
![Page 19: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/19.jpg)
Scoring using “probes”
• What we can compute (Fscore):
– Negative class = probes (here, all “non-causes”, all manipulated).
– Positive class = other variables (may include causes and non causes).
• What we want (Rscore):
– Positive class = causes.
– Negative class = non-causes.
• What we get (asymptotically):
Fscore = (N_TruePos / N_Real) · Rscore + 0.5 · (N_TrueNeg / N_Real)
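The relation between Fscore and Rscore can be checked numerically. The sketch below is an illustration only, not the challenge's scoring code: it assumes both scores are AUCs, that real non-causes and probes receive scores from the same distribution, and that causes score higher on average. The Gaussian score distributions and the sample sizes are made-up assumptions.

```python
import numpy as np

def auc(pos, neg):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    neg_sorted = np.sort(neg)
    below = np.searchsorted(neg_sorted, pos, side="left")
    at_or_below = np.searchsorted(neg_sorted, pos, side="right")
    return (below + at_or_below).mean() / (2 * len(neg))

rng = np.random.default_rng(0)
n_true_pos, n_true_neg, n_probes = 4000, 6000, 10000  # hypothetical sizes
n_real = n_true_pos + n_true_neg

# Hypothetical relevance scores assigned by some feature-ranking method.
causes = rng.normal(1.0, 1.0, n_true_pos)      # real variables that are causes
non_causes = rng.normal(0.0, 1.0, n_true_neg)  # real variables that are not
probes = rng.normal(0.0, 1.0, n_probes)        # artificial non-causes

rscore = auc(causes, non_causes)                           # what we want
fscore = auc(np.concatenate([causes, non_causes]), probes)  # what we can compute
predicted = (n_true_pos / n_real) * rscore + 0.5 * (n_true_neg / n_real)

print(f"Fscore={fscore:.3f}  predicted from Rscore={predicted:.3f}")
```

With large samples the two printed values should agree closely, which is the sense in which the relation holds asymptotically.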
![Page 20: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/20.jpg)
Results
![Page 21: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/21.jpg)
AUC distribution
![Page 22: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/22.jpg)
Methods employed
• Causal: Methods employing causal discovery techniques to unravel cause-effect relationships in the neighborhood of the target.
• Markov blanket: Methods for extracting the Markov blanket, without attempting to unravel cause-effect relationships.
• Feature selection: Methods for selecting predictive features making no explicit attempt to uncover the Markov blanket or perform causal discovery.
![Page 23: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/23.jpg)
Formalism: Causal Bayesian networks
• Bayesian network:
– Graph with random variables X1, X2, …, Xn as nodes.
– Dependencies represented by edges.
– Allow us to compute P(X1, X2, …, Xn) as ∏_i P(Xi | Parents(Xi)).
– Edge directions have no meaning.
• Causal Bayesian network: edge directions indicate causality.
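The factorization can be illustrated on a three-node chain. This is a minimal sketch: the chain Smoking → Lung Cancer → Coughing and all probability values below are made up for illustration and are not taken from the LUCAS data.

```python
from itertools import product

# Hypothetical conditional probability tables (values are assumptions).
p_s = {1: 0.3, 0: 0.7}                                       # P(S)
p_l_given_s = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.02, 0: 0.98}}   # P(L | S), indexed [s][l]
p_c_given_l = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}     # P(C | L), indexed [l][c]

def joint(s, l, c):
    """P(S=s, L=l, C=c) as the product of each node given its parents."""
    return p_s[s] * p_l_given_s[s][l] * p_c_given_l[l][c]

# The factorized joint is a valid distribution: it sums to 1.
total = sum(joint(s, l, c) for s, l, c in product((0, 1), repeat=3))
print(total, joint(1, 1, 1))
```

Only three small tables are stored instead of a full table over all 2³ configurations, which is the practical benefit of the factorization.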
![Page 24: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/24.jpg)
Causal discovery from “observational data”
Example algorithm: PC (Peter Spirtes and Clark Glymour, 1999)
Let A, B, C ∈ X and V ⊂ X. Initialize with a fully connected un-oriented graph.
1. Conditional independence. Cut connection if ∃ V s.t. (A ⊥ B | V).
2. Colliders. In triplets A — C — B (A and B not adjacent), if there is no subset V containing C s.t. A ⊥ B | V, orient edges as: A → C ← B.
3. Constraint-propagation. Orient edges until no change:
(i) If A → B → … → C, and A — C then A → C.
(ii) If A → B — C then B → C.
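Steps 1 and 2 of PC can be sketched with a conditional-independence oracle in place of statistical tests. This is a simplified illustration, not the published algorithm: it searches all conditioning subsets exhaustively (real implementations restrict them to neighbors and grow their size gradually), and it applies the collider rule using only the first separating set found, a common shortcut. The function name and the `indep` oracle are hypothetical.

```python
from itertools import combinations

def pc_skeleton_and_colliders(variables, indep):
    """indep(a, b, cond) -> True if a is independent of b given the set cond."""
    adj = {v: set(variables) - {v} for v in variables}  # fully connected start
    sepset = {}
    # Step 1: cut A - B if some conditioning set V separates them.
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        for size in range(len(others) + 1):
            found = next((set(c) for c in combinations(others, size)
                          if indep(a, b, set(c))), None)
            if found is not None:
                adj[a].discard(b)
                adj[b].discard(a)
                sepset[frozenset((a, b))] = found
                break
    # Step 2: orient A -> C <- B when A, B are non-adjacent and C did not separate them.
    arrows = set()
    for c in variables:
        for a, b in combinations(sorted(adj[c]), 2):
            if b not in adj[a] and c not in sepset[frozenset((a, b))]:
                arrows.update({(a, c), (b, c)})
    return adj, arrows

# Oracle for the true graph A -> C <- B: A and B are independent unless C is given.
def oracle(a, b, cond):
    return {a, b} == {"A", "B"} and "C" not in cond

adj, arrows = pc_skeleton_and_colliders(["A", "B", "C"], oracle)
print(adj, arrows)
```

On this toy input the A — B edge is cut and both remaining edges are oriented into C.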
![Page 25: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/25.jpg)
Computational and statistical complexity
Computing the full causal graph poses:
• Computational challenges (intractable for large numbers of variables).
• Statistical challenges (difficulty of estimating conditional probabilities for many variables with few samples).
Compromise:
• Develop algorithms with good average-case performance, tractable for many real-life datasets.
• Abandon learning the full causal graph and instead develop methods that learn a local neighborhood.
• Abandon learning the fully oriented causal graph and instead develop methods that learn unoriented graphs.
![Page 26: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/26.jpg)
Target Y
A prototypical MB algo: HITON
Aliferis-Tsamardinos-Statnikov, 2003
![Page 27: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/27.jpg)
Target Y
1 – Identify variables with direct edges to the target
(parent/children)
Aliferis-Tsamardinos-Statnikov, 2003
![Page 28: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/28.jpg)
Target Y
Aliferis-Tsamardinos-Statnikov, 2003
1 – Identify variables with direct edges to the target
(parent/children)
Iteration 1: add A
Iteration 2: add B
Iteration 3: remove A because A ⊥ Y | B
etc.
[Figure: graph around target Y with candidate nodes A and B.]
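The add-then-remove iterations can be sketched as follows. This is an illustration under stated assumptions, not the HITON implementation: a hypothetical oracle `indep_from_target` stands in for statistical tests, candidates are assumed to be pre-sorted by association with Y, and conditioning subsets are searched exhaustively.

```python
from itertools import combinations

def grow_shrink_pc(candidates, indep_from_target):
    """indep_from_target(x, cond) -> True if x is independent of Y given cond."""
    current = []
    for x in candidates:
        current.append(x)  # add the next candidate
        for member in list(current):
            others = [v for v in current if v != member]
            # Remove a member if some subset of the others screens it off from Y.
            if any(indep_from_target(member, set(c))
                   for k in range(len(others) + 1)
                   for c in combinations(others, k)):
                current.remove(member)
    return current

# Oracle for the hypothetical true graph A -> B -> Y:
# A is independent of Y once B is known; B is never independent of Y.
def oracle(x, cond):
    return x == "A" and "B" in cond

result = grow_shrink_pc(["A", "B"], oracle)
print(result)
```

The run reproduces the three iterations: A is added, B is added, then A is removed because A ⊥ Y | B.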
![Page 29: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/29.jpg)
Target Y
Aliferis-Tsamardinos-Statnikov, 2003
2 – Repeat algorithm for parents and children of Y (get depth two relatives)
![Page 30: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/30.jpg)
Target Y
Aliferis-Tsamardinos-Statnikov, 2003
3 – Remove non-members of the MB
A member A of PCPC that is not in PC is a member of the Markov blanket if there is some member B of PC such that A becomes conditionally dependent on Y when conditioning on any subset of the remaining variables together with B.
[Figure: graph around target Y with nodes A and B.]
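The membership test for step 3 can be sketched with the same kind of oracle. This is an illustrative reading of the rule stated on the slide, not the authors' code; the function name, the oracle, and the toy graph are hypothetical.

```python
from itertools import combinations

def find_spouses(pc, pcpc, indep_from_target):
    """Return members of PCPC (outside PC) that pass the spouse test.

    indep_from_target(x, cond) -> True if x is independent of Y given cond.
    """
    spouses = []
    for a in pcpc:
        if a in pc:
            continue
        for b in pc:
            rest = [v for v in set(pc) | set(pcpc) if v not in (a, b)]
            # A must stay dependent on Y for every conditioning set that contains B.
            always_dependent = all(
                not indep_from_target(a, set(c) | {b})
                for k in range(len(rest) + 1)
                for c in combinations(rest, k)
            )
            if always_dependent:
                spouses.append(a)
                break
    return spouses

# Hypothetical true graph: D -> C -> Y -> B <- A.
# A is a spouse (dependent on Y only when B is given); D is screened off by C.
def oracle(x, cond):
    if x == "A":
        return "B" not in cond
    if x == "D":
        return "C" in cond
    return False

spouses = find_spouses(pc=["B", "C"], pcpc=["A", "D"], indep_from_target=oracle)
print(spouses)
```

Here A is kept as a spouse through the common child B, while the grandparent D is discarded.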
![Page 31: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/31.jpg)
Collider
Spouse
Target Y
Spouse
Collider
Aliferis-Tsamardinos-Statnikov, 2003
4 – Orient edges
1. Colliders:
• The presence of a spouse determines a collider.
• The target may also be a collider (B and C are dependent given Y).
[Figure: nodes B and C adjacent to target Y.]
![Page 32: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/32.jpg)
Collider
Spouse
Target Y
Spouse
Collider
Aliferis-Tsamardinos-Statnikov, 2003
4 – Orient edges
1. Colliders:
• The presence of a spouse determines a collider.
• The target may also be a collider (B and C are dependent given Y).
2. Orient remaining edges.
[Figure: nodes B and C adjacent to target Y.]
![Page 33: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/33.jpg)
Additional Bells and Whistles
• The basic algorithms make simplifying assumptions:
– Faithfulness (any conditional independence between two variables results in an absence of direct edge).
– Causal sufficiency (there are no unobserved common causes of the observed variables).
• Laura E. Brown & Ioannis Tsamardinos:
– Violations of “faithfulness”: select product of features.
– Violations of “causal sufficiency”: use Y structures.
![Page 34: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/34.jpg)
Discussion
![Page 35: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/35.jpg)
Top ranking methods
• According to the rules of the challenge:
– Yin Wen Chang: SVM => best prediction accuracy on REGED and CINA.
– Gavin Cawley: Causal explorer + linear ridge regression ensembles => best prediction accuracy on SIDO and MARTI.
• According to pairwise comparisons:
– Jianxin Yin and Prof. Zhi Geng’s group: Partial Orientation and Local Structural Learning => best on Pareto front, new original causal discovery algorithm.
![Page 36: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/36.jpg)
Pairwise comparisons
Gavin Cawley
Yin-Wen Chang
Mehreen Saeed
Alexander Borisov
E. Mwebaze & J. Quinn
H. Jair Escalante
J.G. Castellano
Chen Chu An
Louis Duclos-Gosselin
Cristian Grozea
H.A. Jen
J. Yin & Z. Geng Gr.
Jinzhu Jia
Jianming Jin
L.E.B & Y.T.
M.B.
Vladimir Nikulin
Alexey Polovinkin
Marius Popescu
Ching-Wei Wang
Wu Zhili
Florin Popescu
CaMML Team
Nistor Grozavu
![Page 37: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/37.jpg)
Causal vs. non-causal
Jianxin Yin: causal
Vladimir Nikulin: non-causal
![Page 38: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/38.jpg)
Using manip-MB as feature set using a causal model
[Figure panels: Unmanipulated (training); Manipulation #1 (test); Manipulation #2 (test).]
Heuristic: (1) Use the post-manipulation MB as feature set; (2) train a classifier to predict Y on training data (from the unmanipulated distribution).
Problem: Manipulated children of the target may remain in the post-manipulation MB (if they are also spouses) but with a different dependency to the target.
![Page 39: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/39.jpg)
MB is not the best feature set?
Some features outside the MB may enhance predictivity if:
a. Some MB features go undetected (e.g. the direct causes are children of a common ancestor).
b. The predictor is too “weak” (e.g. the relationship to the target is non-linear but the predictor is linear).
[Figure: panels (a) and (b) with variables X, Y, and Z; y = a x², z = x².]
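Case (b) can be illustrated numerically. In this sketch the value of a, the uniform distribution of x, and the sample size are assumptions chosen for illustration: the target depends on X only through x², so a linear predictor sees almost no signal in X but a perfect one in Z.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 2.0                                   # hypothetical coefficient
x = rng.uniform(-1.0, 1.0, size=100_000)  # X, symmetric around zero
y = a * x**2                              # target: y = a x^2
z = x**2                                  # Z: z = x^2

# Linear association of each candidate feature with the target.
corr_xy = np.corrcoef(x, y)[0, 1]
corr_zy = np.corrcoef(z, y)[0, 1]
print(f"corr(X, Y)={corr_xy:.3f}  corr(Z, Y)={corr_zy:.3f}")
```

A linear model restricted to X would therefore predict poorly, while adding Z restores the relationship even though Z carries no information beyond X.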
![Page 40: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/40.jpg)
Insensitivity to irrelevant features
Simple univariate predictive model, binary target and features, all relevant features correlate perfectly with the target, all irrelevant features randomly drawn. With 98% confidence, abs(feat_weight) < w and Σ_i w_i x_i < v.
n_g: number of “good” (relevant) features.
n_b: number of “bad” (irrelevant) features.
m: number of training examples.
![Page 41: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/41.jpg)
Conclusion
• Causal discovery from observational data is not an impossible task, but a very hard one.
• This points to the need for further research and benchmarks.
• Don’t miss the “pot-luck challenge”
http://clopinet.com/causality
![Page 42: Challenges in causality: Results of the WCCI 2008 challenge Isabelle Guyon, Clopinet Constantin Aliferis and Alexander Statnikov, Vanderbilt Univ. André](https://reader036.vdocuments.us/reader036/viewer/2022062322/5697c0031a28abf838cc3f8c/html5/thumbnails/42.jpg)
1) Causal Feature Selection. I. Guyon, C. Aliferis, A. Elisseeff. In “Computational Methods of Feature Selection”, Huan Liu and Hiroshi Motoda Eds., Chapman and Hall/CRC Press, 2007.
2) Design and Analysis of the Causation and Prediction Challenge. I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, A. Statnikov. JMLR workshop proceedings, in press.
http://clopinet.com/causality