Presentation transcript (posted 15-Jan-2016)
Incorporating Prior Information in Causal Discovery
Rodney O'Donnell, Jahangir Alam,
Bin Han, Kevin Korb and Ann Nicholson
Outline
• Methods for learning causal models
  – Data mining, elicitation, hybrid approach
• Algorithms for learning causal models
  – Constraint based
  – Metric based (including our CaMML)
• Incorporating priors into CaMML
  – 5 different types of priors
• Experimental Design
• Experimental Results
Learning Causal Bayesian Networks
| Elicitation | Data mining |
|---|---|
| Requires domain knowledge | Requires a large dataset |
| Expensive and time-consuming | Sometimes the algorithms are "stupid" (no prior knowledge → no common sense) |
| Partial knowledge may be insufficient | Data only tells part of the story |
A hybrid approach
• Combine the domain knowledge and the facts learned from data
• Minimize the expert’s effort in domain knowledge elicitation
(Diagram: Elicitation + Data Mining → Causal BN)
• Enhance the efficiency of the learning process
  – Reduce / bias the search space
Objectives
• Generate different prior specification methods
• Comparatively study the influences of priors on the BN structural learning
• Future: apply the methods to the Heart Disease modeling project
Causal learning algorithms
• Constraint based
  – Pearl & Verma's algorithm, PC
• Metric based
  – MML, MDL, BIC, BDe, K2, K2+MWST, GES, CaMML
• Priors on structure
  – Optional vs. required
  – Hard vs. soft
Priors on structure
| | Required | Optional | Hard | Soft |
|---|---|---|---|---|
| K2 (BNT) | yes | | yes | |
| K2+MWST (BNT) | | yes | yes | |
| GES (Tetrad) | | yes | yes | |
| PC (Tetrad) | | yes | yes | |
| CaMML | | yes | yes | yes |
CaMML
• MML metric based
• MML vs. MDL
  – MML can be derived from Bayes' theorem (Wallace)
  – MDL is a non-Bayesian method
• Search: MCMC sampling through TOM space
  – TOM = DAG + total ordering
  – TOM is finer than DAG
• Example: the DAG with arcs A→B and A→C admits two TOMs (orderings ABC and ACB), while the chain A→B→C admits one TOM (ABC)
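The DAG-vs-TOM distinction above can be checked with a small brute-force sketch (a hypothetical helper, not CaMML code): a TOM is a DAG plus a total ordering consistent with it, so counting a DAG's TOMs amounts to counting its linear extensions.

```python
from itertools import permutations

def linear_extensions(nodes, arcs):
    """Enumerate total orderings consistent with a DAG's arcs.
    Each consistent ordering corresponds to one TOM for that DAG."""
    exts = []
    for order in permutations(nodes):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in arcs):
            exts.append(order)
    return exts

# A->B, A->C: two TOMs (orderings ABC and ACB)
print(linear_extensions("ABC", [("A", "B"), ("A", "C")]))
# A->B->C: one TOM (ordering ABC)
print(linear_extensions("ABC", [("A", "B"), ("B", "C")]))
```

Brute force over permutations is only for illustration; CaMML samples TOM space with MCMC rather than enumerating it.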
Priors in CaMML: arcs
Experts may provide priors on pairwise relations:
1. Directed arcs
   – e.g. {A→B 0.7} (soft)
   – e.g. {A→D 1.0} (hard)
2. Undirected arcs
   – e.g. {A─C 0.6} (soft)
3. Combined, e.g. {A→B 0.7; B→A 0.8; A─C 0.6}
   – Represented by 2 adjacency matrices
(Diagram: two 3×3 adjacency matrices over {A, B, C}; the directed-arc matrix has entries A→B = 0.7 and B→A = 0.8, and the undirected-arc matrix has entry A─C = 0.6.)
Priors in CaMML: arcs (continued)
• MML cost for each pair, scoring one candidate network against the expert-specified network (A→B 0.7, B→A 0.8, A─C 0.6); the candidate contains the arc A→B and no A─C adjacency:
  – AB: log(0.7) + log(1−0.8)
  – AC: log(1−0.6)
  – BC: log(default arc prior)

(Diagram: the expert-specified network over A, B, C with weights 0.7, 0.8, 0.6, and one candidate network.)
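As a sketch of the bookkeeping (illustrative only: `event_cost` is a hypothetical helper, and treating each term as a negative log message length is an assumed sign convention; the slide writes the terms with bare logs), the pairwise charges for the example above can be computed directly:

```python
import math

def event_cost(p, happened):
    """Message-length charge: -log of the probability the expert's
    prior assigned to what the candidate network actually does."""
    return -math.log(p if happened else 1.0 - p)

# Expert priors: A->B 0.7, B->A 0.8, A-C 0.6.
# Candidate network: contains the arc A->B and no A-C adjacency.
cost_AB = event_cost(0.7, True) + event_cost(0.8, False)  # -log(0.7) - log(1-0.8)
cost_AC = event_cost(0.6, False)                          # -log(1-0.6)
# BC carries no expert prior, so it would be charged at the default arc prior.
print(round(cost_AB, 3), round(cost_AC, 3))  # 1.966 0.916
```

Pairs the expert left unspecified simply fall back to the learner's default arc prior, so expert effort stays proportional to what the expert actually knows.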
Priors in CaMML: Tiers
• The expert can provide a prior on an additional pairwise relation
• Tier: temporal ordering of variables
  – e.g. Tier {A>>C 0.6; B>>C 0.8}
• Example: for a candidate TOM with ordering A, C, B (A precedes C, but B does not precede C): IMML(h) = log(0.6) + log(1−0.8)
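A minimal sketch of how a tier prior scores an ordering (hypothetical helper, natural logs assumed; the slide writes the terms without the minus sign of a message length):

```python
import math

def tier_cost(tier_priors, order):
    """Score a TOM's total ordering against tier priors.
    tier_priors maps (x, y) -> p, meaning the expert is p sure x precedes y;
    each tier charges -log(p) if satisfied, -log(1 - p) if violated."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(-math.log(p if pos[x] < pos[y] else 1.0 - p)
               for (x, y), p in tier_priors.items())

# Tier {A>>C 0.6; B>>C 0.8} against the candidate ordering A, C, B:
# A>>C holds (charge -log 0.6), B>>C is violated (charge -log(1-0.8)).
print(round(tier_cost({("A", "C"): 0.6, ("B", "C"): 0.8}, "ACB"), 3))
```

Because tiers constrain the ordering rather than the arcs, they act on the TOM part of the hypothesis, which is exactly what CaMML's TOM-space search samples over.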
Priors in CaMML: edPrior
• The expert specifies a single network, plus a confidence
  – e.g. EdConf = 0.7
• The prior is based on edit distance from this network
• Example: for one candidate network at edit distance ED = 2 from the expert-specified network: IMML(h) = −2 × (log 0.7 − log(1−0.7))
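The edit-distance term can be sketched as follows (hypothetical helper name; the formula is transcribed from the slide, and its overall sign convention is taken at face value):

```python
import math

def ed_prior_term(edit_distance, conf):
    """Edit-distance prior term as written on the slide:
    I(h) = -ED * (log(conf) - log(1 - conf)).
    Each edit away from the expert's network shifts the term by
    log(conf / (1 - conf)), so a more confident expert weights
    disagreement with the elicited network more heavily."""
    return -edit_distance * (math.log(conf) - math.log(1.0 - conf))

# Candidate network at edit distance 2 from the expert network, EdConf = 0.7:
print(round(ed_prior_term(2, 0.7), 3))  # -1.695
```

At EdConf = 0.5 the per-edit weight log(conf/(1−conf)) vanishes, so the prior becomes uninformative, which matches the 0.5-to-0.9999 confidence range used in Experiment 2.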
Priors in CaMML: KTPrior
• Again, the expert specifies a single network, plus a confidence
  – e.g. KTConf = 0.7
• The prior is based on Kendall–Tau edit distance from this network
  – KTEditDist = KT + undirected ED
• Example: the expert-specified dagTOM is the chain A→B→C (ordering ABC); a candidate TOM has arcs A→B, A→C (ordering ACB)
  – The B–C order in the expert TOM disagrees with the candidate TOM, so KT = 1
  – KTEditDist = KT (1) + undirected ED (2) = 3
  – IMML(h) = −3 × (log 0.7 − log(1−0.7))
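The two components of KTEditDist can be sketched with small helpers (hypothetical names, not CaMML's API): a Kendall–Tau count over the orderings plus a symmetric difference of the undirected skeletons.

```python
from itertools import combinations

def kt_distance(order_a, order_b):
    """Kendall-Tau distance: number of variable pairs whose relative
    order differs between the two total orderings."""
    pos_a = {v: i for i, v in enumerate(order_a)}
    pos_b = {v: i for i, v in enumerate(order_b)}
    return sum(1 for x, y in combinations(order_a, 2)
               if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y]))

def undirected_ed(arcs_a, arcs_b):
    """Undirected edit distance: adjacencies present in only one skeleton."""
    skel = lambda arcs: {frozenset(a) for a in arcs}
    return len(skel(arcs_a) ^ skel(arcs_b))

# Expert dagTOM: chain A->B->C, ordering ABC.
# Candidate TOM: arcs A->B, A->C, ordering ACB.
kt = kt_distance("ABC", "ACB")                                       # B, C swapped
ed = undirected_ed([("A", "B"), ("B", "C")], [("A", "B"), ("A", "C")])
print(kt + ed)  # 3, matching the slide's KTEditDist
```

Combining the two terms penalizes a candidate both for wiring the wrong adjacencies and for ordering the variables against the expert's stated temporal intuition.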
Experiment 1: Design
• Prior
  – weak, strong
  – correct, incorrect
• Size of dataset
  – 100, 1000, 10k and 100k
  – For each size we randomly generate 30 datasets
• Algorithms
  – CaMML
  – K2 (BNT)
  – K2+MWST (BNT)
  – GES (Tetrad)
  – PC (Tetrad)
• Models: AsiaNet, "Model6" (an artificial model)
Models: AsiaNet and “Model6”
Experimental Design
(Diagram: the design crosses three factors: priors × algorithms × sample size.)
Experiment Design: Evaluation
• ED: difference between structures (edit distance)
• KL: difference between distributions (Kullback–Leibler divergence)
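For the KL metric, a minimal sketch of the quantity being reported (here over toy discrete distributions; the experiments compare the joint distributions of the true and learned networks):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions over the same outcomes:
    the expected extra message length incurred when the learned model q
    stands in for the true model p. Zero iff the distributions match."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy "true" vs "learned" distributions over the same 4 joint states:
true_dist    = [0.4, 0.3, 0.2, 0.1]
learned_dist = [0.35, 0.3, 0.25, 0.1]
print(round(kl_divergence(true_dist, learned_dist), 4))  # 0.0088
```

ED rewards getting the arcs right, while KL rewards getting the joint distribution right; a network can score well on one measure and less well on the other, which is why both are reported.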
Model6 (1000 samples)
Model6 (10k samples)
AsiaNet (1000 samples)
Experiment 1: Results
• With default priors, CaMML is comparable to or outperforms the other algorithms
• With full tiers:
  – There are no statistically significant differences between CaMML and K2
  – GES is slightly behind; PC performs poorly
• CaMML is the only method allowing soft priors:
  – With a prior of 0.7, CaMML is comparable to the other algorithms given full tiers
  – With a stronger prior, CaMML performs better
• CaMML performs significantly better with the expert's priors than with uniform priors
Experiment 2: Is CaMML well calibrated?
• Biased prior
  – The expert's confidence may not be consistent with the expert's skill
    (e.g. an expert is 0.99 sure, but wrong, about a connection)
  – A biased hard prior can never be overcome
  – A soft prior and data will eventually overcome the bad prior
Is CaMML well calibrated?
• Question: does CaMML reward well-calibrated experts?
• Experimental design
  – Objective measure: how good is the proposed structure? ED: 0–14
  – Subjective measure: the expert's confidence, 0.5 to 0.9999
  – How good is the learned structure? KL distance
Effect of expert skill and confidence on quality of learned model
(Chart: quality of the learned model plotted against expert confidence, with expert skill running from better to worse. Annotations: "Overconfidence penalized", "Justified confidence rewarded", "Unconfident expert".)
Experiment 2: Results
• CaMML improves on the elicited structure and approaches the true structure
• CaMML improves when the expert's confidence matches the expert's skill
Conclusions
• CaMML is comparable to other algorithms when given equivalent prior knowledge
• CaMML can incorporate more flexible prior knowledge
• CaMML’s results improve when expert is skillful or well calibrated
Thanks