Causal Inference: prediction, explanation, and intervention
TRANSCRIPT
Lecture 6: Causality in time series (part I)
Samantha Kleinberg [email protected]
Leibovici, L. (2001). Effects of remote, retroactive intercessory prayer on outcomes in patients with bloodstream infection: Randomised controlled trial. BMJ, 323(7327), 1450-1451.
Contrast
• Exposure to someone with chickenpox Tuesday, illness Friday
• Exposure to someone with chickenpox in September, illness in November
Following videos from: Scholl, B. J., & Tremoulet, P. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4(8), 299-309.
Delays and perception
• Michotte: no perception of causality with delay
• Shanks, Pearson, and Dickinson (1999): increase in delay decreases causal judgment
Expectations
• Keyboard/triangle: Delays lower causal judgment, but knowledge that there could be a delay reduced the effect (Buehner & May 2003)
• Energy efficient lightbulb: No effect from delay… but instant effect still judged as causal (Buehner & May 2004)
Buehner, M. J., & McGregor, S. (2006). Temporal delays can facilitate causal attribution: Towards a general timeframe bias in causal induction. Thinking & Reasoning, 12(4), 353-378.
• Time used to learn causality (but can incorrectly override covariation)
• More consistent vs. variable timing
First-order Markov process
• Edges only between i and i+1
• Future and past are independent conditioned on present

X_1 → X_2 → X_3 → ... → X_t

P(X_t | X_0 ... X_{t−1}) = P(X_t | X_{t−1})
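The property above can be sketched in a few lines of code. This is a minimal simulation (the two-state weather chain and its transition matrix are made-up examples, not from the lecture) where sampling the next state consults only the current state, never the history:

```python
import random

random.seed(0)

# Hypothetical two-state chain: transition probabilities depend
# only on the current state (first-order Markov property).
STATES = ["sunny", "rainy"]
P_TRANS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(current):
    """Sample X_{t+1} given only X_t -- the full history is never needed."""
    r, total = random.random(), 0.0
    for nxt, p in P_TRANS[current].items():
        total += p
        if r < total:
            return nxt
    return nxt  # guard against floating-point rounding

def simulate(x0, t):
    """Unroll X_0 -> X_1 -> ... -> X_t."""
    chain = [x0]
    for _ in range(t):
        chain.append(step(chain[-1]))
    return chain

chain = simulate("sunny", 10)
print(chain)
```

Since `step` takes only the current state as input, P(X_t | X_0 ... X_{t−1}) = P(X_t | X_{t−1}) holds by construction.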
Markov processes
• Path to current state doesn't matter
• Once current state is known, information on prior states not informative

(Figure: a chain over health states, with node labels Exposure, Disease, Symptoms, Treatment, X)
Example

XYZ_1   XYZ_2   P
111     111     0.01
111     110     0.1
111     101     0.03
…
000     001     0.04
000     000     0.5

(X, Y, Z at time 1 transitioning to X, Y, Z at time 2)
Terminology
Markov process: stochastic process where transition probabilities depend only on the current state
Hidden Markov Model

(Figure: hidden chain X_0 → X_1 → X_2 → X_3 → …, with each hidden state X_t paired with an observation Y_t)

P(Y_t | X_{0:t−1}) = P(Y_t | X_{t−1})
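A small forward-sampling sketch can make the structure concrete. The health/fever states and all probabilities below are hypothetical, not from the lecture; the point is that the hidden chain evolves Markovianly while each observation is emitted from its hidden state alone:

```python
import random

random.seed(1)

# Hypothetical HMM: hidden health states evolve as a Markov chain;
# each hidden state emits a noisy observation.
TRANS = {"healthy": {"healthy": 0.9, "sick": 0.1},
         "sick":    {"healthy": 0.3, "sick": 0.7}}
EMIT  = {"healthy": {"normal": 0.8, "fever": 0.2},
         "sick":    {"normal": 0.2, "fever": 0.8}}

def sample(dist):
    """Draw one value from a {value: probability} distribution."""
    r, total = random.random(), 0.0
    for value, p in dist.items():
        total += p
        if r < total:
            return value
    return value  # guard against floating-point rounding

def sample_hmm(x0, t):
    """Forward-sample hidden states X_0..X_t and observations Y_0..Y_t."""
    xs, ys = [x0], [sample(EMIT[x0])]
    for _ in range(t):
        xs.append(sample(TRANS[xs[-1]]))  # next hidden state: depends only on current
        ys.append(sample(EMIT[xs[-1]]))   # observation: depends only on hidden state
    return xs, ys

xs, ys = sample_hmm("healthy", 5)
print(list(zip(xs, ys)))
```

Only the hidden states X are ever conditioned on; the observer sees only the Y sequence.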
Dynamic Bayesian networks
• "Dynamic" because they include time – not because they change over time!
• Generalization of HMM (HMMs are a subset of DBNs)
• Instead of one state variable, now we have a collection of BNs
Example

(Figure: initial network and transition from t to t+1, over three variables – Stock A, Stock B, Stock C)

Unrolled

(Figure: the same network unrolled over Time 0 through Time 4, with Stock A, Stock B, Stock C repeated at each slice)
DBNs

A DBN is defined by:
- a BN for the initial state of the system
- a set of BNs (one for each time slice) showing how a variable at time t influences another at a later time t+i
Notes on DBNs
• Have to specify set of lags to test
• Complexity:
  – In theory, set of graphs is all variables connected in all possible ways – across each timeslice
  – More compact than HMM: recall CPT/BN
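The two-part definition above (initial BN plus transition BNs) can be sketched directly. Everything here is a made-up toy, loosely echoing the three-stock example: each slice-(t+1) variable depends only on its listed parents in slice t, and the `p_true` CPT is an arbitrary illustrative choice:

```python
import random

random.seed(2)

# Hypothetical two-slice DBN over binary variables A, B, C:
# parents list which slice-t variables each slice-(t+1) variable depends on.
PARENTS = {"A": ["A"], "B": ["A", "B"], "C": ["B", "C"]}

def p_true(var, prev):
    """Toy CPT: P(var = 1 at t+1) rises with the number of parents true at t."""
    k = sum(prev[p] for p in PARENTS[var])
    return (1 + k) / (2 + len(PARENTS[var]))

def unroll(initial, steps):
    """Sample the DBN forward, one full variable assignment per time slice."""
    slices = [dict(initial)]
    for _ in range(steps):
        prev = slices[-1]
        slices.append({v: int(random.random() < p_true(v, prev))
                       for v in PARENTS})
    return slices

trajectory = unroll({"A": 1, "B": 0, "C": 0}, 4)
print(trajectory)
```

The compactness note above shows up here too: the structure is specified by `PARENTS` and per-variable CPTs rather than one joint transition table over all 2³ states.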
Non-stationarity

Grzegorczyk, M., & Husmeier, D. (2009). Non-stationary continuous dynamic Bayesian networks. Advances in Neural Information Processing Systems (NIPS), 22, 682-690.
Some software
• Banjo (Java) – discretized data
• DBmcmc / BNT (MATLAB)
• miniTUBA
More: http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html
Application: gene regulatory networks

Zou, M., & Conzen, S. D. (2005). A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics, 21(1), 71.
Application: neuronal connectivity

Eldawlatly, S., Zhou, Y., Jin, R., & Oweiss, K. G. (2010). On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Computation, 22(1), 158-189.
Application: diagnosis

Charitos, T., van der Gaag, L. C., Visscher, S., Schurink, K. A. M., & Lucas, P. J. F. (2009). A dynamic Bayesian network for diagnosing ventilator-associated pneumonia in ICU patients. Expert Systems with Applications, 36(2), 1249-1258.
References
• Ghahramani, Z. (1998). Learning dynamic Bayesian networks. Adaptive Processing of Sequences and Data Structures, 168-197.
• Murphy, K. (2002). Dynamic Bayesian networks: representation, inference and learning. PhD Thesis, University of California, Berkeley.
Logic-based
• Relationships versus structures
• Complexity + time
• Finding time windows without prior knowledge
• Explanation with uncertainty + time
The following slides are from: Kleinberg, S. (2013). Causality, Probability and Time. Cambridge University Press.
What is a causal relationship?

smoking and hypertension until a particular genetic mutation occurs causes CHF in 1-2 years with probability p

(s ∧ h) U g ⤳^{≥p}_{≥1,≤2} CHF

*made up relationship + timing
Prima facie causes

earlier than effect, raise probability of effect
Also called potential/possible causes

F^{>0}_{≤∞} c,   c ⤳^{≥p}_{≥s,≤t} e,   F^{<p}_{≤∞} e

where 1 ≤ s ≤ t ≤ ∞, s ≠ ∞
How to distinguish cause/non-cause
• Causes raise probability of effects
• But so do non-causes

(Figure: decreasing air pressure causes both falling barometer and rain)
Significance/insignificance

Suppes: P(e | c ∧ d) = P(e | c), so d spurious
BN: Distribution unfaithful
Eells: wouldn't hold d fixed when evaluating c

(Figure: c_t → e_{t2} with d_{t1}; edge probabilities 9/16, 1/4, 3/4)
What do we want from a measure of causal significance?
• Proportional to how "informative" cause is
• Direct causes
• Distinguish between strong/weak causes
Which causes are significant?
• Main idea: looking for better explanations for the effect
• Calculate impact of cause on effect's probability – after accounting for other factors
ε_avg(c, e) = ( Σ_{x ∈ X\c} [ P(e | c ∧ x) − P(e | ¬c ∧ x) ] ) / |X\c|
ε_avg and time

P(e | c ∧ x) means c, x both occur and e occurs in overlapping time window

c ⤳^{≥p}_{≥s,≤t} e,   x ⤳^{≥p}_{≥s′,≤t′} e

(Figure: timelines showing c with window [r, s] and x with window [r′, s′] overlapping at e)
More on ε_avg: set X
• What is compared against?
  – Probability raisers at any time before e
  – Could be causes/effects of c
• Why can we do this? Two cases:
  – Independent
  – Negatively correlated
What does ε_avg mean?
• Positive: c has positive influence on e's probability
• Negative: probability of e greater after c's absence (c is possible negative cause of e)
• Zero: +/− may cancel, or no influence
• Small value: maybe artifact, maybe weak real cause
Properties of ε_avg
• With large sample, no causes, values are normally distributed
• Note: not equal to zero for non-causes. Why?

(Figure: histogram of z-values for the data, f(z), with empirical null and theoretical null curves)
Example

(Figure: network over S, YF, LC, annotated with probabilities 0.1, 0.75, 0.01, 0.85, 1 for the S/¬S and YF combinations)

ε_S(YF, LC) = P(LC | YF ∧ S) − P(LC | ¬YF ∧ S) = 0.85 − 0.75 = 0.10
ε_YF(S, LC) = P(LC | S ∧ YF) − P(LC | ¬S ∧ YF) = 0.85 − 0.01 = 0.84
Definition – insignificant

c is an ε-insignificant cause of e if: c is a potential cause of e and |ε_avg| < ε

Definition – significant
• c is a just so or ε-significant cause of e if it is a non-ε-insignificant potential cause
• Note insignificant ≠ spurious
  – Could be spurious or have small influence
• Just so not necessarily genuine
  – There may be missing common causes
Testing formulas in observational data
• Specify hypotheses as temporal logical formulas
• Instead of inferring model, test whether formulas are satisfied in data
  – Find formulas satisfied at each timepoint
• Ex:
  – Calculate conditional probability across sequences using frequencies

(s ∧ h) U g
Probability of c leads-to e in 1-2 time units
• Number of times satisfying c: 3
• Number of times satisfying leads-to formula: 2
• Probability = 2/3

(Figure: a trace with timepoints labeled c | a | c,d | f | e | a,c | e,b | c | c | e | c | e, with occurrences marked False, True, True)

c ⤳^{≥p}_{≥1,≤2} e  is satisfied by the trace if p ≤ 2/3
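The frequency calculation above can be sketched as a short function. The trace below is a made-up example (not the one on the slide), chosen so the answer is again 2/3:

```python
def leadsto_prob(trace, cause, effect, lo, hi):
    """Frequency estimate of P(cause leads-to effect in [lo, hi] time units):
    (# cause timepoints followed by effect within the window) / (# cause timepoints)."""
    cause_times = [t for t, props in enumerate(trace) if cause in props]
    if not cause_times:
        return 0.0
    hits = sum(
        1 for t in cause_times
        if any(effect in trace[u]
               for u in range(t + lo, min(t + hi, len(trace) - 1) + 1))
    )
    return hits / len(cause_times)

# Made-up trace: each timepoint is the set of propositions true then.
trace = [{"c"}, {"a"}, {"e"}, set(), {"c"}, {"b"}, set(), {"c"}, {"e"}, set()]
p = leadsto_prob(trace, "c", "e", 1, 2)
print(p)  # c holds at times 0, 4, 7; e follows within 1-2 units at 0 and 7
```

With two of three c-occurrences followed by e in the window, any hypothesis c ⤳^{≥p}_{≥1,≤2} e with p ≤ 2/3 is satisfied by this trace.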
Multiple trace case

Testing "(c and d) lead to e in 2-3 months"

(Figure: four patient traces P1-P4 over time (months), marking occurrences of c, d, and e in each)
Another example
• To determine significance, need to calculate probabilities
• Count frequency, sum over all traces

f = c ∧ d ⤳_{≥s,≤t} e

Σ_{p∈P} (# times satisfying f) / Σ_{p∈P} (# times satisfying c ∧ d)

(Figure: traces P1 and P2 marking occurrences of c, d, and e)
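The ratio above sums counts across traces before dividing, rather than averaging per-trace probabilities. A sketch with made-up patient traces:

```python
def count_pair(trace, antecedent, effect, lo, hi):
    """(# timepoints satisfying the leads-to formula, # satisfying the antecedent)."""
    sat = holds = 0
    for t, props in enumerate(trace):
        if antecedent <= props:  # antecedent is a set of propositions
            holds += 1
            end = min(t + hi, len(trace) - 1)
            if any(effect in trace[u] for u in range(t + lo, end + 1)):
                sat += 1
    return sat, holds

def multi_trace_prob(traces, antecedent, effect, lo, hi):
    """Sum numerators and denominators across all traces, then divide."""
    num = den = 0
    for trace in traces:
        s, h = count_pair(trace, antecedent, effect, lo, hi)
        num, den = num + s, den + h
    return num / den if den else 0.0

# Made-up patient traces for "(c and d) lead to e in 2-3 months".
traces = [
    [{"c", "d"}, set(), set(), {"e"}],         # c^d at month 0, e at month 3
    [{"c"}, {"c", "d"}, set(), set(), set()],  # c^d at month 1, no e in window
]
print(multi_trace_prob(traces, {"c", "d"}, "e", 2, 3))  # (1 + 0) / (1 + 1)
```

Summing counts weights each occurrence equally, so a short trace with one occurrence cannot dominate a long trace with many.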
Determinism
• Events that are inevitable
• Cause that always produces effect

(Figure: left – M leads to D in 5 minutes with probability 1: M necessary/sufficient. Right – M and B lead to D in 5 and 10 minutes respectively, occurring with probabilities 0.28 and 0.72: M, B sufficient, not necessary. G_broken also shown.)
Redundant causation
• Overdetermination
  – Multiple causes that always co-occur (e.g. firing squad)
• Early preemption
• Late preemption
  – Causes both occur, one routinely preempts other
Application: simulated financial time series
• Created synthetic data based on Fama-French factor model [Fama & French 1992]
• 11 datasets, two time periods
  – [1] no causality
  – [2-8] randomly generated relationships, one lag
  – [9-10] randomly generated relationships, random lags
  – [11] many-to-one relationships, one lag

r_{i,t} = Σ_j β_{i,j} f_{j,t} + ε_{i,t}
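The factor equation can be simulated directly. All sizes and parameters below are made up for illustration (far smaller than the study's 25 stocks and 4000 days), and the loadings are drawn at random rather than estimated:

```python
import random

random.seed(3)

N_STOCKS, N_FACTORS, T = 5, 3, 100  # made-up sizes for illustration

# Hypothetical factor loadings beta[i][j].
beta = [[random.uniform(-1, 1) for _ in range(N_FACTORS)]
        for _ in range(N_STOCKS)]

def simulate_returns():
    """r_{i,t} = sum_j beta_{i,j} * f_{j,t} + eps_{i,t}."""
    returns = []
    for _ in range(T):
        f = [random.gauss(0, 1) for _ in range(N_FACTORS)]       # factor values
        returns.append([sum(b * fv for b, fv in zip(beta[i], f))
                        + random.gauss(0, 0.1)                   # idiosyncratic noise
                        for i in range(N_STOCKS)])
    return returns

returns = simulate_returns()
print(len(returns), len(returns[0]))
```

Causal relationships between stocks can then be injected by adding lagged terms of one stock's return to another's, which is what the randomly generated datasets [2-11] do.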
Patterns

(Figure: the randomly generated causal graphs, drawn as numbered stock nodes with their dependency edges)
Results
• Results for return data below
  – 22 datasets (11 types, 2 runs each)
• Two 4000 day time periods
  – 25 stocks, 3 factors
  – Daily returns
• True positives
  – Dependency between stocks

FDR      FNR      Intersection
0.0241   0.0082   0.4714
Inferring timing
• From testing relationships between all variables and CHF in 1-2 weeks, want to ultimately infer "high AST leads to CHF in 4-10 days"
• Instead of only accepting/rejecting hypotheses, refine hypotheses
Intuition behind procedure

1) Too wide
2) Shifted
3) Too narrow

(Figure: three candidate windows for c compared against the actual window [r, s], with a second cause x in window [r′, s′])

ε_avg(c, e) = ( Σ_{x ∈ X\c} [ P(e | c ∧ x) − P(e | ¬c ∧ x) ] ) / |X\c|,  where  P(e | c ∧ x) = #(c ∧ x ∧ e) / #(c ∧ x)
Window inference
• Assumption 1: A significant relationship will be found to be significant in at least one window intersecting the true window
• Assumption 2: Significant relationships are a small proportion of overall set tested
• Claim: Iterative perturbation of windows in a greedy way converges to true windows
Recap
• Find significant relationships
  – Generate initial hypotheses with candidate time windows, find possible causes
  – Calculate ε_avg for each
  – Label possible causes with fdr > threshold as ε-insignificant, rest ε-significant
• For statistically significant relationships, iteratively perturb time window (expand, contract, shift) trying to maximize ε_avg
• Re-calculate ε_avg with newly inferred relationships
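The greedy perturbation step can be sketched as hill climbing over window edges. The `toy_score` below is a made-up stand-in for ε_avg, peaked at a hypothetical true window [4, 10], so the search illustrates the expand/contract/shift moves without any real data:

```python
def refine_window(score, window, step=1, max_iter=50):
    """Greedily expand, contract, or shift [r, s] to maximize score(r, s)."""
    r, s = window
    best = score(r, s)
    for _ in range(max_iter):
        candidates = [(r - step, s), (r + step, s),           # move left edge
                      (r, s - step), (r, s + step),           # move right edge
                      (r - step, s - step), (r + step, s + step)]  # shift
        candidates = [(a, b) for a, b in candidates if 0 <= a < b]
        cand = max(candidates, key=lambda w: score(*w))
        if score(*cand) <= best:
            break  # local maximum: no perturbation improves the score
        (r, s), best = cand, score(*cand)
    return (r, s), best

# Hypothetical score, maximized at the "true" window [4, 10]:
def toy_score(r, s):
    overlap = max(0, min(s, 10) - max(r, 4))
    return overlap / max(s - r, 10 - 4)  # penalizes windows that are too wide

window, val = refine_window(toy_score, (1, 14))
print(window, val)
```

Starting from a too-wide window (1, 14), the search walks to (4, 10), consistent with Assumption 1 that the initial window intersects the true one.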
Explanation
• Where do explanations come from?
  – Need formal representation for algorithms
• Type ≠ Token
  – Cannot assume observations will exactly fit what is known
• Data
  – Frequently missing
  – Time of event ≠ time event is recorded
Intuition
• Type-level relationships are candidates
• Strictly applying type-level relationships is problematic
• Without background knowledge that timing is a strong constraint, do not want this to strictly limit explanations
Goals
• Quantify significance of an explanation
  – Allows for incomplete knowledge
  – Incorporates temporal uncertainty
• Algorithm for finding the significance of explanations
General approach
• Infer causal relationships from data
• Observe token sequence
  – E.g. series of medical tests or continuous recordings
• Iterate over relationships to calculate significance for each explanation
• Result is ranking of possible explanations
Significance of hypotheses
• When cause and effect occur consistent with type-level information
  – Significance = type-level significance
• When timing differs
  – Weight by difference
• When unknown if c occurs
  – Weight by probability of occurrence

(Figure: timeline marked at times 0, 7, and 10, with Exposure 1, Exposure 2, and flu)
Recall the connecting principle!
Assessing significance

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)

with c_{t′} = 1 if c occurs at time t′, f = 1 if c in window [r, s]; V is the observation sequence

c ⤳^{≥p}_{≥r,≤s} e
Accounting for uncertainty
• Slope may be dependent on underlying mechanisms
  – i.e. when is cause still capable of producing effect
• Can use other function with same properties

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)

(Figure: f(t) equal to 1 between r and s, sloping down toward 0 outside the window)
Calculating P
• Where do probabilities come from?
  – Initial dataset
  – Model
• Conditional on token-level observations

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)
Algorithm

If type-level causes have occurred consistently with the type-level timings, they will have the highest values of this significance. As the goal is to find the causes with the highest level of significance, we can begin by taking these sets and testing whether any of their members are true on the particular occasion. Where V_i indicates the set of propositions true at time i and e_t is the occurrence of event e at actual time t, the procedure for finding relationships with the highest levels of significance is as follows.

1. X = type-level genuine and just so causes of e
   V = [V_0, V_1, ..., V_t]
   E_X = ∅
2. For each x ∈ X, where x = y ⤳_{≥r,≤s} e, for i ∈ [t−s, t−r]: if V_i ⊨ y, the significance of y_i for e_t is ε_avg(y_{r−s}, e) and E_X = E_X ∪ {x}
3. For each x ∈ X \ E_X, where x = y ⤳_{≥r,≤s} e, and i = argmax_{j ∈ [0,t)} (P(y_j | V) × f(j, t, r, s)), the significance of y_i for e_t is S(y_i, e_t) = ε_avg(y_{r−s}, e) × P(y_i | V) × f(i, t, r, s).

We begin by finding occurrences that exactly match the type-level relationships, as these will have the maximum significance value for that type-level relationship (as both P and f are equal to one, so S = ε_avg). For relationships without exact matches, the times with the highest significance after weighting by f and P are identified using that the instance of a cause with the maximum
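The three steps can be sketched in code. Everything concrete below is hypothetical: the linear-decay `f` is just one function with the required shape, `p_occ` stands in for P(y_j | V), and the observation sequence and relationships are toy inputs:

```python
def f(tp, t, r, s, slope=0.5):
    """Timing weight: 1 when the cause time tp falls in [t-s, t-r],
    decaying linearly toward 0 outside (one choice with the required shape)."""
    if t - s <= tp <= t - r:
        return 1.0
    dist = (t - s) - tp if tp < t - s else tp - (t - r)
    return max(0.0, 1.0 - slope * dist)

def explain(relations, V, t, p_occ):
    """Rank token-level explanations for an effect at time t.
    relations: (cause, r, s, eps) tuples for 'cause leads-to effect in [r, s]'
    V: V[i] is the set of propositions observed true at time i
    p_occ(cause, j, V): probability the cause was true at time j given V."""
    scores = {}
    for cause, r, s, eps in relations:
        # Step 2: an observed occurrence inside the window gets the full eps_avg.
        if any(cause in V[i] for i in range(max(0, t - s), min(len(V), t - r + 1))):
            scores[cause] = eps
            continue
        # Step 3: otherwise, take the best time after weighting by P and f.
        best = max(p_occ(cause, j, V) * f(j, t, r, s) for j in range(t))
        scores[cause] = eps * best
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Made-up observation sequence and type-level relationships.
V = [set(), {"exposure"}, set(), set(), set(), set(), set(), {"flu"}]
rels = [("exposure", 2, 5, 0.4), ("stress", 1, 2, 0.6)]
ranking = explain(rels, V, 7, lambda c, j, V: 1.0 if c in V[j] else 0.2)
print(ranking)
```

Here the exposure at time 1 falls just outside its [2, 5] window before the flu at time 7, so it is down-weighted by f; the never-observed stress is down-weighted by its low occurrence probability, and exposure still ranks first.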
Midterm topics
• Why bother seeking causality if it's hard, and why aren't correlations enough?
• Regularities/probabilities/counterfactuals
• Why might an inferred relationship be spurious?
• Type vs. token
• Basics of conditional independence, Bayes rule, relationship between independence and BN, joint probability
• Intervention vs. observation
• Granger causality (conceptual), more in the next weeks

NOTE: These are focus areas but anything we covered before the midterm is fair game