Causal Inference: prediction, explanation, and intervention
TRANSCRIPT
Lecture 6: Causality in time series (part I)
Samantha Kleinberg [email protected]
Leibovici, L. (2001). Effects of remote, retroactive intercessory prayer on outcomes in patients with bloodstream infection: Randomised controlled trial. BMJ, 323(7327), 1450-1451.
Contrast
• Exposure to someone with chickenpox Tuesday, illness Friday
• Exposure to someone with chickenpox in September, illness in November
Following videos from: Scholl, B. J., & Tremoulet, P. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4(8), 299-309.
Delays and perception
• Michotte: no perception of causality with delay
• Shanks, Pearson, and Dickinson (1999): increase in delay decreases causal judgment
Expectations
• Keyboard/triangle: Delays lower causal judgment, but knowledge that there could be a delay reduced the effect (Buehner & May 2003)
• Energy efficient lightbulb: No effect from delay… but instant effect still judged as causal (Buehner & May 2004)
Buehner, M. J., & McGregor, S. (2006). Temporal delays can facilitate causal attribution: Towards a general timeframe bias in causal induction. Thinking & Reasoning, 12(4), 353-378.
• Time used to learn causality (but can incorrectly override covariation)
• More consistent vs. variable timing
First-order Markov process
• Edges only between i and i+1
• Future and past are independent conditioned on present

X_1 → X_2 → X_3 → ... → X_t

P(X_t | X_0 ... X_{t−1}) = P(X_t | X_{t−1})
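The property above can be sketched in a few lines of code. This is a minimal simulation (the two-state weather chain and its transition matrix are made-up examples, not from the lecture) where sampling the next state consults only the current state, never the history:

```python
import random

random.seed(0)

# Hypothetical two-state chain: transition probabilities depend
# only on the current state (first-order Markov property).
STATES = ["sunny", "rainy"]
P_TRANS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(current):
    """Sample X_{t+1} given only X_t -- the full history is never needed."""
    r, total = random.random(), 0.0
    for nxt, p in P_TRANS[current].items():
        total += p
        if r < total:
            return nxt
    return nxt  # guard against floating-point rounding

def simulate(x0, t):
    """Unroll X_0 -> X_1 -> ... -> X_t."""
    chain = [x0]
    for _ in range(t):
        chain.append(step(chain[-1]))
    return chain

chain = simulate("sunny", 10)
print(chain)
```

Since `step` takes only the current state as input, P(X_t | X_0 ... X_{t−1}) = P(X_t | X_{t−1}) holds by construction.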
Markov processes
• Path to current state doesn't matter
• Once current state is known, information on prior states not informative

(Figure: a chain over health states, with node labels Exposure, Disease, Symptoms, Treatment, X)
Example

XYZ_1   XYZ_2   P
111     111     0.01
111     110     0.1
111     101     0.03
…
000     001     0.04
000     000     0.5

(X, Y, Z at time 1 transitioning to X, Y, Z at time 2)
Terminology
Markov process: stochastic process where transition probabilities depend only on the current state
Hidden Markov Model

(Figure: hidden chain X_0 → X_1 → X_2 → X_3 → …, with each hidden state X_t paired with an observation Y_t)

P(Y_t | X_{0:t−1}) = P(Y_t | X_{t−1})
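A small forward-sampling sketch can make the structure concrete. The health/fever states and all probabilities below are hypothetical, not from the lecture; the point is that the hidden chain evolves Markovianly while each observation is emitted from its hidden state alone:

```python
import random

random.seed(1)

# Hypothetical HMM: hidden health states evolve as a Markov chain;
# each hidden state emits a noisy observation.
TRANS = {"healthy": {"healthy": 0.9, "sick": 0.1},
         "sick":    {"healthy": 0.3, "sick": 0.7}}
EMIT  = {"healthy": {"normal": 0.8, "fever": 0.2},
         "sick":    {"normal": 0.2, "fever": 0.8}}

def sample(dist):
    """Draw one value from a {value: probability} distribution."""
    r, total = random.random(), 0.0
    for value, p in dist.items():
        total += p
        if r < total:
            return value
    return value  # guard against floating-point rounding

def sample_hmm(x0, t):
    """Forward-sample hidden states X_0..X_t and observations Y_0..Y_t."""
    xs, ys = [x0], [sample(EMIT[x0])]
    for _ in range(t):
        xs.append(sample(TRANS[xs[-1]]))  # next hidden state: depends only on current
        ys.append(sample(EMIT[xs[-1]]))   # observation: depends only on hidden state
    return xs, ys

xs, ys = sample_hmm("healthy", 5)
print(list(zip(xs, ys)))
```

Only the hidden states X are ever conditioned on; the observer sees only the Y sequence.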
Dynamic Bayesian networks
• "Dynamic" because they include time – not because they change over time!
• Generalization of HMM (HMMs are a subset of DBNs)
• Instead of one state variable, now we have a collection of BNs
Example

(Figure: initial network and transition from t to t+1, over three variables – Stock A, Stock B, Stock C)

Unrolled

(Figure: the same network unrolled over Time 0 through Time 4, with Stock A, Stock B, Stock C repeated at each slice)
DBNs

A DBN is defined by:
- a BN for the initial state of the system
- a set of BNs (one for each time slice) showing how a variable at time t influences another at a later time t+i
Notes on DBNs
• Have to specify set of lags to test
• Complexity:
  – In theory, set of graphs is all variables connected in all possible ways – across each timeslice
  – More compact than HMM: recall CPT/BN
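The two-part definition above (initial BN plus transition BNs) can be sketched directly. Everything here is a made-up toy, loosely echoing the three-stock example: each slice-(t+1) variable depends only on its listed parents in slice t, and the `p_true` CPT is an arbitrary illustrative choice:

```python
import random

random.seed(2)

# Hypothetical two-slice DBN over binary variables A, B, C:
# parents list which slice-t variables each slice-(t+1) variable depends on.
PARENTS = {"A": ["A"], "B": ["A", "B"], "C": ["B", "C"]}

def p_true(var, prev):
    """Toy CPT: P(var = 1 at t+1) rises with the number of parents true at t."""
    k = sum(prev[p] for p in PARENTS[var])
    return (1 + k) / (2 + len(PARENTS[var]))

def unroll(initial, steps):
    """Sample the DBN forward, one full variable assignment per time slice."""
    slices = [dict(initial)]
    for _ in range(steps):
        prev = slices[-1]
        slices.append({v: int(random.random() < p_true(v, prev))
                       for v in PARENTS})
    return slices

trajectory = unroll({"A": 1, "B": 0, "C": 0}, 4)
print(trajectory)
```

The compactness note above shows up here too: the structure is specified by `PARENTS` and per-variable CPTs rather than one joint transition table over all 2³ states.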
Non-stationarity

Grzegorczyk, M., & Husmeier, D. (2009). Non-stationary continuous dynamic Bayesian networks. Advances in Neural Information Processing Systems (NIPS), 22, 682-690.
Some software
• Banjo (Java) – discretized data
• DBmcmc / BNT (MATLAB)
• miniTUBA
More: http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html
Application: gene regulatory networks

Zou, M., & Conzen, S. D. (2005). A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics, 21(1), 71.
Application: neuronal connectivity

Eldawlatly, S., Zhou, Y., Jin, R., & Oweiss, K. G. (2010). On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Computation, 22(1), 158-189.
Application: diagnosis

Charitos, T., van der Gaag, L. C., Visscher, S., Schurink, K. A. M., & Lucas, P. J. F. (2009). A dynamic Bayesian network for diagnosing ventilator-associated pneumonia in ICU patients. Expert Systems with Applications, 36(2), 1249-1258.
References
• Ghahramani, Z. (1998). Learning dynamic Bayesian networks. Adaptive Processing of Sequences and Data Structures, 168-197.
• Murphy, K. (2002). Dynamic Bayesian networks: representation, inference and learning. PhD Thesis, University of California, Berkeley.
Logic-based
• Relationships versus structures
• Complexity + time
• Finding time windows without prior knowledge
• Explanation with uncertainty + time
The following slides are from: Kleinberg, S. (2013). Causality, Probability and Time. Cambridge University Press.
What is a causal relationship?

smoking and hypertension until a particular genetic mutation occurs causes CHF in 1-2 years with probability p

(s ∧ h) U g ⤳^{≥p}_{≥1,≤2} CHF

*made up relationship + timing
Prima facie causes

earlier than effect, raise probability of effect
Also called potential/possible causes

F^{>0}_{≤∞} c,   c ⤳^{≥p}_{≥s,≤t} e,   F^{<p}_{≤∞} e

where 1 ≤ s ≤ t ≤ ∞, s ≠ ∞
How to distinguish cause/non-cause
• Causes raise probability of effects
• But so do non-causes

(Figure: decreasing air pressure causes both falling barometer and rain)
Significance/insignificance

Suppes: P(e | c ∧ d) = P(e | c), so d spurious
BN: Distribution unfaithful
Eells: wouldn't hold d fixed when evaluating c

(Figure: c_t → e_{t2} with d_{t1}; edge probabilities 9/16, 1/4, 3/4)
What do we want from a measure of causal significance?
• Proportional to how "informative" cause is
• Direct causes
• Distinguish between strong/weak causes
Which causes are significant?
• Main idea: looking for better explanations for the effect
• Calculate impact of cause on effect's probability – after accounting for other factors
ε_avg(c, e) = ( Σ_{x ∈ X\c} [ P(e | c ∧ x) − P(e | ¬c ∧ x) ] ) / |X\c|
ε_avg and time

P(e | c ∧ x) means c, x both occur and e occurs in overlapping time window

c ⤳^{≥p}_{≥s,≤t} e,   x ⤳^{≥p}_{≥s′,≤t′} e

(Figure: timelines showing c with window [r, s] and x with window [r′, s′] overlapping at e)
More on ε_avg: set X
• What is compared against?
  – Probability raisers at any time before e
  – Could be causes/effects of c
• Why can we do this? Two cases:
  – Independent
  – Negatively correlated
What does ε_avg mean?
• Positive: c has positive influence on e's probability
• Negative: probability of e greater after c's absence (c is possible negative cause of e)
• Zero: +/− may cancel, or no influence
• Small value: maybe artifact, maybe weak real cause
Properties of ε_avg
• With large sample, no causes, values are normally distributed
• Note: not equal to zero for non-causes. Why?

(Figure: histogram of z-values for the data, f(z), with empirical null and theoretical null curves)
Example

(Figure: network over S, YF, LC, annotated with probabilities 0.1, 0.75, 0.01, 0.85, 1 for the S/¬S and YF combinations)

ε_S(YF, LC) = P(LC | YF ∧ S) − P(LC | ¬YF ∧ S) = 0.85 − 0.75 = 0.10
ε_YF(S, LC) = P(LC | S ∧ YF) − P(LC | ¬S ∧ YF) = 0.85 − 0.01 = 0.84
Definition – insignificant

c is an ε-insignificant cause of e if: c is a potential cause of e and |ε_avg| < ε

Definition – significant
• c is a just so or ε-significant cause of e if it is a non-ε-insignificant potential cause
• Note insignificant ≠ spurious
  – Could be spurious or have small influence
• Just so not necessarily genuine
  – There may be missing common causes
Testing formulas in observational data
• Specify hypotheses as temporal logical formulas
• Instead of inferring model, test whether formulas are satisfied in data
  – Find formulas satisfied at each timepoint
• Ex:
  – Calculate conditional probability across sequences using frequencies

(s ∧ h) U g
Probability of c leads-to e in 1-2 time units
• Number of times satisfying c: 3
• Number of times satisfying leads-to formula: 2
• Probability = 2/3

(Figure: a trace with timepoints labeled c | a | c,d | f | e | a,c | e,b | c | c | e | c | e, with occurrences marked False, True, True)

c ⤳^{≥p}_{≥1,≤2} e  is satisfied by the trace if p ≤ 2/3
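The frequency calculation above can be sketched as a short function. The trace below is a made-up example (not the one on the slide), chosen so the answer is again 2/3:

```python
def leadsto_prob(trace, cause, effect, lo, hi):
    """Frequency estimate of P(cause leads-to effect in [lo, hi] time units):
    (# cause timepoints followed by effect within the window) / (# cause timepoints)."""
    cause_times = [t for t, props in enumerate(trace) if cause in props]
    if not cause_times:
        return 0.0
    hits = sum(
        1 for t in cause_times
        if any(effect in trace[u]
               for u in range(t + lo, min(t + hi, len(trace) - 1) + 1))
    )
    return hits / len(cause_times)

# Made-up trace: each timepoint is the set of propositions true then.
trace = [{"c"}, {"a"}, {"e"}, set(), {"c"}, {"b"}, set(), {"c"}, {"e"}, set()]
p = leadsto_prob(trace, "c", "e", 1, 2)
print(p)  # c holds at times 0, 4, 7; e follows within 1-2 units at 0 and 7
```

With two of three c-occurrences followed by e in the window, any hypothesis c ⤳^{≥p}_{≥1,≤2} e with p ≤ 2/3 is satisfied by this trace.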
Multiple trace case

Testing "(c and d) lead to e in 2-3 months"

(Figure: four patient traces P1-P4 over time (months), marking occurrences of c, d, and e in each)
Another example
• To determine significance, need to calculate probabilities
• Count frequency, sum over all traces

f = c ∧ d ⤳_{≥s,≤t} e

Σ_{p∈P} (# times satisfying f) / Σ_{p∈P} (# times satisfying c ∧ d)

(Figure: traces P1 and P2 marking occurrences of c, d, and e)
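The ratio above sums counts across traces before dividing, rather than averaging per-trace probabilities. A sketch with made-up patient traces:

```python
def count_pair(trace, antecedent, effect, lo, hi):
    """(# timepoints satisfying the leads-to formula, # satisfying the antecedent)."""
    sat = holds = 0
    for t, props in enumerate(trace):
        if antecedent <= props:  # antecedent is a set of propositions
            holds += 1
            end = min(t + hi, len(trace) - 1)
            if any(effect in trace[u] for u in range(t + lo, end + 1)):
                sat += 1
    return sat, holds

def multi_trace_prob(traces, antecedent, effect, lo, hi):
    """Sum numerators and denominators across all traces, then divide."""
    num = den = 0
    for trace in traces:
        s, h = count_pair(trace, antecedent, effect, lo, hi)
        num, den = num + s, den + h
    return num / den if den else 0.0

# Made-up patient traces for "(c and d) lead to e in 2-3 months".
traces = [
    [{"c", "d"}, set(), set(), {"e"}],         # c^d at month 0, e at month 3
    [{"c"}, {"c", "d"}, set(), set(), set()],  # c^d at month 1, no e in window
]
print(multi_trace_prob(traces, {"c", "d"}, "e", 2, 3))  # (1 + 0) / (1 + 1)
```

Summing counts weights each occurrence equally, so a short trace with one occurrence cannot dominate a long trace with many.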
Determinism
• Events that are inevitable
• Cause that always produces effect

(Figure: left – M leads to D in 5 minutes with probability 1: M necessary/sufficient. Right – M and B lead to D in 5 and 10 minutes respectively, occurring with probabilities 0.28 and 0.72: M, B sufficient, not necessary. G_broken also shown.)
Redundant causation
• Overdetermination
  – Multiple causes that always co-occur (e.g. firing squad)
• Early preemption
• Late preemption
  – Causes both occur, one routinely preempts other
Application: simulated financial time series
• Created synthetic data based on Fama-French factor model [Fama & French 1992]
• 11 datasets, two time periods
  – [1] no causality
  – [2-8] randomly generated relationships, one lag
  – [9-10] randomly generated relationships, random lags
  – [11] many-to-one relationships, one lag

r_{i,t} = Σ_j β_{i,j} f_{j,t} + ε_{i,t}
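The factor equation can be simulated directly. All sizes and parameters below are made up for illustration (far smaller than the study's 25 stocks and 4000 days), and the loadings are drawn at random rather than estimated:

```python
import random

random.seed(3)

N_STOCKS, N_FACTORS, T = 5, 3, 100  # made-up sizes for illustration

# Hypothetical factor loadings beta[i][j].
beta = [[random.uniform(-1, 1) for _ in range(N_FACTORS)]
        for _ in range(N_STOCKS)]

def simulate_returns():
    """r_{i,t} = sum_j beta_{i,j} * f_{j,t} + eps_{i,t}."""
    returns = []
    for _ in range(T):
        f = [random.gauss(0, 1) for _ in range(N_FACTORS)]       # factor values
        returns.append([sum(b * fv for b, fv in zip(beta[i], f))
                        + random.gauss(0, 0.1)                   # idiosyncratic noise
                        for i in range(N_STOCKS)])
    return returns

returns = simulate_returns()
print(len(returns), len(returns[0]))
```

Causal relationships between stocks can then be injected by adding lagged terms of one stock's return to another's, which is what the randomly generated datasets [2-11] do.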
Patterns

(Figure: the randomly generated causal graphs, drawn as numbered stock nodes with their dependency edges)
Results
• Results for return data below
  – 22 datasets (11 types, 2 runs each)
• Two 4000 day time periods
  – 25 stocks, 3 factors
  – Daily returns
• True positives
  – Dependency between stocks

FDR      FNR      Intersection
0.0241   0.0082   0.4714
Inferring timing
• From testing relationships between all variables and CHF in 1-2 weeks, want to ultimately infer "high AST leads to CHF in 4-10 days"
• Instead of only accepting/rejecting hypotheses, refine hypotheses
Intuition behind procedure

1) Too wide
2) Shifted
3) Too narrow

(Figure: three candidate windows for c compared against the actual window [r, s], with a second cause x in window [r′, s′])

ε_avg(c, e) = ( Σ_{x ∈ X\c} [ P(e | c ∧ x) − P(e | ¬c ∧ x) ] ) / |X\c|,  where  P(e | c ∧ x) = #(c ∧ x ∧ e) / #(c ∧ x)
Window inference
• Assumption 1: A significant relationship will be found to be significant in at least one window intersecting the true window
• Assumption 2: Significant relationships are a small proportion of overall set tested
• Claim: Iterative perturbation of windows in a greedy way converges to true windows
Recap
• Find significant relationships
  – Generate initial hypotheses with candidate time windows, find possible causes
  – Calculate ε_avg for each
  – Label possible causes with fdr > threshold as ε-insignificant, rest ε-significant
• For statistically significant relationships, iteratively perturb time window (expand, contract, shift) trying to maximize ε_avg
• Re-calculate ε_avg with newly inferred relationships
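The greedy perturbation step can be sketched as hill climbing over window edges. The `toy_score` below is a made-up stand-in for ε_avg, peaked at a hypothetical true window [4, 10], so the search illustrates the expand/contract/shift moves without any real data:

```python
def refine_window(score, window, step=1, max_iter=50):
    """Greedily expand, contract, or shift [r, s] to maximize score(r, s)."""
    r, s = window
    best = score(r, s)
    for _ in range(max_iter):
        candidates = [(r - step, s), (r + step, s),           # move left edge
                      (r, s - step), (r, s + step),           # move right edge
                      (r - step, s - step), (r + step, s + step)]  # shift
        candidates = [(a, b) for a, b in candidates if 0 <= a < b]
        cand = max(candidates, key=lambda w: score(*w))
        if score(*cand) <= best:
            break  # local maximum: no perturbation improves the score
        (r, s), best = cand, score(*cand)
    return (r, s), best

# Hypothetical score, maximized at the "true" window [4, 10]:
def toy_score(r, s):
    overlap = max(0, min(s, 10) - max(r, 4))
    return overlap / max(s - r, 10 - 4)  # penalizes windows that are too wide

window, val = refine_window(toy_score, (1, 14))
print(window, val)
```

Starting from a too-wide window (1, 14), the search walks to (4, 10), consistent with Assumption 1 that the initial window intersects the true one.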
Explanation
• Where do explanations come from?
  – Need formal representation for algorithms
• Type ≠ Token
  – Cannot assume observations will exactly fit what is known
• Data
  – Frequently missing
  – Time of event ≠ time event is recorded
Intuition
• Type-level relationships are candidates
• Strictly applying type-level relationships is problematic
• Without background knowledge that timing is a strong constraint, do not want this to strictly limit explanations
Goals
• Quantify significance of an explanation
  – Allows for incomplete knowledge
  – Incorporates temporal uncertainty
• Algorithm for finding the significance of explanations
General approach
• Infer causal relationships from data
• Observe token sequence
  – E.g. series of medical tests or continuous recordings
• Iterate over relationships to calculate significance for each explanation
• Result is ranking of possible explanations
Significance of hypotheses
• When cause and effect occur consistent with type-level information
  – Significance = type-level significance
• When timing differs
  – Weight by difference
• When unknown if c occurs
  – Weight by probability of occurrence

(Figure: timeline marked at times 0, 7, and 10, with Exposure 1, Exposure 2, and flu)
Recall the connecting principle!
Assessing significance

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)

with c_{t′} = 1 if c occurs at time t′, f = 1 if c in window [r, s]; V is the observation sequence

c ⤳^{≥p}_{≥r,≤s} e
Accounting for uncertainty
• Slope may be dependent on underlying mechanisms
  – i.e. when is cause still capable of producing effect
• Can use other function with same properties

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)

(Figure: f(t) equal to 1 between r and s, sloping down toward 0 outside the window)
Calculating P
• Where do probabilities come from?
  – Initial dataset
  – Model
• Conditional on token-level observations

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)
Algorithm

If type-level causes have occurred consistently with the type-level timings, they will have the highest values of this significance. As the goal is to find the causes with the highest level of significance, we can begin by taking these sets and testing whether any of their members are true on the particular occasion. Where V_i indicates the set of propositions true at time i and e_t is the occurrence of event e at actual time t, the procedure for finding relationships with the highest levels of significance is as follows.

1. X = type-level genuine and just so causes of e
   V = [V_0, V_1, ..., V_t]
   E_X = ∅
2. For each x ∈ X, where x = y ⤳_{≥r,≤s} e, for i ∈ [t−s, t−r]: if V_i ⊨ y, the significance of y_i for e_t is ε_avg(y_{r−s}, e) and E_X = E_X ∪ {x}
3. For each x ∈ X \ E_X, where x = y ⤳_{≥r,≤s} e, and i = argmax_{j ∈ [0,t)} (P(y_j | V) × f(j, t, r, s)), the significance of y_i for e_t is S(y_i, e_t) = ε_avg(y_{r−s}, e) × P(y_i | V) × f(i, t, r, s).

We begin by finding occurrences that exactly match the type-level relationships, as these will have the maximum significance value for that type-level relationship (as both P and f are equal to one, so S = ε_avg). For relationships without exact matches, the times with the highest significance after weighting by f and P are identified using that the instance of a cause with the maximum
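The three steps can be sketched in code. Everything concrete below is hypothetical: the linear-decay `f` is just one function with the required shape, `p_occ` stands in for P(y_j | V), and the observation sequence and relationships are toy inputs:

```python
def f(tp, t, r, s, slope=0.5):
    """Timing weight: 1 when the cause time tp falls in [t-s, t-r],
    decaying linearly toward 0 outside (one choice with the required shape)."""
    if t - s <= tp <= t - r:
        return 1.0
    dist = (t - s) - tp if tp < t - s else tp - (t - r)
    return max(0.0, 1.0 - slope * dist)

def explain(relations, V, t, p_occ):
    """Rank token-level explanations for an effect at time t.
    relations: (cause, r, s, eps) tuples for 'cause leads-to effect in [r, s]'
    V: V[i] is the set of propositions observed true at time i
    p_occ(cause, j, V): probability the cause was true at time j given V."""
    scores = {}
    for cause, r, s, eps in relations:
        # Step 2: an observed occurrence inside the window gets the full eps_avg.
        if any(cause in V[i] for i in range(max(0, t - s), min(len(V), t - r + 1))):
            scores[cause] = eps
            continue
        # Step 3: otherwise, take the best time after weighting by P and f.
        best = max(p_occ(cause, j, V) * f(j, t, r, s) for j in range(t))
        scores[cause] = eps * best
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Made-up observation sequence and type-level relationships.
V = [set(), {"exposure"}, set(), set(), set(), set(), set(), {"flu"}]
rels = [("exposure", 2, 5, 0.4), ("stress", 1, 2, 0.6)]
ranking = explain(rels, V, 7, lambda c, j, V: 1.0 if c in V[j] else 0.2)
print(ranking)
```

Here the exposure at time 1 falls just outside its [2, 5] window before the flu at time 7, so it is down-weighted by f; the never-observed stress is down-weighted by its low occurrence probability, and exposure still ranks first.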
Midterm topics
• Why bother seeking causality if it's hard, and why aren't correlations enough?
• Regularities/probabilities/counterfactuals
• Why might an inferred relationship be spurious?
• Type vs. token
• Basics of conditional independence, Bayes rule, relationship between independence and BN, joint probability
• Intervention vs. observation
• Granger causality (conceptual), more in the next weeks

NOTE: These are focus areas but anything we covered before the midterm is fair game