Causal Inference: prediction, explanation, and intervention

Lecture 6: Causality in time series (part I)
Samantha Kleinberg [email protected]



Recap

[Diagram: Exposure → Disease]

Leibovici, L. (2001). Effects of remote, retroactive intercessory prayer on outcomes in patients with bloodstream infection: Randomised controlled trial. BMJ, 323(7327), 1450–1451.

Leibovici's reply

Contrast

• Exposure to someone with chickenpox Tuesday, illness Friday

• Exposure to someone with chickenpox in September, illness in November


Following videos from: Scholl, B. J., & Tremoulet, P. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4(8), 299–309.

Heider and Simmel

Delays and perception

• Michotte: no perception of causality with delay

• Shanks, Pearson, and Dickinson (1999): an increase in delay decreases causal judgment

Expectations

• Keyboard/triangle: delays lower causal judgment, but knowledge that there could be a delay reduced the effect (Buehner & May 2003)

• Energy-efficient lightbulb: no effect from delay… but an instant effect was still judged as causal (Buehner & May 2004)

Buehner, M. J., & McGregor, S. (2006). Temporal delays can facilitate causal attribution: Towards a general timeframe bias in causal induction. Thinking & Reasoning, 12(4), 353–378.

• Time is used to learn causality (but can incorrectly override covariation)

• More consistent vs. variable timing

But…

• Data collection granularity
• Post hoc ergo propter hoc
• Nonstationarity

Nonstationarity

Time in graphical models

[Diagram: Exposure → Disease → Symptoms]

First-order Markov process

• Edges only between times i and i+1

• Future and past are independent conditioned on the present

X_1 → X_2 → X_3 → … → X_t

P(X_t | X_0, …, X_{t−1}) = P(X_t | X_{t−1})
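This property is easy to check empirically. A minimal sketch (not from the slides; the transition probabilities are made up): simulate a two-state first-order chain and compare P(X_t | X_{t−1}) with an estimate that also conditions on X_{t−2}.

```python
import random

# Hypothetical transition probabilities for a two-state chain.
P = {0: [0.9, 0.1],   # P(X_t = 0), P(X_t = 1) given X_{t-1} = 0
     1: [0.4, 0.6]}   # P(X_t = 0), P(X_t = 1) given X_{t-1} = 1

random.seed(0)
chain = [0]
for _ in range(200_000):
    chain.append(random.choices([0, 1], weights=P[chain[-1]])[0])

# Estimate P(X_t = 1 | X_{t-1} = 1) vs. P(X_t = 1 | X_{t-1} = 1, X_{t-2} = 0).
pairs   = list(zip(chain, chain[1:]))
triples = list(zip(chain, chain[1:], chain[2:]))
p_first  = sum(b == 1 for a, b in pairs if a == 1) / sum(a == 1 for a, _ in pairs)
p_second = (sum(c == 1 for a, b, c in triples if a == 0 and b == 1)
            / sum(a == 0 and b == 1 for a, b, _ in triples))
print(round(p_first, 3), round(p_second, 3))  # both ≈ 0.6: extra history adds nothing
```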

Markov processes

• The path to the current state doesn't matter

• Once the current state is known, information on prior states is not informative

[Diagram: Exposure → Disease → Symptoms → Treatment; given the present state, the earlier states add no information]

States vs. variables

[Diagram: a chain of state nodes, each a joint assignment {A, B, C}, contrasted with the individual variables A, B, and C]

Example

XYZ_1   XYZ_2   P
111     111     0.01
111     110     0.1
111     101     0.03
…
000     001     0.04
000     000     0.5

[Diagram: state nodes (X, Y, Z)_1 → (X, Y, Z)_2]
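A small sketch of this representation (the rows beyond those on the slide are hypothetical): the process is Markov over joint states, so the table needs one distribution per joint assignment, which is exponential in the number of variables; this is the motivation for the factored representations coming up.

```python
from itertools import product

# Each state is a joint assignment of the binary variables (X, Y, Z).
states = list(product([1, 0], repeat=3))

# Partial transition table from the slide; the '...' rows are omitted here.
transition = {
    (1, 1, 1): {(1, 1, 1): 0.01, (1, 1, 0): 0.1, (1, 0, 1): 0.03},  # ...
    (0, 0, 0): {(0, 0, 1): 0.04, (0, 0, 0): 0.5},                   # ...
}

# 2^3 = 8 states, so the full table has 8 * 8 = 64 entries.
print(len(states) ** 2)
```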

Terminology

Markov process: a stochastic process where the transition probabilities depend only on the current state

Each node is a state (a collection of variable assignments)

[Diagram: two-state chain Heads ⇄ Tails, with all four transition probabilities equal to 0.5]

General

• First order: t depends on t−1
• Second order: t depends on t−1 and t−2

[Diagram: chain X_0 → X_1 → X_2 → X_3]

[Diagrams: Mosquito bite → Welt and Malaria infection → Symptoms; hidden Market factors → observed Stock prices]

Hidden Markov Model

[Diagram: hidden chain X_0 → X_1 → X_2 → X_3, with an observation Y_i emitted at each step; the same two-slice pattern repeats as the model is unrolled]

P(Y_t | X_{0:t−1}) = P(Y_t | X_{t−1})
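A minimal generative sketch of an HMM, in the spirit of the market-factors example (all names and probabilities hypothetical): the hidden state evolves as a Markov chain, and each observation depends only on the hidden state that emitted it.

```python
import random

trans = {'bull': {'bull': 0.8, 'bear': 0.2},
         'bear': {'bull': 0.3, 'bear': 0.7}}   # hidden dynamics P(X_t | X_{t-1})
emit  = {'bull': {'up': 0.7, 'down': 0.3},
         'bear': {'up': 0.2, 'down': 0.8}}     # emissions P(Y_t | X_t)

random.seed(1)
x, xs, ys = 'bull', [], []
for _ in range(10):
    x = random.choices(list(trans[x]), weights=list(trans[x].values()))[0]
    y = random.choices(list(emit[x]), weights=list(emit[x].values()))[0]
    xs.append(x); ys.append(y)

print(ys)  # only Y is observed; the hidden X must be inferred
```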

Dynamic Bayesian networks

• "Dynamic" because they include time – not because they change over time!

• Generalization of HMMs (HMMs are a subset of DBNs)

• Instead of one state variable, we now have a collection of BNs

Example

[Diagram: a BN over Stock A, Stock B, and Stock C for the initial state, and a BN for the transition from t to t+1]

Unrolled

[Diagram: the Stock A/B/C network unrolled over Time 0 through Time 4]

DBNs

A DBN is defined by:
– a BN for the initial state of the system
– a set of BNs (one for each time slice) showing how a variable at time t influences another at a later time t+i
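A structural sketch of that definition (variable names hypothetical): one graph for time 0, one transition graph giving each variable's parents in the previous slice, and a helper that unrolls the transition model over T slices.

```python
initial_parents = {          # BN for the initial state (time 0)
    'StockA': [],
    'StockB': ['StockA'],
    'StockC': ['StockA'],
}
transition_parents = {       # parents of each time-(t+1) variable in slice t
    'StockA': ['StockA'],
    'StockB': ['StockA', 'StockB'],
    'StockC': ['StockC'],
}

def unroll(transition_parents, T):
    """Unroll the transition model into an acyclic graph over T time slices."""
    edges = []
    for t in range(T - 1):
        for child, parents in transition_parents.items():
            edges += [((p, t), (child, t + 1)) for p in parents]
    return edges

print(unroll(transition_parents, 3))
```

Unrolling is also what resolves the apparent cycles on the next slide: a feedback loop between variables becomes acyclic once each variable is indexed by time.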

Cycles

[Diagram: a cyclic graph over Doctor (D), Treatment (T), and Outcome (O); unrolled over slices t and t+1, the same variables form an acyclic graph]

Notes on DBNs

• Have to specify the set of lags to test
• Complexity:
– In theory the set of graphs is all variables connected in all possible ways – across each timeslice
– More compact than an HMM: recall CPT/BN

Non-stationarity

Grzegorczyk, M., & Husmeier, D. (2009). Non-stationary continuous dynamic Bayesian networks. Advances in Neural Information Processing Systems (NIPS), 22, 682–690.

Some software

• Banjo (Java) – discretized data
• DBmcmc / BNT (MATLAB)
• miniTUBA

More: http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html

Application: gene regulatory networks

Zou, M., & Conzen, S. D. (2005). A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics, 21(1), 71.

Application: neuronal connectivity

Eldawlatly, S., Zhou, Y., Jin, R., & Oweiss, K. G. (2010). On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Computation, 22(1), 158–189.

Application: diagnosis

Charitos, T., van der Gaag, L. C., Visscher, S., Schurink, K. A. M., & Lucas, P. J. F. (2009). A dynamic Bayesian network for diagnosing ventilator-associated pneumonia in ICU patients. Expert Systems with Applications, 36(2), 1249–1258.

References

• Ghahramani, Z. (1998). Learning dynamic Bayesian networks. Adaptive Processing of Sequences and Data Structures: 168–197.

• Murphy, K. (2002). Dynamic Bayesian networks: representation, inference and learning. PhD thesis, University of California, Berkeley.

Logic-based

• Relationships versus structures
• Complexity + time
• Finding time windows without prior knowledge

• Explanation with uncertainty + time

The following slides are from: Kleinberg, S. (2013). Causality, Probability and Time. Cambridge University Press.

Recent uses

What is a causal relationship?

Smoking and hypertension, until a particular genetic mutation occurs, causes CHF in 1–2 years with probability p:

(s ∧ h) U g ↝^{≥1,≤2}_{≥p} CHF

*made-up relationship and timing

Prima facie causes

Earlier than the effect, and raises the probability of the effect. Also called potential/possible causes.

c is a prima facie cause of e if:
• F^{≤∞}_{>0} c (c occurs with nonzero probability)
• c ↝^{≥s,≤t}_{≥p} e
• F^{≤∞}_{<p} e (the marginal probability of e is less than p)

where 1 ≤ s ≤ t ≤ ∞ and s ≠ ∞

How to distinguish cause/non-cause

• Causes raise the probability of their effects
• But so do non-causes

[Diagram: decreasing air pressure → falling barometer; decreasing air pressure → rain]

Significance/insignificance

• Suppes: P(e | c ∧ d) = P(e | c), so d is spurious
• BN: the distribution is unfaithful
• Eells: wouldn't hold d fixed when evaluating c

[Diagram: c at time t and d at time t1 preceding e at time t2, with probabilities 9/16, 1/4, and 3/4]

What do we want from a measure of causal significance?

• Proportional to how "informative" the cause is
• Direct causes
• Distinguish between strong/weak causes

Which causes are significant?

• Main idea: looking for better explanations for the effect

• Calculate the impact of the cause on the effect's probability, after accounting for other factors

ε_avg(c, e) = ( Σ_{x ∈ X∖c} [ P(e | c ∧ x) − P(e | ¬c ∧ x) ] ) / |X∖c|
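A simplified sketch of computing ε_avg from data, ignoring the time windows that the full method attaches to each occurrence (here a "row" is just the set of events observed together, a hypothetical representation):

```python
def eps_avg(c, e, X, rows):
    """Average of P(e|c,x) - P(e|~c,x) over the other potential causes x."""
    diffs = []
    for x in X:
        if x == c:
            continue
        cx  = [r for r in rows if x in r and c in r]
        ncx = [r for r in rows if x in r and c not in r]
        if cx and ncx:  # both conditional probabilities are defined
            p1 = sum(e in r for r in cx) / len(cx)
            p2 = sum(e in r for r in ncx) / len(ncx)
            diffs.append(p1 - p2)
    return sum(diffs) / len(diffs) if diffs else 0.0

rows = [{'c', 'x', 'e'}, {'c', 'x'}, {'x', 'e'}, {'x'}, {'c', 'e'}, set()]
print(eps_avg('c', 'e', {'x'}, rows))  # 0.5 - 0.5 = 0.0 for this toy data
```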

ε_avg and time

P(e | c ∧ x) means that c and x both occur and e occurs in the overlapping time window:

c ↝^{≥s,≤t}_{≥p} e, x ↝^{≥s′,≤t′}_{≥p} e

[Diagram: timeline with c's window [r, s] and x's window [r′, s′] overlapping at e]

More on ε_avg: the set X

• What is compared against?
– Probability raisers at any time before e
– Could be causes/effects of c

• Why can we do this? Two cases:
– Independent
– Negatively correlated

What does ε_avg mean?

• Positive: c has a positive influence on e's probability

• Negative: the probability of e is greater after c's absence (c is a possible negative cause of e)

• Zero: positive and negative influences may cancel, or there is no influence

• Small value: maybe an artifact, maybe a weak real cause

Properties of ε_avg

• With a large sample and no causes, the values are normally distributed

• Note: not equal to zero for non-causes. Why?

[Histogram: counts of z-values, with the data, fitted f(z), empirical null, and theoretical null overlaid]

Example

[Diagram: states over smoking (S), yellow fingers (YF), and lung cancer (LC), with transition probabilities 0.1, 0.75, 0.01, 0.85, and 1]

ε_S(YF, LC) = P(LC | YF ∧ S) − P(LC | ¬YF ∧ S) = 0.85 − 0.75 = 0.10

ε_YF(S, LC) = P(LC | S ∧ YF) − P(LC | ¬S ∧ YF) = 0.85 − 0.01 = 0.84

Definition – insignificant

c is an ε-insignificant cause of e if: c is a potential cause of e and |ε_avg| < ε

Definition – significant

• c is a just-so or ε-significant cause of e if it is a non-ε-insignificant potential cause

• Note: insignificant ≠ spurious
– Could be spurious or could have a small influence

• Just-so is not necessarily genuine
– There may be missing common causes

Testing formulas in observational data

• Specify hypotheses as temporal logical formulas

• Instead of inferring a model, test whether the formulas are satisfied in the data
– Find the formulas satisfied at each timepoint

• Ex: (s ∧ h) U g
– Calculate conditional probabilities across sequences using frequencies

Probability of c leads-to e in 1–2 time units

• Number of times satisfying c: 3
• Number of times satisfying the leads-to formula: 2
• Probability = 2/3

[Trace: c | a | c,d | f | e | a,c | e,b; the three occurrences of c evaluate to False, True, True]

c ↝^{≥1,≤2}_{≥p} e is satisfied by the trace if p ≤ 2/3
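A sketch reproducing these counts in code (for multiple traces, as on the next slides, the hit and start counts would simply be summed across traces before dividing):

```python
def leadsto_prob(trace, c, e, lo, hi):
    """Count occurrences of c, and those followed by e within lo..hi steps."""
    starts = [t for t, props in enumerate(trace) if c in props]
    hits = [t for t in starts
            if any(e in trace[u]
                   for u in range(t + lo, min(t + hi + 1, len(trace))))]
    return len(hits), len(starts)

trace = [{'c'}, {'a'}, {'c', 'd'}, {'f'}, {'e'}, {'a', 'c'}, {'e', 'b'}]
hits, total = leadsto_prob(trace, 'c', 'e', 1, 2)
print(hits, total, hits / total)  # 2 3 0.666...: formula holds if p <= 2/3
```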

Multiple trace case

Testing "(c and d) lead to e in 2–3 months"

[Diagram: four traces P1–P4 over time in months, with occurrences of c, d, and e]

Another example

• To determine significance, need to calculate probabilities

• Count frequencies, summing over all traces, for f = (c ∧ d) ↝^{≥s,≤t} e:

p = Σ_{p∈P} (# times satisfying f) / Σ_{p∈P} (# times satisfying c ∧ d)

[Diagram: two traces P1 and P2 with occurrences of c, d, and e]

Determinism

• Events that are inevitable

• A cause that always produces its effect

[Diagrams: left, M causes D in 5 minutes with probability 1, failing only if G is broken (M necessary and sufficient); right, M causes D in 5 minutes and B causes D in 10 minutes, each with probability 1, with M and B occurring with probabilities 0.28 and 0.72 (M and B sufficient, but not necessary)]

Redundant causation

• Overdetermination
– Multiple causes that always co-occur (e.g. a firing squad)

• Early preemption
• Late preemption
– Causes both occur; one routinely preempts the other

Chains

• Always finds the most direct cause. Why?

Application: simulated financial time series

• Created synthetic data based on the Fama–French factor model [Fama & French 1992]

• 11 datasets, two time periods
– [1] no causality
– [2–8] randomly generated relationships, one lag
– [9–10] randomly generated relationships, random lags
– [11] many-to-one relationships, one lag

r_{i,t} = Σ_j β_{i,j} f_{j,t} + ε_{i,t}
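A sketch of generating returns from this factor model (dimensions from the slides: 25 stocks, 3 factors, 4000 days; the betas and noise scale are made up, and datasets 2–11 would additionally inject the lagged stock-to-stock dependencies to be recovered):

```python
import random

random.seed(2)
n_stocks, n_factors, n_days = 25, 3, 4000
beta = [[random.gauss(0, 1) for _ in range(n_factors)] for _ in range(n_stocks)]

returns = []
for t in range(n_days):
    f = [random.gauss(0, 1) for _ in range(n_factors)]           # factors f_{j,t}
    returns.append([sum(b[j] * f[j] for j in range(n_factors))   # sum_j beta_{i,j} f_{j,t}
                    + random.gauss(0, 0.1)                       # + eps_{i,t}
                    for b in beta])

print(len(returns), len(returns[0]))  # 4000 days x 25 stocks
```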

Patterns

[Diagrams: three of the randomly generated causal graphs, with stocks shown as numbered nodes]

Results

• Results for the return data below
– 22 datasets (11 types, 2 runs each)
• Two 4000-day time periods
– 25 stocks, 3 factors
– Daily returns

• True positives
– Dependency between stocks

FDR      FNR      Intersection
0.0241   0.0082   0.4714

Inferring timing

• From testing relationships between all variables and CHF in 1–2 weeks, we ultimately want to infer "high AST leads to CHF in 4–10 days"

• Instead of only accepting/rejecting hypotheses, refine the hypotheses

Intuition behind procedure

A candidate window can miss the actual one in three ways:
1) Too wide
2) Shifted
3) Too narrow

[Diagram: the three candidate windows compared against the actual window [r′, s′]]

ε_avg(c, e) = ( Σ_{x ∈ X∖c} [ P(e | c ∧ x) − P(e | ¬c ∧ x) ] ) / |X∖c|, with P(e | c ∧ x) = #(c ∧ x ∧ e) / #(c ∧ x)

Window inference

• Assumption 1: a significant relationship will be found to be significant in at least one window intersecting the true window

• Assumption 2: significant relationships are a small proportion of the overall set tested

• Claim: iteratively perturbing the windows in a greedy way converges to the true windows
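A minimal sketch of that greedy perturbation (here score(r, s) stands in for ε_avg recomputed under the candidate window [r, s]; the toy score function is hypothetical):

```python
def refine_window(score, r, s, step=1):
    """Greedily expand/contract/shift [r, s] while the score improves."""
    best = score(r, s)
    improved = True
    while improved:
        improved = False
        for nr, ns in [(r - step, s), (r + step, s),   # move left edge
                       (r, s - step), (r, s + step)]:  # move right edge
            if 1 <= nr <= ns and score(nr, ns) > best:
                best, r, s, improved = score(nr, ns), nr, ns, True
                break
    return r, s, best

# Hypothetical score peaking at the window [4, 10]:
toy = lambda r, s: -abs(r - 4) - abs(s - 10)
print(refine_window(toy, 1, 14))  # converges to (4, 10, 0)
```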

Recap

• Find significant relationships
– Generate initial hypotheses with candidate time windows; find possible causes
– Calculate ε_avg for each
– Label possible causes with FDR > threshold as ε-insignificant, the rest ε-significant

• For statistically significant relationships, iteratively perturb the time window (expand, contract, shift), trying to maximize ε_avg

• Re-calculate ε_avg with the newly inferred relationships

Explanation

• Where do explanations come from?
– Need a formal representation for algorithms

• Type ≠ token
– Cannot assume observations will exactly fit what is known

• Data
– Frequently missing
– Time of an event ≠ time the event is recorded

Timing

• LC = S
• LC = A

Intuition

• Type-level relationships are candidates
• Strictly applying type-level relationships is problematic

• Without background knowledge that timing is a strong constraint, we do not want timing to strictly limit explanations

Goals

• Quantify the significance of an explanation
– Allows for incomplete knowledge
– Incorporates temporal uncertainty

• Algorithm for finding the significance of explanations

General approach

• Infer causal relationships from data

• Observe a token sequence
– E.g. a series of medical tests or continuous recordings

• Iterate over the relationships to calculate the significance of each explanation

• Result is a ranking of possible explanations

Significance of hypotheses

• When cause and effect occur consistently with the type-level information
– Significance = type-level significance

• When the timing differs
– Weight by the difference

• When it is unknown whether c occurs
– Weight by the probability of occurrence

[Timeline: Exposure 1 at time 0, Exposure 2 at time 7, flu at time 10]

Recall the connecting principle!

Assessing significance

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)

with the type-level relationship c ↝^{≥r,≤s}_{≥p} e, where P(c_{t′} | V) = 1 if c occurs at time t′, f = 1 if c is in the window [r, s], and V is the observation sequence

Accounting for uncertainty

• The slope may depend on the underlying mechanisms
– i.e. when is the cause still capable of producing the effect

• Can use another function with the same properties

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)

[Plot: f(t) equal to 1 within the window [r, s], falling off outside it]

Calculating P

• Where do the probabilities come from?
– Initial dataset
– Model

• Conditional on token-level observations

S(c_{t′}, e_t) = ε_avg(c, e) × P(c_{t′} | V) × f(t′, t, r, s)
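A sketch of the token-level score with one possible choice of f (piecewise linear, matching the shape in the plot above; the fade-out rate is hypothetical):

```python
def f(t_prime, t, r, s, fade=5.0):
    """1 inside the type-level window [r, s] (in lag units), linear fade outside."""
    lag = t - t_prime
    if r <= lag <= s:
        return 1.0
    dist = (r - lag) if lag < r else (lag - s)
    return max(0.0, 1.0 - dist / fade)

def S(eps_avg, p_c_given_V, t_prime, t, r, s):
    # S(c_t', e_t) = eps_avg(c, e) * P(c_t' | V) * f(t', t, r, s)
    return eps_avg * p_c_given_V * f(t_prime, t, r, s)

# Cause observed 12 time units before the effect, type-level window [4, 10]:
print(S(eps_avg=0.84, p_c_given_V=1.0, t_prime=0, t=12, r=4, s=10))  # ≈ 0.504
```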

Algorithm

So type-level causes that have occurred consistently with the type-level timings will have the highest values of this significance. As the goal is to find the causes with the highest level of significance, we can begin by taking these sets and testing whether any of their members are true on the particular occasion. Where V_i indicates the set of propositions true at time i and e_t is the occurrence of event e at actual time t, the procedure for finding relationships with the highest levels of significance is as follows.

1. X ← type-level genuine and just-so causes of e; V ← [V_0, V_1, …, V_t]; EX ← ∅.

2. For each x ∈ X, where x = y ↝^{≥r,≤s} e: for i ∈ [t−s, t−r], if V_i ⊨ y, the significance of y_i for e_t is ε_avg(y_{r–s}, e) and EX = EX ∪ {x}.

3. For each x ∈ X ∖ EX, where x = y ↝^{≥r,≤s} e, and i = argmax_{j ∈ [0,t)} ( P(y_j | V) × f(j, t, r, s) ), the significance of y_i for e_t is S(y_i, e_t) = ε_avg(y_{r–s}, e) × P(y_i | V) × f(i, t, r, s).

We begin by finding occurrences that exactly match the type-level relationships, as these will have the maximum significance value for that type-level relationship (as both P and f are equal to one, so S = ε_avg). For relationships without exact matches, the times with the highest significance after weighting by f and P are identified using that the instance of a cause with the maximum…
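A hedged sketch of the three steps above (a simplified reading: relationships are (y, r, s, ε_avg) tuples, and P(y_j | V) is taken as 1 when y is observed at time j and as a supplied constant otherwise, standing in for the model- or data-derived probability):

```python
def rank_explanations(relationships, V, t, f, p_unobserved=0.0):
    """Score candidate token-level explanations of e occurring at time t.

    relationships: list of (y, r, s, eps) type-level 'y leads to e in [r, s]'.
    V: list of sets; V[i] holds the propositions observed true at time i.
    """
    scores = []
    for y, r, s, eps in relationships:
        # Step 2: occurrences inside the type-level window get S = eps_avg.
        exact = [i for i in range(max(0, t - s), t - r + 1) if y in V[i]]
        if exact:
            scores += [((y, i), eps) for i in exact]
            continue
        # Step 3: otherwise pick the time maximizing P(y_j | V) * f(j, t, r, s).
        p = lambda j: 1.0 if y in V[j] else p_unobserved
        i = max(range(t), key=lambda j: p(j) * f(j, t, r, s))
        scores.append(((y, i), eps * p(i) * f(i, t, r, s)))
    return sorted(scores, key=lambda kv: kv[1], reverse=True)

# Toy usage: y observed at time 0 exactly matches 'y leads to e in 2-3'.
toy_f = lambda j, t, r, s: 1.0 if r <= t - j <= s else 0.5
V = [{'y'}, set(), set(), {'e'}]
print(rank_explanations([('y', 2, 3, 0.8)], V, t=3, f=toy_f))  # [(('y', 0), 0.8)]
```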

Midterm topics

• Why bother seeking causality if it's hard, and why aren't correlations enough?

• Regularities/probabilities/counterfactuals
• Why might an inferred relationship be spurious?
• Type vs. token
• Basics of conditional independence, Bayes' rule, the relationship between independence and BNs, joint probability

• Intervention vs. observation
• Granger causality (conceptual); more in the next weeks

NOTE: These are focus areas, but anything we covered before the midterm is fair game

For next week

• Read: C. W. J. Granger (1980). Testing for Causality: A Personal Viewpoint.

• Bring questions for the midterm review!!