A non-Gaussian model for causal discovery in the presence of hidden common causes

Shohei Shimizu, Shiga University / Osaka University, Japan. A non-Gaussian model for causal discovery in the presence of hidden common causes. 2016 Munich Workshop on Causal Inference and Information Theory.


Page 1: A non-Gaussian model for causal discovery in the presence of hidden common causes

Shohei Shimizu

Shiga University / Osaka University

Japan

A non-Gaussian model for causal discovery in the presence of hidden common causes

2016 Munich Workshop on Causal Inference and Information Theory

Page 2: A non-Gaussian model for causal discovery in the presence of hidden common causes

Abstract

• Managing hidden common causes is essential in causal discovery

• Non-causally-related observed variables can be correlated due to hidden common causes

• Propose a linear non-Gaussian model for estimating causal direction in cases with hidden common causes

Page 3: A non-Gaussian model for causal discovery in the presence of hidden common causes

Motivation

Illustrative example

Page 4: A non-Gaussian model for causal discovery in the presence of hidden common causes

Strong correlation between chocolate consumption and number of Nobel laureates (Messerli12NEJM)

[Figure: scatter plot, 2002-2011. Horizontal axis: chocolate consumption (kg/yr/capita). Vertical axis: number of Nobel laureates per 10 million population. Corr. 0.791, P-value < 0.001]

Page 5: A non-Gaussian model for causal discovery in the presence of hidden common causes

Eating more chocolate increases the number of Nobel laureates?

• Interpretational drift (Maurage+13, J. Nutrition)

• Manage this gap!

[Figure: scatter plot of Nobel against Chocolate (Corr. 0.791, P-value < 0.001), set against candidate causal graphs over Chclt and Nobel, each with GDP as a hidden common cause]

Page 6: A non-Gaussian model for causal discovery in the presence of hidden common causes

Formulating the problem

Page 7: A non-Gaussian model for causal discovery in the presence of hidden common causes

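Structural causal models (Pearl, 2000, 2009; cf. Bollen, 1989)

• A framework for describing causal relations

• Generally speaking, if changing the value of $x_1$ leads to a change in the value of $x_2$, then $x_1$ causes $x_2$

$x_1 = g_1(f, e_1)$

$x_2 = g_2(x_1, f, e_2)$

[Figure: causal graph of $x_1 \to x_2$ with hidden common cause $f$ and errors $e_1$, $e_2$; example labels: GDP as $f$, Chclt and Nobel as the observed variables]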
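A minimal sketch of the "change $x_1$ and see whether $x_2$ changes" reading of causation, assuming a linear form for $g_1$ and $g_2$. The coefficient 0.8, the uniform distributions, and the function names are illustrative assumptions, not values from the slides.

```python
import random

def simulate(n, do_x1=None, seed=0):
    """Draw n samples from a hypothetical linear SCM with hidden common cause f.

    x1 = f + e1              (replaced by the constant do_x1 under intervention)
    x2 = 0.8 * x1 + f + e2
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        f = rng.uniform(-1.0, 1.0)
        e1 = rng.uniform(-1.0, 1.0)
        e2 = rng.uniform(-1.0, 1.0)
        # Under intervention, x1 is set from outside and ignores f and e1.
        x1 = f + e1 if do_x1 is None else do_x1
        x2 = 0.8 * x1 + f + e2
        samples.append((x1, x2))
    return samples

def mean_x2(samples):
    return sum(x2 for _, x2 in samples) / len(samples)

# Changing x1 from 0 to 1 shifts the mean of x2 by the direct effect.
effect = mean_x2(simulate(10_000, do_x1=1.0)) - mean_x2(simulate(10_000, do_x1=0.0))
print(f"mean shift in x2 when x1 is moved from 0 to 1: {effect:.3f}")
```

Because both runs reuse the same seed, the shift isolates the direct effect of $x_1$ on $x_2$; the hidden common cause $f$ contributes equally to both runs.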

Page 8: A non-Gaussian model for causal discovery in the presence of hidden common causes

Challenge in causal discovery

• Assume that one of the following three models, each with a hidden common cause $f$ and distributions $p(e_1)$, $p(e_2)$, $p(f)$, generated the data:

– $x_1 \to x_2$: $x_1 = g_1(f, e_1)$, $x_2 = g_2(x_1, f, e_2)$

– $x_1 \leftarrow x_2$: $x_1 = g_1(x_2, f, e_1)$, $x_2 = g_2(f, e_2)$

– no direct causal relation: $x_1 = g_1(f, e_1)$, $x_2 = g_2(f, e_2)$

• Data matrix: rows $x_1$, $x_2$; columns obs. 1, obs. 2, …, obs. n, with $(x_1, x_2) \sim$ i.i.d. $p(x_1, x_2)$

• Estimate which of the three models generated the data

Page 9: A non-Gaussian model for causal discovery in the presence of hidden common causes

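Under what conditions can we manage the gap?

• We have shown that it is possible under three assumptions: i) linearity; ii) acyclicity; iii) non-Gaussianity (Hoyer+08IJAR; Shimizu+14JMLR):

– $x_1 \to x_2$: $x_1 = \lambda_{11} f_1 + e_1$, $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$

– $x_1 \leftarrow x_2$: $x_1 = b_{12} x_2 + \lambda_{11} f_1 + e_1$, $x_2 = \lambda_{21} f_1 + e_2$

– no direct causal relation: $x_1 = \lambda_{11} f_1 + e_1$, $x_2 = \lambda_{21} f_1 + e_2$

• The classical Bayesian network approach is incapable of this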
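A small simulation can illustrate why the candidate model with no direct causal relation still produces correlated data. It is a sketch under assumed values: the coefficients $\lambda_{11} = \lambda_{21} = 1$ and the uniform distributions are hypothetical choices, not taken from the slides.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)
n = 20_000
lam11, lam21 = 1.0, 1.0  # hypothetical coefficients of the hidden common cause

x1, x2 = [], []
for _ in range(n):
    f1 = rng.uniform(-1.0, 1.0)  # hidden common cause (non-Gaussian)
    e1 = rng.uniform(-1.0, 1.0)  # independent errors
    e2 = rng.uniform(-1.0, 1.0)
    x1.append(lam11 * f1 + e1)   # no x2 term: x2 does not cause x1
    x2.append(lam21 * f1 + e2)   # no x1 term: x1 does not cause x2

spurious_corr = pearson(x1, x2)
print(f"correlation without any direct causal relation: {spurious_corr:.2f}")
```

With these settings the population correlation is 0.5, so a correlation of that size alone cannot distinguish the three candidate models.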

Page 10: A non-Gaussian model for causal discovery in the presence of hidden common causes

Basic non-Gaussian model (No hidden common cause)

S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen

Journal of Machine Learning Research, 2006

Page 11: A non-Gaussian model for causal discovery in the presence of hidden common causes

Linear Non-Gaussian Acyclic Model (LiNGAM) (Shimizu et al., 2006)

$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$

• Assumptions

– Linearity

– Acyclicity

– Non-Gaussian errors $e_i$

– Independence of errors $e_i$ (no hidden common causes)

• Identifiable: causal directions and coefficients

• Various extensions including nonlinear (Hoyer+08NIPS, Zhang+09UAI) and cyclic (Lacerda+08UAI) models

[Figure: example graph over $x_1$, $x_2$, $x_3$ with coefficients $b_{21}$, $b_{23}$, $b_{13}$ and errors $e_1$, $e_2$, $e_3$]
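To make the LiNGAM equation $x_i = \sum_{j \neq i} b_{ij} x_j + e_i$ concrete, the sketch below generates data for a three-variable example with edges $x_3 \to x_1$ ($b_{13}$), $x_1 \to x_2$ ($b_{21}$) and $x_3 \to x_2$ ($b_{23}$). The coefficient values and the Laplace errors are assumptions for illustration. The last step only shows that a coefficient is recoverable by regression once the causal order is known; it is not the LiNGAM estimation algorithm.

```python
import random

def laplace(rng):
    # The difference of two Exp(1) draws follows a Laplace(0, 1) distribution,
    # a simple non-Gaussian error distribution.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

rng = random.Random(0)
n = 20_000
b13, b21, b23 = 0.5, 0.8, -0.4  # hypothetical connection strengths

data = []
for _ in range(n):
    e1, e2, e3 = laplace(rng), laplace(rng), laplace(rng)
    # Generate variables in a causal order, which exists because of acyclicity.
    x3 = e3
    x1 = b13 * x3 + e1
    x2 = b21 * x1 + b23 * x3 + e2
    data.append((x1, x2, x3))

# With the order known, b13 is the least-squares slope of x1 on x3
# (no intercept needed because all variables have zero mean).
b13_hat = sum(x1 * x3 for x1, _, x3 in data) / sum(x3 * x3 for _, _, x3 in data)
print(f"b13 = {b13}, estimate = {b13_hat:.3f}")
```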

Page 12: A non-Gaussian model for causal discovery in the presence of hidden common causes

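Different directions give different data distributions

Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$

Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$

where $E(e_1) = E(e_2) = 0$ and $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$

[Figure: scatter plots of $x_1$ and $x_2$ for Model 1 and Model 2, with Gaussian errors in one column and non-Gaussian (ex. uniform) errors in the other]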
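One way to see the asymmetry numerically is to regress in both directions and check whether the residual is independent of the regressor. The sketch below simulates Model 1 ($x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$) with uniform errors scaled so that both variances are 1. The dependence score $|\mathrm{corr}(x^3, r)|$ is a crude stand-in chosen for this example, not the measure used in the slides.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def residual(y, x):
    """Residuals of the least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]

def dependence(x, r):
    """|corr(x^3, r)|: near zero when regressor x and residual r are independent."""
    return abs(corr([xi ** 3 for xi in x], r))

rng = random.Random(0)
n = 100_000
a1 = 3 ** 0.5            # uniform(-a1, a1) has variance 1
a2 = (3 * 0.36) ** 0.5   # variance 0.36, so that var(x2) = 0.64 + 0.36 = 1
x1 = [rng.uniform(-a1, a1) for _ in range(n)]
x2 = [0.8 * v + rng.uniform(-a2, a2) for v in x1]

forward = dependence(x1, residual(x2, x1))   # candidate direction x1 -> x2
backward = dependence(x2, residual(x1, x2))  # candidate direction x2 -> x1
print(f"forward: {forward:.4f}, backward: {backward:.4f}")
```

In the true direction the residual is essentially $e_2$ and independent of $x_1$, so the forward score stays near zero, while the backward score does not. With Gaussian errors both scores would be near zero, which is why non-Gaussianity is needed.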

Page 13: A non-Gaussian model for causal discovery in the presence of hidden common causes

Independent Component Analysis (ICA) (Jutten & Herault, 1991; Comon, 1994; Hyvärinen et al., 2001)

• Observed variables $x_i$ are modeled by

$x_i = \sum_{j=1}^{p} a_{ij} s_j$ or $x = As$

where

– Hidden variables $s_j$ $(j = 1, \ldots, p)$ are non-Gaussian and independent

• Then, mixing matrix A is identifiable up to permutation and scaling of the columns
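The "up to permutation and scaling of the columns" qualification can be spelled out in one line. In the sketch below, $P$ (a permutation matrix) and $D$ (an invertible diagonal matrix) are symbols introduced here for illustration.

```latex
\[
x = A s = \left(A P^{-1} D^{-1}\right)\left(D P s\right)
\]
% D P s is again a vector of independent non-Gaussian variables
% (reordered and rescaled), so A and A P^{-1} D^{-1} explain the data
% equally well: the columns of A are determined only up to order and scale.
```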

Page 14: A non-Gaussian model for causal discovery in the presence of hidden common causes

Sketch of the identifiability proof

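• Different directions give different zero/non-zero patterns of the mixing matrices

– No zeros on the diagonal in the causal model

– No permutation indeterminacy

Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$, that is, $x = As$ with

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$

Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$, that is, $x = As$ with

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$

[Figure: graphs of Model 1 ($x_1 \to x_2$) and Model 2 ($x_1 \leftarrow x_2$) with errors $e_1$, $e_2$; the zero entry of each mixing matrix is highlighted]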
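The step from the structural equations to the mixing matrix is a matrix inversion. Writing the two-variable models as $x = Bx + e$, where $B$ collects the connection strengths (the matrix symbol $B$ and the identity $I$ are notation assumed here), gives:

```latex
\[
x = Bx + e
\;\Longrightarrow\;
(I - B)\,x = e
\;\Longrightarrow\;
x = (I - B)^{-1} e = A e .
\]
\[
\text{Model 1: } B = \begin{bmatrix} 0 & 0 \\ b_{21} & 0 \end{bmatrix},
\quad A = \begin{bmatrix} 1 & 0 \\ b_{21} & 1 \end{bmatrix};
\qquad
\text{Model 2: } B = \begin{bmatrix} 0 & b_{12} \\ 0 & 0 \end{bmatrix},
\quad A = \begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix}.
\]
```

The position of the zero in $A$ therefore reveals which direction generated the data, provided $A$ can be estimated by ICA.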

Page 15: A non-Gaussian model for causal discovery in the presence of hidden common causes

LiNGAM with hidden common causes

P. O. Hoyer, S. Shimizu, A. Kerminen, and M. Palviainen

Int. J. Approximate Reasoning, 2008

Page 16: A non-Gaussian model for causal discovery in the presence of hidden common causes

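LiNGAM with hidden common causes (Hoyer+08IJAR)

• Extension to incorporate non-Gaussian hidden common causes $f_q$:

$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$

where $f_q$ $(q = 1, \ldots, Q)$ are independent

• Example with two observed variables:

$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$

$x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$

[Figure: graph of $x_1 \to x_2$ with hidden common causes $f_1$, $f_2$ and errors $e_1$, $e_2$]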

Page 17: A non-Gaussian model for causal discovery in the presence of hidden common causes

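$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$

• WLOG, hidden common causes $f_q$ are assumed to be independent

• Dependent hidden common causes $f_1$, $f_2$ can be rewritten in terms of independent ones $e_{f_1}$, $e_{f_2}$:

$\begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} e_{f_1} \\ e_{f_2} \end{bmatrix}$

[Figure: two graphs over $x_1$, $x_2$ with errors $e_1$, $e_2$. One shows dependent hidden common causes $f_1$, $f_2$; the other shows independent hidden common causes $e_{f_1}$, $e_{f_2}$]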

Page 18: A non-Gaussian model for causal discovery in the presence of hidden common causes

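Different directions give different zero/non-zero patterns (Hoyer+08IJAR)

• Assumptions: faithfulness on $x_i$, $f_i$; the number of $f_i$ is given

• Zero/non-zero patterns of the mixing matrix $A$ for the three models with one hidden common cause $f_1$ (rows $x_1$, $x_2$; columns $f_1$, $e_1$, $e_2$; * marks a non-zero entry):

– $x_1 \to x_2$: $A = \begin{bmatrix} * & * & 0 \\ * & * & * \end{bmatrix}$

– $x_1 \leftarrow x_2$: $A = \begin{bmatrix} * & * & * \\ * & 0 & * \end{bmatrix}$

– no direct causal relation: $A = \begin{bmatrix} * & * & 0 \\ * & 0 & * \end{bmatrix}$

[Figure: scatter plots of $x_1$ and $x_2$ for Gaussian and non-Gaussian $e_1$, $e_2$, $f_1$]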
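The zero/non-zero patterns can be checked by building the mixing matrix for each candidate model with one hidden common cause $f_1$. The sketch assumes the source ordering $(f_1, e_1, e_2)$ for the columns; the function names and coefficient values are hypothetical.

```python
def mixing_matrix(b21, b12, lam1, lam2):
    """Mixing matrix A in x = A [f1, e1, e2]^T for two observed variables.

    Model: x1 = b12*x2 + lam1*f1 + e1,  x2 = b21*x1 + lam2*f1 + e2,
    with at most one of b21, b12 non-zero (acyclicity).
    """
    assert b21 * b12 == 0, "acyclicity: only one direction may be non-zero"
    # (I - B)^{-1} for B = [[0, b12], [b21, 0]] when b12 * b21 = 0.
    inv = [[1.0, b12], [b21, 1.0]]
    # Direct loadings of x on (f1, e1, e2) before propagating through B.
    direct = [[lam1, 1.0, 0.0], [lam2, 0.0, 1.0]]
    return [[sum(inv[i][k] * direct[k][j] for k in range(2)) for j in range(3)]
            for i in range(2)]

def pattern(A):
    """Render each row as a string of '*' (non-zero) and '0' (zero)."""
    return ["".join("*" if abs(a) > 1e-12 else "0" for a in row) for row in A]

x1_causes_x2 = pattern(mixing_matrix(b21=0.8, b12=0.0, lam1=1.0, lam2=1.0))
x2_causes_x1 = pattern(mixing_matrix(b21=0.0, b12=0.8, lam1=1.0, lam2=1.0))
no_direct = pattern(mixing_matrix(b21=0.0, b12=0.0, lam1=1.0, lam2=1.0))

for name, p in [("x1 -> x2", x1_causes_x2), ("x1 <- x2", x2_causes_x1),
                ("no direct relation", no_direct)]:
    print(name, p)
```

Faithfulness matters here: if the coefficients cancelled exactly (for instance $b_{21}\lambda_{11} + \lambda_{21} = 0$), an entry expected to be non-zero would vanish and the patterns would no longer separate the models.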

Page 19: A non-Gaussian model for causal discovery in the presence of hidden common causes

Previous estimation methods (Hoyer+08IJAR; Henao+11JMLR)

• Explicitly model hidden common causes

• Do model comparison based on maximum likelihood principle or Bayesian approach

• Need to specify their number and distributions, which is difficult in general

[Figure: two candidate graphs, $x_1 \to x_2$ or $x_1 \leftarrow x_2$, each with hidden common causes $f_1, \ldots, f_Q$ and errors $e_1$, $e_2$]

Page 20: A non-Gaussian model for causal discovery in the presence of hidden common causes

Our proposal: A Bayesian LiNGAM approach

S. Shimizu and K. Bollen. Journal of Machine Learning Research, 2014

and something extra

Page 21: A non-Gaussian model for causal discovery in the presence of hidden common causes

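Key idea (1/2)

• Transform the model to a model with no hidden common causes

[Figure: on one side, LiNGAM with hidden common causes: $x_1 \to x_2$ with coefficient $b_{21}$, hidden common causes $f_1, \ldots, f_Q$, and errors $e_1$, $e_2$. On the other side, LiNGAM with no hidden common causes but with possibly different intercepts over obs.: for each observation from 1 to $m$, $x_1^{(m)} \to x_2^{(m)}$ with the same coefficient $b_{21}$, errors $e_1^{(m)}$, $e_2^{(m)}$, and intercepts $\mu_1^{(m)}$, $\mu_2^{(m)}$]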

Page 22: A non-Gaussian model for causal discovery in the presence of hidden common causes

Key idea (2/2)

• Include the sums of hidden common causes as the model parameters, i.e., observation-specific intercepts:

m-th obs.: $x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}:\ \text{obs.-specific intercept}} + b_{21} x_1^{(m)} + e_2^{(m)}$

• Not explicitly model hidden common causes

– Neither necessary to specify the number of hidden common causes Q nor estimate the coefficients $\lambda_{2q}$
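The role of the observation-specific intercept can be illustrated with a few simulated observations: the hidden common causes enter each equation only through their weighted sum. In this sketch the symbols `mu1` and `mu2` stand for the obs.-specific intercepts, and all numeric values (Q, the coefficients, the uniform distributions) are hypothetical.

```python
import random

rng = random.Random(0)
n, Q = 5, 3
b21 = 0.8
lam1 = [0.5, -0.3, 0.2]   # hypothetical coefficients of f_q on x1
lam2 = [0.4, 0.6, -0.1]   # hypothetical coefficients of f_q on x2

rows = []
for m in range(n):
    f = [rng.uniform(-1.0, 1.0) for _ in range(Q)]  # hidden common causes
    e1 = rng.uniform(-1.0, 1.0)
    e2 = rng.uniform(-1.0, 1.0)
    # Obs.-specific intercepts: sums of the hidden common causes.
    mu1 = sum(l * fq for l, fq in zip(lam1, f))
    mu2 = sum(l * fq for l, fq in zip(lam2, f))
    x1 = mu1 + e1
    x2 = mu2 + b21 * x1 + e2
    rows.append({"x1": x1, "x2": x2, "mu1": mu1, "mu2": mu2, "e2": e2})

# Given the intercepts, the model has no hidden common cause left:
# removing mu2 and the effect of x1 from x2 leaves exactly the error e2.
max_gap = max(abs((r["x2"] - r["mu2"] - b21 * r["x1"]) - r["e2"]) for r in rows)
print(f"largest deviation from e2: {max_gap:.2e}")
```

Neither Q nor the individual coefficients appear in the last step, which is the reason the intercept formulation does not need them.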

Page 23: A non-Gaussian model for causal discovery in the presence of hidden common causes

Bayesian model selection

Model 3 ($x_1 \to x_2$):

$x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}$

$x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$

Model 4 ($x_1 \leftarrow x_2$):

$x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$

$x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$

• Compare the marginal likelihoods of the models, with the data standardized

• Once a direction has been estimated, compute the posterior of the connection strength $b_{21}$ or $b_{12}$

• Many obs.-specific intercepts $\mu_i^{(m)}$ $(i = 1, 2;\ m = 1, \ldots, n)$

– Similar to mixed models and multi-level models

– Informative prior
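Comparing marginal likelihoods follows the usual Bayesian model-selection recipe, sketched below in generic notation. The symbols $D$ (the data), $M_r$ (a candidate model) and $\theta_r$ (its parameters, including the obs.-specific intercepts) are introduced here for illustration.

```latex
\[
p(D \mid M_r) = \int p(D \mid \theta_r, M_r)\, p(\theta_r \mid M_r)\, d\theta_r ,
\qquad r \in \{3, 4\},
\]
\[
\hat{r} = \arg\max_{r}\; p(D \mid M_r)
\quad \text{(equivalent to maximizing } p(M_r \mid D) \text{ under equal model priors).}
\]
```

Because the number of intercepts grows with the number of observations, the prior $p(\theta_r \mid M_r)$ on them carries real weight in this integral, which is why an informative prior is needed.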

Page 24: A non-Gaussian model for causal discovery in the presence of hidden common causes

Prior for the observation-specific intercepts

• Motivation: Central limit theorem

– Sums of independent variables tend to be more Gaussian:

$\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}$, $\mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$

• Approximate the density by a bell-shaped curve dist.

– Dependent due to hidden common causes $f_q^{(m)}$

$\begin{bmatrix} \mu_1^{(m)} \\ \mu_2^{(m)} \end{bmatrix} \sim$ t-distribution with sd $\tau_1, \tau_2$, correlation $\sigma_{12}$, and DOF $\nu$ (here, 8)

• Select the hyper-parameter values that maximize the marginal likelihood: $\tau_1, \tau_2 \in \{0.4, 0.6, 0.8\}$
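The central limit theorem motivation can be checked numerically: a sum of independent non-Gaussian variables has excess kurtosis closer to zero than each summand. The sketch uses Laplace variables with equal weights and Q = 10; these are illustrative assumptions, not settings from the slides.

```python
import random

def excess_kurtosis(v):
    """Sample excess kurtosis (0 for a Gaussian distribution)."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((x - m) ** 2 for x in v) / n
    m4 = sum((x - m) ** 4 for x in v) / n
    return m4 / m2 ** 2 - 3.0

def laplace(rng):
    # The difference of two Exp(1) draws follows a Laplace(0, 1) distribution.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

rng = random.Random(0)
n, Q = 50_000, 10

single = [laplace(rng) for _ in range(n)]                        # one hidden cause
sums = [sum(laplace(rng) for _ in range(Q)) for _ in range(n)]   # sum of Q causes

k_single = excess_kurtosis(single)
k_sum = excess_kurtosis(sums)
print(f"excess kurtosis, single variable: {k_single:.2f}")
print(f"excess kurtosis, sum of {Q}: {k_sum:.2f}")
```

The population values are 3 for a single Laplace variable and 3/Q for the equally weighted sum, so the sum is much closer to bell-shaped while keeping slightly heavy tails, consistent with approximating it by a t-distribution.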

Page 25: A non-Gaussian model for causal discovery in the presence of hidden common causes

Error distributions and other priors used in the experiment

• Error distributions $p(e_1)$, $p(e_2)$

– Fixed to be the Laplace distribution

– Possible to be estimated assuming a family of generalized Gaussian distributions, for example

• Priors for the other parameters

$\sigma_{12} \sim U(-1, 1)$

$b_{12} \sim N(0, 0.75^2)$

$b_{21} \sim N(0, 0.75^2)$

$\mathrm{std}(e_1) \sim U(0, 1)$

$\mathrm{std}(e_2) \sim U(0, 1)$

Page 26: A non-Gaussian model for causal discovery in the presence of hidden common causes

Experiment on sociology data

Page 27: A non-Gaussian model for causal discovery in the presence of hidden common causes

Sociology data

• Source: General Social Survey (n=1380)

– Non-farm background, ages 35-44, white, male, in the labor force, no missing data for any of the covariates, 1972-2006

• 15 pairs with known temporal directions (Duncan+1972)

[Figure: status attainment model (Duncan et al., 1972); visible label: $x_2$: Son's Income]

Page 28: A non-Gaussian model for causal discovery in the presence of hidden common causes

Numbers of successes (n=1380)

[Figure: results against the known (temporal) orderings of 15 pairs; visible labels: FE, Father's Education, Son's Education, Son's Occupation, Son's Income, hidden common cause $f_1$]

Cf. LiNGAM-GU-UK (Chen+13NECO): 0.20; PNL (Zhang+09UAI): 0.60

Page 29: A non-Gaussian model for causal discovery in the presence of hidden common causes

Conclusion

Page 30: A non-Gaussian model for causal discovery in the presence of hidden common causes

• Estimation of causal direction in the presence of hidden common causes is a major challenge in causal discovery

• Proposed a linear non-Gaussian SEM approach

– Not necessary to model individual hidden common causes

• Future directions

– Cyclic cases: Using some prior for forcing the identifiability condition of Lacerda+08UAI?

– Non-stationarity: Combining with Kun's method (Huang+15IJCAI)?