Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I

DESCRIPTION

UAI2010 Tutorial Slides (UAI: Uncertainty in Artificial Intelligence)

TRANSCRIPT
Slide 1
UAI2010 Tutorial, July 8, Catalina Island.
Non-Gaussian Methods for Learning Linear Structural Equation Models
Shohei Shimizu and Yoshinobu Kawahara, Osaka University
Special thanks to Aapo Hyvärinen, Patrik O. Hoyer and Takashi Washio.
Slide 2: Abstract
• Linear structural equation models (linear SEMs) can be used to model the data-generating processes of variables.
• We review a new approach to learning, or estimating, linear SEMs.
• The new estimation approach utilizes the non-Gaussianity of data for model identification and uniquely estimates a much wider variety of models.
Slide 3: Outline
• Part I. Overview (70 min.): Shohei
• Break (10 min.)
• Part II. Recent advances (40 min.): Yoshi
  – Time series
  – Latent confounders
Slide 4: Motivation (1/2)
• Suppose that data X was randomly generated from either of the following two data-generating processes:

  Model 1: x1 = e1                  Model 2: x1 = b12 x2 + e1
           x2 = b21 x1 + e2                  x2 = e2

  (graphs: Model 1 is e1 → x1 → x2 ← e2; Model 2 is e1 → x1 ← x2 ← e2)

  where e1 and e2 are latent variables (disturbances, errors).
• We want to estimate, or identify, which model generated the data X based on the data X only.
Slide 5: Motivation (2/2)
• We want to identify which model generated the data X based on the data X only.
• If x1 and x2 are Gaussian, it is well known that we cannot identify the data-generating process:
  – Models 1 and 2 equally fit the data.
• If x1 and x2 are non-Gaussian, an interesting result is obtained: we can identify which of Models 1 and 2 generated the data.
• This tutorial reviews how such non-Gaussian methods work.
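The claim above can be tried numerically. Below is a minimal sketch (not from the slides): uniform disturbances, the coefficient 0.8 taken from the later example slides, and, as a dependence check, the tanh-based nonlinear correlation that appears near the end of Part I. Regressing in the correct direction leaves a residual independent of the regressor; the wrong direction does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
e1 = rng.uniform(-1, 1, n)
e2 = rng.uniform(-1, 1, n)

# Model 1: x1 -> x2 with non-Gaussian (uniform) disturbances.
x1 = e1
x2 = 0.8 * x1 + e2

def residual(y, x):
    # Residual of the least-squares simple regression of y on x (zero means).
    return y - (np.cov(y, x)[0, 1] / np.var(x)) * x

def dependence(a, b):
    # tanh-based nonlinear correlation as a crude independence check.
    return (abs(np.corrcoef(np.tanh(a), b)[0, 1])
            + abs(np.corrcoef(a, np.tanh(b))[0, 1]))

score_fwd = dependence(x1, residual(x2, x1))  # correct direction: near 0
score_bwd = dependence(x2, residual(x1, x2))  # wrong direction: clearly larger
print(score_fwd, score_bwd)
```

With Gaussian disturbances both scores would be near zero and the direction would stay unidentifiable.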
Slide 6: Problem formulation
Slide 7: Basic problem setup (1/3)
• Assume that the data-generating process of continuous observed variables x_i is graphically represented by a directed acyclic graph (DAG).
  – Acyclicity means that there are no directed cycles.
  (figures: an example DAG over x1, x2, x3 with external influences e1, e2, e3, and an example directed cyclic graph containing a directed cycle between x1 and x2)
• x3 is a parent of x1, etc.
Slide 8: Basic problem setup (2/3)
• Further assume linear relations between the variables x_i.
• Then we obtain a linear acyclic SEM (Wright, 1921; Bollen, 1989):

  x_i = Σ_{j: parents of i} b_ij x_j + e_i,   or   x = Bx + e,

  where
  – The e_i are continuous latent variables that are not determined inside the model, which we call external influences (disturbances, errors).
  – The e_i are of non-zero variance and are independent.
  – The 'path-coefficient' matrix B = [b_ij] corresponds to a DAG.
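As a concrete sketch, the model x = Bx + e can be simulated directly; the coefficients below are the ones from the three-variable example on the next slide, and the uniform disturbances are our choice of a non-Gaussian distribution.

```python
import numpy as np

# Linear acyclic SEM x = Bx + e for the chain x3 -> x1 -> x2:
#   x1 = 1.5*x3 + e1,  x2 = -1.3*x1 + e2,  x3 = e3.
B = np.array([
    [0.0,  0.0, 1.5],   # x1 <- x3
    [-1.3, 0.0, 0.0],   # x2 <- x1
    [0.0,  0.0, 0.0],   # x3 is exogenous
])

rng = np.random.default_rng(0)
n = 10_000
# Non-Gaussian (uniform) external influences with non-zero variance.
e = rng.uniform(-1, 1, size=(3, n))

# Since x = Bx + e, the observed data are x = (I - B)^{-1} e.
x = np.linalg.solve(np.eye(3) - B, e)

# The structural equations hold exactly.
assert np.allclose(x, B @ x + e)
```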
Slide 9: Example of linear acyclic SEMs
• A three-variable linear acyclic SEM:

  x1 = 1.5 x3 + e1              [x1]   [  0    0   1.5] [x1]   [e1]
  x2 = -1.3 x1 + e2      or     [x2] = [-1.3   0    0 ] [x2] + [e2]
  x3 = e3                       [x3]   [  0    0    0 ] [x3]   [e3]
                                       \_______B_______/

• B corresponds to the data-generating DAG (x3 → x1 → x2, with coefficients 1.5 and -1.3):
  – b_ij = 0  ⇔ no directed edge from x_j to x_i
  – b_ij ≠ 0 ⇔ a directed edge from x_j to x_i
Slide 10: Assumption of acyclicity
• Acyclicity ensures the existence of an ordering of the variables x_i that makes B lower triangular with zeros on the diagonal (Bollen, 1989).

  Permuting to the ordering x3 < x1 < x2:

  [x3]   [ 0    0   0] [x3]   [e3]
  [x1] = [1.5   0   0] [x1] + [e1]
  [x2]   [ 0  -1.3  0] [x2]   [e2]
         \____B_perm___/

• The ordering is x3 < x1 < x2: e.g., x3 may be an ancestor of x1 and x2, but not vice versa.
Slide 11: Assumption of independence between external influences
• It implies that there are no latent confounders (Spirtes et al., 2000).
  – A latent confounder is a latent variable f that is a parent of two or more observed variables (figure: f → x1 and f → x2, with external influences e1' and e2').
• Such a latent confounder makes the external influences dependent (Part II) (figure: f feeds into both e1 and e2, so e1 and e2 become dependent).
Slide 12: Basic problem setup (3/3): Learning linear acyclic SEMs
• Assume that data X is randomly sampled from a linear acyclic SEM (with no latent confounders):

  x = Bx + e   (figure: x1 → x2 with coefficient b21)

• Goal: estimate the path-coefficient matrix B by observing data X only!
  – B corresponds to the data-generating DAG.
Slide 13: Problems: Identifiability problems of conventional methods
Slide 14: Under what conditions is B identifiable?
• 'B is identifiable' ≡ 'B is uniquely determined, or estimated, from p(x)'.
• Linear acyclic SEM:

  x = Bx + e   (figure: x1 → x2 with coefficient b21)

• B and p(e) induce p(x). If p(x) differs for different B, then B is uniquely determined.
Slide 15: Conventional estimation principle: Causal Markov condition
• If the data-generating model is a (linear) acyclic SEM, the causal Markov condition holds:
  – Each observed variable x_i is independent of its non-descendants in the DAG conditional on its parents (Pearl & Verma, 1991):

    p(x) = Π_{i=1}^{p} p(x_i | parents of x_i)
Slide 16: Conventional methods based on the causal Markov condition
• Methods based on conditional independencies (Spirtes & Glymour, 1991):
  – Many linear acyclic SEMs give the same set of conditional independencies and equally fit the data.
• Scoring methods based on Gaussianity (Chickering, 2002):
  – Many linear acyclic SEMs give the same Gaussian distribution and equally fit the data.
• In many cases, the path-coefficient matrix B is not uniquely determined.
Slide 17: Example
• Two models with Gaussian e1 and e2, where E(e1) = E(e2) = 0 and var(x1) = var(x2) = 1:

  Model 1: x1 = e1              Model 2: x1 = 0.8 x2 + e1
           x2 = 0.8 x1 + e2              x2 = e2

• Both introduce no conditional independence: cov(x1, x2) = 0.8 ≠ 0.
• Both induce the same Gaussian distribution:

  [x1]       ( [0]   [ 1   0.8] )
  [x2]  ~  N ( [0] , [0.8   1 ] )
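The indistinguishability can be checked by simulation: both models induce the same covariance matrix [[1, 0.8], [0.8, 1]]. A sketch, with the disturbance variances chosen so that var(x1) = var(x2) = 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Model 1: x1 = e1, x2 = 0.8*x1 + e2.
e1 = rng.normal(0, 1.0, n)
e2 = rng.normal(0, np.sqrt(1 - 0.8**2), n)   # var(e2) = 0.36
m1 = np.vstack([e1, 0.8 * e1 + e2])

# Model 2: x2 = e2', x1 = 0.8*x2 + e1'.
e2b = rng.normal(0, 1.0, n)
e1b = rng.normal(0, np.sqrt(1 - 0.8**2), n)
m2 = np.vstack([0.8 * e2b + e1b, e2b])

# Up to sampling error, both covariance matrices are [[1, 0.8], [0.8, 1]],
# so the two Gaussian models cannot be told apart from data.
print(np.cov(m1))
print(np.cov(m2))
```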
Slide 18: A solution: Non-Gaussian approach
Slide 19: A new direction: Non-Gaussian approach
• Non-Gaussian data arise in many applications:
  – Neuroinformatics (Hyvarinen et al., 2001); bioinformatics (Sogawa et al., ICANN2010); social sciences (Micceri, 1989); economics (Moneta, Entner, et al., 2010)
• Utilize non-Gaussianity for model identification:
  – Bentler (1983); Mooijaart (1985); Dodge and Rousson (2001)
• The path-coefficient matrix B is uniquely estimated if the e_i are non-Gaussian.
  – Shimizu, Hoyer, Hyvarinen and Kerminen (JMLR, 2006)
Slide 20: Illustrative example: Gaussian vs. non-Gaussian
• Scatter plots of (x1, x2) under Model 1 (x1 = e1, x2 = 0.8 x1 + e2) and Model 2 (x1 = 0.8 x2 + e1, x2 = e2), with E(e1) = E(e2) = 0 and var(x1) = var(x2) = 1.
• (figure: with Gaussian e_i the two models give the same elliptical scatter plot; with non-Gaussian (uniform) e_i the scatter plots of the two models differ.)
Slide 21: Linear Non-Gaussian Acyclic Model: LiNGAM (Shimizu, Hoyer, Hyvarinen & Kerminen, JMLR, 2006)
• The non-Gaussian version of the linear acyclic SEM:

  x_i = Σ_{j: parents of i} b_ij x_j + e_i,   or   x = Bx + e,

  where the external influence variables e_i (disturbances, errors) are
  • of non-zero variance,
  • non-Gaussian and mutually independent.
Slide 22: Identifiability of the LiNGAM model
• The LiNGAM model can be shown to be identifiable:
  – B is uniquely estimated.
• To see the identifiability, it is helpful to review independent component analysis (ICA) (Hyvarinen et al., 2001).
Slide 23: Independent Component Analysis (ICA) (Jutten & Herault, 1991; Comon, 1994)
• The observed random vector x is modeled by

  x_i = Σ_{j=1}^{p} a_ij s_j,   or   x = As,

  where
  – The mixing matrix A = [a_ij] is square and of full column rank.
  – The latent variables s_i (independent components) are of non-zero variance, non-Gaussian and mutually independent.
• Then A is identifiable up to permutation P and scaling D of the columns:

  A_ica = APD
Slide 24: Estimation of ICA
• Most estimation methods estimate W = A^{-1} (Hyvarinen et al., 2001):

  x = As = W^{-1} s

• Most of the methods minimize the mutual information (or an approximation of it) of the estimated independent components ŝ = W_ica x.
• W is estimated up to permutation P and scaling D of the rows:

  W_ica = PDW (= PDA^{-1})

• Consistent and computationally efficient algorithms:
  – Fixed-point (FastICA) (Hyvarinen, 99); gradient-based (Amari, 98)
  – Semiparametric: no specific distributional assumption
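A compact, self-contained sketch of a FastICA-style fixed-point iteration (whitening, tanh nonlinearity, symmetric decorrelation). It is meant only to illustrate the permutation and scaling indeterminacy W_ica = PDW, not to replace a production ICA implementation.

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Symmetric FastICA sketch; returns an unmixing matrix for X (rows = variables)."""
    n, m = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))          # whitening: cov(K @ X) = I
    K = E @ np.diag(d ** -0.5) @ E.T
    Z = K @ X
    rng = np.random.default_rng(seed)
    W = np.linalg.qr(rng.normal(size=(n, n)))[0]
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W = G @ Z.T / m - np.diag((1 - G**2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)           # symmetric decorrelation
        W = U @ Vt
    return W @ K                              # unmixing in the original space

rng = np.random.default_rng(1)
m = 20_000
S = rng.uniform(-1, 1, (2, m))                # non-Gaussian sources
A = np.array([[2.0, 1.0], [1.0, -1.0]])       # mixing matrix
X = A @ S

W_hat = fastica(X)
M = W_hat @ A                                 # ideally a scaled permutation PD
M = np.abs(M) / np.abs(M).max(axis=1, keepdims=True)
print(np.round(M, 2))                         # each row: one entry near 1, the rest near 0
```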
Slide 25: Back to the LiNGAM model
Slide 26: Identifiability of LiNGAM (1/3): ICA achieves half of the identification
• The LiNGAM model is ICA:
  – The observed variables x_i are linear combinations of the non-Gaussian, independent external influences e_i:

    x = Bx + e  ⇔  x = (I - B)^{-1} e = Ae,   so W = A^{-1} = I - B

• ICA gives W_ica = PDW = PD(I - B).
  – P: unknown permutation matrix
  – D: unknown scaling (diagonal) matrix
• We need to determine P and D to identify B.
Slide 27: Identifiability of LiNGAM (2/3): No permutation indeterminacy (1/6)
• ICA gives W_ica = PD(I - B).
  – P: permutation matrix; D: scaling (diagonal) matrix
• We want to find a permutation matrix P~ that cancels the permutation, i.e., P~P = I:

  P~ W_ica = P~ P D W = DW

• We can show (Shimizu et al., UAI05) (illustrated in the next slides):
  – If P~P = I, i.e., no permutation is made on the rows of W_ica, then DW has no zero in the diagonal (obvious by definition).
  – If P~P ≠ I, i.e., any nonidentical permutation is made on the rows of W_ica, then P~DW has a zero in the diagonal.
Slide 28: Identifiability of LiNGAM (2/3): No permutation indeterminacy (2/6)
• By definition, W = I - B has all unities in the diagonal.
  – The diagonal elements of B are all zeros.
• Acyclicity ensures the existence of an ordering of variables that makes B lower triangular, and then W = I - B is also lower triangular.
• So, without loss of generality, W can be assumed to be lower triangular:

  W = [1  0  0]
      [*  1  0]
      [*  *  1]     ← no zeros in the diagonal!
Slide 29: Identifiability of LiNGAM (2/3): No permutation indeterminacy (3/6)
• Premultiplying W by a scaling (diagonal) matrix D does NOT change the zero/non-zero pattern of W:

  W = [1  0  0]          DW = [d11   0    0 ]
      [*  1  0]     →         [ *   d22   0 ]
      [*  *  1]               [ *    *   d33]     ← no zeros in the diagonal!
Slide 30: Identifiability of LiNGAM (2/3): No permutation indeterminacy (4/6)
• Any other permutation of the rows of DW changes the zero/non-zero pattern of DW and brings a zero into the diagonal:

  DW = [d11   0    0 ]    exchanging the       P~DW = [ *   d22   0 ]
       [ *   d22   0 ]    1st and 2nd rows →          [d11   0    0 ]
       [ *    *   d33]                                [ *    *   d33]     ← zero in the diagonal!
Slide 31: Identifiability of LiNGAM (2/3): No permutation indeterminacy (5/6)
• Any other permutation of the rows of DW changes the zero/non-zero pattern of DW and brings a zero into the diagonal:

  DW = [d11   0    0 ]    exchanging the       P~DW = [ *    *   d33]
       [ *   d22   0 ]    1st and 3rd rows →          [ *   d22   0 ]
       [ *    *   d33]                                [d11   0    0 ]     ← zero in the diagonal!
Slide 32: Identifiability of LiNGAM (2/3): No permutation indeterminacy (6/6)
• We can find the correct P~ by finding the P~ that gives no zero on the diagonal of P~ W_ica (Shimizu et al., UAI05).
• Thus, we can solve the permutation indeterminacy and obtain:

  P~ W_ica = P~ P D W = DW = D(I - B)
Slide 33: Identifiability of LiNGAM (3/3): No scaling indeterminacy
• Now we have: P~ W_ica = D(I - B).
• Then D = diag(P~ W_ica).
• Divide each row of P~ W_ica by its corresponding diagonal element to get I - B, i.e., B:

  diag(P~ W_ica)^{-1} (P~ W_ica) = D^{-1} D (I - B) = I - B
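The two steps, finding the row permutation with a non-zero diagonal and then rescaling the rows, can be sketched end-to-end. The example B and the particular "unknown" P and D below are ours:

```python
import numpy as np
from itertools import permutations

B = np.array([[0, 0, 1.5], [-1.3, 0, 0], [0, 0, 0.0]])
W = np.eye(3) - B

P = np.eye(3)[[2, 0, 1]]               # some unknown row permutation
D = np.diag([2.0, -0.5, 3.0])          # some unknown scaling
W_ica = P @ D @ W                      # what ICA hands us: PD(I - B)

# Step 1: find the row permutation with no zeros on the diagonal.
for perm in permutations(range(3)):
    W_perm = W_ica[list(perm)]
    if np.all(np.abs(np.diag(W_perm)) > 1e-9):
        break

# Step 2: divide each row by its diagonal element, then B = I - W.
W_hat = W_perm / np.diag(W_perm)[:, None]
B_hat = np.eye(3) - W_hat
assert np.allclose(B_hat, B)
```

Only one permutation survives Step 1, which is exactly the uniqueness argument of the previous slides.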
Slide 34: Estimation of the LiNGAM model
1. ICA-LiNGAM algorithm
2. DirectLiNGAM algorithm
Slide 35: Two estimation algorithms
• ICA-LiNGAM algorithm (Shimizu, Hoyer, Hyvarinen & Kerminen, JMLR, 2006)
• DirectLiNGAM algorithm (Shimizu, Hyvarinen, Kawahara & Washio, UAI09)
• Both estimate an ordering of the variables that makes the path-coefficient matrix B lower triangular:

  x_perm = B_perm x_perm + e_perm,   with B_perm lower triangular

  – Acyclicity ensures the existence of such an ordering.
  (figure: a full DAG over x3, x1, x2, possibly including redundant edges)
Slide 36: Once such an ordering is found…
• Many existing (covariance-based) methods can do:
  – Pruning the redundant path coefficients
    • Sparse methods like the weighted lasso (Zou, 2006)
  – Finding significant path coefficients
    • Testing, bootstrapping (Shimizu et al., 2006; Hyvarinen et al., 2010)
  (figure: the full DAG over x1, x2, x3 is reduced to the true DAG; the lower-triangular B_perm keeps the entries marked * and sets the rest to zero)
Slide 37: 1. Outline of the ICA-LiNGAM algorithm (Shimizu, Hoyer, Hyvarinen, & Kerminen, JMLR, 2006)
1. Estimate B by ICA + permutation.
2. Pruning.
(figure: Step 1 yields a full DAG over x1, x2, x3 including redundant edges such as b13 and b23; Step 2 prunes the redundant edges.)
Slide 38: ICA-LiNGAM algorithm (1/2): Step 1. Estimation of B
1. Perform ICA (here, FastICA (Hyvarinen, 1999)) to obtain an estimate of

   W_ica = PDW = PD(I - B)

2. Find a permutation P~ that makes the diagonal elements of P~ Ŵ_ica as large (non-zero) as possible in absolute value:

   P~^ = argmin_{P~} Σ_i 1 / |(P~ Ŵ_ica)_ii|     (Hungarian algorithm; Kuhn, 1955)

3. Normalize each row of P~^ Ŵ_ica; then we get an estimate of I - B and hence B̂.
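Step 2 of this slide is an assignment problem and can be sketched with SciPy's Hungarian-style solver; the helper name and the example matrices are ours:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def undo_permutation(W_ica, eps=1e-12):
    """Row permutation of W_ica minimizing sum_i 1/|w_ii| (Hungarian algorithm)."""
    cost = 1.0 / np.maximum(np.abs(W_ica), eps)   # cost of putting row i at slot j
    rows, cols = linear_sum_assignment(cost)
    perm = np.empty_like(cols)
    perm[cols] = rows                             # row perm[j] goes to position j
    return W_ica[perm]

B = np.array([[0, 0, 1.5], [-1.3, 0, 0], [0, 0, 0.0]])
W = np.eye(3) - B
W_ica = (np.eye(3)[[1, 2, 0]] @ np.diag([3.0, -2.0, 0.5])) @ W  # unknown P, D

W_perm = undo_permutation(W_ica)
B_hat = np.eye(3) - W_perm / np.diag(W_perm)[:, None]  # Step 3: normalize rows
assert np.allclose(B_hat, B)
```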
Slide 39: ICA-LiNGAM algorithm (2/2): Step 2. Pruning
• Find an ordering of the variables that makes the estimated B as close to lower triangular as possible.
  – Find a permutation matrix Q that minimizes the sum of the squared elements in the upper triangular part of the permuted B̂:

    Q̂ = argmin_Q Σ_{i<=j} (Q B̂ Q^T)_ij^2

  – An approximate algorithm exists for large numbers of variables (Hoyer et al., ICA06).
  (figure: small estimated coefficients such as 0.1 and -0.01 fall in the upper triangular part and are pruned; the large coefficient 3 is kept.)
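For a few variables the minimization can be done by brute force over permutations; a sketch (the function name and the noisy example B̂ are ours; Hoyer et al., ICA06, give an approximate algorithm for larger p):

```python
import numpy as np
from itertools import permutations

def best_ordering(B_hat):
    """Ordering minimizing the squared mass in the upper triangular part (i <= j)."""
    p = B_hat.shape[0]
    best, best_perm = np.inf, None
    for perm in permutations(range(p)):
        Bp = B_hat[np.ix_(perm, perm)]
        score = np.sum(np.triu(Bp) ** 2)
        if score < best:
            best, best_perm = score, perm
    return best_perm, best

# Noisy estimate of B for the chain x3 -> x1 -> x2.
B_hat = np.array([[0.02, -0.01, 1.5],
                  [-1.3,  0.01, 0.1],
                  [0.1,  -0.01, 0.0]])
perm, score = best_ordering(B_hat)
print(perm)   # (2, 0, 1): the ordering x3 < x1 < x2
```

Entries left in the upper triangle under the chosen ordering (here 0.1 and -0.01) are the candidates for pruning.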
Slide 40: Basic properties of the ICA-LiNGAM algorithm
• ICA-LiNGAM algorithm = ICA + permutations:
  – Computationally efficient with the help of well-developed ICA techniques.
• Potential problems:
  – ICA is an iterative search method:
    • It may get stuck in a local optimum if the initial guess or step size is badly chosen.
  – The permutation algorithms are not scale-invariant:
    • They may provide different estimates for different scales of the variables.
Slide 41: Estimation of the LiNGAM model
1. ICA-LiNGAM algorithm
2. DirectLiNGAM algorithm
Slide 42: 2. DirectLiNGAM algorithm (Shimizu, Hyvarinen, Kawahara & Washio, UAI2009)
• An alternative estimation method without ICA.
  – It estimates an ordering of the variables that makes the path-coefficient matrix B lower triangular.
  (figure: a full DAG over x3, x1, x2 with redundant edges; x_perm = B_perm x_perm + e_perm)
• Many existing (covariance-based) methods can then do further pruning or find significant path coefficients (Zou, 2006; Shimizu et al., 2006; Hyvarinen et al., 2010).
Slide 43: Basic idea (1/2): An exogenous variable can be at the top of a right ordering
• An exogenous variable x_j is a variable with no parents (Bollen, 1989); here, x3.
  – The corresponding row of B has all zeros.
• So, an exogenous variable can be at the top of an ordering that makes B lower triangular with zeros on the diagonal:

  [x3]   [ 0    0   0] [x3]   [e3]
  [x1] = [1.5   0   0] [x1] + [e1]        (ordering: x3, x1, x2)
  [x2]   [ 0  -1.3  0] [x2]   [e2]
Slide 44: Basic idea (2/2): Regress the exogenous variable out
• Compute the residuals r_i^(3) (i = 1, 2) of regressing the other variables x_i (i = 1, 2) on the exogenous x3:
  – The residuals r1^(3) and r2^(3) form a LiNGAM model.
  – The ordering of the residuals is equivalent to that of the corresponding original variables.

  [x3]   [ 0    0   0] [x3]   [e3]          [r1^(3)]   [ 0    0] [r1^(3)]   [e1]
  [x1] = [1.5   0   0] [x1] + [e1]    →     [r2^(3)] = [-1.3  0] [r2^(3)] + [e2]
  [x2]   [ 0  -1.3  0] [x2]   [e2]

• r1^(3) being exogenous implies that x1 can be at the second top.
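The claim that the residuals form the sub-model r2^(3) = -1.3 r1^(3) + e2 can be checked by simulation (a sketch, reusing the slide's coefficients with uniform disturbances of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
e3, e1, e2 = rng.uniform(-1, 1, (3, n))
x3 = e3
x1 = 1.5 * x3 + e1
x2 = -1.3 * x1 + e2

def residual(y, x):
    # Residual of the least-squares simple regression of y on x (zero means).
    return y - (np.cov(y, x)[0, 1] / np.var(x)) * x

r1 = residual(x1, x3)   # ~ e1
r2 = residual(x2, x3)   # ~ -1.3*e1 + e2

# The residuals obey the same structural relation as x1 -> x2.
coef = np.cov(r2, r1)[0, 1] / np.var(r1)
print(coef)             # close to -1.3
```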
Slide 45: Outline of DirectLiNGAM
• Iteratively find exogenous variables until all the variables are ordered:
  1. Find an exogenous variable, here x3.
     – Put x3 at the top of the ordering.
     – Regress x3 out.
  2. Find an exogenous residual, here r1^(3).
     – Put x1 at the second top of the ordering.
     – Regress r1^(3) out.
  3. Put x2 at the third top of the ordering and terminate.
• The estimated ordering is x3 < x1 < x2.
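The loop above can be sketched compactly (function names are ours), using the tanh-based nonlinear correlation from a later slide as the independence measure:

```python
import numpy as np

def residual(y, x):
    return y - (np.cov(y, x)[0, 1] / np.var(x)) * x

def dependence(a, b):
    # tanh-based nonlinear correlation as an independence score (0 if independent).
    return (abs(np.corrcoef(np.tanh(a), b)[0, 1])
            + abs(np.corrcoef(a, np.tanh(b))[0, 1]))

def direct_lingam_order(X):
    X = X.copy()
    active = list(range(X.shape[0]))
    order = []
    while len(active) > 1:
        # Pick the variable most independent of the residuals of the others.
        scores = [(sum(dependence(X[j], residual(X[i], X[j]))
                       for i in active if i != j), j)
                  for j in active]
        _, j = min(scores)
        order.append(j)
        # Regress the found exogenous variable out of the rest.
        for i in active:
            if i != j:
                X[i] = residual(X[i], X[j])
        active.remove(j)
    order.append(active[0])
    return order

rng = np.random.default_rng(0)
n = 100_000
e = rng.uniform(-1, 1, (3, n))        # non-Gaussian external influences
x1 = 1.5 * e[2] + e[0]                # x3 = e3 is exogenous
x2 = -1.3 * x1 + e[1]
X = np.vstack([x1, x2, e[2]])         # rows: x1, x2, x3

order = direct_lingam_order(X)
print(order)                          # [2, 0, 1], i.e., x3 < x1 < x2
```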
Slide 46-1: Identification of an exogenous variable (two-variable cases)
i) x1 is exogenous:
   x1 = e1
   x2 = b21 x1 + e2  (b21 ≠ 0)

   Regressing x2 on x1:
     r2^(1) = x2 - [cov(x1, x2)/var(x1)] x1 = e2
   x1 and r2^(1) are independent.

ii) x1 is NOT exogenous:
   x1 = b12 x2 + e1  (b12 ≠ 0)
   x2 = e2

   Regressing x2 on x1:
     r2^(1) = x2 - [cov(x1, x2)/var(x1)] x1
            = {1 - b12 cov(x1, x2)/var(x1)} e2 - [cov(x1, x2)/var(x1)] e1
   x1 and r2^(1) are NOT independent.
Slide 46-2: Need to use the Darmois-Skitovitch theorem (Darmois, 1953; Skitovitch, 1953)
• Darmois-Skitovitch theorem:
  Define two variables y1 and y2 as

    y1 = Σ_{j=1}^{p} a_1j e_j,   y2 = Σ_{j=1}^{p} a_2j e_j,

  where the e_j are independent random variables. If there exists a non-Gaussian e_i for which a_1i a_2i ≠ 0, then y1 and y2 are dependent.
• Apply it to case ii), where x1 is NOT exogenous (x1 = b12 x2 + e1 with b12 ≠ 0, x2 = e2): both x1 and the residual

    r2^(1) = {1 - b12 cov(x1, x2)/var(x1)} e2 - [cov(x1, x2)/var(x1)] e1

  are linear combinations of e1 and e2 with non-zero coefficients on a non-Gaussian influence, so x1 and r2^(1) are NOT independent.
Slide 47: Identification of an exogenous variable (p-variable cases)
• Lemma 1: x_j and its residuals

    r_i^(j) = x_i - [cov(x_i, x_j)/var(x_j)] x_j

  are independent for all i ≠ j  ⇔  x_j is exogenous.
• In practice, we can identify an exogenous variable by finding the variable that is most independent of its residuals.
Slide 48: Independence measures
• Evaluate the independence between a variable and a residual by a nonlinear correlation with g(·) = tanh(·):

  corr(g(x_j), r_i^(j))

• Taking the sum over all the residuals, we get:

  T = Σ_{i≠j} [ |corr(g(x_j), r_i^(j))| + |corr(x_j, g(r_i^(j)))| ]

• More sophisticated measures can be used as well (Bach & Jordan, 2002; Gretton et al., 2005; Kraskov et al., 2004).
  – A kernel-based independence measure (Bach & Jordan, 2002) often gives more accurate estimates (Sogawa et al., IJCNN10).
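The statistic T can be transcribed almost literally; a sketch on a two-variable example (coefficient and uniform disturbances are our choices), where the truly exogenous variable gets the smaller score:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
e1, e2 = rng.uniform(-1, 1, (2, n))
x1 = e1                     # x1 is exogenous
x2 = -1.3 * x1 + e2

def residual(y, x):
    return y - (np.cov(y, x)[0, 1] / np.var(x)) * x

def T(xj, others):
    """Total nonlinear correlation between xj and the residuals of the others."""
    g = np.tanh
    total = 0.0
    for xi in others:
        r = residual(xi, xj)
        total += (abs(np.corrcoef(g(xj), r)[0, 1])
                  + abs(np.corrcoef(xj, g(r))[0, 1]))
    return total

print(T(x1, [x2]), T(x2, [x1]))   # the first (exogenous candidate) is smaller
```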
Slide 49: Important properties of DirectLiNGAM
• DirectLiNGAM repeats:
  – Least-squares simple linear regression
  – Evaluation of pairwise independence between each variable and its residuals
• No algorithmic parameters like step size, initial guesses, or convergence criteria.
• Guaranteed convergence to the right solution in a fixed number of steps (the number of variables) if the data strictly follows the model.
Slide 50: Estimation of the LiNGAM model: Summary (1/2)
• Two estimation algorithms:
  – ICA-LiNGAM: estimation using ICA
    • Pros: fast.
    • Cons: possible local optima; not scale-invariant.
  – DirectLiNGAM: alternative estimation without ICA
    • Pros: guaranteed convergence; scale-invariant.
    • Cons: less fast.
  – Cf. Neither needs faithfulness (Shimizu et al., JMLR, 2006; Hoyer, personal comm., July 2010).
Slide 51: Estimation of the LiNGAM model: Summary (2/2)
• Experimental comparison of the two algorithms (Sogawa et al., IJCNN2010):
  – Sample size: both need a sample size of at least 1000 for more than 10 variables.
  – Scalability: both can analyze 100 variables. The performance depends on the sample size etc., of course!
  – Scale invariance: ICA-LiNGAM is less robust to changes in the scales of the variables.
  – Local optima?
    • For fewer than 10 variables, ICA-LiNGAM is often a bit better.
    • For more than 10 variables, DirectLiNGAM is often better, perhaps because the problem of local optima becomes more serious.
![Page 53: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/53.jpg)
52
Testing and Reliability evaluation
![Page 54: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/54.jpg)
53
Testing testable assumptions
• Non-Gaussianity of external influences e_i:
– Gaussianity tests
• Could detect violations of some assumptions:
– Local test
• Independence of external influences e_i
• Conditional independencies between observed variables x_i (causal Markov condition)
• Linearity
– Overall fit of the model assumptions
• Chi-square test using 3rd- and/or 4th-order moments (Shimizu & Kano, 2008)
• Still under development
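As an illustration of such Gaussianity tests, a minimal sketch using SciPy's D'Agostino-Pearson omnibus test; the two simulated samples are hypothetical stand-ins for estimated external influences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
gaussian_e = rng.normal(size=n)
# Uniform on [-sqrt(3), sqrt(3)]: non-Gaussian with unit variance
uniform_e = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)

# H0 = the sample comes from a Gaussian distribution.
# A small p-value supports LiNGAM's non-Gaussianity assumption.
p_gauss = stats.normaltest(gaussian_e).pvalue
p_unif = stats.normaltest(uniform_e).pvalue
print(f"gaussian sample: p = {p_gauss:.3g}")
print(f"uniform sample:  p = {p_unif:.3g}")
```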
![Page 55: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/55.jpg)
54
Reliability evaluation
• Need to evaluate the statistical reliability of LiNGAM results:
– Sample fluctuations
– Smaller non-Gaussianity makes the model closer to NOT identifiable.
• Reliability evaluation by bootstrapping: (Komatsu, Shimizu & Shimodaira, ICANN2010; Hyvarinen et al., JMLR, 2010)
– If either the sample size or the magnitude of non-Gaussianity is small, LiNGAM may give very different results across bootstrap samples.
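A minimal sketch of this bootstrap idea for the two-variable case; `x1_causes_x2` is our own toy direction rule (residual-independence comparison with a crude tanh proxy), standing in for a full LiNGAM re-estimation on each resample. With a deliberately small sample, the fraction of resamples agreeing on a direction indicates how reliable the conclusion is.

```python
import numpy as np

def residual(y, x):
    """Residual of regressing y on x (both centered)."""
    x = x - x.mean()
    y = y - y.mean()
    return y - (x @ y) / (x @ x) * x

def dependence(u, v):
    """Crude tanh-correlation proxy for dependence (sketch only)."""
    u = (u - u.mean()) / u.std()
    v = (v - v.mean()) / v.std()
    return abs(np.mean(np.tanh(u) * v)) + abs(np.mean(u * np.tanh(v)))

def x1_causes_x2(X):
    """The direction whose regression residual looks more independent wins."""
    a, b = X[:, 0], X[:, 1]
    return dependence(a, residual(b, a)) < dependence(b, residual(a, b))

rng = np.random.default_rng(0)
n = 300  # deliberately small sample
x1 = rng.uniform(-1, 1, n)
x2 = 0.5 * x1 + rng.uniform(-1, 1, n)
X = np.column_stack([x1, x2])

# Re-estimate the direction on resampled data sets; a fraction far from
# 0 or 1 signals an unreliable result.
boot = [x1_causes_x2(X[rng.integers(0, n, n)]) for _ in range(500)]
print(f"fraction supporting x1 -> x2: {np.mean(boot):.2f}")
```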
![Page 56: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/56.jpg)
55
Extensions
![Page 57: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/57.jpg)
56
Extensions (a partial list)
• Relaxing the assumptions of the LiNGAM model:
– Acyclic → Cyclic (Lacerda et al., UAI2008)
– Single homogeneous population → Heterogeneous population (Shimizu et al., ICONIP2007)
– i.i.d. sampling → Time structures (Part II.) (Hyvarinen et al., JMLR, 2010; Kawahara, S & Washio, 2010)
– No latent confounders → Allow latents (Part II.) (Hoyer et al., IJAR, 08; Kawahara, Bollen et al., 2010)
– Linear → Non-linear (Hoyer et al., NIPS08; Zhang & Hyvarinen, UAI09; Tillman, Gretton & Spirtes, NIPS09)
![Page 58: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/58.jpg)
57
Application areas so far
![Page 59: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/59.jpg)
58
Non-Gaussian SEMs have been applied to…
• Neuroinformatics
– Brain connectivity analysis (Hyvarinen et al., JMLR, 2010)
• Bioinformatics
– Gene network estimation (Sogawa et al., ICANN2010)
• Economics (Wan & Tan, 2009; Moneta, Entner, Hoyer & Coad, 2010)
• Genetics (Ozaki & Ando, 2009)
• Environmental sciences (Niyogi et al., 2010)
• Physics (Kawahara, Shimizu & Washio, 2010)
• Sociology (Kawahara, Bollen, Shimizu & Washio, 2010)
![Page 60: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/60.jpg)
59
Final summary of Part I
• Use of non-Gaussianity in linear SEMs is useful for model identification.
• Non-Gaussian data is encountered in many applications.
• The non-Gaussian approach can be a good option.
• Links to codes and papers: http://homepage.mac.com/shoheishimizu/lingampapers.html
![Page 61: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/61.jpg)
60
FAQs
![Page 62: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/62.jpg)
61
Q. My data is Gaussian. LiNGAM will not be useful.
• A. You’re right. Try Gaussian methods.
• Comment: Hoyer et al. (UAI2008) showed `to what extent one can identify the model for a mixture of Gaussian and non-Gaussian external influence variables’.
![Page 63: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/63.jpg)
62
Q. I applied LiNGAM, but the result does not agree with background knowledge.
• A. You might first want to check:
– Some model assumptions might be violated.
Try other extensions of LiNGAM or non-parametric methods, PC or FCI etc. (Spirtes et al., 2000).
– Small sample size or small non-Gaussianity.
Try bootstrap to see if the result is reliable.
– Background knowledge might be wrong.
![Page 64: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/64.jpg)
63
Q. Relation to causal Markov condition?
• A. The following 3 estimation principles are equivalent (Zhang & Hyvarinen, ECML09; Hyvarinen et al., JMLR, 2010):
1. Maximize independence between external influences e_i.
2. Minimize the sum of entropies of external influences e_i.
3. Causal Markov condition (each variable is independent of its non-descendants in the DAG conditional on its parents) AND maximization of independence between the parents of each variable and its corresponding external influences e_i.
![Page 65: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/65.jpg)
64
Q. I am a psychometrician and am more interested in latent factors.
• A. Shimizu, Hoyer and Hyvarinen (2009) propose LiNGAM for latent factors:
f = Bf + d -- LiNGAM for latent factors
x = Gf + e -- Measurement model
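A small simulation sketch of this two-layer model: latent factors follow a LiNGAM, and observed indicators load on them through a measurement model. All numbers (B, G, noise levels, distributions) below are hypothetical illustrations, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two latent factors: f1 -> f2 with (hypothetical) coefficient 0.7
B = np.array([[0.0, 0.0],
              [0.7, 0.0]])
d = rng.uniform(-1, 1, size=(n, 2))      # non-Gaussian factor disturbances
f = d @ np.linalg.inv(np.eye(2) - B).T   # solve f = Bf + d, row-wise

# Six observed indicators, three per factor (loading matrix G)
G = np.array([[1.0, 0.0], [0.8, 0.0], [0.6, 0.0],
              [0.0, 1.0], [0.0, 0.9], [0.0, 0.7]])
e = 0.3 * rng.normal(size=(n, 6))        # measurement errors
x = f @ G.T + e                          # x = Gf + e

print(x.shape)  # (1000, 6)
```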
![Page 66: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/66.jpg)
65
Others
• Q. Prior knowledge?
– It is possible to incorporate prior knowledge. The accuracy of DirectLiNGAM is often greatly improved even if the amount of prior knowledge is not so large (Inazumi et al., LVA/ICA2010).
• Q. Sparse LiNGAM?
– Zhang et al. (ICA09) and Hyvarinen et al. (JMLR, 2010).
– ICA + adaptive Lasso (Zou, 2006).
• Q. Bayesian approach?
– Hoyer and Hyttinen (UAI09); Henao and Winther (NIPS09).
• Q. Can the idea be applied to discrete variables?
– One proposal by Peters et al. (AISTATS2010).
– Comment: if your discrete variables are close to continuous, e.g., ordinal scales with many points, LiNGAM might work.
![Page 67: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/67.jpg)
66
Q. Nonlinear extensions?
• A. Several nonlinear SEMs have been proposed:
– DAG; No latent confounders.
1. x_i = Σ_j f_ij(j-th parent of x_i) + e_i -- Imoto et al. (2002)
2. x_i = f_i(parents of x_i) + e_i -- Hoyer et al. (NIPS08)
3. x_i = f_{i,1}^{-1}(f_{i,2}(parents of x_i) + e_i) -- Zhang et al. (UAI09)
• For the two-variable case, unique identification is possible except for several combinations of nonlinearities and distributions (Hoyer et al., NIPS08; Zhang & Hyvarinen, UAI09).
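A minimal sketch of the two-variable additive-noise idea (model 2 above): fit a nonlinear regression in each direction and prefer the direction whose residuals look independent of the regressor. The polynomial fit and the squared-correlation dependence proxy are our own simplifications of the Gaussian-process regression and HSIC independence tests used in the actual papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-2, 2, n)
y = x + 0.5 * x**3 + rng.uniform(-0.5, 0.5, n)  # y = f(x) + e, nonlinear f

def dependence(u, v):
    """Crude proxy: correlation between squared centered values
    (a stand-in for HSIC-type independence tests)."""
    u = (u - u.mean()) ** 2
    v = (v - v.mean()) ** 2
    return abs(np.corrcoef(u, v)[0, 1])

def anm_residual(cause, effect, deg=5):
    """Polynomial regression of effect on cause; returns the residuals."""
    coef = np.polyfit(cause, effect, deg)
    return effect - np.polyval(coef, cause)

score_xy = dependence(x, anm_residual(x, y))  # residual ~ e, independent of x
score_yx = dependence(y, anm_residual(y, x))  # residual stays dependent on y
print(score_xy < score_yx)  # expected: True, i.e., x -> y is preferred
```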
![Page 68: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/68.jpg)
67
Nonlinear extensions (continued)
• Proposals aiming at computational efficiency (Mooij et al., ICML09; Tillman et al., NIPS09; Zhang & Hyvarinen, ECML09; UAI09).
• Pros:
– Nonlinear models are more general than linear models.
• Cons:
– Computationally demanding.
• Current: at most 7 or 8 variables.
• Perhaps, the assumption of Gaussian external influences might help: Imoto et al. (2002) analyzes 100 variables.
– More difficult to allow other possible violations of LiNGAM assumptions, latent confounders etc.
![Page 69: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/69.jpg)
68
Q. My data follows neither such linear SEMs nor such nonlinear SEMs as you have talked about.
• A. Try non-parametric methods, e.g.,
– DAG: PC (Spirtes & Glymour, 1991)
– DAG with latent confounders: FCI (Spirtes et al., 1995).
x_i = f_i(parents of x_i, e_i)
• Probably you get a (probably large) equivalence class of models rather than a single model, but that would be the best you currently can do.
![Page 70: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/70.jpg)
69
Q. Deterministic relations?
• A. LiNGAM is not applicable.
• See Daniusis et al. (UAI2010) for a method to analyze deterministic relations.
![Page 71: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/71.jpg)
70
Extra slides
![Page 72: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/72.jpg)
Very brief review of Linear SEMs and causality
See Pearl (2000) for details.
![Page 73: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/73.jpg)
72
We can give a causal interpretation to each path-coefficient b_ij (Pearl, 2000).
• Example:
x1 = e1
x2 = b21 x1 + e2
x3 = b32 x2 + e3
(Graph: x1 → x2 → x3, with external influences e1, e2, e3 and path coefficients b21, b32.)
• Causal interpretation of b32: If you change x2 from a constant c0 to a constant c1, then E(x3) will change by b32(c1 - c0).
![Page 74: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/74.jpg)
73
`Change x2 from c0 to c1’
• It means `Fix x2 at a constant regardless of the variables that previously determine x2, and change the constant from c0 to c1’ (Pearl, 2000).
• `Fix x2 at c0 regardless of the variables that determine x2’ is denoted by do(x2 = c0).
– In SEM, `Replace the function determining x2 with a constant c0’:
Original model:
x1 = e1
x2 = b21 x1 + e2
x3 = b32 x2 + e3
Mutilated model with do(x2 = c0):
x1 = e1
x2 = c0
x3 = b32 x2 + e3
(Assumption of invariance: the other functions don’t change)
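The mutilation above can be checked by simulation: replace the structural equation for x2 with a constant and compare expectations. The coefficients and noise distributions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
b21, b32 = 1.5, 0.8   # hypothetical path coefficients
c0, c1 = 0.0, 1.0

def simulate_x3(do_x2=None):
    """Simulate the chain x1 -> x2 -> x3; do_x2 replaces the equation
    for x2 with a constant (the mutilated model)."""
    e1, e2, e3 = (rng.uniform(-1, 1, n) for _ in range(3))
    x1 = e1
    x2 = do_x2 if do_x2 is not None else b21 * x1 + e2
    x3 = b32 * x2 + e3
    return x3

effect = simulate_x3(do_x2=c1).mean() - simulate_x3(do_x2=c0).mean()
print(round(effect, 2))  # expected: 0.8 = b32 * (c1 - c0)
```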
![Page 75: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/75.jpg)
74
Average causal effect: Definition (Rubin, 1974; Pearl, 2000)
• Average causal effect of x2 on x3 when changing x2 from c0 to c1:
E(x3 | do(x2 = c1)) - E(x3 | do(x2 = c0)) = b32(c1 - c0)
– Computed based on the mutilated models with do(x2 = c1) or do(x2 = c0).
• Average causal effects can be computed based on the estimated path-coefficient matrix B.
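A minimal sketch of that computation: for x = Bx + e, summing direct effects over all directed paths gives the total-effects matrix T = (I - B)^{-1}, whose off-diagonal entry (i, j) is the average causal effect of a unit change in x_j on E(x_i). The B below is a hypothetical estimate for the chain example, not a value from the slides.

```python
import numpy as np

# Hypothetical path-coefficient matrix for x1 -> x2 -> x3
# (entry (i, j) holds b_ij, the direct effect of x_j on x_i)
B = np.array([[0.0, 0.0, 0.0],
              [1.5, 0.0, 0.0],
              [0.0, 0.8, 0.0]])

# Total (average causal) effects of unit interventions: T = (I - B)^{-1}
T = np.linalg.inv(np.eye(3) - B)
print(round(T[2, 0], 6))  # x1 -> x3 through x2: 1.5 * 0.8 = 1.2
print(round(T[2, 1], 6))  # x2 -> x3 direct: 0.8
```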
![Page 76: Non-Gaussian Methods for Learning Linear Structural Equation Models: Part I](https://reader031.vdocuments.us/reader031/viewer/2022020123/559e99701a28ab0c208b4762/html5/thumbnails/76.jpg)
75
Cf. Conducting experiments with random assignment is a method to estimate average causal effects
• Example: a drug trial
• Random assignment of subjects into two groups fixes x2 to c1 or c0 regardless of the variables that determine x2:
– Experimental group with x2 = c1 (take the drug)
– Control group with x2 = c0 (placebo)
E(x3 | do(x2 = c1)) - E(x3 | do(x2 = c0))
= E(x3) for the exp. group - E(x3) for the control group