“Ideal Parent” Structure Learning

Gal Elidan, with Iftach Nachman and Nir Friedman
School of Engineering & Computer Science, The Hebrew University, Jerusalem, Israel
![Page 1: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/1.jpg)
“Ideal Parent” Structure Learning
School of Engineering & Computer Science
The Hebrew University, Jerusalem, Israel
Gal Elidan
with Iftach Nachman and Nir Friedman
![Page 2: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/2.jpg)
Learning Structure

Input: variables and data instances. Output: a network structure.

[Figure: candidate networks over S, C, E, D, each with a score (-17.23, -19.19, -23.13).]

Init: start with an initial structure
1. Consider local changes
2. Score each candidate
3. Apply the best modification

Problems: we need to score many candidates, and each one requires costly parameter optimization, so structure learning is often impractical.

The “Ideal Parent” approach:
Approximate the improvement of changes (fast)
Optimize and score only the promising candidates (slow)
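The two-tier evaluation loop above can be sketched as follows. This is a minimal sketch, not the authors' implementation: `ideal_parent_search`, `approx_score`, and `exact_score` are hypothetical names, and the candidates stand in for local structure changes.

```python
import numpy as np

def ideal_parent_search(candidates, approx_score, exact_score, k=3):
    # Rank every candidate change with the cheap approximation (fast)...
    ranked = sorted(candidates, key=approx_score, reverse=True)
    # ...then fully optimize and score only the top-k candidates (slow)
    return max(ranked[:k], key=exact_score)

# Toy usage: candidates are scalars, the approximation is a noisy exact score
rng = np.random.default_rng(0)
cands = list(rng.normal(size=20))
best = ideal_parent_search(cands,
                           approx_score=lambda c: c + rng.normal(scale=0.1),
                           exact_score=lambda c: c,
                           k=5)
```

The cost shifts from scoring all candidates exactly to one cheap pass plus k exact evaluations.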
![Page 3: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/3.jpg)
Linear Gaussian Networks

[Figure: a network over A, B, C, D, E; the CPD P(E | C) for the edge C → E is a linear Gaussian.]
![Page 4: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/4.jpg)
The “Ideal Parent” Idea

Goal: score only promising candidates.

[Figure: child X with parent set U; the parent profile, the child profile, and the predicted profile Pred(X|U) across instances.]
![Page 5: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/5.jpg)
The “Ideal Parent” Idea

Goal: score only promising candidates.

Step 1: Compute the optimal hypothetical parent: the ideal profile Y for which Pred(X|U,Y) matches the child profile.
Step 2: Search the potential parents Z1, …, Z4 for one “similar” to Y.

[Figure: child X with parents U and ideal parent Y; the profiles of candidate parents Z1-Z4 across instances.]
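For the linear Gaussian case, the ideal profile in Step 1 is simply the residual of the current prediction; the sketch below assumes that form (variable names are ours, not the talk's notation).

```python
import numpy as np

def ideal_profile_linear(x, U, alpha):
    """For a linear Gaussian CPD x ~ N(U @ alpha, sigma^2), the ideal
    parent profile is the residual: the hypothetical parent that, added
    with coefficient 1, would make the prediction exact."""
    return x - U @ alpha

# Toy data: the child is generated from two parents plus a missing third one
rng = np.random.default_rng(1)
U = rng.normal(size=(100, 2))      # current parents (instances x parents)
z_true = rng.normal(size=100)      # the parent we hope to recover
alpha = np.array([0.5, -1.0])
x = U @ alpha + 2.0 * z_true       # no noise, for illustration
y = ideal_profile_linear(x, U, alpha)
# y is proportional to the missing parent's profile
corr = np.corrcoef(y, z_true)[0, 1]
```

With no noise the recovered profile correlates perfectly with the missing parent.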
![Page 6: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/6.jpg)
The “Ideal Parent” Idea

Goal: score only promising candidates.

Step 1: Compute the optimal hypothetical parent (the ideal profile Y for Pred(X|U,Y)).
Step 2: Search the potential parents Z1, …, Z4 for one “similar” to Y; here Z2 is chosen.
Step 3: Add the new parent and optimize the parameters, giving Predicted(X|U,Z).

[Figure: as before, with Z2 added as a parent of X and the parents' profile compared to the ideal profile.]
![Page 7: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/7.jpg)
Choosing the best parent Z

Our goal: choose the Z that maximizes the likelihood improvement, i.e. the likelihood of X given U and Z minus the likelihood of X given U alone.

We define a similarity measure between the ideal profile y and a candidate profile z.

Theorem: the likelihood improvement when only z's parameter is optimized is given by this similarity.
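The slide's equations are lost in this transcript; as a sketch, the C1 measure for the linear Gaussian case can be taken to be proportional to the squared inner product of the candidate profile with the ideal profile, normalized by the candidate's norm. That closed form is our reconstruction, not quoted from the slide.

```python
import numpy as np

def c1_similarity(y, z, sigma2=1.0):
    """Approximate likelihood gain from adding candidate parent z when
    only z's coefficient is optimized (a sketch of the C1 measure; this
    closed form is our assumption from the linear Gaussian derivation)."""
    return (y @ z) ** 2 / (2.0 * sigma2 * (z @ z))

y = np.array([1.0, 2.0, -1.0])                               # ideal profile
aligned = c1_similarity(y, y)                                # parallel parent scores highest
orthogonal = c1_similarity(y, np.array([2.0, -1.0, 0.0]))    # y @ z == 0, no gain
```

Note the measure is invariant to rescaling z, matching the intuition that only the direction of the candidate profile matters.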
![Page 8: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/8.jpg)
Similarity vs. Score

[Plots: structure score vs. similarity, for C2 and for C1.]

C2 is more accurate: the effect of the fixed variance (assumed by C1) is large.
C1 will be useful later.

We now have an efficient approximation for the score.
![Page 9: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/9.jpg)
Ideal Parent in Search

Structure search involves:
Add parent: O(N^2)
Replace parent: O(N·E)
Delete parent: O(E)
Reverse edge: O(E)

[Figure: candidate networks over S, C, E, D with scores -17.23, -19.19, -23.13.]

The vast majority of evaluations are replaced by the ideal approximation; only K candidates per family are optimized and scored.
![Page 10: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/10.jpg)
Gene Expression Experiment
4 gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (each of the two Conditions) variables.

[Plots: test log-likelihood and speedup vs. K = 1…5 for Amino, Metabolism, Conditions (AA), and Conditions (Met), relative to greedy search.]

Only 0.4%-3.6% of the changes are evaluated.
Speedup: 1.8-2.7.
![Page 11: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/11.jpg)
Scope
Conditional probability distributions (CPDs) of the form x = g(u1, …, uk : θ) + ε, where g is the link function and ε is white noise.

General requirement: g(U) can be any function that is invertible with respect to each ui.

Examples: Linear Gaussian, Chemical Reaction, Sigmoid Gaussian.
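The invertibility requirement is what makes the ideal profile computable for non-linear links: invert g and subtract the current parents' contribution. A minimal sketch for the sigmoid case, assuming the new parent enters the linear argument with coefficient 1 (helper names are ours):

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def ideal_profile_sigmoid(x, U, alpha, eps=1e-6):
    """For a sigmoid Gaussian CPD x ~ N(sigmoid(U @ alpha + y), sigma^2),
    invert the link to get the ideal parent profile y."""
    xc = np.clip(x, eps, 1.0 - eps)   # keep the logit finite
    return logit(xc) - U @ alpha

# Toy check: recover the missing additive input exactly (no noise)
rng = np.random.default_rng(2)
U = rng.normal(size=(50, 2))
alpha = np.array([1.0, 0.5])
y_true = rng.normal(size=50)
x = 1.0 / (1.0 + np.exp(-(U @ alpha + y_true)))
y = ideal_profile_sigmoid(x, U, alpha)
```

Any link that is invertible in each ui admits the same construction.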
![Page 12: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/12.jpg)
Sigmoid Gaussian CPD

Problem: no simple form for the similarity measures; the sensitivity to Z depends on the gradient at each specific instance.

Solution: a linear approximation around Y = 0.

[Plots: the link g(z) and the exact vs. approximate likelihood of Z, for instances with X = 0.5 and X = 0.85.]
![Page 13: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/13.jpg)
Sigmoid Gaussian CPD
[Plots: equi-likelihood potentials of Z for X = 0.5 vs. X = 0.85, before and after the gradient correction.]

After the gradient correction, we can now use the same similarity measure.
![Page 14: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/14.jpg)
Sigmoid Gene Expression
4 gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (Conditions) variables.

[Plots: test log-likelihood and speedup vs. K = 0…20 for Amino, Metabolism, Conditions (AA), and Conditions (Met), relative to greedy search.]

Only 2.2%-6.1% of the moves are evaluated; search is 18-30 times faster.
![Page 15: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/15.jpg)
Adding New Hidden Variables

Idea: introduce a hidden parent H for nodes with similar ideal profiles.

[Figure: nodes X1-X5 with ideal profiles Y1-Y5 across instances; H is added as a parent of X1, X2, X4.]

For the linear Gaussian case, the improvement from a hidden parent can be bounded.
Challenge: find the hidden profile that maximizes this bound.
![Page 16: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/16.jpg)
Scoring a parent

The optimal hidden profile must lie in the span of the cluster members' ideal profiles (the columns of a matrix). Setting up the score accordingly and using the above (with A invertible), the score is a Rayleigh quotient, and the maximizer is the eigenvector with the largest eigenvalue. Finding h* thus amounts to solving an eigenvector problem of size |A|, the size of the cluster.
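As a sketch of the eigenvector step: if the cluster score of a hidden profile h is the sum of squared inner products with the members' ideal profiles, normalized by h's norm, then it is the Rayleigh quotient of Y Yᵀ and the best h is the top eigenvector. The sum-of-C1 form of the score is our assumption here.

```python
import numpy as np

def best_hidden_profile(Y):
    """Columns of Y are the cluster members' ideal profiles. The score
    sum_i (y_i . h)^2 / (h . h) is the Rayleigh quotient of M = Y @ Y.T,
    maximized by M's top eigenvector."""
    M = Y @ Y.T
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    return vecs[:, -1], vals[-1]          # top eigenvector and its score

# Two nearly parallel ideal profiles: h* aligns with their shared direction
Y = np.array([[1.0, 1.1],
              [2.0, 1.9],
              [0.0, 0.1]])
h, score = best_hidden_profile(Y)
rayleigh = h @ (Y @ Y.T) @ h / (h @ h)    # equals the top eigenvalue
```

The eigenproblem is over the instance dimension here; restricting h to the span of Y's columns reduces it to size |A|, as the slide notes.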
![Page 17: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/17.jpg)
Finding the best Cluster

Compute the pairwise cluster scores only once:

X1, X2: 12.35
X1, X3: 14.12
X3, X4: 3.11

[Figure: candidate pairs among X1-X4.]
![Page 18: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/18.jpg)
Finding the best Cluster

The pairwise scores are computed only once. Start from the best pair, X1, X3 (14.12), and grow it greedily: adding X2 gives X1, X2, X3 (18.45); adding X4 gives X1, X2, X3, X4 (16.79).

Select the cluster with the highest score, add the hidden parent, and continue with the search.
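The greedy agglomeration above can be sketched generically; `best_cluster` is a hypothetical helper (the talk's exact procedure may differ), and the toy score table reuses the numbers on the slide.

```python
from itertools import combinations

def best_cluster(nodes, score):
    """Score all pairs once, grow the best pair one member at a time,
    and return the highest-scoring cluster seen along the way.
    `score` is any function from a set of nodes to a float."""
    current = max((set(p) for p in combinations(nodes, 2)), key=score)
    best, best_s = set(current), score(current)
    rest = [n for n in nodes if n not in current]
    while rest:
        n = max(rest, key=lambda m: score(current | {m}))
        current.add(n)
        rest.remove(n)
        s = score(current)
        if s > best_s:
            best, best_s = set(current), s
    return best, best_s

# Toy score from the slide: {X1,X3} = 14.12 grows to {X1,X2,X3} = 18.45,
# then drops to 16.79 for the full set, so the 3-member cluster wins.
table = {frozenset(s): v for s, v in [
    ({"X1", "X2"}, 12.35), ({"X1", "X3"}, 14.12), ({"X3", "X4"}, 3.11),
    ({"X1", "X2", "X3"}, 18.45), ({"X1", "X2", "X3", "X4"}, 16.79)]}
cluster, val = best_cluster(["X1", "X2", "X3", "X4"],
                            lambda s: table.get(frozenset(s), 0.0))
```

Only the pairwise scores are computed up front; growing the cluster adds one score evaluation per step.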
![Page 19: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/19.jpg)
Bipartite Network
Instances sampled from a biological expert network with 7 (hidden) parents and 141 (observed) children.

[Plots: train and test log-likelihood vs. number of instances (10-100) for Greedy, Ideal K=2, Ideal K=5, and the gold-standard network.]

Speedup is roughly x10; greedy takes over 2.5 days!
![Page 20: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/20.jpg)
Summary

New method for significantly speeding up structure learning in continuous variable networks
Offers a promising time vs. performance tradeoff
Guided insertion of new hidden variables

Future work

Improve cluster identification for the non-linear case
Explore additional distributions and the relation to GLMs
Combine the ideal parent approach as a plug-in with other search approaches