“Ideal Parent” Structure Learning

Gal Elidan, with Iftach Nachman and Nir Friedman
School of Engineering & Computer Science, The Hebrew University, Jerusalem, Israel
![Page 1: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/1.jpg)
“Ideal Parent” Structure Learning
School of Engineering & Computer Science
The Hebrew University, Jerusalem, Israel
Gal Elidan
with Iftach Nachman and Nir Friedman
![Page 2: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/2.jpg)
Learning Structure

Input: variables and data instances. Output: a network structure.

[Figure: candidate networks over S, C, E, D, each with a score (-17.23, -19.19, -23.13).]

Init: start with an initial structure
1. Consider local changes
2. Score each candidate
3. Apply the best modification

Problems: we need to score many candidates, and each one requires costly parameter optimization, so structure learning is often impractical.

The “Ideal Parent” approach:
Approximate the improvement of changes (fast)
Optimize and score only the promising candidates (slow)
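The two-tier evaluation loop above can be sketched as follows. This is a minimal sketch, not the authors' implementation: `ideal_parent_search`, `approx_score`, and `exact_score` are hypothetical names, and the candidates stand in for local structure changes.

```python
import numpy as np

def ideal_parent_search(candidates, approx_score, exact_score, k=3):
    # Rank every candidate change with the cheap approximation (fast)...
    ranked = sorted(candidates, key=approx_score, reverse=True)
    # ...then fully optimize and score only the top-k candidates (slow)
    return max(ranked[:k], key=exact_score)

# Toy usage: candidates are scalars, the approximation is a noisy exact score
rng = np.random.default_rng(0)
cands = list(rng.normal(size=20))
best = ideal_parent_search(cands,
                           approx_score=lambda c: c + rng.normal(scale=0.1),
                           exact_score=lambda c: c,
                           k=5)
```

The cost shifts from scoring all candidates exactly to one cheap pass plus k exact evaluations.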
![Page 3: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/3.jpg)
Linear Gaussian Networks

[Figure: a network over A, B, C, D, E; the CPD P(E | C) for the edge C → E is a linear Gaussian.]
![Page 4: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/4.jpg)
The “Ideal Parent” Idea

Goal: score only promising candidates.

[Figure: child X with parent set U; the parent profile, the child profile, and the predicted profile Pred(X|U) across instances.]
![Page 5: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/5.jpg)
The “Ideal Parent” Idea

Goal: score only promising candidates.

Step 1: Compute the optimal hypothetical parent: the ideal profile Y for which Pred(X|U,Y) matches the child profile.
Step 2: Search the potential parents Z1, …, Z4 for one “similar” to Y.

[Figure: child X with parents U and ideal parent Y; the profiles of candidate parents Z1-Z4 across instances.]
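For the linear Gaussian case, the ideal profile in Step 1 is simply the residual of the current prediction; the sketch below assumes that form (variable names are ours, not the talk's notation).

```python
import numpy as np

def ideal_profile_linear(x, U, alpha):
    """For a linear Gaussian CPD x ~ N(U @ alpha, sigma^2), the ideal
    parent profile is the residual: the hypothetical parent that, added
    with coefficient 1, would make the prediction exact."""
    return x - U @ alpha

# Toy data: the child is generated from two parents plus a missing third one
rng = np.random.default_rng(1)
U = rng.normal(size=(100, 2))      # current parents (instances x parents)
z_true = rng.normal(size=100)      # the parent we hope to recover
alpha = np.array([0.5, -1.0])
x = U @ alpha + 2.0 * z_true       # no noise, for illustration
y = ideal_profile_linear(x, U, alpha)
# y is proportional to the missing parent's profile
corr = np.corrcoef(y, z_true)[0, 1]
```

With no noise the recovered profile correlates perfectly with the missing parent.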
![Page 6: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/6.jpg)
The “Ideal Parent” Idea

Goal: score only promising candidates.

Step 1: Compute the optimal hypothetical parent (the ideal profile Y for Pred(X|U,Y)).
Step 2: Search the potential parents Z1, …, Z4 for one “similar” to Y; here Z2 is chosen.
Step 3: Add the new parent and optimize the parameters, giving Predicted(X|U,Z).

[Figure: as before, with Z2 added as a parent of X and the parents' profile compared to the ideal profile.]
![Page 7: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/7.jpg)
Choosing the best parent Z

Our goal: choose the Z that maximizes the likelihood improvement, i.e. the likelihood of X given U and Z minus the likelihood of X given U alone.

We define a similarity measure between the ideal profile y and a candidate profile z.

Theorem: the likelihood improvement when only z's parameter is optimized is given by this similarity.
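The slide's equations are lost in this transcript; as a sketch, the C1 measure for the linear Gaussian case can be taken to be proportional to the squared inner product of the candidate profile with the ideal profile, normalized by the candidate's norm. That closed form is our reconstruction, not quoted from the slide.

```python
import numpy as np

def c1_similarity(y, z, sigma2=1.0):
    """Approximate likelihood gain from adding candidate parent z when
    only z's coefficient is optimized (a sketch of the C1 measure; this
    closed form is our assumption from the linear Gaussian derivation)."""
    return (y @ z) ** 2 / (2.0 * sigma2 * (z @ z))

y = np.array([1.0, 2.0, -1.0])                               # ideal profile
aligned = c1_similarity(y, y)                                # parallel parent scores highest
orthogonal = c1_similarity(y, np.array([2.0, -1.0, 0.0]))    # y @ z == 0, no gain
```

Note the measure is invariant to rescaling z, matching the intuition that only the direction of the candidate profile matters.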
![Page 8: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/8.jpg)
Similarity vs. Score

[Plots: structure score vs. similarity, for C2 and for C1.]

C2 is more accurate: the effect of the fixed variance (assumed by C1) is large.
C1 will be useful later.

We now have an efficient approximation for the score.
![Page 9: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/9.jpg)
Ideal Parent in Search

Structure search involves:
Add parent: O(N^2)
Replace parent: O(N·E)
Delete parent: O(E)
Reverse edge: O(E)

[Figure: candidate networks over S, C, E, D with scores -17.23, -19.19, -23.13.]

The vast majority of evaluations are replaced by the ideal approximation; only K candidates per family are optimized and scored.
![Page 10: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/10.jpg)
Gene Expression Experiment
4 gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (each of the two Conditions) variables.

[Plots: test log-likelihood and speedup vs. K = 1…5 for Amino, Metabolism, Conditions (AA), and Conditions (Met), relative to greedy search.]

Only 0.4%-3.6% of the changes are evaluated.
Speedup: 1.8-2.7.
![Page 11: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/11.jpg)
Scope
Conditional probability distributions (CPDs) of the form x = g(u1, …, uk : θ) + ε, where g is the link function and ε is white noise.

General requirement: g(U) can be any function that is invertible with respect to each ui.

Examples: Linear Gaussian, Chemical Reaction, Sigmoid Gaussian.
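The invertibility requirement is what makes the ideal profile computable for non-linear links: invert g and subtract the current parents' contribution. A minimal sketch for the sigmoid case, assuming the new parent enters the linear argument with coefficient 1 (helper names are ours):

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def ideal_profile_sigmoid(x, U, alpha, eps=1e-6):
    """For a sigmoid Gaussian CPD x ~ N(sigmoid(U @ alpha + y), sigma^2),
    invert the link to get the ideal parent profile y."""
    xc = np.clip(x, eps, 1.0 - eps)   # keep the logit finite
    return logit(xc) - U @ alpha

# Toy check: recover the missing additive input exactly (no noise)
rng = np.random.default_rng(2)
U = rng.normal(size=(50, 2))
alpha = np.array([1.0, 0.5])
y_true = rng.normal(size=50)
x = 1.0 / (1.0 + np.exp(-(U @ alpha + y_true)))
y = ideal_profile_sigmoid(x, U, alpha)
```

Any link that is invertible in each ui admits the same construction.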
![Page 12: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/12.jpg)
Sigmoid Gaussian CPD

Problem: no simple form for the similarity measures; the sensitivity to Z depends on the gradient at each specific instance.

Solution: a linear approximation around Y = 0.

[Plots: the link g(z) and the exact vs. approximate likelihood of Z, for instances with X = 0.5 and X = 0.85.]
![Page 13: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/13.jpg)
Sigmoid Gaussian CPD
[Plots: equi-likelihood potentials of Z for X = 0.5 vs. X = 0.85, before and after the gradient correction.]

After the gradient correction, we can now use the same similarity measure.
![Page 14: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/14.jpg)
Sigmoid Gene Expression
4 gene expression datasets, with 44 (Amino), 89 (Metabolism), and 173 (Conditions) variables.

[Plots: test log-likelihood and speedup vs. K = 0…20 for Amino, Metabolism, Conditions (AA), and Conditions (Met), relative to greedy search.]

Only 2.2%-6.1% of the moves are evaluated; search is 18-30 times faster.
![Page 15: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/15.jpg)
Adding New Hidden Variables

Idea: introduce a hidden parent H for nodes with similar ideal profiles.

[Figure: nodes X1-X5 with ideal profiles Y1-Y5 across instances; H is added as a parent of X1, X2, X4.]

For the linear Gaussian case, the improvement from a hidden parent can be bounded.
Challenge: find the hidden profile that maximizes this bound.
![Page 16: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/16.jpg)
Scoring a parent

The optimal hidden profile must lie in the span of the cluster members' ideal profiles (the columns of a matrix). Setting up the score accordingly and using the above (with A invertible), the score is a Rayleigh quotient, and the maximizer is the eigenvector with the largest eigenvalue. Finding h* thus amounts to solving an eigenvector problem of size |A|, the size of the cluster.
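As a sketch of the eigenvector step: if the cluster score of a hidden profile h is the sum of squared inner products with the members' ideal profiles, normalized by h's norm, then it is the Rayleigh quotient of Y Yᵀ and the best h is the top eigenvector. The sum-of-C1 form of the score is our assumption here.

```python
import numpy as np

def best_hidden_profile(Y):
    """Columns of Y are the cluster members' ideal profiles. The score
    sum_i (y_i . h)^2 / (h . h) is the Rayleigh quotient of M = Y @ Y.T,
    maximized by M's top eigenvector."""
    M = Y @ Y.T
    vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    return vecs[:, -1], vals[-1]          # top eigenvector and its score

# Two nearly parallel ideal profiles: h* aligns with their shared direction
Y = np.array([[1.0, 1.1],
              [2.0, 1.9],
              [0.0, 0.1]])
h, score = best_hidden_profile(Y)
rayleigh = h @ (Y @ Y.T) @ h / (h @ h)    # equals the top eigenvalue
```

The eigenproblem is over the instance dimension here; restricting h to the span of Y's columns reduces it to size |A|, as the slide notes.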
![Page 17: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/17.jpg)
Finding the best Cluster

Compute the pairwise cluster scores only once:

X1, X2: 12.35
X1, X3: 14.12
X3, X4: 3.11

[Figure: candidate pairs among X1-X4.]
![Page 18: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/18.jpg)
Finding the best Cluster

The pairwise scores are computed only once. Start from the best pair, X1, X3 (14.12), and grow it greedily: adding X2 gives X1, X2, X3 (18.45); adding X4 gives X1, X2, X3, X4 (16.79).

Select the cluster with the highest score, add the hidden parent, and continue with the search.
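The greedy agglomeration above can be sketched generically; `best_cluster` is a hypothetical helper (the talk's exact procedure may differ), and the toy score table reuses the numbers on the slide.

```python
from itertools import combinations

def best_cluster(nodes, score):
    """Score all pairs once, grow the best pair one member at a time,
    and return the highest-scoring cluster seen along the way.
    `score` is any function from a set of nodes to a float."""
    current = max((set(p) for p in combinations(nodes, 2)), key=score)
    best, best_s = set(current), score(current)
    rest = [n for n in nodes if n not in current]
    while rest:
        n = max(rest, key=lambda m: score(current | {m}))
        current.add(n)
        rest.remove(n)
        s = score(current)
        if s > best_s:
            best, best_s = set(current), s
    return best, best_s

# Toy score from the slide: {X1,X3} = 14.12 grows to {X1,X2,X3} = 18.45,
# then drops to 16.79 for the full set, so the 3-member cluster wins.
table = {frozenset(s): v for s, v in [
    ({"X1", "X2"}, 12.35), ({"X1", "X3"}, 14.12), ({"X3", "X4"}, 3.11),
    ({"X1", "X2", "X3"}, 18.45), ({"X1", "X2", "X3", "X4"}, 16.79)]}
cluster, val = best_cluster(["X1", "X2", "X3", "X4"],
                            lambda s: table.get(frozenset(s), 0.0))
```

Only the pairwise scores are computed up front; growing the cluster adds one score evaluation per step.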
![Page 19: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/19.jpg)
Bipartite Network
Instances sampled from a biological expert network with 7 (hidden) parents and 141 (observed) children.

[Plots: train and test log-likelihood vs. number of instances (10-100) for Greedy, Ideal K=2, Ideal K=5, and the gold-standard network.]

Speedup is roughly x10; greedy takes over 2.5 days!
![Page 20: “Ideal Parent” Structure Learning](https://reader034.vdocuments.us/reader034/viewer/2022051516/56812b0e550346895d8effed/html5/thumbnails/20.jpg)
Summary

New method for significantly speeding up structure learning in continuous variable networks
Offers a promising time vs. performance tradeoff
Guided insertion of new hidden variables

Future work

Improve cluster identification for the non-linear case
Explore additional distributions and the relation to GLMs
Combine the ideal parent approach as a plug-in with other search approaches