Regulatory Network (Part II) 11/05/07

Methods

• Linear
  – PCA (Raychaudhuri et al. 2000)
  – NIR (Gardner et al. 2003)

• Nonlinear
  – Bayesian network (Friedman et al. 2000; Friedman 2004)

Cell-cycle network

Data (Spellman et al. 1998)

• 76 arrays

• 7 time points

• 6177 yeast genes

• 800 cell-cycle related genes identified

PCA

Raychaudhuri et al. 2000

Raychaudhuri et al. 2000

The principal components identify the dominant modes of variation in the expression data.
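As a rough illustration of how principal components pick out dominant modes of variation, here is a minimal NumPy sketch that centers a genes × conditions matrix and takes its SVD. The function name `pca` and the synthetic matrix (one shared sinusoidal time profile plus noise) are assumptions for illustration; they are not Raychaudhuri et al.'s data or code.

```python
import numpy as np


def pca(expr, n_components):
    """PCA of an expression matrix with genes as rows and conditions as columns.

    Returns the leading principal directions (over conditions), each gene's
    coordinates on them, and the fraction of variance each one explains."""
    centered = expr - expr.mean(axis=0)              # center each condition
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return vt[:n_components], scores, explained[:n_components]


# Synthetic data: 200 genes sharing one sinusoidal profile over 12 time points, plus noise
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 12)
amplitude = rng.normal(size=(200, 1))
expr = amplitude * np.sin(t) + 0.1 * rng.normal(size=(200, 12))

components, scores, explained = pca(expr, n_components=2)
print(explained)  # the first component should account for most of the variance
```

Because every gene here is a scaled copy of one profile, the first direction in `components` recovers that profile; in real data each direction is a mixture, which is why the components do not map directly onto regulator-target pairs.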

Limitations of PCA

• Does not directly associate regulators with their target genes.

• Alternatively, it can be interpreted as implying that the network is fully connected: the expression of each gene is regulated by a linear combination of all other genes.

NIR

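Idea: The dynamics of gene activities can be approximated by a linear system

$$\frac{dx}{dt} = Ax + u,$$

where $x$ is the vector of gene expression levels and $u$ is an external perturbation. After the perturbation, gene expression levels approximately reach steady state ($dx/dt \approx 0$), so

$$Ax = -u.$$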
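A small numerical check of the steady-state idea, assuming a hypothetical stable 2-gene matrix `A` and a constant perturbation `u` (all values made up): integrating dx/dt = Ax + u with forward Euler converges to the solution of Ax = -u.

```python
import numpy as np

# Hypothetical 2-gene network; A must be stable (eigenvalues with negative real parts)
A = np.array([[-1.0, 0.5],
              [0.2, -2.0]])
u = np.array([1.0, 0.5])        # constant external perturbation


def simulate(A, u, x0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = A x + u from x0."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (A @ x + u)
    return x


x_end = simulate(A, u, x0=np.zeros(2))
x_steady = np.linalg.solve(A, -u)   # steady state: dx/dt = 0  =>  A x = -u
print(x_end, x_steady)              # both approximately [1.184 0.368]
```

The stability requirement matters: if `A` had an eigenvalue with positive real part, the trajectory would diverge and the measured expression would never settle at the solution of Ax = -u.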

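NIR

• Solve for A in

$$A_{N \times N}\, x_{N \times M} = -u_{N \times M},$$

where N = #genes, M = #perturbations, and each column of x and u corresponds to one perturbation experiment.

• This is unidentifiable since M << N: A has N² unknowns but there are only NM equations.

• Add the constraint that there are at most k connections for any given gene (k < M).

• For each row, use multiple regression to find a linear combination of k genes that minimizes the least-squares error.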
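A minimal sketch of the row-wise regression step, assuming an exhaustive search over all k-subsets of candidate regulators (feasible only for small N) and noise-free synthetic data. `nir_row` and `nir` are hypothetical names; this is not Gardner et al.'s software.

```python
import itertools

import numpy as np


def nir_row(x, u_row, k):
    """Estimate one row a_i of A from a_i @ x = -u_row, allowing at most k nonzero
    entries: try every k-subset of candidate regulators, fit it by least squares,
    and keep the subset with the smallest squared error."""
    n_genes = x.shape[0]
    target = -u_row                          # length M (one value per perturbation)
    best_err, best_idx, best_coef = np.inf, None, None
    for idx in itertools.combinations(range(n_genes), k):
        design = x[list(idx), :].T           # M x k design matrix
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        err = float(np.sum((design @ coef - target) ** 2))
        if err < best_err:
            best_err, best_idx, best_coef = err, idx, coef
    row = np.zeros(n_genes)
    row[list(best_idx)] = best_coef
    return row


def nir(x, u, k):
    """Fit every row of A independently (x and u are N x M)."""
    return np.vstack([nir_row(x, u[i], k) for i in range(x.shape[0])])


# Synthetic check with N = 4 genes, M = 3 perturbations, k = 2 regulators per gene
rng = np.random.default_rng(1)
N, M, k = 4, 3, 2
A_true = np.zeros((N, N))
for i in range(N):
    cols = rng.choice(N, size=k, replace=False)
    A_true[i, cols] = rng.uniform(0.5, 2.0, size=k)
x = rng.normal(size=(N, M))   # steady-state expression, one column per perturbation
u = -A_true @ x               # perturbations consistent with A x = -u
A_hat = nir(x, u, k)
print(np.round(A_hat - A_true, 6))  # expected to be all zeros for this noise-free example
```

Without the k-connection limit, each row would have N = 4 unknowns but only M = 3 equations; restricting to k = 2 < M makes each subset fit overdetermined, so only the true regulator pair fits exactly.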

Application of NIR

[Figure: the known E. coli SOS pathway, with edges labeled activation or repression]

Application of NIR

[Figure: regression coefficients]

Limitations of NIR

• The true dynamics are nonlinear.

• The choice of k is ad hoc.

• The steady-state approximation does not apply to oscillatory genes.

Bayesian network

Directed acyclic graph (DAG)

• Nodes: random variables

• Edges: direct effects (conditional dependencies)

Friedman 2004

An example

[Figure: a Bayesian network over the nodes Earthquake, Burglary, Radio, Alarm, and Call]
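To make the factorization concrete, here is a small sketch that computes the joint probability as the product of each node's probability given its parents. The arcs are an assumption, since the slide text does not preserve them: Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm, Alarm → Call (the usual version of this example). All CPT numbers in `p_true` are hypothetical.

```python
from itertools import product

# Assumed structure (arcs are not recoverable from the slide text)
parents = {
    "Earthquake": (),
    "Burglary": (),
    "Radio": ("Earthquake",),
    "Alarm": ("Earthquake", "Burglary"),
    "Call": ("Alarm",),
}

# Hypothetical CPTs: P(node = True | parent values), keyed by the tuple of parent values
p_true = {
    "Earthquake": {(): 0.01},
    "Burglary": {(): 0.02},
    "Radio": {(True,): 0.9, (False,): 0.001},
    "Alarm": {(True, True): 0.95, (True, False): 0.3,
              (False, True): 0.9, (False, False): 0.01},
    "Call": {(True,): 0.7, (False,): 0.05},
}


def joint(assignment):
    """P(assignment) as the product of P(x_i | Pa_i) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        p = p_true[node][tuple(assignment[q] for q in pa)]
        prob *= p if assignment[node] else 1.0 - p
    return prob


nodes = list(parents)
total = sum(joint(dict(zip(nodes, values)))
            for values in product([True, False], repeat=len(nodes)))
print(total)  # 1.0 up to floating-point rounding
```

With five binary nodes this structure needs 1 + 1 + 2 + 4 + 2 = 10 numbers, compared with 2⁵ − 1 = 31 for an unrestricted joint distribution.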

This is not a Bayesian network

[Figure: a graph over nodes A, B, and C that is not a DAG]
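A graph can serve as a Bayesian network structure only if it is a DAG. The sketch below checks this with Kahn's topological-sort algorithm; the edge lists are hypothetical, since the arcs in the slide's figure were not preserved.

```python
from collections import deque


def is_dag(nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be removed
    in topological order (each removed node has no remaining incoming edges)."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)   # leftover nodes lie on (or behind) a cycle


print(is_dag("ABC", [("A", "B"), ("B", "C"), ("C", "A")]))  # False: A -> B -> C -> A
print(is_dag("ABC", [("A", "B"), ("A", "C")]))              # True
```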

[Figure: a tree over nodes A, B, C, D, and E]

Tree: a special kind of DAG

Each node has at most one parent node.

Advantages

• Intuitive, and popular among biologists

• The graph structure is easy to interpret

• Well-established probabilistic tools for DAG models

• Supports all the features of probabilistic learning
  – Model selection criteria
  – Handling of missing data

Known Structure, Complete Data

[Figure: the Learner receives the network E → A ← B, whose table P(A | E, B) has unknown entries (?), together with the data, and outputs the same network with the entries of P(A | E, B) estimated (value pairs .9/.1, .7/.3, .99/.01, .8/.2).]

• Network structure is specified
  – Inducer needs to estimate parameters

• Data does not contain missing values

Data (E, B, A): <Y,N,N> <Y,N,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>

(Nir Friedman)
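With the structure fixed and no missing values, the maximum-likelihood estimate of P(A | E, B) is just a ratio of counts. This sketch uses only the five (E, B, A) rows visible on the slide; the rows elided by "..." are not included, and `mle_cpt` is a hypothetical name.

```python
from collections import Counter

# The five (E, B, A) rows shown on the slide; elided rows are omitted
data = [("Y", "N", "N"), ("Y", "N", "Y"), ("N", "N", "Y"), ("N", "Y", "Y"), ("N", "Y", "Y")]


def mle_cpt(rows):
    """Maximum-likelihood estimate of P(A = Y | E, B) from complete data:
    N(E, B, A = Y) / N(E, B) for every parent configuration seen in the data."""
    parent_counts = Counter((e, b) for e, b, _ in rows)
    yes_counts = Counter((e, b) for e, b, a in rows if a == "Y")
    return {cfg: yes_counts[cfg] / n for cfg, n in parent_counts.items()}


print(mle_cpt(data))
# {('Y', 'N'): 0.5, ('N', 'N'): 1.0, ('N', 'Y'): 1.0}
```

The configuration E = Y, B = Y never occurs in these rows, so counting alone gives no estimate for it; this sparsity is one motivation for the prior distributions used in Bayesian inference.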

Unknown Structure, Complete Data

[Figure: the Learner receives the data over E, B, and A without a specified structure, and outputs the network E → A ← B together with estimated entries of P(A | E, B).]

• Network structure is not specified
  – Inducer needs to select arcs & estimate parameters

• Data does not contain missing values

Data (E, B, A): <Y,N,N> <Y,N,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>

(Nir Friedman)

Learning parameters

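[Figure: network with arcs E → A, B → A, and A → C]

• Training data has the form:

$$D = \begin{bmatrix} E[1] & B[1] & A[1] & C[1] \\ \vdots & \vdots & \vdots & \vdots \\ E[M] & B[M] & A[M] & C[M] \end{bmatrix}$$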

Likelihood Function

• Assume i.i.d. samples

• Likelihood function is

$$L(\Theta : D) = \prod_m P(E[m], B[m], A[m], C[m] : \Theta)$$

Likelihood Function

• By definition of the network, we get

$$\begin{aligned} L(\Theta : D) &= \prod_m P(E[m], B[m], A[m], C[m] : \Theta) \\ &= \prod_m P(E[m] : \Theta)\, P(B[m] : \Theta)\, P(A[m] \mid B[m], E[m] : \Theta)\, P(C[m] \mid A[m] : \Theta) \end{aligned}$$

Likelihood Function

• Rewriting terms, we get

$$\begin{aligned} L(\Theta : D) &= \prod_m P(E[m], B[m], A[m], C[m] : \Theta) \\ &= \prod_m P(E[m] : \Theta) \prod_m P(B[m] : \Theta) \prod_m P(A[m] \mid B[m], E[m] : \Theta) \prod_m P(C[m] \mid A[m] : \Theta) \end{aligned}$$

General Bayesian Networks

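Generalization for any Bayesian network:

$$\begin{aligned} L(\Theta : D) &= \prod_m P(x_1[m], \ldots, x_n[m] : \Theta) \\ &= \prod_i \prod_m P(x_i[m] \mid Pa_i[m] : \Theta_i) \\ &= \prod_i L_i(\Theta_i : D) \end{aligned}$$

where $Pa_i$ denotes the parents of $x_i$ in the network.

Parameters can be estimated independently!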
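A sketch of why the decomposition allows independent estimation: each family's parameters are fitted from that family's columns only, and the sum of the per-family log-likelihoods equals the log of the full factorized likelihood. It assumes the E → A ← B, A → C network from the likelihood slides, binary values, and a hypothetical data set; `fit_family` and `family_loglik` are illustrative names.

```python
import math
from collections import Counter

# Network from the likelihood slides: E -> A <- B, A -> C (values 0/1)
parents = {"E": (), "B": (), "A": ("E", "B"), "C": ("A",)}


def fit_family(data, node, pa):
    """MLE of P(node | parents) by counting; uses only the columns of this family."""
    joint = Counter((tuple(r[p] for p in pa), r[node]) for r in data)
    marg = Counter(tuple(r[p] for p in pa) for r in data)
    return {key: n / marg[key[0]] for key, n in joint.items()}


def family_loglik(data, node, pa, cpt):
    """log L_i(theta_i : D) = sum over m of log P(x_i[m] | pa_i[m])."""
    return sum(math.log(cpt[(tuple(r[p] for p in pa), r[node])]) for r in data)


data = [  # hypothetical complete data
    {"E": 0, "B": 0, "A": 0, "C": 0},
    {"E": 0, "B": 1, "A": 1, "C": 1},
    {"E": 1, "B": 0, "A": 1, "C": 0},
    {"E": 0, "B": 0, "A": 0, "C": 1},
    {"E": 0, "B": 1, "A": 1, "C": 1},
]

cpts = {n: fit_family(data, n, pa) for n, pa in parents.items()}
per_family = {n: family_loglik(data, n, pa, cpts[n]) for n, pa in parents.items()}

# Direct computation: log of the product over samples of the factorized joint
direct = sum(
    math.log(math.prod(cpts[n][(tuple(r[p] for p in pa), r[n])]
                       for n, pa in parents.items()))
    for r in data
)
print(sum(per_family.values()), direct)  # equal up to floating-point rounding
```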

Bayesian Inference

• Represent uncertainty about the parameters using a probability distribution over parameters and data

• Using Bayes' rule:

$$P(\theta \mid x[1], \ldots, x[M]) = \frac{P(x[1], \ldots, x[M] \mid \theta)\, P(\theta)}{P(x[1], \ldots, x[M])}$$

• Common prior distributions:
  – Dirichlet (discrete)
  – Normal (continuous)
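For discrete variables the Dirichlet prior is conjugate to the multinomial likelihood, so Bayes' rule reduces to adding observed counts to the prior's pseudo-counts. A minimal sketch with hypothetical counts and a uniform prior:

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: a Dirichlet(alpha) prior combined with multinomial counts
    gives a Dirichlet(alpha + counts) posterior."""
    return [a + n for a, n in zip(alpha, counts)]


def posterior_mean(alpha_post):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    total = sum(alpha_post)
    return [a / total for a in alpha_post]


# Hypothetical binary variable: uniform prior Dirichlet(1, 1), 3 "Y" and 1 "N" observed
post = dirichlet_posterior([1, 1], [3, 1])
print(post, posterior_mean(post))  # [4, 2] and means 2/3, 1/3

# With no observations the estimate falls back to the prior mean instead of 0/0
print(posterior_mean(dirichlet_posterior([1, 1], [0, 0])))  # [0.5, 0.5]
```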

Why Struggle for Accurate Structure?

[Figure: a reference network over Earthquake, Burglary, Alarm Set, and Sound, shown next to a version with an added arc and a version with a missing arc]

Adding an arc
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure

Missing an arc
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure
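The cost of an extra arc can be counted directly. This sketch computes the number of free parameters of a discrete network, assuming binary variables and the reference structure Earthquake → Alarm Set ← Burglary, Alarm Set → Sound (the arcs are not preserved in the slide text); the added arc Earthquake → Sound and the removed arc Alarm Set → Sound are hypothetical choices.

```python
def n_params(parents, n_states):
    """Free parameters of a discrete Bayesian network:
    sum over nodes of (states - 1) * product of parent state counts."""
    total = 0
    for node, pa in parents.items():
        configs = 1
        for p in pa:
            configs *= n_states[p]
        total += (n_states[node] - 1) * configs
    return total


states = {"Earthquake": 2, "Burglary": 2, "Alarm Set": 2, "Sound": 2}
reference = {"Earthquake": (), "Burglary": (),
             "Alarm Set": ("Earthquake", "Burglary"), "Sound": ("Alarm Set",)}
added_arc = dict(reference, Sound=("Alarm Set", "Earthquake"))
missing_arc = dict(reference, Sound=())
print(n_params(reference, states), n_params(added_arc, states),
      n_params(missing_arc, states))  # 8 10 7
```

The missing-arc network is cheaper, but no choice of its 7 parameters can express a dependence of Sound on Alarm Set.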

Score-based Learning

• Define a scoring function that evaluates how well a structure matches the data

• Search for a structure that maximizes the score

Data (E, B, A): <Y,N,N> <Y,Y,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>

[Figure: three candidate structures over E, B, and A with scores S(G1) = 10, S(G2) = 1.5, S(G3) = 0.01]

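Structure Score

• Likelihood score:

$$L(G : D) = P(D \mid G, \hat{\theta}_G),$$

where $\hat{\theta}_G$ are the maximum-likelihood parameters for $G$.

• Bayesian score: average over all possible parameter values

$$P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta$$

(marginal likelihood = likelihood averaged against the prior over parameters)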
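For discrete variables with Dirichlet priors the Bayesian score has a closed form, computed family by family. The sketch below assumes an independent Dirichlet(α, ..., α) prior for each parent configuration; with α = 1 this corresponds to the K2 metric of Cooper and Herskovits. `log_marginal_likelihood` and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def log_marginal_likelihood(data, node, pa, states, alpha=1.0):
    """Bayesian score of one family: log of the integral over theta of
    P(D | G, theta) P(theta | G), with an independent Dirichlet(alpha, ..., alpha)
    prior on P(node | pa = j) for every parent configuration j. Closed form:

        sum_j [ lgamma(a_j) - lgamma(a_j + N_j)
                + sum_k (lgamma(alpha + N_jk) - lgamma(alpha)) ],  a_j = alpha * r

    where r is the number of states of node. Unobserved parent configurations
    contribute 0, so only observed ones are summed."""
    counts = defaultdict(Counter)
    for row in data:
        counts[tuple(row[p] for p in pa)][row[node]] += 1
    a_j = alpha * len(states[node])
    score = 0.0
    for cfg in counts.values():
        n_j = sum(cfg.values())
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for k in states[node]:
            score += math.lgamma(alpha + cfg[k]) - math.lgamma(alpha)
    return score


# One binary variable with data 1, 1, 0 and a uniform prior:
# the integral of theta^2 (1 - theta) over [0, 1] is 1/12.
single = [{"X": 1}, {"X": 1}, {"X": 0}]
print(math.exp(log_marginal_likelihood(single, "X", (), {"X": [0, 1]})))  # ≈ 0.0833

# Hypothetical data where Y always equals X: the family Y | X beats Y with no parents
pairs = [{"X": i % 2, "Y": i % 2} for i in range(10)]
states = {"X": [0, 1], "Y": [0, 1]}
with_arc = log_marginal_likelihood(pairs, "Y", ("X",), states)
without_arc = log_marginal_likelihood(pairs, "Y", (), states)
print(with_arc > without_arc)  # True
```

The score of a whole network is the sum of these family terms, which is what makes local search over structures cheap to update.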

Search for Optimal Network Structure

• Start with a given network
  – empty network
  – best tree
  – a random network

• At each iteration
  – Evaluate all possible changes
  – Apply change based on score

• Stop when no modification improves score
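A minimal sketch of this greedy search (steepest-ascent hill climbing) for discrete data. It starts from the empty network, considers every single-arc addition, deletion, or reversal that keeps the graph acyclic, applies the best one, and stops when nothing improves the score. As the structure score it swaps in BIC, a standard penalized-likelihood approximation to the Bayesian score; all names and the data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def bic_family(data, node, pa, states):
    """BIC term for one family: maximum log-likelihood of P(node | pa)
    minus (log M / 2) times its number of free parameters."""
    counts = defaultdict(Counter)
    for row in data:
        counts[tuple(row[p] for p in pa)][row[node]] += 1
    loglik = sum(n * math.log(n / sum(c.values()))
                 for c in counts.values() for n in c.values())
    n_params = (len(states[node]) - 1) * math.prod(len(states[p]) for p in pa)
    return loglik - 0.5 * math.log(len(data)) * n_params


def is_acyclic(parents):
    """Depth-first search for a directed cycle; parents maps node -> set of parents."""
    visiting, done = set(), set()

    def visit(n):
        if n in done:
            return True
        if n in visiting:          # reached a node on the current path: cycle
            return False
        visiting.add(n)
        ok = all(visit(p) for p in parents[n])
        visiting.discard(n)
        done.add(n)
        return ok

    return all(visit(n) for n in parents)


def neighbors(parents):
    """All graphs one operation away: add, delete, or reverse a single arc x -> y."""
    for x, y in permutations(parents, 2):
        new = {n: set(p) for n, p in parents.items()}
        if x in parents[y]:
            new[y].discard(x)                      # delete x -> y
            yield new
            rev = {n: set(p) for n, p in new.items()}
            rev[x].add(y)                          # reverse: y -> x
            yield rev
        else:
            new[y].add(x)                          # add x -> y (may create a cycle)
            yield new


def hill_climb(data, states):
    """Greedy search from the empty network; stops when no single change improves the score."""

    def score(g):
        return sum(bic_family(data, n, tuple(sorted(g[n])), states) for n in g)

    parents = {n: set() for n in states}
    best = score(parents)
    while True:
        candidates = [(score(g), g) for g in neighbors(parents) if is_acyclic(g)]
        top_score, top = max(candidates, key=lambda c: c[0])
        if top_score <= best + 1e-9:
            return parents, best
        parents, best = top, top_score


# Hypothetical data: Y copies X except for 2 of every 25 rows; Z is exactly independent
rows = [{"X": x, "Y": x if i < 23 else 1 - x, "Z": z}
        for x in (0, 1) for z in (0, 1) for i in range(25)]
graph, final_score = hill_climb(rows, {"X": [0, 1], "Y": [0, 1], "Z": [0, 1]})
print(graph)  # a single arc between X and Y; Z stays disconnected
```

For simplicity this rescores the whole graph for every candidate; because the score is a sum of family terms, a practical implementation would rescore only the families whose parent sets change.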

Search for Optimal Network Structure

• Typical operations (on a network over S, C, E, and D):
  – Add C → D
  – Delete C → E
  – Reverse C → E

[Figure: the original network and the three networks produced by these operations]

Search for Optimal Network Structure

• For the operation Add C → D, only D's parent set changes (from {E} to {C, E}), so the change in score is

$$\Delta\text{score} = S(\{C, E\} \to D) - S(\{E\} \to D)$$

• At each iteration, only the site (family) that is being updated needs to be rescored!
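A quick check of the local-update property for any score that is a sum of family terms. This sketch uses the maximum log-likelihood family score, a hypothetical network S → C → E → D (only D's parent set {E} comes from the slide), and random hypothetical data; the change in the total score from adding C → D equals the change in D's family term alone.

```python
import math
import random
from collections import Counter, defaultdict


def loglik_family(data, node, pa):
    """Maximum log-likelihood of one family (a decomposable score term)."""
    counts = defaultdict(Counter)
    for row in data:
        counts[tuple(row[p] for p in pa)][row[node]] += 1
    return sum(n * math.log(n / sum(c.values()))
               for c in counts.values() for n in c.values())


def total_score(data, parents):
    return sum(loglik_family(data, n, pa) for n, pa in parents.items())


# Hypothetical current network over S, C, E, D (D's parent set is {E})
before = {"S": (), "C": ("S",), "E": ("C",), "D": ("E",)}
after = dict(before, D=("C", "E"))          # operation: add C -> D

rng = random.Random(0)
data = [{v: rng.randint(0, 1) for v in "SCED"} for _ in range(50)]  # hypothetical data

full_delta = total_score(data, after) - total_score(data, before)
local_delta = loglik_family(data, "D", ("C", "E")) - loglik_family(data, "D", ("E",))
print(full_delta, local_delta)  # identical up to rounding: only D's family changed
```

With this pure likelihood score, adding a parent can never lower the score, which is why penalized or Bayesian scores are used when searching over structures.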

Structure Discovery

Task: Discover structural properties
  – Is there a direct connection between X and Y?
  – Does X separate two "subsystems"?
  – Does X causally affect Y?

Example: scientific data mining
  – Disease properties and symptoms
  – Interactions between the expression of genes

Discovering Structure

• There may be many high-scoring models

• The answer should not be based on any single model

• Want to average over many models

[Figure: several high-scoring networks over E, R, B, A, and C, each weighted by its posterior probability P(G | D)]

$$P(G \mid D) = \frac{P(D \mid G)\, P(G)}{P(D)}$$
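Averaging over models means computing a feature's posterior as P(f | D) = Σ_G f(G) P(G | D). This sketch enumerates three hypothetical graphs with made-up values of log P(D | G) + log P(G); P(D) is only the normalizer, so it is obtained by normalizing with log-sum-exp. Enumerating all DAGs is infeasible beyond a few nodes, so practical methods approximate this sum.

```python
import math


def posterior_over_graphs(log_scores):
    """P(G | D) from log P(D | G) + log P(G), normalized with log-sum-exp;
    P(D) is the normalizing constant and never has to be computed separately."""
    m = max(log_scores.values())
    weights = {g: math.exp(s - m) for g, s in log_scores.items()}
    z = sum(weights.values())
    return {g: w / z for g, w in weights.items()}


def feature_posterior(posterior, has_feature):
    """P(f | D) = sum over graphs of P(G | D) * f(G), with f(G) in {0, 1}."""
    return sum(p for g, p in posterior.items() if has_feature(g))


# Hypothetical candidate graphs (sets of arcs) with made-up log P(D|G) + log P(G)
log_scores = {
    frozenset({("E", "A"), ("B", "A"), ("A", "C")}): -100.0,
    frozenset({("E", "A"), ("B", "A"), ("A", "C"), ("E", "R")}): -99.0,
    frozenset({("B", "A"), ("A", "C"), ("E", "R")}): -101.0,
}
post = posterior_over_graphs(log_scores)
p_edge = feature_posterior(post, lambda g: ("E", "A") in g)
print(p_edge)  # ≈ 0.91: the arc E -> A appears in the two best-scoring graphs
```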

Cell-cycle network

Friedman et al. 2000

Limitations of Bayesian networks

• Computationally costly
  – Identifying the globally optimal network structure is NP-hard

• Heuristic approaches may be trapped in local maxima.

• Specifying a prior distribution over DAGs is tricky.

• In practice, the approach has failed to find network structures in data more difficult than the cell-cycle data.

Equivalence of graphs

• When two DAGs represent the same set of conditional independence assertions, we say that these DAGs are equivalent

[Figure: two DAGs over Y and Z]

• Are these graphs equivalent?

[Figure: two DAGs over X, Y, and Z]

Are these graphs equivalent?

Therefore, the exact graph is unidentifiable from the data: equivalent DAGs imply the same conditional independencies and cannot be told apart!
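Equivalence can be checked with a classical criterion (Verma and Pearl): two DAGs are equivalent exactly when they have the same skeleton (undirected edges) and the same v-structures (a → c ← b with a and b not adjacent). A minimal sketch with hypothetical example graphs:

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    skel = skeleton(edges)
    return {
        (a, c, b)
        for c, pa in parents.items()
        for a, b in combinations(sorted(pa), 2)
        if frozenset((a, b)) not in skel
    }


def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures <=> same conditional independencies."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


print(markov_equivalent([("Y", "Z")], [("Z", "Y")]))                        # True
print(markov_equivalent([("X", "Y"), ("Y", "Z")], [("Y", "X"), ("Y", "Z")]))  # True: chain vs fork
print(markov_equivalent([("X", "Y"), ("Y", "Z")], [("X", "Y"), ("Z", "Y")]))  # False: collider
```

Data alone can therefore identify the structure only up to this equivalence class; arcs that are not part of a v-structure can often be reversed without changing the implied independencies.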

Reading List

• Raychaudhuri et al. 2000
  – Applied PCA to analyze gene expression

• Gardner et al. 2003
  – Developed NIR to find regulatory networks

• Friedman et al. 2000
  – Applied Bayesian networks to analyze the cell-cycle network

• Friedman 2004
  – Review of probabilistic graphical models

Acknowledgement

Some of the slides were obtained from Nir Friedman.