
Regulatory Network (Part II)

11/05/07

Methods

• Linear
  – PCA (Raychaudhuri et al. 2000)
  – NIR (Gardner et al. 2003)

• Nonlinear
  – Bayesian network (Friedman et al. 2000; Friedman 2004)

Cell-cycle network

Data (Spellman et al. 1998)

• 76 arrays

• 7 time points

• 6177 yeast genes

• 800 cell-cycle related genes identified

Raychaudhuri et al. 2000

The principal components identify the dominant modes of variation in the expression data.
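A minimal sketch of the PCA step, assuming the data form a genes × time-points matrix with genes as observations; the toy matrix is hypothetical and is not the Spellman data. The components come from an SVD of the column-centered matrix, and the squared singular values give the fraction of variance each component explains.

```python
import numpy as np

def pca(expr, n_components=2):
    """PCA of a genes x time-points matrix via SVD.

    Rows are observations (genes); columns are variables (time points).
    """
    # Center each column so the components describe variation around the mean.
    centered = expr - expr.mean(axis=0)
    # Thin SVD: centered = U @ diag(s) @ Vt; rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)                   # variance fraction per component
    scores = U[:, :n_components] * s[:n_components]   # gene coordinates in PC space
    return Vt[:n_components], scores, explained

# Hypothetical data: 6 genes x 4 time points driven by one dominant temporal pattern.
rng = np.random.default_rng(0)
pattern = np.array([1.0, 2.0, 1.0, -1.0])
loadings = rng.normal(size=(6, 1))
expr = loadings @ pattern[None, :] + 0.01 * rng.normal(size=(6, 4))

axes, scores, explained = pca(expr)
print(explained.round(3))
```

Because the toy data are essentially rank one, almost all of the variance lands in the first component.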

Limitations of PCA

• Does not directly associate regulators with their target genes.

• Alternatively, it can be interpreted as implying that the network is fully connected: the expression of each gene is regulated by a linear combination of all other genes.

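Idea: The dynamics of gene activities can be approximated by a linear system

$\frac{d\mathbf{x}}{dt} = A\mathbf{x} + \mathbf{u}$

where $\mathbf{x}$ holds the expression levels of the N genes and $\mathbf{u}$ is an external perturbation. After a perturbation, gene expression levels approximately reach steady state, so $d\mathbf{x}/dt \approx 0$ and $A\mathbf{x} = -\mathbf{u}$.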

NIR

• Solve for A from the steady-state measurements of M perturbation experiments:

$A_{N \times N}\, X_{N \times M} = -U_{N \times M}$ (N = #genes, M = #perturbations)

• This is unidentifiable since M << N.
• Add the constraint that there are at most k connections for any given gene (k < M).
• For each row, use multiple regression to find the linear combination of k genes that minimizes the least-squares error.
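The per-row regression can be sketched as an exhaustive best-subset search. This is an illustration under stated assumptions, not Gardner et al.'s implementation: the names `nir_row`/`nir`, the toy matrix `A_true`, and the random perturbations are hypothetical, and self-regulation (diagonal) terms are counted among the k connections.

```python
import itertools
import numpy as np

def nir_row(X, u_i, k):
    """Best-subset least squares for one row a_i of A in A X = -U."""
    n_genes = X.shape[0]
    target = -u_i                          # a_i @ X should equal -u_i
    best_err, best_row = np.inf, np.zeros(n_genes)
    # Exhaustive search over all k-gene regulator sets (feasible only for small N).
    for subset in itertools.combinations(range(n_genes), k):
        cols = list(subset)
        design = X[cols, :].T              # M x k design matrix
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        err = float(np.sum((design @ coef - target) ** 2))
        if err < best_err:
            best_err = err
            best_row = np.zeros(n_genes)
            best_row[cols] = coef
    return best_row

def nir(X, U, k):
    """Recover A one row at a time; each row is an independent regression."""
    return np.vstack([nir_row(X, U[i], k) for i in range(X.shape[0])])

# Hypothetical 4-gene network with at most k = 2 nonzero entries per row.
A_true = np.array([[-1.0,  0.5,  0.0,  0.0],
                   [ 0.0, -1.0,  0.0,  0.8],
                   [ 0.6,  0.0, -1.0,  0.0],
                   [ 0.0,  0.0,  0.4, -1.0]])
rng = np.random.default_rng(1)
U = rng.normal(size=(4, 3))                # M = 3 perturbation experiments
X = -np.linalg.solve(A_true, U)            # steady state: A X = -U
A_hat = nir(X, U, k=2)
print(np.round(A_hat, 3))
```

With k = 2 and M = 3 the true support gives an exact fit, so `A_hat` should match `A_true` up to rounding. The exhaustive search costs C(N, k) regressions per gene, so this form only scales to small N.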

Application of NIR

[Figure: the known E. coli SOS pathway, with edges marked as activation or repression]

Application of NIR

[Figure: regression coefficients recovered by NIR]

Limitations of NIR

• The true dynamics are nonlinear.

• The choice of k is ad hoc.

• The steady-state approximation does not apply to oscillatory genes.

Bayesian network

Directed acyclic graph (DAG)

• Nodes: random variables

• Edges: direct effects (conditional dependencies)

Friedman 2004

An example

[Figure: an example network over Earthquake, Burglary, Radio, and Alarm]

[Figure: a directed graph captioned "This is not a Bayesian network"]
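Whether a directed graph is a valid Bayesian network structure comes down to acyclicity, which a topological sort (Kahn's algorithm) checks. The edge list assumes the usual textbook version of this example (Earthquake → Radio, Earthquake → Alarm, Burglary → Alarm), since the figure did not survive; the extra Alarm → Earthquake arc is hypothetical and exists only to create a cycle.

```python
from collections import deque

def is_dag(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # Repeatedly remove nodes that have no remaining incoming edges.
    queue = deque(v for v in nodes if indegree[v] == 0)
    visited = 0
    while queue:
        v = queue.popleft()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Every node gets removed exactly when the graph is acyclic.
    return visited == len(nodes)

nodes = ["Earthquake", "Burglary", "Radio", "Alarm"]
bn_edges = [("Earthquake", "Radio"), ("Earthquake", "Alarm"), ("Burglary", "Alarm")]
cyclic_edges = bn_edges + [("Alarm", "Earthquake")]  # hypothetical arc closing a cycle

print(is_dag(nodes, bn_edges), is_dag(nodes, cyclic_edges))  # True False
```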

Tree: a special kind of DAG

Each node has at most one parent node (only the root has none).

Advantages

• Intuitive: popular among biologists

• Graph structure is easy to interpret

• Well-established probabilistic tools for DAG models

• Supports all the features of probabilistic learning
  – Model selection criteria
  – Handling of missing data

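Known Structure, Complete Data

• Network structure is specified
  – Inducer needs to estimate parameters

• Data does not contain missing values

[Figure: a Learner takes data over E, B, A and the network E → A ← B with an unknown table P(A | E,B), and outputs the filled-in table (entries such as .9/.1 and .99/.01)]

Data (E, B, A): <Y,N,N> <Y,N,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>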

(Nir Friedman)
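With the structure known and no missing values, maximum-likelihood estimation reduces to counting: P(A = a | E = e, B = b) is the fraction of rows with E = e, B = b that also have A = a. A minimal sketch, using only the five data rows visible on the slide (the elided rows are left out); the helper name `mle_cpt` is hypothetical.

```python
from collections import Counter

def mle_cpt(rows, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete data.

    rows: list of dicts mapping variable name -> value.
    Returns {parent values: {child value: probability}}.
    """
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_totals = Counter(tuple(r[p] for p in parents) for r in rows)
    cpt = {}
    for (pa, value), n in joint.items():
        cpt.setdefault(pa, {})[value] = n / parent_totals[pa]
    return cpt

# The five (E, B, A) rows visible on the slide; the elided rows are not included.
data = [("Y", "N", "N"), ("Y", "N", "Y"), ("N", "N", "Y"), ("N", "Y", "Y"), ("N", "Y", "Y")]
rows = [dict(zip("EBA", r)) for r in data]
cpt = mle_cpt(rows, "A", ["E", "B"])
print(cpt)
```

Parent configurations that never occur in the data (E = Y, B = Y here) receive no estimate at all, which is one motivation for putting priors on the parameters.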

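Unknown Structure, Complete Data

• Network structure is not specified
  – Inducer needs to select arcs & estimate parameters

• Data does not contain missing values

[Figure: a Learner takes data over E, B, A, selects the arcs among E, B, A, and fills in the conditional probability tables, e.g. P(A | E,B) with entries such as .9/.1 and .99/.01]

Data (E, B, A): <Y,N,N> <Y,N,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>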

(Nir Friedman)

Learning parameters

• Training data has the form:

$D = \begin{bmatrix} E[1] & B[1] & A[1] & C[1] \\ \vdots & \vdots & \vdots & \vdots \\ E[M] & B[M] & A[M] & C[M] \end{bmatrix}$

Likelihood Function

• Assume i.i.d. samples

• Likelihood function is

$L(\Theta : D) = \prod_m P(E[m], B[m], A[m], C[m] : \Theta)$

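Likelihood Function

• By the definition of the network, we get

$L(\Theta : D) = \prod_m P(E[m], B[m], A[m], C[m] : \Theta) = \prod_m P(E[m] : \Theta)\, P(B[m] : \Theta)\, P(A[m] \mid B[m], E[m] : \Theta)\, P(C[m] \mid A[m] : \Theta)$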


Likelihood Function

• Rewriting terms, we get

$L(\Theta : D) = \prod_m P(E[m] : \Theta) \prod_m P(B[m] : \Theta) \prod_m P(A[m] \mid B[m], E[m] : \Theta) \prod_m P(C[m] \mid A[m] : \Theta)$


General Bayesian Networks

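Generalization for any Bayesian network:

$L(\Theta : D) = \prod_m P(x_1[m], \ldots, x_n[m] : \Theta) = \prod_i \prod_m P(x_i[m] \mid Pa_i[m] : \Theta_i)$

Parameters can be estimated independently: the i-th factor depends only on $\Theta_i$, the parameters of $x_i$ given its parents $Pa_i$.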
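The decomposition can be checked numerically: summing log-probabilities sample by sample gives the same total as summing one term per family, and each family term involves only that family's parameters. The structure E → A ← B, A → C is the one used in the likelihood slides; the CPT values and the three data rows are hypothetical.

```python
import math

# Hypothetical CPTs for the network E -> A <- B, A -> C with binary values 0/1.
# p_one[v][parent values] = P(v = 1 | parents).
parents = {"E": (), "B": (), "A": ("E", "B"), "C": ("A",)}
p_one = {
    "E": {(): 0.1},
    "B": {(): 0.2},
    "A": {(0, 0): 0.05, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.95},
    "C": {(0,): 0.1, (1,): 0.7},
}

def cond_prob(var, row):
    """P(var = row[var] | its parents' values in row)."""
    p = p_one[var][tuple(row[q] for q in parents[var])]
    return p if row[var] == 1 else 1.0 - p

def log_lik_by_sample(data):
    """Sum over samples m of log P(x[m]), each joint factored by the network."""
    return sum(sum(math.log(cond_prob(v, row)) for v in parents) for row in data)

def log_lik_by_family(data):
    """Same quantity regrouped into one term per family (per variable)."""
    terms = {v: sum(math.log(cond_prob(v, row)) for row in data) for v in parents}
    return sum(terms.values()), terms

data = [dict(E=0, B=0, A=0, C=0), dict(E=1, B=0, A=1, C=1), dict(E=0, B=1, A=1, C=0)]
total, terms = log_lik_by_family(data)
print(math.isclose(total, log_lik_by_sample(data)))  # True
```

Since `terms["A"]` depends only on the entries of `p_one["A"]`, maximizing the total amounts to maximizing each family term on its own.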

Bayesian Inference

• Represent uncertainty about the parameters using a probability distribution over parameters and data

• Using Bayes' rule:

$P(\Theta \mid x[1], \ldots, x[M]) = \frac{P(x[1], \ldots, x[M] \mid \Theta)\, P(\Theta)}{P(x[1], \ldots, x[M])}$

• Common prior distributions:
  – Dirichlet (discrete)
  – Normal (continuous)
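For a discrete variable, the Dirichlet prior is conjugate to the multinomial likelihood, so the posterior is again Dirichlet with the observed counts added to the prior pseudo-counts. A minimal sketch; the uniform Dirichlet(1, 1) prior and the counts (A = N: 0, A = Y: 2) are illustrative assumptions.

```python
def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior + multinomial counts."""
    post = [a + n for a, n in zip(prior_alpha, counts)]  # posterior pseudo-counts
    total = sum(post)
    mean = [a / total for a in post]                      # posterior mean of each probability
    return post, mean

# Hypothetical: uniform prior on P(A | some parent configuration), 0 rows with A = N, 2 with A = Y.
post, mean = dirichlet_posterior([1, 1], [0, 2])
print(post, mean)  # [1, 3] [0.25, 0.75]
```

The posterior mean 0.75 sits between the maximum-likelihood estimate 1.0 and the prior mean 0.5, and it stays well defined even for configurations with no data.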

Why Struggle for Accurate Structure?

[Figure: a network over Earthquake, Burglary, and Alarm Set, shown next to a version with an added arc and a version with a missing arc]

Adding an arc
• Increases the number of parameters to be estimated
• Wrong assumptions about domain structure

Missing an arc
• Cannot be compensated for by fitting parameters
• Wrong assumptions about domain structure

Score-based Learning

Define a scoring function that evaluates how well a structure matches the data.

Data (E, B, A): <Y,N,N> <Y,Y,Y> <N,N,Y> <N,Y,Y> ... <N,Y,Y>

[Figure: candidate structures G1, G2, G3 over E, B, A, each fitted with maximum-likelihood parameters and scored: S(G1) = 10, S(G2) = 1.5, S(G3) = 0.01]

Search for a structure that maximizes the score.

Structure Score

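Likelihood score:

$L(G : D) = P(D \mid G, \hat{\theta}_G)$, where $\hat{\theta}_G$ are the maximum-likelihood parameters for G

Bayesian score:
  – Average over all possible parameter values

$P(D \mid G) = \int P(D \mid G, \theta)\, P(\theta \mid G)\, d\theta$

The marginal likelihood $P(D \mid G)$ integrates the likelihood $P(D \mid G, \theta)$ against the prior over parameters $P(\theta \mid G)$.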
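With Dirichlet priors and complete discrete data, the integral in the Bayesian score has a closed form that factors over families and parent configurations. The sketch assumes a uniform Dirichlet(α, ..., α) prior on every CPT row (α = 1 gives the K2 score); Friedman et al. may have used a different prior, and `family_log_marginal` is a hypothetical name. It scores A's family on the five data rows visible on the Score-based Learning slide.

```python
import math
from collections import Counter

def family_log_marginal(rows, child, parents, child_values, alpha=1.0):
    """log P(D_family | G) under a Dirichlet(alpha, ..., alpha) prior per parent configuration.

    Parent configurations absent from the data contribute a factor of 1 (log 0).
    """
    counts = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    configs = {pa for pa, _ in counts}
    r = len(child_values)
    score = 0.0
    for pa in configs:
        n_jk = [counts[(pa, v)] for v in child_values]
        n_j = sum(n_jk)
        score += math.lgamma(r * alpha) - math.lgamma(r * alpha + n_j)
        score += sum(math.lgamma(alpha + n) - math.lgamma(alpha) for n in n_jk)
    return score

data = [("Y", "N", "N"), ("Y", "Y", "Y"), ("N", "N", "Y"), ("N", "Y", "Y"), ("N", "Y", "Y")]
rows = [dict(zip("EBA", r)) for r in data]
no_parents = family_log_marginal(rows, "A", [], ["N", "Y"])
with_parents = family_log_marginal(rows, "A", ["E", "B"], ["N", "Y"])
print(no_parents, with_parents)
```

E and B have no parents in both candidate structures (empty graph vs. E → A ← B), so comparing the two numbers for A's family compares the full Bayesian scores.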

Search for Optimal Network Structure

• Start with a given network
  – empty network
  – best tree
  – a random network

• At each iteration
  – Evaluate all possible changes
  – Apply change based on score

• Stop when no modification improves score

• Typical operations: add, delete, or reverse a single arc

[Figure: example add, reverse, and delete moves on a network over C, D, E]


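$\Delta\text{score} = S(\{C,E\} \to D) - S(\{E\} \to D)$

At each iteration, only the family being updated (here, D and its parents) needs to be re-scored, because the score decomposes into a sum of per-family terms.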
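The local-update idea can be sketched as greedy hill climbing. Assumptions: BIC is swapped in as the decomposable score (a Bayesian score would plug in the same way), only add and delete moves are implemented (reverse is omitted for brevity), and the data are hypothetical (Y copies X, Z is independent of both). Family scores are cached, so each candidate move re-scores only the child whose parent set changes.

```python
import itertools
import math
from collections import Counter

def family_bic(rows, child, parents):
    """BIC of one family: maximized log-likelihood minus 0.5 * log(n) * #free params."""
    n = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    loglik = sum(c * math.log(c / pa_counts[pa]) for (pa, _), c in joint.items())
    card = {v: len({r[v] for r in rows}) for v in [child, *parents]}
    n_params = (card[child] - 1) * math.prod(card[p] for p in parents)
    return loglik - 0.5 * math.log(n) * n_params

def creates_cycle(parents, child, new_parent):
    """Adding new_parent -> child closes a cycle iff child is an ancestor of new_parent."""
    stack, seen = [new_parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def hill_climb(rows, variables):
    parents = {v: set() for v in variables}      # start from the empty network
    cache = {}

    def score(v, pa):
        key = (v, frozenset(pa))
        if key not in cache:                     # each family is scored at most once
            cache[key] = family_bic(rows, v, sorted(pa))
        return cache[key]

    while True:
        best_delta, best_move = 1e-9, None
        for x, y in itertools.permutations(variables, 2):
            current = parents[y]
            if x in current:
                candidate = current - {x}        # delete x -> y
            elif not creates_cycle(parents, y, x):
                candidate = current | {x}        # add x -> y
            else:
                continue
            # Only y's family changes, so only that term of the score is recomputed.
            delta = score(y, candidate) - score(y, current)
            if delta > best_delta:
                best_delta, best_move = delta, (y, candidate)
        if best_move is None:                    # stop: no modification improves the score
            return parents
        y, candidate = best_move
        parents[y] = set(candidate)

# Hypothetical data: Y copies X, and Z is independent of both.
rows = [{"X": i % 2, "Y": i % 2, "Z": (i // 2) % 2} for i in range(40)]
result = hill_climb(rows, ["X", "Y", "Z"])
print(result)
```

The search should end with a single arc between X and Y; its direction is not determined by the data, which anticipates the equivalence issue discussed later.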

Structure Discovery

Task: Discover structural properties
  – Is there a direct connection between X & Y?
  – Does X separate between two “subsystems”?
  – Does X causally affect Y?

Example: scientific data mining
  – Disease properties and symptoms
  – Interactions between the expression of genes

Discovering Structure

  – There may be many high-scoring models
  – The answer should not be based on any single model
  – Want to average over many models

[Figure: posterior distribution P(G|D) over structures]

$P(G \mid D) \propto P(D \mid G)\, P(G)$
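Model averaging turns posterior structure probabilities into probabilities of features such as "Y and Z are adjacent": P(f | D) = Σ_G P(G | D) f(G). In this sketch the three structures and their unnormalized log posteriors log P(D|G) + log P(G) are made up; in practice there are far too many DAGs to enumerate, and the sum is approximated (for example by MCMC or bootstrap).

```python
import math

def posterior_over_structures(log_scores):
    """Normalize log P(D|G) + log P(G) into P(G|D) using log-sum-exp."""
    m = max(log_scores.values())
    z = m + math.log(sum(math.exp(s - m) for s in log_scores.values()))
    return {g: math.exp(s - z) for g, s in log_scores.items()}

def feature_posterior(posterior, has_feature):
    """P(f | D) = sum over G of P(G | D) * f(G) for a 0/1 feature f."""
    return sum(p for g, p in posterior.items() if has_feature(g))

# Hypothetical structures over X, Y, Z, each a frozenset of directed edges.
g1 = frozenset({("X", "Y"), ("Y", "Z")})
g2 = frozenset({("Y", "X"), ("Y", "Z")})
g3 = frozenset({("X", "Y")})
log_scores = {g1: -10.0, g2: -10.0, g3: -12.0}

post = posterior_over_structures(log_scores)
p_yz = feature_posterior(post, lambda g: ("Y", "Z") in g or ("Z", "Y") in g)
print(round(p_yz, 4))
```

No single structure dominates here, yet the Y–Z adjacency is supported by most of the posterior mass.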

Cell-cycle network

Friedman et al. 2000

Limitations of Bayesian Networks

• Computationally costly
  – Identifying the globally optimal network structure is NP-hard

• Heuristic approaches may get trapped in local maxima.

• Choosing a prior distribution over DAGs is tricky.

• In practice, the approach has failed to find network structures in data sets more difficult than the cell-cycle data.

Equivalence of graphs

• When two DAGs can represent the same set of conditional independence assertions, we say that these DAGs are equivalent

[Figure: pairs of DAGs over the same nodes, including Y and Z]

• Are these graphs equivalent?

Therefore, the exact graph is unidentifiable from observational data alone; only its equivalence class can be recovered.
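A standard criterion (Verma and Pearl) makes the equivalence question mechanical: two DAGs are Markov equivalent if and only if they have the same skeleton and the same v-structures (a → c ← b with a and b non-adjacent). The graphs over X, Y, Z below are hypothetical stand-ins for the lost figure.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    result = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                result.add((a, c, b))
    return result

def markov_equivalent(edges1, edges2):
    """Equivalent iff same skeleton and same v-structures."""
    return skeleton(edges1) == skeleton(edges2) and v_structures(edges1) == v_structures(edges2)

chain = {("X", "Y"), ("Y", "Z")}      # X -> Y -> Z
fork = {("Y", "X"), ("Y", "Z")}       # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}   # X -> Y <- Z

print(markov_equivalent(chain, fork), markov_equivalent(chain, collider))  # True False
```

The chain and the fork encode the same independence (X ⊥ Z | Y), so data cannot distinguish them; the collider encodes a different one.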

Reading List

• Raychaudhuri et al. 2000
  – Applied PCA to analyze gene expression

• Gardner et al. 2003
  – Developed NIR to infer regulatory networks

• Friedman et al. 2000
  – Applied Bayesian networks to analyze the cell-cycle network

• Friedman 2004
  – Review of probabilistic graphical models

Acknowledgement

Some of the slides were taken from Nir Friedman.
