Using Bayesian Networks to Analyze Expression Data

N. Friedman, M. Linial, I. Nachman, D. Pe’er
Hebrew University, Jerusalem


Page 1: Using Bayesian Networks to Analyze Expression Data

Using Bayesian Networks to Analyze Expression Data

N. Friedman M. Linial I. Nachman D. Pe’er Hebrew University, Jerusalem

Page 2: Using Bayesian Networks to Analyze Expression Data

Central Dogma

Gene -> (Transcription) -> mRNA -> (Translation) -> Protein

Cells express different subsets of the genes in different tissues and under different conditions

Page 3: Using Bayesian Networks to Analyze Expression Data

Microarrays (aka “DNA chips”)

New technological breakthrough:
- Measure RNA expression levels of thousands of genes in one experiment
- Measure expression on a genomic scale
- Opens up new experimental designs

Many major labs are using, or will use, this technology in the near future

Page 4: Using Bayesian Networks to Analyze Expression Data

The Problem

[Figure: expression matrix A, with experiments along one axis (index i) and genes along the other (index j)]

A_ij: the mRNA level of gene j in experiment i

Goal:
- Learn regulatory/metabolic networks
- Identify causal sources of the biological phenomena of interest

Page 5: Using Bayesian Networks to Analyze Expression Data

Our Approach

Characterize statistical relationships between expression patterns of different genes

Beyond pair-wise interactions:
- Many interactions are explained by intermediate factors
- Regulation involves combined effects of several gene products

We build on the language of Bayesian networks

Page 6: Using Bayesian Networks to Analyze Expression Data

Network: Example

Noisy stochastic process. Example: Pedigree
- A node represents an individual’s genotype

[Figure: pedigree network with nodes Homer, Marge, Bart, Lisa and Maggie]

Modeling assumptions:
- Ancestors can affect descendants' genotype only by passing genetic materials through intermediate generations

Page 7: Using Bayesian Networks to Analyze Expression Data

Network Structure

Generalizing to DAGs:
- A child is conditionally independent of its non-descendants, given the value of its parents
- Often a natural assumption for causal processes, if we believe that we capture the relevant state of each intermediate stage

[Figure: DAG with nodes X, Y1 and Y2; other nodes are labeled Ancestor, Parent, Descendant and Non-descendant]

Page 8: Using Bayesian Networks to Analyze Expression Data

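Local Probabilities

Associated with each variable X_i is a conditional probability distribution P(X_i | Pa_i)

- Discrete variables: Multinomial distribution
- Continuous variables: Choice, for example linear Gaussian

[Figure: a conditional probability table P(Y | X) with one row per value of X, the rows being (0.9, 0.1) and (0.3, 0.7), and a plot of P(Y | X) over X and Y for the continuous case]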
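
The two kinds of local distribution named on this slide can be sketched in a few lines. This is only an illustration: the table values are taken from the slide, but the assignment of rows to values of X, the binary encoding, and the linear Gaussian coefficients `a`, `b`, `sigma` are assumptions made for the example.

```python
import random

# Hypothetical binary CPT P(Y | X): each row is a distribution over Y.
# The numbers mirror the slide's table; which row belongs to which
# value of X is an assumption.
CPT_Y_GIVEN_X = {
    0: [0.9, 0.1],  # P(Y=0 | X=0), P(Y=1 | X=0)
    1: [0.3, 0.7],  # P(Y=0 | X=1), P(Y=1 | X=1)
}

def sample_discrete(x, rng):
    """Draw Y from the multinomial row selected by the parent value x."""
    return rng.choices([0, 1], weights=CPT_Y_GIVEN_X[x])[0]

def sample_linear_gaussian(x, rng, a=0.5, b=2.0, sigma=0.1):
    """Draw Y ~ Normal(a + b*x, sigma^2); a, b, sigma are illustrative."""
    return rng.gauss(a + b * x, sigma)

rng = random.Random(0)
draws = [sample_discrete(1, rng) for _ in range(10000)]
frac_y1 = sum(draws) / len(draws)          # empirical P(Y=1 | X=1)
y_cont = sample_linear_gaussian(1.0, rng)  # centered on a + b = 2.5
```

In the discrete case the parameters are the table entries themselves; in the linear Gaussian case they are the intercept, the weight per parent, and the noise variance.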

Page 9: Using Bayesian Networks to Analyze Expression Data

Bayesian Network Semantics

Qualitative part (DAG specifies conditional independence statements)
+ Quantitative part (local probability models)
= Unique joint distribution over domain

[Figure: example network over the variables B, E, R, A and C]

Compact & efficient representation:
- k parents: O(2^k n) vs. O(2^n) params
- parameters pertain to local interactions

P(C,A,R,E,B) = P(B) * P(E|B) * P(R|E,B) * P(A|R,B,E) * P(C|A,R,B,E)
versus
P(C,A,R,E,B) = P(B) * P(E) * P(R|E) * P(A|B,E) * P(C|A)
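
As a rough illustration of the compact factorization P(B) * P(E) * P(R|E) * P(A|B,E) * P(C|A), the sketch below evaluates the joint from local tables and counts parameters for binary variables. All probability values are made up for the example; only the parent sets come from the slide.

```python
from itertools import product

# Parent sets read off the factorization P(B) P(E) P(R|E) P(A|B,E) P(C|A).
PARENTS = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

# Hypothetical local tables: P(node = 1 | parent values). Numbers are made up.
P_ONE = {
    "B": {(): 0.01},
    "E": {(): 0.02},
    "R": {(0,): 0.001, (1,): 0.9},
    "A": {(0, 0): 0.01, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95},
    "C": {(0,): 0.05, (1,): 0.7},
}

def joint(assign):
    """Joint probability of a full 0/1 assignment as a product of local terms."""
    p = 1.0
    for node, pars in PARENTS.items():
        p_one = P_ONE[node][tuple(assign[q] for q in pars)]
        p *= p_one if assign[node] == 1 else 1.0 - p_one
    return p

nodes = list(PARENTS)
# Summing the joint over all 2^5 assignments should give 1.
total = sum(joint(dict(zip(nodes, vals)))
            for vals in product([0, 1], repeat=len(nodes)))

# One free parameter per parent configuration, versus a full joint table.
n_local = sum(2 ** len(pars) for pars in PARENTS.values())
n_full = 2 ** len(nodes) - 1
```

For this five-variable network the local tables need 10 free parameters, while an unrestricted joint over five binary variables needs 31.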

Page 10: Using Bayesian Networks to Analyze Expression Data

Why Bayesian Networks?

Bayesian Networks:
- Flexible representation of dependency structure of multivariate distributions
- Natural for modeling processes with local interactions

Learning of Bayesian Networks:
- Can learn dependencies from observations
- Handles stochastic processes:
  - “true” stochastic behavior
  - noise in measurements

Page 11: Using Bayesian Networks to Analyze Expression Data

Modeling Regulatory Interactions

Variables of interest:
- Expression levels of genes
- Concentration levels of proteins (proteomics!)
- Exogenous variables: Nutrient levels, Metabolite levels, Temperature, Phenotype information, …

Bayesian Network Structure:
- Capture dependencies among these variables

Page 12: Using Bayesian Networks to Analyze Expression Data

Examples

Interactions are represented by a graph:
- Each gene is represented by a node in the graph
- Edges between the nodes represent direct dependency

Measured expression level of each gene: Random variables
Gene interaction: Probabilistic dependencies

[Figure: small example graphs over the nodes A, B and X]

Page 13: Using Bayesian Networks to Analyze Expression Data

More Complex Examples

- Dependencies can be mediated through other nodes (panels: Common cause, Intermediate gene)
- Common effects can imply conditional dependence

[Figure: three-node graphs over A, B and C, one per case]
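
The common-effect case is the least intuitive of the three, so here is a small exact calculation. It assumes a toy model that is not from the slides: A and B are independent fair binary variables and C = A OR B is their common effect.

```python
from fractions import Fraction
from itertools import product

# Toy common-effect model (an assumption for illustration): A and B are
# independent fair coins and C = A OR B. The four worlds are equally likely.
WORLDS = [(a, b, a | b) for a, b in product([0, 1], repeat=2)]

def prob(event, given=lambda w: True):
    """Exact conditional probability by enumerating equally likely worlds."""
    cond = [w for w in WORLDS if given(w)]
    return Fraction(sum(1 for w in cond if event(w)), len(cond))

def a_is_1(w):
    return w[0] == 1

p_a = prob(a_is_1)                                                      # 1/2
p_a_given_b = prob(a_is_1, lambda w: w[1] == 1)                         # 1/2
p_a_given_c = prob(a_is_1, lambda w: w[2] == 1)                         # 2/3
p_a_given_b_c = prob(a_is_1, lambda w: w[1] == 1 and w[2] == 1)         # 1/2
```

Marginally, learning B tells us nothing about A (1/2 either way). Once C is observed, learning B changes the belief in A from 2/3 to 1/2, which is the conditional dependence the slide refers to.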

Page 14: Using Bayesian Networks to Analyze Expression Data

Outline of Our Approach

Expression data -> Bayesian Network Learning Algorithm -> learned network

[Figure: learned network over the variables E, R, B, A and C]

Use learned network to make predictions about structure of the interactions between genes

Page 15: Using Bayesian Networks to Analyze Expression Data

Experiment

Data from Spellman et al. (Mol. Biol. of the Cell, 1998)

Contains 76 samples covering the whole yeast genome:
- Different methods for synchronizing cell-cycle in yeast
- Time series at intervals of a few minutes (5-20 min)

Spellman et al. identified 800 cell-cycle regulated genes.

Page 16: Using Bayesian Networks to Analyze Expression Data

Methods

Treat samples as IID (ignoring temporal order)

Experiment 1:
- Discretized into three levels of expression
- Learn multinomial probabilities

Experiment 2:
- Learn linear interactions (w/ Gaussian noise)

No prior biological knowledge was used

[Figure: Log(ratio to control) axis split at -0.5 and 0.5 into three levels: -, 0, +]
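
The three-level discretization can be sketched as follows. The cut points of -0.5 and 0.5 on log(ratio to control) are read off the slide's figure; how values exactly on a cut point are treated, the input values, and the use of a marginal rather than parent-conditional frequency estimate are simplifying assumptions.

```python
from collections import Counter

def discretize(log_ratio, threshold=0.5):
    """Map log(ratio to control) to -1 (under), 0 (baseline) or +1 (over).

    Values exactly at +/- threshold are treated as baseline (an assumption).
    """
    if log_ratio < -threshold:
        return -1
    if log_ratio > threshold:
        return 1
    return 0

def multinomial_estimate(levels):
    """Maximum-likelihood estimate of P(level) from observed discrete levels."""
    counts = Counter(levels)
    return {lvl: counts[lvl] / len(levels) for lvl in (-1, 0, 1)}

log_ratios = [-1.2, -0.4, 0.0, 0.3, 0.7, 1.5, -0.6, 0.5]  # made-up values
levels = [discretize(v) for v in log_ratios]
probs = multinomial_estimate(levels)
```

In a network, the same counting would be done separately for each configuration of a gene's parents, giving one multinomial per table row.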

Page 17: Using Bayesian Networks to Analyze Expression Data

Network Learned

Page 18: Using Bayesian Networks to Analyze Expression Data

Challenge: Statistical Significance

Sparse Data:
- Small number of samples
- “Flat posterior”: many networks fit the data

Solution:
- Estimate confidence in network features
- Two types of features:
  - Markov neighbors: X directly interacts with Y
  - Order relations: X is an ancestor of Y
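
Both feature types can be read off a DAG mechanically. The sketch below assumes the common reading of "Markov neighbors" as membership in the Markov blanket (parents, children, and the children's other parents); the slide itself only says "directly interacts". The example graph and function names are illustrative.

```python
# Example DAG given as a map from each node to its parents (illustrative).
PARENTS = {"B": [], "E": [], "R": ["E"], "A": ["B", "E"], "C": ["A"]}

def ancestors(parents, node):
    """All nodes with a directed path into `node` (the order relation)."""
    seen = set()
    stack = list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen

def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(node)
    return blanket

anc_c = ancestors(PARENTS, "C")        # {"A", "B", "E"}
mb_b = markov_blanket(PARENTS, "B")    # {"A", "E"}
mb_e = markov_blanket(PARENTS, "E")    # {"R", "A", "B"}
```

Note that B and E are Markov neighbors here even though no edge joins them: they share the child A.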

Page 19: Using Bayesian Networks to Analyze Expression Data

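Confidence Estimates

Bootstrap approach [FGW, UAI99]

[Figure: the dataset D is resampled into D1, D2, ..., Dm; a network over E, R, B, A and C is learned from each resample]

Estimate: $C(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$, where $G_i$ is the network learned from resample $D_i$ and $f(G_i)$ is 1 if feature f appears in $G_i$ and 0 otherwise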
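
A minimal sketch of the bootstrap estimate follows. To stay short it does not learn a Bayesian network: `learn_features` is a hypothetical stand-in that reports a pairwise feature whenever the absolute Pearson correlation exceeds a threshold. The toy data, the threshold, and all function names are assumptions for illustration.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def learn_features(rows, names, threshold=0.5):
    """Stand-in for network learning (NOT the method in the slides)."""
    cols = {name: [r[i] for r in rows] for i, name in enumerate(names)}
    return {(a, b) for a, b in combinations(names, 2)
            if abs(pearson(cols[a], cols[b])) > threshold}

def bootstrap_confidence(rows, names, m=200, seed=0):
    """C(f) = (1/m) * sum_i f(G_i): fraction of resamples containing f."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(m):
        resample = [rng.choice(rows) for _ in rows]  # sample with replacement
        for f in learn_features(resample, names):
            counts[f] = counts.get(f, 0) + 1
    return {f: c / m for f, c in counts.items()}

# Toy data: g2 tracks g1, g3 is independent noise.
rng = random.Random(1)
names = ["g1", "g2", "g3"]
rows = []
for _ in range(76):
    g1 = rng.gauss(0, 1)
    rows.append((g1, g1 + rng.gauss(0, 0.3), rng.gauss(0, 1)))

conf = bootstrap_confidence(rows, names)
```

Swapping `learn_features` for a real structure learner that returns the Markov or order features of the learned graph would leave `bootstrap_confidence` unchanged.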

Page 20: Using Bayesian Networks to Analyze Expression Data

Testing for Significance

We run our procedure on randomized data where we reshuffled the order of values for each gene

[Figure: histograms of the number of Markov features at each confidence level; panels: Original Data, Randomized Data]
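
The randomization step, reshuffling the values of each gene independently across samples, can be sketched as below. The function name and the row-per-sample layout are assumptions; the effect is that each gene keeps its own distribution of values while any dependence between genes is destroyed.

```python
import random

def shuffle_each_gene(rows, seed=0):
    """Permute every gene (column) independently across samples (rows)."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(r) for r in zip(*columns)]

# Made-up data: 4 samples (rows) by 3 genes (columns).
rows = [(1, 10, 100), (2, 20, 200), (3, 30, 300), (4, 40, 400)]
shuffled = shuffle_each_gene(rows)
```

Features that still receive high confidence on such data give a baseline for how many high-confidence features arise by chance.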

Page 21: Using Bayesian Networks to Analyze Expression Data

Testing for Significance

Markov w/ Gaussian Models

[Figure: two plots of the number of features with confidence above t (y-axis) against the threshold t from 0.1 to 1 (x-axis), each comparing Random and Real data]

We run our procedure on randomized data where we reshuffled the order of values for each gene

Page 22: Using Bayesian Networks to Analyze Expression Data

Testing for Significance

Markov w/ Multinomial Models

[Figure: two plots of the number of features with confidence above t (y-axis) against the threshold t from 0.1 to 1 (x-axis), each comparing Random and Real data]

Page 23: Using Bayesian Networks to Analyze Expression Data

Local Map

Page 24: Using Bayesian Networks to Analyze Expression Data

Finding Key Genes

Key gene: a gene that precedes many other genes

- YLR183C
- MCD1: Mitotic Chromosome Determinant
- RAD27: DNA repair protein
- CLN2: role in cell cycle START
- SRO4: involved in cellular polarization during budding
- YOX1: Homeodomain protein that binds leu-tRNA gene
- POL30: required for DNA replication and repair
- YLR467W
- CDC5
- MSH6: Homolog of the human GTBP protein
- YML119W
- CLN1: role in cell cycle START

Page 25: Using Bayesian Networks to Analyze Expression Data

Future Work

- Finding suitable local distribution models
- Correct handling of hidden variables
  - Can we recognize hidden causes of coordinated regulation events?
- Incorporating prior knowledge
  - Incorporate large mass of biological knowledge, and insight from sequence/structure databases
- Abstraction
  - Combine with cluster analysis of higher confidence conclusions