Using Bayesian Networks to Analyze Expression Data

N. Friedman, M. Linial, I. Nachman, D. Pe’er (Hebrew University, Jerusalem)

Central Dogma

Figure: DNA → mRNA (transcription) → protein (translation)

Cells express different subsets of their genes in different tissues and under different conditions.

Microarrays (aka “DNA chips”)

New technological breakthrough:
Measure RNA expression levels of thousands of genes in one experiment
Measure expression on a genomic scale
Opens up new experimental designs
Many major labs are using, or will use, this technology in the near future

The Problem

Data: an expression matrix with one row per experiment and one column per gene; A_ij is the mRNA level of gene j in experiment i.

Goal:
Learn regulatory/metabolic networks
Identify causal sources of the biological phenomena of interest

Analysis Approaches

Clustering of expression data:
Groups together genes with similar expression patterns
Does not reveal structural relations between genes

Boolean networks:
Deterministic models of the logical interactions between genes
Being deterministic, impractical for noisy real data
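To make the limitation of clustering concrete, here is a toy sketch of correlation-based gene clustering. It is not the procedure used by Spellman et al.: the single-linkage rule, the 0.9 threshold and the example profiles are assumptions chosen for illustration, and profiles are assumed to be non-constant so the correlation is defined.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_genes(profiles, threshold=0.9):
    """Single-linkage grouping (assumed rule): genes whose profiles correlate above
    `threshold`, directly or through a chain of such genes, share a cluster."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest over genes

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) > threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical log-ratio profiles over four experiments.
profiles = {
    "g1": [0.1, 0.8, 1.5, 0.7],
    "g2": [0.2, 0.9, 1.4, 0.8],
    "g3": [1.2, 0.3, -0.6, 0.2],
}
clusters = cluster_genes(profiles)  # [['g1', 'g2'], ['g3']]
```

The output is only a grouping: nothing in it says whether g1 influences g2, g2 influences g1, or both respond to a third factor.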

Example: Cell-Cycle Data [Spellman et al.]

Figure: gene clusters plotted against cell-cycle stages

Our Approach

Characterize statistical relationships between expression patterns of different genes

Beyond pair-wise interactions:
Many interactions are explained by intermediate factors
Regulation involves combined effects of several gene products

We build on the language of Bayesian networks

Network: Example

Noisy stochastic process. Example: a pedigree
A node represents an individual’s genotype
Figure: pedigree whose nodes include Lisa and Maggie

Modeling assumption: ancestors can affect descendants' genotypes only by passing genetic material through intermediate generations

Network Structure

Generalizing to DAGs: a child is conditionally independent of its non-descendants, given the values of its parents

Often a natural assumption for causal processes, if we believe that we capture the relevant state of each intermediate stage.

Figure: a node with its parent, ancestors, descendants and non-descendants labeled
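Written out, this independence statement combined with the chain rule yields the factorization of the joint distribution. The derivation below is the standard one; $\mathrm{NonDesc}(X_i)$ is our notation for the non-descendants of $X_i$.

```latex
% Local Markov property
X_i \perp \mathrm{NonDesc}(X_i) \mid \mathrm{Pa}_i

% Chain rule, with X_1, \dots, X_n listed in a topological order of the DAG
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \dots, X_{i-1})

% In that order every predecessor of X_i is a non-descendant, and
% \mathrm{Pa}_i \subseteq \{X_1, \dots, X_{i-1}\}, so the Markov property gives
P(X_i \mid X_1, \dots, X_{i-1}) = P(X_i \mid \mathrm{Pa}_i)
\quad\Longrightarrow\quad
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_i)
```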

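Local Probabilities

Associated with each variable $X_i$ is a conditional probability distribution $P(X_i \mid \mathrm{Pa}_i)$:
Discrete variables: multinomial distribution
Continuous variables: a modeling choice, for example linear Gaussian

Example table for $P(Y \mid X)$, one row per value of X, each row a distribution over the values of Y:

X | P(Y | X)
… | 0.9  0.1
0 | 0.3  0.7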
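Both kinds of local model can be estimated from data by maximum likelihood. The sketch below assumes a single parent X for each child Y, to keep the linear Gaussian fit to a one-variable regression; the function names and example data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_multinomial_cpd(pairs):
    """Maximum-likelihood multinomial CPD P(Y | X) from observed (x, y) pairs:
    for each parent value x, the relative frequencies of the child values y."""
    counts = defaultdict(Counter)
    for x, y in pairs:
        counts[x][y] += 1
    return {x: {y: c / sum(row.values()) for y, c in row.items()}
            for x, row in counts.items()}

def fit_linear_gaussian(xs, ys):
    """Maximum-likelihood fit of Y | X = x ~ N(a + b*x, sigma2) for a single
    continuous parent X (least-squares line plus mean squared residual)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sigma2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n
    return a, b, sigma2

def linear_gaussian_logpdf(y, x, a, b, sigma2):
    """log N(y; a + b*x, sigma2): local log-likelihood of one observation."""
    r = y - (a + b * x)
    return -0.5 * (math.log(2 * math.pi * sigma2) + r * r / sigma2)

# Hypothetical observations.
cpd = fit_multinomial_cpd([(0, 1), (0, 1), (0, 0), (1, 0)])
a, b, sigma2 = fit_linear_gaussian([0, 1, 2, 3], [1.1, 2.9, 5.1, 6.9])
```

Here `cpd` is `{0: {1: 2/3, 0: 1/3}, 1: {0: 1.0}}`, and the Gaussian fit gives a ≈ 1.06, b ≈ 1.96, sigma2 ≈ 0.008.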

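Bayesian Network Semantics

Qualitative part: the DAG specifies conditional independence statements
Quantitative part: local probability models
Together they define a unique joint distribution over the domain:
$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_i)$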

Compact and efficient representation: with at most k parents per variable, $O(2^k n)$ vs. $O(2^n)$ parameters for n binary variables; parameters pertain to local interactions
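A small numeric check of the factorization and of the parameter counts. The chain A → B → C and its probabilities are hypothetical; all variables are binary, matching the $2^k$ count.

```python
from itertools import product

# Hypothetical binary chain A -> B -> C; numbers are illustrative only.
parents = {"A": [], "B": ["A"], "C": ["B"]}
# p_one[node][parent values] = P(node = 1 | parents = those values)
p_one = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.9},
    "C": {(0,): 0.1, (1,): 0.7},
}

def joint(assignment):
    """P(x_1, ..., x_n) as the product of local terms P(x_i | pa_i)."""
    p = 1.0
    for node, pa in parents.items():
        q = p_one[node][tuple(assignment[v] for v in pa)]
        p *= q if assignment[node] == 1 else 1.0 - q
    return p

def n_params_bn(parents):
    """Free parameters for binary variables: one P(x_i = 1 | pa_i) per parent configuration."""
    return sum(2 ** len(pa) for pa in parents.values())

def n_params_full(n):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n - 1

# The local terms multiply out to a proper distribution over all 8 assignments.
total = sum(joint(dict(zip("ABC", values))) for values in product([0, 1], repeat=3))
```

For 100 genes with 3 parents each, `n_params_bn` gives 800 parameters, while the unrestricted joint needs 2^100 - 1.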

Why Bayesian Networks?

Bayesian networks:
Flexible representation of the dependency structure of multivariate distributions
Natural for modeling processes with local interactions

Learning of Bayesian networks:
Can learn dependencies from observations
Handles stochastic processes: both “true” stochastic behavior and noise in measurements

Modeling Biological Regulation

Variables of interest:
Expression levels of genes
Concentration levels of proteins
Exogenous variables: nutrient levels, metabolite levels, temperature
Phenotype information
…

Bayesian network structure: captures dependencies among these variables

Examples

Interactions are represented by a graph:
Each gene is represented by a node in the graph
Edges between the nodes represent direct dependency

Measured expression level of each gene → random variables
Gene interaction → probabilistic dependencies

Figure: small example graphs over genes A, B and X

More Complex Examples

Dependencies can be mediated through other nodes, e.g. an intermediate gene or a common cause
Common effects can imply conditional dependence
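The common-effect case ("explaining away") can be checked by exact enumeration. The model is hypothetical: two independent causes A and B with probability 1/2 each, and C = A OR B made deterministic so the numbers stay simple.

```python
from fractions import Fraction
from itertools import product

def joint(a, b, c):
    """Hypothetical common-effect model A -> C <- B: A and B are independent
    with P(A=1) = P(B=1) = 1/2, and C = A OR B (deterministic, for clarity)."""
    c_value = 1 if (a == 1 or b == 1) else 0
    return Fraction(1, 4) if c == c_value else Fraction(0)

def prob(event, given=lambda a, b, c: True):
    """Exact P(event | given), summing the joint over all 8 assignments."""
    num = den = Fraction(0)
    for a, b, c in product([0, 1], repeat=3):
        if given(a, b, c):
            p = joint(a, b, c)
            den += p
            if event(a, b, c):
                num += p
    return num / den

def a_is_1(a, b, c):
    return a == 1

p_a = prob(a_is_1)                                               # 1/2
p_a_given_b = prob(a_is_1, lambda a, b, c: b == 1)               # 1/2: A, B independent
p_a_given_c = prob(a_is_1, lambda a, b, c: c == 1)               # 2/3
p_a_given_b_c = prob(a_is_1, lambda a, b, c: b == 1 and c == 1)  # 1/2: B "explains away" C
```

A and B are marginally independent, yet once C is observed, learning B moves the belief in A from 2/3 back to 1/2.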

Outline of Our Approach

Figure: expression data → Bayesian network learning algorithm → learned network

Use the learned network to make predictions about the structure of the interactions between genes

Learning With Many Variables

Sparse Candidate algorithm: an efficient heuristic search that relies on sparseness
Choose a candidate set of direct influences for each gene
Find the optimal BN constrained to these candidates
Iteratively improve the candidate sets

Figure: each gene's parents in the BN are drawn from its candidates
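The candidate-selection step can be sketched for discretized data. Ranking other genes by empirical mutual information is one simple choice and an assumption here, not necessarily the measure used by the authors. The constrained structure search and the iteration are omitted, and the gene names and data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def choose_candidates(data, k):
    """Restrict step (sketch): for each gene, keep the k other genes with the
    highest mutual information as its candidate parents."""
    cands = {}
    for g in data:
        scores = sorted(((mutual_information(data[g], data[h]), h)
                         for h in data if h != g), reverse=True)
        cands[g] = [h for _, h in scores[:k]]
    return cands

# Hypothetical discretized profiles (levels -1/0/1 coded as 0/1/2 here).
data = {
    "g1": [0, 0, 1, 1, 2, 2],
    "g2": [0, 0, 1, 1, 2, 2],  # identical to g1
    "g3": [0, 1, 0, 1, 0, 1],  # unrelated to g1 and g2
}
candidates = choose_candidates(data, k=1)
```

A full implementation would also need to keep each gene's current parents among its candidates and to run the constrained search over DAGs between restrict steps.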

Experiment

Data from Spellman et al. (Mol. Biol. Cell, 1998):
Contains 76 samples covering the whole yeast genome
Different methods for synchronizing the cell cycle in yeast
Time series at intervals of a few minutes (5-20 min)
Spellman et al. identified 800 cell-cycle-regulated genes

Methods

Experiment 1: discretize the data into 3 levels and learn multinomial probabilities
Figure: the log(ratio to control) axis split at -0.5 and 0.5 into the three levels
Experiment 2: learn linear interactions (with Gaussian noise)
No prior biological knowledge was used
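The three-level discretization can be sketched as a threshold rule on the log ratio. The -1/0/1 coding, the treatment of values exactly at ±0.5, and the function name are assumptions; the log base is not stated on the slide.

```python
def discretize(log_ratio, threshold=0.5):
    """Three-level discretization of a log(ratio to control) value.

    Assumptions: levels are coded -1 / 0 / 1, and values exactly at
    +/-threshold count as 'similar to control'.
    """
    if log_ratio < -threshold:
        return -1  # under-expressed relative to control
    if log_ratio > threshold:
        return 1   # over-expressed relative to control
    return 0       # similar to control

levels = [discretize(v) for v in [-1.2, -0.5, 0.0, 0.3, 0.9]]  # [-1, 0, 0, 0, 1]
```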

Network Learned

Challenge: Statistical Significance

Sparse data:
Small number of samples
“Flat posterior”: many networks fit the data

Solution: estimate confidence in network features
Two types of features:
Markov neighbors: X directly interacts with Y
Order relations: X is an ancestor of Y
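Both feature types can be read off a learned DAG. Taking "Markov neighbor" to mean membership in X's Markov blanket (parents, children, and the children's other parents) is our formalization and an assumption; the DAG, stored as a node-to-parents map, is hypothetical.

```python
def children_of(parents):
    """Invert a node -> parents map into node -> children."""
    kids = {v: set() for v in parents}
    for v, pa in parents.items():
        for p in pa:
            kids[p].add(v)
    return kids

def markov_blanket(parents, x):
    """Parents of x, children of x, and the other parents of those children."""
    kids = children_of(parents)
    blanket = set(parents[x]) | kids[x]
    for c in kids[x]:
        blanket |= set(parents[c])
    blanket.discard(x)
    return blanket

def is_ancestor(parents, x, y):
    """True if the DAG contains a directed path x -> ... -> y."""
    stack, seen = list(parents[y]), set()
    while stack:
        v = stack.pop()
        if v == x:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])  # walk upward from y through its ancestors
    return False

# Hypothetical DAG: A -> B -> C <- D
dag = {"A": [], "B": ["A"], "C": ["B", "D"], "D": []}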

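Confidence Estimates

Bootstrap approach [FGW, UAI99]:
Resample the data D with replacement m times, giving D_1, …, D_m, and learn a network G_i from each D_i
Estimate: $\mathrm{conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$, where $f(G) = 1$ if feature f holds in G and 0 otherwise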
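A sketch of the bootstrap estimate, with the learning procedure passed in as a function. `toy_learn` is a hypothetical stand-in so the example runs; it is not a Bayesian network search, and any real structure learner would take its place.

```python
import random

def bootstrap_confidence(data, learn, feature, m=100, seed=0):
    """Nonparametric bootstrap estimate conf(f) = (1/m) * sum_i f(G_i).

    data:    list of samples (one row per experiment)
    learn:   function mapping a list of rows to a learned network
    feature: function mapping a network to True/False
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        # D_i: same number of rows as D, drawn with replacement
        resample = [rng.choice(data) for _ in range(len(data))]
        if feature(learn(resample)):
            hits += 1
    return hits / m

def toy_learn(rows):
    """Hypothetical stand-in for structure learning (NOT a Bayesian network
    search): links genes 0 and 1 when their values agree in most rows."""
    agree = sum(1 for r in rows if r[0] == r[1])
    return {(0, 1)} if agree > len(rows) / 2 else set()

def has_edge_01(network):
    """The feature being scored: an edge between genes 0 and 1."""
    return (0, 1) in network
```

Passing a feature function rather than a fixed edge keeps the same loop usable for both Markov-neighbor and order-relation features.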

Testing for Significance

We run our procedure on randomized data in which the order of values for each gene is reshuffled separately; this destroys the dependencies between genes while keeping each gene's own distribution of values.

Figure panels (random vs. real data, x axis from 0.1 to 1): Markov features with Gaussian models; Markov features with multinomial models
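The randomized control amounts to permuting each gene's column independently. A sketch, assuming the experiment-by-gene matrix layout used earlier; the function name is hypothetical.

```python
import random

def shuffle_each_gene(matrix, seed=0):
    """Randomized control: permute each gene's column independently.

    matrix[i][j] = expression of gene j in experiment i. Returns a new matrix;
    the input is not modified.
    """
    rng = random.Random(seed)
    n_exp, n_genes = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_exp)] for j in range(n_genes)]
    for col in columns:
        rng.shuffle(col)  # each gene gets its own independent permutation
    return [[columns[j][i] for j in range(n_genes)] for i in range(n_exp)]

real = [[1, 10], [2, 20], [3, 30], [4, 40]]
randomized = shuffle_each_gene(real, seed=1)
```

Running the same confidence estimation on `randomized` and on `real` shows how many high-confidence features to expect by chance alone.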

Local Map

Finding Key Genes

Key gene: a gene that precedes many other genes
YLR183C
MCD1: Mitotic Chromosome Determinant
RAD27: DNA repair protein
CLN2: role in cell cycle START
SRO4: involved in cellular polarization during budding
YOX1: homeodomain protein that binds leu-tRNA gene
POL30: required for DNA replication and repair
YLR467W
CDC5
MSH6: homolog of the human GTBP protein
YML119W
CLN1: role in cell cycle START
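"Precedes many other genes" can be scored from order-relation confidences by counting, for each gene, the genes it precedes with high confidence. The threshold, the cut-off and the placeholder genes G1 to G4 are hypothetical and are not results from the data.

```python
from collections import Counter

def key_genes(order_conf, threshold=0.8, top=3):
    """Rank genes by the number of other genes they precede with confidence
    above `threshold` (threshold and top are illustrative defaults).

    order_conf[(x, y)] = estimated confidence that x is an ancestor of y.
    """
    counts = Counter(x for (x, y), conf in order_conf.items() if conf > threshold)
    return counts.most_common(top)

# Hypothetical confidences for placeholder genes (not real results).
order_conf = {
    ("G1", "G2"): 0.90, ("G1", "G3"): 0.95, ("G1", "G4"): 0.85,
    ("G2", "G3"): 0.90, ("G3", "G4"): 0.50,
}
ranking = key_genes(order_conf)  # [('G1', 3), ('G2', 1)]
```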

Strong Markov Relations

Gene 1 | Gene 2 | Notes
YKL163W-PIR3 | YKL164C-PIR1 | Close location
YKR013W-PRY2 | YKR012C | Close location
MCD1 | MSH6 | Bind to DNA during mitosis
PHO11 | PHO12 | Acid phosphatases
HHT1 | HTB1 | Histones
FAR1 | ASH1 | Mating type switch, expression uncorrelated
CLN2 | SVS1 | Unknown function (SVS1)
STE2 | MFA2 | Mating factor & receptor

Future Work

Finding suitable local distribution models
Temporal aspect: dynamic Bayesian networks (DBNs)
Correct handling of hidden variables: can we recognize hidden causes of coordinated regulation events?
Incorporating prior knowledge: incorporate the large mass of biological knowledge and insight from sequence/structure databases
Abstraction: combine with cluster analysis
