genetic networks. cellular networks u most processes in the cell are controlled by “networks” of...

54
. Genetic Networks

Post on 21-Dec-2015


Page 1: Genetic Networks. Cellular Networks u Most processes in the cell are controlled by “networks” of interacting molecules: l Metabolic Networks l Signal


Genetic Networks


Cellular Networks

Most processes in the cell are controlled by “networks” of interacting molecules:

• Metabolic Networks
• Signal Transduction Networks
• Regulatory Networks


Unifying View

The cell as a “state machine”

• Cell state S = (P1, P2, …, R1, R2, …, m1, m2, …)
• P: proteins, R: mRNA molecules, m: metabolites
• Each cell, at any given time, can be characterized by its state S
• Dynamics: Input(t), S(t) => S(t+Δt)


What does it mean?

Steady cell state – cell type
• Neuron
• RBC
• Muscle cell
• Tumor cell

Dynamics – cellular process
• Differentiation
• Apoptosis
• Cell cycle


Gene Regulation Networks

Regulation of expression of genes is crucial

Regulation occurs at many stages:
• Pre-transcriptional (chromatin structure)
• Transcription initiation
• RNA editing (splicing) and transport
• Translation initiation
• Post-translational modification
• RNA and protein degradation

Understanding regulatory processes is a central problem of biological research


Genetic Network Models: Goals

• Incorporate rule-based dependencies between genes
  Rule-based dependencies may constitute important biological information.
• Allow systematic study of global network dynamics
  In particular, individual gene effects on long-run network behavior.
• Must be able to cope with uncertainty
  Small sample size, noisy measurements, biological “noise”
• Quantify the relative influence and sensitivity of genes in their interactions with other genes
  This allows us to focus on individual (groups of) genes.

What model should we use?


Level of Biochemical Detail

Detailed models require lots of data!

Highly detailed biochemical models are only feasible for very small systems which are extensively studied

Example: Arkin et al. (1998), Genetics 149(4):1633-48
• Lysis-lysogeny switch in lambda phage
• 5 genes, 67 parameters based on 50 years of research
• Stochastic simulation required a supercomputer!


Example: Lysis-Lysogeny

Arkin et al. (1998), Genetics 149(4):1633-48


Level of Biochemical Detail

In-depth biochemical simulation of e.g. a whole cell is infeasible (so far)

Less detailed network models are useful when data is scarce and/or network structure is unknown

Once network structure has been determined, we can refine the model


Boolean or Continuous?

Boolean networks (Kauffman (1993), The Origins of Order) assume ON/OFF gene states.

• Allow analysis at the network level
• Provide useful insights into network dynamics
• Algorithms exist for network inference from binary data

[Figure: genes A and B regulate gene C through the Boolean rule C = A AND B; every gene takes the value 0 or 1]


Boolean Formalism: Cons

• Boolean abstraction is a poor fit to real data
• Cannot model important concepts:
  • amplification of a signal
  • subtraction and addition of signals
  • compensating for a smoothly varying environmental parameter (e.g. temperature, nutrients)
  • varying dynamical behavior (e.g. cell cycle period)
• Feedback control: negative feedback, which is used to stabilize expression, causes oscillation in a Boolean model
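The oscillation caused by negative feedback can be illustrated with a minimal sketch: a single hypothetical gene that represses itself and is updated synchronously as x(t+1) = NOT x(t). The one-gene loop and the function name are illustrative assumptions, not taken from the slides.

```python
def simulate_self_repression(x0, steps):
    """Synchronously iterate x(t+1) = NOT x(t) for one self-repressing gene."""
    states = [x0]
    for _ in range(steps):
        states.append(1 - states[-1])  # Boolean NOT on a 0/1 value
    return states

trajectory = simulate_self_repression(x0=1, steps=5)
print(trajectory)
```

Instead of settling at an intermediate expression level, as a continuous model of the same loop could, the Boolean gene flips between 1 and 0 at every step.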


Boolean Formalism: Pros

Studies give rise to qualitative phenomena, as observed by experimentalists.

Some studied systems exhibit multiple steady states and “switchlike” transitions between them.

It is experimentally shown that such systems are “robust” to exact values of kinetic parameters of individual reactions.


Concentrations or Molecules?

Use of concentrations assumes individual molecules can be ignored

Known examples (in prokaryotes) where stochastic fluctuations play an essential role (e.g. lysis-lysogeny in lambda)

Requires stochastic simulation (Arkin et al. (1998), Genetics 149(4):1633-48), or modeling molecule counts (e.g. Petri nets, Goss and Peccoud (1998), PNAS 95(12):6750-5)

Significantly increases model complexity


Concentrations or Molecules?

Eukaryotes: larger cell volume, typically longer half-lives. Few known stochastic effects.

Yeast: 80% of the transcriptome is expressed at 0.1-2 mRNA copies/cell (Holstege et al. (1998), Cell 95:717-728)

H: 95% of the transcriptome is expressed at <5 copies/cell (Velculescu et al. (1997), Cell 88:243-251)


Spatial or Non-Spatial

Spatiality introduces additional complexity:
• intercellular interactions
• spatial differentiation
• cell compartments
• cell types

Spatial patterns also provide more data

e.g. stripe formation in Drosophila: Mjolsness et al. (1991), J. Theor. Biol. 152: 429-454.

Few (no?) large-scale spatial gene expression data sets available so far.


Example: Drosophila Segmentation

[Figure: expression of the transcription factors bcd, hb, Kr and gt along the anterior-posterior axis of the embryo (low to high), together with eve (even-skipped) stripe 2 expression]


Deterministic or Stochastic?

Many sources of stochasticity:
• Biological stochasticity
• Experimental noise

Stochastic models can account for those

Deterministic models are usually simpler to analyze (dynamics, steady states) and interpret


Modeling Approaches

Boolean Networks

Linear Models

Bayesian Networks


Boolean Network


What is a Boolean Network?

A Boolean network is a kind of graph G(V, F):
• V is a set of nodes (genes)
• F is a list of Boolean functions

Every node takes only two values: ON (1) and OFF (0)

Each function $f_i$ gives the value of node $x_i$: $x_i = f_i(x_1, x_2, \dots, x_n)$

Representation: standard, wiring, automaton


What is a Boolean Network?

Attractor: a set of states that is revisited infinitely often; which attractor is reached depends on the initial state.

Basin of attraction

Limit-cycle attractor


Boolean Network Example

[Figure: wiring diagram G’(V’, F’) linking the nodes (genes) x1, x2, x3 at time t to x1, x2, x3 at time t+1; edges either activate or inactivate a gene]

$V = \{x_1, x_2, x_3\}$, with each node updated by a function $f_i(x_1, x_2, x_3)$

Trajectory example:

Iteration  1  2  3  4  5  6
x1         1  1  0  0  0  0
x2         1  1  1  0  0  0
x3         0  1  1  1  0  0


Boolean Network Example

Update rules for the nodes (genes) x1, x2, x3:

$f_1(x_1, x_2, x_3) = x_2 \wedge \neg x_3$

$f_2(x_1, x_2, x_3) = x_1$

$f_3(x_1, x_2, x_3) = x_2$

[Figure: state transition diagram over the eight states. Trajectory 1 (Start!): 110 → 111 → 011 → 001 → 000. Trajectory 2: 100 → 010 ⇄ 101.]
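The example network can be simulated in a few lines. The sketch below assumes the update rules f1 = x2 AND NOT x3, f2 = x1 and f3 = x2, which are consistent with the trajectory table on this slide; the function names are illustrative. It reproduces the trajectory that starts at 110 and enumerates the attractors by following every initial state until a state repeats.

```python
from itertools import product

def step(state):
    """One synchronous update of the three-gene network."""
    x1, x2, x3 = state
    return (int(x2 and not x3),  # f1: assumed to be x2 AND NOT x3
            x1,                  # f2: copies x1
            x2)                  # f3: copies x2

def trajectory(state, n_steps):
    """List of states visited in n_steps synchronous updates."""
    states = [state]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

def find_attractor(state):
    """Follow the trajectory until a state repeats; return the cycle reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    return seen[seen.index(state):]

traj = trajectory((1, 1, 0), 5)
attractors = {frozenset(find_attractor(s)) for s in product((0, 1), repeat=3)}
print(traj)
print(attractors)
```

Under these assumed rules the network has one fixed point (000) and one limit cycle of length two (010 and 101); the remaining states are transient.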


Basic Structure of Boolean Networks

[Figure: genes A and B both feed into gene X]

Boolean function:

A B X
0 0 1
0 1 1
1 0 0
1 1 1

• Each node is a gene
• 1 means active/expressed
• 0 means inactive/unexpressed

In this example, two genes (A and B) regulate gene X. In principle, any number of “input” genes are possible. Positive/negative feedback is also common (and necessary for homeostasis).


Dynamics of Boolean Networks

[Figure: a network of six genes A to F over time, shown with the activity values A=1, B=1, C=0, D=1, E=1, F=0 and the binary pattern 0 1 1 0 0 1]

At a given time point, all the genes form a genome-wide gene activity pattern (GAP) (binary string of length n). Consider the state space formed by all possible GAPs.


State Space of Boolean Networks

Similar GAPs lie close together.

There is an inherent directionality in the state space.

Some states are attractors (or limit-cycle attractors). The system may alternate between several attractors.

Other states are transient.

Picture generated using the program DDLab.


Reverse Engineering Problem

Can we infer the structure and rules of a genetic network from gene expression measurements?


Reverse Engineering Problem

Input: Gene expression data

Output: Network structure and parameters (or regulation rules)


Gene Expression Time Series Data

[Figure: expression time series of gene 1, gene 2 and gene 3; x-axis: time (min), 0 to 60]

Problem: how can these data be used to infer how these three genes influence each other?


Modelling Gene Expression Data

[Figure: expression time series of gene 1, gene 2 and gene 3; x-axis: time (min), 0 to 60]

• Assume that genes exist in two states: on and off
• If the expression of gene i is above a threshold level for gene i, consider it on; otherwise, consider it off


Modelling Gene Expression Data

[Figure: the same three time series with a threshold level drawn for each gene (1, 2, 3) and every sample labeled on or off]


Modelling Gene Expression Data

we obtain the following discretized gene expression data:

time 0 5 10 15 20 25 30 35 40 45 50 55

gene 1 0 0 0 0 0 0 1 1 1 1 1 1

gene 2 0 0 0 0 0 0 0 1 1 0 0 0

gene 3 1 1 1 1 1 1 1 0 0 0 0 0

the gene expression data is now in the form of bit streams
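The thresholding step that produces such bit streams can be sketched as follows. The expression values and the threshold `theta1` are hypothetical numbers chosen so that the result matches the gene 1 row above; the slides do not give the underlying measurements.

```python
def discretize(series, threshold):
    """Map each expression value to 1 (on) if above the threshold, else 0 (off)."""
    return [1 if value > threshold else 0 for value in series]

# Hypothetical expression levels sampled every 5 minutes (not from the slides).
gene1 = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.9, 1.1, 1.3, 1.2, 1.4, 1.3]
theta1 = 0.5  # hypothetical threshold for gene 1

bits = discretize(gene1, theta1)
print(bits)
```

The choice of threshold is the only free parameter here, and it directly controls which bit stream is obtained.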


Information Theoretic Tools

we define some necessary information theoretic tools:

Shannon entropy of data stream

H(X) = - ∑ pi log(pi)

where pi is the probability that a random element of data stream X is i

(the base of the logarithm can be anything, but must be consistent throughout; usually we use base 2)


Information Theoretic Tools

e.g. Shannon entropy of data streams X and Y

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]

Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

H(X) = - ∑ pi log2(pi)

= -(pX=0 log2(pX=0) + pX=1 log2(pX=1))

= -(0.4 log2(0.4) + 0.6 log2(0.6))

= 0.971

H(Y) = - ∑ pi log2(pi)

= -(0.5 log2(0.5) + 0.5 log2(0.5))

= 1.0
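The same numbers can be checked with a short sketch. The helper name `entropy` is illustrative; probabilities are estimated as relative frequencies of each symbol in the stream, and the logarithm is base 2.

```python
from collections import Counter
from math import log2

def entropy(stream):
    """Shannon entropy (base 2) of a sequence of symbols."""
    n = len(stream)
    return -sum((c / n) * log2(c / n) for c in Counter(stream).values())

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

h_x = entropy(X)
h_y = entropy(Y)
print(round(h_x, 3), round(h_y, 3))
```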


Information Theoretic Tools

e.g. Shannon joint entropy of data streams X and Y

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]

Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

H(X, Y) = - ∑ pi log2(pi)

= -(pX=0,Y=0 log2(pX=0,Y=0) + pX=1,Y=0 log2(pX=1,Y=0)

+ pX=0,Y=1 log2(pX=0,Y=1) + pX=1,Y=1 log2(pX=1,Y=1))

= -(0.1 log2(0.1) + 0.4 log2(0.4)

+ 0.3 log2(0.3) + 0.2 log2(0.2))

= 1.85


Information Theoretic Tools

Define:

Conditional Entropy

H(X|Y) = H(X, Y) – H(Y)

H(Y|X) = H(X, Y) – H(X)

Mutual Information

M(X, Y) = H(Y) - H(Y|X)

= H(X) - H(X|Y)

= H(X) + H(Y) - H(X,Y)
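A minimal sketch of these quantities, applied to the streams X and Y from the earlier example. It uses the standard identities H(X|Y) = H(X, Y) - H(Y) and M(X, Y) = H(X) + H(Y) - H(X, Y); the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(*streams):
    """Joint Shannon entropy (base 2) of one or more aligned streams."""
    joint = list(zip(*streams))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())

def conditional_entropy(x, y):
    """H(X|Y) = H(X, Y) - H(Y)."""
    return entropy(x, y) - entropy(y)

def mutual_information(x, y):
    """M(X, Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]

h_xy = entropy(X, Y)
m_xy = mutual_information(X, Y)
print(round(h_xy, 2), round(m_xy, 3))
```

For these two streams the mutual information is small compared with H(Y) = 1.0, so X on its own tells us little about Y.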


Information Theoretic Tools

It is easy to show that:

Let X be an input data stream and Y be an output data stream

If M(Y, X) = H(Y), then H(Y|X) = 0, so X exactly determines Y

Look for pairs (X, Y) where M(Yt+1, Xt) = H(Yt+1)


Identification of the Network Graph

back to the data:

step 1: put data in “state transition table” form

time 1 2 3 4 5 6 1 2 3 1 2 3 1 2

gene A 0 0 1 1 1 1 0 1 1 0 1 1 1 1

gene B 0 0 0 1 0 0 1 0 1 1 0 1 1 1

gene C 0 1 1 0 0 0 0 1 0 1 0 0 1 0


Identification of the Network Graph

state transition table:

step 1: put data in “state transition table” form

Input stream value    Output stream value
Ai-1  Bi-1  Ci-1      Ai  Bi  Ci
0     0     0         0   0   1
0     0     1         1   0   1
0     1     0         0   0   1
0     1     1         1   0   1
1     0     0         1   0   0
1     0     1         1   1   0
1     1     0         1   0   0
1     1     1         1   1   0


Identification of the Network Graph

The state transition table tells us how to get from state i – 1 to state i, as a lookup table

However, it is difficult to discern functional relationships, so…

step 2: use information theoretic tools to discover which inputs determine the outputs


Identification of the Network Graph

step 2a: calculate entropies

note: $\lim_{x \to 0^+} x^x = 1$, so $x \log x \to 0$ as x approaches 0 from the right; we therefore take (0)log(0) = 0. All logarithms are base 2.

H(Ai) = -((0.25)log(0.25) + (0.75)log(0.75)) = 0.81

H(Bi) = -((0.75)log(0.75) + (0.25)log(0.25)) = 0.81

H(Ci) = -((0.5)log(0.5) + (0.5)log(0.5)) = 1

H(Ai-1) = H(Bi-1) = H(Ci-1) = -((0.5)log(0.5) + (0.5)log(0.5)) = 1

H(Ai-1, Ci-1) = -((0.25)log(0.25) + (0.25)log(0.25)

+ (0.25)log(0.25) + (0.25)log(0.25)) = 2


Identification of the Network Graph

step 2a: calculate entropies

H(Ai, Ai-1, Ci-1) = -((0.25)log(0.25) + (0.25)log(0.25)

+ (0.25)log(0.25) + (0.25)log(0.25)) = 2

H(Bi, Ai-1, Ci-1) = -((0.25)log(0.25) + (0.25)log(0.25)

+ (0.25)log(0.25) + (0.25)log(0.25)) = 2

H(Ci, Ai-1) = -((0.5)log(0.5) + (0.5)log(0.5)) = 1


Identification of the Network Graph

step 2b: calculate mutual information

M(Ai, [Ai-1, Ci-1]) = H(Ai) + H(Ai-1, Ci-1) - H(Ai, Ai-1, Ci-1)

= 0.81 + 2 – 2

= 0.81

= H(Ai), therefore Ai-1 and Ci-1 determine Ai

M(Bi, [Ai-1, Ci-1]) = H(Bi) + H(Ai-1, Ci-1) - H(Bi, Ai-1, Ci-1)

= 0.81 + 2 – 2

= 0.81

= H(Bi), therefore Ai-1 and Ci-1 determine Bi

M(Ci, Ai-1) = H(Ci) + H(Ai-1) - H(Ci, Ai-1)

= 1 + 1 – 1

= 1

= H(Ci), therefore Ai-1 determines Ci
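Step 2 can be sketched as a search over candidate input sets. The code below uses the state transition table from the slides and, for each output gene, returns the smallest sets of inputs S for which M(output, S) = H(output). The exhaustive search over subsets, the tolerance `1e-9` and all names are illustrative assumptions rather than the exact procedure of the original authors.

```python
from collections import Counter
from itertools import combinations
from math import log2

# State transition table: (A, B, C) at step i-1 -> (A, B, C) at step i.
TABLE = {
    (0, 0, 0): (0, 0, 1),
    (0, 0, 1): (1, 0, 1),
    (0, 1, 0): (0, 0, 1),
    (0, 1, 1): (1, 0, 1),
    (1, 0, 0): (1, 0, 0),
    (1, 0, 1): (1, 1, 0),
    (1, 1, 0): (1, 0, 0),
    (1, 1, 1): (1, 1, 0),
}
GENES = "ABC"

def entropy(rows):
    """Shannon entropy (base 2) of a list of hashable symbols."""
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())

def minimal_inputs(target):
    """Smallest input sets whose mutual information with the target equals H(target)."""
    out = [nxt[target] for nxt in TABLE.values()]
    for size in range(1, len(GENES) + 1):
        hits = []
        for subset in combinations(range(len(GENES)), size):
            inp = [tuple(prev[k] for k in subset) for prev in TABLE]
            joint = list(zip(out, inp))
            mi = entropy(out) + entropy(inp) - entropy(joint)
            if abs(mi - entropy(out)) < 1e-9:  # inputs fully determine the output
                hits.append("".join(GENES[k] for k in subset))
        if hits:
            return hits
    return []

result = {GENES[t]: minimal_inputs(t) for t in range(len(GENES))}
print(result)
```

Searching small input sets first keeps the inferred wiring as sparse as the data allow.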


Identification of the Boolean Circuits

step 3: determine functional relationship between variables (this is simply the truth table)

Ai-1 Ci-1 Ai

0 0 0

0 1 1

1 0 1

1 1 1

Ai = Ai-1 OR Ci-1


Identification of the Boolean Circuits

step 3: determine functional relationship between variables

Ai-1 Ci-1 Bi

0 0 0

0 1 0

1 0 0

1 1 1

Bi = Ai-1 AND Ci-1


Identification of the Boolean Circuits

step 3: determine functional relationship between variables

Ai-1 Ci

0 1

1 0

Ci = NOT Ai-1
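Step 3 amounts to projecting the state transition table onto the inputs found in step 2. The sketch below rebuilds the three truth tables and checks them against OR, AND and NOT; the helper name `truth_table` and the table encoding are illustrative assumptions.

```python
# State transition table: (A, B, C) at step i-1 -> (A, B, C) at step i.
TABLE = {
    (0, 0, 0): (0, 0, 1),
    (0, 0, 1): (1, 0, 1),
    (0, 1, 0): (0, 0, 1),
    (0, 1, 1): (1, 0, 1),
    (1, 0, 0): (1, 0, 0),
    (1, 0, 1): (1, 1, 0),
    (1, 1, 0): (1, 0, 0),
    (1, 1, 1): (1, 1, 0),
}
GENES = "ABC"

def truth_table(target, inputs):
    """Map each pattern of the chosen inputs at i-1 to the target's value at i."""
    t = GENES.index(target)
    cols = [GENES.index(g) for g in inputs]
    table = {}
    for prev, nxt in TABLE.items():
        key = tuple(prev[k] for k in cols)
        # A valid rule maps every input pattern to exactly one output value.
        if table.setdefault(key, nxt[t]) != nxt[t]:
            raise ValueError(f"{inputs} does not determine {target}")
    return table

tt_a = truth_table("A", "AC")
tt_b = truth_table("B", "AC")
tt_c = truth_table("C", "A")

is_or = all(out == (a | c) for (a, c), out in tt_a.items())
is_and = all(out == (a & c) for (a, c), out in tt_b.items())
is_not = all(out == 1 - a for (a,), out in tt_c.items())
print(tt_a, tt_b, tt_c)
print(is_or, is_and, is_not)
```

If an input set does not determine the target, `truth_table` raises an error instead of returning an ambiguous table.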


Problems With This Approach

• No theory exists for determining the discretization level for each gene i
• The assumption that genes can be modeled as either ‘on’ or ‘off’ may be sufficient for some genes, but will certainly not be sufficient for all genes
• The approach ignores noise of all kinds (experimental, biological)


Boolean networks are inherently deterministic

Conceptually, the regularity of genetic function and interaction is not due to “hard-wired” logical rules, but rather to the intrinsic self-organizing stability of the dynamical system.

Additionally, we may want to model an open system with inputs (stimuli) that affect the dynamics of the network.

From an empirical viewpoint, the assumption of only one logical rule per gene may lead to incorrect conclusions when inferring these rules from gene expression measurements, as the latter are typically noisy and the number of samples is small relative to the number of parameters to be inferred.


Linear Models

Basic model: weighted sum of inputs

$y_i(t + \Delta t) = \sum_j w_{ji}\, y_j(t) + b_i$ or $\frac{dy_i}{dt} = \sum_j w_{ji}\, y_j + b_i$

Simple network representation:

[Figure: network of genes g1 to g5 with weighted edges, e.g. w12, w23 and the self-loop w55]

Only first-order approximation

Parameters of the model: weight matrix containing N×N interaction weights

“Fitting” the model: find the parameters wji, bi such that the model best fits the available data
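A minimal sketch of one update of the discrete-time linear model, $y_i(t + \Delta t) = \sum_j w_{ji}\, y_j(t) + b_i$. The two-gene weights, biases and starting values are hypothetical, and `w[j][i]` is assumed to hold the influence of gene j on gene i.

```python
def linear_step(y, w, b):
    """Return y(t + dt), where y_i(t + dt) = sum_j w[j][i] * y[j] + b[i]."""
    n = len(y)
    return [sum(w[j][i] * y[j] for j in range(n)) + b[i] for i in range(n)]

# Hypothetical two-gene network: gene 1 activates gene 2, gene 2 represses gene 1.
w = [[0.5, 1.0],    # influence of gene 1 on gene 1 and on gene 2
     [-1.0, 0.5]]   # influence of gene 2 on gene 1 and on gene 2
b = [1.0, 0.0]      # constant bias terms
y0 = [1.0, 2.0]     # expression levels at time t

y1 = linear_step(y0, w, b)
print(y1)
```

Unlike the Boolean model, the weights here can express graded amplification and subtraction of signals.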


Underdetermined problem!

Assumes fully connected network: need at least as many data points (arrays, conditions) as variables (genes)!

Underdetermined (underconstrained, ill-posed) model: we have many more parameters than data values to fit

No single solution, rather infinite number of parameter settings that will all fit the data equally well
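The lack of a unique solution can be seen in a tiny sketch: with two genes and a single observed transition, two quite different weight matrices reproduce the data exactly. All numbers are hypothetical, and `w[j][i]` is assumed to hold the influence of gene j on gene i.

```python
def predict(y, w, b):
    """Linear model prediction: y_i(t + dt) = sum_j w[j][i] * y[j] + b[i]."""
    n = len(y)
    return [sum(w[j][i] * y[j] for j in range(n)) + b[i] for i in range(n)]

# One observed transition (hypothetical numbers): y(t) -> y(t + dt).
y_t, y_next = [1.0, 2.0], [3.0, 1.0]

# Two different parameter settings, both with zero bias.
w_a = [[3.0, 1.0], [0.0, 0.0]]  # everything is explained by gene 1
w_b = [[0.0, 0.0], [1.5, 0.5]]  # everything is explained by gene 2
b = [0.0, 0.0]

fit_a = predict(y_t, w_a, b)
fit_b = predict(y_t, w_b, b)
print(fit_a, fit_b)
```

Both settings fit perfectly, yet they imply opposite regulatory structures, which is why extra constraints or more data are needed.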


Solution 1: reduce N

Rather than trying to model all genes, we can reduce the dimensionality of the problem:

Network of clusters: construct a linear model based on the cluster centroids
• rat CNS data (4 clusters): Wahde and Hertz (2000), Biosystems 55(1-3):129-136.
• yeast cell cycle (15-18 clusters): Mjolsness et al. (2000), NIPS 12; van Someren et al. (2000), ISMB 2000, 355-366.

Network of principal components: linear model between “characteristic modes” of the data
• Holter et al. (2001), PNAS 98(4):1693-1698.


Solution 2:

Take advantage of additional information:
• replicates
• accuracy of measurements
• smoothness of time series
• …

Most likely, the network will still be poorly constrained.

Need a method to identify and extract those parts of the model that are well-determined and robust


Danger of Overfitting

The linear model assumes every gene is regulated by all other genes (i.e. full connectivity)

• This is the richest model of its kind
• Danger of overfitting the training data
• Will result in poor prediction on new data
• Far from reality: only a few regulators for each gene