Models and Algorithmic Tools for Computational Processes in Cellular Biology

Bhaskar DasGupta
Department of Computer Science
University of Illinois at Chicago
Chicago, IL 60607-7053
[email protected]


ISBRA 2012

What is “systems biology” in one sentence?

The study to unravel and conceptualize the dynamic processes, feedback control loops and signal processing mechanisms underlying life.


Cellular Networks

• A single cell by itself is complex enough

• Various technologies have facilitated the monitoring of expression of genes and activities of proteins

• Difficult to find the causal relations and overall structure of the network

http://www.nyas.org/ebriefreps/ebrief/000534/images/mendes2.gif


Cellular Networks

Genes and gene products interact on several levels, e.g.:

• Genes regulate each other’s expression as part of gene regulatory networks
– transcription factors can activate or inhibit the transcription of genes to give mRNAs
– these transcription factors are themselves products of genes

• Protein-protein interaction networks
– proteins can participate in diverse post-translational interactions that lead to modified protein functions or to formation of protein complexes that have new roles

• Different levels of interactions are integrated
– e.g., presence of an external signal triggers a cascade of interactions that involves biochemical reactions, protein-protein interactions and transcriptional regulation


Cellular networks

• cellular interaction maps only represent a network of possibilities, and not all edges are present and active in vivo in a given condition or in a given cellular location

• only an integration of time-dependent interaction and activity information will be able to give the correct dynamical picture of a cellular network


Modeling problem

• interaction data are produced by the biologist in the form of a diagram (e.g., some type of labeled digraph)

• we wish to pose questions about the behavior (dynamics) of such a network

– it is essential to provide a precise mathematical formulation of its dynamics, specifically of how the state of each node depends on the state of the nodes interacting with it


Models

• discrete, continuous and hybrid models
• their inter-relationships, powers and limitations
• computational complexity and algorithmic issues
• biological implications and validations
• fascinating interplay between several areas such as:
– biology
– control theory
– discrete mathematics and computer science


System dynamics

• state variables
– continuous
– discrete (e.g., small number of quantitative states)

• time variables
– continuous (e.g., partial differential equations, delay equations)
– discrete (difference equations, quantized descriptions of continuous variables)

• deterministic or probabilistic nature of the model

• hybrid models
– combine continuous and discrete time-scales and/or
– combine continuous and discrete time variables


Continuous-state dynamics

• differential equation (continuous time)
• difference equation (discrete time)
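To make the two kinds of continuous-state dynamics concrete, here is a minimal Python sketch. It assumes a scalar state and uses forward-Euler steps to approximate the differential equation; the helper names `simulate_ode` and `iterate_map` and the decay rates are hypothetical illustrations, not part of the original material.

```python
import math


# Hypothetical helpers; forward Euler is an assumed, simple integration scheme.
def simulate_ode(f, x0, t_end, dt):
    """Approximate dx/dt = f(x) with forward-Euler steps of size dt (continuous time)."""
    x = x0
    for _ in range(round(t_end / dt)):
        x = x + dt * f(x)
    return x


def iterate_map(g, x0, steps):
    """Iterate the difference equation x(t+1) = g(x(t)) (discrete time)."""
    x = x0
    for _ in range(steps):
        x = g(x)
    return x


# Continuous time: dx/dt = -0.5 x has the exact solution x(t) = exp(-0.5 t).
approx = simulate_ode(lambda x: -0.5 * x, 1.0, t_end=2.0, dt=1e-4)
print(approx, math.exp(-1.0))  # close, but Euler carries an O(dt) error

# Discrete time: x(t+1) = 0.5 x(t), evaluated exactly at integer time steps.
print(iterate_map(lambda x: 0.5 * x, 1.0, steps=3))
```

Shrinking `dt` brings the Euler result closer to the exact value $e^{-1}$; the difference equation, by contrast, is evaluated exactly at each integer time step.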


Examples of other models

• Boolean: x1, x2, x3 ∈ {0,1}
• Boolean feedforward
• signal transduction
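A Boolean model replaces each continuous state by a value in {0,1} and computes every node's next value from the previous state. The sketch below assumes synchronous updates and three made-up update rules (x1' = NOT x3, x2' = x1, x3' = x1 AND x2); the rules and the function names are hypothetical and only illustrate how such a model is simulated.

```python
# Hypothetical 3-node Boolean network with synchronous updates.
def step(state):
    """One synchronous update; assumed rules: x1' = NOT x3, x2' = x1, x3' = x1 AND x2."""
    x1, x2, x3 = state
    return (int(not x3), x1, x1 & x2)


def trajectory(state, steps):
    """Return the list of visited states, including the initial one."""
    states = [state]
    for _ in range(steps):
        state = step(state)
        states.append(state)
    return states


print(trajectory((1, 0, 0), 5))
# -> [(1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1), (0, 0, 0), (1, 0, 0)]
```

Starting from (1, 0, 0), this particular network returns to its initial state after five updates, i.e., it has a cycle attractor of length 5.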


Reverse engineering of models

Given
– partial knowledge about the process/network
– access to suitable biological experiments

How to gain more knowledge about the model?
– effective use of resources (time, cost)


Reverse engineering

Process of backward reasoning, requiring careful observation of inputs and outputs, to elucidate the structure of the system

http://www.computerworld.com/computerworld/records/images/story/46Reverse-engineering.gif


Ingredients for reverse engineering

• Mathematical model to be reverse engineered
– e.g., differential equation model

• Biological experiments available, e.g.,
– perturbation experiments
– gene expression measurements


Many reverse engineering approaches are possible

I will discuss two types of approaches:

– “hitting set” based combinatorial approaches

– modular response analysis (MRA) approach


Reverse Engineering of Networks via Modular Response Analysis Method


Ingredients for reverse engineering via the modular response analysis approach

• Mathematical models
– differential equation model

• Biological experiments available
– perturbation experiments


Differential Equation Model

state variables evolve by (unknown) ordinary differential equations

$$\frac{dx_1}{dt} = f_1(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m), \quad \ldots, \quad \frac{dx_n}{dt} = f_n(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m), \qquad \text{i.e.,} \quad \frac{dx}{dt} = f(x, p)$$

• $x = (x_1(t), \ldots, x_n(t))$: state variables over time t, measurable (e.g., activity levels of proteins)
• $p = (p_1, \ldots, p_m)$: parameters that can be manipulated
• $f(x^*, p^*) = 0$: $p^*$ is the “wild-type” (i.e., normal) condition of p, and $x^*$ is the corresponding steady-state condition


Settings for modular response analysis method

– we do not know f

– but prior information of the following type is available:

• whether or not parameter $p_j$ affects variable $x_i$ (i.e., whether or not $\partial f_i / \partial p_j \equiv 0$)

Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002


Experimental protocols (perturbation experiments)

• perturb one parameter, say $p_j$

• for the perturbed p, measure the steady-state vector $x = \xi(p)$
– let the system relax to steady state
– measure $x_i$ (western blots, microarrays, etc.)

• estimate n “sensitivities”:

$$b_{ij} = \frac{\partial \xi_i}{\partial p_j}(p^*) \approx \frac{\xi_i\big(p^* + (p_j - p_j^*)\, e_j\big) - \xi_i(p^*)}{p_j - p_j^*} \quad \text{for } i = 1, 2, \ldots, n$$

where $e_j$ is the jth canonical basis vector
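The perturbation protocol can be mimicked in silico. This minimal Python sketch assumes a made-up two-variable model whose steady state is reached by forward-Euler relaxation, and estimates the sensitivities $b_{ij}$ by the finite-difference quotient of this slide; `f`, `steady_state` and `sensitivities` are hypothetical names, and in a real experiment the steady states would come from measurements rather than simulation.

```python
# Hypothetical toy model; in practice f is unknown and steady states are measured.
def f(x, p):
    """Assumed model: dx1/dt = p1 - x1, dx2/dt = p2 + x1 - 2*x2."""
    return [p[0] - x[0], p[1] + x[0] - 2.0 * x[1]]


def steady_state(p, x0=(0.0, 0.0), dt=0.01, tol=1e-10, max_steps=1_000_000):
    """Let the system relax: forward-Euler integrate until every |dx_i/dt| < tol."""
    x = list(x0)
    for _ in range(max_steps):
        dx = f(x, p)
        if max(abs(d) for d in dx) < tol:
            return x
        x = [xi + dt * di for xi, di in zip(x, dx)]
    raise RuntimeError("system did not reach a steady state")


def sensitivities(p_star, j, new_pj):
    """Estimate b_ij for every i by moving only parameter j from p*_j to new_pj."""
    x_star = steady_state(p_star)
    p = list(p_star)
    p[j] = new_pj
    x_pert = steady_state(p)
    delta = new_pj - p_star[j]
    return [(xp - xs) / delta for xp, xs in zip(x_pert, x_star)]


print(sensitivities([1.0, 2.0], j=0, new_pj=1.1))  # approximately [1.0, 0.5]
```

For this linear toy model the steady state is $\xi_1 = p_1$, $\xi_2 = (p_1 + p_2)/2$, so the exact sensitivities to $p_1$ are 1 and 0.5; the finite differences reproduce them up to the relaxation tolerance.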


Modeling Goal

Modeling goal can be at different levels:

1. Topology of connections only
2. Direction of the relationship
3. Information about stimulatory or inhibitory effects
4. Strength of relationship

[Figure: an example network on nodes A, B, C, D shown at each level, with edge directions, signs (+/-) and numeric strengths]


Goal of MRA approach

Obtain information about the sign of $\partial f_i / \partial x_j (x, p)$

e.g., if $\partial f_i / \partial x_j > 0$, then $x_j$ has a positive (catalytic) effect on the formation of $x_i$


In a nutshell

after some combinatorics and linear algebra, one can quantify the additional prior knowledge necessary to reach the goal

Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002
Berman, DasGupta and Sontag, Discrete Applied Math, 2007
Berman, DasGupta and Sontag, Annals of NYAS, 2007


But, assuming (near-)sufficient prior information

• how can we determine a minimum or near-minimum number of perturbation experiments that will work?

This now becomes an algorithmic/complexity issue...


After some effort, one can see that designing minimal sets of experiments leads to the set multi-cover problem


In our biological application context,

our set-multicover algorithm provides a set of suggested experiments such that

# of experiments ≈ minimum possible
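The cited work uses a randomized approximation algorithm for set multicover. As a simpler stand-in, the sketch below implements the classical greedy heuristic (repeatedly take the set that still covers the most under-covered elements); it is not the authors' algorithm, and the experiment names `e1` to `e4` and their element sets are hypothetical, since the mapping from perturbation experiments to sets is not spelled out on these slides.

```python
# Greedy heuristic for set multicover (a stand-in, not the randomized algorithm cited).
def greedy_multicover(universe, sets, r):
    """Choose sets until every element of `universe` lies in at least r chosen sets.

    sets: dict mapping a set name (e.g., a candidate experiment) to a set of elements.
    Each set may be chosen at most once. Raises ValueError if no such cover exists.
    """
    need = {u: r for u in universe}      # remaining coverage demand per element
    remaining = dict(sets)
    chosen = []

    def gain(name):
        return sum(1 for u in remaining[name] if need.get(u, 0) > 0)

    while any(v > 0 for v in need.values()):
        best = max(remaining, key=gain, default=None)   # first maximum wins ties
        if best is None or gain(best) == 0:
            raise ValueError("the given sets cannot cover every element r times")
        for u in remaining.pop(best):
            if need.get(u, 0) > 0:
                need[u] -= 1
        chosen.append(best)
    return chosen


experiments = {"e1": {1, 2, 3}, "e2": {2, 4}, "e3": {1, 3, 4}, "e4": {4}}
print(greedy_multicover({1, 2, 3, 4}, experiments, r=2))  # ['e1', 'e3', 'e2']
```

Greedy is a natural baseline because it is simple and has a logarithmic approximation guarantee for (multi)cover problems; the randomized algorithms in the cited papers aim at better guarantees for the instances arising from MRA.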


Overall high-level picture

Modular Response Analysis for Differential Equations model
→ Linear Algebraic formulation
→ Combinatorial formulation
→ Combinatorial Algorithms (randomized)
→ Selection of appropriate perturbation experiments


Experimental validation of MRA Method

See the paper:

S. D. M. Santos, P. J. Verveer, P. I. H. Bastiaens, Growth factor-induced MAPK network topology shapes Erk response determining PC-12 cell fate, Nature Cell Biology 9, 324-330 (2007)

• MAPK pathway involving proteins Raf, Mek and Erk is activated through receptor tyrosine kinases TrkA and epidermal growth factor receptor (EGFR) by two different stimuli, NGF (nerve growth factor) or EGF (epidermal growth factor)

• MRA method was applied to determine the MAPK network architecture in the context of NGF and EGF stimulations


Reverse Engineering of Networks via Hitting-set based (combinatorial) Method


“Hitting set” based combinatorial approaches

Input: steady-state profiles of perturbations of the network, or expression data representing state-transition measurements for wild-type and perturbation data
→ hitting set
→ introduce redundancy: multi-hitting set
→ Output: topology of the interconnection network


Basic idea behind the hitting-set based approaches

• which variables influence x5?

• when x5 changes, so do x1, x3, x4
⇒ at least one of {x1, x3, x4} must influence x5

• build dependency information over all successive time steps:
{x1, x3, x4}, {x1}, {x1, x3}, {x2, x3, x4}, {x1}

• minimal dependency (hitting set problem): {x1, x2}
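The step “build dependency information over all successive time steps” can be sketched as follows. The sketch assumes discretized states given as Python dictionaries and assumes that a variable is a candidate regulator of the target when it changes in the same transition as the target; this timing convention, the function name and the toy time series are assumptions for illustration.

```python
# Hypothetical helper: one candidate set per transition in which the target changes.
def candidate_sets(states, target):
    """For each transition where `target` changes, collect the other variables that also changed.

    states: list of dicts mapping variable name -> (discretized) value, in time order.
    A hitting set of the returned sets is a set of variables that could explain
    every observed change of `target`.
    """
    sets = []
    for prev, curr in zip(states, states[1:]):
        if prev[target] != curr[target]:
            changed = {v for v in prev if v != target and prev[v] != curr[v]}
            sets.append(changed)
    return sets


series = [
    {"x1": 0, "x2": 0, "x3": 0, "x5": 0},
    {"x1": 1, "x2": 0, "x3": 1, "x5": 1},   # x5 changes together with x1 and x3
    {"x1": 1, "x2": 1, "x3": 1, "x5": 1},   # x5 unchanged: no constraint
    {"x1": 1, "x2": 0, "x3": 0, "x5": 0},   # x5 changes together with x2 and x3
]
print(candidate_sets(series, "x5"))
```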


Why construct “minimal” dependency?

Occam's razor

entia non sunt multiplicanda praeter necessitatem

(entities must not be multiplied beyond necessity)

However, biological networks may be redundant, e.g.:
– G. Tononi, O. Sporns, G. M. Edelman, PNAS, 1999
– R. Albert et al., Physical Review E, 2011

How can we introduce redundancy if necessary?


How can we introduce redundancy if necessary?

First idea: add random extra dependencies (edges)
– not good: these edges may not be supported by the given data

Better idea: modify hitting set to “multi-hitting set”
– for {x1, x3, x4}: previously, select at least 1; now, select at least 2 (in general, some r)
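Both the hitting set (r = 1) and the multi-hitting set (r ≥ 2) can be solved exactly by brute force on toy inputs. The sketch assumes that a set with fewer than r elements must be hit by all of its elements (the requirement is capped at |S|); the function name is hypothetical, and exhaustive search is only practical for a handful of variables, which is why larger instances call for approximation algorithms.

```python
from itertools import combinations


# Hypothetical exact solver for tiny instances (exponential running time).
def min_multi_hitting_sets(sets, r=1):
    """Return all minimum-size H with |H ∩ S| >= min(r, |S|) for every S in `sets`.

    Searches subsets in order of increasing size; r = 1 is the ordinary hitting set problem.
    """
    elements = sorted(set().union(*sets))
    for k in range(len(elements) + 1):
        hits = [set(h) for h in combinations(elements, k)
                if all(len(S & set(h)) >= min(r, len(S)) for S in sets)]
        if hits:
            return hits
    return []


# Dependency sets for x5 from the hitting-set example.
deps = [{"x1", "x3", "x4"}, {"x1"}, {"x1", "x3"}, {"x2", "x3", "x4"}, {"x1"}]
print(min_multi_hitting_sets(deps, r=1))
print(min_multi_hitting_sets(deps, r=2))
```

With r = 1 the search returns {x1, x2}, {x1, x3} and {x1, x4}: the {x1, x2} of the example is one of three equally small answers. With r = 2, x1 and x3 become mandatory and one of x2 or x4 is added.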


Evaluation of performance of reverse engineering methods

Reverse-engineering methods are ill-posed, i.e., their solution is not unique, because of:
– existence of measurement error
– not all molecular species involved in a given analyzed phenomenon are included in the construction of a network, i.e., existence of hidden variables

Two possible ways for evaluation:

Experimental testing of predictions: after a model has been inferred, newly found interactions or predictions can be tested experimentally

Benchmark testing: measure how “accurate” the method of our interest is in recovering a known (“gold standard”) network


Evaluation of performance of reverse engineering methods: metrics for accuracy for benchmark testing

Measurements:
– correct interactions inferred (true positives, TP)
– incorrect interactions inferred (false positives, FP)
– correct non-interactions inferred (true negatives, TN)
– incorrect non-interactions inferred (false negatives, FN)

Metrics:
– recall or true positive rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$
– false positive rate: $\mathrm{FPR} = \frac{FP}{FP + TN}$
– precision or positive predictive value: $\mathrm{PPV} = \frac{TP}{TP + FP}$
– accuracy: $\mathrm{ACC} = \frac{TP + TN}{\text{total possible interactions}}$
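The four metrics can be computed directly from predicted and gold-standard edge sets. This sketch assumes directed edges given as (source, target) pairs and takes “total possible interactions” to be all ordered pairs of distinct nodes; that choice, the function name and the toy networks are assumptions, and division by zero (e.g., an empty prediction) is not handled.

```python
# Hypothetical helper; "possible interactions" = ordered pairs of distinct nodes (assumed).
def benchmark_metrics(predicted, gold, nodes):
    """Compare a predicted edge set with a gold-standard edge set (both subsets of the possible pairs)."""
    possible = {(a, b) for a in nodes for b in nodes if a != b}
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = len(possible) - tp - fp - fn
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "PPV": tp / (tp + fp),
        "ACC": (tp + tn) / len(possible),
    }


gold = {("A", "B"), ("B", "C")}
predicted = {("A", "B"), ("A", "C")}
print(benchmark_metrics(predicted, gold, ["A", "B", "C"]))
```

Here 6 ordered pairs are possible, giving TP = 1, FP = 1, FN = 1 and TN = 3, hence TPR = 0.5, FPR = 0.25, PPV = 0.5 and ACC = 4/6.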


Two published methods based on the hitting set approach

(A) Ideker, Thorsson, Karp, PSB (2000)

First step (network inference):

estimate a set of Boolean networks consistent with an observed set of steady-state gene expression profiles, each generated from a different perturbation to the genetic network

Second step (optimization):

use an entropy-based approach to select an additional perturbation experiment to perform a model selection from the set of predicted Boolean networks

(B) Jarrah, Laubenbacher, Stigler, Stillman, Adv. in Applied Mathematics (2007)

Attempts to infer the most likely causal relationships among network elements from gene expression data

For other published results, see, for example:

Krupa, Journal of Theoretical Biology (2002)
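Method (A) selects the next perturbation with an entropy criterion. A minimal sketch of one such criterion, assuming every candidate Boolean network is equally likely and is summarized by the steady state it predicts for each candidate perturbation: choose the perturbation whose predicted outcomes have the largest Shannon entropy, i.e., the one that best separates the candidate networks. This is a simplification for illustration, and the network names, perturbation names and predicted states are hypothetical.

```python
import math
from collections import Counter


def outcome_entropy(predictions):
    """Shannon entropy (bits) of predicted outcomes, one prediction per candidate network."""
    counts = Counter(predictions)
    total = len(predictions)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def choose_experiment(predicted):
    """predicted: dict experiment -> dict network -> predicted steady state (hashable)."""
    return max(predicted, key=lambda e: outcome_entropy(list(predicted[e].values())))


# Hypothetical predictions of three candidate networks under two perturbations.
predicted = {
    "perturb_g1": {"net1": (1, 0), "net2": (1, 0), "net3": (1, 0)},  # all agree: 0 bits
    "perturb_g2": {"net1": (0, 0), "net2": (0, 1), "net3": (1, 1)},  # all differ: log2(3) bits
}
print(choose_experiment(predicted))  # perturb_g2
```

An experiment on which all candidates agree (zero entropy) cannot discriminate between them, whichever outcome is observed.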


Comparative analysis (via benchmark testing) of two approaches by

(A) Ideker, Thorsson, Karp

(B) Jarrah, Laubenbacher, Stigler, Stillman

Two gold standard networks:

a. Segment polarity network of Drosophila melanogaster (fruit fly):
– last step in the hierarchical cascade of gene families initiating the segmented body of the fruit fly
– genes of this network include engrailed (en), wingless (wg), hedgehog (hh), patched (ptc), cubitus interruptus (ci) and sloppy paired (slp), coding for the corresponding proteins
– 1 para-segment of 4 cells
– 60 nodes: variables are expression levels of segment polarity genes/proteins
– Boolean model from (Albert and Othmer, Journal of Theoretical Biology, 2003)

DasGupta, Vera-Licona, Sontag, 2011


b. In silico network: gene regulatory network with external perturbations
– 13 species: 10 genes plus 3 different environmental perturbations
– perturbations affect the transcription rate of the gene on which they act directly (through inhibition or activation), and their effect is propagated throughout the network by the interactions between the genes
– generated using the software package in (Mendes, Trends Biochem. Sci, 1997)

DasGupta, Vera-Licona, Sontag, 2011


We generated time courses for both networks (a) and (b)

For method (A), we considered both greedy and linear programming based approximations to the hitting set problem, as well as redundancy values R = 1, 2

For method (B), input data must be discrete; we used three discretization methods:
• graph-theoretic based approach “D” from (Dimitrova, Garcia-Puente, Jarrah, Laubenbacher, Stigler, Stillman, Vera-Licona, 2010)
• quantile “Q” discretization (method in which each variable state receives an equal number of data values)
• interval “I” discretization (select thresholds for the different discrete values)

DasGupta, Vera-Licona, Sontag, 2011
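The quantile (Q) and interval (I) discretizations are simple enough to sketch directly. The version below is an illustration under stated assumptions (states labeled 0, 1, 2, ..., ties broken by position, thresholds supplied by the user); it is not the code used in the comparison, the function names are hypothetical, and the graph-theoretic “D” method is not reproduced here.

```python
import bisect


def quantile_discretize(values, levels):
    """Quantile ("Q"): each of `levels` states receives (about) the same number of values.

    Ties may be split across neighboring states because ranks break ties by position.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    states = [0] * len(values)
    for rank, i in enumerate(order):
        states[i] = rank * levels // len(values)
    return states


def interval_discretize(values, thresholds):
    """Interval ("I"): state = number of thresholds that are <= the value."""
    cuts = sorted(thresholds)
    return [bisect.bisect_right(cuts, v) for v in values]


print(quantile_discretize([5, 1, 3, 2, 6, 4], levels=3))          # [2, 0, 1, 0, 2, 1]
print(interval_discretize([0.1, 0.5, 0.9], thresholds=[0.3, 0.7]))  # [0, 1, 2]
```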


Summary of Comparison

DasGupta, Vera-Licona, Sontag, 2011

network (b):
• method (B) was better than method (A) in ROC space
• method (A) achieved a performance no better than random guessing

network (a):
• method (B) could not obtain any results after running for over 12 hours
• method (A) was able to compute results in less than 1 minute
• method (A) improved slightly when small redundancy was introduced


implementation of method (B):

http://polymath.vbi.vt.edu/polynome/

implementation of method (A)

done by (DasGupta, Vera-Licona, Sontag, 2011) at

http://sts.bioengr.uic.edu/causal/

DasGupta, Vera-Licona, Sontag, 2011


Direct Synthesis of Signal Transduction Networks

Only from known interactions and information

No new experiments needed


Overall Goal

Inputs:
• direct interactions: A → B, A ┤ B
• double-causal interactions: A → (B → C), A → (B ┤ C)
• additional information

Method (algorithms, software): FAST

Output: network of minimal complexity that is biologically relevant


Nature of experimental evidence

• biochemical
– direct interaction, e.g.,
• binding of two proteins
• a transcription factor activating the transcription of a gene
• a chemical reaction with a single reactant and single product

• pharmacological
– indirect causal effects most probably resulting from a chain of interactions and reactions, e.g.,
• binding of a chemical to a receptor protein starts a cascade of protein-protein interactions and chemical reactions that ultimately results in the transcription of a gene

• genetic evidence of differential responses to a stimulus
– can be direct, but most often indirect (double-causal)


We describe a method for synthesizing double-causal (path-level) information into a consistent network


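Direct interactions

A promotes B: A → B

A inhibits B: A ┤ B

Illustration of double-causal interaction

C promotes the process of A promoting B

[Figure: the edge from A to B with C acting on it, redrawn with a pseudo-vertex on the path from A to B that C acts upon]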


“Critical” edge

(known direct interaction, part of input)


Main computational steps for network synthesis

• Pseudo-vertex collapse (PVC): easy

• Binary transitive reduction (BTR): hard
– needs heuristics


Pseudo-vertex collapse (PVC)

Intuitively, PVC is useful for reducing the pseudo-vertex set to the minimal set that maintains the graph consistent with all indirect experimental observations.

If two pseudo-vertices u and v satisfy in(u) = in(v) and out(u) = out(v), they are collapsed into a single new pseudo-vertex uv.
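A single PVC pass can be sketched as grouping pseudo-vertices by their (in, out) neighborhoods. The sketch assumes edges are (source, target, sign) triples and that signs must also match for two neighborhoods to count as equal; the function name is hypothetical, naming merged vertices by concatenation simply mirrors the u, v → uv picture, and repeated passes may be needed because one merge can make other neighborhoods identical.

```python
from collections import defaultdict


# Hypothetical sketch of one pseudo-vertex collapse pass on a signed digraph.
def collapse_pseudo_vertices(edges, pseudo):
    """Merge pseudo-vertices whose signed in- and out-neighborhoods are identical.

    edges: set of (source, target, sign) triples; pseudo: set of pseudo-vertex names.
    Merged vertices are renamed by concatenating their names (u, v -> "uv").
    """
    ins, outs = defaultdict(set), defaultdict(set)
    for s, d, sign in edges:
        outs[s].add((d, sign))
        ins[d].add((s, sign))

    groups = defaultdict(list)
    for v in sorted(pseudo):
        groups[(frozenset(ins[v]), frozenset(outs[v]))].append(v)

    rename = {m: "".join(members) for members in groups.values() for m in members}
    return {(rename.get(s, s), rename.get(d, d), sign) for s, d, sign in edges}


edges = {("A", "u", "+"), ("u", "B", "+"), ("A", "v", "+"), ("v", "B", "+")}
print(collapse_pseudo_vertices(edges, {"u", "v"}))
```

Using a set for the output automatically removes the duplicate edges that appear once u and v share a name.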


Illustration of Binary Transitive Reduction (BTR)

• remove an edge? yes, if there is an alternate path

• remove an edge? no, if it is a critical edge

Intuitively, the BTR problem is useful for determining the sparsest graph consistent with a set of experimental observations
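The BTR test in the illustration (remove a non-critical edge if an alternate path exists) suggests a simple greedy heuristic. The sketch assumes edge signs encoded as parities (0 excitatory, 1 inhibitory), requires the alternate path to have the same parity as the removed edge, and treats any such walk as an alternate path; the function names are hypothetical, it is one possible heuristic rather than necessarily the one implemented in the authors' software, and the result can depend on the order in which edges are examined.

```python
from collections import deque


def has_alternate_path(edges, u, v, parity, skip):
    """Is there a u -> v walk with the given sign parity that avoids the edge `skip`?

    edges: set of (source, target, parity), parity 0 = excitatory, 1 = inhibitory;
    a walk's parity is the XOR (sum mod 2) of its edge parities.
    """
    adj = {}
    for e in edges:
        if e != skip:
            adj.setdefault(e[0], []).append((e[1], e[2]))
    seen = {(u, 0)}
    queue = deque([(u, 0)])
    while queue:  # breadth-first search over (node, parity) states
        node, par = queue.popleft()
        for nxt, p in adj.get(node, []):
            state = (nxt, par ^ p)
            if state == (v, parity):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_btr(edges, critical):
    """Drop each non-critical edge implied by an alternate path of the same parity."""
    kept = set(edges)
    for e in sorted(edges):
        if e not in critical and has_alternate_path(kept, e[0], e[1], e[2], skip=e):
            kept.discard(e)
    return kept


edges = {("A", "B", 0), ("B", "C", 0), ("A", "C", 0)}
print(greedy_btr(edges, critical=set()))  # A -> C is implied by A -> B -> C
```

Because each removal is justified by a path that is still present in the current graph, the reduced graph keeps every signed reachability relation of the input.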


Some biologists did look at very simplified or somewhat different versions of BTR, e.g.:

• A. Wagner, Estimating Coarse Gene Network Structure from Large-Scale Gene Perturbation Data, Genome Research, 12, pp. 309-315, 2002
– too special (reachability only), no efficient algorithms reported

• T. Chen, V. Filkov and S. Skiena, Identifying Gene Regulatory Networks from Experimental Data, Third Annual International Conference on Computational Molecular Biology, pp. 94-103, 1999
– “excess edge deletion” problem, a biologically too restrictive version

See the following excellent surveys for more comprehensive information about biological network inference and modeling:

• V. Filkov, Identifying Gene Regulatory Networks from Gene Expression Data, in Handbook of Computational Molecular Biology (edited by S. Aluru), Chapman & Hall/CRC Press, 2005

• H. de Jong, Modelling and Simulation of Genetic Regulatory Systems: A Literature Review, Journal of Computational Biology, Volume 9, Number 1, pp. 67-103, 2002


High level description of the network synthesis process

Synthesize direct interactions
→ Optimize
→ Synthesize double-causal interactions
→ Optimize
→ Interaction with biologists

(the optimization steps use BTR, PVC and BTR)

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007


All the steps in the network synthesis procedure, except those that involve BTR, can be done easily

Thus, it behooves us to look at BTR more closely


But, before that, biological validation of the network synthesis approach is desirable

Need a network that uses double-causal experimental evidence


Plant signal transduction network

Consistent guard cell signal transduction network for ABA-induced stomatal closure
– manually curated
– described in S. Li, S. M. Assmann and R. Albert, Predicting Essential Components of Signal Transduction Networks: A Dynamic Model of Guard Cell Abscisic Acid Signaling, PLoS Biology, 4(10), October 2006
– list of experimentally observed causal relationships collected by Li et al. and published as Table S1; this table contains
• around 140 interactions and causal inferences, both of type “A promotes B” and “C promotes process (A promotes B)”
– we augment this list with critical edges drawn from biophysical/biochemical knowledge on enzymatic reactions and ion flows, and with simplifying hypotheses made by Li et al., both described in Text S1


We also formalized an additional rule specific to the context of this network (and implicitly assumed by Li et al.) regarding enzyme-catalyzed reactions


Regulatory interactions between ABA signal transduction pathway components


Regulatory interactions between ABA signal transduction pathway components (continued)

NO → GC not critical and not enzymatic

ERA1 ┤(ABA → CalM)


Some nodes in the network

GCR1 putative G protein coupled receptor

OST1 protein

NO Nitric Oxide

ABH1 RNA cap-binding protein

RAC1 small GTPase protein


(left) Guard cell signal transduction network for ABA-induced stomatal closure manually curated by Li, Assmann and Albert [source: PLoS Biology, 4(10), 2006].

(right) The network produced by our automated network synthesis procedure: it has fewer edges while preserving all observed pathways

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007



Summary of comparison of the two networks

• Li et al.'s network has 54 vertices and 92 edges; our network has 57 vertices but 84 edges
• Both networks have an identical strongly connected component of vertices
• All the paths present in Li et al.'s reconstruction are present in our network as well
• The two networks have 71 common edges
• It took a few seconds to synthesize our network

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007


Summary of comparison of the two networks (continued)

Thus the two networks are highly similar but diverge on a few edges.

None of these discrepancies is due to algorithmic deficiencies; all are due to human decisions.

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007


Software is available at:

http://www.cs.uic.edu/~dasgupta/network-synthesis/

• runs on any machine with MS Windows (Win32)
– click, save the executable and run


Data sources for this type of network synthesis

Signal transduction pathway repositories such as TRANSPATH (http://www.gene-regulation.com/pub/databases.html#transpath) and protein interaction databases such as the Search Tool for the Retrieval of Interacting Proteins (STRING, http://string.embl.de) contain up to thousands of interactions, a large number of which are not supported by direct physical evidence.

NET-SYNTHESIS can be used to filter redundant information while keeping all direct interactions


Transitive reduction step used a heuristic

How good is the heuristic in general?


Performance of our BTR algorithm on “random” signal transduction networks

But what is a random biological network?


Biological networks are scale-free: e.g.,

N. Guelzim, S. Bottani, P. Bourgine, and F. Kepes, Topological and causal structure of the yeast transcriptional regulatory network, Nature Genetics 31, 60–63, 2002

Biological networks are NOT scale-free: e.g.,

R. Khanin and E. Wit, How Scale-Free Are Biological Networks?, Journal of Computational Biology, 13 (3), 810-818, 2006

So we examined the literature ourselves to decide on a reasonable model for random signal transduction networks


In our model, random signal transduction networks have the following properties:

• the in-degree distribution is exponential: $\Pr[\text{in-degree} = x] = \lambda e^{-\lambda x}$, with $1/3 \le \lambda \le 1/2$

– maximum in-degree is 12

• the out-degree distribution is governed by a power law: for $x \ge 1$, $\Pr[\text{out-degree} = x] = c\,x^{-c}$; $\Pr[\text{out-degree} = 0] \ge c$, with $2 < c < 3$

– maximum out-degree is 200

• the ratio of excitatory to inhibitory edges is between 2 and 4

Random graphs with prescribed degree distributions are generated using the procedure described in:

M. E. J. Newman, S. H. Strogatz and D. J. Watts. Random graphs with arbitrary degree distributions and their applications, Physical Review E, 64 (2), 026118-026134, 2001
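A rough Python sketch of how such a random signed network could be generated. It assumes a simplified directed configuration model in the spirit of Newman et al., not their exact procedure. In-degrees are drawn from the truncated exponential above (cap 12). The out-degree sequence is supplied by the caller, since sampling it is left open here. Out-stubs and in-stubs are paired uniformly at random, and each edge is excitatory with probability ratio / (1 + ratio). The function names and the default ratio of 3 are illustrative assumptions.

```python
import math
import random


def sample_in_degrees(n, lam, max_deg=12, rng=random):
    """In-degrees from a discretized exponential: Pr[x] proportional to
    lam * exp(-lam * x) for x = 0..max_deg (truncated at the maximum in-degree)."""
    support = list(range(max_deg + 1))
    weights = [lam * math.exp(-lam * x) for x in support]
    return rng.choices(support, weights=weights, k=n)


def directed_configuration_model(in_deg, out_deg, ratio=3.0, rng=random):
    """Pair out-stubs with in-stubs uniformly at random (configuration model).
    Simplifications: self-loops and duplicate edges are discarded, and surplus
    stubs are dropped when sum(in_deg) != sum(out_deg).
    Each kept edge is excitatory ('+') with probability ratio / (1 + ratio)."""
    out_stubs = [v for v, d in enumerate(out_deg) for _ in range(d)]
    in_stubs = [v for v, d in enumerate(in_deg) for _ in range(d)]
    rng.shuffle(out_stubs)
    rng.shuffle(in_stubs)
    p_excitatory = ratio / (1.0 + ratio)
    edges = {}
    for u, v in zip(out_stubs, in_stubs):  # zip truncates the longer stub list
        if u != v and (u, v) not in edges:
            edges[(u, v)] = '+' if rng.random() < p_excitatory else '-'
    return edges
```

The result is a dictionary mapping (source, target) to a sign. Passing a seeded `random.Random` instance as `rng` makes a run reproducible.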


What percentage of edges should be critical (known direct interactions)?

No accurate estimates are known:

• curated network of Ma'ayan et al. (Science, 2005)

– expected to have close to 100% critical edges, since the authors specifically focused on collecting only direct interactions

• protein interaction networks are expected to be mostly critical

– Giot et al., Science, 2003; Han et al., Nature, 2004; Li et al., Science, 2004

• genetic interactions (e.g., synthetic lethal interactions)

– represent compensatory relationships; only a minority are direct interactions

• reverse engineering approaches

– lead to networks in which close to 0% of the interactions are critical


We tried a few small and large values for the percentage of critical edges, such as 1%, 2% and 50%, to qualitatively capture all regions of network dynamics that are of interest


Tested on about 550 random networks

– number of vertices ranging from about 100 to 1000

– running time for individual networks: from seconds to at most a minute


Verifying the robustness of our BTR algorithm's performance

– perturb the networks in ways that do not change the optimal solution of the original graph

In almost all cases, the solution quality does not change under such perturbations


[Figure: histogram of frequency of occurrence versus % additional edges = ((|E'| / OPT) - 1) * 100, x-axis from 0 to 22]

On average, our algorithm uses about 5.5% more edges than the optimum

Performance of our implemented algorithm for BTR on random networks

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
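The plotted quantity is simple arithmetic; a tiny Python helper (the name is hypothetical) makes the formula explicit. With hypothetical numbers |E'| = 211 and OPT = 200 it gives about 5.5, matching the average overhead stated above.

```python
def percent_additional_edges(edges_found, opt):
    """% additional edges = ((|E'| / OPT) - 1) * 100, where |E'| is the number of
    edges kept by the algorithm and OPT is the optimum number of edges."""
    return (edges_found / opt - 1.0) * 100.0


print(percent_additional_edges(211, 200))  # about 5.5 (floating-point rounding aside)
```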


Other applications of NET-SYNTHESIS: Synthesizing a Network for T Cell Survival and Death in LGL Leukemia

Background

• Large Granular Lymphocytes (LGL)

– medium- to large-size cells with eccentric nuclei and abundant cytoplasm

– comprise 10% to 15% of the total peripheral blood mononuclear cells

– two major lineages:

• CD3- natural-killer (NK) cell lineage: ~85% of LGL cells

• CD3+ lineage: ~15% of LGL cells

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008


LGL leukemia

disordered clonal expansion of LGL and their invasion of the marrow, spleen and liver


Background (continued)

Ras:

– small GTPase essential for controlling multiple signaling pathways

– its deregulation is frequently seen in human cancers

Activation of H-Ras requires its farnesylation, which can be blocked by farnesyltransferase inhibitors (FTIs).

This makes FTIs candidate anti-cancer drugs, and several FTIs have entered early-phase clinical trials.

This observation, together with the finding that Ras is constitutively activated in leukemic LGL cells, leads to the hypothesis that Ras plays an important role in LGL leukemia and may function by influencing the Fas/FasL pathway.


We constructed a signaling network related to cell-survival/cell-death regulation, with special interest in the effect of Ras on the apoptosis response through the Fas/FasL pathway.

Goal: begin to understand the interactions between the Ras pathway and the Fas/FasL pathway, two of the major pathways that regulate the cell survival/death decision.

Currently, there is no standard therapy for LGL leukemia. Understanding the mechanism of this disease is crucial for drug/therapy development.

Proteins that modulate the Ras-apoptosis response can potentially serve as references for future drug design and therapeutic target search, possibly beyond LGL leukemia.

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008


Synthesizing a Network for T Cell Survival and Death in Large Granular Lymphocyte Leukemia

• Synthesized a cell-survival/cell-death regulation-related signaling network from the TRANSPATH 6.0 database, with additional information manually curated from literature search

• 359 vertices of this network represent proteins/protein families and mRNAs participating in pro-survival and Fas-induced apoptosis pathways

• 1295 edges represent regulatory relationships between nodes, including protein interactions, catalytic reactions, transcriptional regulation (no double-causal interactions were known)

• Performing BTR with NET-SYNTHESIS reduced the total number of edges to 873

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008


To focus on pathways that involve the 33 known T-LGL deregulated proteins, we designated vertices corresponding to proteins with no evidence of being changed during T-LGL as pseudo-vertices, and deleted the label “Y” from those edges both of whose endpoints were pseudo-vertices

Recursively performing “Reduction (faster)” BTR and “Collapse degree-2 pseudonodes” of NET-SYNTHESIS until no edge/node could be further removed simplified the network to 267 nodes and 751 edges.

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
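The “Collapse degree-2 pseudonodes” step can be pictured with a short Python sketch. This is an assumed interpretation, not the NET-SYNTHESIS code. A pseudo-vertex p whose only edges are a single incoming edge a -> p and a single outgoing edge p -> b is replaced by a direct edge a -> b. The parity of the new edge is the XOR of the two parities (0 = activation, 1 = inhibition). Critical-edge labels such as “Y” are ignored here for brevity, and the function name is hypothetical.

```python
def collapse_degree2_pseudonodes(edges, pseudo):
    """edges: iterable of (u, v, parity) with parity 0 = '+', 1 = '-'.
    pseudo: set of pseudo-vertices that may be bypassed.
    Repeatedly replaces a -> p -> b by a -> b (parity s1 XOR s2) while some
    pseudo-vertex p has exactly one incoming and one outgoing edge."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for p in pseudo:
            ins = [e for e in edges if e[1] == p]
            outs = [e for e in edges if e[0] == p]
            if len(ins) == 1 and len(outs) == 1:
                (a, _, s1), (_, b, s2) = ins[0], outs[0]
                # Skip cases that would involve or create a self-loop.
                if a != p and b != p and a != b:
                    edges -= {ins[0], outs[0]}
                    edges.add((a, b, s1 ^ s2))
                    changed = True
    return edges
```

Alternating this with a transitive reduction pass until neither changes the network mirrors the “until no edge/node could be further removed” loop described above.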


For further results, see

R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun, R. Albert, and T. P. Loughran,

Network Model of Survival Signaling in LGL Leukemia

PNAS, 2008


Binary transitive reduction raises two further interesting questions:

– how redundant are biological networks?

• what is redundancy, and how can it be measured?

– percentage of edges removed by binary transitive reduction

(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)

– are redundancy and dynamical properties correlated?


Feedback loops and dynamics of biological networks

Analyzing the behavior of feedback loops is a long-standing topic in the study of regulation, metabolism, and development

– e.g., see classical reference works such as

J. Monod and F. Jacob, General conclusions: teleonomic mechanisms in cellular metabolism, growth, and differentiation, Cold Spring Harbor Symp. Quant. Biol., 26, 389-401, 1961


Monotone dynamical system



Monotone systems are “simpler behaved” systems:

• pathological behavior (“chaos”) is ruled out

• even though they may have arbitrarily large dimensionality, monotone systems behave in many ways like one-dimensional systems

– e.g., in monotone systems:

• bounded trajectories generically converge to steady states

• there are no stable oscillatory behaviors


Associated Signal Transduction Network

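Nodes $v_1, \dots, v_n$ correspond to the variables $x_1, \dots, x_n$ of the system $\dot{x} = f(x)$; each edge carries the sign of the corresponding partial derivative, e.g.:

$\frac{\partial f_i}{\partial x_k}(x) > 0$: edge $v_k \to v_i$ labeled +

$\frac{\partial f_j}{\partial x_k}(x) < 0$: edge $v_k \to v_j$ labeled -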


[Figure: two example signed networks, one sign-consistent and one sign-inconsistent]

parity: product of the signs along a path

sign-consistent: every undirected path between two nodes has the same parity

(check the undirected paths 1-4 and 1-2-3-4)
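Sign-consistency can be tested in linear time by a breadth-first 2-labeling of the underlying undirected graph. A '+' edge must join equal labels and a '-' edge unequal labels, which is equivalent to every undirected path between two nodes having the same parity. The sketch below is illustrative: the edge format (u, v, sign) and the function name are assumptions, and edge directions are ignored on purpose. The test encodes a four-node network with paths 1-4 and 1-2-3-4 using hypothetical signs.

```python
from collections import deque


def is_sign_consistent(edges):
    """edges: iterable of (u, v, sign) with sign '+' or '-'; directions are ignored.
    Tries to 2-label the nodes so that '+' edges join equal labels and '-' edges
    join unequal labels; such a labeling exists iff the network is sign-consistent."""
    adj = {}
    for u, v, s in edges:
        h = 0 if s == '+' else 1
        adj.setdefault(u, []).append((v, h))
        adj.setdefault(v, []).append((u, h))
    label = {}
    for root in adj:
        if root in label:
            continue
        label[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v, h in adj[u]:
                want = label[u] ^ h  # label v must have across this edge
                if v not in label:
                    label[v] = want
                    queue.append(v)
                elif label[v] != want:
                    return False  # two paths of different parity meet here
    return True
```

With 1-2 (+), 2-3 (+), 3-4 (-), both paths from 1 to 4 are negative if 1-4 is '-', so the network is sign-consistent; making 1-4 '+' breaks it.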


Systems whose associated signal transduction network is sign-consistent are monotone systems

This allows us to define the “degree of monotonicity” M of a differential equation system as follows:

the minimum percentage of edges we need to delete to make the associated signal transduction network sign-consistent

(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)


Undirected Labeling Problem (ULP)

needed to compute degree of monotonicity M

Given: an undirected graph $G = (V, E)$ and an edge labeling function $h: E \to \{0,1\}$

Valid solution: a vertex labeling function $f: V \to \{0,1\}$

Definition: an edge $\{u,v\} \in E$ is consistent if $h(u,v) = f(u) + f(v) \pmod 2$

Goal: maximize number of consistent edges

Bad news: NP-hard and even MAX-SNP-hard.

DasGupta, Enciso, Sontag, Zhang, 2007
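Since ULP is NP-hard, exact solutions are only practical for very small graphs. The following Python sketch (hypothetical function names) solves ULP by exhaustive search over all 2^|V| vertex labelings and derives the degree of monotonicity M from it. It assumes an activation edge maps to h = 0 and an inhibition edge to h = 1, so a labeling that makes every edge consistent corresponds to a sign-consistent network.

```python
from itertools import product


def max_consistent_edges(vertices, edges):
    """Exact ULP by exhaustive search over all 2^|V| vertex labelings.
    edges: list of (u, v, h) with h in {0, 1}; an edge is consistent when
    h == (f(u) + f(v)) mod 2."""
    vertices = list(vertices)
    best = 0
    for bits in product((0, 1), repeat=len(vertices)):
        f = dict(zip(vertices, bits))
        consistent = sum(1 for u, v, h in edges if h == (f[u] + f[v]) % 2)
        best = max(best, consistent)
    return best


def degree_of_monotonicity(vertices, edges):
    """M = minimum percentage of edges to delete so that the network becomes
    sign-consistent, i.e. the share of edges left inconsistent by the best labeling."""
    if not edges:
        return 0.0
    return 100.0 * (len(edges) - max_consistent_edges(vertices, edges)) / len(edges)
```

For example, a triangle with two activations and one inhibition has an odd-parity cycle, so at most 2 of its 3 edges can be consistent and M is about 33.3%.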


Algorithm for ULP

• Solve the following vector program via semidefinite programming methods:

maximize $\sum_{\{u,v\} \in E} \frac{1 + (-1)^{h(u,v)}\, x_u \cdot x_v}{2}$

subject to: for each $v \in V$, $x_v \cdot x_v = 1$

for each $v \in V$, $x_v \in \mathbb{R}^{|V|}$

• Select a uniformly random vector $r$ on the $|V|$-dimensional unit sphere

• Label each vertex $v$ as 0 if $r \cdot x_v \ge 0$, and 1 otherwise

It can be easily implemented in MATLAB

DasGupta, Enciso, Sontag, Zhang, 2007
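The rounding step is easy to sketch in Python. Solving the semidefinite program itself needs an SDP solver, so the unit vectors x_v are simply assumed as input here (for example, exported from a MATLAB solution). The function names are hypothetical. A uniformly random direction r is obtained by normalizing a vector of independent standard Gaussians.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Uniformly random direction on the unit sphere: normalize a Gaussian vector."""
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in r))
    return [c / norm for c in r]


def hyperplane_round(vectors, rng=random):
    """vectors: dict vertex -> unit vector x_v (assumed to come from an SDP solver).
    Labels v as 0 if r . x_v >= 0 and 1 otherwise, for one random r."""
    dim = len(next(iter(vectors.values())))
    r = random_unit_vector(dim, rng)
    return {v: 0 if sum(a * b for a, b in zip(r, x)) >= 0 else 1
            for v, x in vectors.items()}


def count_consistent(labels, edges):
    """Number of edges (u, v, h) with h == (f(u) + f(v)) mod 2."""
    return sum(1 for u, v, h in edges if h == (labels[u] + labels[v]) % 2)
```

In practice one would repeat the rounding with several random vectors r and keep the labeling with the most consistent edges. When the vectors encode an exact labeling as plus or minus a single basis vector, every hyperplane recovers a labeling that satisfies all edges.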


We have two measurable properties:

– (topological) redundancy R: the percentage of edges removed by binary transitive reduction

– (dynamical) monotonicity M: the minimum percentage of edges we need to delete to make the associated signal transduction network sign-consistent

M is negatively correlated with R

(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)


Some other conclusions from (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)

• the redundancy measure R is statistically significant

• transcriptional networks are less redundant than signaling networks

• redundancy of C. elegans metabolic network is largely due to currency metabolites

• calculation of redundancy values and minimal networks provides a way to gain insight into the predicted orientation of protein-protein interaction (PPI) networks


Future research questions in the context of parallel and distributed computing

• Synchronization:

– no “global clocks” are known to exist for cellular processes (ignoring circadian rhythms and some other global timing mechanisms in higher organisms)

• Spatial effects:

– localization (nuclear, cytoplasmic, membrane-bound) in cells

• akin to geographical location affecting communication speeds and coordination in distributed computing


List of some relevant references

R. Albert, B. DasGupta, et al. A New Computationally Efficient Measure of Topological Redundancy of Biological and Social Networks, Physical Review E, 84 (3), 036117, 2011.

B. DasGupta, P. Vera-Licona, E. Sontag. Reverse Engineering of Molecular Networks from a Common Combinatorial Approach, in Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, John Wiley & Sons, Inc., 2011.

R. Albert, B. DasGupta, E. Sontag. Inference of signal transduction networks from double causal evidence, in Methods in Molecular Biology: Topics in Computational Biology, D. Fenyo (editor), Springer , 2010.

P. Berman, B. DasGupta, M. Karpinski. Approximating Transitive Reduction Problems for Directed Networks, 11th Algorithms and Data Structures Symposium, 2009.

R. Albert, B. DasGupta, R. Dondi, E. Sontag. Inferring (Biological) Signal Transduction Networks via Transitive Reductions of Directed Graphs, Algorithmica, 51 (2), 129-159, 2008.

S. Kachalo, R. Zhang, E. Sontag, R. Albert, B. DasGupta. NET-SYNTHESIS: A software for synthesis, inference and simplification of signal transduction networks, Bioinformatics, 24 (2), 293-295, 2008.

P. Berman, B. DasGupta, E. Sontag. Algorithmic Issues in Reverse Engineering of Protein and Gene Networks via the Modular Response Analysis Method, Annals of the New York Academy of Sciences, 2007.

R. Albert, B. DasGupta, et al. A Novel Method for Signal Transduction Network Inference from Indirect Experimental Evidence, Journal of Computational Biology, 14 (7), 927-949, 2007.

B. DasGupta, G. A. Enciso, E. Sontag, Y. Zhang. Algorithmic and Complexity Results for Decompositions of Biological Networks into Monotone Subsystems, Biosystems, 90 (1), 161-178, 2007.

P. Berman, B. DasGupta, E. Sontag. Computational Complexities of Combinatorial Problems With Applications to Reverse Engineering of Biological Networks, in Advances in Computational Intelligence: Theory and Applications, F.-Y. Wang and D. Liu (editors), Series in Intelligent Control and Intelligent Automation, World Scientific publishers, 303-316, 2007.

P. Berman, B. DasGupta, E. Sontag. Randomized Approximation Algorithms for Set Multicover Problems with Applications to Reverse Engineering of Protein and Gene Networks, Discrete Applied Mathematics, 155 (6-7), 733-749, 2007.


Acknowledgments

Thanks to research collaborators for these projects

R. Albert (Penn State) P. Berman (Penn State) R. Dondi (U. of Bergamo)

G. Enciso (UC Irvine) A. Gitter (CMU) G. Gürsoy (UIC)

R. Hegde (UIC) S. Kachalo (UIC) M. Karpinski (Bonn)

P. Pal G. S. Sivanathan (UIC) E. Sontag (Rutgers)

P. Vera-Licona (INRIA) K. Westbrooks (GSU) A. Zelikovsky (GSU)

R. Zhang (Penn State) Y. Zhang (UIC)

Thanks to National Science Foundation (NSF) for funding:

DBI-1062328 IIS-1064681 IIS-0346973 DBI-0543365

IIS-0610244 CCR-9800086 CNS-0206795 CCF-0208749

Thanks to DIMACS (Rutgers) for generous support during my sabbatical leave through their special focus on computational and mathematical epidemiology


Thank you for your attention!

Questions?
