Post on 25-Feb-2016


Models and Algorithmic Tools for Computational Processes in Cellular Biology

Bhaskar DasGupta
Department of Computer Science
University of Illinois at Chicago
Chicago, IL 60607-7053

[email protected]

ISBRA 2012

What is systems biology in one sentence?

A study to unravel and conceptualize the dynamic processes, feedback control loops and signal processing mechanisms underlying life

Cellular Networks

A single cell by itself is complex enough

Various technologies have facilitated the monitoring of expression of genes and activities of proteins

Difficult to find the causal relations and overall structure of the network

[Figure: http://www.nyas.org/ebriefreps/ebrief/000534/images/mendes2.gif]

Cellular Networks

Genes and gene products interact on several levels, e.g.:

Genes regulate each other's expression as part of gene regulatory networks
  transcription factors can activate or inhibit the transcription of genes to give mRNAs
  these transcription factors are themselves products of genes

Protein-protein interaction networks
  proteins can participate in diverse post-translational interactions that lead to modified protein functions or to formation of protein complexes that have new roles

Different levels of interactions are integrated, e.g., the presence of an external signal triggers a cascade of interactions that involves biochemical reactions, protein-protein interactions and transcriptional regulation

Cellular networks

cellular interaction maps only represent a network of possibilities, and not all edges are present and active in vivo in a given condition or in a given cellular location

only an integration of time-dependent interaction and activity information will be able to give the correct dynamical picture of a cellular network

Modeling problem

interaction data produced by the biologist in the form of a diagram (e.g., some type of labeled digraph)

wish to pose questions about the behavior (dynamics) of such a network

essential to provide a precise mathematical formulation of its dynamics, and specifically how the state of each node depends on the state of the nodes interacting with it

Models

discrete, continuous and hybrid models
  their inter-relationships, powers and limitations
  computational complexity and algorithmic issues
  biological implications and validations

fascinating interplay between several areas such as:
  biology
  control theory
  discrete mathematics and computer science

System dynamics

state variables
  continuous
  discrete (e.g., small number of quantitative states)

time variables
  continuous (e.g., partial differential equations, delay equations)
  discrete (difference equations, quantized descriptions of continuous variables)

deterministic or probabilistic nature of the model

hybrid models
  combine continuous and discrete time-scales and/or
  combine continuous and discrete time variables

Continuous-state dynamics

Differential equation (continuous-time)

Difference equation (discrete-time)
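The formulas that accompanied these two bullets are not preserved here. As a hedged reminder only, the generic textbook forms of the two model classes are given below; the symbols x and f and the unit time step are assumptions and may differ from the speaker's notation.

```latex
\frac{dx}{dt}(t) = f\bigl(x(t)\bigr) \quad \text{(continuous time)},
\qquad
x(t+1) = f\bigl(x(t)\bigr) \quad \text{(discrete time)}
```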

Examples of other models

Boolean: x1, x2, x3 ∈ {0, 1}

Boolean feedforward

Signal transduction

Reverse engineering of models

Given
  partial knowledge about the process/network
  access to suitable biological experiments

How to gain more knowledge about the model?
  effective use of resources (time, cost)

Reverse engineering: process of backward reasoning, requiring careful observation of inputs and outputs, to elucidate the structure of the system

[Figure: http://www.computerworld.com/computerworld/records/images/story/46Reverse-engineering.gif]

Ingredients for reverse engineering

Mathematical model to be reverse engineered
  e.g., differential equation model

Biological experiments available, e.g.,
  perturbation experiments
  gene expression measurements

Many reverse engineering approaches are possible

I will discuss two types of approaches:

hitting set based combinatorial approaches

modular response analysis (MRA) approach

Reverse Engineering of Networks via Modular Response Analysis Method

Ingredients for reverse engineering via modular response analysis approach

Mathematical models
  differential equation model

Biological experiments available
  perturbation experiments

Differential Equation Model

state variables evolve by (unknown) ordinary differential equations dx/dt = f(x, p)

x = (x1(t),...,xn(t)): state variables over time t, measurable (e.g., activity levels of proteins)

p = (p1,...,pm): parameters that can be manipulated

f(x*, p*) = 0
  p*: wild-type (i.e., normal) condition of p
  x*: corresponding steady-state condition

Settings for modular response analysis method

do not know f

but, prior information of the following type is available

whether parameter pj directly affects variable xi (i.e., whether ∂fi/∂pj ≡ 0 or not)

Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002

Experimental protocols (perturbation experiments)

perturb one parameter, say pj

for perturbed p, measure steady state vector x = ξ(p)
  let the system relax to steady state
  measure xi (western blots, microarrays etc.)

estimate n sensitivities:
  ∂ξi/∂pj (p*) ≈ ( ξi(p* + (pj − pj*) ej) − ξi(p*) ) / (pj − pj*), for i = 1,...,n
where ej is the jth canonical basis vector

Modeling goal can be at different levels

Topology of connections only

Direction of the relationship

Information about stimulatory or inhibitory effects

Strength of relationship

[Figure: example network on nodes A, B, C, D annotated at each of these levels, with +/− signs and numerical strengths on the edges]

Goal of MRA approach

Obtain information about the sign of ∂fi/∂xj (x, p)

e.g., if ∂fi/∂xj > 0, then xj has a positive (catalytic) effect on the formation of xi
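To make the idea concrete, here is a minimal sketch of how steady-state sensitivities can reveal these signs. It is a toy under loudly stated assumptions, not the authors' implementation: the hidden system is linear, f(x, p) = A x + p, with three variables; parameter pj directly affects only fj (the prior knowledge); and each variable inhibits itself (the extra prior knowledge used to fix the overall sign). Differentiating f(ξ(p), p) = 0 shows that row i of ∂f/∂x is orthogonal to the sensitivity vectors of every parameter that does not directly affect fi, so in three dimensions the row is recovered up to scale by a cross product. All function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def steady_state(A, p):
    """Solve A x + p = 0 by Cramer's rule; stands in for 'relax and measure'."""
    d = det3(A)
    x = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = -p[r]
        x.append(det3(M) / d)
    return x


def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def sign(value, tol=1e-9):
    """Return -1, 0 or +1, treating tiny values as zero."""
    if abs(value) < tol:
        return 0
    return 1 if value > 0 else -1


def infer_signs(measure, p_star, delta=0.1):
    """Estimate sign(d f_i / d x_j) from perturbation experiments only."""
    base = measure(p_star)
    cols = []  # cols[j][i] approximates d xi_i / d p_j
    for j in range(3):
        p = list(p_star)
        p[j] += delta                      # perturb one parameter
        perturbed = measure(p)             # measure the new steady state
        cols.append([(perturbed[i] - base[i]) / delta for i in range(3)])
    signs = []
    for i in range(3):
        j, k = [c for c in range(3) if c != i]
        row = cross(cols[j], cols[k])      # orthogonal to both sensitivities
        if row[i] > 0:                     # assumed self-inhibition: entry i < 0
            row = [-v for v in row]
        signs.append([sign(v) for v in row])
    return signs


# Hidden "true" Jacobian, used only to simulate the experiments.
A_HIDDEN = [[-1.0, 0.0, -0.5],
            [2.0, -1.0, 0.0],
            [0.0, 1.5, -1.0]]

recovered = infer_signs(lambda p: steady_state(A_HIDDEN, p), [1.0, 1.0, 1.0])
print(recovered)
```

The recovered pattern can be compared entry by entry with the signs of A_HIDDEN; in a real setting only the measurement function would be available.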

In a nutshell

after some combinatorics and linear algebra

one can quantify the additional prior knowledge necessary to reach the goal

Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002
Berman, DasGupta and Sontag, Discrete Applied Math, 2007
Berman, DasGupta and Sontag, Annals of NYAS, 2007

But, assuming (near)-sufficient prior information

how to determine a minimum or near-minimum number of perturbation experiments that will work?

This now becomes an algorithmic/complexity issue...

After some effort, one can see that designing minimal sets of experiments leads to the set multi-cover problem

In our biological application context, our set-multicover algorithm provides a set of suggested experiments such that

# of experiments [relation symbol lost] minimum possible

Overall high-level picture

Modular Response Analysis for Differential Equations model → Linear Algebraic formulation → Combinatorial formulation → Combinatorial Algorithms (randomized) → Selection of appropriate perturbation experiments

Experimental validation of MRA Method

See the paper:

S. D. M. Santos, P. J. Verveer, P. I. H. Bastiaens, Growth factor-induced MAPK network topology shapes Erk response determining PC-12 cell fate, Nature Cell Biology 9, 324-330 (2007)

The MAPK pathway involving the proteins Raf, Mek and Erk is activated through the receptor tyrosine kinases TrkA and epidermal growth factor receptor (EGFR) by two different stimuli, NGF (neuronal growth factor) or EGF (epidermal growth factor)

The MRA method was applied to determine the MAPK network architecture in the context of NGF and EGF stimulations
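The set multi-cover problem that the experiment-design question reduces to can be illustrated with a short sketch. The talk's own algorithm is randomized; the code below swaps in the classic greedy heuristic purely for illustration. The mapping is an assumption: each candidate experiment is a set of the items it helps determine, and every item must be covered by at least k chosen experiments. The names `greedy_set_multicover`, `experiments` and the toy data are hypothetical.

```python
def greedy_set_multicover(universe, sets, k):
    """Pick sets until every element of `universe` lies in at least k chosen sets.

    Greedy rule: repeatedly take the unused set that reduces the most
    outstanding demand. Each set may be used at most once.
    """
    demand = {element: k for element in universe}
    unused = dict(sets)
    chosen = []
    while any(d > 0 for d in demand.values()):
        def gain(name):
            return sum(1 for e in unused[name] if demand.get(e, 0) > 0)

        best = max(sorted(unused), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("coverage requirement cannot be met")
        for element in unused.pop(best):
            if demand.get(element, 0) > 0:
                demand[element] -= 1
        chosen.append(best)
    return chosen


# Toy instance: four unknowns, five candidate experiments, coverage k = 2.
experiments = {
    "A": {1, 2},
    "B": {2, 3},
    "C": {3, 4},
    "D": {1, 4},
    "E": {1, 2, 3, 4},
}
picked = greedy_set_multicover({1, 2, 3, 4}, experiments, k=2)
print(picked)
```

On this toy instance three experiments suffice, and no smaller selection can, because the total demand is 8 while the largest set covers 4 items and every other set covers 2. The greedy rule carries no such guarantee in general.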

Reverse Engineering of Networks via Hitting-Set Based (Combinatorial) Method

Hitting set based combinatorial approaches

[Flowchart: input data (steady state profiles of perturbations of the network, or expression data representing state transition measurements for wild-type and perturbation data) → hitting set → introduce redundancy → multi-hitting set → topology of interconnection network]

Basic idea behind the hitting-set based approaches

which variables influence x5?

when x5 changes, so do x1, x3 and x4

at least one of {x1,x3,x4} must influence x5

build dependency information over all successive time steps: {x1,x3,x4}, {x1}, {x1,x3}, {x2,x3,x4}, {x1}

minimal dependency (hitting set problem): {x1,x2}

Why construct minimal dependency ?

Occam's razor: entia non sunt multiplicanda praeter necessitatem (entities must not be multiplied beyond necessity)

However, biological networks may be redundant, e.g.:
  G. Tononi, O. Sporns, G. M. Edelman, PNAS, 1999
  R. Albert et al., Physical Review E, 2011

How can we introduce redundancy if necessary?

First idea: add random extra dependencies (edges)
  not good, these edges may not be supported by given data

Better idea: modify hitting set to multi-hitting set

{x1,x3,x4}
  previously: select at least 1
  now: select at least 2 (in general, some r)
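The hitting set and multi-hitting set formulations can be sketched in a few lines. This is an illustrative exhaustive search, not either of the published methods; it is exponential in the number of variables and suitable only for tiny examples. The function name is hypothetical, and the convention that a set smaller than r must be hit in all of its elements is an assumption.

```python
from itertools import combinations


def min_multi_hitting_set(dependency_sets, r=1):
    """Smallest set of variables that meets every dependency set
    in at least min(r, |S|) elements (r = 1 is the plain hitting set)."""
    universe = sorted(set().union(*dependency_sets))
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            chosen = set(candidate)
            if all(len(chosen & s) >= min(r, len(s)) for s in dependency_sets):
                return chosen


# Dependency sets for x5, read off the example above.
sets_for_x5 = [
    {"x1", "x3", "x4"},
    {"x1"},
    {"x1", "x3"},
    {"x2", "x3", "x4"},
    {"x1"},
]

plain = min_multi_hitting_set(sets_for_x5, r=1)
redundant = min_multi_hitting_set(sets_for_x5, r=2)
print(plain, redundant)
```

With r = 1 several minimum solutions exist ({x1,x2}, {x1,x3} and {x1,x4} all hit every set); the search returns the first one in lexicographic order. Raising r to 2 forces a larger, more redundant set of influences.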

Evaluation of performance of reverse engineering methods

Reverse-engineering problems are ill-posed, i.e., their solution is not unique
  existence of measurement error
  not all molecular species involved in a given analyzed phenomenon are included in the construction of a network, i.e., existence of hidden variables

Two possible ways for evaluation:

Experimental testing of predictions: after a model has been inferred, newly found interactions or predictions can be tested experimentally

Benchmark testing: measure how accurately the method of interest recovers a known (gold standard) network

Metrics for accuracy for benchmark testing

Measurements:
  correct interactions inferred (true positives, TP)
  incorrect interactions inferred (false positives, FP)
  correct non-interactions inferred (true negatives, TN)
  incorrect non-interactions inferred (false negatives, FN)

Metrics:
  recall or true positive rate = TP / (TP + FN)
  false positive rate = FP / (FP + TN)
  accuracy = (TP + TN) / (TP + TN + FP + FN)
  precision or positive predictive value = TP / (TP + FP)
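A small sketch of how these benchmark metrics could be computed from an inferred network and a gold standard follows. The representation is an assumption made for illustration: networks are sets of directed, unsigned edges without self-loops, and every ordered pair of distinct nodes counts as a possible interaction. The function name and the toy data are hypothetical.

```python
def benchmark_metrics(nodes, gold_edges, inferred_edges):
    """Compare an inferred edge set against a gold-standard edge set."""
    possible = {(u, v) for u in nodes for v in nodes if u != v}
    gold = set(gold_edges) & possible
    inferred = set(inferred_edges) & possible

    tp = len(inferred & gold)             # correct interactions inferred
    fp = len(inferred - gold)             # incorrect interactions inferred
    fn = len(gold - inferred)             # interactions that were missed
    tn = len(possible - gold - inferred)  # correct non-interactions

    return {
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
    }


metrics = benchmark_metrics(
    nodes=["a", "b", "c"],
    gold_edges={("a", "b"), ("b", "c")},
    inferred_edges={("a", "b"), ("a", "c")},
)
print(metrics)
```

A point in ROC space is the pair (false positive rate, recall); the sketch does not guard against empty denominators, which a real evaluation would need to handle.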

Two published methods based on the hitting set approach

(A) Ideker, Thorsson, Karp, PSB (2000)
  First step (network inference): estimate a set of Boolean networks consistent with an observed set of steady-state gene expression profiles, each generated from a different perturbation to the genetic network
  Second step (optimization): use an entropy-based approach to select an additional perturbation experiment to perform a model selection from the set of predicted Boolean networks

(B) Jarrah, Laubenbacher, Stigler, Stillman, Adv. in Applied Mathematics (2007)
  Attempts to infer the most likely causal relationships among network elements from gene expression data

For other published results, see, for example: Krupa, Journal of Theoretical Biology (2002)

Comparative analysis (via benchmark testing) of two approaches by (A) Ideker, Thorsson, Karp and (B) Jarrah, Laubenbacher, Stigler, Stillman

Two gold standard networks:

a. Segment polarity network of Drosophila melanogaster (fruit fly):
  last step in the hierarchical cascade of gene families initiating the segmented body of the fruit fly
  genes of this network include engrailed (en), wingless (wg), hedgehog (hh), patched (ptc), cubitus interruptus (ci) and sloppy paired (slp), coding for the corresponding proteins
  1 para-segment of 4 cells
  60 nodes: variables are expression levels of segment polarity genes/proteins
  Boolean model from (Albert and Othmer, Journal of Theoretical Biology, 2003)

DasGupta, Vera-Licona, Sontag, 2011

b. In silico network: gene regulatory network with external perturbations
  13 species: 10 genes plus 3 different environmental perturbations
  perturbations affect the transcription rate of the gene on which they act directly (through inhibition or activation) and their effect is propagated throughout the network by the interactions between the genes
  generated using the software package in (Mendes, Trends Biochem. Sci, 1997)

DasGupta, Vera-Licona, Sontag, 2011

generated time courses for both networks (a) and (b)

For method (A) we considered both greedy and linear programming based approximations to the hitting set problem as well as redundancy values R=1, 2

For method (B), input data must be discrete

used three discretization methods:
  graph-theoretic approach D from (Dimitrova, Garcia-Puente, Jarrah, Laubenbacher, Stigler, Stillman, Vera-Licona, 2010)
  quantile discretization Q (method in which each variable state receives an equal number of data values)
  interval discretization I (select thresholds for the different discrete values)

DasGupta, Vera-Licona, Sontag, 2011

Summary of Comparison

network (b):
  method (B) was better than method (A) in ROC space
  method (A) achieved a performance no better than random guessing

network (a):
  method (B) could not obtain any results after running over 12 hours
  method (A) was able to compute results in less than 1 minute
  method (A) improved slightly when small redundancy was introduced

DasGupta, Vera-Licona, Sontag, 2011

implementation of method (B): http://polymath.vbi.vt.edu/polynome/

implementation of method (A) done by (DasGupta, Vera-Licona, Sontag, 2011) at http://sts.bioengr.uic.edu/causal/

Direct Synthesization of Signal Transduction Networks

Only from known interactions and information

No new experiments needed

Overall Goal

[Diagram: direct interactions (between A and B), double-causal interactions (A acting on the interaction between B and C) and additional information are fed into a FAST method (algorithms, software) that outputs a network of minimal complexity that is biologically relevant]

Nature of experimental evidence

biochemical: direct interaction, e.g.,
  binding of two proteins
  a transcription factor activating the transcription of a gene
  a chemical reaction with a single reactant and single product

pharmacological: indirect causal effects most probably resulting from a chain of interactions and reactions, e.g., binding of a chemical to a receptor protein starts a cascade of protein-protein interactions and chemical reactions that ultimately results in the transcription of a gene

genetic: evidence of differential responses to a stimulus
  can be direct, but most often indirect (double-causal)

We describe a method for synthesizing double-causal (path-level) information into a consistent network

Direct interactions

A promotes B: A → B

A inhibits B: A ⊣ B

Illustration of double-causal interaction

C promotes the process of A promoting B

[Figure: nodes A, B, C and a pseudo-vertex]

Critical edge (known direct interaction, part of input)

Main computational step for network synthesis

Pseudo-vertex collapse (PVC): easy

Binary transitive reduction (BTR): hard, need heuristics
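The two steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the NET-SYNTHESIS implementation: edges are triples (u, v, label) with label 0 for excitatory and 1 for inhibitory; the parity of a path is the sum of its labels mod 2; paths are treated as walks; the BTR routine is a plain greedy pass that drops a non-critical edge when another path of the same parity already connects its endpoints; the PVC routine merges pseudo-vertices whose labeled in- and out-neighborhoods coincide. All names are hypothetical.

```python
from collections import deque


def has_path_with_parity(edges, src, dst, parity, skip=None):
    """True if a walk src -> dst with label sum == parity (mod 2) avoids `skip`."""
    successors = {}
    for edge in edges:
        if edge != skip:
            u, v, label = edge
            successors.setdefault(u, []).append((v, label))
    seen = set()
    queue = deque([(src, 0)])
    while queue:
        node, par = queue.popleft()
        for nxt, label in successors.get(node, []):
            state = (nxt, (par + label) % 2)
            if state == (dst, parity):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_btr(edges, critical):
    """Greedy heuristic for BTR: never removes a critical edge."""
    kept = set(edges)
    for edge in sorted(edges):
        if edge in critical:
            continue
        u, v, label = edge
        if has_path_with_parity(kept, u, v, label, skip=edge):
            kept.discard(edge)
    return kept


def pseudo_vertex_collapse(edges, pseudo):
    """Merge pseudo-vertices with identical labeled in- and out-neighborhoods."""
    groups = {}
    for p in sorted(pseudo):
        ins = frozenset((u, lab) for u, v, lab in edges if v == p)
        outs = frozenset((v, lab) for u, v, lab in edges if u == p)
        groups.setdefault((ins, outs), []).append(p)
    rename = {p: group[0] for group in groups.values() for p in group}
    return {(rename.get(u, u), rename.get(v, v), lab) for u, v, lab in edges}


# BTR demo: A->C (label 0) is implied by A->B->C; A->C (label 1) is not.
network = {("A", "B", 0), ("B", "C", 0), ("A", "C", 0), ("A", "C", 1)}
reduced = greedy_btr(network, critical={("A", "B", 0)})

# PVC demo: P1 and P2 have the same neighborhoods, so they are merged.
with_pseudo = {("A", "P1", 0), ("P1", "B", 0), ("A", "P2", 0), ("P2", "B", 0)}
collapsed = pseudo_vertex_collapse(with_pseudo, pseudo={"P1", "P2"})
print(reduced, collapsed)
```

The greedy pass depends on the order in which edges are examined and gives no optimality guarantee, which is consistent with the remark above that BTR is hard and needs heuristics.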

Pseudo-vertex collapse (PVC)

Intuitively, PVC is useful for reducing the pseudo-vertex set to the minimal set that keeps the graph consistent with all indirect experimental observations.

[Figure: two pseudo-vertices u and v with in(u) = in(v) and out(u) = out(v) are merged into one new pseudo-vertex]

Illustration of Binary Transitive Reduction (BTR)

[Figure: remove this edge? yes, there is an alternate path; remove this edge? no, it is a critical edge]

Intuitively, the BTR problem is useful for determining the sparsest graph consistent with a set of experimental observations

Some biologists did look at very simplified or somewhat different versions of BTR, e.g.:

A. Wagner, Estimating Coarse Gene Network Structure from Large-Scale Gene Perturbation Data, Genome Research, 12, pp. 309-315, 2002
  too special (reachability only), no efficient algorithms reported

T. Chen, V. Filkov and S. Skiena, Identifying Gene Regulatory Networks from Experimental Data, Third Annual International Conference on Computational Molecular Biology, pp. 94-103, 1999
  excess edge deletion problem, biologically too restrictive version

See the following excellent surveys for more comprehensive information about biological network inference and modeling:

V. Filkov, Identifying Gene Regulatory Networks from Gene Expression Data, in Handbook of Computational Molecular Biology (edited by S. Aluru), Chapman & Hall/CRC Press, 2005

H. de Jong, Modelling and Simulation of Genetic Regulatory Systems: A Literature Review, Journal of Computational Biology, Volume 9, Number 1, pp. 67-103, 2002

High level description of the network synthesis process

[Flowchart: Synthesize direct interactions → Optimize (BTR) → Synthesize double-causal interactions → Optimize (PVC, BTR), with interaction with biologists]

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007

excitatory (inhibitory) connection encoded by edge label 0 (1)

[encode single causal relationships]
  1.1 Build networks for connections like A → B and A ⊣ B, noting each critical edge.
  1.2 Apply BTR

[encode double causal relationships]
  2.1 For each double causal relationship of the form A (B C) with edge labels x, y ∈ {0,1}, add new nodes and/or edges as follows:
    if B C ∈ Ecritical then add A (B C)
    if no subgraph of the form [figure] (for some node D with b = a+b = y (mod 2)) then add the subgraph [figure] (where P is a new pseudo-node and b = a+b = y (mod 2))
  2.2 Apply PVC

[final reduction] Apply BTR

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007

All the steps in the network synthesis procedure except the steps that involve BTR can be done easily

Thus, it behooves us to look at BTR more closely

But, before that, biological validation of the network synthesis approach is desirable

Need a network that uses double-causal experimental evidence

Plant signal transduction network

consistent guard cell signal transduction network for ABA-induced stomatal closure
  manually curated
  described in S. Li, S. M. Assmann and R. Albert, Predicting Essential Components of Signal Transduction Networks: A Dynamic Model of Guard Cell Abscisic Acid Signaling, PLoS Biology, 4(10), October 2006

list of experimentally observed causal relationships collected by Li et al. and published as Table S1. This table contains around 140 interactions and causal inferences, both of type "A promotes B" and "C promotes process (A promotes B)"

We augment this list with critical edges drawn from biophysical/biochemical knowledge on enzymatic reactions and ion flows and with simplifying hypotheses made by Li et al., both described in Text S1

We also formalized an additional rule specific to the context of this network (and implicitly assumed by Li et al.) regarding enzyme-catalyzed reactions

[Table: Regulatory interactions between ABA signal transduction pathway components]

[Table (continued): Regulatory interactions between ABA signal transduction pathway components; legible annotations: NO GC: not critical and not enzymatic; ERA1 (ABA CalM)]

Some nodes in the network
  GCR1: putative G protein coupled receptor
  OST1: protein
  NO: nitric oxide
  ABH1: RNA cap-binding protein
  RAC1: small GTPase protein

(left) Guard cell signal transduction network for ABA-induced stomatal closure manually curated by Li, Assmann and Albert [source: PLoS Biology, 4 (10), 2006].

(right) our automated network synthesis procedure produced a reduced (fewer edges) network while preserving all observed pathways

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007

Summary of comparison of the two networks

The Li et al. network has 54 vertices and 92 edges; our network has 57 vertices but 84 edges
Both networks have an identical strongly connected component of vertices
All the paths present in Li et al.'s reconstruction are present in our network as well
The two networks have 71 common edges
It took a few seconds to synthesize our network

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007

Summary of comparison of the two networks (continued)

Thus the two networks are highly similar but diverge on a few edges.

None of these discrepancies is due to algorithmic deficiencies; they all stem from human decisions.

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007

Software is available at:

http://www.cs.uic.edu/~dasgupta/network-synthesis/

runs on any machine with MS Windows (Win32): click, save the executable and run

Data sources for this type of network synthesis

Signal transduction pathway repositories such as TRANSPATH (http://www.gene-regulation.com/pub/databases.html#transpath) and protein interaction databases such as the Search Tool for the Retrieval of Interacting Proteins (http://string.embl.de) contain up to thousands of interactions, a large number of which are not supported by direct physical evidence.

NET-SYNTHESIS can be used to filter redundant information while keeping all direct interactions

Transitive reduction step used a heuristic

How good is the heuristic in general?

Performance of our BTR algorithm on random signal transduction networks

But, what is a random biological network?

Biological networks are scale-free, e.g.:

N. Guelzim, S. Bottani, P. Bourgine, and F. Kepes, Topological and causal structure of the yeast transcriptional regulatory network, Nature Genetics 31, 60-63, 2002

Biological networks are NOT scale-free, e.g.:

R. Khanin and E. Wit, How Scale-Free Are Biological Networks?, Journal of Computational Biology, 13 (3), 810-818, 2006

So, we decided to look at the literature ourselves and decide on a reasonable model for random signal transduction networks

According to us, random signal transduction networks have the following properties:
  distribution of in-degree of the network is exponential: Pr[in-degree = x] = L e^(-Lx) [range of L not legible]
  maximum in-degree is 12
  distribution of out-degree is governed by a power law: for x ≥ 1, Pr[out-degree = x] = c x^(-c); Pr[out-degree = 0] [relation not legible] c, with 2 < c < 3
  maximum out-degree is 200
  ratio of excitatory to inhibitory edges is between 2 and 4

random graphs with prescribed degree distributions are generated using the procedure described in: M. E. J. Newman, S. H. Strogatz and D. J. Watts. Random graphs with arbitrary degree distributions and their applications, Physical Review E, 64 (2), 026118-026134, 2001

What percentage of edges should be critical (known direct interactions)?

No known accurate estimates:
  the curated network of Ma'ayan et al. (Science, 2005) is expected to have close to 100% critical edges, as they specifically focused on collecting direct interactions only
  protein interaction networks are expected to be mostly critical (Giot et al., Science, 2003; Han et al., Nature, 2004; Li et al., Science, 2004)
  genetic interactions (e.g., synthetic lethal interactions) represent compensatory relationships; only a minority are direct interactions
  reverse engineering approaches lead to networks whose interactions are close to 0% critical

We tried a few small and large values, such as 1%, 2% and 50%, for the percentage of edges that are critical, to capture qualitatively all regions of dynamics of the network that are of interest

Tested on about 550 random networks

# of vertices in the range of about 100 to 1000

running time for individual networks: seconds to at most a minute

Verify the robustness of performance of our BTR algorithm

perturb the networks in ways that do not change the optimal solution of the original graph

The solution quality almost never changes because of this

Performance of our implemented algorithm for BTR on random networks

On average, we use about 5.5% more edges than the optimum

Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007

Other applications of NET-SYNTHESIS

Synthesizing a Network for T Cell Survival and Death in LGL Leukemia

Background

Large Granular Lymphocytes (LGL)
  medium to large size cells with eccentric nuclei and abundant cytoplasm
  comprise 10%~15% of the total peripheral blood mononuclear cells
  two major lineages
    CD3- natural-killer (NK) cell lineage: ~85% of LGL cells
    CD3+ lineage: ~15% of LGL

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008

LGL leukemia
  disordered clonal expansion of LGL and their invasion of the marrow, spleen and liver

Background (continued)

Ras: small GTPase essential for controlling multiple essential signaling pathways
  its deregulation is frequently seen in human cancers

Activation of H-Ras requires its farnesylation, which can be blocked by farnesyltransferase inhibitors (FTIs)

This makes FTIs candidate drugs for future anti-cancer therapies, and several FTIs have entered early phase clinical trials

This observation, together with the finding that Ras is constitutively activated in leukemic LGL cells, leads to the hypothesis that Ras plays an important role in LGL leukemia, and may function by influencing the Fas/FasL pathway.

we constructed the cell-survival/cell-death regulation-related signaling network, with special interest in the effect of Ras on the apoptosis response through the Fas/FasL pathway

Goal: initiate understanding of the interactions between the Ras pathway and the Fas/FasL pathway, two of the major pathways that regulate the cell survival/death decision.

Currently, there is no standard therapy for LGL leukemia. Understanding the mechanism of this disease is crucial for drug/therapy development

Proteins that modulate the Ras-apoptosis response can potentially serve as future reference for drug design and therapeutic-target-molecule search, and this may not be restricted to LGL leukemia

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008

Synthesizing a Network for T Cell Survival and Death in Large Granular Lymphocyte Leukemia

Synthesized a cell-survival/cell-death regulation-related signaling network from the TRANSPATH 6.0 database, with additional information manually curated from literature search

359 vertices of this network represent proteins/protein families and mRNAs participating in pro-survival and Fas-induced apoptosis pathways

1295 edges represent regulatory relationships between nodes, including protein interactions, catalytic reactions, transcriptional regulation (no double-causal interactions were known)

Performing BTR with NET-SYNTHESIS reduced the total number of edges to 873

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008

To focus on pathways that involve the 33 known T-LGL deregulated proteins, we designated vertices that correspond to proteins with no evidence of being changed during T-LGL as pseudo-vertices and deleted the label Y for those edges both of whose endpoints were pseudo-vertices

Recursively performing the "Reduction (faster)" (BTR) and "Collapse degree-2 pseudonodes" operations of NET-SYNTHESIS until no edge/node could be further removed simplified the network to 267 nodes and 751 edges.

Kachalo, Zhang, Sontag, Albert, DasGupta, 2008

For further results, see

R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun, R. Albert, and T. P. Loughran, Network Model of Survival Signaling in LGL Leukemia, PNAS, 2008

Binary transitive reduction raises two further interesting questions:

how redundant are biological networks?
  what is redundancy and how to measure it?
  percentage of edges removed by binary transitive reduction (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)

are redundancy and dynamical properties correlated?

Feedback loops and dynamics of biological networks

analyzing behaviors of feedback loops is a long-standing topic in the context of regulation, metabolism, and development

e.g., see classical reference works such as

J. Monod and F. Jacob, General conclusions: teleonomic mechanisms in cellular metabolism, growth, and differentiation, Cold Spring Harbor Symp. Quant. Biol., 26, 389-401, 1961

Monotone dynamical system

Monotone systems are systems with simpler behavior:

pathological behavior (chaos) is ruled out

even though they may have arbitrarily large dimensionality, monotone systems behave in many ways like one-dimensional systems

e.g., in monotone systems
  bounded trajectories generically converge to steady states
  there are no stable oscillatory behaviors

Associated Signal Transduction Network

[Figure: network on vertices v1, ..., vi, vj, vk, ..., vn]

[Figure: two signed graphs, one sign-consistent and one sign-inconsistent]

parity: product of signs

sign-consistent: every undirected path between the same two nodes has the same parity (check undirected paths 1-4 and 1-2-3-4)

sign-consistent networks are monotone systems

This allows us to define the degree of monotonicity M of a differential equation system in the following way:

minimum percentage of edges we need to delete to make the associated signal transduction network sign-consistent (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
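Checking whether a given signed network is already sign-consistent is easy, in contrast to computing M. The sketch below is an illustration rather than the authors' code. It assumes edges are given as undirected triples (u, v, h) with h = 0 for a positive edge and h = 1 for a negative edge, and it propagates a 0/1 vertex label by breadth-first search; the network is sign-consistent exactly when no edge contradicts the propagated labels. The function name is hypothetical.

```python
from collections import deque


def is_sign_consistent(edges):
    """True if vertex labels f exist with h(u,v) = f(u) + f(v) (mod 2) on every edge."""
    adjacency = {}
    for u, v, h in edges:
        adjacency.setdefault(u, []).append((v, h))
        adjacency.setdefault(v, []).append((u, h))
    label = {}
    for start in adjacency:
        if start in label:
            continue
        label[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, h in adjacency[u]:
                wanted = (label[u] + h) % 2
                if v not in label:
                    label[v] = wanted
                    queue.append(v)
                elif label[v] != wanted:
                    return False  # two paths to v with different parity
    return True


all_positive = [("1", "2", 0), ("2", "3", 0), ("1", "3", 0)]
one_negative = [("1", "2", 1), ("2", "3", 0), ("1", "3", 0)]
two_negative = [("1", "2", 1), ("2", "3", 1), ("1", "3", 0)]
print(is_sign_consistent(all_positive),
      is_sign_consistent(one_negative),
      is_sign_consistent(two_negative))
```

A cycle with an odd number of negative edges is what breaks consistency, which is why the second triangle fails and the third passes.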

Undirected Labeling Problem (ULP)
needed to compute degree of monotonicity M

Given: undirected graph G = (V, E), edge labeling function h: E → {0,1}

Valid solution: a vertex labeling function f: V → {0,1}

Definition: an edge {u,v} ∈ E is consistent if h(u,v) = f(u) + f(v) (mod 2)

Goal: maximize number of consistent edges

Bad news: NP-hard and even MAX-SNP-hard.
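For very small graphs the ULP can be solved exactly by trying every vertex labeling, which also yields the degree of monotonicity M as the percentage of edges left inconsistent by the best labeling. The sketch below is only meant to make the definitions concrete; it is exponential in |V|, in line with the hardness result, and the function names are hypothetical.

```python
from itertools import product


def solve_ulp(vertices, edges):
    """Exhaustively find a labeling f: V -> {0,1} maximizing consistent edges."""
    best_score, best_labeling = -1, None
    for bits in product((0, 1), repeat=len(vertices)):
        f = dict(zip(vertices, bits))
        score = sum(1 for u, v, h in edges if (f[u] + f[v]) % 2 == h)
        if score > best_score:
            best_score, best_labeling = score, f
    return best_score, best_labeling


def degree_of_monotonicity(vertices, edges):
    """Minimum percentage of edges to delete to reach sign-consistency."""
    best_score, _ = solve_ulp(vertices, edges)
    return 100.0 * (len(edges) - best_score) / len(edges)


triangle = [("a", "b", 1), ("b", "c", 0), ("a", "c", 0)]
square = [("a", "b", 1), ("b", "c", 1), ("c", "d", 0), ("a", "d", 0)]

triangle_score, _ = solve_ulp(["a", "b", "c"], triangle)
print(triangle_score,
      degree_of_monotonicity(["a", "b", "c"], triangle),
      degree_of_monotonicity(["a", "b", "c", "d"], square))
```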

DasGupta, Enciso, Sontag, Zhang, 2007

Algorithm for ULP

Solve the following vector program via semidefinite programming methods:
  maximize [objective function not legible]
  subject to: for each v ∈ V, xv · xv = 1
              for each v ∈ V, xv ∈ R^|V|

Select a uniformly random vector r in the |V|-dimensional unit sphere

Label each vertex v as 0 if r · xv ≥ 0, and as 1 otherwise

It can be easily implemented in MATLAB

DasGupta, Enciso, Sontag, Zhang, 2007
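The objective of the vector program is not preserved above. A plausible reconstruction, offered as an assumption rather than as the authors' exact formulation, is the standard Goemans-Williamson style relaxation in which an edge labeled 1 is rewarded when its endpoint vectors point apart and an edge labeled 0 when they point the same way:

```latex
\text{maximize}\quad
\sum_{\{u,v\}\in E,\; h(u,v)=1} \frac{1 - x_u \cdot x_v}{2}
\;+\;
\sum_{\{u,v\}\in E,\; h(u,v)=0} \frac{1 + x_u \cdot x_v}{2}
\qquad
\text{subject to}\quad x_v \cdot x_v = 1,\;\; x_v \in \mathbb{R}^{|V|}
\;\;\text{for each } v \in V
```

Under this reading, an integral solution with x_v = ±1 along a fixed axis contributes exactly 1 for each consistent edge and 0 otherwise, and the random-hyperplane rounding described above separates the endpoints of an edge with probability arccos(x_u · x_v)/π.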

We have two measurable properties:

(topological) redundancy R: percentage of edges removed by binary transitive reduction

(dynamical) monotonicity M: minimum percentage of edges we need to delete to make the associated signal transduction network sign-consistent

M is negatively correlated to R (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)

Some other conclusions from (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)

the redundancy measure R is statistically significant

transcriptional networks are less redundant than signaling networks

redundancy of the C. elegans metabolic network is largely due to currency metabolites

calculation of redundancy values and minimal networks provides a way to gain insight into the predicted orientation of protein-protein-interaction (PPI) networks

Future Research Questions in the context of parallel and distributed computing

Synchronization: no global clocks are known to exist for cellular processes (ignoring circadian rhythms and some other global timing mechanisms in higher organisms)

Spatial effects: localization (nuclear, cytoplasmic, membrane-bound) in cells
  akin to geographical location affecting communication speeds and coordination in distributed computing

List of some relevant references

R. Albert, B. DasGupta, et al. A New Computationally Efficient Measure of Topological Redundancy of Biological and Social Networks, Physical Review E, 84 (3), 036117, 2011.

B. DasGupta, P. Vera-Licona, E. Sontag. Reverse Engineering of Molecular Networks from a Common Combinatorial Approach, in Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, John Wiley & Sons, Inc., 2011.

R. Albert, B. DasGupta, E. Sontag. Inference of signal transduction networks from double causal evidence, in Methods in Molecular Biology: Topics in Computational Biology, D. Fenyo (editor), Springer , 2010.

P. Berman, B. DasGupta, M. Karpinski. Approximating Transitive Reduction Problems for Directed Networks, 11th Algorithms and Data Structures Symposium, 2009.

R. Albert, B. DasGupta, R. Dondi, E. Sontag. Inferring (Biological) Signal Transduction Networks via Transitive Reductions of Directed Graphs, Algorithmica, 51 (2), 129-159, 2008.

S. Kachalo, R. Zhang, E. Sontag, R. Albert, B. DasGupta. NET-SYNTHESIS: A software for synthesis, inference and simplification of signal transduction networks, Bioinformatics, 24 (2), 293-295, 2008.

P. Berman, B. DasGupta, E. Sontag. Algorithmic Issues in Reverse Engineering of Protein and Gene Networks via the Modular Response Analysis Method, Annals of the New York Academy of Sciences, 2007.

R. Albert, B. DasGupta, et al. A Novel Method for Signal Transduction Network Inference from Indirect Experimental Evidence, Journal of Computational Biology, 14 (7), 927-949, 2007.

B. DasGupta, G. A. Enciso, E. Sontag, Y. Zhang. Algorithmic and Complexity Results for Decompositions of Biological Networks into Monotone Subsystems, Biosystems, 90 (1), 161-178, 2007.

P. Berman, B. DasGupta, E. Sontag. Computational Complexities of Combinatorial Problems With Applications to Reverse Engineering of Biological Networks, in Advances in Computational Intelligence: Theory and Applications, F.-Y. Wang and D. Liu (editors), Series in Intelligent Control and Intelligent Automation, World Scientific publishers, 303-316, 2007.

P. Berman, B. DasGupta, E. Sontag. Randomized Approximation Algorithms for Set Multicover Problems with Applications to Reverse Engineering of Protein and Gene Networks, Discrete Applied Mathematics, 155 (6-7), 733-749, 2007.

Acknowledgments

Thanks to research collaborators for these projects

R. Albert (Penn State), P. Berman (Penn State), R. Dondi (U. of Bergamo), G. Enciso (UC Irvine), A. Gitter (CMU), G. Gürsoy (UIC), R. Hegde (UIC), S. Kachalo (UIC), M. Karpinski (Bonn), P. Pal, G. S. Sivanathan (UIC), E. Sontag (Rutgers), P. Vera-Licona (INRIA), K. Westbrooks (GSU), A. Zelikovsky (GSU), R. Zhang (Penn State), Y. Zhang (UIC)

Thanks to National Science Foundation (NSF) for funding:

DBI-1062328, IIS-1064681, IIS-0346973, DBI-0543365, IIS-0610244, CCR-9800086, CNS-0206795, CCF-0208749

Thanks to generous support from DIMACS (Rutgers) during my sabbatical leave through their special focus on computational and mathematical epidemiology

Thank you for your attention! Questions?

[Chart: frequency of occurrence versus % additional edges = ( ( |E'| / OPT ) - 1 ) * 100]