Models and Algorithmic Tools for Computational Processes in Cellular Biology
Bhaskar DasGupta
Department of Computer Science
University of Illinois at Chicago
Chicago, IL 60607-7053
ISBRA 2012
What is “systems biology” in one sentence?
The study that seeks to unravel and conceptualize the dynamic processes, feedback control loops and signal processing mechanisms underlying life
Cellular Networks
• A single cell by itself is complex enough
• Various technologies have facilitated the monitoring of expression of genes and activities of proteins
• Difficult to find the causal relations and overall structure of the network
http://www.nyas.org/ebriefreps/ebrief/000534/images/mendes2.gif
Cellular Networks
Genes and gene products interact on several levels, e.g.:
• Genes regulate each other’s expression as part of gene regulatory networks
– transcription factors can activate or inhibit the transcription of genes to give mRNAs
– these transcription factors are themselves products of genes
• Protein-protein interaction networks
– proteins can participate in diverse post-translational interactions that lead to modified protein functions or to formation of protein complexes that have new roles
• Different levels of interactions are integrated
– e.g., presence of an external signal triggers a cascade of interactions that involves biochemical reactions, protein-protein interactions and transcriptional regulation
Cellular networks
• cellular interaction maps only represent a network of possibilities, and not all edges are present and active in vivo in a given condition or in a given cellular location
• only an integration of time-dependent interaction and activity information will be able to give the correct dynamical picture of a cellular network
Modeling problem
• interaction data produced by the biologist in the form of a diagram (e.g., some type of labeled digraph)
• wish to pose questions about the behavior (dynamics) of such a network
– essential to provide a precise mathematical formulation of its dynamics, and specifically how the state of each node depends on the state of the nodes interacting with it
Models
• discrete, continuous and hybrid models
• their inter-relationships, powers and limitations
• computational complexity and algorithmic issues
• biological implications and validations
• fascinating interplay between several areas such as:
– biology
– control theory
– discrete mathematics and computer science
System dynamics
• state variables
– continuous
– discrete (e.g., small number of quantitative states)
• time variables
– continuous (e.g., partial differential equation, delay equations)
– discrete (difference equations, quantized descriptions of continuous variables)
• deterministic or probabilistic nature of the model
• hybrid models
– combines continuous and discrete time-scales and/or
– combines continuous and discrete time variables
Continuous-state dynamics
Differential equation (continuous-time)
Difference equation (discrete-time)
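The two kinds of continuous-state dynamics named above can be contrasted with a minimal sketch. The function names and the toy decay dynamics are illustrative assumptions, not taken from the slides; the continuous-time system is approximated with a simple forward Euler scheme, which is itself a difference equation.

```python
def euler_trajectory(f, x0, dt, steps):
    """Approximate dx/dt = f(x) by forward Euler steps of size dt (assumed scheme)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * f(xs[-1]))
    return xs


def difference_trajectory(g, x0, steps):
    """Iterate the discrete-time system x(t+1) = g(x(t))."""
    xs = [x0]
    for _ in range(steps):
        xs.append(g(xs[-1]))
    return xs


# Hypothetical toy dynamics: exponential decay in both settings.
continuous = euler_trajectory(lambda x: -x, x0=1.0, dt=0.5, steps=2)
discrete = difference_trajectory(lambda x: 0.5 * x, x0=1.0, steps=2)
print(continuous, discrete)
```

With these particular choices the Euler update x + dt·(-x) coincides with the map x ↦ 0.5·x, so both trajectories agree.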
Examples of other models
Boolean
x1, x2, x3 ∈ {0,1}
Boolean feedforward
Signal transduction
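As an illustration of the Boolean model with x1, x2, x3 ∈ {0,1}, the following sketch performs synchronous updates of a three-variable Boolean network. The update rules are hypothetical and chosen only for demonstration; they are not the rules shown on the original slide.

```python
def boolean_step(state, rules):
    """Synchronously update every variable from the current state."""
    return {name: int(bool(rule(state))) for name, rule in rules.items()}


# Hypothetical update rules for x1, x2, x3 (assumptions for illustration).
rules = {
    "x1": lambda s: s["x3"],
    "x2": lambda s: s["x1"] and not s["x3"],
    "x3": lambda s: s["x1"] or s["x2"],
}

state = {"x1": 1, "x2": 0, "x3": 0}
for _ in range(3):
    state = boolean_step(state, rules)
    print(state)
```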
Reverse engineering of models
Given
– partial knowledge about the process/network
– access to suitable biological experiments
How to gain more knowledge about the model?
– effective use of resources (time, cost)
Reverse engineering
Process of backward reasoning, requiring careful observation of inputs and outputs, to elucidate the structure of the system
http://www.computerworld.com/computerworld/records/images/story/46Reverse-engineering.gif
Ingredients for reverse engineering
• Mathematical model to be reverse engineered
– e.g., differential equation model
• Biological experiments available, e.g.,
– perturbation experiments
– gene expression measurements
Many reverse engineering approaches are possible
I will discuss two types of approaches:
– “hitting set” based combinatorial approaches
– modular response analysis (MRA) approach
Reverse Engineering of Networks Via
Modular Response Analysis Method
Ingredients for reverse engineering via modular response analysis approach
• Mathematical models
– differential equation model
• Biological experiments available
– perturbation experiments
Differential Equation Model
state variables evolve by (unknown) ordinary differential equations
\[ \frac{dx_1}{dt} = f_1(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m) \]
\[ \vdots \]
\[ \frac{dx_n}{dt} = f_n(x_1, x_2, \ldots, x_n, p_1, p_2, \ldots, p_m) \]
or, in vector form,
\[ \frac{dx}{dt} = f(x, p) \]
x = (x1(t),...,xn(t)): state variables over time t, measurable (e.g., activity levels of proteins)
p = (p1,...,pm): parameters that can be manipulated
f(x*,p*) = 0, where p* is the “wild-type” (i.e., normal) condition of p and x* is the corresponding steady-state condition
settings for modular response analysis method
– do not know f
– but, prior information of the following type is available
• whether or not parameter pj directly affects variable xi
(i.e., whether ∂fi/∂pj ≡ 0 or not)
Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002
Experimental protocols (perturbation experiments)
• perturb one parameter, say pj
• for perturbed p, measure steady state vector x = ξ(p)
– let the system relax to steady state
– measure xi (western blots, microarrays etc.)
• estimate n “sensitivities”:
\[ b_{ij} = \frac{\partial \xi_i}{\partial p_j}(p^*) \approx \frac{1}{p_j - p_j^*}\Bigl( \xi_i\bigl(p^* + (p_j - p_j^*)\, e_j\bigr) - \xi_i(p^*) \Bigr) \quad \text{for } i = 1, 2, \ldots, n \]
where ej is the jth canonical basis vector
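The perturbation protocol can be mimicked numerically with a small sketch: relax a system to steady state at the wild-type parameters, perturb one parameter, relax again, and form the finite-difference sensitivities. The two-variable system `toy_f`, the Euler relaxation and all function names are assumptions made for illustration; in the experimental setting f is unknown and the steady states come from measurements.

```python
def relax_to_steady_state(f, x0, p, dt=0.01, steps=5000):
    """Crude stand-in for 'let the system relax': forward Euler on dx/dt = f(x, p)."""
    x = list(x0)
    for _ in range(steps):
        dx = f(x, p)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


def estimate_sensitivities(f, x0, p_star, j, delta):
    """Finite-difference estimate of b_ij for all i, after perturbing parameter j by delta."""
    base = relax_to_steady_state(f, x0, p_star)
    perturbed_p = list(p_star)
    perturbed_p[j] += delta
    perturbed = relax_to_steady_state(f, x0, perturbed_p)
    return [(xp - xb) / delta for xp, xb in zip(perturbed, base)]


def toy_f(x, p):
    """Hypothetical system: x1 is produced at rate p1, x2 is produced from x1 at rate p2."""
    return [p[0] - x[0], p[1] * x[0] - x[1]]


print(estimate_sensitivities(toy_f, [0.0, 0.0], [2.0, 3.0], j=0, delta=0.1))
print(estimate_sensitivities(toy_f, [0.0, 0.0], [2.0, 3.0], j=1, delta=0.1))
```

For this toy system the steady state is x1 = p1, x2 = p1·p2, so perturbing p2 leaves x1 unchanged, which is the kind of "pj does not affect xi" information the method assumes as prior knowledge.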
Modeling Goal
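Modeling goal can be at different levels:
1. Topology of connections only
2. Direction of the relationship
3. Information about stimulatory or inhibitory effects
4. Strength of relationship
[Figure: an example network on nodes A, B, C and D, drawn once per level: undirected edges, directed edges, edges with signs (+/-), and edges with numeric strengths]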
Goal of MRA approach
Obtain information about the sign of ∂fi/∂xj(x,p)
e.g., if ∂fi/∂xj > 0, then xj has a positive (catalytic) effect on the formation of xi
In a nutshell
after some combinatorics and linear algebra
one can quantify the additional prior knowledge necessary to reach the goal
Kholodenko, Kiyatkin, Bruggeman, Sontag, Westerhoff and Hoek, PNAS, 2002
Berman, DasGupta and Sontag, Discrete Applied Math, 2007
Berman, DasGupta and Sontag, Annals of NYAS, 2007
But, assuming (near)-sufficient prior information
• how to determine a minimum or near-minimum number of perturbation experiments that will work?
This now becomes an algorithmic/complexity issue...
After some effort, one can see that
designing minimal sets of experiments
leads to
the set multi-cover problem
In our biological application context,
our set-multicover algorithm provides a set of suggested experiments such that
# of experiments ≈ minimum possible
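To make the set multi-cover problem concrete, here is a minimal greedy sketch: every element of a universe must be covered by at least r of the chosen sets, and each set may be chosen once. This greedy rule is a textbook heuristic used here only for illustration; it is not the randomized algorithm referred to in the talk, and the names `greedy_set_multicover`, `p1`...`p4` are hypothetical.

```python
def greedy_set_multicover(universe, sets, r):
    """Choose set names so that every element lies in at least r chosen sets."""
    need = {element: r for element in universe}  # remaining coverage demand
    unused = dict(sets)                          # each set may be used once
    chosen = []
    while any(count > 0 for count in need.values()):
        def gain(name):
            # Number of still-deficient elements this set would help.
            return sum(1 for e in unused[name] if need.get(e, 0) > 0)

        best = max(unused, key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("no feasible multicover with the given sets")
        for e in unused.pop(best):
            if need.get(e, 0) > 0:
                need[e] -= 1
        chosen.append(best)
    return chosen


# Hypothetical instance: sets play the role of candidate perturbation experiments.
experiments = {"p1": {1, 2}, "p2": {2, 3}, "p3": {3, 4}, "p4": {1, 4}}
print(greedy_set_multicover({1, 2, 3, 4}, experiments, r=1))
```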
Overall high-level picture:
Modular Response Analysis for differential equations model → linear algebraic formulation → combinatorial formulation → combinatorial algorithms (randomized) → selection of appropriate perturbation experiments
Experimental validation of MRA method
See the paper:
S. D. M. Santos, P. J. Verveer, P. I. H. Bastiaens,
Growth factor-induced MAPK network topology shapes Erk response determining PC-12 cell fate
Nature Cell Biology 9, 324 - 330 (2007)
• MAPK pathway involving proteins Raf, Mek and Erk is activated through receptor tyrosine kinases TrkA and epidermal growth factor receptor (EGFR) by two different stimuli, NGF (nerve growth factor) or EGF (epidermal growth factor)
• MRA method was applied to determine the MAPK network architecture in the context of NGF and EGF stimulations
Reverse Engineering of Networks Via
Hitting-set based (combinatorial) Method
“Hitting set” based combinatorial approaches
[Diagram: two kinds of input data, (i) steady state profiles of perturbations of the network and (ii) expression data representing state transition measurements for wild-type and perturbation data; each is processed as hitting set → introduce redundancy → multi-hitting set → topology of interconnection network]
Basic idea behind the hitting-set based approaches
Which variables influence x5?
When x5 changes, so do x1, x3 and x4
→ at least one of {x1,x3,x4} must influence x5
Build dependency information over all successive time steps:
{x1,x3,x4}, {x1}, {x1,x3}, {x2,x3,x4}, {x1}
Minimal dependency (hitting set problem):
{x1,x2}
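The minimal dependency in this example can be reproduced with a brute-force minimum hitting set search. This exhaustive sketch is only practical for tiny instances and is an illustration, not the approximation algorithms used in the published methods; the function name is hypothetical.

```python
from itertools import combinations


def minimum_hitting_set(sets):
    """Return a smallest set of variables that intersects every given set."""
    universe = sorted(set().union(*sets))
    for size in range(len(universe) + 1):
        for candidate in combinations(universe, size):
            if all(set(candidate) & s for s in sets):
                return set(candidate)
    return None  # only reached if some input set is empty


# Dependency sets collected over successive time steps, as in the example above.
dependencies = [
    {"x1", "x3", "x4"},
    {"x1"},
    {"x1", "x3"},
    {"x2", "x3", "x4"},
    {"x1"},
]
print(minimum_hitting_set(dependencies))
```

Candidates are tried in sorted order, so ties between equally small hitting sets (here {x1,x2}, {x1,x3} and {x1,x4}) are broken lexicographically.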
Why construct “minimal” dependency ?
Occam's razor
entia non sunt multiplicanda praeter necessitatem
(entities must not be multiplied beyond necessity)
However, biological networks may be redundant:
e.g.
– G. Tononi, O. Sporns, G. M. Edelman, PNAS, 1999
– R. Albert et al., Physical Review E, 2011
How can we introduce redundancy if necessary ?
How can we introduce redundancy if necessary ?
First idea: add random extra dependencies (edges)
not good, these edges may not be supported by given data
Better idea: modify hitting set to “multi-hitting set”
{x1,x3,x4} previously: select at least 1
now: select at least 2
(in general, some r)
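A multi-hitting set requirement (select at least r elements from each set) can be sketched with a simple greedy rule. Two assumptions are made here for illustration: a set with fewer than r elements only needs all of its elements selected, and the greedy choice (pick the variable that helps the most still-deficient sets) stands in for whatever approximation a real implementation would use.

```python
def greedy_multi_hitting_set(sets, r):
    """Greedily select variables until each set contains min(r, |set|) selected ones."""
    need = [min(r, len(s)) for s in sets]
    chosen = set()
    while True:
        deficits = [need[i] - len(chosen & s) for i, s in enumerate(sets)]
        if all(d <= 0 for d in deficits):
            return chosen
        candidates = sorted(set().union(*sets) - chosen)
        # Pick the variable contained in the largest number of deficient sets.
        best = max(
            candidates,
            key=lambda v: sum(1 for s, d in zip(sets, deficits) if d > 0 and v in s),
        )
        chosen.add(best)


dependencies = [
    {"x1", "x3", "x4"},
    {"x1"},
    {"x1", "x3"},
    {"x2", "x3", "x4"},
    {"x1"},
]
print(greedy_multi_hitting_set(dependencies, r=2))
```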
Evaluation of performance of reverse engineering methods
Reverse-engineering problems are ill-posed, i.e., their solution is not unique, because of
– existence of measurement error
– not all molecular species involved in a given analyzed phenomenon being included in the construction of a network (i.e., existence of hidden variables)
Two possible ways for evaluation:
Experimental testing of predictions:
after a model has been inferred, newly found interactions or predictions can be tested experimentally
Benchmarking testing:
measure how “accurate” the method of our interest is in recovering a known (“gold standard”) network
Evaluation of performance of reverse engineering methods
Metrics for accuracy for benchmark testing
Measurements:
– correct interactions inferred (true positives, TP)
– incorrect interactions inferred (false positives, FP)
– correct non-interactions inferred (true negatives, TN)
– incorrect non-interactions inferred (false negatives, FN)
Metrics:
– recall or true positive rate: \( TPR = \frac{TP}{TP + FN} \)
– false positive rate: \( FPR = \frac{FP}{FP + TN} \)
– accuracy: \( ACC = \frac{TP + TN}{\text{total possible interactions}} \)
– precision or positive predictive value: \( PPV = \frac{TP}{FP + TP} \)
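These benchmark metrics can be computed from a gold-standard edge set and an inferred edge set as in the sketch below. It assumes directed interactions without self-loops, so that the number of possible interactions is n(n-1); that convention and the function name are assumptions, and a different convention would change TN and ACC.

```python
def benchmark_metrics(nodes, gold_edges, predicted_edges):
    """Compute TPR, FPR, PPV and ACC for an inferred network against a gold standard."""
    gold, predicted = set(gold_edges), set(predicted_edges)
    total_possible = len(nodes) * (len(nodes) - 1)  # directed, no self-loops (assumed)
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = total_possible - tp - fp - fn

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "PPV": ratio(tp, tp + fp),
        "ACC": ratio(tp + tn, total_possible),
    }


# Hypothetical three-node example.
print(benchmark_metrics(["A", "B", "C"], {("A", "B"), ("B", "C")}, {("A", "B"), ("C", "A")}))
```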
Two published methods based on the hitting set approach
(A) Ideker, Thorsson, Karp, PSB (2000)
First step (network inference):
estimate a set of Boolean networks consistent with an observed set of steady-state gene expression profiles, each generated from a different perturbation to the genetic network
Second step (optimization):
use an entropy-based approach to select an additional perturbation experiment to perform a model selection from the set of predicted Boolean networks
(B) Jarrah, Laubenbacher, Stigler, Stillman, Adv. in Applied Mathematics (2007)
Attempts to infer the most likely causal relationships among network elements from gene expression data
For other published results, see, for example:
Krupa, Journal of Theoretical Biology (2002)
Comparative analysis (via benchmark testing) of two approaches by
(A) Ideker, Thorsson, Karp
(B) Jarrah, Laubenbacher, Stigler, Stillman
Two gold standard networks:
a. Segment polarity network of Drosophila melanogaster (fruit fly):
– last step in the hierarchical cascade of gene families initiating the segmented body of the fruit fly
– genes of this network include engrailed (en), wingless (wg), hedgehog (hh), patched (ptc), cubitus interruptus (ci) and sloppy paired (slp), coding for the corresponding proteins
– 1 para-segment of 4 cells
– 60 nodes: variables are expression levels of segment polarity genes/proteins
– Boolean model from (Albert and Othmer, Journal of Theoretical Biology, 2003)
DasGupta, Vera-Licona, Sontag, 2011
b. In silico network: gene regulatory network with external perturbations
– 13 species: 10 genes plus 3 different environmental perturbations
– perturbations affect the transcription rate of the gene on which they act directly (through inhibition or activation) and their effect is propagated throughout the network by the interactions between the genes
– generated using the software package in (Mendes, Trends Biochem. Sci, 1997)
DasGupta, Vera-Licona, Sontag, 2011
Generated time courses for both networks (a) and (b)
For method (A), we considered both greedy and linear programming based approximations to the hitting set problem, as well as redundancy values R = 1, 2
For method (B), input data must be discrete; we used three discretization methods:
• graph-theoretic based approach “D” from (Dimitrova, Garcia-Puente, Jarrah, Laubenbacher, Stigler, Stillman, Vera-Licona, 2010)
• quantile “Q” discretization (method in which each variable state receives an equal number of data values)
• interval “I” discretization (select thresholds for the different discrete values)
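The quantile and interval discretizations described above can be sketched as follows. This is a minimal illustration with hypothetical function names and data; the graph-theoretic method “D” is not reproduced, and the tie-breaking and threshold conventions are assumptions.

```python
from bisect import bisect_right


def quantile_discretize(values, levels):
    """Assign states so that each state receives (nearly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    states = [0] * len(values)
    for rank, index in enumerate(order):
        states[index] = rank * levels // len(values)
    return states


def interval_discretize(values, thresholds):
    """Assign state k to a value that is >= exactly k of the sorted thresholds."""
    return [bisect_right(thresholds, v) for v in values]


# Hypothetical expression time course for one gene.
series = [0.1, 0.9, 0.4, 0.7, 0.2, 0.5]
print(quantile_discretize(series, levels=3))
print(interval_discretize(series, thresholds=[0.3, 0.6]))
```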
DasGupta, Vera-Licona, Sontag, 2011
Summary of Comparison
DasGupta, Vera-Licona, Sontag, 2011
network (b):
• method (B) was better than method (A) in ROC space
• method (A) achieved a performance no better than random guessing
network (a):
• method (B) could not obtain any results after running over 12 hours
• method (A) was able to compute results in less than 1 minute
• method (A) improved slightly when small redundancy was introduced
implementation of method (B):
http://polymath.vbi.vt.edu/polynome/
implementation of method (A)
done by (DasGupta, Vera-Licona, Sontag, 2011) at
http://sts.bioengr.uic.edu/causal/
DasGupta, Vera-Licona, Sontag, 2011
Direct Synthesization of
Signal Transduction Networks
Only from known interactions and information
No new experiments needed
Overall Goal
Input:
– direct interactions: A → B, A ┤ B
– double-causal interactions: A → (B → C), A → (B ┤ C)
– additional information
Method (algorithms, software): FAST
Output: network of minimal complexity that is biologically relevant
Nature of experimental evidence
• biochemical
– direct interaction, e.g.,
• binding of two proteins
• a transcription factor activating the transcription of a gene
• a chemical reaction with a single reactant and single product
• pharmacological
– indirect causal effects most probably resulting from a chain of interactions and reactions, e.g.,
• binding of a chemical to a receptor protein starts a cascade of protein-protein interactions and chemical reactions that ultimately results in the transcription of a gene
• genetic evidence of differential responses to a stimulus
– can be direct, but most often indirect (double-causal)
We describe a method for synthesizing double-causal (path-level) information into a consistent network
Direct interactions
A promotes B A → B
A inhibits B A ┤ B
Illustration of double-causal interaction
C promotes the process of A promoting B
[Figure: the interaction A → B with C acting on it, represented by placing a pseudo-vertex on the path from A to B]
“Critical” edge
(known direct interaction, part of input)
Main computational steps for network synthesis
• Pseudo-vertex collapse (PVC)
– easy
• Binary transitive reduction (BTR)
– hard
– need heuristics
Pseudo-vertex collapse (PVC)
Intuitively, PVC is useful for reducing the pseudo-vertex set to the minimal set that maintains the graph consistent with all indirect experimental observations.
[Figure: two pseudo-vertices u and v with in(u) = in(v) and out(u) = out(v) are collapsed into a single new pseudo-vertex uv]
Illustration of Binary Transitive Reduction (BTR)
remove?
yes,alternate path
remove?
no,critical edge
Intuitively, the BTR problem is useful for determining the sparsest graph consistent with a set of experimental observations
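The remove-or-keep decisions illustrated above can be sketched as a simple greedy pass: a non-critical edge is dropped whenever an alternate route with the same parity (sign) remains. This is an illustrative heuristic under stated assumptions, not the algorithm implemented in NET-SYNTHESIS: edges are triples (u, v, sign) with sign 0 for promotion and 1 for inhibition, and routes are walks that may revisit vertices.

```python
from collections import deque


def reachable_with_parity(edges, source, target, parity):
    """Is there a nonempty walk from source to target whose signs sum to parity (mod 2)?"""
    start = (source, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        node, current = queue.popleft()
        for u, v, sign in edges:
            if u != node:
                continue
            state = (v, current ^ sign)
            if state == (target, parity):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def greedy_btr(edges, critical):
    """Drop each non-critical edge that is implied by the remaining edges."""
    kept = set(edges)
    for edge in sorted(edges):
        if edge in critical:
            continue
        u, v, sign = edge
        if reachable_with_parity(kept - {edge}, u, v, sign):
            kept.remove(edge)
    return kept


# Hypothetical network: A -> C is implied by A -> B -> C; A -| D is critical.
network = {("A", "B", 0), ("B", "C", 0), ("A", "C", 0), ("A", "D", 1), ("B", "D", 1)}
print(sorted(greedy_btr(network, critical={("A", "D", 1)})))
```

The result depends on the order in which edges are examined, which is one reason heuristics for BTR are only approximately optimal.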
Some biologists did look at very simplified or somewhat different versions of BTR, e.g.:
• A. Wagner, Estimating Coarse Gene Network Structure from Large-Scale Gene Perturbation Data, Genome Research, 12, pp. 309-315, 2002
– too special (reachability only), no efficient algorithms reported
• T. Chen, V. Filkov and S. Skiena, Identifying Gene Regulatory Networks from Experimental Data, Third Annual International Conference on Computational Molecular Biology, pp. 94-103, 1999
– “excess edge deletion” problem, biologically too restrictive version
See the following excellent surveys for more comprehensive information about biological network inference and modeling:
• V. Filkov, Identifying Gene Regulatory Networks from Gene Expression Data, in Handbook of Computational Molecular Biology (edited by S. Aluru), Chapman & Hall/CRC Press, 2005
• H. de Jong, Modelling and Simulation of Genetic Regulatory Systems: A Literature Review, Journal of Computational Biology, Volume 9, Number 1, pp. 67-103, 2002
High level description of the network synthesis process
1. Synthesize direct interactions
2. Optimize (BTR)
3. Synthesize double-causal interactions
4. Optimize (PVC, BTR)
5. Interaction with biologists
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
All the steps in the network synthesis procedure except the steps that involve BTR can be done easily
Thus, it behooves us to look at BTR more closely
But, before that, biological validation of the network synthesis approach is desirable
Need a network that uses double-causal experimental evidence
Plant signal transduction network
Consistent guard cell signal transduction network for ABA-induced stomatal closure
– manually curated
– described in S. Li, S. M. Assmann and R. Albert, Predicting Essential Components of Signal Transduction Networks: A Dynamic Model of Guard Cell Abscisic Acid Signaling, PLoS Biology, 4(10), October 2006
– list of experimentally observed causal relationships collected by Li et al. and published as Table S1; this table contains around 140 interactions and causal inferences, both of type “A promotes B” and “C promotes process (A promotes B)”
– we augment this list with critical edges drawn from biophysical/biochemical knowledge on enzymatic reactions and ion flows, and with simplifying hypotheses made by Li et al., both described in Text S1
We also formalized an additional rule specific to the context of this network (and implicitly assumed by
Li et al.) regarding enzyme-catalyzed reactions
Regulatory interactions between ABA signal transduction pathway components
Regulatory interactions between ABA signal transduction pathway components (continued)
NO → GC not critical and not enzymatic
ERA1 ┤(ABA → CalM)
Some nodes in the network
GCR1 putative G protein coupled receptor
OST1 protein
NO Nitric Oxide
ABH1 RNA cap-binding protein
RAC1 small GTPase protein
…
(left) Guard cell signal transduction network for ABA-induced stomatal closure manually curated by Li, Assmann and Albert [source: PLoS Biology, 4 (10), 2006].
(right) Our automated network synthesis procedure produced a reduced network (fewer edges) while preserving all observed pathways
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Summary of comparison of the two networks
• Li et al. has 54 vertices and 92 edges; our network has 57 vertices but 84 edges
• Both networks have identical strongly connected component of vertices
• All the paths present in Li et al.’s reconstruction are present in our network as well
• The two networks have 71 common edges
• It took a few seconds to synthesize our network
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Summary of comparison of the two networks (continued)
Thus the two networks are highly similar but diverge on a few edges.
None of these discrepancies is due to algorithmic deficiencies; all of them stem from human decisions.
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Software is available at:
http://www.cs.uic.edu/~dasgupta/network-synthesis/
• runs on any machine with MS Windows (Win32)
– click, save the executable and run
Data sources for this type of network synthesis
Signal transduction pathway repositories such as
• TRANSPATH (http://www.gene-regulation.com/pub/databases.html#transpath)
and protein interaction databases such as
• the Search Tool for the Retrieval of Interacting Proteins (http://string.embl.de)
contain up to thousands of interactions, a large number of which are not supported by direct physical evidence.
NET-SYNTHESIS can be used to filter redundant information while keeping all direct interactions
Transitive reduction step used a heuristic
How good is the heuristic in general?
Performance of our BTR algorithm on
“random” signal transduction networks
But, what is a random biological network?
Biological networks are scale-free: e.g.,
N. Guelzim, S. Bottani, P. Bourgine, and F. Kepes, Topological and causal structure of the yeast transcriptional regulatory network, Nature Genetics 31, 60–63, 2002
Biological networks are NOT scale-free: e.g., :
R. Khanin and E. Wit, How Scale-Free Are Biological Networks ?, Journal of Computational Biology, 13 (3), 810 -818, 2006
So, we decided to look at the literature ourselves and decide on a reasonable model for random signal transduction networks
In our model, random signal transduction networks have the following properties:
• distribution of in-degree of the network is exponential:
Pr[in-degree = x] = L e^{-Lx}, 1/3 ≤ L ≤ 1/2
maximum in-degree is 12
• distribution of out-degree is governed by a power-law:
x ≥ 1: Pr[out-degree = x] = c x^{-c};
Pr[out-degree = 0] ≥ c, 2 < c < 3
maximum out-degree is 200
• ratio of excitatory to inhibitory edges is between 2 and 4
Random graphs with prescribed degree distributions are generated using the procedure described in:
M. E. J. Newman, S. H. Strogatz and D. J. Watts. Random graphs with arbitrary degree distributions and their applications, Physical Review E, 64 (2), 026118-026134, 2001
What percentage of edges should be critical (known direct interaction)?
No known accurate estimates:
• curated network of Ma'ayan et al. (Science, 2005)
– expected to have close to 100% critical edges as they specifically focused on collecting direct interactions only
• protein interaction networks are expected to be mostly critical
– Giot et al., Science, 2003
– Han et al., Nature, 2004
– Li et al., Science, 2004
• genetic interactions (e.g., synthetic lethal interactions)
– represent compensatory relationships
– only a minority are direct interactions
• reverse engineering approaches
– lead to networks whose interactions are close to 0% critical
We tried a few small and large values, such as 1%, 2% and 50%, for the percentage of edges that are critical to catch qualitatively all regions of dynamics of the network that are of interest
Tested on about 550 random networks
– # of vertices in the range of about 100 to 1000
– running time for individual networks: seconds to at most a minute
Verify the robustness of performance of our BTR algorithm
– perturb the networks in ways that do not change the optimal solution of the original graph
Almost always, the solution quality does not change under such perturbations
[Histogram: frequency of occurrence versus % additional edges = ((|E'| / OPT) - 1) * 100, with the horizontal axis ranging from 0 to 22]
On average, we use about 5.5% more edges than the optimum
Performance of our implemented algorithm for BTR on random networks
Albert, DasGupta, Dondi, Kachalo, Sontag, Zelikovsky, Westbrooks, 2007
Other applications of NET-SYNTHESIS
Synthesizing a Network for T Cell Survival and Death in LGL Leukemia
Background
• Large Granular Lymphocytes (LGL)
– medium to large size cells with eccentric nuclei and abundant cytoplasm
– comprise 10%~15% of the total peripheral blood mononuclear cells
– two major lineages
• CD3- natural-killer (NK) cell lineage: ~85% of LGL cells
• CD3+ lineage: ~15% of LGL cells
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
LGL leukemia
disordered clonal expansion of LGL and their invasions in the marrow, spleen and liver
Background (continued)
Ras:
– small GTPase essential for controlling multiple essential signaling pathways
– its deregulation is frequently seen in human cancers
Activation of H-Ras requires its farnesylation, which can be blocked by farnesyltransferase inhibitors (FTIs)
This makes FTIs candidate anti-cancer drugs, and several FTIs have entered early phase clinical trials
This observation, together with the finding that Ras is constitutively activated in leukemic LGL cells, leads to the hypothesis that Ras plays an important role in LGL leukemia, and may function by influencing the Fas/FasL pathway.
We constructed the cell-survival/cell-death regulation-related signaling network, with special interest in the effect of Ras on the apoptosis response through the Fas/FasL pathway
Goal: initiate understanding of the interactions between the Ras pathway and the Fas/FasL pathway, two of the major pathways that regulate the cell survival/death decision.
Currently, there is no standard therapy for LGL leukemia. Understanding the mechanism of this disease is crucial for drug/therapy development
Proteins that modulate the Ras-apoptosis response can potentially serve as future reference for drug design and therapeutic-target-molecule search, and this may not be restricted to LGL leukemia
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
Synthesizing a Network for T Cell Survival and Death in Large Granular Lymphocyte Leukemia
• Synthesized a cell-survival/cell-death regulation-related signaling network from the TRANSPATH 6.0 database, with additional information manually curated from literature search
• 359 vertices of this network represent proteins/protein families and mRNAs participating in pro-survival and Fas-induced apoptosis pathways
• 1295 edges represent regulatory relationships between nodes, including protein interactions, catalytic reactions, transcriptional regulation (no double-causal interactions were known)
• Performing BTR with NET-SYNTHESIS reduced the total edge-number to 873
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
To focus on pathways that involve the 33 known T-LGL deregulated proteins, we designated vertices that correspond to proteins with no evidence of being changed during T-LGL as pseudo-vertices and deleted the label “Y” for those edges both of whose endpoints were pseudo-vertices
Recursively performing “Reduction (faster)” BTR and “Collapse degree-2 pseudonodes” of NET-SYNTHESIS until no edge/node could be further removed simplified the network to 267 nodes and 751 edges.
Kachalo, Zhang, Sontag, Albert, DasGupta, 2008
For further results, see
R. Zhang, M. V. Shah, J. Yang, S. B. Nyland, X. Liu, J. K. Yun, R. Albert, and T. P. Loughran,
Network Model of Survival Signaling in LGL Leukemia
PNAS, 2008
Binary transitive reduction raises two further interesting questions:
– how redundant are biological networks?
• what is redundancy and how to measure it?
– percentage of edges removed by binary transitive reduction (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
– are redundancy and dynamical properties correlated?
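As a worked illustration of measuring redundancy as the percentage of edges removed by binary transitive reduction, the edge counts reported earlier for the LGL leukemia network (1295 edges before BTR, 873 after) can be plugged in. The symbols R, E and E_BTR are notation introduced here for the illustration, and since BTR is computed heuristically the value is only an estimate:

```latex
\[
R \;=\; 100 \cdot \frac{|E| - |E_{\mathrm{BTR}}|}{|E|}
  \;=\; 100 \cdot \frac{1295 - 873}{1295}
  \;=\; 100 \cdot \frac{422}{1295}
  \;\approx\; 32.6\%
\]
```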
Feedback loops
and
dynamics of biological networks
Analyzing the behavior of feedback loops is a long-standing topic in the context of regulation, metabolism and development
– e.g., see classical reference works such as
J. Monod and F. Jacob, General conclusions: teleonomic mechanisms in cellular metabolism, growth, and differentiation, Cold Spring Harbor Symp. Quant. Biol., 26, 389-401, 1961
Monotone dynamical system
Monotone systems are “simpler behaved” systems:
• pathological behavior (“chaos”) is ruled out
• even though they may have arbitrarily large dimensionality, monotone systems behave in many ways like one-dimensional systems
– e.g., in monotone systems
• bounded trajectories generically converge to steady states
• there are no stable oscillatory behaviors
Associated Signal Transduction Network
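[Figure: vertices v1, ..., vi, ..., vj, ..., vk, ..., vn, one per state variable; the edges from vi to vk and from vj to vk are labeled by the signs of ∂fk/∂xi(x) and ∂fk/∂xj(x), respectively]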
[Figure: two signed example graphs on vertices 1 to 4, one sign-consistent and one sign-inconsistent]
parity: product of signs
sign-consistent: every undirected path between two nodes has the same parity
(check undirected paths 1 - 4 and 1 - 2 - 3 - 4)
sign-consistent networks are monotone systems
This allows us to define the
“degree of monotonicity” M
of a differential equation system
in the following way:
minimum percentage of edges we need to delete
to make the associated signal transduction network
sign-consistent
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
Undirected Labeling Problem (ULP)
needed to compute degree of monotonicity M
Given: undirected graph G=(V,E)
edge labeling function h: E → {0,1}
Valid solution: a vertex labeling function f: V → {0,1}
Definition: an edge {u,v} ∈ E is consistent if
h(u,v) = f(u) + f(v) (mod 2)
Goal: maximize number of consistent edges
Bad news: NP-hard and even MAX-SNP-hard.
DasGupta, Enciso, Sontag, Zhang, 2007
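The ULP definitions can be checked on tiny instances with an exhaustive search. This brute-force sketch is exponential in |V| and is meant only to illustrate the consistency condition h(u,v) = f(u) + f(v) (mod 2); it is not the approximation algorithm of the cited paper, and the function names and example graphs are hypothetical.

```python
from itertools import product


def count_consistent(edges, labeling):
    """Count edges {u, v} with h(u, v) == f(u) + f(v) (mod 2)."""
    return sum(
        1 for (u, v), h in edges.items() if h == (labeling[u] + labeling[v]) % 2
    )


def brute_force_ulp(vertices, edges):
    """Try all 2^|V| vertex labelings and return one with the most consistent edges."""
    best_labeling, best_score = None, -1
    for bits in product((0, 1), repeat=len(vertices)):
        labeling = dict(zip(vertices, bits))
        score = count_consistent(edges, labeling)
        if score > best_score:
            best_labeling, best_score = labeling, score
    return best_labeling, best_score


# Hypothetical triangles: edge labels h are given per undirected edge.
odd_triangle = {(1, 2): 1, (2, 3): 1, (1, 3): 1}
even_triangle = {(1, 2): 0, (2, 3): 1, (1, 3): 1}
print(brute_force_ulp([1, 2, 3], odd_triangle))
print(brute_force_ulp([1, 2, 3], even_triangle))
```

If the optimum leaves k edges inconsistent out of |E|, then 100·k/|E| is the percentage of edges that must be deleted, which is how the degree of monotonicity M is defined in the talk.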
Algorithm for ULP
• Solve the following vector program via semidefinite programming methods:
maximize (a vector relaxation of the number of consistent edges)
subject to: for each v ∈ V, xv · xv = 1
for each v ∈ V, xv ∈ R^|V|
• Select a uniformly random vector r in the |V|-dimensional unit sphere
• Label each vertex v as 0 if r · xv ≥ 0, and as 1 otherwise
It can be easily implemented in MATLAB
DasGupta, Enciso, Sontag, Zhang, 2007
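For reference, a standard way to write the objective of such a vector program is sketched below. It follows the usual Goemans-Williamson style construction, in which each label f(v) is replaced by a unit vector x_v; it is offered as an assumption about the general form and is not necessarily the exact program used by the authors.

```latex
\[
\text{maximize}\quad
\frac{1}{2}\sum_{\substack{\{u,v\}\in E \\ h(u,v)=1}} \bigl(1 - x_u \cdot x_v\bigr)
\;+\;
\frac{1}{2}\sum_{\substack{\{u,v\}\in E \\ h(u,v)=0}} \bigl(1 + x_u \cdot x_v\bigr)
\]
```

When every x_v is restricted to one of two antipodal unit vectors, each term equals 1 for a consistent edge and 0 otherwise, so the program relaxes the problem of maximizing the number of consistent edges.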
We have two measurable properties:
– (topological) redundancy R
• percentage of edges removed by binary transitive reduction
– (dynamical) monotonicity M
• minimum percentage of edges we need to delete to make the associated signal transduction network consistent
M is negatively correlated with R
(Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
Some other conclusions from (Albert, DasGupta, Gitter, Gürsoy, Hegde, Pal, Sivanathan, Sontag, 2011)
• the redundancy measure R is statistically significant
• transcriptional networks are less redundant than signaling networks
• redundancy of C. elegans metabolic network is largely due to currency metabolites
• calculation of redundancy values and minimal networks provides a way to gain insight into the predicted orientation of protein-protein interaction (PPI) networks
Future research questions in the context of parallel and distributed computing
• Synchronization:
– no “global clocks” are known to exist for cellular processes (ignoring circadian rhythms and some other global timing mechanisms in higher organisms)
• Spatial effects:
– localization (nuclear, cytoplasmic, membrane-bound) in cells
• akin to geographical location affecting communication speeds and coordination in distributed computing
List of some relevant references
R. Albert, B. DasGupta, et al. A New Computationally Efficient Measure of Topological Redundancy of Biological and Social Networks, Physical Review E, 84 (3), 036117, 2011.
B. DasGupta, P. Vera-Licona, E. Sontag. Reverse Engineering of Molecular Networks from a Common Combinatorial Approach, in Algorithms in Computational Molecular Biology: Techniques, Approaches and Applications, John Wiley & Sons, Inc., 2011.
R. Albert, B. DasGupta, E. Sontag. Inference of signal transduction networks from double causal evidence, in Methods in Molecular Biology: Topics in Computational Biology, D. Fenyo (editor), Springer , 2010.
P. Berman, B. DasGupta, M. Karpinski. Approximating Transitive Reduction Problems for Directed Networks, 11th Algorithms and Data Structures Symposium, 2009.
R. Albert, B. DasGupta, R. Dondi, E. Sontag. Inferring (Biological) Signal Transduction Networks via Transitive Reductions of Directed Graphs, Algorithmica, 51 (2), 129-159, 2008.
S. Kachalo, R. Zhang, E. Sontag, R. Albert, B. DasGupta. NET-SYNTHESIS: A software for synthesis, inference and simplification of signal transduction networks, Bioinformatics, 24 (2), 293-295, 2008.
P. Berman, B. DasGupta, E. Sontag. Algorithmic Issues in Reverse Engineering of Protein and Gene Networks via the Modular Response Analysis Method, Annals of the New York Academy of Sciences, 2007.
R. Albert, B. DasGupta, et al. A Novel Method for Signal Transduction Network Inference from Indirect Experimental Evidence, Journal of Computational Biology, 14 (7), 927-949, 2007.
B. DasGupta, G. A. Enciso, E. Sontag, Y. Zhang. Algorithmic and Complexity Results for Decompositions of Biological Networks into Monotone Subsystems, Biosystems, 90 (1), 161-178, 2007.
P. Berman, B. DasGupta, E. Sontag. Computational Complexities of Combinatorial Problems With Applications to Reverse Engineering of Biological Networks, in Advances in Computational Intelligence: Theory and Applications, F.-Y. Wang and D. Liu (editors), Series in Intelligent Control and Intelligent Automation, World Scientific publishers, 303-316, 2007.
P. Berman, B. DasGupta, E. Sontag. Randomized Approximation Algorithms for Set Multicover Problems with Applications to Reverse Engineering of Protein and Gene Networks, Discrete Applied Mathematics, 155 (6-7), 733-749, 2007.
Acknowledgments
Thanks to research collaborators for these projects
R. Albert (Penn State) P. Berman (Penn State) R. Dondi (U. of Bergamo)
G. Enciso (UC Irvine) A. Gitter (CMU) G. Gürsoy (UIC)
R. Hegde (UIC) S. Kachalo (UIC) M. Karpinski (Bonn)
P. Pal G. S. Sivanathan (UIC) E. Sontag (Rutgers)
P. Vera-Licona (INRIA) K. Westbrooks (GSU) A. Zelikovsky (GSU)
R. Zhang (Penn State) Y. Zhang (UIC)
Thanks to National Science Foundation (NSF) for funding:
DBI-1062328 IIS-1064681 IIS-0346973 DBI-0543365
IIS-0610244 CCR-9800086 CNS-0206795 CCF-0208749
Thanks to generous support from DIMACS (Rutgers) during my Sabbatical leave through their special focus on computational and mathematical epidemiology
Thank you for your attention!
Questions?