18. Lecture WS 2004/05
Bioinformatics III 1
V18 EcoCyc Analysis of E.coli Metabolism
E.coli genome contains 4391 predicted genes, of which 4288 code for proteins.
676 of these genes form 607 enzymes of E.coli small-molecule metabolism.
Of those enzymes, 311 are protein complexes, 296 are monomers.
Organization of protein complexes. Distribution of subunit counts for all EcoCyc protein complexes. The predominance of monomers, dimers, and tetramers is obvious
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Reactions
EcoCyc describes 905 metabolic reactions that are catalyzed by E. coli.
Of these reactions, 161 are not involved in small-molecule metabolism,
e.g. they participate in macromolecule metabolism such as DNA replication and
tRNA charging.
Of the remaining 744 reactions, 569 have been assigned to at least one pathway.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Reactions
The number of reactions (744) and the number of enzymes (607) differ ...
WHY??
(1) there is no one-to-one mapping between enzymes and reactions –
some enzymes catalyze multiple reactions, and some reactions are catalyzed
by multiple enzymes.
(2) for some reactions known to be catalyzed by E.coli, the enzyme has not yet
been identified.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Compounds
The 744 reactions of E.coli small-molecule metabolism involve a total of 791
different substrates.
On average, each reaction contains 4.0 substrates.
Number of reactions containing varying numbers of substrates (reactants plus products).
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Fill out the plot for the data in EcoCyc (# of reactions vs. substrates)
Pathways
EcoCyc describes 131 pathways:
energy metabolism
nucleotide and amino acid biosynthesis
secondary metabolism
Pathways vary in length from a
single reaction step to 16 steps
with an average of 5.4 steps.
Length distribution of EcoCyc pathways
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Q: Does this distribution (of the number of steps in a biochemical pathway) also hold for the elementary modes of E. coli?
A: No, not at all.
Reactions Catalyzed by More Than one Enzyme
Diagram showing the number of reactions
that are catalyzed by one or more enzymes.
Most reactions are catalyzed by one enzyme,
some by two, and very few by more than two
enzymes.
For 84 reactions, the corresponding enzyme is not yet encoded in EcoCyc.
What may be the reasons for isozyme redundancy?
(1) the enzymes that catalyze the same reaction are homologs and have
duplicated (or were obtained by horizontal gene transfer),
acquiring some specificity but retaining the same mechanism (divergence).
(2) the reaction is easily „invented“; therefore, there is more than one protein family
that is independently able to perform the catalysis (convergence).
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Enzymes that catalyze more than one reaction
Genome predictions usually assign a single enzymatic function.
However, E.coli is known to contain many multifunctional enzymes.
Of the 607 E.coli enzymes, 100 are multifunctional, either having the same active
site and different substrate specificities or different active sites.
Number of enzymes that catalyze one or
more reactions. Most enzymes catalyze
one reaction; some are multifunctional.
The enzymes that catalyze 7 and 9 reactions are purine nucleoside phosphorylase
and nucleoside diphosphate kinase.
Take-home message: The high proportion of multifunctional enzymes implies that
the genome projects significantly underpredict multifunctional enzymes!
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Reactions participating in more than one pathway
The 99 reactions belonging to multiple
pathways appear to be the intersection
points in the complex network of chemical
processes in the cell.
E.g. the reaction present in 6 pathways corresponds to the reaction catalyzed by
malate dehydrogenase, a central enzyme in cellular metabolism.
Ouzounis, Karp, Genome Res. 10, 568 (2000)
Connectivity distributions P(k) for substrates
a, Archaeoglobus fulgidus (archaeon);
b, E. coli (bacterium);
c, Caenorhabditis elegans (eukaryote),
shown on a log–log plot, counting
separately the incoming (In) and
outgoing links (Out) for each substrate.
kin (kout) corresponds to the number of
reactions in which a substrate
participates as a product (reactant).
d, The connectivity distribution
averaged over 43 organisms.
Jeong et al. Nature 407, 651 (2000)
Q: Why does it make sense to consider in- and out-degree separately?
A: Because many biochemical reactions are not reversible.
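The in- and out-degree counting described above can be sketched in a few lines. This is only an illustration: the three toy reactions and the function names `degree_counts` and `degree_distribution` are assumptions and are not taken from Jeong et al.

```python
from collections import Counter

# Hypothetical toy reactions: (substrates consumed, substrates produced).
reactions = [
    (["A"], ["B"]),
    (["B"], ["C", "D"]),
    (["A", "D"], ["C"]),
]

def degree_counts(reactions):
    """Return (k_in, k_out): how often each substrate is a product / a reactant."""
    k_in, k_out = Counter(), Counter()
    for reactants, products in reactions:
        k_out.update(set(reactants))  # outgoing link: substrate is consumed
        k_in.update(set(products))    # incoming link: substrate is produced
    return k_in, k_out

def degree_distribution(k):
    """P(k): fraction of substrates having degree k (degree-0 substrates excluded)."""
    hist = Counter(k.values())
    total = sum(hist.values())
    return {deg: n / total for deg, n in sorted(hist.items())}

k_in, k_out = degree_counts(reactions)
p_in = degree_distribution(k_in)
```

Because each reaction is treated as directed, a substrate such as A can have k_out = 2 while k_in = 0, which is the information lost when the two degrees are merged.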
Properties of metabolic networks
a, The histogram of the biochemical pathway
lengths, l, in E. coli.
b, The average path length (diameter) for each
of the 43 organisms.
c, d, Average number of incoming links (c) or
outgoing links (d) per node for each organism.
e, The effect of substrate removal on the
metabolic network diameter (average shortest
biochemical pathway between 2 substrates)
of E. coli. In the top curve (red) the most
connected substrates are removed first. In the
bottom curve (green) nodes are removed
randomly. M = 60 corresponds to 8% of the
total number of substrates found in E. coli.
The horizontal axis in b– d denotes the number
of nodes in each organism. b–d, Archaea
(magenta), bacteria (green) and eukaryotes
(blue) are shown.
In (e), sketch the expected course of the diameter curve when a hub protein is removed and when a random protein is removed.
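The effect of removing a hub versus a peripheral node on the average shortest path can be tried out on a toy graph. The sketch below assumes an undirected, unweighted graph and a hypothetical five-node network; it is not the E. coli network and not the authors' code.

```python
from collections import deque

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def remove_node(adj, victim):
    """Return a copy of the graph without `victim` and its links."""
    return {n: {m for m in nbs if m != victim}
            for n, nbs in adj.items() if n != victim}

# Hypothetical toy network: hub H linked to every node of the chain A-B-C-D.
graph = {
    "H": {"A", "B", "C", "D"},
    "A": {"H", "B"},
    "B": {"H", "A", "C"},
    "C": {"H", "B", "D"},
    "D": {"H", "C"},
}

full = average_path_length(graph)
without_hub = average_path_length(remove_node(graph, "H"))
without_leaf = average_path_length(remove_node(graph, "A"))
```

In this toy case the diameter grows when the hub is removed and shrinks slightly when a peripheral node is removed, which is the qualitative difference between the two curves in panel (e).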
Stoichiometric matrix
Stoichiometric matrix: a matrix with reaction stoichiometries as columns and
metabolite participations as rows.
The stoichiometric matrix is an important part of the in silico model.
With the matrix, the methods of extreme pathway and elementary mode analyses can
be used to generate a unique set of pathways P1, P2, and P3 (see future lecture).
Papin et al. TIBS 28, 250 (2003)
There will definitely be an exercise on this, along the lines of: set up the stoichiometric matrix for the following network.
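Setting up a stoichiometric matrix from a reaction list can be sketched as follows. The toy network (six internal reactions v1 to v6 and four outward-pointing exchange fluxes b1 to b4 on five metabolites A to E) and the function name `stoichiometric_matrix` are assumptions chosen for illustration.

```python
def stoichiometric_matrix(metabolites, reactions):
    """Rows = metabolites, columns = reactions, entries = stoichiometric coefficients."""
    row = {m: i for i, m in enumerate(metabolites)}
    S = [[0] * len(reactions) for _ in metabolites]
    for j, (_name, stoich) in enumerate(reactions):
        for met, coeff in stoich.items():
            S[row[met]][j] += coeff  # negative = consumed, positive = produced
    return S

metabolites = ["A", "B", "C", "D", "E"]
# Hypothetical toy network; exchange fluxes b1..b4 point out of the system.
reactions = [
    ("v1", {"A": -1, "B": 1}),
    ("v2", {"B": -1, "C": 1}),
    ("v3", {"C": -1, "B": 1}),
    ("v4", {"C": -1, "D": 1}),
    ("v5", {"D": -1, "C": 1}),
    ("v6", {"C": -1, "E": 1}),
    ("b1", {"A": -1}),
    ("b2", {"B": -1}),
    ("b3", {"D": -1}),
    ("b4", {"E": -1}),
]

S = stoichiometric_matrix(metabolites, reactions)
```

Each column of `S` is one reaction and each row one metabolite, so the row for C collects every reaction that produces or consumes C.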
Flux balancing
Any chemical reaction requires mass conservation.
Therefore one may analyze metabolic systems by requiring mass conservation.
Only required: knowledge about stoichiometry of metabolic pathways and
metabolic demands
For each metabolite X_i:
$\frac{dX_i}{dt} = V_{\text{synthesized}} - V_{\text{degraded}} - (V_{\text{used}} - V_{\text{transported}})$
Under steady-state conditions, the mass balance constraints in a metabolic
network can be represented mathematically by the matrix equation:
S · v = 0
where the matrix S is the m × n stoichiometric matrix,
m = the number of metabolites and n = the number of reactions in the network.
The vector v represents all fluxes in the metabolic network, including the internal
fluxes, transport fluxes and the growth flux.
What is the basic assumption behind applying flux balance analysis, i.e. solving the equation S · v = 0?
Flux balance analysis
Since the number of metabolites is generally smaller than the number of reactions
(m < n) the flux-balance equation is typically underdetermined.
Therefore there are generally multiple feasible flux distributions that satisfy the mass
balance constraints.
The set of solutions is confined to the nullspace of matrix S.
To find the „true“ biological flux in cells (e.g. Heinzle, Huber, UdS) one needs
additional (experimental) information,
or one may impose constraints
α_i ≤ v_i ≤ β_i
on the magnitude of each individual metabolic flux.
The intersection of the nullspace and the region defined by those linear inequalities
defines a region in flux space = the feasible set of fluxes.
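Membership in the feasible set (steady state S · v = 0 plus bounds on each flux) can be checked directly. The sketch below uses a hypothetical network with two metabolites and four reactions; the matrix, the bounds and the function names are illustrative assumptions, and no optimization is performed.

```python
def mat_vec(S, v):
    """Matrix-vector product S . v using plain lists."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def is_feasible(S, v, lower, upper, tol=1e-9):
    """True if v lies in the nullspace of S and respects lower <= v <= upper."""
    balanced = all(abs(r) <= tol for r in mat_vec(S, v))
    bounded = all(lo - tol <= x <= hi + tol
                  for x, lo, hi in zip(v, lower, upper))
    return balanced and bounded

# Hypothetical toy network: uptake -> A, two parallel reactions A -> B, export of B.
# m = 2 metabolites < n = 4 reactions, so the system is underdetermined.
S = [
    [1, -1, -1, 0],   # metabolite A
    [0, 1, 1, -1],    # metabolite B
]
lower = [0, 0, 0, 0]
upper = [10, 5, 8, 100]

ok = is_feasible(S, [10, 4, 6, 10], lower, upper)
violates_bound = is_feasible(S, [10, 7, 3, 10], lower, upper)
not_balanced = is_feasible(S, [10, 4, 4, 10], lower, upper)
```

Both [10, 4, 6, 10] and [10, 5, 5, 10] satisfy all constraints, which illustrates that the mass balance alone does not pick out a single flux distribution.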
Feasible solution set for a metabolic reaction network
(A) The steady-state operation of the metabolic network is restricted to the region
within a cone, defined as the feasible set. The feasible set contains all flux vectors
that satisfy the physicochemical constraints. Thus, the feasible set defines the
capabilities of the metabolic network. All feasible metabolic flux distributions lie
within the feasible set, and
(B) in the limiting case, where all constraints on the metabolic network are known,
such as the enzyme kinetics and gene regulation, the feasible set may be reduced
to a single point. This single point must lie within the feasible set.
Edwards & Palsson PNAS 97, 5528 (2000)
Summary
FBA analysis constructs the optimal network utilization simply using
stoichiometry of metabolic reactions and capacity constraints.
For E.coli the in silico results are consistent with experimental data.
FBA shows that in the E.coli metabolic network there are relatively few critical
gene products in central metabolism.
However, the ability to adjust to different environments (growth conditions) may be
diminished by gene deletions.
FBA identifies „the best“ the cell can do, not how the cell actually behaves under a
given set of conditions. Here, survival was equated with growth.
FBA does not directly consider regulation or regulatory constraints on the
metabolic network. This can be treated separately (see future lecture).
Edwards & Palsson PNAS 97, 5528 (2000)
V19 Extreme Pathways
introduced into metabolic analysis by the lab of Bernhard Palsson
(Dept. of Bioengineering, UC San Diego). The publications of this lab
are available at http://gcrg.ucsd.edu/publications/index.html
The extreme pathway
technique is based
on the stoichiometric
matrix representation
of metabolic networks.
All external fluxes are
defined as pointing outwards.
Schilling, Letscher, Palsson,
J. theor. Biol. 203, 229 (2000)
Extreme Pathways – theorem
Theorem. A convex flux cone has a set of systemically independent generating
vectors. Furthermore, these generating vectors (extremal rays) are unique up to
a multiplication by a positive scalar. These generating vectors will be called
„extreme pathways“.
(1) The existence of a systemically independent generating set for a cone is
provided by an algorithm to construct extreme pathways (see below).
(2) uniqueness?
Let {p1, ..., pk} be a systemically independent generating set for a cone.
Then it follows that if pj = c′ + c″, both c′ and c″ are positive multiples of pj.
Schilling, Letscher, Palsson,
J. theor. Biol. 203, 229 (2000)
Extreme Pathways – algorithm - setup
The algorithm to determine the set of extreme pathways for a reaction network
follows the principles of algorithms for finding the extremal rays / generating
vectors of convex polyhedral cones.
Combine the n × n identity matrix (I) with the transpose of the stoichiometric
matrix, S^T. I serves for bookkeeping.
Schilling, Letscher, Palsson,
J. theor. Biol. 203, 229 (2000)
S → [ I | S^T ]
separate internal and external fluxes
Examine constraints on each of the exchange fluxes as given by
α_j ≤ b_j ≤ β_j
If the exchange flux is constrained to be positive do nothing.
If the exchange flux is constrained to be negative multiply the
corresponding row of the initial matrix by -1.
If the exchange flux is unconstrained move the entire row to a temporary
matrix T(E). This completes the first tableau T(0).
T(0) and T(E) for the example reaction system are shown on the previous slide.
Each element of these matrices will be designated Tij.
Starting with x = 1 and T(0) = T(x-1) the next tableau is generated in the following
way:
Schilling, Letscher, Palsson,
J. theor. Biol. 203, 229 (2000)
idea of algorithm
(1) Identify all metabolites that do not have an unconstrained exchange flux
associated with them.
The total number of such metabolites is denoted by μ.
For the example, this is only the case for metabolite C (μ = 1).
What is the main idea?
- We want to find balanced extreme pathways
that don‘t change the concentrations of
metabolites when flux flows through
(input fluxes are channelled to products not to
accumulation of intermediates).
- The stoichiometric matrix describes the coupling of each reaction to the
concentration of metabolites X.
- Now we need to balance combinations of reactions that leave concentrations
unchanged. Pathways applied to metabolites should not change their
concentrations ⇒ the matrix entries
need to be brought to 0.
Schilling, Letscher, Palsson,
J. theor. Biol. 203, 229 (2000)
keep pathways that do not change concentrations of internal metabolites
(2) Begin forming the new matrix T(x) by copying
all rows from T(x – 1) which contain a zero in the
column of ST that corresponds to the first
metabolite identified in step 1, denoted by index c.
(Here 3rd column of ST.)
Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)
Columns: v1 v2 v3 v4 v5 v6 | A B C D E
T(0) =
1 0 0 0 0 0 | -1  1  0  0  0
0 1 0 0 0 0 |  0 -1  1  0  0
0 0 1 0 0 0 |  0  1 -1  0  0
0 0 0 1 0 0 |  0  0 -1  1  0
0 0 0 0 1 0 |  0  0  1 -1  0
0 0 0 0 0 1 |  0  0 -1  0  1
T(1) (after step 2) =
1 0 0 0 0 0 | -1  1  0  0  0
There will also be a computational exercise on this. We will state the final result so that you can check your solution.
balance combinations of other pathways
(3) Of the remaining rows in T(x-1) add together
all possible combinations of rows which contain
values of the opposite sign in column c, such that
the addition produces a zero in this column.
Schilling, et al.
JTB 203, 229
Columns: v1 v2 v3 v4 v5 v6 | A B C D E
T(0) =
1 0 0 0 0 0 | -1  1  0  0  0
0 1 0 0 0 0 |  0 -1  1  0  0
0 0 1 0 0 0 |  0  1 -1  0  0
0 0 0 1 0 0 |  0  0 -1  1  0
0 0 0 0 1 0 |  0  0  1 -1  0
0 0 0 0 0 1 |  0  0 -1  0  1
T(1) =
1 0 0 0 0 0 | -1  1  0  0  0
0 1 1 0 0 0 |  0  0  0  0  0
0 1 0 1 0 0 |  0 -1  0  1  0
0 1 0 0 0 1 |  0 -1  0  0  1
0 0 1 0 1 0 |  0  1  0 -1  0
0 0 0 1 1 0 |  0  0  0  0  0
0 0 0 0 1 1 |  0  0  0 -1  1
remove “non-orthogonal” pathways
(4) For all of the rows added to T(x) in steps 2 and 3 check to make sure that no
row exists that is a non-negative combination of any other set of rows in T(x).
One method used is as follows:
let A(i) = set of column indices j for which the elements of row i = 0.
For the example above:
A(1) = {2,3,4,5,6,9,10,11}
A(2) = {1,4,5,6,7,8,9,10,11}
A(3) = {1,3,5,6,7,9,11}
A(4) = {1,3,4,5,7,9,10}
A(5) = {1,2,4,6,7,9,11}
A(6) = {1,2,3,6,7,8,9,10,11}
A(7) = {1,2,3,4,7,8,9}
Then check to determine if there exists another row (h) for which A(i) is a
subset of A(h).
If A(i) ⊆ A(h), i ≠ h, where
A(i) = { j : Ti,j = 0, 1 ≤ j ≤ (n+m) },
then row i must be eliminated from T(x).
Schilling et al.
JTB 203, 229
Q: What is the purpose of this?
A: Elementarity, i.e. minimality. In V24, A(i) is called Z(i), the zero set.
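Steps (2) to (4) for a single metabolite column can be sketched as one function. The tableau below is the example T(0) written out with all zeros, with columns v1 to v6 followed by A to E; the function name `next_tableau` and the tie-breaking rule for identical zero sets (keep the first row) are assumptions of this sketch, not part of the published algorithm.

```python
def next_tableau(tableau, col):
    """One iteration (steps 2-4) that brings column `col` to zero."""
    # Step 2: copy rows that already contain a zero in the column.
    new_rows = [row[:] for row in tableau if row[col] == 0]
    # Step 3: add all pairs of rows with opposite sign so that the entry cancels.
    positive = [row for row in tableau if row[col] > 0]
    negative = [row for row in tableau if row[col] < 0]
    for p in positive:
        for q in negative:
            a, b = -q[col], p[col]  # positive weights
            new_rows.append([a * x + b * y for x, y in zip(p, q)])
    # Step 4: eliminate row i if its zero set A(i) is contained in another A(h).
    zero_sets = [{j for j, x in enumerate(row) if x == 0} for row in new_rows]
    kept = []
    for i, row in enumerate(new_rows):
        dominated = any(
            zero_sets[i] < zero_sets[h]
            or (zero_sets[i] == zero_sets[h] and h < i)  # assumed tie-break
            for h in range(len(new_rows)) if h != i
        )
        if not dominated:
            kept.append(row)
    return kept

# T(0) of the example: identity (v1..v6) | transpose of S (A, B, C, D, E).
T0 = [
    [1, 0, 0, 0, 0, 0, -1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, -1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, -1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, -1, 1, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 1, -1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, -1, 0, 1],
]
C_COLUMN = 6 + 2  # metabolite C is the 3rd column of the S^T part (0-based index 8)
T1 = next_tableau(T0, C_COLUMN)
```

For this example no row is eliminated in step 4, so T(1) has seven rows, all with a zero in the column of metabolite C.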
repeat steps for all internal metabolites
(5) With the formation of T(x) complete steps 2 – 4 for all of the metabolites that do
not have an unconstrained exchange flux operating on the metabolite,
incrementing x by one up to μ. The final tableau will be T(μ).
Note that the number of rows in T(μ) will be equal to k, the number of extreme
pathways.
Schilling et al.
JTB 203, 229
balance external fluxes
(6) Next we append T(E) to the bottom of T(μ). (In the example here μ = 1.)
This results in the following tableau:
Schilling et al.
JTB 203, 229
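T(1/E) =
Columns: v1 v2 v3 v4 v5 v6 b1 b2 b3 b4 | A B C D E
1 0 0 0 0 0 0 0 0 0 | -1  1  0  0  0
0 1 1 0 0 0 0 0 0 0 |  0  0  0  0  0
0 1 0 1 0 0 0 0 0 0 |  0 -1  0  1  0
0 1 0 0 0 1 0 0 0 0 |  0 -1  0  0  1
0 0 1 0 1 0 0 0 0 0 |  0  1  0 -1  0
0 0 0 1 1 0 0 0 0 0 |  0  0  0  0  0
0 0 0 0 1 1 0 0 0 0 |  0  0  0 -1  1
0 0 0 0 0 0 1 0 0 0 | -1  0  0  0  0
0 0 0 0 0 0 0 1 0 0 |  0 -1  0  0  0
0 0 0 0 0 0 0 0 1 0 |  0  0  0 -1  0
0 0 0 0 0 0 0 0 0 1 |  0  0  0  0 -1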
balance external fluxes
(7) Starting in the n+1 column (or the first non-zero column on the right side),
if Ti,(n+1) ≠ 0 then add the corresponding non-zero row from T(E) to row i so as to
produce 0 in the n+1-th column.
This is done by simply multiplying the corresponding row in T(E) by Ti,(n+1) and
adding this row to row i .
Repeat this procedure for each of the rows in the upper portion of the tableau so
as to create zeros in the entire upper portion of the (n+1) column.
When finished, remove the row in T(E) corresponding to the exchange flux for the
metabolite just balanced.
Schilling et al.
JTB 203, 229
balance external fluxes
(8) Follow the same procedure as in step (7) for each of the columns on the right
side of the tableau containing non-zero entries.
(In this example we need to perform step (7) for every column except the middle
column of the right side which corresponds to metabolite C.)
The final tableau T(final) will contain the transpose of the matrix P containing the
extreme pathways in place of the original identity matrix.
Schilling et al.
JTB 203, 229
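Steps (7) and (8) can be sketched for the example as follows. The sketch assumes that every exchange row in T(E) has the entry -1 in its metabolite column (exchange fluxes pointing outwards), so that multiplying by Ti,(n+1) and adding cancels the entry; the variable and function names are hypothetical.

```python
N_REACTIONS = 10  # v1..v6 and b1..b4; metabolite columns A..E follow

# T(1) of the example in compact form: v1..v6 | A B C D E.
t1_compact = [
    [1, 0, 0, 0, 0, 0, -1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, -1, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, -1, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 1, 0, -1, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0, -1, 1],
]
# Widen the bookkeeping part with zero columns for b1..b4.
upper = [row[:6] + [0, 0, 0, 0] + row[6:] for row in t1_compact]

# T(E): exchange flux b_k drains the metabolite with index exchange_met[k].
exchange_met = [0, 1, 3, 4]  # A, B, D, E; metabolite C has no exchange flux
t_e = []
for k, met in enumerate(exchange_met):
    row = [0] * 15
    row[6 + k] = 1                  # bookkeeping entry for b_k
    row[N_REACTIONS + met] = -1     # outward exchange consumes the metabolite
    t_e.append(row)

def balance_exchanges(upper, t_e, n_reactions):
    """Zero every metabolite column by adding multiples of the exchange rows."""
    rows = [r[:] for r in upper]
    for e_row in t_e:
        col = next(j for j in range(n_reactions, len(e_row)) if e_row[j] != 0)
        for i, row in enumerate(rows):
            factor = row[col]  # valid because e_row[col] == -1 (assumption)
            if factor != 0:
                rows[i] = [x + factor * y for x, y in zip(row, e_row)]
    return rows

final = balance_exchanges(upper, t_e, N_REACTIONS)
P_T = [row[:N_REACTIONS] for row in final]  # each row is one extreme pathway
```

After the loop every metabolite column is zero and the first ten columns hold the transposed pathway matrix, with exchange fluxes appearing with negative sign where material enters the system.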
pathway matrix
T(final) = [ P^T | 0 ]: all entries in the metabolite columns are zero.
Schilling et al.
JTB 203, 229
P^T =
     v1  v2  v3  v4  v5  v6  b1  b2  b3  b4
p1    1   0   0   0   0   0  -1   1   0   0
p7    0   1   1   0   0   0   0   0   0   0
p3    0   1   0   1   0   0   0  -1   1   0
p2    0   1   0   0   0   1   0  -1   0   1
p4    0   0   1   0   1   0   0   1  -1   0
p6    0   0   0   1   1   0   0   0   0   0
p5    0   0   0   0   1   1   0   0  -1   1
Extreme Pathways for model system
Schilling et al.
JTB 203, 229
     v1  v2  v3  v4  v5  v6  b1  b2  b3  b4
p1    1   0   0   0   0   0  -1   1   0   0
p7    0   1   1   0   0   0   0   0   0   0
p3    0   1   0   1   0   0   0  -1   1   0
p2    0   1   0   0   0   1   0  -1   0   1
p4    0   0   1   0   1   0   0   1  -1   0
p6    0   0   0   1   1   0   0   0   0   0
p5    0   0   0   0   1   1   0   0  -1   1
The 2 pathways p6 and p7 are not shown (right below) because all exchange fluxes with the exterior are 0.
Such pathways have no net overall effect on the functional capabilities of the network.
They belong to the cycling of reactions v4/v5 and v2/v3.
How reactions appear in pathway matrix
In the matrix P of extreme pathways, each column is an EP and each row
corresponds to a reaction in the network.
The numerical value of the i,j-th element corresponds to the relative flux level
through the i-th reaction in the j-th EP.
Papin, Price, Palsson,
Genome Res. 12, 1889 (2002)
Properties of pathway matrix
Papin, Price, Palsson, Genome Res. 12, 1889 (2002)
A symmetric Pathway Length Matrix P_LM can be calculated:
$P_{LM} = P^T \cdot P$
where the values along the diagonal correspond to the length of the EPs.
The off-diagonal terms of P_LM are the number of reactions that a pair of extreme
pathways have in common.
How can you compute the lengths of the participating pathways from the pathway matrix P?
Properties of pathway matrix
Papin, Price, Palsson, Genome Res. 12, 1889 (2002)
One can also compute a reaction participation matrix P_PM from P:
$P_{PM} = P \cdot P^T$
where the diagonal corresponds to the number of pathways in which the given
reaction participates.
How can you determine from the pathway matrix P in how many pathways a reaction participates?
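Both matrix products can be tried out on the seven extreme pathways of the model system. The sketch assumes that P is first converted to a binary matrix (1 where a reaction participates, 0 otherwise), since only then do the diagonals count reactions and pathways; the helper names are hypothetical.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Plain-list matrix product A . B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

# P^T of the model system: rows = extreme pathways, columns = v1..v6, b1..b4.
P_T = [
    [1, 0, 0, 0, 0, 0, -1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, -1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, -1, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 1, -1, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, -1, 1],
]
# Binary participation (assumption of this sketch): sign and magnitude are dropped.
B_T = [[1 if x != 0 else 0 for x in row] for row in P_T]  # pathways x reactions
B = transpose(B_T)                                        # reactions x pathways

P_LM = matmul(B_T, B)  # pathways x pathways: diagonal = pathway lengths
P_PM = matmul(B, B_T)  # reactions x reactions: diagonal = pathways per reaction

lengths = [P_LM[i][i] for i in range(len(P_LM))]
participation = [P_PM[i][i] for i in range(len(P_PM))]
```

In this example the exchange flux b2 takes part in four of the seven pathways, more than any internal reaction.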
Summary (extreme pathways)
Price et al. Biophys J 84, 794 (2003)
Extreme pathway analysis provides a mathematically rigorous way to dissect
complex biochemical networks.
The matrix products P^T P and P P^T are useful ways to interpret pathway lengths
and reaction participation.
However, the number of computed vectors may range in the thousands.
Therefore, meta-methods (e.g. singular value decomposition) are required that
reduce the dimensionality to a useful number that can be inspected by humans.
Singular value decomposition may be one useful method ... and there are more to
come. Singular value decomposition will not be covered in the exam.
Computational metabolomics: modelling constraints
Price et al. Nature Rev Microbiol 2, 886 (2004)
Surviving (expressed) phenotypes must satisfy constraints imposed on the molecular
functions of a cell, e.g. conservation of mass and energy.
Fundamental approach to understand biological systems: identify and formulate
constraints.
Important constraints of cellular function:
(1) physico-chemical constraints
(2) Topological constraints
(3) Environmental constraints
(4) Regulatory constraints
Physico-chemical constraints
Price et al. Nature Rev Microbiol 2, 886 (2004)
These are „hard“ constraints: Conservation of mass, energy and momentum.
Contents of a cell are densely packed ⇒ viscosity can be 100 – 1000 times higher
than that of water.
Therefore, diffusion rates of macromolecules in cells are slower than in water.
Many molecules are confined inside the semi-permeable membrane ⇒ high
osmolarity. Need to deal with osmotic pressure (e.g. Na+/K+ pumps).
Reaction rates are determined by local concentrations inside cells.
Enzyme turnover numbers are generally less than 10^4 s^-1. Maximal rates are equal to
the turnover-number multiplied by the enzyme concentration.
Biochemical reactions are driven by negative free-energy change in forward
direction.
Topological constraints
Price et al. Nature Rev Microbiol 2, 886 (2004)
The crowding of molecules inside cells leads to topological (3D)-constraints that affect
both the form and the function of biological systems.
Environmental constraints
Price et al. Nature Rev Microbiol 2, 886 (2004)
Environmental constraints on cells are time and condition dependent:
Nutrient availability, pH, temperature, osmolarity, availability of electron acceptors.
E.g. Helicobacter pylori lives in the human stomach at pH = 1 ⇒ it needs to produce
NH3 at a rate that will maintain its immediate surroundings at a pH that is sufficiently
high to allow survival.
Ammonia is made from elemental nitrogen ⇒ H. pylori has adapted by using amino
acids instead of carbohydrates as its primary carbon source.
Q: Which method is better suited to describing the effect of constraints, such as a limit on the reaction rate of an enzyme, on the behaviour of a metabolism: FBA or extreme pathways?
A: FBA. The constraint is modelled as a bound on the flux v1: α_1 ≤ v_1 ≤ β_1
Regulatory constraints
Price et al. Nature Rev Microbiol 2, 886 (2004)
Regulatory constraints are self-imposed by the organism and are subject to
evolutionary change ⇒ they are not „hard“ constraints.
Regulatory constraints allow the cell to eliminate suboptimal phenotypic states and to
confine itself to behaviors of increased fitness.
Name 5 constraints that may be important for modelling cellular networks.
Mathematical formulation of constraints
Price et al. Nature Rev Microbiol 2, 886 (2004)
There are two fundamental types of constraints: balances and bounds.
Balances are constraints that are associated
with conserved quantities such as energy, mass, redox potential, momentum
or with phenomena such as solvent capacity, electroneutrality and osmotic pressure.
Bounds are constraints that limit numerical ranges of individual variables and
parameters such as concentrations, fluxes or kinetic constants.
Both bound and balance constraints limit the allowable functional states of
reconstructed cellular metabolic networks.
Tools for analyzing network states
Price et al. Nature Rev Microbiol
2, 886 (2004)
The two steps that are used to form a solution space — reconstruction and the imposition of governing constraints — are illustrated in the centre of the figure. Several methods are being developed at various laboratories to analyse the solution space. Ci and Cj concentrations of compounds i and j; EP, extreme pathway; vi and vj fluxes through reactions i and j; v1 –v3 flux through reactions 1-3; vnet, net flux through loop.
Determining optimal states
Price et al. Nature Rev Microbiol 2, 886 (2004)
Characterizing the whole solution space
Price et al. Nature Rev Microbiol 2, 886 (2004)
V20 Applications of Flux Balance Analysis
Many aspects of metabolism are currently being studied to understand its
hierarchical and modular organization:
- Topology (Jeong et al. V18)
- Reaction fluxes (Almaas et al. - today)
- Epistasis (Segrè et al. - today)
- Coupling to gene expression (V22)
E.coli in silico
Stoichiometric matrix Sij for E.coli strain MG1655 containing 537 metabolites Ai
and 739 reactions j.
Apply flux balance analysis: in a steady state the concentrations of all the
metabolites are time independent:
$\frac{d}{dt}[A_i] = \sum_j S_{ij} v_j = 0$
vj is the flux of reaction j and Sij is the stoichiometric coefficient of reaction j.
We denote the mass carried by reaction j producing (consuming) metabolite i by
$\hat{v}_{ij} = |S_{ij}| \, v_j$
Fluxes vary widely: e.g. the dimensionless flux of the succinyl coenzyme A synthetase
reaction is 0.185, whereas the flux of the aspartate oxidase reaction is about 10,000
times smaller, 2.2 × 10^-5.
How can you characterize the participation of a reaction j in the production of a metabolite i?
Overall flux organization of E.coli metabolic network
a, Flux distribution for optimized biomass production
on succinate (black) and glutamate (red) substrates.
The solid line corresponds to the power-law fit of the probability
that a reaction has flux v,
P(v) ∝ (v + v0)^-α, with v0 = 0.0003 and α = 1.5.
d, The distribution of experimentally determined fluxes
from the central metabolism of E. coli shows power-law
behaviour as well, with a best fit to P(v) ∝ v^-α with
α = 1.
Both computed and experimental flux distributions
show a wide spectrum of fluxes.
Almaas et al., Nature 427, 839 (2004)
Overall flux organization of E.coli metabolic network
The observed flux distribution is compatible with two different potential local flux
structures:
(a) a homogeneous local organization would imply that all reactions producing
(consuming) a given metabolite have comparable fluxes
(b) a more delocalized „high-flux backbone (HFB)“ is expected if the local flux
organisation is heterogeneous such that each metabolite has a dominant source
(consuming) reaction.
Schematic illustration of the hypothetical scenario in which (a) all fluxes have
comparable activity, in which case we expect kY(k) ∼ 1, and (b) the majority of the
flux is carried by a single incoming or outgoing reaction, for which we should have
kY(k) ∼ k.
Almaas et al., Nature 427, 839 (2004)
Overall flux organization of E.coli metabolic network
To distinguish between these 2 schemes, for each metabolite i produced
(consumed) by k reactions, define
$Y(k,i) = \sum_{j=1}^{k} \left( \frac{\hat{v}_{ij}}{\sum_{l=1}^{k} \hat{v}_{il}} \right)^2$
where v̂ij is the mass carried by reaction j which produces (consumes) metabolite i.
If all reactions producing (consuming) metabolite i have comparable v̂ij values,
Y(k,i) scales as 1/k.
If, however, the activity of a single reaction dominates we expect
Y(k,i) ∼ 1 (independent of k).
Almaas et al., Nature 427, 839 (2004)
Describe the strategy of Barabási and co-workers for distinguishing between scenarios (a) and (b).
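The two limiting cases of Y(k,i) can be checked numerically. The sketch below takes the list of mass flows v̂ij of one metabolite as input; the function name `flux_heterogeneity` and the example numbers are illustrative assumptions.

```python
def flux_heterogeneity(mass_flows):
    """Y(k, i): sum of squared flux fractions over the k reactions of one metabolite."""
    total = sum(mass_flows)
    return sum((v / total) ** 2 for v in mass_flows)

# Scenario (a): four reactions with comparable mass flow.
uniform = [1.0, 1.0, 1.0, 1.0]
# Scenario (b): one reaction carries almost all of the flux.
dominated = [1000.0, 1.0, 1.0, 1.0]

y_uniform = flux_heterogeneity(uniform)      # equals 1/k for k = 4
y_dominated = flux_heterogeneity(dominated)  # close to 1, independent of k
k = len(uniform)
```

Multiplying by k gives kY(k) ∼ 1 in the homogeneous case and kY(k) ∼ k in the dominated case, which is the quantity plotted against k to separate the two scenarios.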
Characterizing the local inhomogeneity of the flux net
a, Measured kY(k) shown as a function of k for
incoming and outgoing reactions, averaged over
all metabolites, indicates that Y(k) ∝ k^-0.27, as the
solid line has the slope γ = 0.73.
Inset shows non-zero mass flows, v̂ij, producing
(consuming) FAD on a glutamate-rich substrate.
⇒ an intermediate behavior is found between the
two extreme cases.
⇒ the large-scale inhomogeneity observed in the
overall flux distribution is also increasingly valid at
the level of the individual metabolites.
The more reactions that consume (produce) a
given metabolite, the more likely it is that a single
reaction carries most of the flux, see FAD.
Almaas et al., Nature 427, 839 (2004)
Clean up metabolic network
A simple algorithm systematically removes, for each metabolite, all reactions
but the one providing the largest incoming (outgoing) flux contribution.
The algorithm uncovers the „high-flux-backbone“ of the metabolism,
a distinct structure of linked reactions that form a giant component
with a star-like topology.
Almaas et al., Nature 427, 839 (2004)
Maximal flow networks
[Figure: maximal flow networks on glutamate-rich and succinate-rich substrates]
Directed links: Two metabolites (e.g. A and B) are connected with a directed link pointing
from A to B only if the reaction with maximal flux consuming A is the reaction with maximal
flux producing B.
Shown are all metabolites that have at least one neighbour after completing this procedure.
The background colours denote different known biochemical pathways.
Almaas et al., Nature 427, 839 (2004)
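The directed-link rule above can be sketched on a small network. The stoichiometric matrix, the flux vector and the function name `high_flux_backbone` are hypothetical; the sketch assumes non-negative fluxes that already satisfy the steady-state condition.

```python
def high_flux_backbone(S, v, metabolites):
    """Link A -> B if the top-flux consumer of A is also the top-flux producer of B."""
    best_producer, best_consumer = {}, {}
    for i, met in enumerate(metabolites):
        flows = {j: S[i][j] * v[j] for j in range(len(v))}
        produced = {j: f for j, f in flows.items() if f > 0}
        consumed = {j: -f for j, f in flows.items() if f < 0}
        if produced:
            best_producer[met] = max(produced, key=produced.get)
        if consumed:
            best_consumer[met] = max(consumed, key=consumed.get)
    return sorted(
        (a, b)
        for a, ra in best_consumer.items()
        for b, rb in best_producer.items()
        if ra == rb and a != b
    )

# Hypothetical toy network: uptake -> A, A -> B, A -> C, B -> C, C -> export.
metabolites = ["A", "B", "C"]
S = [
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, 1, -1],    # C
]
v = [10.0, 9.0, 1.0, 9.0, 10.0]  # the direct route A -> C carries little flux

links = high_flux_backbone(S, v, metabolites)
```

The low-flux reaction A -> C is dropped, leaving the chain A -> B -> C as the backbone of this toy network.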
FBA-optimized network on glutamate-rich substrate
High-flux backbone for FBA-optimized metabolic
network of E. coli on a glutamate-rich substrate.
Metabolites (vertices) coloured blue have at least one
neighbour in common in glutamate- and succinate-rich
substrates, and those coloured red have none.
Reactions (lines) are coloured blue if they are identical
in glutamate- and succinate-rich substrates, green if a
different reaction connects the same neighbour pair,
and red if this is a new neighbour pair. Black dotted
lines indicate where the disconnected pathways, for
example, folate biosynthesis, would connect to the
cluster through a link that is not part of the HFB. Thus,
the red nodes and links highlight the predicted changes
in the HFB when shifting E. coli from glutamate- to
succinate-rich media. Dashed lines indicate links to the
biomass growth reaction.
Almaas et al., Nature 427, 839 (2004)
(1) Pentose Phosphate
(2) Purine Biosynthesis
(3) Aromatic Amino Acids
(4) Folate Biosynthesis
(5) Serine Biosynthesis
(6) Cysteine Biosynthesis
(7) Riboflavin Biosynthesis
(8) Vitamin B6 Biosynthesis
(9) Coenzyme A Biosynthesis
(10) TCA Cycle
(11) Respiration
(12) Glutamate Biosynthesis
(13) NAD Biosynthesis
(14) Threonine, Lysine and Methionine Biosynthesis
(15) Branched Chain Amino Acid Biosynthesis
(16) Spermidine Biosynthesis
(17) Salvage Pathways
(18) Murein Biosynthesis
(19) Cell Envelope Biosynthesis
(20) Histidine Biosynthesis
(21) Pyrimidine Biosynthesis
(22) Membrane Lipid Biosynthesis
(23) Arginine Biosynthesis
(24) Pyruvate Metabolism
(25) Glycolysis
Interpretation
Only a few pathways appear disconnected indicating that although these pathways
are part of the HFB, their end product is only the second-most important source for
another HFB metabolite.
Groups of individual HFB reactions largely overlap with traditional
biochemical partitioning of cellular metabolism.
Almaas et al., Nature 427, 839 (2004)
How sensitive is the HFB to changes in the environment?
Almaas et al., Nature 427, 839 (2004)
b, Fluxes of individual
reactions for glutamate-rich
and succinate-rich conditions.
Reactions with negligible flux
changes follow the diagonal
(solid line).
Some reactions are turned off
in only one of the conditions
(shown close to the
coordinate axes). Reactions
belonging to the HFB are
indicated by black squares,
the rest are indicated by blue
dots. Reactions in which the
direction of the flux is
reversed are coloured green.
Only the reactions in the high-flux territory
undergo noticeable differences!
Type I: reactions turned on in one conditions and
off in the other (symbols).
Type II: reactions remain active but show an
orders-in-magnitude shift in flux under the two
different growth conditions.
Flux distributions for individual reactions
Shown is the flux distribution for four selected E. coli reactions in a 50% random environment:
a triosephosphate isomerase;
b carbon dioxide transport;
c NAD kinase;
d guanosine kinase.
Reactions on the v curve (small fluxes) have unimodal/Gaussian distributions (a and c). Shifts in growth conditions lead only to small changes of their flux values.
Reactions off this curve have multimodal distributions (b and d): under different growth conditions they show several discrete and distinct flux values.
Almaas et al., Nature 427, 839 (2004)
Summary
Metabolic network use is highly uneven (power-law distribution) at the global level
and at the level of the individual metabolites.
Whereas most metabolic reactions have low fluxes, the overall activity of the
metabolism is dominated by several reactions with very high fluxes.
E. coli responds to changes in growth conditions by reorganizing the rates of
selected fluxes predominantly within this high-flux backbone.
Apart from minor changes, the use of the other pathways remains unaltered.
These reorganizations result in large, discrete changes in the fluxes of the HFB
reactions.
This behaviour may represent a universal feature of metabolic activity in all cells.
Implications for metabolic engineering?
A global view of epistasis
The relationship between genotype and phenotype is expected to be nonlinear
for most common human diseases.
Part of this complexity can be attributed to epistasis: the effect of a given gene
on a trait can depend on one or more other genes.
Moore, Nature Gen 37, 13 (2005)
Why is there epistasis?
The exact origin is unknown.
Suspected cause: canalization stabilizes phenotypes through natural selection.
Hypothesis: if phenotypes are to be stable in the presence of mutations,
they must have an underlying genetic architecture composed of networks
of genes that are redundant and robust.
Therefore, substantial effects on the phenotype are observed only when there are
multiple mutational hits to the gene network.
This creates dependencies among the genes in the network.
Approach: systematically study epistasis in well-understood model organisms.
Here: single and double knockouts of 890 metabolic genes of yeast.
Estimate growth phenotypes of all knockouts by metabolic flux analysis.
Moore, Nature Gen 37, 13 (2005)
How does one measure epistasis?
$W_X$, $W_Y$: fitness of the individual mutants; $W_{XY}$: fitness of the double mutant.
Here: compute the maximal rate of biomass production $V_{growth}$ of all networks.
For deletion of gene X, define the fitness as
$$W_X = \frac{V^{\Delta X}_{growth}}{V^{wild\,type}_{growth}}$$
The scaled epsilon can distinguish between
$W_X = 0.7$, $W_Y = 0.7$, $W_{XY} = 0.54$ and
$W_X = 0.54$, $W_Y = 0.91$, $W_{XY} = 0.54$.
In both cases $W_X W_Y = 0.49$, so the non-scaled epsilon cannot distinguish the two cases.
Moore, Nature Gen 37, 13 (2005)
Note: aggravating = making worse, increasing risk.
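The two numerical cases above can be checked with a short sketch. The non-scaled epsilon follows the definition used in these slides ($W_{XY} - W_X W_Y$). The scaled form is an assumption here, since Table 1 of Segrè et al. is not reproduced in these notes: epsilon is divided by the distance between the multiplicative expectation and a reference case, taken as min($W_X$, $W_Y$) for positive epsilon (complete buffering) and 0 for negative epsilon (synthetic lethality). The function names are illustrative only.

```python
def epsilon(w_x, w_y, w_xy):
    """Non-scaled epistasis: deviation from the multiplicative expectation."""
    return w_xy - w_x * w_y


def scaled_epsilon(w_x, w_y, w_xy):
    """Scaled epistasis (assumed form, see lead-in).

    Reference fitness: min(w_x, w_y) if eps > 0 (complete buffering),
    0 otherwise (synthetic lethality).
    """
    eps = epsilon(w_x, w_y, w_xy)
    expected = w_x * w_y
    w_ref = min(w_x, w_y) if eps > 0 else 0.0
    denom = abs(w_ref - expected)
    return eps / denom if denom > 0 else 0.0


# The two cases from the slide: (W_X, W_Y, W_XY)
case_1 = (0.7, 0.7, 0.54)
case_2 = (0.54, 0.91, 0.54)

for case in (case_1, case_2):
    print(round(epsilon(*case), 2), round(scaled_epsilon(*case), 2))
```

Under these assumptions both cases have nearly the same non-scaled value, while the scaled value separates partial buffering (case 1) from complete buffering (case 2).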
Examples of gene deletion interactions
Full dots: essential biomass components, which need to be synthesized for the cell to be able to grow.
Effect of gene deletion X: reduced „effective stoichiometry“ $S_X$.
a idealized case: all biomass components are derived from independent nutrient sources. $W_X = S_X$, $W_Y = S_Y$, $W_{XY} = \min(W_X, W_Y)$
b more realistic case: parallel and mutually required pathways demand optimal allocation of a common nutrient.
c simplified example of synthetic lethality: a single biomass component can be produced through 2 alternative routes. Single mutants can grow, the double mutant is lethal.
d complete buffering: $W_{XY} = \min(W_X, W_Y)$
Moore, Nature Gen 37, 13 (2005)
Fitness of double mutants
Fitness values of all possible double mutants relative to the expected no-epistasis values are calculated with FBA over all pairs of enzyme deletions (excluding essential genes and gene deletions with no phenotypic consequence).
The trimodal distribution is uncovered by transforming
(a) the nonscaled epistasis level $\varepsilon = W_{XY} - W_X W_Y$ into
(b) the new scale $\tilde{\varepsilon}$ defined in Table 1.
The new $\tilde{\varepsilon}$ values are used to classify the interactions into buffering (green) at $\tilde{\varepsilon} > \theta^{+}$; aggravating (red), including synthetic lethal at $\tilde{\varepsilon} = -1$ and strong synthetic sick at $\tilde{\varepsilon} < \theta^{-}$; and no epistasis (black). Here we used $(\theta^{-}, \theta^{+}) = (-0.25, 0.95)$.
Relatively few interaction pairs (gray) fall in a nondecisive area. Although the $\tilde{\varepsilon} = 1$ point is the outermost value in the FBA model, in experimental measurements compensatory interactions could exceed this buffering case.
Segrè et al., Nature Genet. 37, 77 (2005)
Classification of gene interactions
Segrè et al., Nature Genet. 37, 77 (2005)
(c) The classification of gene interactions is also evident in a scatter
plot showing / versus normalized to the effect of the double
mutation, $1 - W_{XY}$. The ratio between the x and y axes is equal to the
scaled epistasis level $\tilde{\varepsilon}$.
(d) A schematic metabolic network showing simple examples of
buffering, aggravating and multiplicative interactions (green, red and
black arcs, respectively) between gene deletions (X). The synthesis
of biomass (full square) from biomass components (full dots)
requires an optimal allocation of a common nutrient (empty square)
through intermediate metabolites (empty dots). Additional reactions
(dotted arrow) may account for more subtle buffering interactions in
the complete network.
(e,f) Distribution of epistasis in experimental data of fitness
measurements of double and single mutants in RNA viruses. The
unimodal distribution of $\varepsilon$ (e) diverges into a trimodal distribution
when $\tilde{\varepsilon}$ is used (f). While these data support the FBA-derived
trimodal distribution in the [-1,1] range of $\tilde{\varepsilon}$, the presence of pairs
with $\tilde{\varepsilon} > 1$ stresses the relevance of the additional class of such
compensatory interactions (31 pairs, not shown). In viewing these
results, one should keep in mind that the data are based on a
heterogeneous collection of diverse experiments and may not
represent a truly random set of mutations.
Q: Which effects do you expect from the simultaneous removal of the two red/green/black proteins?
Epistatic interactions between genes
Epistatic interactions between genes classified by functional
annotation groups tend to be of a single sign (i.e.
monochromatic).
(a) Representation of the number of buffering and aggravating
interactions within and between groups of genes defined by
common preassigned annotation from the FBA model. The radii
of the pies represent the total number of interactions (ranging
logarithmically from 1 in the smallest pies to 35 in the largest).
The red and green pie slices reflect the numbers of aggravating
and buffering interactions, respectively. Monochromatic
interactions, represented by whole green or red pies, are much
more common than would be expected by chance.
(b) Sensitivity analysis of the prevalence of monochromaticity
with respect to changes in the growth conditions. In each matrix,
an input parameter was modified with respect to the nominal
analysis: O−, oxygen concentration; C−, carbon concentration;
AC, acetate (instead of glucose) supplied as carbon source.
The color of the matrix element indicates the kind of interactions
observed between the genes in different annotation groups:
red for pure aggravating, green for pure buffering and yellow for
mixed links.
Segrè et al., Nature Genet. 37, 77 (2005)
Prism algorithm
(a) The algorithm arranges a network of aggravating (red) and buffering (green)
interactions into modules whose genes interact with one another in a strictly
monochromatic way. This classification allows a system-level description of
buffering and aggravating interactions between functional modules.
Two networks with the same topology, but different permutations of link colors,
can have different properties of monochromatic clusterability: permuting links 3−4
with 2−4 transforms a 'clusterable' graph (b) into a 'nonclusterable' one (c).
Segrè et al., Nature Genet. 37, 77 (2005)
Prism algorithm
Prism stands for „pairwise reduction into subgraphs monochromatically“.
Examples of monochromatic clustering of 3 toy networks using the Prism algorithm. Prism performs agglomerative clustering with the additional feature of avoiding, when possible, the generation of clusters that interact with each other through both aggravating and buffering epistatic links.
(a) and (b) show examples of networks that are clustered with no monochromatic violations, i.e. with $Q_{module} = 0$.
(c) provides an example for which a monochromatic solution does not exist. In this network, any two pairs of genes clustered in the first step will cause a monochromatic violation. The Prism algorithm would find the solution shown, which has a total of $Q_{module} = 1$ violation.
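Agglomerative clustering
Start: assign each gene to a distinct cluster.
Iteration: at each sequential clustering step, pairs of clusters are combined
depending on the biological proximity, or affinity $A_{x,y}$, between cluster x (size $n_x$)
and cluster y (size $n_y$).
$A_{x,y}$ is computed as the linear combination of a direct affinity $A^{d}_{x,y}$ and an associative affinity $A^{a}_{x,y}$:
$$A_{x,y} = (1-\lambda)\,A^{d}_{x,y} + \lambda\,A^{a}_{x,y}$$
$$A^{d}_{x,y} = \frac{1}{n_x n_y}\sum_{X \in x,\,Y \in y} |E_{X,Y}|, \qquad A^{a}_{x,y} = \max_{X \in x,\,Y \in y} a_{X,Y} \quad \text{where} \quad a_{X,Y} = \frac{1}{N-2}\sum_{Z \neq X,Y}^{N} E_{X,Z}\,E_{Y,Z}$$
Here $\lambda$ denotes the Prism weighting parameter and N the number of genes.
End: when the whole network is covered.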
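The epistasis network $E_{X,Y}$ is defined as a discretization of the $\tilde{\varepsilon}_{X,Y}$ values based
on the cut-off parameters $\theta^{\pm}$: $E_{X,Y} = -1$ if $\tilde{\varepsilon}_{X,Y} < \theta^{-}$, $E_{X,Y} = 1$ if $\tilde{\varepsilon}_{X,Y} > \theta^{+}$,
and $E_{X,Y} = 0$ otherwise.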
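The discretization into the epistasis network can be sketched as follows. The default cut-offs are the values (-0.25, 0.95) quoted earlier in these slides; the gene names and scaled epistasis values in the toy dictionary are hypothetical.

```python
def discretize(eps_scaled, theta_minus=-0.25, theta_plus=0.95):
    """Map a scaled epistasis value to -1 (aggravating), 1 (buffering) or 0."""
    if eps_scaled < theta_minus:
        return -1
    if eps_scaled > theta_plus:
        return 1
    return 0


def epistasis_network(scaled_eps, theta_minus=-0.25, theta_plus=0.95):
    """Turn {(gene_X, gene_Y): scaled epsilon} into {(gene_X, gene_Y): E}."""
    return {pair: discretize(value, theta_minus, theta_plus)
            for pair, value in scaled_eps.items()}


# Hypothetical scaled epistasis values for four gene pairs
toy = {
    ("g1", "g2"): -1.0,   # synthetic lethal
    ("g1", "g3"): 1.0,    # complete buffering
    ("g2", "g3"): 0.1,    # below both cut-offs in magnitude
    ("g3", "g4"): -0.3,   # strong synthetic sick
}
E = epistasis_network(toy)
print(E)
```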
Agglomerative clustering
At every step, each pair of clusters (x,y) is also assigned an integer $C_{x,y}$ counting how
many non-monochromatic connections would be formed if clusters x and y were
joined (i.e. the number of clusters z that have buffering links with x and
aggravating links with y, or vice versa).
The algorithm then identifies the set of (x,y) pairs for which $C_{x,y} = C_m$, where
$$C_m = \min_{x,y} C_{x,y}$$
This set contains all the candidate pairs that, if joined at the next step, would
cause the minimal possible number of monochromatic conflicts. The pair with the
highest $A_{x,y}$ in this set is then chosen as the pair of clusters to be combined.
At a given step, monochromaticity is preserved if $C_m = 0$.
The final clustering solution is assigned a total module-module monochromaticity
violation number, $Q_{module} = \sum C_m$, where the sum is over all the clustering steps.
Q: Describe the basic idea of the PRISM algorithm: agglomerative clustering (which distance measure is used?) and the additional criterion (count the number of non-monochromatic connections).
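The conflict count used by Prism can be illustrated with a minimal sketch. It assumes the epistasis network is stored as a dictionary of signed links (1 buffering, -1 aggravating) and that clusters are frozensets of gene names; the helper names and the toy network are hypothetical. The affinity-based tie-break among the candidate pairs is not included.

```python
from itertools import combinations


def link_signs(E, a, b):
    """Set of non-zero link signs between the genes of clusters a and b."""
    signs = set()
    for g in a:
        for h in b:
            s = E.get((g, h)) or E.get((h, g)) or 0
            if s:
                signs.add(s)
    return signs


def conflicts(E, clusters, x, y):
    """C_{x,y}: clusters z with buffering links to x and aggravating links
    to y, or vice versa."""
    count = 0
    for z in clusters:
        if z in (x, y):
            continue
        sx, sy = link_signs(E, x, z), link_signs(E, y, z)
        if (1 in sx and -1 in sy) or (-1 in sx and 1 in sy):
            count += 1
    return count


def candidate_pairs(E, clusters):
    """Return C_m and all cluster pairs whose conflict count equals C_m."""
    scored = [(conflicts(E, clusters, x, y), x, y)
              for x, y in combinations(clusters, 2)]
    c_m = min(c for c, _, _ in scored)
    return c_m, [(x, y) for c, x, y in scored if c == c_m]


# Hypothetical toy network: a buffers c and d, b aggravates c and buffers d
E = {("a", "c"): 1, ("b", "c"): -1, ("a", "d"): 1, ("b", "d"): 1}
clusters = [frozenset(g) for g in "abcd"]
c_m, pairs = candidate_pairs(E, clusters)
print(c_m, len(pairs))
```

In this toy case joining a with b would create one conflict (through c), so that pair is excluded from the candidate set while conflict-free merges remain available.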
Example for Prism
Schematic demonstration of monochromatic classification. A network of two types
of connections, such as buffering (green) and aggravating (red) epistasis, is
sorted into modules of genes that interact with one another in a purely
monochromatic way.
Segrè et al., Nature Genet. 37, 77 (2005)
Prism monochromatic gene interaction network
Segrè et al., Nature Genet. 37, 77 (2005)
Buffering (green) and aggravating (red) gene interaction network. Genes (black nodes) are grouped into monochromatically interacting modules (enclosing boxes). Gene annotations (white letters inside nodes) correlate well with the unsupervised classification. For directional buffering links, arrows point from the deletion with the larger effect to the deletion with the smaller effect (i.e., to the mutation whose fitness effect is buffered by the presence of the other).
Gene names are indicated on the side of the nodes. Names consisting of
the letter U followed by a number correspond to enzymatic or transport
reactions with unassigned genes. Prism parameter = 0.3 was used.
System-level view of interactions between functional modules
Notable predictions of module-module interactions include the aggravating
link between LYSbs and TRPcat and the buffering one between PRObs
and ATPs. 'Buffering chains', such as PENT → ATPs → PRObs → IDP,
can be observed owing to the coherent directionality of the buffering links
in a. Such chains are not necessarily transitive; e.g., although PENT
buffers ATPs, which buffers PRObs, there is no direct buffering from
PENT to PRObs. The interacting functional modules are shown at their
approximate locations on a schematic metabolic chart. Functional
modules are named to reflect the main common metabolic processes of
the genes involved:
ACAL, acetaldehyde and acetate metabolism
ATPs, ATP synthase
COA, pantothenate and coenzyme-A biosynthesis
ETHxt, ethanol transport
GLUCN, gluconeogenesis
GLYC, glycolysis
IDP, isocitrate dehydrogenase
LYSbs, lysine biosynthesis
PENT, pentose phosphate pathway
PRObs, proline biosynthesis
PROcat, proline catabolism
PYR, pyruvate metabolism
RESPIR, respiratory chain
STEROL, sterol biosynthesis
TCA, TCA cycle
TRPcat, tryptophan catabolism
URA, uracil biosynthesis.
Segrè et al., Nature Genet. 37, 77 (2005)
Conclusion
Epistatic interactions, manifested in the effects of mutations on the phenotypes caused by
other mutations, may help uncover the functional organization of complex biological networks.
Here, system-level epistatic interactions were studied by computing growth phenotypes of all
single and double knockouts of 890 metabolic genes in S. cerevisiae, using the framework of FBA.
A new scale for epistasis identified a distinctive trimodal distribution of these epistatic effects,
allowing gene pairs to be classified as buffering, aggravating or noninteracting.
The epistatic interaction network could be organized hierarchically into function-
enriched modules that interact with each other 'monochromatically' (i.e., with purely
aggravating or purely buffering epistatic links).
This property extends the concept of epistasis from single genes to functional units and
provides a new definition of biological modularity, which emphasizes interactions between,
rather than within, functional modules.
Our approach can be used to infer functional gene modules from purely phenotypic epistasis
measurements.
Segrè et al., Nature Genet. 37, 77 (2005)
V21 Metabolic Pathway Analysis (MPA)
Metabolic Pathway Analysis searches for meaningful structural and functional units
in metabolic networks. The most promising, very similar approaches are based on
convex analysis and use the sets of elementary flux modes (Schuster et al. 1999,
2000) and extreme pathways (Schilling et al. 2000).
Both sets span the space of feasible steady-state flux distributions by non-
decomposable routes, i.e. no subset of reactions involved in an EFM or EP can hold
the network balanced using non-trivial fluxes.
MPA can be used to study e.g.
- routing and flexibility/redundancy of networks
- functionality of networks
- identification of futile cycles
- enumeration of all (sub)optimal pathways with respect to product/biomass yield
- calculability studies in MFA
Klamt et al. Bioinformatics 19, 261 (2003)
Metabolic Pathway Analysis: Elementary Flux Modes
The technique of Elementary Flux Modes (EFM) was developed prior to extreme
pathways (EP) by Stephan Schuster, Thomas Dandekar and co-workers:
Pfeiffer et al. Bioinformatics, 15, 251 (1999)
Schuster et al. Nature Biotech. 18, 326 (2000)
The method is very similar to the „extreme pathway“ method to construct a basis
for metabolic flux states based on methods from convex algebra.
Extreme pathways are a subset of elementary modes, and for many systems, both
methods coincide.
Are the subtle differences important?
Two approaches for Metabolic Pathway Analysis?
Let P(e) denote the set of reactions with non-zero flux in a flux vector e. The pathway P(e) is an elementary flux mode if it fulfills conditions C1 – C3.
(C1) Pseudo steady-state: $S\,e = 0$. This ensures that none of the metabolites is
consumed or produced in the overall stoichiometry.
(C2) Feasibility: rate $e_i \geq 0$ if reaction i is irreversible. This demands that only
thermodynamically realizable fluxes are contained in e.
(C3) Non-decomposability: there is no vector v (unequal to the zero vector and to
e) that fulfills C1 and C2 and for which P(v) is a proper subset of P(e). This is the core
characteristic of EFMs and EPs and supplies the decomposition of the network
into smallest units (able to hold the network in steady state).
C3 is often called „genetic independence“ because it implies that the enzymes in
one EFM or EP are not a subset of the enzymes from another EFM or EP.
Klamt & Stelling Trends Biotech 21, 64 (2003)
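Conditions C1 to C3 can be checked for a given flux vector with a minimal sketch. C3 is tested here with a rank criterion that is, to my understanding, equivalent to non-decomposability for vectors that already satisfy C1 and C2: the stoichiometric matrix restricted to the active reactions must have a one-dimensional null space. The four-reaction toy network (uptake of A, two parallel conversions A to B, export of B) is hypothetical and is not the network of Klamt & Stelling.

```python
from fractions import Fraction


def rank(rows):
    """Matrix rank by exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_efm(S, e, reversible):
    """Check C1 (steady state), C2 (feasibility), C3 (rank test)."""
    support = [j for j, v in enumerate(e) if v != 0]
    if not support:
        return False
    steady = all(sum(row[j] * e[j] for j in support) == 0 for row in S)
    feasible = all(e[j] >= 0 or reversible[j] for j in support)
    sub = [[row[j] for j in support] for row in S]
    minimal = rank(sub) == len(support) - 1
    return steady and feasible and minimal


# Hypothetical toy network. Rows: metabolites A, B.
# Columns: v1 (-> A), v2 (A -> B), v3 (B ->), v4 (A -> B, parallel route)
S = [[1, -1, 0, -1],
     [0, 1, -1, 1]]
irreversible_all = [False, False, False, False]

print(is_efm(S, [1, 1, 1, 0], irreversible_all))  # route via v2
print(is_efm(S, [2, 1, 2, 1], irreversible_all))  # mixture of both routes
```

The second vector is a balanced, feasible flux distribution but uses both parallel routes at once, so it decomposes into the two elementary modes.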
Two approaches for Metabolic Pathway Analysis?
The pathway P(e) is an extreme pathway if it fulfills conditions C1 – C3 AND
conditions C4 – C5.
(C4) Network reconfiguration: Each reaction must be classified either as exchange
flux or as internal reaction. All reversible internal reactions must be split up into
two separate, irreversible reactions (forward and backward reaction).
(C5) Systemic independence: the set of EPs in a network is the minimal set of
EFMs that can describe all feasible steady-state flux distributions.
Klamt & Stelling Trends Biotech 21, 64 (2003)
Q: Name the 2 main differences between the methods Extreme Pathways and elementary mode analysis.
A: see the additional conditions C4 and C5.
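The network reconfiguration of condition C4 can be sketched as a simple column operation on the stoichiometric matrix: every reversible internal reaction is replaced by a forward column and a negated backward column. The function name, the suffixes "f" and "b", and the three-reaction toy matrix are assumptions for illustration.

```python
def split_reversible(S, names, reversible, is_exchange):
    """Split each reversible internal reaction into forward and backward
    irreversible reactions; exchange fluxes are left unchanged."""
    new_cols, new_names = [], []
    for j, name in enumerate(names):
        col = [row[j] for row in S]
        if reversible[j] and not is_exchange[j]:
            new_cols.append(col)                    # forward direction
            new_cols.append([-v for v in col])      # backward direction
            new_names += [name + "f", name + "b"]
        else:
            new_cols.append(col)
            new_names.append(name)
    S_new = [[col[i] for col in new_cols] for i in range(len(S))]
    return S_new, new_names


# Hypothetical toy matrix. Rows: metabolites A, B.
S = [[1, -1, 0],
     [0, 1, -1]]
names = ["R1", "R7", "R3"]
reversible = [False, True, False]
is_exchange = [True, False, True]

S_new, names_new = split_reversible(S, names, reversible, is_exchange)
print(names_new)
print(S_new)
```

Running the forward and backward columns together at equal rates leaves every metabolite balanced, which is the kind of two-cycle that appears after splitting a reversible reaction.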
Two approaches for Metabolic Pathway Analysis?
Klamt & Stelling Trends Biotech 21, 64 (2003)
[Figure: example network with metabolites A, B, C, D and P, external metabolites A(ext), B(ext) and C(ext), and reactions R1 to R9.]
Reconfigured Network
Klamt & Stelling Trends Biotech 21, 64 (2003)
[Figure: the same network after reconfiguration; reaction R7 is split into R7f and R7b.]
3 EFMs are not systemically independent:
EFM1 = EP4 + EP5
EFM2 = EP3 + EP5
EFM4 = EP2 + EP3
Property 1 of EFMs
Klamt & Stelling Trends Biotech 21, 64 (2003)
The only difference in the set of EFMs emerging upon reconfiguration consists in
the two-cycles that result from splitting up reversible reactions. However, two-cycles
are not considered as meaningful pathways.
Valid for any network: Property 1
Reconfiguring a network by splitting up reversible reactions leads to the same set of
meaningful EFMs.
Software: FluxAnalyzer
What is the consequence when all exchange fluxes (and hence all
reactions in the network) are irreversible?
Klamt & Stelling Trends Biotech 21, 64 (2003)
EFMs and EPs always coincide!
Property 2 of EFMs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Property 2
If all exchange reactions in a network are irreversible then the sets of meaningful
EFMs (both in the original and in the reconfigured network) and EPs coincide.
Comparison of EFMs and EPs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Problem: Recognition of operational modes: routes for converting exclusively A to P.
EFM (network N1): 4 genetically independent routes (EFM1-EFM4).
EP (network N2): The set of EPs does not contain all genetically independent routes. Searching for EPs leading from A to P via B, no pathway would be found.
Comparison of EFMs and EPs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Problem: Finding all the optimal routes: optimal pathways for synthesizing P during growth on A alone.
EFM (network N1): EFM1 and EFM2 are optimal because they yield one mole P per mole substrate A (i.e. R3/R1 = 1), whereas EFM3 and EFM4 are only suboptimal (R3/R1 = 0.5).
EP (network N2): One would only find the suboptimal EP1, not the optimal routes EFM1 and EFM2.
Comparison of EFMs and EPs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Problem: Analysis of network flexibility (structural robustness, redundancy): relative robustness of exclusive growth on A or B.
EFM (network N1): 4 pathways convert A to P (EFM1-EFM4), whereas for B only one route (EFM8) exists. When one of the internal reactions (R4-R9) fails, 2 pathways for production of P from A will always „survive“. By contrast, removing reaction R8 already stops the production of P from B alone.
EP (network N2): Only 1 EP exists for producing P from substrate A alone, and 1 EP for synthesizing P from (only) substrate B. One might suggest that both substrates possess the same redundancy of pathways, but as shown by EFM analysis, growth on substrate A is much more flexible than on B.
Comparison of EFMs and EPs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Problem: Relative importance of single reactions: relative importance of reaction R8.
EFM (network N1): R8 is essential for producing P from substrate B, whereas for A there is no structurally „favored“ reaction (R4-R9 all occur twice in EFM1-EFM4). However, considering the optimal modes EFM1 and EFM2, one recognizes the importance of R8 also for growth on A.
EP (network N2): Consider again the biosynthesis of P from substrate A (EP1 only). Because R8 is not involved in EP1, one might think that this reaction is not important for synthesizing P from A. However, without this reaction, it is impossible to obtain optimal yields (1 P per A; EFM1 and EFM2).
Comparison of EFMs and EPs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Problem: Enzyme subsets and excluding reaction pairs: suggest regulatory structures or rules.
EFM (network N1): R6 and R9 are an enzyme subset. By contrast, R6 and R9 never occur together with R8 in an EFM. Thus (R6,R8) and (R8,R9) are excluding reaction pairs. (In an arbitrary composable steady-state flux distribution they might occur together.)
EP (network N2): The EPs falsely suggest that R4 and R8 are an excluding reaction pair, but they are not (EFM2). The enzyme subsets would be correctly identified. However, one can construct simple examples where the EPs would also suggest wrong enzyme subsets (not shown).
Comparison of EFMs and EPs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Problem: Pathway length: shortest/longest pathway for production of P from A.
EFM (network N1): The shortest pathway from A to P needs 2 internal reactions (EFM2), the longest 4 (EFM4).
EP (network N2): Neither the shortest (EFM2) nor the longest (EFM4) pathway from A to P is contained in the set of EPs.
Comparison of EFMs and EPs
Klamt & Stelling Trends Biotech 21, 64 (2003)
Problem: Removing a reaction and mutation studies: effect of deleting R7.
EFM (network N1): All EFMs not involving the deleted reactions build up the complete set of EFMs in the new (smaller) subnetwork. If R7 is deleted, EFMs 2,3,6,8 „survive“. Hence the mutant is viable.
EP (network N2): Analyzing a subnetwork implies that the EPs must be newly computed. E.g. when deleting R2, EFM2 would become an EP. For this reason, mutation studies cannot be performed easily.
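The EFM-based deletion study can be sketched as a set filter: an EFM survives a deletion if none of its reactions is removed. The EFM supports and reaction names below (v1 to v5) are hypothetical toy data, not the modes of network N1, whose supports are not listed in these notes.

```python
def surviving_efms(efms, deleted):
    """EFMs that do not use any deleted reaction."""
    removed = set(deleted)
    return {name: rxns for name, rxns in efms.items() if not (rxns & removed)}


def is_viable(efms, deleted, product_reaction):
    """A mutant counts as viable here if at least one surviving EFM still
    contains the product-forming reaction."""
    survivors = surviving_efms(efms, deleted)
    return any(product_reaction in rxns for rxns in survivors.values())


# Hypothetical EFM supports (sets of reactions with non-zero flux)
efms = {
    "EFM1": {"v1", "v2", "v3"},
    "EFM2": {"v1", "v4", "v3"},
    "EFM3": {"v1", "v2", "v5"},
}

print(sorted(surviving_efms(efms, ["v2"])))
print(is_viable(efms, ["v2"], "v3"))
print(is_viable(efms, ["v1"], "v3"))
```

No recomputation of modes is needed after a deletion, which is the practical advantage noted above for EFMs.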
Integrated Analysis of Metabolic and Regulatory Networks
So far, studies of large-scale cellular networks have focused on their connectivities.
The emerging picture shows a densely-woven web where almost everything is
connected to everything.
In the cell‘s metabolic network, hundreds of substrates are interconnected through
biochemical reactions.
Although this could in principle lead to the simultaneous flow of substrates in
numerous directions, in practice metabolic fluxes pass through specific pathways
(see the high-flux backbone, V20).
Topological studies so far did not consider how the modulation of this connectivity
might also determine network properties.
Therefore it is important to correlate the network topology (picture derived from
EFMs and EPs) with the expression of enzymes in the cell.
Analyze transcriptional control in metabolic networks
Regulatory and metabolic functions of cells are mediated by networks of interacting
biochemical components.
Metabolic flux is optimized to maximize metabolic efficiency under different
conditions.
Control of metabolic flow:
- allosteric interactions
- covalent modifications involving enzymatic activity
- transcription (revealed by genome-wide expression studies)
Here: N. Barkai and colleagues analyzed published experimental expression data of
Saccharomyces cerevisiae.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Recurrence signature algorithm
The availability of DNA microarray data makes it possible to study the transcriptional response of a complete
genome to different experimental conditions.
An essential task in studying the global structure of transcriptional networks is
gene classification.
Commonly used clustering algorithms classify genes successfully when applied to
relatively small data sets, but their application to large-scale expression data is
limited by 2 well-recognized drawbacks:
- commonly used algorithms assign each gene to a single cluster, whereas in fact
genes may participate in several functions and should thus be included in several
clusters
- these algorithms classify genes on the basis of their expression under all
experimental conditions, whereas cellular processes are generally affected only by
a small subset of these conditions.
Ihmels et al. Nat Genetics 31, 370 (2002)
Recurrence signature algorithm
Aim: identify transcription „modules“ (TMs).
A set of randomly selected genes is unlikely to be identical to the genes of any
TM. Yet many such sets do have some overlap with a specific TM.
In particular, sets of genes that are compiled according to existing knowledge of
their functional (or regulatory sequence) similarity may have a significant overlap
with a transcription module.
The algorithm receives a gene set that partially overlaps a TM and then provides the
complete module as output. Therefore this algorithm is referred to as the „signature
algorithm“.
Ihmels et al. Nat Genetics 31, 370 (2002)
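The idea of completing a module from a partially overlapping input set can be sketched in two steps: score the conditions by the average expression of the input genes, then score all genes over the selected conditions. This is a simplified illustration under stated assumptions, not the published procedure: the fixed thresholds, the lack of normalization and the toy expression matrix are all hypothetical.

```python
def signature(E, input_genes, t_cond=1.0, t_gene=1.0):
    """Return (module genes, selected conditions) for an input gene set.

    E[g][c] is the expression of gene g under condition c.
    """
    n_cond = len(E[0])
    # Step 1: condition scores = mean expression of the input genes
    cond_score = [sum(E[g][c] for g in input_genes) / len(input_genes)
                  for c in range(n_cond)]
    conds = [c for c in range(n_cond) if abs(cond_score[c]) > t_cond]
    if not conds:
        return set(), conds
    # Step 2: gene scores = mean expression over the selected conditions,
    # weighted by the condition scores
    gene_score = [sum(E[g][c] * cond_score[c] for c in conds) / len(conds)
                  for g in range(len(E))]
    genes = {g for g, s in enumerate(gene_score) if s > t_gene}
    return genes, conds


# Toy data: genes 0-3 respond together in conditions 0 and 1;
# genes 4 and 5 respond elsewhere.
E = [
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 2],
]

module_a, conds_a = signature(E, [0, 1, 5])   # input overlaps the module
module_b, conds_b = signature(E, [2, 3, 4])   # different overlapping input
print(sorted(module_a), conds_a)
print(module_a == module_b)
```

Two different input sets that each partially overlap the module return the same gene set, which is the recurrence criterion for a reliable module.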
Recurrence signature algorithm
a, The signature algorithm.
b, Recurrence as a reliability measure. The signature algorithm is applied to distinct input
sets containing different subsets of the postulated transcription module. If the different input
sets give rise to the same module, it is considered reliable.
c, General application of the recurrent signature method: normalization of data, identify modules, classify genes into modules.
Ihmels et al. Nat Genetics 31, 370 (2002)
Correlation between genes of the same metabolic pathway
Distribution of the average correlation between genes assigned to the same metabolic pathway in the KEGG database. The distribution corresponding to random assignment of genes to metabolic pathways of the same size is shown for comparison. Importantly, only genes coding for enzymes were used in the random control.
Interpretation: pairs of genes associated with the same metabolic pathway show a similar expression pattern.
However, typically only a subset of the genes assigned to a given pathway are coregulated.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
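The quantity behind this distribution, the average pairwise correlation of the genes in one pathway, can be sketched as follows. Pearson correlation is assumed as the correlation measure; the gene names and expression profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def mean_pairwise_correlation(profiles, genes):
    """Average correlation over all pairs of genes in one pathway."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


# Hypothetical expression profiles over four conditions
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],   # follows geneA
    "geneC": [4, 3, 2, 1],   # opposite trend
}

print(mean_pairwise_correlation(profiles, ["geneA", "geneB"]))
print(mean_pairwise_correlation(profiles, ["geneA", "geneB", "geneC"]))
```

For the random control described above, the same function would be applied to randomly drawn enzyme-coding gene sets of the same size as each pathway.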
Correlation between genes of the same metabolic pathway
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Genes of the glycolysis pathway (according to KEGG) were clustered and ordered based on the correlation in their expression profiles. Shown here is the matrix of their pair-wise correlations.
The cluster of highly correlated genes (orange frame) corresponds to genes that encode the central glycolysis enzymes. The linear arrangement of these genes along the pathway is shown at right.
Of the 46 genes assigned to the glycolysis pathway in the KEGG database, only 24 show a correlated expression pattern.
In general, the coregulated genes belong to the central pieces of pathways.
Coexpressed enzymes often catalyze linear chain of reactions
Coregulation between enzymes associated with central metabolic pathways. Each branch corresponds to several enzymes. In the cases shown, only one of the branches downstream of the junction point is coregulated with upstream genes.
Interpretation: coexpressed enzymes are often arranged in a linear order, corresponding to a metabolic flow that is directed in a particular direction.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Co-regulation at branch points
To examine more systematically whether coregulation enhances the linearity of
metabolic flow, analyze the coregulation of enzymes at metabolic branch-points.
Search KEGG for metabolic compounds that are involved in exactly 3 reactions.
Only consider reactions that exist in S. cerevisiae.
3-junctions can integrate metabolic flow (convergent junction)
or allow the flow to diverge in 2 directions (divergent junction).
In the cases where several reactions are catalyzed by the same enzymes, choose
one representative so that all junctions considered are composed of precisely 3
reactions catalyzed by distinct enzymes.
Each 3-junction is categorized according to the correlation pattern found between
enzymes catalyzing its branches. Correlation coefficients > 0.25 are considered
significant.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
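The categorization of a divergent 3-junction can be sketched with the 0.25 cut-off given above: count how many outgoing branches are significantly coregulated with the incoming reaction. The category labels, the enzyme names and the correlation values are hypothetical and simpler than the pattern classes used in the original analysis.

```python
def coregulated(corr, a, b, cutoff=0.25):
    """True if the correlation between enzymes a and b exceeds the cut-off."""
    return corr.get(frozenset((a, b)), 0.0) > cutoff


def divergent_pattern(corr, incoming, out1, out2, cutoff=0.25):
    """Classify a divergent junction by the number of outgoing branches
    coregulated with the incoming reaction."""
    n = sum(coregulated(corr, incoming, o, cutoff) for o in (out1, out2))
    return {0: "uncorrelated", 1: "linear", 2: "both branches"}[n]


# Hypothetical correlations between the enzymes of one junction
corr = {
    frozenset(("ENZ_IN", "ENZ_OUT1")): 0.6,
    frozenset(("ENZ_IN", "ENZ_OUT2")): 0.1,
}

print(divergent_pattern(corr, "ENZ_IN", "ENZ_OUT1", "ENZ_OUT2"))
```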
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Coregulation pattern in three-point junctions
In the majority of divergent junctions, only one of the emanating branches is significantly coregulated with the incoming reaction that synthesizes the metabolite.
All junctions corresponding to metabolites that participate in exactly 3
reactions (according to KEGG) were identified and the correlations
between the genes associated with each such junction were calculated.
The junctions were grouped according to the directionality of the
reactions, as shown.
Divergent junctions, which allow the flow of metabolites in two
alternative directions, predominantly show a linear coregulation pattern,
where one of the emanating reactions is correlated with the incoming
reaction (linear regulatory pattern) or the two alternative outgoing
reactions are correlated in a context-dependent manner with a distinct
isozyme catalyzing the incoming reaction (linear switch).
By contrast, the linear regulatory pattern is significantly less abundant
in convergent junctions, where the outgoing flow follows a unique
direction, and in conflicting junctions that do not support metabolic flow.
Most of the reversible junctions comply with linear regulatory patterns.
Indeed, similar to divergent junctions, reversible junctions allow
metabolites to flow in two alternative directions. Reactions were
counted as coexpressed if at least two of the associated genes were
significantly correlated (correlation coefficient >0.25). As a random
control, we randomized the identity of all metabolic genes and repeated
the analysis.
Co-regulation at branch points: conclusions
The observed co-regulation patterns correspond to a linear metabolic flow,
whose directionality can be switched in a condition-specific manner.
When analyzing junctions that allow metabolic flow in a larger number of
directions, again only a few important branches are coregulated with
the incoming branch.
Therefore: transcription regulation is used to enhance the linearity of
metabolic flow, by biasing the flow toward only a few of the possible routes.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
The connectivity of a given metabolite
is defined as the number of reactions
connecting it to other metabolites.
Shown are the distributions of
connectivity between metabolites in an
unrestricted network and in a
network where only correlated
reactions are considered.
In accordance with previous results
(Jeong et al. 2000), the connectivity
distribution between metabolites
follows a power law (log-log plot).
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Connectivity of metabolites
In contrast, when coexpression is
used as a criterion to distinguish
functional links, the connectivity
distribution becomes exponential
(log-linear plot).
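The connectivity count itself is easy to reproduce. A minimal Python sketch, assuming each reaction is given simply as the set of metabolites it involves; the function names and the toy network are hypothetical. Restricting the network to correlated reactions amounts to passing a filtered reaction list.

```python
from collections import Counter


def connectivity(reactions):
    """Number of reactions connecting each metabolite to other metabolites."""
    degree = Counter()
    for metabolites in reactions:
        if len(metabolites) > 1:  # a reaction must link at least two metabolites
            degree.update(set(metabolites))
    return degree


def degree_distribution(reactions):
    """Map connectivity k -> number of metabolites with that connectivity."""
    degree = connectivity(reactions)
    return dict(sorted(Counter(degree.values()).items()))


# Toy network (assumed): four reactions over metabolites A-D
all_reactions = [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "C"}]
correlated_only = all_reactions[:2]  # pretend only these two are coexpressed

full = degree_distribution(all_reactions)
restricted = degree_distribution(correlated_only)
```

Plotting such a distribution on log-log or log-linear axes is how a power law is told apart from an exponential decay.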
Differential regulation of isozymes
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Observation: isozymes at junction points are often preferentially coexpressed
with alternative reactions.
Therefore, investigate their role in the metabolic network more systematically.
Two possible functions of isozymes
associated with the same metabolic
reaction.
An isozyme pair could provide redundancy which may be needed for buffering genetic
mutations or for amplifying metabolite production. Redundant isozymes are expected
to be coregulated.
Alternatively, distinct isozymes could be dedicated to separate biochemical
pathways using the associated reaction. Such isozymes are expected to be
differentially expressed with the two alternative processes.
Arrows represent metabolic
pathways composed of a sequence
of enzymes.
Coregulation is indicated with the
same color (e.g., the isozyme
represented by the green arrow is
coregulated with the metabolic
pathway represented by the green
arrow).
Most members of isozyme pairs
are separately coregulated with
alternative processes.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Differential regulation of isozymes in central metabolic pathways
Regulatory pattern of all gene pairs
associated with a common metabolic
reaction (according to KEGG).
All such pairs were classified into several
classes:
(1) parallel, where each gene is
correlated with a distinct connected
reaction (a reaction that shares a
metabolite with the reaction catalyzed by
the respective gene pair);
(2) selective, where only one of the
enzymes shows a significant correlation
with a connected reaction; and
(3) converging, where both enzymes
were correlated with the same reaction.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Differential regulation of isozymes
Correlation coefficients >0.25 were
considered significant. To be
counted as parallel, rather than
converging, we demanded that the
correlation with the alternative
reaction be <80% of the correlation
with the preferred reaction.
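The three classes and the 80% rule can be written down as a small decision function. A minimal Python sketch under stated assumptions: each gene's correlations with its connected reactions are given as a dictionary, the "preferred" reaction is taken to be the one with the highest correlation, and the label "none" for pairs without any significant correlation is added for completeness. The function name and data layout are hypothetical.

```python
THRESHOLD = 0.25  # significance cutoff from the text
RATIO = 0.8       # alternative correlation must stay below 80% of the preferred one


def classify_pair(corr_a, corr_b):
    """Classify an isozyme pair; corr_x maps connected-reaction id -> correlation."""

    def preferred(corr):
        # Most strongly correlated connected reaction, if it is significant.
        if not corr:
            return None
        best = max(corr, key=corr.get)
        return best if corr[best] > THRESHOLD else None

    pa, pb = preferred(corr_a), preferred(corr_b)
    if pa is None and pb is None:
        return "none"
    if pa is None or pb is None:
        return "selective"
    if pa == pb:
        return "converging"
    # Distinct preferred reactions: demand a clear preference on both sides.
    a_clear = corr_a.get(pb, 0.0) < RATIO * corr_a[pa]
    b_clear = corr_b.get(pa, 0.0) < RATIO * corr_b[pb]
    return "parallel" if (a_clear and b_clear) else "converging"


# Hypothetical correlations of two isozymes with connected reactions r1 and r2
label = classify_pair({"r1": 0.6, "r2": 0.1}, {"r1": 0.05, "r2": 0.5})
```

Here each isozyme clearly prefers a different connected reaction, so the pair is counted as parallel.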
The primary role of isozyme multiplicity is to allow for differential regulation
of reactions that are shared by separated processes.
Dedicating a specific enzyme to each pathway may offer a way of independently
controlling the associated reaction in response to pathway-specific requirements, at
both the transcriptional and the post-transcriptional levels.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Differential regulation of isozymes: interpretation
Identify the coregulated subparts of each metabolic pathway and identify relevant
experimental conditions that induce or repress the expression of the pathway
genes.
Also associate additional genes showing similar expression profiles with each
pathway using the signature algorithm.
Input: set of genes, some of which are expected to be coregulated.
Output: coregulated part of the input and additional coregulated genes together
with the set of conditions where the coregulation is realized.
Numerous genes were found that are not directly involved in enzymatic steps:
- transporters
- transcription factors
Genes coexpressed with metabolic pathways
Q: Which classes of proteins would you expect to be co-expressed with the proteins of a biochemical pathway?
A: Feeder pathways, transporters, transcription factors.
Co-expression of transporters
Transporter genes are
co-expressed with the relevant
metabolic pathways providing
the pathways with their metabolites.
Co-expression is marked in green.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Co-regulation of transcription factors
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Transcription factors are often co-regulated with their regulated pathways. Shown
here are transcription factors which were found to be co-regulated in the analysis.
Co-regulation is shown by color-coding such that the transcription factor and the
associated pathways are of the same color.
So far: co-expression analysis revealed a strong tendency toward coordinated
regulation of genes involved in individual metabolic pathways.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Hierarchical modularity in the metabolic network
Does transcription regulation also define a higher-order metabolic organization, by
coordinated expression of distinct metabolic pathways?
This question is based on the observation that feeder pathways (which synthesize metabolites) are
frequently coexpressed with pathways using the synthesized metabolites.
Feeder-pathways/enzymes
Feeder pathways or genes co-expressed with the pathways they fuel.
The feeder pathways (light blue) provide the main pathway (dark blue) with metabolites
in order to assist the main pathway, indicating that co-expression extends beyond
the level of individual pathways.
These results can be interpreted in the following way: the organism will
produce those enzymes that are needed.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Hierarchical modularity in the metabolic network
Derive hierarchy by applying an iterative
signature algorithm to the metabolic pathways,
and decreasing the resolution parameter
(coregulation stringency) in small steps.
Each box contains a group of coregulated genes
(transcription module). Strongly associated
genes (left) can be associated with a specific
function, whereas moderately correlated
modules (right) are larger and their function is
less coherent.
The merging of 2 branches indicates that the
associated modules are induced by similar
conditions.
All pathways converge to one of 3 low-resolution
modules: amino acid biosynthesis, protein
synthesis, and stress.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Hierarchical modularity in the metabolic network
Although amino acids serve as building blocks for proteins, the expression of genes
mediating these 2 processes is clearly uncoupled!
This may reflect the association of rapid cell growth (which triggers enhanced
protein synthesis) with rich growth conditions, where amino acids are readily
available and do not need to be synthesized.
Amino acid biosynthesis genes are only required when external amino acids are
scarce.
In support of this view, a group of amino acid transporters converged to the protein
synthesis module, together with other pathways required for rapid cell growth
(glucose fermentation, nucleotide synthesis and fatty acid synthesis).
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
Global network properties
Jeong et al. showed that the structural connectivity between metabolites imposes a
hierarchical organization of the metabolic network. That analysis was based on
connectivity between substrates, considering all potential connections.
Here, analysis is based on coexpression of enzymes.
In both approaches, related metabolic pathways were clustered together!
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
There are, however, some differences in the particular groupings (not discussed
here),
and importantly, when including expression data the connectivity pattern of
metabolites changes from a power-law dependence to an exponential one
corresponding to a network structure with a defined scale of connectivity.
This reflects the reduction in the complexity of the network.
Summary
Transcription regulation is prominently involved in shaping the metabolic network of
S. cerevisiae.
1 Transcription leads the metabolic flow toward linearity.
2 Individual isozymes are often separately coregulated with distinct processes,
providing a means of reducing crosstalk between pathways using a common
reaction.
3 Transcription regulation entails a higher-order structure of the metabolic
network.
There exists a hierarchical organization of metabolic pathways into groups of
decreasing expression coherence.
Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
V24 Framework for computation of elementary modes
Definition and benefits of Elementary Modes
Consider a metabolic network with m metabolites and q reactions.
Reactions may involve further metabolites that are not considered as proper
members of the system of study.
These are considered to be buffered, and are called external metabolites in
opposition to the m metabolites within the boundary of the system, called internal
metabolites.
The stoichiometry matrix N is an m × q matrix.
Its element $n_{ij}$ is the signed stoichiometric coefficient of metabolite i in reaction j
with the following sign convention: negative for educts, positive for products.
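As a small illustration of this sign convention, the stoichiometry matrix of a toy network can be assembled as follows. This is a minimal Python sketch; the three-reaction chain, the metabolite names and the function name are assumptions made for the example. External metabolites simply do not get a row.

```python
def stoichiometry_matrix(internal, reactions):
    """Return N as a list of m rows, each holding q signed coefficients.

    internal:  ordered list of internal metabolite names (rows of N)
    reactions: list of dicts, metabolite -> coefficient
               (negative for educts, positive for products)
    """
    return [[rxn.get(met, 0) for rxn in reactions] for met in internal]


internal = ["A", "B"]  # A_ext and B_ext are treated as external (buffered)
reactions = [
    {"A_ext": -1, "A": 1},   # R1: A_ext -> A
    {"A": -1, "B": 1},       # R2: A -> B
    {"B": -1, "B_ext": 1},   # R3: B -> B_ext
]
N = stoichiometry_matrix(internal, reactions)
```

For this chain N has m = 2 rows and q = 3 columns.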
Definition and benefits of Elementary Modes
Some reactions, called irreversible reactions, are thermodynamically feasible in
only one direction under the normal conditions of the system.
Therefore, the reaction indices are split into two sets:
Irrev (the set of irreversible reaction indices) and
Rev (the set of reversible reaction indices).
A flux vector (flux distribution), denoted v, is a q-vector of the reaction space $\mathbb{R}^q$,
in which each element $v_i$ describes the net rate of the i-th reaction.
Sometimes we are interested only in the relative proportions of fluxes in a flux
vector.
In this sense, two flux vectors v and v' can be seen to be equivalent,
denoted by v ≃ v', if and only if there is some α > 0 such that v = α · v'.
Definition and benefits of Elementary Modes
Metabolism involves fast reactions and high turnover of substances compared to
events of gene regulation. Therefore, it is often assumed that metabolite
concentrations and reaction rates are equilibrated, thus constant, in the timescale of
study. The metabolic system is then considered to be in quasi steady state.
This assumption implies Nv = 0.
Thermodynamics requires the rate of each irreversible reaction to be nonnegative.
Consequently the set of feasible flux vectors is restricted to
$P = \{ v \in \mathbb{R}^q : Nv = 0 \text{ and } v_i \ge 0,\ i \in \mathrm{Irrev} \}$ (1)
P is a set of q-vectors that obey a finite set of homogeneous linear equalities and
inequalities, namely
- the |Irrev| inequalities defined by $v_i \ge 0$, $i \in \mathrm{Irrev}$, and
- the m equalities defined by Nv = 0.
P is therefore – by definition – a convex polyhedral cone.
Q: Which conditions must valid fluxes in a metabolic network fulfill?
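The two conditions in (1) translate directly into a membership test. A minimal Python sketch, assuming N is a list of rows and Irrev a set of 0-based reaction indices; the function name, the tolerance parameter and the toy chain are hypothetical.

```python
def is_feasible(N, v, irrev, tol=1e-9):
    """True if v lies in P: N v = 0 and v_i >= 0 for every i in irrev."""
    steady = all(abs(sum(n * x for n, x in zip(row, v))) <= tol for row in N)
    signs_ok = all(v[i] >= -tol for i in irrev)
    return steady and signs_ok


# Toy chain (assumed): R0: -> A, R1: A -> B, R2: B ->
N = [[1, -1, 0],
     [0, 1, -1]]

ok = is_feasible(N, [1, 1, 1], irrev={0, 1, 2})            # balanced, forward
unbalanced = is_feasible(N, [1, 2, 1], irrev={0, 1, 2})    # violates N v = 0
backward = is_feasible(N, [-1, -1, -1], irrev={0})         # runs R0 backwards
reversible = is_feasible(N, [-1, -1, -1], irrev=set())     # allowed if all reversible
```

Scaling a feasible vector by any positive factor keeps it feasible, which is the cone property stated above.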
Elementary Flux Modes
Metabolic pathway analysis serves to describe the (infinite) set P of feasible states by
providing a (finite) set of vectors that allow the generation of any vector of P and
are of fundamental importance for the overall capabilities of the metabolic
system.
One such set is the so-called set of elementary (flux) modes (EMs).
For a given flux vector v, we denote by $R(v) = \{ i : v_i \neq 0 \}$ the set of indices of the reactions
participating in v.
R(v) can be seen as the underlying pathway of v.
Elementary Flux Modes
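By definition, a flux vector e is an elementary mode (EM) if and only if it fulfills the
following three conditions:
$Ne = 0$ (quasi steady state)
$e_i \ge 0$ for all $i \in \mathrm{Irrev}$ (thermodynamic feasibility)
for all $e' \in P$: $R(e') \subseteq R(e) \Rightarrow e' = 0$ or $e' \simeq e$ (elementarity) (2)
In other words, e is an EM if and only if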
(1) it works at quasi steady state,
(2) is thermodynamically feasible and
(3) there is no other non-null flux vector (up to a scaling) that both satisfies these
constraints and involves a proper subset of its participating reactions.
With this convention, reversible modes are here considered as two vectors of
opposite directions.
Q: Which properties must elementary modes fulfill?
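One way to test the third condition in practice is a rank criterion: a feasible steady-state vector e has no feasible vector on a proper subset of its reactions exactly when the columns of N belonging to R(e) have a one-dimensional null space. The following Python sketch uses this criterion rather than a literal search over all e'; it is an illustration under that assumption, with hypothetical function names and a toy network, and it expects integer (or rational) coefficients.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_elementary(N, e, irrev):
    """Steady state, feasibility, and nullity 1 on the support R(e)."""
    support = [j for j, x in enumerate(e) if x != 0]
    if not support:
        return False
    steady = all(sum(n * x for n, x in zip(row, e)) == 0 for row in N)
    feasible = all(e[j] >= 0 for j in irrev)
    sub = [[row[j] for j in support] for row in N]
    return steady and feasible and len(support) - rank(sub) == 1


# Toy network (assumed): R0: -> A, R1: A -> B, R2: B ->, R3: A -> B (isozyme)
N = [[1, -1, 0, -1],
     [0, 1, -1, 1]]
irrev = {0, 1, 2, 3}
```

In this toy network the routes through R1 and through R3 are elementary, whereas their sum uses all four reactions and can be decomposed.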
Applications of elementary modes in metabolic studies
(1) Identification of pathways: The set of EMs comprises all admissible routes
through the network and thus all "pathways" in the classical sense, i.e. routes that
convert some educts into some products.
(2) Network flexibility: The number of EMs is at least a rough measure of the
network's flexibility (redundancy, structural robustness) to perform a certain function.
(3) Identification of all pathways with optimal yield: Consider the linear
optimization problem, where all flux vectors with optimal product yield are to be
identified, i.e. where the moles of products generated per mole of educts is maximal.
Then, one or several of the EMs reach this optimum and any optimal flux vector is a
convex combination of these optimal EMs.
(4) Importance of reactions: The importance or relevance of a reaction can be
assessed by its participation frequency and/or flux values in the EMs.
(4a) Inference of viability of mutants: If a reaction is involved in all growth-related
EMs its deletion can be predicted to be lethal, since all EMs would disappear.
Applications of elementary modes in metabolic studies
(5) Reaction correlations: EMs can be used to analyze structural couplings
between reactions, which might give hints for underlying regulatory circuits.
Extreme cases are an enzyme (or reaction) subset (a set of reactions which can
operate only together) and a pair of mutually exclusive reactions (two reactions never
occurring together in any EM).
(6) Detection of thermodynamically infeasible cycles: EMs representing internal
cycles (without participation of external material or energy sources) are infeasible by
laws of thermodynamics and thus reflect structural inconsistencies.
Q: Why are internal closed cycles not admissible?
Q: How can EM analysis be used to identify lethal mutants
in a metabolic network?
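Applications (4), (4a) and (5) reduce to simple set operations once the EMs are known. A minimal Python sketch, assuming the EMs are given as a list of flux vectors of equal length; all function names and the toy EM list are hypothetical, and `coupled_pairs` only checks co-occurrence in EMs, not fixed flux ratios.

```python
from itertools import combinations


def supports(ems):
    """R(e) for every EM, as sets of reaction indices."""
    return [frozenset(j for j, x in enumerate(e) if x != 0) for e in ems]


def participation(ems):
    """Fraction of EMs in which each reaction takes part."""
    sup = supports(ems)
    return [sum(j in s for s in sup) / len(ems) for j in range(len(ems[0]))]


def lethal_deletions(ems, target):
    """Reactions used by every EM that carries flux through `target`."""
    relevant = [s for s in supports(ems) if target in s]
    if not relevant:
        return []
    return [j for j in range(len(ems[0])) if all(j in s for s in relevant)]


def mutually_exclusive(ems):
    """Pairs of used reactions that never occur together in any EM."""
    sup = supports(ems)
    used = [j for j in range(len(ems[0])) if any(j in s for s in sup)]
    return [(i, j) for i, j in combinations(used, 2)
            if not any(i in s and j in s for s in sup)]


def coupled_pairs(ems):
    """Pairs of used reactions that occur in exactly the same EMs."""
    sup = supports(ems)
    used = [j for j in range(len(ems[0])) if any(j in s for s in sup)]
    return [(i, j) for i, j in combinations(used, 2)
            if all((i in s) == (j in s) for s in sup)]


# Toy EMs (assumed): uptake R0, two parallel conversions R1 / R3, export R2
ems = [[1, 1, 1, 0],
       [1, 0, 1, 1]]
```

Taking the export reaction R2 as the function of interest, deleting R0 removes every EM that supports it, while deleting R1 or R3 leaves one route intact.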
A unified framework - Elementary modes as extreme rays in networks of irreversible reactions
In the particular case of a metabolic system with only irreversible reactions, the set
of admissible flux vectors reads:
$P = \{ v \in \mathbb{R}^q : Nv = 0 \text{ and } v \ge 0 \}$ (3)
Compared with (1), P is in this case a special polyhedral cone, namely a pointed one.
Figure: A pointed polyhedral cone. Dashed lines represent virtual cuts of unbounded areas.
Pointed polyhedral cones
This geometry can be intuitively understood, noting that there are certainly
'enough' intersecting half-spaces (given by the inequalities v ≥ 0) to have this
'pointed' effect in 0:
P contains no complete line (otherwise some non-null x and -x would both lie in P,
contradicting the constraint v ≥ 0).
The figure even suggests that a pointed polyhedral cone can be either defined in
an implicit way, by the set of constraints as we did until now, or in an explicit or
generative way, by its 'edges', the so-called extreme rays (or generating vectors)
that unambiguously define its boundaries.
In the following, we show that elementary modes always correspond to extreme
rays of a particular pointed cone as defined in (3) and that their computation
therefore corresponds to the so-called extreme ray enumeration problem, i.e. the
problem of enumerating all extreme rays of a pointed polyhedral cone defined by
its constraints.
Q: When can a line belong to the solution space of valid fluxes?
Pointed polyhedral cone – more precise
Definition: P is a pointed polyhedral cone of $\mathbb{R}^d$ if and only if P is defined by a
full rank h × d matrix A (rank(A) = d) such that
$P = P(A) = \{ x \in \mathbb{R}^d : Ax \ge 0 \}$
Insert: the rank of a matrix is the dimension of the range of the matrix,
corresponding to the number of linearly independent rows or columns of the matrix.
The h rows of the matrix A represent h linear inequalities, whereas the full-rank
requirement imposes the "pointed" effect in 0. Note that a pointed polyhedral cone is,
in general, not restricted to be located completely in the positive orthant as in (3).
For example, the cone considered in extreme-pathway analysis may have
negative parts (namely for exchange reactions), however, by using a particular
configuration it is ensured that the spanned cone is pointed.
Extreme rays
A vector r is said to be a ray of P(A) if r ≠ 0 and for all α > 0, $\alpha \cdot r \in P(A)$.
We identify two rays r and r' if there is some α > 0 such that r = α · r', and we
write r ≃ r', analogous to the equivalence introduced above for flux vectors.
For any vector x in P(A), the zero set or active set Z(x) is the set of inequality
indices satisfied by x with equality. Denoting by $A_{i\bullet}$ the i-th row of A, $Z(x) = \{ i : A_{i\bullet} x = 0 \}$.
Zero sets can be used to characterize extreme rays.
Q: What is the relationship between Z(x), the zero set, and R(x), the set of participating reactions?
A: Z(x) is the complement of R(x).
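The complement relation is easy to check when the constraints are just the q nonnegativity conditions, i.e. when A is the identity matrix. A minimal Python sketch under that assumption; the function names and the example vector are hypothetical.

```python
def zero_set(A, x):
    """Indices i of the constraints A_i x >= 0 that x satisfies with equality."""
    return {i for i, row in enumerate(A) if sum(a * b for a, b in zip(row, x)) == 0}


def support(v):
    """R(v): indices of the reactions carrying non-zero flux."""
    return {i for i, x in enumerate(v) if x != 0}


# Assumption: only the nonnegativity rows are considered, so A = I
I3 = [[1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]]
v = [2, 0, 5]

Z = zero_set(I3, v)
R = support(v)
```

Z and R are disjoint and together cover all reaction indices.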
Extreme rays - definition
Definition 1: Extreme ray
Let r be a ray of the pointed polyhedral cone P(A). The following statements are
equivalent:
(a) r is an extreme ray of P(A)
(b) if r' is a ray of P(A) with $Z(r) \subseteq Z(r')$ then r' ≃ r
Since A is full rank, 0 is the unique vector that satisfies all constraints with equality.
The extreme rays are those rays of P(A) that satisfy a maximal set of constraints, but not
all of them, with equality. This is expressed in (b) by requiring that no other ray in
P(A) satisfies the same constraints plus additional ones with equality. Note that in
(b) Z(r) = Z(r') consequently holds.
An important property of the extreme rays is that they form a finite set of generating
vectors of the pointed cone: any vector of P(A) can be expressed as a non-negative
linear combination of extreme rays, and the converse is true: all non-negative
combinations of extreme rays lie in P(A). The set of extreme rays is the unique
minimal set of generating vectors of a pointed cone P(A) (up to positive scalings).
Elementary modes
Lemma 1: EMs in networks of irreversible reactions
In a metabolic system where all reactions are irreversible, the EMs are exactly the
extreme rays of $P = \{ v \in \mathbb{R}^q : Nv = 0 \text{ and } v \ge 0 \}$.
Notations
We denote the original reaction network by S and the reconfigured network (with all
reversible reactions split up) by S'.
The reactions of S are indexed from 1 to q.
Irrev denotes the set of irreversible reaction indices and Rev the reversible ones.
An irreversible reaction indexed i gives rise to a reaction of S' indexed i.
A reversible reaction indexed i gives rise to two opposite reactions of S' indexed by
the pairs (i,+1) and (i,-1) for the forward and the backward respectively.
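The reconfiguration of a flux vector $v \in \mathbb{R}^q$ of S is a flux vector $v' \in \mathbb{R}^{\mathrm{Irrev} \,\cup\, \mathrm{Rev} \times \{-1;+1\}}$
of S' such that
$v'_i = v_i$ for $i \in \mathrm{Irrev}$,
$v'_{(i,+1)} = \max(v_i, 0)$ and $v'_{(i,-1)} = \max(-v_i, 0)$ for $i \in \mathrm{Rev}$.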
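The reconfiguration of a flux vector can be sketched as follows: the forward copy of a reversible reaction receives the positive part of the net rate and the backward copy the negative part. This is a minimal Python sketch, assuming 0-based reaction indices and a dictionary keyed by `i` or `(i, +1)` / `(i, -1)` for the reconfigured vector; the function name is hypothetical.

```python
def reconfigure(v, rev):
    """Map a flux vector v of S to a flux vector of S' (reversible reactions split).

    rev: set of indices of the reversible reactions (assumed 0-based).
    """
    v_split = {}
    for i, rate in enumerate(v):
        if i in rev:
            v_split[(i, +1)] = max(rate, 0)    # forward direction
            v_split[(i, -1)] = max(-rate, 0)   # backward direction
        else:
            v_split[i] = rate                  # irreversible: unchanged
    return v_split


v_split = reconfigure([2, -3, 0], rev={1, 2})
```

All entries of the reconfigured vector are nonnegative whenever the irreversible rates of v are.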
Notations
Let N' be the stoichiometry matrix of S'. N' can be written as $N' = [\, N \;\; -N_{\mathrm{Rev}} \,]$ where
$N_{\mathrm{Rev}}$ consists of all columns of N corresponding to reversible reactions.
Note that if v is a flux vector of S and v' is its reconfiguration then Nv = N'v'.
If possible, i.e. if $v' \in \mathbb{R}^{\mathrm{Irrev} \,\cup\, \mathrm{Rev} \times \{-1;+1\}}$ is such that for any reversible reaction index
$i \in \mathrm{Rev}$ at least one of the two coefficients $v'_{(i,+1)}$ or $v'_{(i,-1)}$ equals zero, then we define
the reverse operation, called back-configuration, that maps v' back to a flux vector v
such that:
$v_i = v'_i$ for $i \in \mathrm{Irrev}$,
$v_i = v'_{(i,+1)} - v'_{(i,-1)}$ for $i \in \mathrm{Rev}$.
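Both the split matrix and the back-configuration are short to write down. A minimal Python sketch, assuming N is a list of rows, reaction indices are 0-based and the reconfigured vector is a dictionary keyed by `i` or `(i, +1)` / `(i, -1)`; the function names are hypothetical. The backward columns are appended in increasing order of the reversible indices.

```python
def split_matrix(N, rev):
    """N' = [N  -N_Rev]: append the negated columns of the reversible reactions."""
    rev_cols = sorted(rev)
    return [list(row) + [-row[j] for j in rev_cols] for row in N]


def back_configure(v_split, q, rev):
    """Map a flux vector of S' back to S; refuses vectors containing a two-cycle."""
    v = []
    for i in range(q):
        if i in rev:
            fwd, bwd = v_split[(i, +1)], v_split[(i, -1)]
            if fwd != 0 and bwd != 0:
                raise ValueError(f"reaction {i} runs in both directions")
            v.append(fwd - bwd)
        else:
            v.append(v_split[i])
    return v


# Toy chain (assumed) with a reversible middle reaction (index 1)
N = [[1, -1, 0],
     [0, 1, -1]]
N_split = split_matrix(N, rev={1})
v = back_configure({0: 2, (1, +1): 0, (1, -1): 3, 2: 1}, q=3, rev={1})
```

The check in `back_configure` mirrors the condition that at least one of the two coefficients must be zero.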
Theorem 1: EMs in original and in reconfigured networks
Theorem 1
Let S be a metabolic system and S' its reconfiguration by splitting up reversible
reactions. Then the set of EMs of S' is the union of
a) the set of reconfigured EMs of S
b) the set of two-cycles made of a forward and a backward reaction of S'
derived from the same reversible reaction of S
V25 Framework for computation of elementary modes II
Elementary modes
Lemma 1: EMs in networks of irreversible reactions
In a metabolic system where all reactions are irreversible, the EMs are exactly the
extreme rays of $P = \{ v \in \mathbb{R}^q : Nv = 0 \text{ and } v \ge 0 \}$.
Proof: P is the solution set of the linear inequalities defined by
$Av \ge 0$ with $A = \begin{pmatrix} N \\ -N \\ I \end{pmatrix}$
where I is the q × q identity matrix.
Since it contains I, A is full rank and therefore P is a pointed polyhedral cone.
All $v \in P$ obey Nv = 0, thus the first 2m inequalities defined by A hold with equality
for all vectors in P and the inclusion condition of Definition 1 can be restricted to the
last q inequalities, i.e. the inequalities corresponding to the reactions.
Inclusion over the zero set can be equivalently seen as containment over the set of
non-zeros in v, i.e. R(v). Consequently, $e \in P$ is an extreme ray of P if and only if:
for all $e' \in P$: $R(e') \subseteq R(e) \Rightarrow e' = 0$ or $e' \simeq e$, i.e. if and only if e is elementary.
Thus, all three conditions in (2) are fulfilled.
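The constraint matrix used in the proof can be assembled mechanically: the equalities Nv = 0 are encoded as the two inequality blocks Nv ≥ 0 and -Nv ≥ 0, followed by the identity block for v ≥ 0. A minimal Python sketch with hypothetical function names and an assumed toy chain:

```python
def constraint_matrix(N):
    """Stack N, -N and the q x q identity so that P = {v : A v >= 0}."""
    q = len(N[0])
    identity = [[1 if i == j else 0 for j in range(q)] for i in range(q)]
    return [list(row) for row in N] + [[-x for x in row] for row in N] + identity


def in_cone(A, v):
    """True if every inequality A_i v >= 0 holds."""
    return all(sum(a * x for a, x in zip(row, v)) >= 0 for row in A)


# Toy chain (assumed): R0: -> A, R1: A -> B, R2: B ->, all irreversible
N = [[1, -1, 0],
     [0, 1, -1]]
A = constraint_matrix(N)  # 2m + q = 7 rows
```

A vector passes `in_cone` exactly when it is balanced and nonnegative, i.e. when it lies in the cone of Lemma 1.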
Strategy of the Double Description Method
Iteratively build a minimal DD pair $(A_k, R_k)$ from a minimal DD pair $(A_{k-1}, R_{k-1})$,
where $A_k$ is a submatrix of A made of k rows of A.
At each step the columns of $R_k$ are the extreme rays of $P(A_k)$, the convex
polyhedron defined by the linear inequalities $A_k$. The incremental step introduces a
constraint of A that is not yet satisfied by all computed extreme rays. Some extreme
rays are kept, some are discarded and new ones are generated. The generation of
new extreme rays relies on the notion of adjacent extreme rays.
Definition 2: Adjacent extreme rays
Let r and r' be distinct rays of the pointed polyhedral cone P(A). Then the following
statements are equivalent:
(a) r and r' are adjacent extreme rays
(b) if r" is a ray of P(A) with $Z(r) \cap Z(r') \subseteq Z(r'')$ then either r" ≃ r or r" ≃ r'
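When the complete list of extreme rays is available, statement (b) only needs to be checked against the other extreme rays. A minimal Python sketch of this combinatorial test, assuming `rays` holds all extreme rays of P(A) exactly once; the function names and the example cone (a cone over a square) are hypothetical.

```python
def zero_set(A, x):
    """Indices of the constraints that x satisfies with equality."""
    return frozenset(i for i, row in enumerate(A)
                     if sum(a * b for a, b in zip(row, x)) == 0)


def adjacent(A, rays, j, k):
    """rays[j] and rays[k] are adjacent if no third extreme ray shares their common zero set."""
    common = zero_set(A, rays[j]) & zero_set(A, rays[k])
    return not any(common <= zero_set(A, r)
                   for idx, r in enumerate(rays) if idx not in (j, k))


# Cone over a square: z - x >= 0, z + x >= 0, z - y >= 0, z + y >= 0
A = [[-1, 0, 1],
     [1, 0, 1],
     [0, -1, 1],
     [0, 1, 1]]
rays = [(1, 1, 1), (1, -1, 1), (-1, -1, 1), (-1, 1, 1)]
```

Neighbouring corners of the square give adjacent rays, while the two diagonals do not.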
Initialization
The initialization of the double description method must be done with a minimal DD
pair.
One possibility is the following.
Since P is pointed, A has full rank and contains a nonsingular submatrix of order d
denoted by $A_d$.
Insert: a square matrix A has an inverse $A^{-1}$ (so that $A A^{-1} = I$) if its determinant $|A| \neq 0$.
Hence, $A_d^{-1}$ can be constructed and $(A_d, A_d^{-1})$ is a minimal DD pair which works as
initialization and leads directly to step k = d.
Note: there is some freedom in choosing a submatrix Ad or some alternative starting
minimal DD pair.
Incremental step
Assume $(A_{k-1}, R_{k-1})$ is a minimal DD pair and consider a k-th constraint defined by a
not yet extracted row of A, denoted $A_{i\bullet}$.
Let J be the set of column indices of $R_{k-1}$ and $r_j$, $j \in J$, its column vectors, i.e. the
extreme rays of $P(A_{k-1})$, the polyhedral cone of the previous iteration.
$A_{i\bullet}$ splits J into three parts (see Figure) according to whether $r_j$ satisfies the constraint with strict
inequality (positive ray), with equality (zero ray) or does not satisfy it (negative
ray):
$J^+ = \{ j \in J : A_{i\bullet} r_j > 0 \}$
$J^0 = \{ j \in J : A_{i\bullet} r_j = 0 \}$ (6)
$J^- = \{ j \in J : A_{i\bullet} r_j < 0 \}$
Figure: Double description incremental step. The scene is best visualized with a polytope; consider the cube pictured here as an $\mathbb{R}^3$ projection of an $\mathbb{R}^4$ polyhedral cone. Extreme rays from the previous iteration are {a,b,c,d,e,f,g,h} whose adjacencies are represented by edges. For the considered constraint, whose null space is the hyperplane depicted by the bold black border lines, b and f are positive rays, a and c are zero rays, d, e, g and h are negative rays. b, f, a and c satisfy the constraint and are kept for the next iteration. {f,e} and {f,g} are the only two pairs of adjacent positive/negative rays and only they give rise to new rays: i and j at the intersection of the hyperplane and the respective edges. The new polytope is then defined by its extreme rays: {a,b,c,f,i,j}.
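The partition (6) is a single pass over the current rays. A minimal Python sketch, assuming rays and the constraint row are given as sequences of numbers; the function name and example data are hypothetical.

```python
def split_rays(rays, a):
    """Return the index lists (J+, J0, J-) for the constraint a . r >= 0."""
    pos, zero, neg = [], [], []
    for j, r in enumerate(rays):
        s = sum(x * y for x, y in zip(a, r))
        if s > 0:
            pos.append(j)    # satisfies the constraint strictly
        elif s < 0:
            neg.append(j)    # violates the constraint
        else:
            zero.append(j)   # lies on the hyperplane
    return pos, zero, neg


rays = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
a = (1, -1, 0)
J_pos, J_zero, J_neg = split_rays(rays, a)
```

Rays in J+ and J0 are kept, rays in J- are discarded after they have contributed to new rays.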
Minimality of $R_k$
Minimality of $R_k$ is ensured by considering all positive rays, all zero rays and new
rays obtained as combinations of a positive and a negative ray that are adjacent to
each other.
For convenience, we denote by Adj the index set of the newly generated rays in
which every new ray is expressed by a pair of indices corresponding to the two
adjacent rays combined.
Hence, $R_k$ is defined as the set of column vectors $r_j$, $j \in J'$ with
$J' = J^+ \cup J^0 \cup \mathrm{Adj}$,
$\mathrm{Adj} = \{ (j, j') : j \in J^+,\ j' \in J^-,\ r_j \text{ adjacent to } r_{j'} \text{ in } P(A_{k-1}) \}$,
$r_{(j,j')} = (A_{i\bullet} r_j)\, r_{j'} - (A_{i\bullet} r_{j'})\, r_j$ for $(j, j') \in \mathrm{Adj}$.
The incremental step is repeated until k = h, i.e. until all rows of the matrix
A have been treated.
The columns of the final matrix $R_h$ are the extreme rays of P(A).
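Putting the pieces together, the whole procedure fits in a short program for the case of irreversible reactions. The following Python sketch is an illustration under stated assumptions, not the reference implementation: it starts from the nonnegativity rows (A_d = I, whose inverse is again I), adds the rows of N and -N one at a time, uses the combinatorial adjacency test against the current ray list, and expects integer stoichiometric coefficients. All function names are hypothetical.

```python
from functools import reduce
from math import gcd


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(r):
    """Divide an integer ray by the gcd of its entries."""
    g = reduce(gcd, (abs(x) for x in r))
    return tuple(x // g for x in r)


def dd_step(A_prev, rays, a):
    """One incremental step: add the constraint a . v >= 0 to P(A_prev)."""
    zsets = [frozenset(i for i, row in enumerate(A_prev) if dot(row, r) == 0)
             for r in rays]
    vals = [dot(a, r) for r in rays]
    pos = [j for j, s in enumerate(vals) if s > 0]
    zero = [j for j, s in enumerate(vals) if s == 0]
    neg = [j for j, s in enumerate(vals) if s < 0]
    new = []
    for j in pos:
        for k in neg:
            common = zsets[j] & zsets[k]
            # Not adjacent if a third ray satisfies the same common constraints.
            blocked = any(common <= zsets[l]
                          for l in range(len(rays)) if l != j and l != k)
            if not blocked:
                # Nonnegative combination that lies on the new hyperplane.
                combo = [vals[j] * y - vals[k] * x
                         for x, y in zip(rays[j], rays[k])]
                new.append(normalize(combo))
    return [rays[j] for j in pos + zero] + new


def extreme_rays(N):
    """Extreme rays of {v : N v = 0, v >= 0}, i.e. the EMs if all reactions are irreversible."""
    q = len(N[0])
    A = [[int(i == j) for j in range(q)] for i in range(q)]  # start from v >= 0
    rays = [tuple(row) for row in A]
    for row in N:
        for a in (list(row), [-x for x in row]):  # N v >= 0, then -N v >= 0
            rays = dd_step(A, rays, a)
            A.append(a)
    return rays


# Toy network (assumed): R0: -> A, R1: A -> B, R2: B ->, R3: A -> B (isozyme)
N = [[1, -1, 0, -1],
     [0, 1, -1, 1]]
modes = extreme_rays(N)
```

For this toy network the two routes through R1 and through R3 come out as the only extreme rays. Networks with reversible reactions would first have to be reconfigured by splitting those reactions.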
Computing EMs
The Double Description Method together with Theorem 1 offers a framework for
computing EMs. The only steps to include are a reconfiguration step that splits
reversible reactions and builds the matrix A, and a post-processing step that gets
rid of futile two-cycles and computes the back-configuration.
The dimension of the space is given by the number of reactions in the reconfigured
network: q' = q + |Rev|. This results in the following algorithmic scheme: