Page 1: V18 EcoCyc Analysis of E.coli Metabolism

18. Lecture WS 2004/05

Bioinformatics III 1

V18 EcoCyc Analysis of E.coli Metabolism

The E. coli genome contains 4391 predicted genes, of which 4288 code for proteins.

676 of these genes encode the 607 enzymes of E. coli small-molecule metabolism.

Of those enzymes, 311 are protein complexes, 296 are monomers.

Figure: Organization of protein complexes. Distribution of subunit counts for all EcoCyc protein complexes; the predominance of monomers, dimers, and tetramers is obvious.

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Reactions

EcoCyc describes 905 metabolic reactions that are catalyzed by E. coli.

Of these reactions, 161 are not involved in small-molecule metabolism; e.g. they participate in macromolecule metabolism such as DNA replication and tRNA charging.

Of the remaining 744 reactions, 569 have been assigned to at least one pathway.

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Reactions

The number of reactions (744) and the number of enzymes (607) differ. Why?

(1) There is no one-to-one mapping between enzymes and reactions: some enzymes catalyze multiple reactions, and some reactions are catalyzed by multiple enzymes.

(2) For some reactions known to be catalyzed by E. coli, the enzyme has not yet been identified.
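The many-to-many relation between enzymes and reactions can be illustrated with a minimal Python sketch. The enzyme and reaction names are hypothetical placeholders, not EcoCyc records; the sketch only shows why counting enzymes and counting reactions give different totals once multifunctional enzymes, isozymes and reactions without a known enzyme are present.

```python
from collections import Counter

# Hypothetical enzyme -> reactions mapping (placeholder names, not EcoCyc records)
catalyzes = {
    "enzA": ["R1"],
    "enzB": ["R2", "R3", "R4"],  # multifunctional enzyme
    "enzC": ["R2"],              # isozyme: a second enzyme for R2
    "enzD": ["R5"],
}
# All reactions known to occur, including R6, whose enzyme has not been identified
reactions = ["R1", "R2", "R3", "R4", "R5", "R6"]

# Invert the mapping: reaction -> enzymes that catalyze it
catalyzed_by = {r: [] for r in reactions}
for enzyme, rxns in catalyzes.items():
    for r in rxns:
        catalyzed_by[r].append(enzyme)

# Reactions grouped by number of catalyzing enzymes (0 = enzyme unknown)
enzymes_per_reaction = Counter(len(e) for e in catalyzed_by.values())
# Enzymes grouped by number of reactions they catalyze
reactions_per_enzyme = Counter(len(r) for r in catalyzes.values())

print(len(catalyzes), "enzymes,", len(reactions), "reactions")  # 4 enzymes, 6 reactions
print(sorted(enzymes_per_reaction.items()))  # [(0, 1), (1, 4), (2, 1)]
print(sorted(reactions_per_enzyme.items()))  # [(1, 3), (3, 1)]
```

The two Counter objects correspond to the kinds of histograms shown later in this lecture for isozymes and for multifunctional enzymes.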

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Compounds

The 744 reactions of E. coli small-molecule metabolism involve a total of 791 different substrates.

On average, each reaction contains 4.0 substrates.

Figure: Number of reactions containing varying numbers of substrates (reactants plus products).

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Exercise: Fill out the plot for the data in EcoCyc (number of reactions vs. number of substrates).
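As a starting point for the exercise, here is a minimal sketch of how such a histogram could be tabulated, assuming each reaction is available as a pair of reactant and product lists. The three reactions are hand-written illustrations, not an export from EcoCyc, so the resulting numbers say nothing about E. coli.

```python
from collections import Counter

# Illustrative reactions as (reactants, products); hand-written, not an EcoCyc export
reactions = [
    (["ATP", "D-glucose"], ["ADP", "D-glucose 6-phosphate"]),
    (["D-glucose 6-phosphate"], ["D-fructose 6-phosphate"]),
    (["L-malate", "NAD+"], ["oxaloacetate", "NADH", "H+"]),
]

# Substrates per reaction = reactants plus products
sizes = [len(reactants) + len(products) for reactants, products in reactions]
histogram = Counter(sizes)  # number of substrates -> number of reactions
average = sum(sizes) / len(sizes)

for n_substrates in sorted(histogram):
    print(n_substrates, histogram[n_substrates])
print(f"average: {average:.1f} substrates per reaction")
```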

Pathways

EcoCyc describes 131 pathways:

energy metabolism

nucleotide and amino acid biosynthesis

secondary metabolism

Pathways vary in length from a single reaction step to 16 steps, with an average of 5.4 steps.

Figure: Length distribution of EcoCyc pathways.

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Q: Does this distribution (of the number of steps in a biochemical pathway) also hold for the elementary modes of E. coli? A: Nonsense.

Reactions Catalyzed by More Than One Enzyme

Figure: Number of reactions that are catalyzed by one or more enzymes. Most reactions are catalyzed by one enzyme, some by two, and very few by more than two enzymes.

For 84 reactions, the corresponding enzyme is not yet encoded in EcoCyc.

What may be the reasons for isozyme redundancy?

(1) The enzymes that catalyze the same reaction are homologs and have duplicated (or were obtained by horizontal gene transfer), acquiring some specificity but retaining the same mechanism (divergence).

(2) The reaction is easily "invented"; therefore, there is more than one protein family that is independently able to perform the catalysis (convergence).

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Enzymes that catalyze more than one reaction

Genome predictions usually assign a single enzymatic function. However, E. coli is known to contain many multifunctional enzymes. Of the 607 E. coli enzymes, 100 are multifunctional, either having the same active site and different substrate specificities or different active sites.

Figure: Number of enzymes that catalyze one or more reactions. Most enzymes catalyze one reaction; some are multifunctional.

The enzymes that catalyze 7 and 9 reactions are purine nucleoside phosphorylase and nucleoside diphosphate kinase.

Take-home message: The high proportion of multifunctional enzymes implies that genome projects significantly underpredict multifunctional enzymes!

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Reactions participating in more than one pathway

The 99 reactions belonging to multiple pathways appear to be the intersection points in the complex network of chemical processes in the cell.

E.g. the reaction present in 6 pathways is the one catalyzed by malate dehydrogenase, a central enzyme in cellular metabolism.

Ouzounis, Karp, Genome Res. 10, 568 (2000)

Figure: Connectivity distributions P(k) for substrates. a, Archaeoglobus fulgidus (archaeon); b, E. coli (bacterium); c, Caenorhabditis elegans (eukaryote), shown on a log–log plot, counting separately the incoming (In) and outgoing (Out) links for each substrate. k_in (k_out) corresponds to the number of reactions in which a substrate participates as a product (educt). d, The connectivity distribution averaged over 43 organisms.

Jeong et al. Nature 407, 651 (2000)

Q: Why does it make sense to consider in- and out-degrees separately? A: Because many biochemical reactions are not reversible.
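Counting in- and out-degrees separately is straightforward once each irreversible reaction is stored with its educts and products. A minimal sketch with hypothetical toy reactions, following the caption's definition (k_in: reactions in which the substrate is a product, k_out: reactions in which it is an educt):

```python
from collections import Counter

# Hypothetical irreversible reactions as (educts, products)
reactions = [
    (["A"], ["B"]),
    (["B", "ATP"], ["C", "ADP"]),
    (["B", "ATP"], ["D", "ADP"]),
    (["C"], ["E"]),
]

k_in = Counter()   # number of reactions in which the substrate is a product
k_out = Counter()  # number of reactions in which the substrate is an educt
for educts, products in reactions:
    for s in educts:
        k_out[s] += 1
    for s in products:
        k_in[s] += 1

for s in sorted(set(k_in) | set(k_out)):
    print(f"{s:>4}  k_in={k_in[s]}  k_out={k_out[s]}")
```

In this toy set ADP ends up with k_in = 2 and k_out = 0; a single undirected degree would hide that it is only ever produced.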

Properties of metabolic networks

Figure: a, The histogram of the biochemical pathway lengths, l, in E. coli. b, The average path length (diameter) for each of the 43 organisms. c, d, Average number of incoming links (c) or outgoing links (d) per node for each organism. e, The effect of substrate removal on the metabolic network diameter (average shortest biochemical pathway between two substrates) of E. coli. In the top curve (red) the most connected substrates are removed first. In the bottom curve (green) nodes are removed randomly. M = 60 corresponds to 8% of the total number of substrates found in E. coli. The horizontal axis in b–d denotes the number of nodes in each organism. b–d, Archaea (magenta), bacteria (green) and eukaryotes (blue) are shown.

Exercise: In (e), sketch the expected curve of the diameter for the removal of a hub protein and of a random protein.
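One way to check the expected shape of the two curves in (e) is a small simulation. The sketch below uses a hypothetical toy network (a ring of six substrates plus one hub), not E. coli data. The diameter is computed as in the caption, as the average shortest path, and pairs without any connecting path are simply skipped, which is an assumption of this sketch. The effect of a random removal is taken as the mean over all possible single-node removals.

```python
from collections import deque

def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs (s, t), s != t, with t reachable from s.
    Unreachable pairs are skipped (an assumption of this sketch)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())  # the distance of source to itself is 0
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

def without(adj, node):
    """Copy of the graph with one node and all its links removed."""
    return {u: [w for w in nbrs if w != node] for u, nbrs in adj.items() if u != node}

# Hypothetical toy network: six substrates in a ring plus one hub "H" linked to all of them
ring = [f"R{i}" for i in range(6)]
adj = {r: [ring[i - 1], ring[(i + 1) % 6], "H"] for i, r in enumerate(ring)}
adj["H"] = list(ring)

hub = max(adj, key=lambda u: len(adj[u]))
intact = average_path_length(adj)
after_hub = average_path_length(without(adj, hub))
# Expected value for one randomly chosen node = mean over all single-node removals
after_random = sum(average_path_length(without(adj, u)) for u in adj) / len(adj)

print(f"intact: {intact:.2f}")                 # 1.43
print(f"hub removed: {after_hub:.2f}")         # 1.80
print(f"random removal: {after_random:.2f}")   # 1.46
```

Removing the hub forces detours along the ring and the diameter jumps, while a random removal (most likely a ring node) barely changes it, which is the qualitative difference between the red and green curves.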

Stoichiometric matrix

Stoichiometric matrix: a matrix with reaction stoichiometries as columns and metabolite participations as rows.

The stoichiometric matrix is an important part of the in silico model.

With the matrix, the methods of extreme pathway and elementary mode analyses can be used to generate a unique set of pathways P1, P2, and P3 (see future lecture).

Papin et al. TIBS 28, 250 (2003)

Note: there will definitely be an exam exercise on this, along the lines of: set up the stoichiometric matrix for the following network.
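A stoichiometric matrix can be assembled directly from reaction equations. The following minimal sketch uses a hypothetical five-metabolite toy network (the names EX_A, R1 to R4 and EX_E are made up for illustration), with metabolites as rows and reactions as columns as described above; negative entries mark consumed metabolites and positive entries produced ones.

```python
# Hypothetical toy network: reaction name -> {metabolite: stoichiometric coefficient}
# (negative = consumed, positive = produced)
reactions = {
    "EX_A": {"A": 1},                  # uptake of A
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"B": -1, "D": 1},
    "R4": {"C": -1, "D": -1, "E": 1},
    "EX_E": {"E": -1},                 # secretion of E
}

def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction names, S) with S[i][j] = coefficient of metabolite i in reaction j."""
    rxn_names = list(reactions)
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    S = [[reactions[r].get(m, 0) for r in rxn_names] for m in metabolites]
    return metabolites, rxn_names, S

metabolites, rxn_names, S = stoichiometric_matrix(reactions)
print(" " * 6 + "".join(f"{r:>6}" for r in rxn_names))
for m, row in zip(metabolites, S):
    print(f"{m:>6}" + "".join(f"{x:>6}" for x in row))
```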

Flux balancing

Any chemical reaction requires mass conservation. Therefore one may analyze metabolic systems by requiring mass conservation. Only required: knowledge about the stoichiometry of metabolic pathways and metabolic demands.

For each metabolite:

$$\frac{dX_i}{dt} = V_{\text{synthesized}} - V_{\text{degraded}} - V_{\text{used}} \pm V_{\text{transported}}$$

Under steady-state conditions, the mass balance constraints in a metabolic network can be represented mathematically by the matrix equation:

S · v = 0

where S is the m × n stoichiometric matrix, m = the number of metabolites and n = the number of reactions in the network.

The vector v represents all fluxes in the metabolic network, including the internal fluxes, transport fluxes and the growth flux.
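For a given flux vector v, the product S · v gives the rate of change of every metabolite, so the steady-state condition can be checked numerically. A minimal sketch using a hypothetical five-metabolite toy network (uptake EX_A, internal reactions R1 to R4, secretion EX_E) as a stand-in for a real model:

```python
# Hypothetical toy network: metabolites A..E (rows), reactions EX_A, R1, R2, R3, R4, EX_E (columns)
S = [
    [1, -1,  0,  0,  0,  0],   # A: taken up by EX_A, consumed by R1
    [0,  1, -1, -1,  0,  0],   # B
    [0,  0,  1,  0, -1,  0],   # C
    [0,  0,  0,  1, -1,  0],   # D
    [0,  0,  0,  0,  1, -1],   # E: produced by R4, secreted by EX_E
]

def concentration_change(S, v):
    """dX/dt = S · v: net rate of change of each metabolite for the flux vector v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]

def is_steady_state(S, v, tol=1e-9):
    """True if S · v = 0 within a numerical tolerance."""
    return all(abs(x) <= tol for x in concentration_change(S, v))

v_balanced = [2, 2, 1, 1, 1, 1]    # every metabolite is produced as fast as it is consumed
v_unbalanced = [2, 2, 2, 0, 0, 0]  # C accumulates because R4 is inactive
print(concentration_change(S, v_balanced), is_steady_state(S, v_balanced))      # [0, 0, 0, 0, 0] True
print(concentration_change(S, v_unbalanced), is_steady_state(S, v_unbalanced))  # [0, 0, 2, 0, 0] False
```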

Q: What is the basic assumption for applying flux balance analysis, i.e. solving the equation S · v = 0?

Flux balance analysis

Since the number of metabolites is generally smaller than the number of reactions (m < n), the flux-balance equation is typically underdetermined.

Therefore there are generally multiple feasible flux distributions that satisfy the mass balance constraints.

The set of solutions is confined to the nullspace of matrix S.

To find the "true" biological flux in cells (e.g. Heinzle, Huber, UdS), one needs additional (experimental) information, or one may impose constraints on the magnitude of each individual metabolic flux:

$$\alpha_i \le v_i \le \beta_i$$

The intersection of the nullspace and the region defined by those linear inequalities defines a region in flux space = the feasible set of fluxes.
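The nullspace of S can be computed exactly for small networks. The sketch below uses reduced row echelon form over rational numbers (Python's `fractions` module) on a hypothetical toy network; for genome-scale models one would use a numerical linear-algebra library instead. The bound on the uptake flux is also hypothetical and illustrates how an inequality of the form α_i ≤ v_i ≤ β_i cuts the nullspace down to a feasible set.

```python
from fractions import Fraction

def nullspace(S):
    """Exact basis of {v : S v = 0}, via reduced row echelon form over the rationals."""
    A = [[Fraction(x) for x in row] for row in S]
    n_rows, n_cols = len(A), len(A[0])
    pivot_cols = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column -> free variable
        A[r], A[pivot] = A[pivot], A[r]
        pivot_val = A[r][c]
        A[r] = [x / pivot_val for x in A[r]]
        for i in range(n_rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivot_cols.append(c)
        r += 1
        if r == n_rows:
            break
    basis = []
    for f in (c for c in range(n_cols) if c not in pivot_cols):
        v = [Fraction(0)] * n_cols
        v[f] = Fraction(1)
        for row_idx, p in enumerate(pivot_cols):
            v[p] = -A[row_idx][f]
        basis.append(v)
    return basis

# Hypothetical toy network: metabolites A..E (rows), reactions EX_A, R1, R2, R3, R4, EX_E (columns)
S = [
    [1, -1,  0,  0,  0,  0],
    [0,  1, -1, -1,  0,  0],
    [0,  0,  1,  0, -1,  0],
    [0,  0,  0,  1, -1,  0],
    [0,  0,  0,  0,  1, -1],
]
basis = nullspace(S)
print(len(basis), [str(x) for x in basis[0]])  # 1 ['2', '2', '1', '1', '1', '1']

# Hypothetical bound 0 <= v_EX_A <= 10. Every steady-state flux here is t * basis[0] with t >= 0
# (irreversible reactions), so the feasible set is the segment 0 <= t <= t_max.
t_max = Fraction(10) / basis[0][0]
print(t_max)  # 5
```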

Feasible solution set for a metabolic reaction network

(A) The steady-state operation of the metabolic network is restricted to the region within a cone, defined as the feasible set. The feasible set contains all flux vectors that satisfy the physicochemical constraints. Thus, the feasible set defines the capabilities of the metabolic network. All feasible metabolic flux distributions lie within the feasible set, and

(B) in the limiting case, where all constraints on the metabolic network are known, such as the enzyme kinetics and gene regulation, the feasible set may be reduced to a single point. This single point must lie within the feasible set.

Edwards & Palsson PNAS 97, 5528 (2000)

Summary

FBA constructs the optimal network utilization simply from the stoichiometry of metabolic reactions and capacity constraints.

For E. coli, the in silico results are consistent with experimental data.

FBA shows that in the E. coli metabolic network there are relatively few critical gene products in central metabolism.

However, the ability to adjust to different environments (growth conditions) may be diminished by gene deletions.

FBA identifies "the best" the cell can do, not how the cell actually behaves under a given set of conditions. Here, survival was equated with growth.

FBA does not directly consider regulation or regulatory constraints on the metabolic network. This can be treated separately (see future lecture).

Edwards & Palsson PNAS 97, 5528 (2000)

V19 Extreme Pathways

Extreme pathways were introduced into metabolic analysis by the lab of Bernhard Palsson (Dept. of Bioengineering, UC San Diego). The publications of this lab are available at http://gcrg.ucsd.edu/publications/index.html

The extreme pathway technique is based on the stoichiometric matrix representation of metabolic networks. All external fluxes are defined as pointing outwards.

Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)

Extreme Pathways – theorem

Theorem. A convex flux cone has a set of systemically independent generating vectors. Furthermore, these generating vectors (extremal rays) are unique up to multiplication by a positive scalar. These generating vectors will be called "extreme pathways".

(1) The existence of a systemically independent generating set for a cone is provided by an algorithm to construct extreme pathways (see below).

(2) Uniqueness?

Let {p1, ..., pk} be a systemically independent generating set for a cone. Then it follows that if pj = c′ + c″ (with c′ and c″ in the cone), both c′ and c″ are positive multiples of pj.

Schilling, Letscher, Palsson,

J. theor. Biol. 203, 229 (2000)

Extreme Pathways – algorithm - setup

The algorithm to determine the set of extreme pathways for a reaction network follows the principles of algorithms for finding the extremal rays/generating vectors of convex polyhedral cones.

Combine the n × n identity matrix (I) with the transpose of the stoichiometric matrix, $S^T$. I serves for bookkeeping.

Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)

Figure: stoichiometric matrix S of the example network and the combined matrix [ I | $S^T$ ].

separate internal and external fluxes

Examine the constraints on each of the exchange fluxes as given by

$$\alpha_j \le b_j \le \beta_j$$

If the exchange flux is constrained to be positive, do nothing.

If the exchange flux is constrained to be negative, multiply the corresponding row of the initial matrix by -1.

If the exchange flux is unconstrained, move the entire row to a temporary matrix T(E). This completes the first tableau T(0).

T(0) and T(E) for the example reaction system are shown on the previous slide. Each element of these matrices will be designated $T_{ij}$.

Starting with x = 1 and T(0) = T(x-1), the next tableau is generated in the following way:

Schilling, Letscher, Palsson,

J. theor. Biol. 203, 229 (2000)

idea of algorithm

(1) Identify all metabolites that do not have an unconstrained exchange flux associated with them. The total number of such metabolites is denoted by μ. For the example, this is only the case for metabolite C (μ = 1).

What is the main idea?

- We want to find balanced extreme pathways that don't change the concentrations of metabolites when flux flows through (input fluxes are channelled to products, not to the accumulation of intermediates).

- The stoichiometric matrix describes the coupling of each reaction to the concentration of metabolites X.

- Now we need to balance combinations of reactions that leave concentrations unchanged. Pathways applied to metabolites should not change their concentrations, so the matrix entries need to be brought to 0.

Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)

keep pathways that do not change concentrations of internal metabolites

(2) Begin forming the new matrix T(x) by copying all rows from T(x-1) which contain a zero in the column of $S^T$ that corresponds to the first metabolite identified in step 1, denoted by index c. (Here the 3rd column of $S^T$.)

Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000)

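$$T(0) = \left[\begin{array}{rrrrrr|rrrrr}
1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 1
\end{array}\right]$$

T(1) after step (2), which copies the only row of T(0) with a zero in the column of metabolite C (the combinations from step (3) are added next):

$$T(1) = \left[\begin{array}{rrrrrr|rrrrr}
1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0
\end{array}\right] + \ldots$$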

Note: there will also be a calculation exercise on this. We will give the final result so that you can check your solution.

balance combinations of other pathways

(3) Of the remaining rows in T(x-1), add together all possible combinations of rows which contain values of the opposite sign in column c, such that the addition produces a zero in this column.

Schilling, et al.

JTB 203, 229

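$$T(0) = \left[\begin{array}{rrrrrr|rrrrr}
1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 1
\end{array}\right]$$

$$T(1) = \left[\begin{array}{rrrrrr|rrrrr}
1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & -1 & 1
\end{array}\right]$$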

remove “non-orthogonal” pathways

(4) For all of the rows added to T(x) in steps 2 and 3, check to make sure that no row exists that is a non-negative combination of any other set of rows in T(x).

One method used is as follows: let A(i) be the set of column indices j for which the elements of row i are 0,

$$A(i) = \{\, j : T_{i,j} = 0,\ 1 \le j \le n+m \,\}$$

Then check to determine if there exists another row (h) for which A(i) is a subset of A(h). If

$$A(i) \subset A(h), \quad i \ne h$$

then row i must be eliminated from T(x).

For the example above:

A(1) = {2,3,4,5,6,9,10,11}

A(2) = {1,4,5,6,7,8,9,10,11}

A(3) = {1,3,5,6,7,9,11}

A(4) = {1,3,4,5,7,9,10}

A(5) = {1,2,3,6,7,8,9,10,11}

A(6) = {1,2,3,4,7,8,9}

Schilling et al.

JTB 203, 229

Q: What is the purpose of this? A: Elementarity, i.e. minimality. In V24, A(i) is called Z(i), the zero set.

repeat steps for all internal metabolites

(5) With the formation of T(x) complete, repeat steps 2–4 for all of the metabolites that do not have an unconstrained exchange flux operating on the metabolite, incrementing x by one up to μ. The final tableau will be T(μ).

Note that the number of rows in T(μ) will be equal to k, the number of extreme pathways.

Schilling et al.

JTB 203, 229

balance external fluxes

(6) Next we append T(E) to the bottom of T(μ). (In the example here μ = 1.)

This results in the following tableau:

Schilling et al.

JTB 203, 229

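$$T(1/E) = \left[\begin{array}{rrrrrr|rrrr|rrrrr}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 \\
\hline
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & -1
\end{array}\right]$$

(Columns: v1 to v6 | b1 to b4 | metabolites A to E; the rows below the line are T(E).)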

balance external fluxes

(7) Starting in the n+1 column (or the first non-zero column on the right side), if $T_{i,n+1} \neq 0$, then add the corresponding non-zero row from T(E) to row i so as to produce 0 in the (n+1)-th column.

This is done by simply multiplying the corresponding row in T(E) by $T_{i,n+1}$ and adding this row to row i.

Repeat this procedure for each of the rows in the upper portion of the tableau so as to create zeros in the entire upper portion of the (n+1) column.

When finished, remove the row in T(E) corresponding to the exchange flux for the metabolite just balanced.

Schilling et al.

JTB 203, 229

balance external fluxes

(8) Follow the same procedure as in step (7) for each of the columns on the right side of the tableau containing non-zero entries. (In this example we need to perform step (7) for every column except the middle column of the right side, which corresponds to metabolite C.)

The final tableau T(final) will contain the transpose of the matrix P containing the extreme pathways in place of the original identity matrix.

Schilling et al.

JTB 203, 229
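The tableau procedure in steps (1) to (8) can be sketched in Python. This is a simplified sketch under explicit assumptions: all internal reactions are irreversible, and every metabolite other than the balanced ones has exactly one unconstrained exchange flux pointing outwards (so no row needs a sign flip). Under these assumptions steps (6) to (8) reduce to reading off each exchanged metabolite's net production, because adding $T_{ic}$ times the T(E) row zeroes the column and writes the same value into b. The elimination in step (4) is implemented as a proper-subset test on zero sets plus removal of duplicate rows; this is one reasonable reading of the step, not necessarily the authors' exact implementation, and the function names are illustrative.

```python
from functools import reduce
from math import gcd

def _normalize(row):
    """Divide an integer tableau row by the gcd of its entries to keep numbers small."""
    g = reduce(gcd, (abs(x) for x in row), 0)
    return tuple(x // g for x in row) if g > 1 else tuple(row)

def _prune(rows):
    """Step (4): drop duplicate rows and every row i whose zero set A(i) is a proper
    subset of the zero set A(h) of another row h."""
    rows = list(dict.fromkeys(rows))
    zero_sets = [frozenset(j for j, x in enumerate(r) if x == 0) for r in rows]
    return [r for i, r in enumerate(rows)
            if not any(zero_sets[i] < zero_sets[h] for h in range(len(rows)) if h != i)]

def extreme_pathways(S, balanced):
    """Sketch of the tableau method for extreme pathways.

    S        -- m x n integer stoichiometric matrix of the internal reactions
    balanced -- indices of the metabolites without an exchange flux (steps 1 to 5)
    Returns one tuple per pathway: (v_1..v_n, then b for each exchanged metabolite in index order).
    """
    m, n = len(S), len(S[0])
    # T(0) = [ I | S^T ]: one row per internal reaction
    T = [tuple([int(j == k) for k in range(n)] + [S[i][j] for i in range(m)]) for j in range(n)]
    for c in balanced:
        col = n + c
        kept = [r for r in T if r[col] == 0]                           # step (2)
        pos = [r for r in T if r[col] > 0]
        neg = [r for r in T if r[col] < 0]
        combos = [_normalize([-q[col] * a + p[col] * b for a, b in zip(p, q)])
                  for p in pos for q in neg]                            # step (3)
        T = _prune(kept + combos)                                       # step (4)
    exchanged = [i for i in range(m) if i not in balanced]
    # Steps (6)-(8): adding T_ic times the T(E) row of metabolite i zeroes column i and
    # writes T_ic into b_i, so b_i is simply the net production of metabolite i.
    return [r[:n] + tuple(r[n + i] for i in exchanged) for r in T]

# Model system read off the tableau T(0) (metabolites A..E):
# v1: A->B, v2: B->C, v3: C->B, v4: C->D, v5: D->C, v6: C->E; only C has no exchange flux
S = [
    [-1,  0,  0,  0,  0,  0],  # A
    [ 1, -1,  1,  0,  0,  0],  # B
    [ 0,  1, -1, -1,  1, -1],  # C
    [ 0,  0,  0,  1, -1,  0],  # D
    [ 0,  0,  0,  0,  0,  1],  # E
]
pathways = extreme_pathways(S, balanced=[2])
for p in pathways:
    print(p)  # columns: v1..v6, b1..b4
```

Run on the model system, the sketch returns seven pathways in the same order as the rows p1, p7, p3, p2, p4, p6, p5 of the pathway matrix that follows; none of the combinations is removed by the zero-set test.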

pathway matrix

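$$T(\text{final}) = \left[\; P^T \;\middle|\; 0 \;\right]$$

(after balancing, the right-hand block, i.e. the metabolite columns, contains only zeros)

Schilling et al.

JTB 203, 229

$$P^T = \begin{array}{c|rrrrrrrrrr}
 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 & b_1 & b_2 & b_3 & b_4 \\
\hline
p_1 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 \\
p_7 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
p_3 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & -1 & 1 & 0 \\
p_2 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & -1 & 0 & 1 \\
p_4 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & -1 & 0 \\
p_6 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
p_5 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & -1 & 1
\end{array}$$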

Extreme Pathways for model system

Schilling et al.

JTB 203, 229

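$$P^T = \begin{array}{c|rrrrrrrrrr}
 & v_1 & v_2 & v_3 & v_4 & v_5 & v_6 & b_1 & b_2 & b_3 & b_4 \\
\hline
p_1 & 1 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 \\
p_7 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
p_3 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & -1 & 1 & 0 \\
p_2 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & -1 & 0 & 1 \\
p_4 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & -1 & 0 \\
p_6 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
p_5 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & -1 & 1
\end{array}$$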

Two pathways, p6 and p7, are not shown in the figure (bottom right) because all their exchange fluxes with the exterior are 0. Such pathways have no net overall effect on the functional capabilities of the network. They correspond to the cycling of reactions v4/v5 and v2/v3.
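Pathways without net exchange can be picked out mechanically: they are exactly the rows of $P^T$ whose exchange-flux entries b1 to b4 are all zero. A short sketch using the values of $P^T$ for the model system (the dictionary layout is just a convenient choice):

```python
# Extreme pathways of the model system as rows of P^T (columns v1..v6, then b1..b4)
P_T = {
    "p1": [1, 0, 0, 0, 0, 0, -1, 1, 0, 0],
    "p2": [0, 1, 0, 0, 0, 1, 0, -1, 0, 1],
    "p3": [0, 1, 0, 1, 0, 0, 0, -1, 1, 0],
    "p4": [0, 0, 1, 0, 1, 0, 0, 1, -1, 0],
    "p5": [0, 0, 0, 0, 1, 1, 0, 0, -1, 1],
    "p6": [0, 0, 0, 1, 1, 0, 0, 0, 0, 0],
    "p7": [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
}
N_INTERNAL = 6  # v1..v6; the last four columns are the exchange fluxes b1..b4

cycles = {
    name: [f"v{j + 1}" for j, x in enumerate(row[:N_INTERNAL]) if x]
    for name, row in P_T.items()
    if not any(row[N_INTERNAL:])  # all exchange fluxes zero: no net effect on the exterior
}
print(cycles)  # {'p6': ['v4', 'v5'], 'p7': ['v2', 'v3']}
```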

How reactions appear in pathway matrix

In the matrix P of extreme pathways, each column is an EP and each row corresponds to a reaction in the network. The numerical value of the i,j-th element corresponds to the relative flux level through the i-th reaction in the j-th EP.

Papin, Price, Palsson, Genome Res. 12, 1889 (2002)

$$P_{LM} = P^T P$$

Properties of pathway matrix

Papin, Price, Palsson, Genome Res. 12, 1889 (2002)

A symmetric Pathway Length Matrix $P_{LM}$ can be calculated:

$$P_{LM} = P^T P$$

where the values along the diagonal correspond to the lengths of the EPs.

The off-diagonal terms of $P_{LM}$ are the number of reactions that a pair of extreme pathways have in common.

Q: How can you compute the lengths of the pathways involved from the pathway matrix P?

Properties of pathway matrix

Papin, Price, Palsson, Genome Res. 12, 1889 (2002)

One can also compute a reaction participation matrix $P_{PM}$ from P:

$$P_{PM} = P P^T$$

where the diagonal corresponds to the number of pathways in which the given reaction participates.

Q: How can you determine from the pathway matrix P in how many pathways a reaction participates?
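Both matrix products can be sketched with plain Python lists for the model system. Two assumptions of this sketch: only the internal reactions v1 to v6 are counted (the exchange fluxes b1 to b4 are left out), and P is first converted to a 0/1 matrix so that the products count shared reactions and pathways rather than multiplying flux values; for this example the internal entries are already 0 or 1.

```python
# Pathway matrix P of the model system: rows = internal reactions v1..v6, columns = p1..p7
P = [
    # p1 p2 p3 p4 p5 p6 p7
    [1, 0, 0, 0, 0, 0, 0],  # v1
    [0, 1, 1, 0, 0, 0, 1],  # v2
    [0, 0, 0, 1, 0, 0, 1],  # v3
    [0, 0, 1, 0, 0, 1, 0],  # v4
    [0, 0, 0, 1, 1, 1, 0],  # v5
    [0, 1, 0, 0, 1, 0, 0],  # v6
]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    B_cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]

# Binary form of P, so that the products count shared reactions/pathways
P_bin = [[1 if x != 0 else 0 for x in row] for row in P]

P_LM = matmul(transpose(P_bin), P_bin)  # 7 x 7: pathways x pathways
P_PM = matmul(P_bin, transpose(P_bin))  # 6 x 6: reactions x reactions

print([P_LM[k][k] for k in range(7)])  # pathway lengths: [1, 2, 2, 2, 2, 2, 2]
print([P_PM[i][i] for i in range(6)])  # pathways per reaction: [1, 3, 2, 2, 3, 2]
print(P_LM[1][2])                      # p2 and p3 share one reaction (v2): 1
```

Both diagonals sum to the same number (13 here), since each counts the non-zero entries of P once.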

Summary (extreme pathways)

Price et al. Biophys J 84, 794 (2003)

Extreme pathway analysis provides a mathematically rigorous way to dissect complex biochemical networks.

The matrix products $P^T P$ and $P P^T$ are useful ways to interpret pathway lengths and reaction participation.

However, the number of computed vectors may range in the thousands.

Therefore, meta-methods (e.g. singular value decomposition) are required that reduce the dimensionality to a useful number that can be inspected by humans.

Singular value decomposition may be one useful method ... and there are more to come. (Singular value decomposition will not be on the exam.)

Computational metabolomics: modelling constraints

Price et al. Nature Rev Microbiol 2, 886 (2004)

Surviving (expressed) phenotypes must satisfy constraints imposed on the molecular functions of a cell, e.g. conservation of mass and energy.

Fundamental approach to understanding biological systems: identify and formulate constraints.

Important constraints of cellular function:

(1) Physico-chemical constraints

(2) Topological constraints

(3) Environmental constraints

(4) Regulatory constraints

Physico-chemical constraints

Price et al. Nature Rev Microbiol 2, 886 (2004)

These are "hard" constraints: conservation of mass, energy and momentum.

Contents of a cell are densely packed, so viscosity can be 100–1000 times higher than that of water.

Therefore, diffusion rates of macromolecules in cells are slower than in water.

Many molecules are confined inside the semi-permeable membrane, which leads to high osmolarity. Cells need to deal with osmotic pressure (e.g. Na+/K+ pumps).

Reaction rates are determined by local concentrations inside cells.

Enzyme turnover numbers are generally less than $10^4\,\mathrm{s}^{-1}$. Maximal rates are equal to the turnover number multiplied by the enzyme concentration.

Biochemical reactions are driven by a negative free-energy change in the forward direction.
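The rule "maximal rate = turnover number × enzyme concentration" is plain arithmetic. A minimal sketch with hypothetical values for the turnover number and the enzyme concentration:

```python
def max_rate(k_cat_per_s, enzyme_conc_molar):
    """Maximal rate V_max = k_cat * [E] (turnover number times enzyme concentration), in M/s."""
    return k_cat_per_s * enzyme_conc_molar

# Hypothetical values: k_cat = 500 s^-1 (below the ~10^4 s^-1 range), [E] = 2 µM
v_max = max_rate(500.0, 2e-6)
print(f"V_max = {v_max:.1e} M/s")  # V_max = 1.0e-03 M/s, i.e. 1 mM per second
```

A value computed this way could serve as the upper bound β_i on the corresponding reaction flux in a constraint-based model.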

Topological constraints

Price et al. Nature Rev Microbiol 2, 886 (2004)

The crowding of molecules inside cells leads to topological (3D) constraints that affect both the form and the function of biological systems.

Environmental constraints

Price et al. Nature Rev Microbiol 2, 886 (2004)

Environmental constraints on cells are time- and condition-dependent: nutrient availability, pH, temperature, osmolarity, availability of electron acceptors.

E.g. Helicobacter pylori lives in the human stomach at pH = 1, so it needs to produce NH3 at a rate that will maintain its immediate surroundings at a pH that is sufficiently high to allow survival.

Ammonia is made from elementary nitrogen, so H. pylori has adapted by using amino acids instead of carbohydrates as its primary carbon source.

Q: Which method is better suited to describe the effect of constraints, such as a limit on the reaction rate of an enzyme, on the behavior of a metabolic network: FBA or extreme pathways?

A: FBA, by modeling the limit as a constraint on the flux v1:

$$\alpha_1 \le v_1 \le \beta_1$$

Regulatory constraints

Price et al. Nature Rev Microbiol 2, 886 (2004)

Regulatory constraints are self-imposed by the organism and are subject to evolutionary change, so they are not "hard" constraints.

Regulatory constraints allow the cell to eliminate suboptimal phenotypic states and to confine itself to behaviors of increased fitness.

Q: Name 5 constraints that can be important for modeling cellular networks.

Mathematical formulation of constraints

Price et al. Nature Rev Microbiol 2, 886 (2004)

There are two fundamental types of constraints: balances and bounds.

Balances are constraints that are associated with conserved quantities such as energy, mass, redox potential and momentum, or with phenomena such as solvent capacity, electroneutrality and osmotic pressure.

Bounds are constraints that limit numerical ranges of individual variables and parameters such as concentrations, fluxes or kinetic constants.

Both bound and balance constraints limit the allowable functional states of reconstructed cellular metabolic networks.
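Balances and bounds can be checked independently for a candidate flux distribution. The sketch below uses a hypothetical toy network with made-up bounds (all reactions irreversible, uptake capped at 10 arbitrary units); a flux vector belongs to the allowable states only if it passes both tests.

```python
# Hypothetical toy network: metabolites A..E (rows), reactions EX_A, R1, R2, R3, R4, EX_E (columns)
S = [
    [1, -1,  0,  0,  0,  0],
    [0,  1, -1, -1,  0,  0],
    [0,  0,  1,  0, -1,  0],
    [0,  0,  0,  1, -1,  0],
    [0,  0,  0,  0,  1, -1],
]
# Bounds (lower, upper) per reaction: all irreversible; uptake EX_A is capped at 10
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]

def satisfies_balances(S, v, tol=1e-9):
    """Balance constraints: mass conservation at steady state, S · v = 0."""
    return all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)

def satisfies_bounds(bounds, v):
    """Bound constraints: lower_j <= v_j <= upper_j for every reaction j."""
    return all(lo <= x <= hi for (lo, hi), x in zip(bounds, v))

def is_feasible(S, bounds, v):
    return satisfies_balances(S, v) and satisfies_bounds(bounds, v)

candidates = {
    "balanced, within bounds": [8, 8, 4, 4, 4, 4],
    "balanced, uptake too high": [12, 12, 6, 6, 6, 6],
    "within bounds, unbalanced": [8, 8, 8, 0, 0, 0],
}
for label, v in candidates.items():
    print(f"{label}: feasible = {is_feasible(S, bounds, v)}")
```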

Tools for analyzing network states

Price et al. Nature Rev Microbiol 2, 886 (2004)

Figure: The two steps that are used to form a solution space, reconstruction and the imposition of governing constraints, are illustrated in the centre of the figure. Several methods are being developed at various laboratories to analyse the solution space. C_i and C_j, concentrations of compounds i and j; EP, extreme pathway; v_i and v_j, fluxes through reactions i and j; v1–v3, flux through reactions 1–3; v_net, net flux through loop.

Determining optimal states

Price et al. Nature Rev Microbiol 2, 886 (2004)

Characterizing the whole solution space

Price et al. Nature Rev Microbiol 2, 886 (2004)

V20 Applications of Flux Balance Analysis

Many aspects of metabolism are currently being studied to understand its hierarchical and modular organization:

- Topology (Jeong et al. V18)

- Reaction fluxes (Almaas et al. - today)

- Epistasis (Segrè et al. - today)

- Coupling to gene expression (V22)

E. coli in silico

Stoichiometric matrix $S_{ij}$ for E. coli strain MG1655 containing 537 metabolites $A_i$ and 739 reactions j.

Apply flux balance analysis: in a steady state, the concentrations of all the metabolites are time-independent:

$$\frac{d}{dt} A_i = \sum_j S_{ij} v_j = 0$$

where $v_j$ is the flux of reaction j and $S_{ij}$ is the stoichiometric coefficient of metabolite i in reaction j.

We denote the mass carried by reaction j producing (consuming) metabolite i by

$$\hat{v}_{ij} = \left| S_{ij} v_j \right|$$

Fluxes vary widely: e.g. the dimensionless flux of the succinyl coenzyme A synthetase reaction is 0.185, whereas the flux of the aspartate oxidase reaction is 10,000 times smaller, $2.2 \times 10^{-5}$.

Q: How can you characterize the contribution of a reaction j to the production of a metabolite i?

Overall flux organization of E.coli metabolic network

Figure: a, Flux distribution for optimized biomass production on succinate (black) and glutamate (red) substrates. The solid line corresponds to the power-law fit of the probability that a reaction has flux v, $P(v) \propto (v + v_0)^{-\alpha}$, with $v_0$ = 0.0003 and α = 1.5. d, The distribution of experimentally determined fluxes from the central metabolism of E. coli shows power-law behaviour as well, with a best fit to $P(v) \propto v^{-\alpha}$ with α = 1.

Both computed and experimental flux distributions show a wide spectrum of fluxes.

Almaas et al., Nature 427, 839 (2004)

Overall flux organization of E.coli metabolic network

The observed flux distribution is compatible with two different potential local flux structures:

(a) a homogeneous local organization would imply that all reactions producing (consuming) a given metabolite have comparable fluxes;

(b) a more delocalized "high-flux backbone (HFB)" is expected if the local flux organization is heterogeneous, such that each metabolite has a dominant source (consuming) reaction.

Figure: Schematic illustration of the hypothetical scenario in which (a) all fluxes have comparable activity, in which case we expect kY(k) ≈ 1, and (b) the majority of the flux is carried by a single incoming or outgoing reaction, for which we should have kY(k) ≈ k.

Almaas et al., Nature 427, 839 (2004)

Overall flux organization of E.coli metabolic network

To distinguish between these two schemes, for each metabolite i produced (consumed) by k reactions, define

$$Y(k,i) = \sum_{j=1}^{k} \left( \frac{\hat{v}_{ij}}{\sum_{l=1}^{k} \hat{v}_{il}} \right)^2$$

where $\hat{v}_{ij}$ is the mass carried by reaction j which produces (consumes) metabolite i.

If all reactions producing (consuming) metabolite i have comparable $\hat{v}_{ij}$ values, Y(k,i) scales as 1/k.

If, however, the activity of a single reaction dominates, we expect Y(k,i) ≈ 1 (independent of k).

Almaas et al., Nature 427, 839 (2004)
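The disparity measure Y(k,i) is easy to evaluate once the mass flows $\hat{v}_{ij} = |S_{ij} v_j|$ of a metabolite are known. A minimal sketch with hypothetical numbers; `mass_flows` and `disparity` are illustrative names, and non-negative (irreversible) fluxes are assumed:

```python
def mass_flows(S_row, v):
    """v_hat_ij = |S_ij * v_j| for the active reactions j that produce metabolite i.
    Assumes irreversible reactions (v_j >= 0); test S_ij < 0 instead to get consuming reactions."""
    return [abs(s * x) for s, x in zip(S_row, v) if s > 0 and x > 0]

def disparity(flows):
    """Y(k, i) = sum over j of (v_hat_ij / sum_l v_hat_il)^2 for the k active reactions."""
    total = sum(flows)
    return sum((f / total) ** 2 for f in flows)

# Hypothetical row of S for one metabolite and a hypothetical flux vector
S_row = [1, 2, -1, 1, 0]
v = [0.5, 0.25, 1.0, 0.0, 3.0]
print(mass_flows(S_row, v))  # [0.5, 0.5]: two producing reactions carry equal mass

# Two hypothetical cases with k = 4 producing reactions each
for flows in ([1.0, 1.0, 1.0, 1.0], [97.0, 1.0, 1.0, 1.0]):
    k, Y = len(flows), disparity(flows)
    print(f"k={k}  Y={Y:.4f}  kY={k * Y:.2f}")
# homogeneous case: Y = 1/k = 0.25, kY = 1; dominated case: Y close to 1, kY close to k
```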

Q: Describe the strategy used by Barabasi and co-workers to distinguish between scenarios (a) and (b).

Characterizing the local inhomogeneity of the flux net

a, Measured kY(k) shown as a function of k for incoming and outgoing reactions, averaged over all metabolites, indicates that $Y(k) \propto k^{-0.27}$, as the solid line has slope 0.73 (i.e. $kY(k) \propto k^{0.73}$).

Inset shows the non-zero mass flows $\hat{v}_{ij}$ producing (consuming) FAD on a glutamate-rich substrate.

An intermediate behavior is found between the two extreme cases.

The large-scale inhomogeneity observed in the overall flux distribution is also increasingly valid at the level of the individual metabolites.

The more reactions that consume (produce) a given metabolite, the more likely it is that a single reaction carries most of the flux; see FAD. Almaas et al., Nature 427, 839 (2004)
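The exponent quoted for Y(k) follows from the slope of kY(k) on log-log axes: a slope of 0.73 for kY(k) means an exponent of 0.73 - 1 = -0.27 for Y(k). A sketch of such a least-squares fit in Python, using synthetic values that follow $k^{0.73}$ exactly rather than the measured E. coli data:

```python
import math


def loglog_slope(ks, values):
    """Ordinary least-squares slope of log(values) versus log(ks)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(v) for v in values]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


ks = [2, 3, 5, 8, 13, 21]
kY = [k ** 0.73 for k in ks]  # synthetic data with exponent 0.73
slope = loglog_slope(ks, kY)
print(f"slope of kY(k): {slope:.2f}, exponent of Y(k): {slope - 1:.2f}")
```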


Clean up metabolic network

A simple algorithm systematically removes, for each metabolite, all reactions except the one providing the largest incoming (outgoing) flux.

The algorithm uncovers the "high-flux backbone" (HFB) of the metabolism, a distinct structure of linked reactions that form a giant component with a star-like topology.

Almaas et al., Nature 427, 839 (2004)
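The clean-up step can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: each reaction is stored as (flux, consumed metabolites, produced metabolites), fluxes are non-negative magnitudes, and the toy metabolite names and flux values are hypothetical.

```python
def high_flux_backbone(reactions):
    """Keep, for each metabolite, only its highest-flux producer and consumer.

    reactions: name -> (flux, consumed metabolites, produced metabolites)
    Returns (max_producer, max_consumer, kept_reactions).
    """
    max_producer, max_consumer = {}, {}
    for name, (flux, consumed, produced) in reactions.items():
        for m in produced:
            if m not in max_producer or flux > reactions[max_producer[m]][0]:
                max_producer[m] = name
        for m in consumed:
            if m not in max_consumer or flux > reactions[max_consumer[m]][0]:
                max_consumer[m] = name
    kept = set(max_producer.values()) | set(max_consumer.values())
    return max_producer, max_consumer, kept


# Hypothetical toy network (names and fluxes are made up).
toy = {
    "R1": (10.0, ["glc"], ["g6p"]),
    "R2": (1.0, ["glc"], ["x"]),
    "R3": (9.0, ["g6p"], ["f6p"]),
    "R4": (0.5, ["x"], ["f6p"]),
    "R5": (0.2, ["g6p"], ["x"]),
}
prod, cons, kept = high_flux_backbone(toy)
print(sorted(kept))  # R5 is never the dominant producer or consumer
```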


Maximal flow networks

[Figure: maximal flow networks for glutamate-rich and succinate-rich substrates]

Directed links: Two metabolites (e.g. A and B) are connected with a directed link pointing

from A to B only if the reaction with maximal flux consuming A is the reaction with maximal

flux producing B.

Shown are all metabolites that have at least one neighbour after completing this procedure.

The background colours denote different known biochemical pathways.

Almaas et al., Nature 427, 839 (2004)
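Given, for every metabolite, the reaction with maximal producing flux and the reaction with maximal consuming flux, the directed links of the maximal flow network follow directly. A Python sketch with hypothetical metabolite and reaction names:

```python
def max_flow_links(max_producer, max_consumer):
    """Directed links A -> B whenever the maximal-flux consumer of A
    is also the maximal-flux producer of B."""
    links = []
    for a, r_out in max_consumer.items():
        for b, r_in in max_producer.items():
            if a != b and r_out == r_in:
                links.append((a, b))
    return sorted(links)


# Hypothetical per-metabolite maxima (e.g. taken from an FBA flux solution).
max_producer = {"g6p": "R1", "x": "R2", "f6p": "R3"}
max_consumer = {"glc": "R1", "g6p": "R3", "x": "R4"}
print(max_flow_links(max_producer, max_consumer))
# [('g6p', 'f6p'), ('glc', 'g6p')]
```

Metabolite x gets no outgoing link because its dominant consumer R4 is not the dominant producer of any metabolite; such metabolites drop out of the picture unless another link reaches them.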


FBA-optimized network on glutamate-rich substrate

High-flux backbone for FBA-optimized metabolic network of E. coli on a glutamate-rich substrate.

Metabolites (vertices) coloured blue have at least one

neighbour in common in glutamate- and succinate-rich

substrates, and those coloured red have none.

Reactions (lines) are coloured blue if they are identical

in glutamate- and succinate-rich substrates, green if a

different reaction connects the same neighbour pair,

and red if this is a new neighbour pair. Black dotted

lines indicate where the disconnected pathways, for

example, folate biosynthesis, would connect to the

cluster through a link that is not part of the HFB. Thus,

the red nodes and links highlight the predicted changes

in the HFB when shifting E. coli from glutamate- to

succinate-rich media. Dashed lines indicate links to the

biomass growth reaction.

Almaas et al., Nature 427, 839 (2004)

Pathway numbers in the figure:
(1) Pentose Phosphate
(2) Purine Biosynthesis
(3) Aromatic Amino Acids
(4) Folate Biosynthesis
(5) Serine Biosynthesis
(6) Cysteine Biosynthesis
(7) Riboflavin Biosynthesis
(8) Vitamin B6 Biosynthesis
(9) Coenzyme A Biosynthesis
(10) TCA Cycle
(11) Respiration
(12) Glutamate Biosynthesis
(13) NAD Biosynthesis
(14) Threonine, Lysine and Methionine Biosynthesis
(15) Branched Chain Amino Acid Biosynthesis
(16) Spermidine Biosynthesis
(17) Salvage Pathways
(18) Murein Biosynthesis
(19) Cell Envelope Biosynthesis
(20) Histidine Biosynthesis
(21) Pyrimidine Biosynthesis
(22) Membrane Lipid Biosynthesis
(23) Arginine Biosynthesis
(24) Pyruvate Metabolism
(25) Glycolysis


Interpretation

Only a few pathways appear disconnected, indicating that although these pathways are part of the HFB, their end product is only the second-most important source for another HFB metabolite.

Groups of individual HFB reactions largely overlap with the traditional biochemical partitioning of cellular metabolism.

Almaas et al., Nature 427, 839 (2004)


How sensitive is the HFB to changes in the environment?

Almaas et al., Nature 427, 839 (2004)

b, Fluxes of individual

reactions for glutamate-rich

and succinate-rich conditions.

Reactions with negligible flux

changes follow the diagonal

(solid line).

Some reactions are turned off

in only one of the conditions

(shown close to the

coordinate axes). Reactions

belonging to the HFB are

indicated by black squares,

the rest are indicated by blue

dots. Reactions in which the

direction of the flux is

reversed are coloured green.

Only the reactions in the high-flux territory

undergo noticeable differences!

Type I: reactions turned on in one condition and off in the other (symbols).

Type II: reactions that remain active but show an orders-of-magnitude shift in flux under the two different growth conditions.


Flux distributions for individual reactions

Shown is the flux distribution for four selected

E. coli reactions in a 50% random environment.

a Triosephosphate isomerase;

b carbon dioxide transport;

c NAD kinase;

d guanosine kinase.

Reactions on the v curve (small fluxes) have unimodal, Gaussian-like distributions (a and c). Shifts in growth conditions lead only to small changes of their flux values.

Reactions off this curve have multimodal distributions (b and d): under different growth conditions they show several discrete and distinct flux values. Almaas et al., Nature 427, 839 (2004)


Summary

Metabolic network use is highly uneven (power-law distribution) at the global level and at the level of the individual metabolites.

Whereas most metabolic reactions have low fluxes, the overall activity of the metabolism is dominated by several reactions with very high fluxes.

E. coli responds to changes in growth conditions by reorganizing the rates of selected fluxes predominantly within this high-flux backbone.

Apart from minor changes, the use of the other pathways remains unaltered.

These reorganizations result in large, discrete changes in the fluxes of the HFB reactions.

This organization may represent a universal feature of metabolic activity in all cells.

Implications for metabolic engineering?


A global view of epistasis

The relationship between genotype and phenotype is expected to be nonlinear

for most common human diseases.

Part of this complexity can be attributed to epistasis – the effects of a given gene

on a trait can be dependent on one or more other genes.

Moore, Nature Gen 37, 13 (2005)


Why is there epistasis?

Its exact origin is unknown.

Suspected cause: canalization stabilizes phenotypes through natural selection.

Hypothesis: if phenotypes are to be stable in the presence of mutations, they must have an underlying genetic architecture comprised of networks of genes that are redundant and robust.

Therefore, substantial effects on the phenotype are observed only when there are multiple mutational hits to the gene network. This creates dependencies among the genes in the network.

Approach: systematically study epistasis in well-understood model organisms.

Here: single and double knockouts of 890 metabolic genes of yeast.

Estimate the growth phenotypes of all knockouts by flux balance analysis (FBA).

Moore, Nature Gen 37, 13 (2005)


How does one measure epistasis?

$W_X$, $W_Y$: fitness of the single mutants; $W_{XY}$: fitness of the double mutant.

Here: compute the maximal rate of biomass production, $V_{growth}$, of all networks. For the deletion of gene X, define the fitness as

$$W_X = \frac{V_{growth}^{X}}{V_{growth}^{wild\,type}}$$

The scaled epsilon can distinguish between the two cases

$W_X = 0.7$, $W_Y = 0.7$, $W_{XY} = 0.54$ and

$W_X = 0.54$, $W_Y = 0.91$, $W_{XY} = 0.54$.

In both cases $W_X W_Y \approx 0.49$, and the non-scaled epsilon, $\varepsilon = W_{XY} - W_X W_Y$, cannot distinguish between them.

Moore, Nature Gen 37, 13 (2005)

Aggravating (epistasis): worsening, i.e. risk-increasing.
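The fitness definition and the two epsilon measures can be compared on the numbers quoted above. A Python sketch; the form of the scaled measure (dividing ε by $|\tilde{W}_{XY} - W_X W_Y|$, with $\tilde{W}_{XY} = \min(W_X, W_Y)$ for ε > 0 and 0 otherwise) is our reading of Segrè et al. (Table 1) and is an assumption here, since the slide does not spell it out:

```python
def fitness(v_growth_mutant, v_growth_wildtype):
    """W_X: maximal biomass production of the mutant relative to wild type."""
    return v_growth_mutant / v_growth_wildtype


def epsilon(w_x, w_y, w_xy):
    """Non-scaled epistasis: deviation from the multiplicative expectation."""
    return w_xy - w_x * w_y


def epsilon_scaled(w_x, w_y, w_xy):
    """Scaled epistasis (assumed form): eps / |W~_XY - W_X W_Y|, with
    W~_XY = min(W_X, W_Y) if eps > 0 (complete buffering -> +1)
    and W~_XY = 0 otherwise (synthetic lethality -> -1)."""
    eps = epsilon(w_x, w_y, w_xy)
    w_ref = min(w_x, w_y) if eps > 0 else 0.0
    denom = abs(w_ref - w_x * w_y)
    # Undefined when the reference equals the multiplicative expectation,
    # e.g. for a deletion without phenotype (W = 1); such genes are excluded.
    return eps / denom if denom > 0 else float("nan")


for w in [(0.7, 0.7, 0.54), (0.54, 0.91, 0.54)]:
    print(w, round(epsilon(*w), 3), round(epsilon_scaled(*w), 2))
```

Both cases give ε of about 0.05, but the scaled values differ: about 0.24 for the first (partial buffering) and exactly 1 for the second (complete buffering, $W_{XY} = \min(W_X, W_Y)$). A synthetic-lethal pair ($W_{XY} = 0$) maps to -1.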


Examples of gene deletion interactions

Full dots: essential biomass components that need to be synthesized for the cell to be able to grow.

a Idealized case: all biomass components are derived from independent nutrient sources. Then $W_X = S_X$, $W_Y = S_Y$ and $W_{XY} = \min(W_X, W_Y)$.

b More realistic case: parallel and mutually required pathways demand the optimal allocation of a common nutrient.

c Simplified example of synthetic lethality: a single biomass component can be produced through 2 alternative routes. The single mutants can grow; the double mutant is lethal.

d Complete buffering: $W_{XY} = \min(W_X, W_Y)$. Moore, Nature Gen 37, 13 (2005)

Effect of gene deletion X: a reduced "effective stoichiometry" $S_X$.


Fitness of double mutants

Fitness values of all possible double mutants relative to the expected no-epistasis values are calculated with FBA over all pairs of enzyme deletions (excluding essential genes and gene deletions with no phenotypic consequence).

The trimodal distribution is uncovered by transforming (a) the nonscaled epistasis level $\varepsilon = W_{XY} - W_X W_Y$ into (b) the new scale $\tilde{\varepsilon}$ defined in Table 1.

The new $\tilde{\varepsilon}$ values are used to classify the interactions into buffering (green) at $\tilde{\varepsilon} > \delta_+$; aggravating (red), including synthetic lethal at $\tilde{\varepsilon} = -1$ and strong synthetic sick at $\tilde{\varepsilon} < \delta_-$; and no epistasis (black). Here we used $(\delta_-, \delta_+) = (-0.25, 0.95)$.

Relatively few interaction pairs (gray) fall in a nondecisive area. Although the $\tilde{\varepsilon} = 1$ point is the outermost value in the FBA model, in experimental measurements compensatory interactions could exceed this buffering case.

Segrè et al., Nature Genet. 37, 77 (2005)


Classification of gene interactions

Segrè et al., Nature Genet. 37, 77 (2005)

(c) The classification of gene interactions is also evident in a scatter plot of the epistasis measures normalized to the effect of the double mutation, $1 - W_{XY}$. The ratio between the x and y axes is equal to the scaled epistasis level $\tilde{\varepsilon}$.

(d) A schematic metabolic network showing simple examples of

buffering, aggravating and multiplicative interactions (green, red and

black arcs, respectively) between gene deletions (X). The synthesis

of biomass (full square) from biomass components (full dots)

requires an optimal allocation of a common nutrient (empty square)

through intermediate metabolites (empty dots). Additional reactions

(dotted arrow) may account for more subtle buffering interactions in

the complete network.

(e,f) Distribution of epistasis in experimental data of fitness measurements of double and single mutants in RNA viruses. The unimodal distribution of $\varepsilon$ (e) diverges into a trimodal distribution when $\tilde{\varepsilon}$ is used (f). While these data support the FBA-derived trimodal distribution in the [-1, 1] range of $\tilde{\varepsilon}$, the presence of pairs with $\tilde{\varepsilon} > 1$ stresses the relevance of the additional class of such compensatory interactions (31 pairs, not shown). In viewing these

results, one should keep in mind that the data are based on a

heterogeneous collection of diverse experiments and may not

represent a truly random set of mutations.

Q: Which effects do you expect for the simultaneous removal of the two red/green/black proteins?


Epistatic interactions between genes

Epistatic interactions between genes classified by functional

annotation groups tend to be of a single sign (i.e.

monochromatic).

(a) Representation of the number of buffering and aggravating

interactions within and between groups of genes defined by

common preassigned annotation from the FBA model. The radii

of the pies represent the total number of interactions (ranging

logarithmically from 1 in the smallest pies to 35 in the largest).

The red and green pie slices reflect the numbers of aggravating

and buffering interactions, respectively. Monochromatic

interactions, represented by whole green or red pies, are much

more common than would be expected by chance.

(b) Sensitivity analysis of the prevalence of monochromaticity

with respect to changes in the growth conditions. In each matrix,

an input parameter was modified with respect to the nominal

analysis: O−, oxygen concentration; C−, carbon concentration;

AC, acetate (instead of glucose) supplied as carbon source.

The color of the matrix element indicates the kind of interactions

observed between the genes in different annotation groups:

red for pure aggravating, green for pure buffering and yellow for

mixed links.

Segrè et al., Nature Genet. 37, 77 (2005)


Prism algorithm

(a) The algorithm arranges a network of aggravating (red) and buffering (green)

interactions into modules whose genes interact with one another in a strictly

monochromatic way. This classification allows a system-level description of

buffering and aggravating interactions between functional modules.

Two networks with the same topology, but different permutations of link colors,

can have different properties of monochromatic clusterability: permuting links 3−4

with 2−4 transforms a 'clusterable' graph (b) into a 'nonclusterable' one (c).

Segrè et al., Nature Genet. 37, 77 (2005)


Prism algorithm

Prism stands for "pairwise reduction into subgraphs monochromatically".

Examples of monochromatic clustering of 3 toy networks using the Prism algorithm. Prism performs agglomerative clustering with the additional feature of avoiding, when possible, the generation of clusters that interact with each other through both aggravating and buffering epistatic links.

(a) and (b) show examples of networks that are clustered with no monochromatic violations, i.e. with $Q_{module} = 0$.

(c) provides an example for which a monochromatic solution does not exist. In this network, any pair of genes clustered in the first step will cause a monochromatic violation. The Prism algorithm would find the solution shown, which has a total of $Q_{module} = 1$ violation.


Agglomerative clustering

Start: assign each gene to a distinct cluster.

Iteration: at each sequential clustering step, pairs of clusters are combined depending on their biological proximity, or affinity $A_{x,y}$, between cluster x (size $n_x$) and cluster y (size $n_y$). $A_{x,y}$ is computed as a linear combination of a direct affinity and an associative affinity:

$$A_{x,y} = (1-\alpha)\,A^{d}_{x,y} + \alpha\,A^{a}_{x,y}$$

$$A^{d}_{x,y} = \frac{1}{n_x n_y} \sum_{X \in x,\, Y \in y} E_{X,Y}^{2}, \qquad A^{a}_{x,y} = \max_{X \in x,\, Y \in y} a_{X,Y}, \quad \text{where} \quad a_{X,Y} = \frac{1}{N} \sum_{Z} E_{X,Z}\,E_{Y,Z}$$

(N: number of genes.)

End: when the whole network is covered.

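The epistasis network $E_{X,Y}$ is defined as a discretization of the $\tilde{\varepsilon}_{X,Y}$ values based on the cut-off parameters $\delta_\pm$: $E_{X,Y} = -1$ if $\tilde{\varepsilon}_{X,Y} < \delta_-$, $E_{X,Y} = +1$ if $\tilde{\varepsilon}_{X,Y} > \delta_+$, and $E_{X,Y} = 0$ otherwise.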
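The discretization is a simple thresholding step. A Python sketch; reusing the cut-offs $(\delta_-, \delta_+) = (-0.25, 0.95)$ quoted for the classification of double mutants is an assumption, and the dict-of-pairs representation is our own choice:

```python
DELTA_MINUS, DELTA_PLUS = -0.25, 0.95  # assumed cut-offs (see classification of double mutants)


def discretize(eps_tilde, d_minus=DELTA_MINUS, d_plus=DELTA_PLUS):
    """E_XY: -1 aggravating, +1 buffering, 0 otherwise."""
    if eps_tilde < d_minus:
        return -1
    if eps_tilde > d_plus:
        return 1
    return 0


def epistasis_network(eps_tilde_by_pair, **cutoffs):
    """Keep only the non-zero links: {(X, Y): +1 or -1}."""
    net = {}
    for pair, value in eps_tilde_by_pair.items():
        e = discretize(value, **cutoffs)
        if e != 0:
            net[pair] = e
    return net


print(epistasis_network({("a", "b"): 1.0, ("a", "c"): 0.0, ("b", "c"): -1.0}))
```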


Agglomerative clustering

At every step, each pair of clusters (x,y) is also assigned an integer $C_{x,y}$ counting how many non-monochromatic connections would be formed if clusters x and y were joined (i.e. the number of clusters z that have buffering links with x and aggravating links with y, or vice versa).

The algorithm hence identifies the set of (x,y) pairs for which $C_{x,y} = C_m$, where $C_m = \min_{x,y} C_{x,y}$.

This set contains all the candidate pairs that, if joined at the next step, would cause the minimal possible number of monochromatic conflicts. The pair with the highest $A_{x,y}$ in this set is then chosen as the pair of clusters to be combined.

At a given step, monochromaticity is preserved if $C_m = 0$.

The final clustering solution is assigned a total module-module monochromaticity violation number, $Q_{module} = \sum C_m$, where the sum is over all the clustering steps.
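A compact Python sketch of the Prism merge rule, not the authors' implementation. Assumptions: epistatic links are the discretized values $E_{X,Y} = \pm 1$ stored in a dict keyed by gene pairs; the affinity is (1 - alpha) times the mean squared link value between two clusters plus alpha times the maximal shared-partner score $a_{X,Y} = \frac{1}{N}\sum_Z E_{X,Z}E_{Y,Z}$, with a hypothetical mixing weight `alpha = 0.3`; ties in affinity are broken by list order; and clustering stops at a chosen number of clusters instead of building the full hierarchy. The toy network is hypothetical.

```python
from itertools import combinations


def prism(genes, links, alpha=0.3, n_clusters=1):
    """Agglomerative clustering that prefers merges creating no mixed-sign module links.

    links: dict frozenset({X, Y}) -> +1 (buffering) or -1 (aggravating).
    alpha: assumed weight of the associative affinity.
    Returns (clusters, Q_module).
    """
    def e(x, y):
        return links.get(frozenset((x, y)), 0)

    n = len(genes)

    def a(x, y):  # shared interaction partners with equal signs score positively
        return sum(e(x, z) * e(y, z) for z in genes) / n

    def affinity(cx, cy):
        direct = sum(e(x, y) ** 2 for x in cx for y in cy) / (len(cx) * len(cy))
        assoc = max(a(x, y) for x in cx for y in cy)
        return (1 - alpha) * direct + alpha * assoc

    def signs(cx, cz):
        return {e(x, z) for x in cx for z in cz} - {0}

    clusters = [frozenset([g]) for g in genes]

    def conflicts(cx, cy):  # C_xy: clusters z linked to x and y with opposite signs
        count = 0
        for cz in clusters:
            if cz in (cx, cy):
                continue
            sx, sy = signs(cx, cz), signs(cy, cz)
            if (1 in sx and -1 in sy) or (-1 in sx and 1 in sy):
                count += 1
        return count

    q_module = 0
    while len(clusters) > n_clusters:
        pairs = list(combinations(clusters, 2))
        c = {p: conflicts(*p) for p in pairs}
        c_min = min(c.values())
        candidates = [p for p in pairs if c[p] == c_min]
        cx, cy = max(candidates, key=lambda p: affinity(*p))
        clusters = [cl for cl in clusters if cl not in (cx, cy)] + [cx | cy]
        q_module += c_min
    return clusters, q_module


signed = [(("A", "B"), 1), (("A", "C"), 1), (("A", "D"), -1),
          (("B", "C"), -1), (("B", "D"), -1), (("C", "D"), 1)]
links = {frozenset(p): s for p, s in signed}
print(prism(["A", "B", "C", "D"], links, n_clusters=2))
```

In this toy network every possible first merge already creates one mixed-sign connection, so the sketch cannot reach $Q_{module} = 0$; it simply picks the least conflicting merge at each step.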

Q: Describe the basic idea of the Prism algorithm: agglomerative clustering (which distance measure is used?) and which additional criterion is applied (counting the number of non-monochromatic connections).


Example for Prism

Schematic demonstration of monochromatic classification. A network of two types

of connections, such as buffering (green) and aggravating (red) epistasis, is

sorted into modules of genes that interact with one another in a purely

monochromatic way.

Segrè et al., Nature Genet. 37, 77 (2005)


Prism monochromatic gene interaction network

Segrè et al., Nature Genet. 37, 77 (2005)

Buffering (green) and aggravating

(red) gene interaction network.

Genes (black nodes) are grouped

into monochromatically interacting

modules (enclosing boxes).

Gene annotations (white letters

inside nodes) correlate well with

the unsupervised classification.

For directional buffering links,

arrows point from the deletion with

the larger effect to the deletion with

the smaller effect (i.e., to the

mutation whose fitness effect is

buffered by the presence of the

other).

Gene names are indicated on the side of the nodes. Names consisting of

the letter U followed by a number correspond to enzymatic or transport

reactions with unassigned genes. A Prism parameter value of 0.3 was used.


System-level view of interactions between functional modules

Notable predictions of module-module interactions include the aggravating link between LYSbs and TRPcat and the buffering one between PRObs and ATPs. "Buffering chains", such as PENT → ATPs → PRObs → IDP, can be observed owing to the coherent directionality of the buffering links in a. Such chains are not necessarily transitive; e.g., although PENT

buffers ATPs, which buffers PRObs, there is no direct buffering from

PENT to PRObs. The interacting functional modules are shown at their

approximate locations on a schematic metabolic chart. Functional

modules are named to reflect the main common metabolic processes of

the genes involved:

ACAL, acetaldehyde and acetate metabolism

ATPs, ATP synthase

COA, pantothenate and coenzyme-A biosynthesis

ETHxt, ethanol transport

GLUCN, gluconeogenesis

GLYC, glycolysis

IDP, isocitrate dehydrogenase

LYSbs, lysine biosynthesis

PENT, pentose phosphate pathway

PRObs, proline biosynthesis

PROcat, proline catabolism

PYR, pyruvate metabolism

RESPIR, respiratory chain

STEROL, sterol biosynthesis

TCA, TCA cycle

TRPcat, tryptophan catabolism

URA, uracil biosynthesis.

Segrè et al., Nature Genet. 37, 77 (2005)


Conclusion

Epistatic interactions, manifested in the effects of mutations on the phenotypes caused by other mutations, may help uncover the functional organization of complex biological networks.

Here, system-level epistatic interactions were studied by computing growth phenotypes of all single and double knockouts of 890 metabolic genes in S. cerevisiae, using the framework of FBA.

A new scale for epistasis identified a distinctive trimodal distribution of these epistatic effects,

allowing gene pairs to be classified as buffering, aggravating or noninteracting.

The epistatic interaction network could be organized hierarchically into function-

enriched modules that interact with each other 'monochromatically' (i.e., with purely

aggravating or purely buffering epistatic links).

This property extends the concept of epistasis from single genes to functional units and

provides a new definition of biological modularity, which emphasizes interactions between,

rather than within, functional modules.

Our approach can be used to infer functional gene modules from purely phenotypic epistasis

measurements. Segrè et al., Nature Genet. 37, 77 (2005)


V21 Metabolic Pathway Analysis (MPA)

Metabolic Pathway Analysis searches for meaningful structural and functional units

in metabolic networks. The most promising, very similar approaches are based on

convex analysis and use the sets of elementary flux modes (Schuster et al. 1999,

2000) and extreme pathways (Schilling et al. 2000).

Both sets span the space of feasible steady-state flux distributions by non-

decomposable routes, i.e. no subset of reactions involved in an EFM or EP can hold

the network balanced using non-trivial fluxes.

MPA can be used to study e.g.

- routing + flexibility/redundancy of networks

- functionality of networks

- identification of futile cycles

- finding all (sub)optimal pathways with respect to product/biomass yield

- can be useful for calculability studies in MFA

Klamt et al. Bioinformatics 19, 261 (2003)


Metabolic Pathway Analysis: Elementary Flux Modes

The technique of Elementary Flux Modes (EFM) was developed prior to extreme pathways (EP) by Stefan Schuster, Thomas Dandekar and co-workers: Pfeiffer et al. Bioinformatics, 15, 251 (1999)

Schuster et al. Nature Biotech. 18, 326 (2000)

The method is very similar to the "extreme pathway" method for constructing a basis for metabolic flux states based on methods from convex algebra.

Extreme pathways are a subset of elementary modes, and for many systems, both

methods coincide.

Are the subtle differences important?


Two approaches for Metabolic Pathway Analysis?

The pathway P(e), i.e. the set of reactions carrying non-zero flux in the flux vector e, is an elementary flux mode if it fulfills conditions C1 – C3.

(C1) Pseudo steady state: $S \cdot e = 0$. This ensures that none of the metabolites is consumed or produced in the overall stoichiometry.

(C2) Feasibility: $e_i \geq 0$ if reaction i is irreversible. This demands that only thermodynamically realizable fluxes are contained in e.

(C3) Non-decomposability: there is no vector v (unequal to the zero vector and to e) fulfilling C1 and C2 such that P(v) is a proper subset of P(e). This is the core characteristic of EFMs and EPs and supplies the decomposition of the network into the smallest units able to hold the network in steady state.

C3 is often called "genetic independence" because it implies that the enzymes in one EFM or EP are not a subset of the enzymes of another EFM or EP.

Klamt & Stelling Trends Biotech 21, 64 (2003)
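Conditions C1 to C3 can be tested for a given flux vector. A NumPy sketch (the toy network and the function name are hypothetical). For C3 it uses a standard algebraic reformulation rather than enumerating sub-vectors: a steady-state vector e is non-decomposable exactly when the columns of S belonging to P(e) have a one-dimensional null space, i.e. their rank is |P(e)| - 1.

```python
import numpy as np


def is_elementary_mode(S, e, irreversible, tol=1e-9):
    """Check C1-C3 for flux vector e; irreversible: indices of irreversible reactions."""
    S = np.asarray(S, dtype=float)
    e = np.asarray(e, dtype=float)
    if not np.any(np.abs(e) > tol):
        return False  # the zero vector is not a mode
    # C1: pseudo steady state, S e = 0
    if not np.allclose(S @ e, 0.0, atol=tol):
        return False
    # C2: irreversible reactions must not run backwards
    if any(e[i] < -tol for i in irreversible):
        return False
    # C3: the support P(e) admits only one flux direction (null space of dimension 1)
    support = np.flatnonzero(np.abs(e) > tol)
    return np.linalg.matrix_rank(S[:, support]) == len(support) - 1


# Hypothetical network: R1: -> A, R2: A -> B, R3: A -> B, R4: B ->
S = [[1, -1, -1, 0],   # metabolite A
     [0, 1, 1, -1]]    # metabolite B
irrev = [0, 1, 2, 3]
for e in ([1, 1, 0, 1], [1, 0, 1, 1], [2, 1, 1, 2], [1, 0, 0, 1]):
    print(e, is_elementary_mode(S, e, irrev))  # True, True, False, False
```

The third vector is a valid steady state but decomposes into the first two modes; the fourth violates C1.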


Two approaches for Metabolic Pathway Analysis?

The pathway P(e) is an extreme pathway if it fulfills conditions C1 – C3 AND conditions C4 – C5.

(C4) Network reconfiguration: Each reaction must be classified either as exchange

flux or as internal reaction. All reversible internal reactions must be split up into

two separate, irreversible reactions (forward and backward reaction).

(C5) Systemic independence: the set of EPs in a network is the minimal set of

EFMs that can describe all feasible steady-state flux distributions.

Klamt & Stelling Trends Biotech 21, 64 (2003)
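Condition C4 is a purely structural transformation of the stoichiometric matrix. A NumPy sketch that splits selected reversible internal reactions into forward (f) and backward (b) columns, following the R7f/R7b naming of the example; which reactions are internal and reversible must be supplied by the user, and the small network is hypothetical:

```python
import numpy as np


def split_reversible(S, names, reversible_internal):
    """C4: replace each reversible internal reaction by a forward and a backward one."""
    S = np.asarray(S, dtype=float)
    cols, new_names = [], []
    for j, name in enumerate(names):
        if j in reversible_internal:
            cols += [S[:, j], -S[:, j]]          # backward column = negated forward column
            new_names += [name + "f", name + "b"]
        else:
            cols.append(S[:, j])
            new_names.append(name)
    return np.column_stack(cols), new_names


# Hypothetical example: R2 (A <-> B) is reversible and internal.
S = [[1, -1, 0],   # metabolite A
     [0, 1, -1]]   # metabolite B
S2, names2 = split_reversible(S, ["R1", "R2", "R3"], {1})
print(names2)  # ['R1', 'R2f', 'R2b', 'R3']
# The trivial two-cycle R2f + R2b is a steady state of the reconfigured network:
print(S2 @ np.array([0, 1, 1, 0]))
```

The last line should print the zero vector: running R2f and R2b at equal rates converts nothing, which is exactly the kind of two-cycle that reconfiguration adds to the set of EFMs.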

Q: Name the two main differences between the methods of extreme pathways and elementary mode analysis. A: see the additional conditions C4 and C5.


Two approaches for Metabolic Pathway Analysis?

Klamt & Stelling Trends Biotech 21, 64 (2003)

[Figure: example network N1 with metabolites A, B, C, D and P, external metabolites A(ext), B(ext) and C(ext), and reactions R1 to R9]


Reconfigured Network

Klamt & Stelling Trends Biotech 21, 64 (2003)

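[Figure: reconfigured network N2, identical to N1 except that the reversible reaction R7 is split into the irreversible reactions R7f (forward) and R7b (backward)]

Three EFMs are not systemically independent: EFM1 = EP4 + EP5, EFM2 = EP3 + EP5, EFM4 = EP2 + EP3.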


Property 1 of EFMs

Klamt & Stelling Trends Biotech 21, 64 (2003)

The only difference in the set of EFMs emerging upon reconfiguration consists of the two-cycles that result from splitting up reversible reactions. However, two-cycles are not considered meaningful pathways.

Valid for any network: Property 1

Reconfiguring a network by splitting up reversible reactions leads to the same set of

meaningful EFMs.


Software: FluxAnalyzer

What is the consequence when all exchange fluxes (and hence all reactions in the network) are irreversible?

Klamt & Stelling Trends Biotech 21, 64 (2003)

EFMs and EPs always coincide!


Property 2 of EFMs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Property 2

If all exchange reactions in a network are irreversible then the sets of meaningful

EFMs (both in the original and in the reconfigured network) and EPs coincide.


Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem: recognition of operational modes (routes for converting exclusively A to P).

EFM (network N1): 4 genetically independent routes (EFM1-EFM4).

EP (network N2): the set of EPs does not contain all genetically independent routes. Searching for EPs leading from A to P via B, no pathway would be found.


Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem: finding all the optimal routes (optimal pathways for synthesizing P during growth on A alone).

EFM (network N1): EFM1 and EFM2 are optimal because they yield one mole P per mole substrate A (i.e. R3/R1 = 1), whereas EFM3 and EFM4 are only suboptimal (R3/R1 = 0.5).

EP (network N2): one would only find the suboptimal EP1, not the optimal routes EFM1 and EFM2.


Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem: analysis of network flexibility (structural robustness, redundancy): relative robustness of exclusive growth on A or B.

EFM (network N1): 4 pathways convert A to P (EFM1-EFM4), whereas for B only one route (EFM8) exists. When one of the internal reactions (R4-R9) fails, 2 pathways for the production of P from A will always "survive". By contrast, removing reaction R8 already stops the production of P from B alone.

EP (network N2): only 1 EP exists for producing P from substrate A alone, and 1 EP for synthesizing P from (only) substrate B. One might conclude that both substrates possess the same redundancy of pathways, but as shown by EFM analysis, growth on substrate A is much more flexible than on B.


Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem: relative importance of single reactions (relative importance of reaction R8).

EFM (network N1): R8 is essential for producing P from substrate B, whereas for A there is no structurally "favored" reaction (R4-R9 all occur twice in EFM1-EFM4). However, considering the optimal modes EFM1 and EFM2, one recognizes the importance of R8 also for growth on A.

EP (network N2): consider again the biosynthesis of P from substrate A (EP1 only). Because R8 is not involved in EP1, one might think that this reaction is not important for synthesizing P from A. However, without this reaction, it is impossible to obtain optimal yields (1 P per A; EFM1 and EFM2).


Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem: enzyme subsets and excluding reaction pairs (these suggest regulatory structures or rules).

EFM (network N1): R6 and R9 form an enzyme subset. By contrast, R6 and R9 never occur together with R8 in an EFM. Thus (R6,R8) and (R8,R9) are excluding reaction pairs. (In an arbitrary steady-state flux distribution composed of several EFMs, they might occur together.)

EP (network N2): the EPs suggest that R4 and R8 form an excluding reaction pair, but they do not (EFM2). The enzyme subsets would be correctly identified. However, one can construct simple examples where the EPs would also suggest wrong enzyme subsets (not shown).
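Both structural properties can be read off a list of EFMs. A Python sketch, assuming each EFM is given by its support (the set of participating reactions). Strictly, an enzyme subset also requires fixed flux ratios; grouping reactions with identical occurrence patterns, as done here, is a simplification. The two toy modes are hypothetical, not the EFMs of network N1:

```python
from itertools import combinations


def enzyme_subsets_and_exclusions(efms):
    """efms: list of sets of reaction names (the support of each EFM).

    Enzyme subsets (approximated): reactions that occur in exactly the same EFMs.
    Excluding pairs: reactions that never occur together in any EFM.
    """
    reactions = sorted(set().union(*efms))
    occurrence = {r: frozenset(i for i, m in enumerate(efms) if r in m) for r in reactions}
    groups = {}
    for r, occ in occurrence.items():
        groups.setdefault(occ, []).append(r)
    subsets = [g for g in groups.values() if len(g) > 1]
    excluding = [(r, s) for r, s in combinations(reactions, 2)
                 if not (occurrence[r] & occurrence[s])]
    return subsets, excluding


toy_efms = [{"R1", "R2", "R4"}, {"R1", "R3", "R4"}]
print(enzyme_subsets_and_exclusions(toy_efms))
# ([['R1', 'R4']], [('R2', 'R3')])
```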


Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem: pathway length (shortest/longest pathway for the production of P from A).

EFM (network N1): the shortest pathway from A to P needs 2 internal reactions (EFM2), the longest 4 (EFM4).

EP (network N2): neither the shortest (EFM2) nor the longest (EFM4) pathway from A to P is contained in the set of EPs.


Comparison of EFMs and EPs

Klamt & Stelling Trends Biotech 21, 64 (2003)

Problem: removing a reaction and mutation studies (effect of deleting R7).

EFM (network N1): all EFMs not involving the deleted reactions make up the complete set of EFMs of the new (smaller) subnetwork. If R7 is deleted, EFMs 2, 3, 6 and 8 "survive". Hence the mutant is viable.

EP (network N2): analyzing a subnetwork implies that the EPs must be computed anew. E.g., when deleting R2, EFM2 would become an EP. For this reason, mutation studies cannot be performed easily.
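Because the EFMs of a deletion mutant are exactly the EFMs of the full network that avoid the deleted reactions, an EFM-based mutation study reduces to filtering. A Python sketch with hypothetical mode supports; treating "viable" as "some surviving mode still uses a required output reaction" is a simplifying assumption:

```python
def surviving_modes(efms, deleted):
    """Modes that use none of the deleted reactions (= the EFMs of the mutant)."""
    deleted = set(deleted)
    return {name: rs for name, rs in efms.items() if not (rs & deleted)}


def is_viable(efms, deleted, required):
    """Viable if some surviving mode still contains the required (e.g. product-forming) reaction."""
    return any(required in rs for rs in surviving_modes(efms, deleted).values())


toy = {"EFM1": {"R1", "R2", "R4"}, "EFM2": {"R1", "R3", "R4"}}
print(sorted(surviving_modes(toy, {"R2"})))                                # ['EFM2']
print(is_viable(toy, {"R2"}, "R4"), is_viable(toy, {"R2", "R3"}, "R4"))  # True False
```

No recomputation is needed, which is the practical advantage over EPs described above.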


Integrated Analysis of Metabolic and Regulatory Networks

So far, studies of large-scale cellular networks have focused on their connectivities.

The emerging picture shows a densely woven web where almost everything is connected to everything.

In the cell's metabolic network, hundreds of substrates are interconnected through biochemical reactions.

Although this could in principle lead to the simultaneous flow of substrates in numerous directions, in practice metabolic fluxes pass through specific pathways (see the high-flux backbone, V20).

Topological studies so far did not consider how the modulation of this connectivity might also determine network properties.

Therefore it is important to correlate the network topology (the picture derived from EFMs and EPs) with the expression of enzymes in the cell.


Analyze transcriptional control in metabolic networks

Regulatory and metabolic functions of cells are mediated by networks of interacting

biochemical components.

Metabolic flux is optimized to maximize metabolic efficiency under different

conditions.

Control of metabolic flow:

- allosteric interactions

- covalent modifications involving enzymatic activity

- transcription (revealed by genome-wide expression studies)

Here: N. Barkai and colleagues analyzed published experimental expression data of

Saccharomyces cerevisiae.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)


Recurrence signature algorithm

The availability of DNA microarray data makes it possible to study the transcriptional response of a complete genome to different experimental conditions.

An essential task in studying the global structure of transcriptional networks is

gene classification.

Commonly used clustering algorithms classify genes successfully when applied to

relatively small data sets, but their application to large-scale expression data is

limited by 2 well-recognized drawbacks:

- commonly used algorithms assign each gene to a single cluster, whereas in fact

genes may participate in several functions and should thus be included in several

clusters

- these algorithms classify genes on the basis of their expression under all

experimental conditions, whereas cellular processes are generally affected only by

a small subset of these conditions.

Ihmels et al. Nat Genetics 31, 370 (2002)

Page 91: V18 EcoCyc Analysis of E.coli Metabolism

Recurrence signature algorithm

Aim: identify transcription "modules" (TMs).

A set of randomly selected genes is unlikely to be identical to the genes of any

TM. Yet many such sets do have some overlap with a specific TM.

In particular, sets of genes that are compiled according to existing knowledge of

their function (or of the similarity of their regulatory sequences) may have a significant overlap

with a transcription module.

The algorithm receives a gene set that partially overlaps a TM and returns the

complete module as output. Hence the name "signature

algorithm".

Ihmels et al. Nat Genetics 31, 370 (2002)
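The two scoring steps of a signature-type search can be sketched in Python as below. This is a minimal illustration, not the published implementation: the function names, the use of z-scored condition and gene scores, and the thresholds `t_cond` and `t_gene` (in standard-deviation units) are assumptions, and the expression data are assumed to be normalized already.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of scores (mean 0, population s.d. 1)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]

def signature(expr, input_genes, t_cond=2.0, t_gene=1.0):
    """One pass of a signature-style search (hypothetical helper).

    expr: dict gene -> list of normalized expression values, one per condition.
    input_genes: genes expected to overlap a transcription module.
    Returns (module genes, indices of the selected conditions)."""
    n_cond = len(next(iter(expr.values())))
    # Step 1: score each condition by the mean expression of the input genes
    cond_scores = [mean(expr[g][c] for g in input_genes) for c in range(n_cond)]
    cz = zscores(cond_scores)
    conds = [c for c in range(n_cond) if abs(cz[c]) > t_cond]
    if not conds:
        return set(), []
    # Step 2: score every gene over the selected conditions, weighted by condition score
    gene_scores = {g: sum(cond_scores[c] * v[c] for c in conds) for g, v in expr.items()}
    gz = zscores(list(gene_scores.values()))
    module = {g for g, z in zip(gene_scores, gz) if z > t_gene}
    return module, conds
```

In the toy check below, starting from only two of four co-expressed genes, the sketch returns all four together with the single condition in which they are induced.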

Page 92: V18 EcoCyc Analysis of E.coli Metabolism

Recurrence signature algorithm

a, The signature algorithm.

b, Recurrence as a reliability measure. The signature algorithm is applied to distinct input

sets containing different subsets of the postulated transcription module. If the different input

sets give rise to the same module, it is considered reliable.

c, General application of the recurrent signature method.

Ihmels et al. Nat Genetics 31, 370 (2002)

Panel c workflow: normalization of data, identification of modules, classification of genes into modules.
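The recurrence criterion of panel b can be expressed as a simple check: run a signature-type search on several different subsets of the input and accept the module only if the outputs agree. In this sketch, `signature_fn` is a placeholder for any function that maps an input gene set to a module; the random subsampling, the Jaccard overlap and the 0.8 cutoff are illustrative assumptions, not parameters from the paper.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Overlap of two gene sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def recurrence_test(signature_fn, input_genes, n_runs=5, frac=0.5, min_overlap=0.8, seed=0):
    """Run signature_fn on several random subsets of the input and call the
    module reliable if all outputs overlap pairwise by at least min_overlap."""
    rng = random.Random(seed)
    genes = sorted(input_genes)
    k = max(1, int(frac * len(genes)))
    modules = [frozenset(signature_fn(set(rng.sample(genes, k)))) for _ in range(n_runs)]
    reliable = all(jaccard(a, b) >= min_overlap for a, b in combinations(modules, 2))
    return reliable, modules
```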

Page 93: V18 EcoCyc Analysis of E.coli Metabolism

Correlation between genes of the same metabolic pathway

Distribution of the average correlation between genes assigned to the same metabolic pathway in the KEGG database. The distribution corresponding to random assignment of genes to metabolic pathways of the same size is shown for comparison. Importantly, only genes coding for enzymes were used in the random control.

Interpretation: pairs of genes associated with the same metabolic pathway show a similar expression pattern.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

However, typically only a subset of the genes assigned to a given pathway is coregulated.
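The quantity in this figure, the average pairwise expression correlation within a gene set, and its random control (gene sets of the same size drawn only from enzyme-coding genes) might be computed as follows. The Pearson correlation, the dictionary layout of `expr` and the number of random samples are assumptions made for this sketch.

```python
import random
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two expression profiles (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def avg_pairwise_corr(expr, genes):
    """Average correlation over all gene pairs of one pathway."""
    return mean(pearson(expr[a], expr[b]) for a, b in combinations(genes, 2))

def random_control(expr, enzyme_genes, size, n_samples=1000, seed=0):
    """Averages for random 'pathways' of the same size, drawn from enzyme genes only."""
    rng = random.Random(seed)
    pool = sorted(enzyme_genes)
    return [avg_pairwise_corr(expr, rng.sample(pool, size)) for _ in range(n_samples)]
```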

Page 94: V18 EcoCyc Analysis of E.coli Metabolism

Correlation between genes of the same metabolic pathway

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Genes of the glycolysis pathway (according to KEGG) were clustered and ordered based on the correlation in their expression profiles. Shown here is the matrix of their pairwise correlations. The cluster of highly correlated genes (orange frame) corresponds to genes that encode the central glycolysis enzymes. The linear arrangement of these genes along the pathway is shown at right.

Of the 46 genes assigned to the glycolysis pathway in the KEGG database, only 24 show a correlated expression pattern.

In general, the coregulated genes belong to the central parts of pathways.

Page 95: V18 EcoCyc Analysis of E.coli Metabolism

Coexpressed enzymes often catalyze a linear chain of reactions

Coregulation between enzymes associated with central metabolic pathways. Each branch corresponds to several enzymes. In the cases shown, only one of the branches downstream of the junction point is coregulated with the upstream genes.

Interpretation: coexpressed enzymes are often arranged in a linear order, corresponding to a metabolic flow that runs in a particular direction.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 96: V18 EcoCyc Analysis of E.coli Metabolism

Co-regulation at branch points

To examine more systematically whether coregulation enhances the linearity of

metabolic flow, analyze the coregulation of enzymes at metabolic branch points.

Search KEGG for metabolic compounds that are involved in exactly 3 reactions.

Only consider reactions that exist in S. cerevisiae.

3-junctions can integrate metabolic flow (convergent junction)

or allow the flow to diverge in 2 directions (divergent junction).

In cases where several reactions are catalyzed by the same enzyme, choose

one representative, so that all junctions considered are composed of precisely 3

reactions catalyzed by distinct enzymes.

Each 3-junction is categorized according to the correlation pattern found between

enzymes catalyzing its branches. Correlation coefficients > 0.25 are considered

significant.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
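The junction categories can be made concrete with a small sketch. It assumes that each of the three reactions has already been labeled as producing ("in") or consuming ("out") the metabolite, and that a reaction-level correlation `corr` (a hypothetical input) is available. Reversible reactions and the isozyme-dependent "linear switch" case are left out for brevity.

```python
THRESHOLD = 0.25  # correlation coefficients above this value count as significant

def junction_type(directions):
    """directions: roles of the three reactions w.r.t. the metabolite, each 'in' or 'out'."""
    n_in = directions.count("in")
    if n_in == 1:
        return "divergent"    # one producing reaction, two consuming branches
    if n_in == 2:
        return "convergent"   # two producing branches feed one consuming reaction
    return "conflicting"      # all produce or all consume: no flow through the metabolite

def divergent_pattern(corr, incoming, outgoing):
    """corr: dict frozenset({r1, r2}) -> expression correlation of reactions r1 and r2."""
    linked = [o for o in outgoing if corr.get(frozenset((incoming, o)), 0.0) > THRESHOLD]
    if len(linked) == 1:
        return "linear"       # flow biased toward a single outgoing branch
    if len(linked) == 2:
        return "both branches coregulated"
    return "no coregulation"
```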

Page 97: V18 EcoCyc Analysis of E.coli Metabolism

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Coregulation pattern in three-point junctions

In the majority of divergent junctions, only one of the emanating branches is significantly coregulated with the incoming reaction that synthesizes the metabolite.

All junctions corresponding to metabolites that participate in exactly 3

reactions (according to KEGG) were identified and the correlations

between the genes associated with each such junction were calculated.

The junctions were grouped according to the directionality of the

reactions, as shown.

Divergent junctions, which allow the flow of metabolites in two

alternative directions, predominantly show a linear coregulation pattern,

where one of the emanating reactions is correlated with the incoming

reaction (linear regulatory pattern) or the two alternative outgoing

reactions are correlated in a context-dependent manner with a distinct

isozyme catalyzing the incoming reaction (linear switch).

By contrast, the linear regulatory pattern is significantly less abundant

in convergent junctions, where the outgoing flow follows a unique

direction, and in conflicting junctions that do not support metabolic flow.

Most of the reversible junctions comply with linear regulatory patterns.

Indeed, similar to divergent junctions, reversible junctions allow

metabolites to flow in two alternative directions. Reactions were

counted as coexpressed if at least two of the associated genes were

significantly correlated (correlation coefficient >0.25). As a random

control, we randomized the identity of all metabolic genes and repeated

the analysis.
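The random control described above (randomizing the identity of the metabolic genes and repeating the analysis) can be sketched as follows. As a simplification, this sketch scores gene pairs directly rather than reactions with at least two correlated genes; the input format is an assumption.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two expression profiles (0.0 if one is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def fraction_coexpressed(expr, gene_pairs, threshold=0.25):
    """Fraction of junction gene pairs whose correlation exceeds the threshold."""
    hits = sum(1 for a, b in gene_pairs if pearson(expr[a], expr[b]) > threshold)
    return hits / len(gene_pairs)

def randomized_fraction(expr, gene_pairs, threshold=0.25, seed=0):
    """Random control: reassign expression profiles to gene names at random, then recount."""
    rng = random.Random(seed)
    genes = sorted(expr)
    profiles = [expr[g] for g in genes]
    rng.shuffle(profiles)
    return fraction_coexpressed(dict(zip(genes, profiles)), gene_pairs, threshold)
```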

Page 98: V18 EcoCyc Analysis of E.coli Metabolism

Co-regulation at branch points: conclusions

The observed co-regulation patterns correspond to a linear metabolic flow,

whose directionality can be switched in a condition-specific manner.

When analyzing junctions that allow metabolic flow in a larger number of

directions, again only a few important branches are coregulated with

the incoming branch.

Therefore, transcriptional regulation is used to enhance the linearity of

metabolic flow, by biasing the flow toward only a few of the possible routes.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 99: V18 EcoCyc Analysis of E.coli Metabolism

Connectivity of metabolites

The connectivity of a given metabolite is defined as the number of reactions connecting it to other metabolites. Shown are the distributions of connectivity between metabolites in an unrestricted network and in a network where only correlated reactions are considered.

In accordance with previous results (Jeong et al. 2000), the connectivity distribution between metabolites follows a power law (log-log plot).

In contrast, when coexpression is used as a criterion to distinguish functional links, the connectivity distribution becomes exponential (log-linear plot).

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
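The connectivity measure and the distinction between the two distribution shapes can be illustrated with a small sketch. A power law P(k) ∝ k^(−γ) is a straight line on a log-log plot, while an exponential P(k) ∝ e^(−k/κ) is a straight line on a log-linear plot, so comparing the R² of the two linear fits gives a rough indication. This is a heuristic chosen for illustration, not the statistical procedure used in the paper; the input format (one set of metabolites per reaction) is also an assumption.

```python
import math
from collections import Counter

def connectivity(reactions):
    """reactions: list of sets of the metabolites in each reaction.
    Returns metabolite -> number of reactions it takes part in."""
    deg = Counter()
    for mets in reactions:
        deg.update(mets)
    return deg

def degree_distribution(deg):
    """P(k): fraction of metabolites with connectivity k."""
    counts = Counter(deg.values())
    n = sum(counts.values())
    return {k: c / n for k, c in sorted(counts.items())}

def r_squared(xs, ys):
    """Coefficient of determination of a least-squares line through (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def compare_fits(dist):
    """Power law: log P linear in log k. Exponential: log P linear in k."""
    ks = list(dist)
    logp = [math.log(p) for p in dist.values()]
    return {"power_law": r_squared([math.log(k) for k in ks], logp),
            "exponential": r_squared(ks, logp)}
```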

Page 100: V18 EcoCyc Analysis of E.coli Metabolism

Differential regulation of isozymes

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Observe that isozymes at junction points are often preferentially coexpressed

with alternative reactions.

We therefore investigate their role in the metabolic network more systematically.

Two possible functions of isozymes associated with the same metabolic reaction:

An isozyme pair could provide redundancy which may be needed for buffering genetic

mutations or for amplifying metabolite production. Redundant isozymes are expected

to be coregulated.

Alternatively, distinct isozymes could be dedicated to separate biochemical

pathways using the associated reaction. Such isozymes are expected to be

differentially expressed, each being coexpressed with one of the two alternative processes.

Page 101: V18 EcoCyc Analysis of E.coli Metabolism

Differential regulation of isozymes in central metabolic pathways

Arrows represent metabolic pathways composed of a sequence of enzymes. Coregulation is indicated with the same color (e.g., the isozyme represented by the green arrow is coregulated with the metabolic pathway represented by the green arrow).

Most members of isozyme pairs are separately coregulated with alternative processes.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 102: V18 EcoCyc Analysis of E.coli Metabolism

Differential regulation of isozymes

Regulatory pattern of all gene pairs associated with a common metabolic reaction (according to KEGG). All such pairs were classified into several classes:

(1) parallel, where each gene is correlated with a distinct connected reaction (a reaction that shares a metabolite with the reaction catalyzed by the respective gene pair);

(2) selective, where only one of the enzymes shows a significant correlation with a connected reaction; and

(3) converging, where both enzymes were correlated with the same reaction.

Correlation coefficients >0.25 were considered significant. To be counted as parallel, rather than converging, we demanded that the correlation with the alternative reaction be <80% of the correlation with the preferred reaction.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)
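The three classes and the two thresholds (0.25 for significance, 80% for separating parallel from converging) can be turned into a small decision function. How edge cases are handled is an assumption of this sketch: for example, a pair whose genes prefer different reactions but fail the 80% criterion is counted as converging here, and pairs without any significant correlation are labeled "unclassified".

```python
SIGNIFICANT = 0.25  # correlation threshold for significance
MAX_RATIO = 0.8     # alternative reaction must stay below 80% of the preferred one

def classify_isozyme_pair(corr_a, corr_b):
    """corr_a, corr_b: non-empty dicts connected reaction -> correlation with isozyme A / B."""
    best_a = max(corr_a, key=corr_a.get)  # preferred reaction of each gene
    best_b = max(corr_b, key=corr_b.get)
    sig_a = corr_a[best_a] > SIGNIFICANT
    sig_b = corr_b[best_b] > SIGNIFICANT
    if sig_a and sig_b:
        if best_a != best_b:
            # each gene must clearly prefer its own reaction over the other gene's
            clear_a = corr_a.get(best_b, 0.0) < MAX_RATIO * corr_a[best_a]
            clear_b = corr_b.get(best_a, 0.0) < MAX_RATIO * corr_b[best_b]
            if clear_a and clear_b:
                return "parallel"
        return "converging"
    if sig_a or sig_b:
        return "selective"
    return "unclassified"
```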

Page 103: V18 EcoCyc Analysis of E.coli Metabolism

Differential regulation of isozymes: interpretation

The primary role of isozyme multiplicity is to allow for differential regulation

of reactions that are shared by separate processes.

Dedicating a specific enzyme to each pathway may offer a way of independently

controlling the associated reaction in response to pathway-specific requirements, at

both the transcriptional and the post-transcriptional levels.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 104: V18 EcoCyc Analysis of E.coli Metabolism

Genes coexpressed with metabolic pathways

Identify the coregulated subparts of each metabolic pathway and the relevant

experimental conditions that induce or repress the expression of the pathway

genes.

In addition, use the signature algorithm to associate with each pathway further genes

that show similar expression profiles.

Input: set of genes, some of which are expected to be coregulated.

Output: coregulated part of the input and additional coregulated genes, together

with the set of conditions under which the coregulation is realized.

Numerous genes were found that are not directly involved in enzymatic steps:

- transporters

- transcription factors

Q: Which protein classes would you expect to be co-expressed with the proteins of a biochemical pathway?

A: Feeder pathways, transporters, transcription factors.

Page 105: V18 EcoCyc Analysis of E.coli Metabolism

Co-expression of transporters

Transporter genes are co-expressed with the relevant metabolic pathways, providing the pathways with their metabolites. Co-expression is marked in green.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 106: V18 EcoCyc Analysis of E.coli Metabolism

Co-regulation of transcription factors

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Transcription factors are often co-regulated with their regulated pathways. Shown

here are transcription factors which were found to be co-regulated in the analysis.

Co-regulation is shown by color-coding such that the transcription factor and the

associated pathways are of the same color.

Page 107: V18 EcoCyc Analysis of E.coli Metabolism

Hierarchical modularity in the metabolic network

So far, co-expression analysis has revealed a strong tendency toward coordinated

regulation of genes involved in individual metabolic pathways.

Does transcriptional regulation also define a higher-order metabolic organization, through

coordinated expression of distinct metabolic pathways?

This question is motivated by the observation that feeder pathways (which synthesize metabolites) are

frequently coexpressed with pathways using the synthesized metabolites.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 108: V18 EcoCyc Analysis of E.coli Metabolism

Feeder pathways/enzymes

Feeder pathways or genes co-expressed with the pathways they fuel. The feeder pathways (light blue) provide the main pathway (dark blue) with metabolites in order to assist the main pathway, indicating that co-expression extends beyond the level of individual pathways.

These results can be interpreted in the following way: the organism will produce those enzymes that are needed.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 109: V18 EcoCyc Analysis of E.coli Metabolism

Hierarchical modularity in the metabolic network

Derive a hierarchy by applying an iterative signature algorithm to the metabolic pathways, and decreasing the resolution parameter (coregulation stringency) in small steps.

Each box contains a group of coregulated genes (transcription module). Strongly associated genes (left) can be associated with a specific function, whereas moderately correlated modules (right) are larger and their function is less coherent.

The merging of 2 branches indicates that the associated modules are induced by similar conditions.

All pathways converge to one of 3 low-resolution modules: amino acid biosynthesis, protein synthesis, and stress.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 110: V18 EcoCyc Analysis of E.coli Metabolism

Hierarchical modularity in the metabolic network

Although amino acids serve as building blocks for proteins, the expression of genes

mediating these 2 processes is clearly uncoupled!

This may reflect the association of rapid cell growth (which triggers enhanced

protein synthesis) with rich growth conditions, where amino acids are readily

available and do not need to be synthesized.

Amino acid biosynthesis genes are only required when external amino acids are

scarce.

In support of this view, a group of amino acid transporters converged to the protein

synthesis module, together with other pathways required for rapid cell growth

(glucose fermentation, nucleotide synthesis and fatty acid synthesis).

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 111: V18 EcoCyc Analysis of E.coli Metabolism

Global network properties

Jeong et al. showed that the structural connectivity between metabolites imposes a

hierarchical organization of the metabolic network. That analysis was based on

connectivity between substrates, considering all potential connections.

Here, analysis is based on coexpression of enzymes.

In both approaches, related metabolic pathways were clustered together!

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

There are, however, some differences in the particular groupings (not discussed

here),

and importantly, when including expression data the connectivity pattern of

metabolites changes from a power-law dependence to an exponential one

corresponding to a network structure with a defined scale of connectivity.

This reflects the reduction in the complexity of the network.

Page 112: V18 EcoCyc Analysis of E.coli Metabolism

Summary

Transcriptional regulation is prominently involved in shaping the metabolic network of

S. cerevisiae.

1. Transcriptional regulation biases metabolic flow toward linearity.

2. Individual isozymes are often separately coregulated with distinct processes,

providing a means of reducing crosstalk between pathways using a common

reaction.

3. Transcriptional regulation entails a higher-order structure of the metabolic

network: metabolic pathways are organized hierarchically into groups of

decreasing expression coherence.

Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004)

Page 113: V18 EcoCyc Analysis of E.coli Metabolism

V24 Framework for computation of elementary modes

Page 114: V18 EcoCyc Analysis of E.coli Metabolism

Definition and benefits of Elementary Modes

Consider a metabolic network with m metabolites and q reactions.

Reactions may involve further metabolites that are not considered as proper

members of the system of study.

These are considered to be buffered and are called external metabolites, as opposed

to the m metabolites within the boundary of the system, which are called internal

metabolites.

The stoichiometry matrix N is an m × q matrix.

Its element $n_{ij}$ is the signed stoichiometric coefficient of metabolite i in reaction j,

with the following sign convention: negative for educts (substrates), positive for products.
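As a small illustration of the sign convention, the sketch below builds N from a list of reactions. The toy network and the data layout are hypothetical; external metabolites are simply skipped, reflecting the assumption that they are buffered.

```python
def stoichiometry_matrix(internal, reactions):
    """internal: ordered list of the m internal metabolites.
    reactions: list of (educts, products) dicts mapping metabolite -> coefficient.
    Returns N as a list of m rows with q entries each."""
    index = {m: i for i, m in enumerate(internal)}
    N = [[0] * len(reactions) for _ in internal]
    for j, (educts, products) in enumerate(reactions):
        for m, c in educts.items():
            if m in index:           # external metabolites are ignored (buffered)
                N[index[m]][j] -= c  # educts: negative sign
        for m, c in products.items():
            if m in index:
                N[index[m]][j] += c  # products: positive sign
    return N
```

For the toy network A_ext → A, A → 2 B, B → B_ext with internal metabolites A and B, this yields N = [[1, −1, 0], [0, 2, −1]].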

Page 115: V18 EcoCyc Analysis of E.coli Metabolism

Definition and benefits of Elementary Modes

Some reactions, called irreversible reactions, are thermodynamically feasible in

only one direction under the normal conditions of the system.

Therefore, the reaction indices are split into two sets:

Irrev (the set of irreversible reaction indices) and

Rev (the set of reversible reaction indices).

A flux vector (flux distribution), denoted v, is a vector of the reaction space $\mathbb{R}^q$,

in which each element $v_i$ describes the net rate of the ith reaction.

Sometimes we are interested only in the relative proportions of fluxes in a flux

vector.

In this sense, two flux vectors v and v' can be seen to be equivalent,

denoted by v ≃ v', if and only if there is some α > 0 such that v = α · v'.
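The equivalence v ≃ v' amounts to checking that both vectors have the same zero pattern and one common positive ratio on the non-zero entries. A minimal sketch, using exact rational arithmetic (an implementation choice that assumes integer or rational entries):

```python
from fractions import Fraction

def equivalent(v, w):
    """v ≃ w: v = alpha * w for some alpha > 0."""
    if len(v) != len(w):
        return False
    alpha = None
    for a, b in zip(v, w):
        if (a == 0) != (b == 0):
            return False  # different zero patterns can never be proportional
        if b != 0:
            ratio = Fraction(a) / Fraction(b)
            if ratio <= 0 or (alpha is not None and ratio != alpha):
                return False
            alpha = ratio
    return True  # includes the trivial case of two null vectors
```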

Page 116: V18 EcoCyc Analysis of E.coli Metabolism

Definition and benefits of Elementary Modes

Metabolism involves fast reactions and high turnover of substances compared to

events of gene regulation. Therefore, it is often assumed that metabolite

concentrations and reaction rates are equilibrated, thus constant, on the timescale of

study. The metabolic system is then considered to be in quasi steady state.

This assumption implies Nv = 0.

Thermodynamics requires the rate of each irreversible reaction to be nonnegative.

Consequently, the set of feasible flux vectors is restricted to

$$P = \{ v \in \mathbb{R}^q : Nv = 0 \ \text{and}\ v_i \geq 0 \ \forall i \in \mathrm{Irrev} \} \qquad (1)$$

P is a set of q-vectors that obey a finite set of homogeneous linear equalities and

inequalities, namely

- the |Irrev| inequalities defined by $v_i \geq 0$, $i \in \mathrm{Irrev}$, and

- the m equalities defined by Nv = 0.

P is therefore – by definition – a convex polyhedral cone.
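Checking whether a given flux vector lies in P is a direct transcription of (1). A minimal sketch, assuming N is given as a list of rows and `irrev` as a set of reaction indices; the tolerance `tol` only matters for floating-point input.

```python
def in_flux_cone(N, v, irrev, tol=1e-9):
    """Check v ∈ P: quasi steady state (N v = 0) and v_i >= 0 for all irreversible i."""
    steady = all(abs(sum(n_ij * v_j for n_ij, v_j in zip(row, v))) <= tol for row in N)
    feasible = all(v[i] >= -tol for i in irrev)
    return steady and feasible
```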

Q: Which conditions must valid fluxes in a metabolic network satisfy?

Page 117: V18 EcoCyc Analysis of E.coli Metabolism

Elementary Flux Modes

Metabolic pathway analysis serves to describe the (infinite) set P of feasible states by

providing a (finite) set of vectors that generate every vector of P and that

are of fundamental importance for the overall capabilities of the metabolic

system.

One such set is the set of elementary (flux) modes (EMs).

For a given flux vector v, we denote by $R(v) = \{ i : v_i \neq 0 \}$ the set of indices of the reactions

participating in v.

R(v) can be seen as the underlying pathway of v.

Page 118: V18 EcoCyc Analysis of E.coli Metabolism

Elementary Flux Modes

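By definition, a flux vector e is an elementary mode (EM) if and only if it fulfills the

following three conditions:

$$\begin{cases} Ne = 0 \\ e_i \geq 0 \quad \forall i \in \mathrm{Irrev} \\ e \neq 0 \ \text{and}\ \forall e' \in P \setminus \{0\}: \ R(e') \subseteq R(e) \Rightarrow e' \simeq e \end{cases} \qquad (2)$$

In other words, e is an EM if and only if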

(1) it works at quasi steady state,

(2) is thermodynamically feasible and

(3) there is no other non-null flux vector (up to a scaling) that both satisfies these

constraints and involves a proper subset of its participating reactions.

With this convention, reversible modes are here considered as two vectors of

opposite directions.

Q: Which properties must elementary modes satisfy?
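The third condition (minimality of the support) does not have to be checked by enumerating other flux vectors. A standard equivalent test is: a non-zero e ∈ P is elementary exactly when the columns of N belonging to R(e) have a one-dimensional null space. If that null space had dimension two or more, moving along a second null-space direction would zero out one entry of e while keeping the signs of the others, giving a feasible flux vector with smaller support. The sketch below implements this rank test with exact arithmetic; it is an illustration assuming integer or rational stoichiometries, not a method presented in the lecture.

```python
from fractions import Fraction

def rank(rows):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_elementary(N, e, irrev):
    """Rank test for an elementary mode e of the network with stoichiometry N."""
    support = [j for j, x in enumerate(e) if x != 0]  # R(e)
    if not support:
        return False
    if any(sum(Fraction(n) * x for n, x in zip(row, e)) != 0 for row in N):
        return False  # not at quasi steady state
    if any(e[i] < 0 for i in irrev):
        return False  # thermodynamically infeasible
    sub = [[row[j] for j in support] for row in N]  # columns of N in R(e)
    return len(support) - rank(sub) == 1  # null space of dimension exactly 1
```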

Page 119: V18 EcoCyc Analysis of E.coli Metabolism

Applications of elementary modes in metabolic studies

(1) Identification of pathways: The set of EMs comprises all admissible routes

through the network and thus all "pathways" in the classical sense, i.e. routes that

convert some educts into some products.

(2) Network flexibility: The number of EMs is at least a rough measure of the

network's flexibility (redundancy, structural robustness) to perform a certain function.

(3) Identification of all pathways with optimal yield: Consider the linear

optimization problem, where all flux vectors with optimal product yield are to be

identified, i.e. where the moles of products generated per mole of educts is maximal.

Then, one or several of the EMs reach this optimum and any optimal flux vector is a

convex combination of these optimal EMs.

(4) Importance of reactions: The importance or relevance of a reaction can be

assessed by its participation frequency and/or flux values in the EMs.

(4a) Inference of viability of mutants: If a reaction is involved in all growth-related

EMs, its deletion can be predicted to be lethal, since all growth-related EMs would disappear.
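Applications (4) and (4a) reduce to simple counting once the EMs are known. This sketch assumes the EMs are given as a list of flux vectors and that a caller-supplied predicate (hypothetical here) marks the growth-related ones, for example those with non-zero flux through a biomass reaction.

```python
def participation_frequency(modes):
    """Fraction of EMs in which each reaction carries non-zero flux."""
    q = len(modes[0])
    return [sum(1 for e in modes if e[j] != 0) / len(modes) for j in range(q)]

def predicted_lethal(modes, is_growth_related):
    """Reactions used by every growth-related EM: deleting one removes all of them."""
    growth = [e for e in modes if is_growth_related(e)]
    if not growth:
        return set()  # the network cannot grow in the first place
    return {j for j in range(len(growth[0])) if all(e[j] != 0 for e in growth)}
```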

Page 120: V18 EcoCyc Analysis of E.coli Metabolism

Applications of elementary modes in metabolic studies

(5) Reaction correlations: EMs can be used to analyze structural couplings

between reactions, which might give hints for underlying regulatory circuits.

An extreme case is an enzyme (or reaction) subset (set of reactions which can

operate only together) or a pair of mutually excluding reactions (two reactions never

occurring together in any EM).

(6) Detection of thermodynamically infeasible cycles: EMs representing internal

cycles (without participation of external material or energy sources) are infeasible by

the laws of thermodynamics and thus reflect structural inconsistencies.
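Applications (5) and (6) can likewise be read directly off an EM list: reactions that always occur in the same set of EMs form an enzyme subset, pairs that never co-occur are mutually exclusive, and EMs with no flux through any exchange reaction are internal cycles. Representing EMs as plain lists and passing the exchange reactions as an index set `exchange` are assumptions of this sketch.

```python
from itertools import combinations

def enzyme_subsets(modes):
    """Groups of reactions that occur in exactly the same EMs."""
    groups = {}
    for j in range(len(modes[0])):
        pattern = frozenset(k for k, e in enumerate(modes) if e[j] != 0)
        if pattern:  # reactions unused by every EM are skipped
            groups.setdefault(pattern, []).append(j)
    return [g for g in groups.values() if len(g) > 1]

def mutually_exclusive(modes):
    """Pairs of used reactions that never appear together in any EM."""
    used = [j for j in range(len(modes[0])) if any(e[j] != 0 for e in modes)]
    return [(a, b) for a, b in combinations(used, 2)
            if not any(e[a] != 0 and e[b] != 0 for e in modes)]

def internal_cycles(modes, exchange):
    """EMs without flux through any exchange reaction (thermodynamically infeasible)."""
    return [e for e in modes if all(e[j] == 0 for j in exchange)]
```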

Q: Why are internal closed cycles not admissible?

Q: How can EM analysis be used to identify lethal mutants

in a metabolic network?

Page 121: V18 EcoCyc Analysis of E.coli Metabolism

A unified framework: Elementary modes as extreme rays in networks of irreversible reactions

In the particular case of a metabolic system with only irreversible reactions, the set

of admissible flux vectors reads:

$$P = \{ v \in \mathbb{R}^q : Nv = 0 \ \text{and}\ v \geq 0 \} \qquad (3)$$

Compared with (1), P is in this case a particular, namely a pointed, polyhedral cone.

Figure: A pointed polyhedral cone. Dashed lines represent virtual cuts of unbounded areas.

Page 122: V18 EcoCyc Analysis of E.coli Metabolism

Pointed polyhedral cones

This geometry can be intuitively understood, noting that there are certainly

'enough' intersecting half-spaces (given by the inequalities v ≥ 0) to have this

'pointed' effect in 0:

P contains no line (otherwise both x and −x, with x ≠ 0, would lie in P, which

contradicts the constraint v ≥ 0).

The figure even suggests that a pointed polyhedral cone can be either defined in

an implicit way, by the set of constraints as we did until now, or in an explicit or

generative way, by its 'edges', the so-called extreme rays (or generating vectors)

that unambiguously define its boundaries.

In the following, we show that elementary modes always correspond to extreme

rays of a particular pointed cone as defined in (3) and that their computation

therefore amounts to the so-called extreme ray enumeration problem, i.e. the

problem of enumerating all extreme rays of a pointed polyhedral cone defined by

its constraints.

Q: When can a line belong to the solution space of valid fluxes?

Page 123: V18 EcoCyc Analysis of E.coli Metabolism

Pointed polyhedral cone – more precise

Definition: P is a pointed polyhedral cone of $\mathbb{R}^d$ if and only if P is defined by a

full-rank h × d matrix A (rank(A) = d) such that

$$P = P(A) = \{ x \in \mathbb{R}^d : Ax \geq 0 \}$$

Aside: the rank of a matrix is the dimension of the range of the matrix,

corresponding to the number of linearly independent rows or columns of the matrix.

The h rows of the matrix A represent h linear inequalities, whereas the full-rank

condition imposes the "pointed" effect in 0. Note that a pointed polyhedral cone is,

in general, not restricted to be located completely in the positive orthant as in (3).

For example, the cone considered in extreme-pathway analysis may have

negative parts (namely for exchange reactions), however, by using a particular

configuration it is ensured that the spanned cone is pointed.
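Pointedness of P(A) is a rank condition, which can be checked exactly. The sketch also builds a matrix that writes the irreversible-network cone {v : Nv = 0, v ≥ 0} in the form Ax ≥ 0, by stacking N, −N and the identity so that Nv = 0 becomes the pair Nv ≥ 0, −Nv ≥ 0. The helper names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_pointed(A):
    """P(A) = {x : A x >= 0} is pointed iff rank(A) equals the dimension d."""
    return rank(A) == len(A[0])

def irreversible_cone_matrix(N):
    """Stack N, -N and the q x q identity."""
    q = len(N[0])
    identity = [[int(i == j) for j in range(q)] for i in range(q)]
    return [list(row) for row in N] + [[-x for x in row] for row in N] + identity
```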

Page 124: V18 EcoCyc Analysis of E.coli Metabolism

Extreme rays

A vector r is said to be a ray of P(A) if r ≠ 0 and for all α > 0, α · r ∈ P(A).

We identify two rays r and r' if there is some α > 0 such that r = α · r', and we

denote this by r ≃ r', analogously to flux vectors above.

For any vector x in P(A), the zero set or active set Z(x) is the set of inequality

indices satisfied by x with equality. Denoting by $A_{i\bullet}$ the ith row of A, $Z(x) = \{ i : A_{i\bullet} x = 0 \}$.

Zero sets can be used to characterize extreme rays.

Q: What is the relationship between Z(x), the zero set, and R(x), the reaction set?

A: Z(x) is the complement of R(x).

Page 125: V18 EcoCyc Analysis of E.coli Metabolism

Extreme rays - definition

Definition 1: Extreme ray

Let r be a ray of the pointed polyhedral cone P(A). The following statements are

equivalent:

(a) r is an extreme ray of P(A)

(b) if r' is a ray of P(A) with Z(r) ⊆ Z(r'), then r' ≃ r

Since A is full rank, 0 is the unique vector that satisfies all constraints with equality.

The extreme rays are those rays of P(A) that satisfy a maximal set of constraints, but not all

constraints, with equality. This is expressed in (b) by requiring that no other ray in

P(A) satisfies the same constraints plus additional ones with equality. Note that in

(b) Z(r) = Z(r') consequently holds.

An important property of the extreme rays is that they form a finite set of generating

vectors of the pointed cone: any vector of P(A) can be expressed as a non-negative

linear combination of extreme rays, and the converse is true: all non-negative

combinations of extreme rays lie in P(A). The set of extreme rays is the unique

minimal set of generating vectors of a pointed cone P(A) (up to positive scalings).
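Definition 1 suggests a practical test, the standard algebraic characterization of extreme rays: r is an extreme ray of a pointed cone P(A) ⊆ ℝ^d exactly when r ∈ P(A), r ≠ 0, and the rows of A indexed by the zero set Z(r) have rank d − 1. A minimal sketch with exact arithmetic (helper names are hypothetical):

```python
from fractions import Fraction

def rank(rows):
    """Exact matrix rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def row_value(row, x):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(row, x))

def zero_set(A, x):
    """Z(x): indices of the constraints A_i. x >= 0 that x satisfies with equality."""
    return {i for i, row in enumerate(A) if row_value(row, x) == 0}

def is_extreme_ray(A, r):
    d = len(A[0])
    if all(x == 0 for x in r) or any(row_value(row, r) < 0 for row in A):
        return False  # not a ray of P(A)
    active = [A[i] for i in sorted(zero_set(A, r))]
    return rank(active) == d - 1
```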

Page 126: V18 EcoCyc Analysis of E.coli Metabolism

Elementary modes

Lemma 1: EMs in networks of irreversible reactions

In a metabolic system where all reactions are irreversible, the EMs are exactly the

extreme rays of $P = \{ v \in \mathbb{R}^q : Nv = 0 \ \text{and}\ v \geq 0 \}$.

Page 127: V18 EcoCyc Analysis of E.coli Metabolism

Notations

We denote the original reaction network by S and the reconfigured network (with all

reversible reactions split up) by S'.

The reactions of S are indexed from 1 to q.

Irrev denotes the set of irreversible reaction indices and Rev the reversible ones.

An irreversible reaction indexed i gives rise to a reaction of S' indexed i.

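A reversible reaction indexed i gives rise to two opposite reactions of S' indexed by

the pairs (i,+1) and (i,-1) for the forward and the backward direction, respectively.

The reconfiguration of a flux vector $v \in \mathbb{R}^q$ of S is a flux vector $v' \in \mathbb{R}^{\mathrm{Irrev} \cup \mathrm{Rev} \times \{-1,+1\}}$

of S' such that

$$v'_i = v_i \ (i \in \mathrm{Irrev}), \qquad v'_{(i,+1)} = \max(v_i, 0), \quad v'_{(i,-1)} = \max(-v_i, 0) \ (i \in \mathrm{Rev})$$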

Page 128: V18 EcoCyc Analysis of E.coli Metabolism

Notations

Let N' be the stoichiometry matrix of S'. N' can be written as $N' = [\, N \;\; -N_{\mathrm{Rev}} \,]$, where

$N_{\mathrm{Rev}}$ consists of all columns of N corresponding to reversible reactions.

Note that if v is a flux vector of S and v' is its reconfiguration, then Nv = N'v'.

If possible, i.e. if $v' \in \mathbb{R}^{\mathrm{Irrev} \cup \mathrm{Rev} \times \{-1,+1\}}$ is such that for any reversible reaction index

$i \in \mathrm{Rev}$ at least one of the two coefficients $v'_{(i,+1)}$ or $v'_{(i,-1)}$ equals zero, then we define

the reverse operation, called back-configuration, that maps v' back to a flux vector v

such that:

$$v_i = v'_i \ (i \in \mathrm{Irrev}), \qquad v_i = v'_{(i,+1)} - v'_{(i,-1)} \ (i \in \mathrm{Rev})$$
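The reconfiguration and back-configuration maps can be sketched as follows. Indexing v' by either i or a pair (i, ±1) mirrors the notation above; storing it as a Python dict is an implementation choice, and raising an error when a forward and a backward reaction are both active (where back-configuration is undefined) is an assumption of this sketch.

```python
def reconfigure(v, rev):
    """Map a flux vector v of S to v' of S' (reversible reactions split in two)."""
    vp = {}
    for i, x in enumerate(v):
        if i in rev:
            vp[(i, +1)] = max(x, 0)   # forward part
            vp[(i, -1)] = max(-x, 0)  # backward part
        else:
            vp[i] = x
    return vp

def back_configure(vp, q, rev):
    """Map v' back to a flux vector of S; undefined if both directions are active."""
    v = []
    for i in range(q):
        if i in rev:
            fwd, bwd = vp[(i, +1)], vp[(i, -1)]
            if fwd != 0 and bwd != 0:
                raise ValueError(f"reaction {i}: forward and backward both active")
            v.append(fwd - bwd)
        else:
            v.append(vp[i])
    return v
```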

Page 129: V18 EcoCyc Analysis of E.coli Metabolism

Theorem 1: EMs in original and in reconfigured networks

Theorem 1

Let S be a metabolic system and S' its reconfiguration by splitting up reversible

reactions. Then the set of EMs of S' is the union of

a) the set of reconfigured EMs of S

b) the set of two-cycles made of a forward and a backward reaction of S'

derived from the same reversible reaction of S

Page 130: V18 EcoCyc Analysis of E.coli Metabolism

V25 Framework for computation of elementary modes II

Page 131: V18 EcoCyc Analysis of E.coli Metabolism

Elementary modes

Lemma 1: EMs in networks of irreversible reactions

In a metabolic system where all reactions are irreversible, the EMs are exactly the

extreme rays of $P = \{ v \in \mathbb{R}^q : Nv = 0 \ \text{and}\ v \geq 0 \}$.

Proof: P is the solution set of the linear inequalities defined by

$$A v \geq 0, \qquad A = \begin{pmatrix} N \\ -N \\ I \end{pmatrix},$$

where I is the q × q identity matrix.

Since it contains I, A is full rank and therefore P is a pointed polyhedral cone.

All $v \in P$ obey Nv = 0; thus the first 2m inequalities defined by A hold with equality

for all vectors in P and the inclusion condition of Definition 1 can be restricted to the

last q inequalities, i.e. the inequalities corresponding to the reactions.

Inclusion over the zero set can be equivalently seen as containment over the set of

non-zeros in v, i.e. R(v). Consequently, $e \in P$ is an extreme ray of P if and only if

for all $e' \in P$: $R(e') \subseteq R(e) \Rightarrow e' = 0$ or $e' \simeq e$, i.e. if and only if e is elementary.

Thus, all three conditions in (2) are fulfilled.

Page 132: V18 EcoCyc Analysis of E.coli Metabolism

Strategy of the Double Description Method

Iteratively build a minimal double description (DD) pair $(A_k, R_k)$ from a minimal DD pair $(A_{k-1}, R_{k-1})$,

where $A_k$ is a submatrix of A made of k rows of A.

At each step the columns of $R_k$ are the extreme rays of $P(A_k)$, the convex

polyhedral cone defined by the linear inequalities $A_k x \geq 0$. The incremental step introduces a

constraint of A that is not yet satisfied by all computed extreme rays. Some extreme

rays are kept, some are discarded and new ones are generated. The generation of

new extreme rays relies on the notion of adjacent extreme rays.

Definition 2: Adjacent extreme rays

Let r and r' be distinct rays of the pointed polyhedral cone P(A). Then the following statements are equivalent:

(a) r and r' are adjacent extreme rays

(b) if $r''$ is a ray of P(A) with $Z(r) \cap Z(r') \subseteq Z(r'')$, then either $r'' \simeq r$ or $r'' \simeq r'$
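Condition (b) leads to the purely combinatorial adjacency test commonly used in double description implementations. The sketch below assumes that `rays` is the complete, minimal list of extreme rays of P(A); in that case it suffices to check condition (b) against the other extreme rays rather than against all rays. The helper names `zero_set` and `adjacent` are hypothetical.

```python
def zero_set(A, r):
    """Z(r): indices of the rows of A that r satisfies with equality."""
    return frozenset(
        i for i, row in enumerate(A) if sum(a * x for a, x in zip(row, r)) == 0
    )


def adjacent(A, rays, j, jp):
    """Combinatorial test for two distinct extreme rays rays[j] and rays[jp]:
    they are adjacent iff no third extreme ray r'' has Z(r'') containing Z(r) & Z(r')."""
    common = zero_set(A, rays[j]) & zero_set(A, rays[jp])
    return not any(
        common <= zero_set(A, r)
        for k, r in enumerate(rays)
        if k not in (j, jp)
    )
```

For the cone over a square, {x : x3 ± x1 ≥ 0, x3 ± x2 ≥ 0}, rays listed around the square are adjacent to their neighbours, while a diagonal pair such as (1, 1, 1) and (-1, -1, 1) is not: their zero sets do not intersect, so every other ray passes the superset check.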


Initialization

The initialization of the double description method must be done with a minimal DD pair.

One possibility is the following.

Since P is pointed, A has full rank and contains a nonsingular submatrix of order d (the number of columns of A), denoted by $A_d$.

Insert: a square matrix A has an inverse $A^{-1}$ (so that $A A^{-1} = I$) if and only if its determinant $|A| \neq 0$.

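Hence, $A_d^{-1}$ can be constructed, and $(A_d, A_d^{-1})$ is a minimal DD pair: $A_d x \geq 0$ holds exactly when $x = A_d^{-1} y$ with $y \geq 0$, so the columns of $A_d^{-1}$ are the extreme rays of $P(A_d)$. This pair works as initialization and leads directly to step k = d.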

Note: there is some freedom in choosing a submatrix $A_d$ or some alternative starting minimal DD pair.
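A minimal sketch of this initialization, assuming the caller already knows which d rows of A form a nonsingular $A_d$ (`inverse` and `initial_dd_pair` are hypothetical helpers). Exact rational arithmetic with `fractions.Fraction` avoids rounding in the inverse, and Gauss-Jordan elimination fails precisely when $|A_d| = 0$.

```python
from fractions import Fraction


def inverse(M):
    """Gauss-Jordan inverse over the rationals; raises ValueError if det(M) = 0."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix: determinant is zero")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [x / p for x in aug[c]]          # scale pivot row to 1
        for i in range(n):
            if i != c and aug[i][c] != 0:         # clear column c elsewhere
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


def initial_dd_pair(A, rows):
    """Given indices of d rows forming a nonsingular A_d, return A_d and the
    columns of A_d^{-1}, i.e. the extreme rays of P(A_d)."""
    A_d = [A[i] for i in rows]
    R_d = inverse(A_d)
    d = len(R_d)
    rays = [[R_d[i][j] for i in range(d)] for j in range(d)]  # columns of R_d
    return A_d, rays
```

Each returned ray $r_j$ satisfies $A_d r_j = e_j$, i.e. it meets all d constraints, with equality on every row except the j-th.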


Incremental step

Assume $(A_{k-1}, R_{k-1})$ is a minimal DD pair and consider a kth constraint defined by a not yet extracted row of A, denoted $A_{i\bullet}$.

Let J be the set of column indices of $R_{k-1}$ and $r_j$, $j \in J$, its column vectors, i.e. the extreme rays of $P(A_{k-1})$, the polyhedral cone of the previous iteration.

$A_{i\bullet}$ splits J into three parts (see Figure) according to whether $r_j$ satisfies the constraint with strict inequality (positive ray), with equality (zero ray), or does not satisfy it (negative ray):

$$
\begin{aligned}
J^+ &= \{ j \in J : A_{i\bullet} r_j > 0 \} \\
J^0 &= \{ j \in J : A_{i\bullet} r_j = 0 \} \\
J^- &= \{ j \in J : A_{i\bullet} r_j < 0 \}
\end{aligned}
\qquad (6)
$$

Figure: Double description incremental step. The scene is best visualized with a polytope; consider the cube pictured here as a 3D projection of a 4D polyhedral cone. Extreme rays from the previous iteration are {a,b,c,d,e,f,g,h}, whose adjacencies are represented by edges. For the considered constraint, whose null space is the hyperplane depicted by the bold black border lines, b and f are positive rays, a and c are zero rays, and d, e, g and h are negative rays. b, f, a and c satisfy the constraint and are kept for the next iteration. {f,e} and {f,g} are the only two pairs of adjacent positive/negative rays, and only they give rise to new rays: i and j, at the intersection of the hyperplane with the respective edges. The new polytope is then defined by its extreme rays: {a,b,c,f,i,j}.
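The partition (6), and the construction of a new ray such as i or j in the figure, can be sketched as follows. The helper names are hypothetical, and integer or rational entries are assumed so that the sign tests are exact. The new ray is the non-negative combination of a positive and a negative ray that lies on the hyperplane $A_{i\bullet} x = 0$.

```python
def dot(a, r):
    return sum(x * y for x, y in zip(a, r))


def split_rays(a, rays):
    """Partition (6): indices of positive, zero and negative rays w.r.t. row a."""
    J_pos = [j for j, r in enumerate(rays) if dot(a, r) > 0]
    J_zero = [j for j, r in enumerate(rays) if dot(a, r) == 0]
    J_neg = [j for j, r in enumerate(rays) if dot(a, r) < 0]
    return J_pos, J_zero, J_neg


def ray_on_hyperplane(a, r_pos, r_neg):
    """Intersection of the edge between a positive and a negative ray with a.x = 0:
    (a.r_pos) * r_neg - (a.r_neg) * r_pos, both coefficients being positive."""
    return [dot(a, r_pos) * y - dot(a, r_neg) * x for x, y in zip(r_pos, r_neg)]
```

Whether a given positive/negative pair may be combined at all is decided by adjacency, as in the figure where only {f,e} and {f,g} produce new rays.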


Minimality of $R_k$

Minimality of $R_k$ is ensured by keeping all positive rays and all zero rays, and adding the new rays obtained as combinations of a positive and a negative ray that are adjacent to each other.

For convenience, we denote by Adj the index set of the newly generated rays, in which every new ray is indexed by the pair of indices of the two adjacent rays it combines.

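Hence, $R_k$ is defined as the set of column vectors $r_j$, $j \in J'$, with

$$
\begin{aligned}
J' &= J^+ \cup J^0 \cup \mathrm{Adj}, \\
\mathrm{Adj} &= \{ (j, j') \in J^+ \times J^- : r_j \text{ and } r_{j'} \text{ are adjacent} \}, \\
r_{jj'} &= (A_{i\bullet} r_j)\, r_{j'} - (A_{i\bullet} r_{j'})\, r_j .
\end{aligned}
$$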

The incremental step is repeated until k = h, i.e. until all h rows of the matrix A have been treated.

The columns of the final matrix $R_h$ are then the extreme rays of P(A).
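Putting the pieces together, here is a sketch of the complete iteration. It assumes integer rows and rays (so rays can be normalized by the gcd of their entries) and an already known minimal starting pair, such as the one from the initialization step. Adjacency is decided with the combinatorial test, restricted to the current extreme rays. `dd_step` and `double_description` are hypothetical names, not an existing library API.

```python
from functools import reduce
from math import gcd


def dot(a, r):
    return sum(x * y for x, y in zip(a, r))


def normalize(r):
    """Divide an integer ray by the gcd of its entries (rays are defined up to scaling)."""
    g = reduce(gcd, (abs(x) for x in r), 0)
    return tuple(x // g for x in r) if g else tuple(r)


def dd_step(rows, rays, a):
    """One incremental step: rows = A_{k-1}, rays = extreme rays of P(A_{k-1}), a = new row."""
    vals = [dot(a, r) for r in rays]
    J_pos = [j for j, s in enumerate(vals) if s > 0]
    J_zero = [j for j, s in enumerate(vals) if s == 0]
    J_neg = [j for j, s in enumerate(vals) if s < 0]
    # zero sets Z(r_j) with respect to the constraints processed so far
    Z = [frozenset(i for i, row in enumerate(rows) if dot(row, r) == 0) for r in rays]
    new = []
    for j in J_pos:
        for jn in J_neg:
            common = Z[j] & Z[jn]
            # adjacency test: no third ray may have a zero set containing `common`
            if any(common <= Z[k] for k in range(len(rays)) if k not in (j, jn)):
                continue
            # r = (a.r_j) r_jn - (a.r_jn) r_j lies on the hyperplane a.x = 0
            new.append(normalize([vals[j] * y - vals[jn] * x
                                  for x, y in zip(rays[j], rays[jn])]))
    return [rays[j] for j in J_pos + J_zero] + new


def double_description(init_rows, init_rays, remaining_rows):
    """Start from a minimal DD pair and add the remaining rows of A one at a time."""
    rows = [list(r) for r in init_rows]
    rays = [normalize(r) for r in init_rays]
    for a in remaining_rows:
        rays = dd_step(rows, rays, a)
        rows.append(list(a))
    return sorted(rays)
```

For the flux cone of the one-metabolite network N = [[1, -1, -1]], starting from $A_d = I$ (whose inverse is I itself) and adding the rows N and -N returns the two extreme rays (1, 0, 1) and (1, 1, 0).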


Computing EMs

The Double Description Method together with Theorem 1 offers a framework for computing EMs. The only steps to add are a reconfiguration step that splits reversible reactions and builds the matrix A, and a post-processing step that removes futile two-cycles and computes the back-configuration.
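The reconfiguration and post-processing steps can be sketched as below. The column layout is an assumption: each reversible reaction keeps its column as the forward direction, and the backward direction is appended as the negated column, giving q + |Rev| columns. The extreme rays in the usage check were worked out by hand for a hypothetical three-reaction toy network; in practice they would come from the double description method applied to A'. The function names are hypothetical.

```python
def reconfigure(N, rev):
    """Split each reversible reaction j in rev: column j stays as the forward
    direction, -N[:, j] is appended as the backward direction.
    Returns N' and A' = [N'; -N'; I] as in Lemma 1."""
    q2 = len(N[0]) + len(rev)
    N2 = [row + [-row[j] for j in rev] for row in N]
    identity = [[1 if i == j else 0 for j in range(q2)] for i in range(q2)]
    A2 = N2 + [[-x for x in row] for row in N2] + identity
    return N2, A2


def postprocess(rays, q, rev):
    """Drop futile two-cycles (only the forward and backward copy of one reaction
    are active) and back-configure v_j = v'_forward - v'_backward."""
    back = {j: q + k for k, j in enumerate(rev)}  # reversible reaction -> backward column
    ems = []
    for r in rays:
        active = {i for i, x in enumerate(r) if x != 0}
        if any(active == {j, b} for j, b in back.items()):
            continue  # futile two-cycle of reaction j
        v = list(r[:q])
        for j, b in back.items():
            v[j] -= r[b]
        ems.append(tuple(v))
    return ems
```

For N = [[1, -1, -1]] with the third reaction reversible, the ray (0, 0, 1, 1) is its futile two-cycle and is dropped, and the back-configured EMs are (1, 1, 0), (1, 0, 1) and (0, 1, -1).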

The dimension of the space is given by the number of reactions in the reconfigured network, $q' = q + |\mathrm{Rev}|$, where Rev is the set of reversible reactions. This results in the following algorithmic scheme: