randomizing genome-scale metabolic networks
DESCRIPTION
Talk presented at the meeting:Metabolic pathway analysis 201119—23 September 2011University of Chester, UKTRANSCRIPT
Areejit Samal – MPA’11 Chester
Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ Paris-Sud, Orsay, France
and
Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Areejit Samal
Randomizing genome-scale metabolic networks
Areejit Samal – MPA’11 Chester
Architectural features of real-world networksArchitectural features of real-world networks
Small Average Path Length & High Local Clustering
Watts and Strogatz (1998)
Small-World
Ravasz et al (2002)
Power law degree distribution & Hierarchical modularity
Barabasi and Albert (1999)
Scale-free and Hierarchical Organization
Scale-free graphs can be generated using a simple model incorporating: Growth & Preferential attachment scheme
Structure of re
al-world
networks is very different fr
om random graphs
Structure of re
al-world
networks is very different fr
om random graphs
Areejit Samal – MPA’11 Chester
Shen-Orr et al (2002); Milo et al (2002,2004)
Network motifs and evolved designNetwork motifs and evolved design
Sub-graphs that are over-represented in real networks compared to randomized networks with same degree sequence.
Certain network motifs were shown to perform important information processing tasks.
For example, the coherent Feed Forward Loop (FFL) can be used to detect persistent signals.
The preponderance of certain sub-graphs in a real network may
reflect evolutionary design principles favored by selection.
Areejit Samal – MPA’11 Chester
Chosen Property
0.60 0.62 0.64 0.66 0.68 0.70
Fre
qu
ency
0
50
100
150
200
250
300
““Null hypothesis” based approach for identifying design Null hypothesis” based approach for identifying design principles in real-world networksprinciples in real-world networks
The procedure to deduce statistically significant network properties is:
Measure the ‘chosen property’ (E.g., motif significance profile, average clustering coefficient, etc.) in the investigated real network.
Generate randomized networks with structure similar to the investigated real network using an appropriate null model.
Use the distribution of the ‘chosen property’ for the randomized networks to estimate a p-value.
Investigated network
p-value
Areejit Samal – MPA’11 Chester
Edge-exchange algorithm: widely used null modelEdge-exchange algorithm: widely used null model
Randomized networks preserve the degree at
each node and the degree sequence.
Investigated Network
After 1 exchange
After 2 exchanges
Reference: Shen-Orr et al (2002); Milo et al (2002,2004); Maslov & Sneppen (2002)
…
Areejit Samal – MPA’11 Chester
Artzy-Randrup et al Science (2004)
Carefully posed null models are required for deducing Carefully posed null models are required for deducing design principles with confidencedesign principles with confidence
In C. elegans, close by neurons have a greater chance of forming a connection than distant neurons.
A null model incorporating this spatial constraint gives different results compared to generic edge-exchange algorithm.
Misinterpretation can arise due to use of an incomplete null model (e.g. ignoring important constraints) !!
A common generic null model (edge exchange algorithm) is unsuitable for different types of real-world networks.
Areejit Samal – MPA’11 Chester
Edge-exchange algorithm: case of bipartite metabolic Edge-exchange algorithm: case of bipartite metabolic networksnetworks
ASPT CITL
fum
asp-L cit
ac oaanh4
ASPT* CITL*
fum
asp-L cit
ac oaanh4
ASPT: asp-L → fum + nh4CITL: cit → oaa + ac
ASPT*: asp-L → ac + nh4 CITL*: cit → oaa + fum
Preserves degree of each node in the networkBut generates fictitious reactions that violate mass, charge and atomic balance satisfied by real chemical reactions!!
Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown.
Example: Guimera & Amaral (2005); Samal et al (2006)
Biochemica
lly m
eaningless ra
ndomizatio
n inappropria
te for
Biochemica
lly m
eaningless ra
ndomizatio
n inappropria
te for
metabolic netw
orks
metabolic netw
orks
Areejit Samal – MPA’11 Chester
Null models for metabolic networks: Blind-watchmaker network Null models for metabolic networks: Blind-watchmaker network
Blind to
basic co
nstraints
satis
fied by b
iochemica
l reacti
ons!!
Blind to
basic co
nstraints
satis
fied by b
iochemica
l reacti
ons!!
Areejit Samal – MPA’11 Chester
Motivation of this workMotivation of this work
We propose a new Markov Chain Monte Carlo (MCMC) based method to generate meaningful randomized ensembles for metabolic networks by successively imposing biochemical and functional constraints.
Ensemble RGiven number of valid biochemical reactions
Ensemble RmAdditionally, fix the No. of metabolites
Ensemble uRmAdditionally, exclude
blocked reactions
Ensemble uRm-V1Additionally, functional constraint of growth on a specified environments
Increasin
g level o
f con
straints
Constraint I
Constraint II
Constraint III
Constraint IV
Areejit Samal – MPA’11 Chester
Constraint I: Given number of valid biochemical Constraint I: Given number of valid biochemical reactionsreactions
Instead of generating fictitious reactions using edge exchange mechanism, we can restrict to >6000 valid reactions contained within KEGG database.
We use the database of 5870 mass balanced reactions derived from KEGG by Rodrigues and Wagner.
Areejit Samal – MPA’11 Chester
Global Reaction SetGlobal Reaction Set
+
Biomass composition
The E. coli biomass composition of iJR904 was used to determine viability of genotypes.
Nutrient metabolites
The set of external metabolites in iJR904 were allowed for uptake/excretion in the global reaction set.
KEGG Database + E. coli iJR904
We can implement Flux Balance Analysis (FBA)
within this database.
Reference: Rodrigues and Wagner (2009)
Areejit Samal – MPA’11 Chester
Comparison with Comparison with E. coliE. coli: Structural Properties of Interest: Structural Properties of Interest
• Metabolite Degree distribution
• Clustering Coefficient • Average Path Length
• Pc: Probability that a path exists between two nodes in the graph
• Largest strongly component (LSC) and the union of LSC, IN and OUT components
E. coli metabolic network has m (~750) metabolites and r (~900) reactions.
In this work, we will generate randomized metabolic networks using only reactions in KEGG and compare the structural properties of randomized networks with those of E. coli.
We study the following large-scale structural properties:
Bow-tie architecture of the Internet and metabolism
Ref: Broder et al; Ma and Zeng, Bioinformatics (2003)
Scale Free and Small World Scale Free and Small World Ref: Jeong et al (2000) Wagner & Fell (2001)Ref: Jeong et al (2000) Wagner & Fell (2001)
HEX1
PGI
atp
adph
PFK
g6p
glc-D
f6p
fdp
glc-D
g6p
f6p
fdp
HEX1: atp + glc-D → adp + g6p + hPGI: g6p ↔ f6pPFK: atp + f6p → adp + fdp + h
CurrencyMetabolite
OtherMetabolite
ReversibleReaction
IrreversibleReaction
(a) Bipartite Representation (b) Unipartite Representation
Degree Distribution
Clustering coefficientAverage Path Length
Largest Strong Component
Dif
fere
nt
Gra
ph
-th
eore
tic
Rep
rese
nta
tio
ns
of
the
Dif
fere
nt
Gra
ph
-th
eore
tic
Rep
rese
nta
tio
ns
of
the
Met
abo
lic
Net
wo
rkM
etab
oli
c N
etw
ork
Areejit Samal – MPA’11 Chester
Ensemble Ensemble RR: Random networks within KEGG with same number of : Random networks within KEGG with same number of reactions as reactions as E. coliE. coli
6000900C
KEGG Database E. coli
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
101....
The E. coli metabolic network or any random network of reactions within KEGG can be represented as a bit string of length N.
Present - 1
Absent - 0
N = 5870 reactions within KEGG
Bit string representation of metabolic networks
Ensemble R of random networks
To compare with E. coli, we generate an ensemble R of random networks: Pick random subsets with exactly r (~900) reactions within KEGG database of N (~6000) reactions. r is the no. of reactions in E. coli. Huge no. of networks possible within KEGG with exactly r reactions ~ Fixing the no. of reactions is equivalent to fixing genome size.
Contains r reactions (1’s in the bit string)
6000900C
KEGG with N reactions
Random networks with r reactions
Areejit Samal – MPA’11 Chester
Metabolite degree distributionMetabolite degree distribution
k
1 10 100 1000
P(k
)
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coli
Complete KEGG: = 2.31
k
1 10 100 1000
P(k
)
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coliRandom
E. coli: = 2.17; Most organisms: is 2.1-2.3
Reference: Jeong et al (2000);
Wagner and Fell (2001)
Random networks in ensemble R: = 2.51
Any (large enough) random subset of reactions in KEGG has Power Law degree distribution !!
k
1 10 100 1000
P(k
)
10-8
10-7
10-6
10-5
10-4
10-3
10-2
10-1
100
E. coliRandomKEGG
A degree of a metabolite is the number of reactions in which the metabolite participates in the network.
Areejit Samal – MPA’11 Chester
Reaction Degree DistributionReaction Degree Distribution
Number of substrates in a reaction
0 2 4 6 8 10 12
Fre
qu
ency
0
500
1000
1500
2000
2500
30000 Currency1 Currency2 Currency3 Currency4 Currency5 Currency6 Currency7 Currency
Tanaka and Doyle have suggested a classification of metabolites into three categories based on their biochemical roles: (a) ‘Carriers’ (very high degree)(b) ‘Precursors’ (intermediate degree)(c) ‘Others’ (low degree)Reference: Tanaka (2005); Tanaka, Csete, and Doyle (2008)
The degree of a reaction is the number of metabolites that participate in it.
The presence
of ubiquito
us (high degree) ‘c
urrency
’ metabolite
s
The presence
of ubiquito
us (high degree) ‘c
urrency
’ metabolite
s
along with
low degree ‘o
ther’ metabolite
s with
in a given re
action
along with
low degree ‘o
ther’ metabolite
s with
in a given re
action
explains t
he observe
d power laws i
n metabolis
m.
explains t
he observe
d power laws i
n metabolis
m.
Areejit Samal – MPA’11 Chester
Network Properties: Given number of valid biochemical Network Properties: Given number of valid biochemical reactions reactions
No. of m
etabolites in
E. coli < No. o
f metabolite
s in any ra
ndom network
Areejit Samal – MPA’11 Chester
Constraint II: Fix the number of metabolites Constraint II: Fix the number of metabolites
Direct sampling of randomized networks with same no. of reactions and metabolites as in E. coli is not possible.
Simple MCMC algorithm to sample networks within KEGG which have same no. of reactions and metabolites as that in E. coli.
Accept/Reject Criterion of reaction swap: Accept if the no. of metabolites in the new network is less than or equal to E. coli.
KEGG Database
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
10101..
N = 5870 reactions within KEGG reaction universe
11001..
10110..
11100..
00101..
E. coli
Accept
Reject
Step 1
Swap Swap
Reject
106 Swaps
Step 2
103 Swaps
store storeReaction swap keeps the number of reactions in network constant
Erase memory of
initial network
Saving Frequency
Ensemble Rm
Areejit Samal – MPA’11 Chester
Network Properties: After fixing number of metabolitesNetwork Properties: After fixing number of metabolites
Randomized network properties become closer to
E. coli
Areejit Samal – MPA’11 Chester
Constraint III: Exclude Blocked ReactionsConstraint III: Exclude Blocked Reactions
KEGG
Unblocked KEGG
Large-scale metabolic networks typically have certain dead-end or blocked reactions which cannot contribute to growth of the organism.
Blocked reactions have zero flux for every investigated chemical environment, under any steady-state condition with nonzero biomass growth flux.
We next exclude blocked reactions from our biochemical universe used for sampling randomized networks.
Areejit Samal – MPA’11 Chester
MCMC Sampling: Exclusion of blocked reactions MCMC Sampling: Exclusion of blocked reactions
Restrict the reaction swaps to the set of unblocked reactions within KEGG.
Accept/Reject Criterion of reaction swap: Accept if the no. of metabolites in the new network is less than or equal to E. coli.
KEGG Database
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
10101..
N = 5870 reactions within KEGG reaction universe
11001..
10110..
11100..
00101..
E. coli
Accept
Reject
Step 1
Swap Swap
Reject
106 Swaps
Step 2
103 Swaps
store storeReaction swap keeps the number of reactions in network constant
Erase memory of
initial network
Saving Frequency
Ensemble uRm
Areejit Samal – MPA’11 Chester
Network Properties: After excluding blocked reactionsNetwork Properties: After excluding blocked reactions
Randomized network properties become still
closer to E. coli
Areejit Samal – MPA’11 Chester
Constraint IV: Growth PhenotypeConstraint IV: Growth Phenotype
011....
001....
GA
GB
x
Deletedreaction
FluxBalanceAnalysis
(FBA)
Viable genotype
Unviable genotype
Bit string and equivalent network representation of genotypes
Genotypes Ability to synthesize Viability biomass components
Growth
No Growth
Definition of Phenotype
For FBA Kauffman et al (2003) for a review
Areejit Samal – MPA’11 Chester
MCMC Sampling: Growth phenotype in one environment MCMC Sampling: Growth phenotype in one environment
Restrict the reaction swaps to the set of unblocked reactions within KEGG.
Accept/Reject Criterion of reaction swap: Accept if (a) the no. of metabolites in the new network is less than or equal to E. coli(b) the new network is able to grow on the specified environment
KEGG Database
R1: 3pg + nad → 3php + h + nadhR2: 6pgc + nadp → co2 + nadph + ru5p-DR3: 6pgc → 2ddg6p + h2o...RN: ---------------------------------
10101..
N = 5870 reactions within KEGG reaction universe
11001..
10110..
11100..
00101..
E. coli
Accept
Reject
Step 1
Swap Swap
Reject
106 Swaps
Step 2
103 Swaps
store storeReaction swap keeps the number of reactions in network constant
Erase memory of
initial network
Saving Frequency
Ensemble uRm-V1
Areejit Samal – MPA’11 Chester
Network Properties: Growth phenotype in one Network Properties: Growth phenotype in one environmentenvironment
Further im
provement obtained by im
posing phenotypic
constraint o
n randomized networks
Areejit Samal – MPA’11 Chester
Network Properties: Growth phenotype in multiple Network Properties: Growth phenotype in multiple environmentsenvironments
Still fu
rther im
provement obtained by in
creasing phenotypic
constraints on ra
ndomized networks
Areejit Samal – MPA’11 Chester
Global structural properties of real metabolic networks: Consequence of simple biochemical and functional constraints
Incr
easi
ng
lev
el o
f co
nst
rain
ts
Global network structure could be an indirect
consequence of biochemical and functional constraints
Samal and Martin, PLoS ONE (2011) 6(7): e22295
Conjectured by:A. Wagner (2007)
B. Papp et al. (2008)
Areejit Samal – MPA’11 Chester
High level of genetic diversity in our randomized ensemblesHigh level of genetic diversity in our randomized ensembles
Number of reactions in genotype
500 1000 1500 2000 2500
Mea
n H
amm
ing
dis
tan
ce f
or
ran
do
m a
nd
via
ble
gen
oty
pes
0
200
400
600
800
1000
1200
1400
Glucose
AcetateSuccinate
(n): mean distance(n): maximum distance
Any two random networks in our most constrained ensemble differ in ~ 60% of their reactions.
Areejit Samal – MPA’11 Chester
Necessity of MCMC samplingNecessity of MCMC sampling
Constraint of having super-essential reactions
The convex curve indicates that the effect on viability of successive reaction removals becomes more and more severe.
Reference: Samal, Rodrigues, Jost, Martin, Wagner BMC Systems Biology (2010)
Areejit Samal – MPA’11 Chester
SummarySummary
We propose a MCMC based method to generate randomized metabolic networks by successively imposing few macroscopic biochemical and functional constraints into our sampling criteria.
By imposing a few macroscopic constraints into our sampling criteria, we were able to approach the properties of real metabolic networks.
Areejit Samal – MPA’11 Chester
Concluding RemarksConcluding Remarks
We can extend the study using the SEED database to other organisms.
The list of reactions in KEGG is incomplete and its use may reflect evolutionary constraints associated with natural organisms.
From a synthetic biology context, there could be many more reactions that can occur which are not in KEGG.
Alternate approach proposed by Basler, Ebenhöh, Selbig and Nikoloski, Bioinformatics, 2011
Areejit Samal – MPA’11 Chester
Neutral networks: A many-to-one genotype to phenotype mapNeutral networks: A many-to-one genotype to phenotype map
Genotype
Phenotype
RNA Sequence
Secondary structure of pairings
folding
Reference: Fontana & Schuster (1998) Lipman & Wilbur (1991) Ciliberti, Martin & Wagner(2007)
Neutral network or genotype network refers to the set of (many) genotypes that have a given phenotype, and their “organization” in genotype space.
Protein Sequence
Three dimensional structure
Gene network
Gene expression levels
Areejit Samal – MPA’11 Chester
See also:
Common approaches in the study of design principles of metabolic Common approaches in the study of design principles of metabolic networksnetworks
Areejit Samal – MPA’11 Chester
AcknowledgementAcknowledgement
CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG GDRE 513) for a Postdoctoral Fellowship.GDRE 513) for a Postdoctoral Fellowship.
Reference
Related Work
Genotype networks in metabolic reaction spacesAreejit Samal, João F Matias Rodrigues, Jürgen Jost, Olivier C Martin* and Andreas Wagner* BMC Systems Biology 2010, 4:30
Environmental versatility promotes modularity in genome-scale metabolic networksAreejit Samal, Andreas Wagner* and Olivier C Martin* BMC Systems Biology 2011, 5:135
Funding
Andreas Wagner João Rodrigues
University of Zurich
Jürgen Jost Max Planck Institute for Mathematics in the Sciences
Olivier C. MartinUniversité Paris-Sud
Thanks for y
our atte
ntion!!
-- Questio
ns??