"home depot" model of evolution of prokaryotic metabolic networks and their regulation...
"Home Depot" Model of Evolution of Prokaryotic Metabolic Networks and
Their Regulation
Sergei Maslov
Brookhaven National Laboratory
In collaboration with Kim Sneppen and Sandeep Krishna, Center for Models of Life, University of Copenhagen,
and Tin Yau Pang, Stony Brook University
Stover et al., Nature (2000)
van Nimwegen, TIG (2003)
The rise of bureaucracy!
• The fraction of bureaucrats grows with organization size
• The trend (if unchecked) could lead to a “bureaucratic collapse”: 100% bureaucrats and no workers
• Like human bureaucrats, transcription factors are replaceable and disposable: there are many anecdotal stories of one regulator replacing another in closely related organisms, and regulators are not very essential (at least in yeast)
• One is tempted to view regulators nearly as “parasites” or superficial add-ons that marginally improve the efficiency of an organism
• But if you are a bureaucrat you see your role somewhat differently…
Encephalization Quotient
EQ ~ M(brain)1//M(body)
From Carl Sagan's book “Dragons of Eden: Speculations on the Evolution of Human Intelligence”
Table from M.Y. Galperin, BMC Microbiology (2005)
Bacterial IQ
Bacterial IQ ~ (N_signal transducers)^(1/2) / N_genes
Quadratic scaling applies to all types of regulation and signaling
Table from Molina, van Nimwegen, Biology Direct 2008
How to explain the quadratic law?
Let’s play with this scaling law
• N_R = N_G^2/80,000 --> dN_R = dN_G · 2N_G/80,000, so dN_G/dN_R = 40,000/N_G
• ~40 new genes per regulator for N_G = 1000
• ~4 new genes (1 regulator + 3 non-regulatory genes) for the largest bacterial genomes with N_G ~ 10,000
• Important observation: dN_G/dN_R decreases with genome size
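The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the constant 80,000 is the empirical fit quoted above, while the function names are made up for this illustration.

```python
# Empirical fit quoted on the slide: N_R = N_G**2 / 80,000.
SCALE = 80_000

def regulators(n_genes: float) -> float:
    """Number of regulators predicted by the quadratic fit."""
    return n_genes ** 2 / SCALE

def genes_per_new_regulator(n_genes: float) -> float:
    """dN_G/dN_R = 1 / (dN_R/dN_G) = SCALE / (2 * N_G)."""
    return SCALE / (2 * n_genes)

for n in (1_000, 10_000):
    print(n, regulators(n), genes_per_new_regulator(n))
```

The second function returns the total number of genes added per new regulator, so the value for the largest genomes includes the regulator itself.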
Now to our model
Disclaimer: authors of this study (unfortunately) received no financial support from Home Depot, Inc. or Obi, GMBH
“Home Depot” argument
• Inspired by personal experience as a new homeowner buying tools
• Tools are bought to accomplish functional tasks, e.g. fix a leaking faucet
• Redundant tools are returned to “Home Depot”
• As your toolbox grows you need to get fewer and fewer new tools to accomplish a new task
• Tools are e.g. metabolic pathways acquired by Horizontal Gene Transfer
• Regulators control these pathways (we assume one regulator per task/pathway)
• Redundant genes are promptly deleted (in prokaryotes)
• Genomes shrink by deleting entire pathways that are no longer required
• All non-regulatory “workhorse” genes of an organism form its toolbox
• As it gets larger you need fewer new workhorse genes per new regulated function: FASTER THAN LINEAR SCALING
Random overlap between functions --> no quadratic scaling!
• N_univ: the total number of tools in “Home Depot”
• N_G: the number of tools in my toolbox
• L_pathway: the number of tools needed for each new functional task
• If overlap is random, then L_pathway N_G/N_univ of them are redundant (already in the toolbox)
• dN_G/dN_R = L_pathway - L_pathway N_G/N_univ
• Superlinear only due to logarithmic corrections: N_R = (N_univ/L_pathway) log[N_G^max/(N_G^max - N_G)], with N_G^max = N_univ
• Networks are needed for non-random overlap between functional pathways
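A quick numerical check of the random-overlap argument, as a sketch only. It integrates dN_G/dN_R = L_pathway (1 - N_G/N_univ) from an empty toolbox, which gives N_G = N_univ [1 - exp(-L_pathway N_R/N_univ)], and compares that with adding pathways one at a time. N_univ = 1800 is borrowed from the KEGG slides; L_pathway = 10 and the function names are assumptions made for this illustration.

```python
import math

def genes_closed_form(n_reg: int, l_pathway: float, n_univ: float) -> float:
    """Solution of dN_G/dN_R = L * (1 - N_G / N_univ) with N_G(0) = 0."""
    return n_univ * (1.0 - math.exp(-l_pathway * n_reg / n_univ))

def genes_by_iteration(n_reg: int, l_pathway: float, n_univ: float) -> float:
    """Add pathways one by one; only the non-redundant tools are kept."""
    n_genes = 0.0
    for _ in range(n_reg):
        n_genes += l_pathway * (1.0 - n_genes / n_univ)
    return n_genes

N_UNIV = 1800      # size of the universal network (from the KEGG slides)
L_PATHWAY = 10     # hypothetical pathway length, chosen for illustration

for n_reg in (10, 100, 300):
    exact = genes_closed_form(n_reg, L_PATHWAY, N_UNIV)
    stepped = genes_by_iteration(n_reg, L_PATHWAY, N_UNIV)
    print(n_reg, round(exact, 1), round(stepped, 1), round(exact / n_reg, 2))
```

The last column, genes per regulator, stays close to L_pathway while the toolbox is small compared with N_univ, so N_R grows roughly linearly with N_G rather than quadratically.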
Spherical cow model of metabolic networks
[Diagram labels: Food, Waste, Milk; nutrient (×3); central metabolism, anabolic pathways, biomass]
• Horizontal gene transfer: entire pathways could be added in one step
• Pathways could also be removed
• New pathways are added from the universal network formed by the union of all reactions in all organisms (the bacterial answer to “Home Depot”)
• The only parameter: the size of the universal network, N_univ
• The current size of the toolbox (# of genes ~ # of enzymes ~ # of metabolites): N_G
• Probability to join the existing pathway: p_join = N_G/N_univ
• L_pathway = 1/p_join = N_univ/N_G
• If one regulator per pathway: dN_G/dN_R = L_pathway = N_univ/N_G
• Quadratic law: N_R = N_G^2/(2N_univ)
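The step from the pathway length to the quadratic law is a one-line integration. The sketch below assumes that the ratio of genes to regulators on the slide is meant as the increment per newly added pathway and that an empty toolbox has no regulators (N_R = 0 at N_G = 0).

```latex
\frac{dN_G}{dN_R} = L_{\mathrm{pathway}} = \frac{N_{\mathrm{univ}}}{N_G}
\;\Longrightarrow\;
N_G \, dN_G = N_{\mathrm{univ}} \, dN_R
\;\Longrightarrow\;
\frac{N_G^2}{2} = N_{\mathrm{univ}} \, N_R
\;\Longrightarrow\;
N_R = \frac{N_G^2}{2 N_{\mathrm{univ}}}
```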
We tried several versions of the toolbox model
• On a random network: analytically solved to give N_R ~ N_met^2
• On a union of all KEGG reactions: numerically solved to give N_R ~ N_met^1.8
• ~1800 reactions and metabolites upstream of the central metabolism
• Randomly select nutrients
• Follow linear pathways until they overlap with the existing network
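The mean-field version of this procedure is easy to simulate. The sketch below is not the KEGG-based simulation from the talk: it assumes only that each newly added reaction joins the existing toolbox with probability p_join = N_G/N_univ, as stated earlier, and that each completed pathway gets one regulator. The starting size, the random seed and all function names are choices made for this illustration.

```python
import math
import random

def simulate_toolbox(n_univ: int, n_start: int = 1, seed: int = 0):
    """Grow a toolbox pathway by pathway; return (N_G, N_R) after each pathway."""
    rng = random.Random(seed)
    n_genes, n_reg = n_start, 0
    history = []
    while n_genes < n_univ:
        # Extend one new pathway reaction by reaction until it joins the toolbox.
        while True:
            n_genes += 1
            if n_genes >= n_univ or rng.random() < n_genes / n_univ:
                break
        n_reg += 1  # one regulator per completed pathway
        history.append((n_genes, n_reg))
    return history

def loglog_slope(points):
    """Least-squares slope of log(N_R) against log(N_G)."""
    xs = [math.log(g) for g, _ in points]
    ys = [math.log(r) for _, r in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

history = simulate_toolbox(n_univ=1800)
tail = [(g, r) for g, r in history if r >= 10]  # skip the noisy first pathways
print("final N_R:", history[-1][1])
print("fitted exponent:", round(loglog_slope(tail), 2))
```

Under these assumptions the total number of regulators at saturation should come out near N_univ/2 and the fitted exponent near 2, in line with the analytical result for the random network.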
[Figure: N_TF vs. N_genes, log-log axes]
Green – all fully sequenced prokaryotes
Red – toolbox model on KEGG universal network with N_univ = 1800
From SM, S. Krishna, T.Y. Pang, K. Sneppen, PNAS (2009)
Length distribution of metabolic pathways/branches
[Figure: cumulative distribution vs. branch/regulon size, log-log axes; exponent label: -1=2]
Green – linear branches in E. coli metabolic network
Red – toolbox model on KEGG
SM, S. Krishna, T.Y. Pang, K. Sneppen, PNAS (2009)
Model with shortest & branched instead of meandering & linear pathways
[Figure: N_TF vs. N_met, log-log axes; slope = 1.7]
SM, T.Y. Pang, in preparation (2010)
What does it mean for regulatory networks?
• N_R <K_out> = N_G <K_in> = number of regulatory interactions
• N_R/N_G = <K_in>/<K_out> increases with N_G
• Either <K_out> decreases with N_G: pathways become shorter, as in our model
• Or <K_in> grows with N_G: regulation gets more coordinated
• Most likely both trends at once
E. van Nimwegen, TIG (2003)
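The degree balance above fixes the mean regulon size once the other quantities are known. A minimal sketch follows, assuming the empirical fit N_R = N_G^2/80,000 quoted earlier in the talk and, purely for illustration, one regulatory input per gene (<K_in> = 1); the function name is made up for this example.

```python
def mean_out_degree(n_genes: float, n_regulators: float, mean_in_degree: float = 1.0) -> float:
    """Solve N_R * <K_out> = N_G * <K_in> for <K_out>."""
    return n_genes * mean_in_degree / n_regulators

for n_genes in (1_000, 10_000):
    n_regulators = n_genes ** 2 / 80_000  # empirical quadratic fit
    print(n_genes, n_regulators, mean_out_degree(n_genes, n_regulators))
```

With <K_in> held at 1, the mean regulon size has to shrink as the genome grows; keeping it constant instead would require <K_in> to grow in proportion to N_G.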
Regulating pathways: basic version
<K_out> decreases; <K_in> = 1 = const
[Diagram labels: nutrient, nutrient, TF1, TF2]
Regulating pathways: long regulons
<K_out> = const; <K_in> grows
[Diagram labels: nutrient, nutrient, TF1, TF2]
Regulating pathways: TF→TF + upstream suppression
[Diagram labels: nutrient, nutrient, TF1, TF2]
Regulating pathways: new TFs
[Diagram labels: nutrient, nutrient, TF1, TF2, TF1]
Conclusions and future plans
Toolbox “Home Depot” model explains:
• Quadratic scaling of the number of regulators
• Broad distribution (hubs and stubs) of regulon sizes: most functions need few tools, some need many
• Gene duplication models offer an alternative way to explain hubs in biological networks, but the ultimate explanation has to be functional
• Our model relies on Horizontal Gene Transfer instead of gene duplication
To-do list:
• Coordination of regulation of different pathways: which of our proposed scenarios (if any) is realized?
• What is Nature trying to minimize when adding branched pathways? The number of added reactions? The number of by-products? Cross-talk with existing pathways?
• Extensions to organizations, technology innovations, etc.?
Thank you!
[Diagram labels: Target product, By-product, By-product, “Surface”, NM]
[Figure: surface of pathway vs. # of metabolites in a pathway, log-log axes; legend: surface; log-binned surface: exponent = 0.25 - 0.5]
[Figure: surface of pathway vs. # of metabolites in a pathway, log-log axes; legend: by-products; log-binned by-products: exponent = 1]
[Figure panels: Toolbox model; E. coli metabolic network (spanning tree)]
Deleting pathways
[Diagram labels: nutrient (×3), TF1, TF2, TF1]
Distribution of regulon sizes
[Figure: cumulative distribution vs. branch length/regulon size, log-log axes; exponent labels: -1=1 and -1=2]
Green – regulons in E. coli
Red – toolbox model on full KEGG
[Figure: cumulative distribution vs. regulon size, log-log axes; panels Fig. 2A, Fig. 2B, Fig. 2C, Fig. 2D]
KEGG pathways vs. reactions in ~500 fully sequenced prokaryotes
[Figure: # of pathways (~ N_R) vs. # of reactions (~ N_G)]
SM, S. Krishna, K. Sneppen (2008)
[Figure, panel A: N_TF vs. N_genes, log-log axes; legend: MF-model, N_univ = 1750; kegg-maps, 1800; best fit to x: slope = 2.15]
[Figure: log-log axes; axis label: dNM; legend: surface, by-product]
Adaptive evolution of bacterial metabolic networks by horizontal gene transfer. Csaba Pal, Balazs Papp & Martin Lercher, Nat. Genet. (2005)
[Diagram labels: nutrient (×3), TF1, TF2]
[Figure: # of enzymes/# of genes vs. # of genes, log-log axes; mean fraction = 0.23]
[Figure: # of ABC transporters (pfam: PF00005) vs. N_G (# of genes), log-log axes; all prokaryotes; fit slope 1.33]
Table from Erik van Nimwegen, TIG 2003
Complexity is manifested in Kin distribution
E. coli vs. S. cerevisiae vs. H. sapiens
[Figure, panel A: N(K_in) vs. K_in, with K_in from 0 to 20 and a logarithmic vertical axis]
[Figure, panel B: N(K_out) vs. K_out, log-log axes]
Legend: Basic version; Coordinated activity of pathways
SM, S. Krishna, K. Sneppen (2008)
Jerison 1983 The evolution of the mammalian brain as an information-processing system. pp. 113-146 IN Eisenberg, J. F. & Kleiman, D. G. (Ed.), Advances in the Study of Mammalian Behavior (Spec. Publ. Amer. Soc. Mamm. 7). Pittsburgh: American Society of Mammalogists. Figure redrawn from Jerison 1973
Trivia facts
Zebrafish – the largest # of TFs (~2700) or 10% of ~27,000 genes. (humans ~1900 TFs or 8% of 24,000 genes)
In bacteria it is Burkholderia sp. 383 : ~800 TFs out of 8000 genes (also 10% of the total)
Linear fit to log(NR) with log(Ngenes) explains 87% of the variance (cc~0.93)
Linear fit to NR/Ngenes with Ngenes explains 50%-60% of the variance (cc~0.7-0.75).
Gut/sewer bacterium: E. coli K12: 4467 genes, 271 TFs, 6%
Aphid parasite: Buchnera aphidicola APS: 618 genes, 6 TFs, 1%
Soil bacterium: Rhodococcus sp. RHA1: 9221 genes, 641 TFs, 7%
http://www.g-language.org/g3/
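As a rough illustration of the log-log fit mentioned in the trivia above, the sketch below fits a straight line to log(N_R) against log(N_genes) using only the four genomes whose counts are quoted on these slides. It is not the fit over all sequenced prokaryotes that gives the 87% figure; four points only show how the exponent is estimated. The Burkholderia counts are the approximate values quoted above, and the function and variable names are made up for this example.

```python
import math

# (number of genes, number of transcription factors) as quoted on the slides.
GENOMES = {
    "Buchnera aphidicola APS": (618, 6),
    "E. coli K12": (4467, 271),
    "Burkholderia sp. 383": (8000, 800),   # approximate counts
    "Rhodococcus sp. RHA1": (9221, 641),
}

def loglog_fit(pairs):
    """Least-squares fit of log10(y) against log10(x); returns (slope, correlation)."""
    xs = [math.log10(x) for x, _ in pairs]
    ys = [math.log10(y) for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

slope, corr = loglog_fit(list(GENOMES.values()))
print(f"slope = {slope:.2f}, correlation = {corr:.2f}")
```

A slope close to 2 on such a plot is what the quadratic scaling of regulators with genome size looks like in practice.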