1 network-based data integration reveals extensive post-transcriptional regulation of human...
Post on 21-Dec-2015
220 views
TRANSCRIPT
1
Network-based data integration reveals extensive post-transcriptional regulation of
human tissue-specific metabolism
Tomer Shlomi*, Moran Cabili*, Markus J. Herrgard ,Bernhard Q Palsson and Eytan Ruppin
* These authors contributed equally to this work
2
Metabolism
Metabolism is the totality of all the chemical reactions that operate in a living organism.
Catabolic reactionsBreakdown and produce energy
Anabolic reactionsUse energy and build up essential cell components
3
•In born errors of metabolism cause acute symptoms and even death on early age
•Metabolic diseases (obesity, diabetics) are major sources of morbidity and mortality.
•Metabolic enzymes and their regulators gradually becoming viable drug targets
Why Study Human Metabolism?
4
Metabolite
Reaction catalyzed by an enzyme
Modeling Cellular Metabolism
A Short Review Metabolic flux:
The production or elimination of a quantity of metabolite per mass of organ or organism over a specific time frame
..“it is the concept of metabolic flux that is crucial in the translation of genotype and environmental factors into phenotype or a threshold for disease”.
Brendan Lee Nature 2006
5
Constraint Based Modeling
• Under the constraints:
– Mass balance: metabolite production and consumption rates are equal
– Thermodynamic: irreversibility of reactions– Enzymatic capacity: bounds on enzyme rates
• Successfully predicts:
Find a steady-state flux distribution through all biochemical reactions
constant
6
Constraint Based Modeling (CBM)
Mathematical Representation of Constrains• Stoichiometric matrix – network topology with stoichiometry of biochemical reactions
Glucose + ATP GlucokinaseGlucose-6-Phosphate + ADP
GlucokinaseGlucose -1ATP -1
G-6-P +1ADP +1m
etab
olite
s
reactions
Mass balance
S·v = 0
Subspace of R
Thermodynamic & capacity10 >vi > 0
Bounded convex cone
Optimization Maximize Vgrowth
n
growthFell, et al (1986), Varma and Palsson (1993)
7
•Motivated by the fact that in-vivo studies of tissue-specific metabolic functions are limited in scope
•Individual genes and pathways (KEGG, HumanCyc)• Detailed description of the genes, reactions, enzymes • No connections between pathways
•Specific cell-types and organelles• Red blood cell Wiback et al. 2002• Mitochondria Vo et al. 2004
• Large-Scale Human Metabolic Networks • The first large-scale model of human metabolism ~2000 genes, ~3700 reactions, 7 organelles (Duarte et al. 2007, Ma et al. 2007)
Human Metabolic Models
8
•Various cell-types activate different pathways (shown in Expression studies)
•Hard to formulate cellular metabolic objectives – (like biomass maximization for microbial species)
•Unknown inputs and outputs of each cell-type
CBM in HumanModeling human tissue function is
problematic
Can we use constraint-based modeling to systematically predict tissue-specific metabolic behavior?
9
1. General approach to study tissue specific metabolic models
2. Tissue specific activity of metabolic genes/reactions
Our Objective:
Model Integration with Tissue-Specific Gene and Protein Expression Data
Our Method:
Motivated by the assertion that highly expressed genes in a certain tissue are likely to be active there
10
Our MethodGene expression data Protein measurements data
Highly and Lowly expressed gene sets Gene-to-reaction mapping
Highly and Lowly expressed reaction sets Human Metabolic Model
)Duarte et. al(
New objective function :
Maximize consistency with expression data .
Use Mixed Integer Linear Programming (MILP)
Determine activity state and conf. level for each gene/reaction
1
3
2
4
11
Determine Highly and Lowly Reaction sets
1 .Genes set :Extract set of enzymes whose expression is significantly increased or decreased (GeneNote, HPRD)
2 .Reactions set :Employ a detailed gene-to-reaction mapping to identify a tissue-specific expression state for each reaction
R1 = (g1 & g2) | g3 | g4
Our Method
12
Our MethodGene expression data Protein measurements data
Highly and Lowly expressed gene sets Gene-to-reaction mapping
Highly and Lowly expressed reaction sets Human Metabolic Model
)Duarte et. al(
New objective function :
Maximize consistency with expression data .
Use Mixed Integer Linear Programming (MILP)
Determine activity state and conf. level for each gene/reaction
1
3
2
4
13
E3
E1 E2
E5 E7E6
M3
M9
M6
M4
M2
M7
M8M5M1
Highly expressed
Lowly expressed
E4Input
Output
Output
Looking for real flux vector V
Now add additional Boolean vectors H, L s.t:
Hi=1 Vi != 0 (if the enzyme associated with Vi is Highly expressed)
L i=1 Vi=0 (if the enzyme associated with Vi is Lowly expressed)
Represent Flux Consistency with Expression State
Our Method
H3
H1
H2
L1
L2
14
Use Mixed Integer Linear Programming. Define a new objective function:
MAX Σ (Hi + Li )
Which practically mean maximize the number of Highly expressed reactions that are active and the number of Lowly expressed reactions that are inactive
Maximize consistency with expression data
4 out of 5 reactions were consistent with the
expression state!
Define a New Objective functionOur Method
E3
E1 E2
E5 E7E6
M3
M9
M6
M4
M2
M7
M8M5M1
Highly expressed
Lowly expressed
E4Input
Output
Output
H3
H1
H2
L1
L2
15
Our MethodGene expression data Protein measurements data
Highly and Lowly expressed gene sets Gene-to-reaction mapping
Highly and Lowly expressed reaction sets Human Metabolic Model
)Duarte et. al(
New objective function :
Maximize consistency with expression data .
Use Mixed Integer Linear Programming (MILP)
Determine activity state and conf. level for each gene/reaction
1
3
2
4
16
• Gene’s flux activity states -reflect the absence/existence of non-zero flux through the enzymatic reactions they encode
• Comparison of the flux activity states and the expression state will teach us on post transcription regulation
Our MethodFlux Activity State
Down regulated
Up regulated
Lowly expressed
E3
E1 E2
E5 E7E6
M3
M9
M6
M4
M2
M7
M8M5M1
Highly expressed
E4
17
Flux Activity State Consider Space of Possible Solutions
• We predict for each tissue active and inactive gene and reactions sets
• Since there is a space of possible solutions to the MILP problem we solve a set of MILP problems to determine the gene activity
1. Simulate a state where the gene is inactive
2. Simulate an active gene product
Estimate confidence levels based on the drop in the consistency (with expression) between the 2
different solutions!
18
ResultsGene Tissue Specific Activity
•We employed the method described above on• metabolic network model of Duarte et al.• gene and protein expression measurements from GeneNote and HPRD
•10 tissues : brain, heart, kidney, liver, lung, pancreas, prostate,
spleen, skeletal muscle and thymus.
• The activity state of 781 out of 1475 model genes was determined in at least one tissue
19
Post-transcriptional Regulation of Metabolic
Genes• Post-transcriptional regulation plays a major role in
shaping tissue-specific metabolic behavior: ~20% of the metabolic genes per tissue
• average of 42 (3.6%) genes post-transcriptionally up-regulated and 180 (15.4%) post-transcriptionally down-regulated in each tissue
up-regulated
down-regulated
20
Cross Validation Test •We performed a five-fold cross validation test
•80% of the genes were used to constrain the model
•Gene activity states for a held-out set of 20% of the genes were predicted according to the expression constrains of the remaining other 80%
•The overlap between the genes predicted as active and the highly expressed genes in the held-out data was significantly high for all tissues
21
Large Scale ValidationLarge-Scale Mining of Tissue-Specificity Data
- Tissue-specificity of genes, reactions, and metabolites is significantly correlated with all data sources
- Tissue specificity of post-transcriptional up regulated elements is significantly high !!!!
- Tissue specificity of post-transcriptional down regulated elements is significantly low !!!!
22
Tissue-Specific Metabolite Exchange with Biofluids
• 249 metabolites are known to be secreted or taken up by human tissues
• 54% of the metabolites are not associated with transporters and cannot be predicted by expression data
• Transport direction can not be inferred by the expression data
• A transporter might carry several metabolites
• Many of the known transporters are post-transcriptionall regulated
23
Metabolic Disease-Causing Genes
• 162 metabolic genes are associated with a mendelian disease
• Prediction accuracy: precision of 49% and a recall of 22% • There is a significant affect of post transcriptional regulation on
disease-causing genes GBE1 causes the glycogen storage disease is post-transcriptionally up-regulated in liver, heart, skeletal
muscle, and brain(
24
SummaryMethodological Standpoint
• First constraint-based modeling analysis of recently published human metabolic networks
• First to account for post-transcriptional regulation within the
computational framework of large-scale metabolic modeling
• Integrate expression data as part of the optimization instead of imposing it as a constrain during the preprocessing step (Akesson et al. 2004)
25
SummaryMain Conclusions
• Post transcriptional regulation plays a significant rule in shaping tissue specific metabolic behavior
The tissue specificity of many metabolic disease-causing genes goes markedly beyond that manifested in their expression level, giving rise to new predictions concerning their involvement in different tissues
Metabolites exchange with biofluids displays a large variance across tissues, composing a unique view of tissue-specific uptake and secretion of hundreds of metabolites
26
What’s Next?• Integrate other tissue-specificity data
• Modeling of metabolic diseases– Using various data sources (known disease-causing
genes, drug databases) – Predict tissue-wide metabolic symptoms– Predict metabolic response to drugs
• Predict disease biomarkers that can be identified by biofluid metabolomics
27
Thank you!
28
1,0,
)5(,)1()1(
)4(,
)3(,,
)2(
)1(0
.
))((max
max,min,
max,max,
min,min,
maxmin
,,
ii
m
Niiiii
Eiiii
Eiiii
Ri Ri iiiyyv
yy
Rv
Riyvvyv
Rivvyv
Rivvyv
vvv
vS
ts
yyyE N
Mathematical representation of our optimization problem