Sequencing the World of Possibilities for Energy & Environment
Annotation: function prediction and metabolic reconstruction
Thanos Lykidis
Genome Biology Program
DOE-Joint Genome Institute
Two main goals of genome analysis:
• Evolutionary analysis
  – How does an organism compare to the rest?
• Metabolic reconstruction
  – What can an organism do and how?
Metabolic reconstruction
• Predict the biochemistry and physiology of an organism based on its genome sequence
• Explain known biochemical and physiological properties
To do metabolic reconstruction we need to “annotate” the genome:
• Find the genes
• Understand (predict) what these genes are doing
The “same-gene” problem
Metabolic reconstruction: gene function
• Experiment
  – enzyme assays
  – mutants
• Computation
  – sequence comparison
    • BLAST, phylogenomics
    • protein family (Pfam, COG, InterPro)
  – chromosomal context
  – fusion
Similarity-based annotation
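As a rough illustration of similarity-based annotation, the sketch below parses BLAST tabular output (the 12 default columns of -outfmt 6) and transfers the function of the best acceptable hit to the query. The e-value and coverage cutoffs, the sequence names, and the function labels are all hypothetical values chosen for the example; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bitscore: float


def parse_tabular(text):
    """Parse BLAST tabular output (12 default columns, tab separated)."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[10]), float(f[11])))
    return hits


def transfer_annotation(hits, subject_function, query_length,
                        max_evalue=1e-10, min_coverage=0.7):
    """Return (function, hit) for the best acceptable hit, or None.

    The cutoffs are illustrative assumptions, not recommended values.
    """
    acceptable = [
        h for h in hits
        if h.evalue <= max_evalue
        and h.alignment_length / query_length >= min_coverage
        and h.subject in subject_function
    ]
    if not acceptable:
        return None
    best = max(acceptable, key=lambda h: h.bitscore)
    return subject_function[best.subject], best


# Hypothetical hits for a 300-residue query.
EXAMPLE = "\n".join([
    "\t".join(["query1", "subjA", "45.0", "250", "120", "5",
               "1", "250", "3", "252", "1e-40", "160"]),
    "\t".join(["query1", "subjB", "30.0", "80", "50", "2",
               "10", "90", "5", "85", "1e-3", "40"]),
])
FUNCTIONS = {"subjA": "hypothetical kinase",
             "subjB": "hypothetical transporter"}

hits = parse_tabular(EXAMPLE)
result = transfer_annotation(hits, FUNCTIONS, query_length=300)
print(result[0] if result else "no confident annotation")
```

A transferred label of this kind is only a hint: the weak, short hit to subjB is rejected by the cutoffs, and even the accepted hit still has to make sense in the context of metabolism.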
The dgk example
Extensive distribution of the dgk protein family
Flow chart of the reconstruction process
Gene → Annotation → Reaction → Pathway
Treponema pallidum is an uncultivated pathogenic bacterium.
Fitzgerald TJ et al., J. Bacteriol. 130:1333 (1977).
TP0671
A, PIS bacterial
B, PIS eukaryotic
C, CLS eukaryotic
D, PGS
E, PSS
F, PCS
G, unknown
H, CPT/EPT
I, unknown
Group H contains eukaryotic CPT/EPT
[Figure: phylogenetic tree of group H. Leaf labels: A_aeolicus, C_reinhardtii, C_intestinalis, D_melanogaster, H_sapiens, C_elegans, A_thaliana, S_cerevisiae, T_denticola, T_pallidum, N_aromaticivorans, S_coelicolor, S_avermitilis. Numeric labels shown on the tree: 670, 99, 86.]
The BLAST hits hint that TP0671 is a CEPT (choline/ethanolamine phosphotransferase)
CDP-Cho + DAG → PtdCho (CPT; Eukarya)
CDP-Etn + DAG → PtdEtn (EPT; Eukarya)
A functional prediction has to make sense in the context of metabolism
Pathway
What is a pathway?
A sequence of reactions transforming one metabolite to another
Choline → Phosphocholine → CDP-choline → Phosphatidylcholine
Enzymes, in order: choline kinase; phosphocholine cytidylyltransferase; phosphatidyltransferase
Everything should come together
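The definition of a pathway as a sequence of reactions can be made concrete with a small data model. The sketch below is one possible representation, not the one used in the talk: the Reaction type and the helper names are hypothetical, and only the main metabolites named on the slide are modeled (co-substrates are left out).

```python
from typing import NamedTuple


class Reaction(NamedTuple):
    enzyme: str
    substrate: str
    product: str


def is_linear_pathway(reactions):
    """True if each reaction's product is the next reaction's substrate."""
    return all(a.product == b.substrate
               for a, b in zip(reactions, reactions[1:]))


def metabolites(reactions):
    """List the metabolites visited by the pathway, in order."""
    if not reactions:
        return []
    return [reactions[0].substrate] + [r.product for r in reactions]


# The choline to phosphatidylcholine pathway from the slide.
pc_pathway = [
    Reaction("choline kinase", "choline", "phosphocholine"),
    Reaction("phosphocholine cytidylyltransferase",
             "phosphocholine", "CDP-choline"),
    Reaction("phosphatidyltransferase",
             "CDP-choline", "phosphatidylcholine"),
]

print(is_linear_pathway(pc_pathway))
print(" -> ".join(metabolites(pc_pathway)))
```

Checking that the reactions chain is a simple way to express that "everything should come together": a predicted enzyme whose substrate is never produced does not fit the pathway.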
All genes of the pathway are present
Choline → Phosphocholine → CDP-choline → Phosphatidylcholine
Enzymes, in order: choline kinase; phosphocholine cytidylyltransferase; phosphatidyltransferase
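Whether all genes of a pathway are present can be checked mechanically once each gene has a predicted function. The following is a minimal sketch under that assumption; the gene identifiers are placeholders invented for the example and do not refer to real loci.

```python
def missing_steps(pathway_enzymes, annotation):
    """Return the pathway enzymes that no annotated gene accounts for."""
    covered = set(annotation.values())
    return [enzyme for enzyme in pathway_enzymes if enzyme not in covered]


PATHWAY = [
    "choline kinase",
    "phosphocholine cytidylyltransferase",
    "phosphatidyltransferase",
]

# Hypothetical gene identifiers mapped to predicted functions.
annotation = {
    "gene_0001": "choline kinase",
    "gene_0002": "phosphocholine cytidylyltransferase",
    "gene_0003": "phosphatidyltransferase",
    "gene_0004": "hypothetical protein",
}

gaps = missing_steps(PATHWAY, annotation)
print("complete" if not gaps else "missing: " + ", ".join(gaps))
```

A non-empty result marks either a wrong prediction or a step catalyzed by a gene with no recognizable similarity, which is where context-based methods become useful.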
Reconstruction of phospholipid biosynthesis in Treponema pallidum
[Figure: pathway diagram. Metabolites: PtdOH, CDP-DAG, PtdSer, PtdEtn, PtdGlc, CL, DAG, PtdCho; Cho, Etn; P-Cho, P-Etn; CDP-Cho, CDP-Etn. Enzymes: PSS, PGS, CLS, PSD. Gene label: TP0107 (shown twice).]
Working with no similarity
[Figure: pathway A → B → C, with the steps labeled Enzyme 1 and Enzyme x]
The plsX-plsY pathway
Scoring phylogenetic profile similarity
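The transcript does not preserve which scoring scheme the slides used. As an illustration only, the sketch below treats a phylogenetic profile as a presence/absence vector across genomes and scores two profiles with the Jaccard index, which is one common choice among several (Hamming distance and mutual information are others). Gene names and profiles are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    both = sum(1 for a, b in pairs if a and b)
    either = sum(1 for a, b in pairs if a or b)
    return both / either if either else 0.0


def rank_partners(query, profiles):
    """Rank all other genes by profile similarity to the query gene."""
    scores = [(name, jaccard(profiles[query], profile))
              for name, profile in profiles.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical profiles: one column per genome, 1 = gene present.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 0, 1, 0, 1, 1],
    "geneD": [1, 0, 0, 1, 1, 0],
}

for name, score in rank_partners("geneA", profiles):
    print(name, round(score, 2))
```

Genes that are gained and lost together across genomes score high and become candidates for the same pathway, even when neither has a recognizable homolog of known function.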
Clustering of fatty acid biosynthesis genes
[Figure: fatty acid biosynthesis pathway and gene cluster in S. pneumoniae.
Intermediates: Acetyl-CoA, Malonyl-CoA, Malonyl-ACP, β-ketoacyl-ACP, β-hydroxyacyl-ACP, trans-2-enoyl-ACP, cis-2-enoyl-ACP, (s)-acyl-ACP, (u)-acyl-ACP.
Genes: acc (accA, accB, accC, accD), fabD, fabH, fabF, fabG, fabZ, fabA, fabI, fabK, fabM, acpP, HTH.]
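Chromosomal clustering of this kind can be detected from gene coordinates alone. The sketch below groups genes whose intergenic gap is at most a chosen threshold; the Gene type, the 300 bp threshold, and the coordinates are hypothetical values for illustration, and strand is ignored for simplicity.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int


def cluster_by_proximity(genes, max_gap=300):
    """Group genes into runs separated by at most max_gap base pairs."""
    clusters = []
    for gene in sorted(genes, key=lambda g: g.start):
        if clusters:
            # Compare against the furthest end seen in the current cluster,
            # so overlapping or nested genes are handled correctly.
            last_end = max(g.end for g in clusters[-1])
            if gene.start - last_end <= max_gap:
                clusters[-1].append(gene)
                continue
        clusters.append([gene])
    return clusters


# Hypothetical coordinates, deliberately given out of order.
genes = [
    Gene("g3", 1850, 2500),
    Gene("g1", 100, 900),
    Gene("g4", 10000, 11000),
    Gene("g2", 950, 1800),
]

for cluster in cluster_by_proximity(genes):
    print([g.name for g in cluster])
```

A gene of unknown function that sits inside a run of genes from one pathway, and does so in several genomes, is a reasonable candidate for a missing step in that pathway.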
Current status of annotation
~50-80%: precise, accurate predictions
~10-30%: “twilight zone” predictions
~10-30%: genome-specific genes
Metabolic reconstruction: inferring physiology from sequence