High Throughput Computational Sequence Analysis. Rob Edwards, Argonne National...
Post on 19-Dec-2015
High Throughput Computational Sequence Analysis
Argonne National Laboratory
San Diego State University
[Figure. Axes: Number of known sequences; Year. Marked milestones: first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes.]
How much has been sequenced
Environmental sequencing
Everybody in San Diego
Everybody in USA
All cultured Bacteria
100 people
How much will be sequenced
One genome from every species
Most major microbial environments
High Performance Computing
TeraGrid
The TeraGrid National Resource
Life Sciences Gateway to TeraGrid
Subsystems
Subsystems make up metabolism
Wikipedia Metabolism portal: http://en.wikipedia.org/wiki/Portal:Metabolism
Subsystems are not just metabolism
http://aig.cs.man.ac.uk/gallery/Utopia/
Enzyme complex
http://webdeptos.uma.es/
Cell Machinery
http://www.brown.edu/
Cell Processes
http://www.theseed.org
Growth in generation of subsystems
Microbial Genomics Annotation Platform
• Goal 1: Automate the generation of high quality annotations by leveraging the information contained in SubSystems and FIGfams.
• Goal 2: Minimize turnaround time. Initial target 48 hours
• Automated process consisting of:
– Gene calling
– Initial annotation of function
– Initial metabolic reconstruction
• Process takes 1-7 hours depending on size and complexity of the genome
• ~20 genomes per day
• Password protected, secure, private
• Release to public databases if required
Freely available annotation service
http://www.nmpdr.org/anno-server/index48.cgi
Some estimate of annotation quality
[Bar chart, axis scale 0 to 50 in steps of 5. Genomes, with the numeric identifier shown for each on the slide: Bacillus anthracis str. Sterne (260799); Mycobacterium tuberculosis CDC1551 (83331); Listeria monocytogenes EGD-e (169963); Streptococcus pyogenes M1 GAS (160490); Staphylococcus aureus subsp. aureus MW2 (196620). Series: % in SS SEED; % in SS SP1Ke; % hypothetical SP1Ke; % hypothetical SEED.]
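The two kinds of percentage named in the chart legend (share of genes placed in a subsystem, share labeled hypothetical) can be computed from a list of gene annotations. The following is a minimal Python sketch of that arithmetic only. The `Gene` record, the `is_hypothetical` rule, and the example genes are assumptions made for illustration; they are not the SEED data model, and the slides do not say how the plotted values were derived.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Gene:
    # Hypothetical record layout, chosen for this sketch only.
    gene_id: str
    function: str
    subsystem: Optional[str] = None  # None when the gene is in no subsystem


def is_hypothetical(function: str) -> bool:
    # Assumption: a gene counts as hypothetical when its function text
    # is empty or contains the word "hypothetical".
    text = function.strip().lower()
    return text == "" or "hypothetical" in text


def annotation_summary(genes: Sequence[Gene]) -> Dict[str, float]:
    """Return the percentage of genes in a subsystem and the percentage hypothetical."""
    total = len(genes)
    if total == 0:
        return {"pct_in_subsystem": 0.0, "pct_hypothetical": 0.0}
    in_subsystem = sum(1 for g in genes if g.subsystem is not None)
    hypothetical = sum(1 for g in genes if is_hypothetical(g.function))
    return {
        "pct_in_subsystem": 100.0 * in_subsystem / total,
        "pct_hypothetical": 100.0 * hypothetical / total,
    }


# Made-up example genes, for illustration only.
example = [
    Gene("g1", "DNA polymerase III alpha subunit", "DNA replication"),
    Gene("g2", "hypothetical protein"),
    Gene("g3", "Glucokinase", "Glycolysis"),
    Gene("g4", "Conserved hypothetical protein"),
]
summary = annotation_summary(example)
```

Running the same summary over two annotation sets of one genome would give the kind of side-by-side comparison the chart shows.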
Evaluation / Viewing
Download results
• We provide a number of export formats:
– GenBank, FASTA, GFF3, Excel
– Can easily be extended to all formats supported by BioPerl
• Genomes can be deleted by the user at any time (we keep them for max. 120 days)
• Genomes can be directly imported into the SEED if the user wishes
• All genomes are password protected
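To make two of the export formats listed above concrete, here is a minimal sketch that writes FASTA and GFF3 text using only the Python standard library. It is not the service's export code (the slides point to BioPerl for that). The `Feature` record and the functions `to_fasta` and `to_gff3` are hypothetical names introduced for this example, and every feature is written as a `CDS` with phase 0 for simplicity.

```python
from dataclasses import dataclass
from typing import Iterable
from urllib.parse import quote


@dataclass
class Feature:
    # Hypothetical feature record, for illustration only.
    seqid: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive
    strand: str      # "+" or "-"
    feature_id: str
    product: str


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as FASTA, wrapping lines at `width` characters."""
    lines = [f">{name}"]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


def to_gff3(features: Iterable[Feature], source: str = "example") -> str:
    """Format features as GFF3 rows (nine tab-separated columns)."""
    rows = ["##gff-version 3"]
    for f in features:
        # Percent-encode attribute values so ';', '=', ',' and '&'
        # cannot break the attribute column; spaces are left readable.
        attrs = (
            f"ID={quote(f.feature_id, safe='')};"
            f"product={quote(f.product, safe=' ')}"
        )
        rows.append("\t".join([
            f.seqid, source, "CDS",
            str(f.start), str(f.end),
            ".",        # score: not available
            f.strand,
            "0",        # phase: assumed 0 in this sketch
            attrs,
        ]))
    return "\n".join(rows) + "\n"


# Made-up example data.
fasta_text = to_fasta("contig1", "ATGAAATAG", width=4)
gff_text = to_gff3([Feature("contig1", 1, 9, "+", "gene1", "hypothetical protein")])
```

Keeping each format in its own small function mirrors the extensibility point on the slide: supporting another format means adding another writer over the same records.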
Metagenomics SEED
http://metagenomics.theseed.org
Metagenome Metabolic Reconstruction
Starch utilization in cow rumens
Metabolic potential in environments
Everybody in San Diego
Everybody in USA
All cultured Bacteria
100 people
Too much will be sequenced
One genome from every species
Most major microbial environments
Acknowledgements
Argonne National Laboratory: Rick Stevens, Bob Olson, Folker Meyer
San Diego State University: Forest Rohwer
Fellowship for Interpretation of Genomes: Ross Overbeek, Veronika Vonstein, The Annotators