The Metagenomics RAST server: Annotation, Analysis, and Comparisons
Perfect for Pyrosequencing
Rob Edwards
Department of Computer Science, San Diego State University
Mathematics and Computer Sciences Division, Argonne National Laboratory
Roche Life Sciences Workshop, Sept 2008
www.nmpdr.org www.theseed.org
Outline
• Metagenomics
• Tools for analyzing sequences
• Computational Challenges
• Does it work?
How much has been sequenced?
[Figure: number of known sequences by year, annotated with the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]
How much will be sequenced?
• Everybody in San Diego
• Everybody in USA
• All cultured Bacteria
• 100 people
• One genome from every species
• Most major microbial environments
Metagenomics (Just sequence it)
• Samples: 200 liters water, 5-500 g fresh fecal matter, 50 g soil
• Concentrate and purify bacteria, viruses, etc.
• Epifluorescent microscopy
• Extract nucleic acids
• Sequence
• Publish papers
Modern Metagenomics
• Marine: Near-shore water (~100 samples), Off-shore water (~50 samples), Near- and off-shore sediments
• Metazoan associated: Corals, Fish, Human blood, Human stool
• Terrestrial/Soil: Terragenomics, Amazon rainforest, Konza prairie, Joshua Tree desert
• Air
• Freshwater: Aquifer, Glacial lake
• Extreme: Hot springs (84 °C; 78 °C), Soda lake (pH 13), Solar saltern (>35% salt)
The Problem
How do you generate consistent and accurate annotations for metagenomes?
The SEED Family
Annotations using subsystems
FIG developed the notion of Subsystem – a generalization of “pathway” as a collection of functional roles jointly involved in a biological process or complex.
Extended subsystems into FIGfams – protein families that perform the same functions.
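As a rough illustration of the two ideas above (not the SEED's actual data model), a subsystem can be pictured as a named set of functional roles, and a FIGfam as a set of proteins that share one role. The class names, role names, and protein identifiers below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subsystem:
    """A named collection of functional roles (hypothetical model)."""
    name: str
    roles: frozenset


@dataclass
class FigFam:
    """A protein family whose members all perform one functional role."""
    role: str
    members: set = field(default_factory=set)


def roles_covered(subsystem, figfams):
    """Return the subsystem roles that have at least one FIGfam member."""
    populated = {fam.role for fam in figfams if fam.members}
    return subsystem.roles & populated


# Illustrative data only: role names and protein IDs are made up.
example_subsystem = Subsystem(
    name="Example pathway",
    roles=frozenset({"role A", "role B", "role C"}),
)
example_figfams = [
    FigFam(role="role A", members={"protein_1", "protein_2"}),
    FigFam(role="role B", members=set()),          # family with no members yet
    FigFam(role="role D", members={"protein_3"}),  # role outside the subsystem
]
covered = roles_covered(example_subsystem, example_figfams)
```

Here only "role A" counts as covered: "role B" has an empty family, and "role D" is not part of the subsystem.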
Annotation of Complete Genomes
• Automated, user-originated processing
• Takes 1-7 hours depending on size and complexity of the genome
• ~2,000 external submissions, including hundreds of genomes not yet publicly released.
• Reannotation of >500 genomes complete
• 1,000 users, 200 organizations, 25 countries.
http://rast.nmpdr.org/
The metagenomics RAST server
Automated Processing
Summary View
Metagenomics Tools: Annotation & Subsystems
Metagenomics Tools: Annotation & KEGG maps
Metagenomics Tools: Recruitment Plots
Metagenomics Tools: Phylogenetic Reconstruction
Metagenomics Tools: Comparative Tools
Computational Requirements
[Figure: hours of compute time plotted against input size (MB)]
~19 hours of compute per input megabyte
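The rate quoted above can be turned into a back-of-the-envelope estimator. This is only a sketch: it assumes compute time scales linearly with input size, and the function names and the even split across CPUs are illustrative assumptions, not part of the server.

```python
HOURS_PER_MB = 19.0  # approximate rate quoted on the slide


def estimate_compute_hours(input_mb, hours_per_mb=HOURS_PER_MB):
    """Estimate single-CPU hours, assuming time scales linearly with input size."""
    if input_mb < 0:
        raise ValueError("input size must be non-negative")
    return input_mb * hours_per_mb


def wall_clock_hours(input_mb, cpus):
    """Idealized wall-clock hours if the work splits evenly across CPUs (an assumption)."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return estimate_compute_hours(input_mb) / cpus
```

For example, `estimate_compute_hours(10)` gives 190 single-CPU hours for a 10 MB input, and `wall_clock_hours(10, 19)` gives 10 hours under the even-split assumption.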
How much so far
986 metagenomes
79,417,238 sequences
17,306,834,870 bp (17 Gbp)
Average: ~15-20 Mbp per metagenome
Compute time (on a single CPU):
328,814 hours = 13,700 days = 38 years
~300 GS20, ~300 FLX, ~300 Sanger
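The totals on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the conversion from base pairs to megabytes assumes roughly one byte per base pair, which is an approximation introduced here.

```python
TOTAL_METAGENOMES = 986
TOTAL_BP = 17_306_834_870
TOTAL_CPU_HOURS = 328_814

# Average size per metagenome, in millions of base pairs.
avg_mbp = TOTAL_BP / TOTAL_METAGENOMES / 1e6

# Single-CPU time expressed in days and years.
cpu_days = TOTAL_CPU_HOURS / 24
cpu_years = cpu_days / 365

# Assumption: ~1 byte per base pair, so 1 MB of input is ~1e6 bp.
hours_per_mb = TOTAL_CPU_HOURS / (TOTAL_BP / 1e6)

print(f"average size: {avg_mbp:.1f} Mbp per metagenome")
print(f"compute time: {cpu_days:.0f} days, {cpu_years:.1f} years")
print(f"implied rate: {hours_per_mb:.1f} hours per MB")
```

Under that assumption the implied rate comes out close to the ~19 hours per input megabyte quoted earlier, and the average falls inside the stated 15-20 Mbp range.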
Lots of sequences
all pyrosequencing
Metagenomics Tools: Functional Heat Maps
From Sequences To Environments
[Figure: CDA plot with axes CDA 60.2% and CDA 21.7%. Functional categories: Sulfur, Respiration, Capsule, Motility, Membrane transport, Stress, Signaling, Phosphorus, RNA. Environments: Mine, Saltern, Marine, Microbialites, Coral, Fish, Animals, Freshwater]
Dinsdale et al., Nature 2008
Workshops
Free workshops on NMPDR, RAST, mg-RAST, SEED
Contact Leslie McNeil [email protected]
or visit http://www.nmpdr.org/
Acknowledgements
• Environmental Genomics: Forest Rohwer, all the labs that provided sequence
• Metagenomics Annotation Server: Rick Stevens, Folker Meyer, Bob Olson, Daniel Paarman, Mark D'Souza, Jared Wilkening, Andreas Wilke
• Statistics & Web services: Liz Dinsdale, Robert Schmieder, Dana Hall, Beltran Rodriguez-Brito, Bahador Nosrat
• FIG: Ross Overbeek, Veronika Vonstein, Annotators
• Artist: Paula Morris
• Argonne Sequencing: Marc Domanus, Areej Ammar
Artist's impression: not all machines are known to explode
Terragenomics
Differences between soil samples