SGM Meeting, Warwick, April 2006
Challenges for metagenomic data analysis and lessons from viral metagenomes
[What would you do if sequencing were free?]
Rob Edwards
San Diego State University
Fellowship for Interpretation of Genomes

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

This is all 454 sequence data
21 libraries
10 microbial, 11 phage
597,340,328 bp total
20% of the human genome
50% of all complete and partial microbial genomes
5,769,035 sequences
Average 274,716 per library
Average read length bp
Av. read length has not increased in 7 months
Cost 0.04 per bp
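
The totals on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the human genome size (about 3.1 Gbp) is an assumption that does not come from the talk, and the mean read length it prints is merely what the two totals imply, not the value shown on the original slide.

```python
# Totals as quoted on the slide.
TOTAL_BP = 597_340_328
TOTAL_READS = 5_769_035
LIBRARIES = 21

# Assumption (not from the talk): a haploid human genome of about 3.1 Gbp.
HUMAN_GENOME_BP = 3.1e9

reads_per_library = TOTAL_READS / LIBRARIES
implied_read_length = TOTAL_BP / TOTAL_READS
human_fraction = TOTAL_BP / HUMAN_GENOME_BP

print(f"reads per library: {reads_per_library:,.0f}")
print(f"implied mean read length: {implied_read_length:.1f} bp")
print(f"fraction of a human genome: {human_fraction:.0%}")
```

Under these assumptions the per-library average agrees with the 274,716 quoted on the slide, and the total comes out close to the "20% of the human genome" figure.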

Sequencing is cheap and easy. Bioinformatics is neither.

The Soudan Mine, Minnesota
Red Stuff: Oxidized
Black Stuff: Reduced

Red and Black Samples Are Different
[Figure; label: Black Stuff]

Cloned and 454 sequenced 16S are indistinguishable
[Figure; labels: Cloned, Red]

There are different amounts of metabolism in each environment

There are different amounts of substrates in each environment
[Figure; labels: Red Stuff, Black Stuff]

But are the differences significant?
Sample 10,000 proteins from site 1
Count frequency of each subsystem
Repeat 20,000 times
Repeat for sample 2
Combine both samples
Sample 10,000 proteins 20,000 times
Build 95% CI
Compare medians from sites 1 and 2 with 95% CI
Rodriguez-Brito (2006). BMC Bioinformatics

Subsystem differences & metabolism

Iron acquisition
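
The resampling steps listed above (sample proteins from each site, count subsystems, repeat, then build a 95% CI from the combined samples) can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the function names, the percentile-interval shortcut, and the rule "flag a subsystem when either site median falls outside the pooled interval" are assumptions made for the example. The defaults mirror the slide (10,000 draws, 20,000 repeats), but pure Python is slow at that scale, so the toy run uses far smaller numbers.

```python
import random
from collections import Counter
from statistics import median


def resample_counts(annotations, n_draws, n_reps, rng):
    """Draw n_draws annotations with replacement, n_reps times.

    Returns {subsystem: [count in each replicate]}.
    """
    subsystems = sorted(set(annotations))
    counts = {s: [] for s in subsystems}
    for _ in range(n_reps):
        tally = Counter(rng.choices(annotations, k=n_draws))
        for s in subsystems:
            counts[s].append(tally[s])
    return counts


def percentile_interval(values, level=0.95):
    """Simple symmetric percentile interval, without interpolation."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = len(ordered) - 1 - lo_idx
    return ordered[lo_idx], ordered[hi_idx]


def compare_sites(site1, site2, n_draws=10_000, n_reps=20_000, seed=0):
    """Compare per-subsystem medians of two sites with a pooled 95% interval."""
    rng = random.Random(seed)
    med1 = {s: median(c)
            for s, c in resample_counts(site1, n_draws, n_reps, rng).items()}
    med2 = {s: median(c)
            for s, c in resample_counts(site2, n_draws, n_reps, rng).items()}
    # Null expectation: both sites combined into one pool.
    pooled = resample_counts(site1 + site2, n_draws, n_reps, rng)
    results = {}
    for s, c in pooled.items():
        lo, hi = percentile_interval(c)
        m1, m2 = med1.get(s, 0), med2.get(s, 0)
        # Assumed decision rule: either site median outside the pooled interval.
        differs = not (lo <= m1 <= hi) or not (lo <= m2 <= hi)
        results[s] = {"median_site1": m1, "median_site2": m2,
                      "ci": (lo, hi), "differs": differs}
    return results


if __name__ == "__main__":
    # Made-up toy annotations, not data from the talk.
    red = ["chemotaxis"] * 300 + ["siderophores"] * 100 + ["core"] * 600
    black = ["chemotaxis"] * 100 + ["siderophores"] * 300 + ["core"] * 600
    for name, row in compare_sites(red, black, n_draws=500, n_reps=400).items():
        print(name, row)
```

Each element of the input lists stands for one protein's subsystem assignment. Pooling by simple concatenation weights each site by its number of proteins, which is one of several reasonable choices.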
Black Stuff
Siderophore enterobactin biosynthesis
ferric enterobactin transport
ABC transporter ferrichrome
ABC transporter heme

Black stuff: ferrous iron (Fe2+, ferroan [(Mg,Fe)6(Si,Al)4O10(OH)8])
Red stuff: ferric iron (goethite [FeO(OH)])

Nitrification differentiates the samples
Edwards (2006) BMC Genomics

The challenge is explaining the differences between samples
Red Sample
Arg, Trp, His
Ubiquinone
FA oxidation
Chemotaxis, Flagella
Methylglyoxal metabolism

Black Sample
Ile, Leu, Val
Siderophores
Glycerolipids
NiFe hydrogenase
Phenylpropionate degradation

We can cheaply compare the important biochemistry happening in different environments
We don't care which organisms are doing the metabolism, but we know what organisms are there

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Why Phages?
Phages are viruses that infect bacteria
10:1 ratio of phages:bacteria
10^31 phages on the planet
Specific interactions
(probably) one virus : one host
Small genome size
Higher coverage
Horizontal gene transfer
bp DNA per year in the oceans
Can't do fosmids

Phages In The World's Oceans
GOM: 41 samples, 13 sites, 5 years
SAR: 1 sample, 1 site, 1 year
BBC: 85 samples, 38 sites, 8 years
ARC: 56 samples, 16 sites
LI: 4 sites

Most Marine Phage Sequences are Novel

Phages are specific to environments
[Figure: Phage Proteomic Tree v. 5 (Edwards, Rohwer); labels: ssDNA, -like, T4-like, T7-like]
Thanks: Mya Breitbart

Marine Single-Stranded DNA Viruses
6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)
40% of viral particles in SAR are ssDNA phage
Several full-genome sequences were recovered via de novo assembly of these fragments
Confirmed by PCR and sequencing

SAR Aligned Against the Chlamydia phi 4
[Figure: Individual sequence reads; Coverage; Concatenated hits; Chlamydia phi 4 genome]
12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Phages, Reefs, and Human Disturbance
[Map; labels: Kingman, Palmyra, Washington, Fanning, Christmas]

The Northern Line Islands Expedition, 2005

Christmas to Kingman Bias in No. Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman
More pathogens at Christmas. More people at Christmas.
More photosynthesis at Kingman. No people at Kingman.

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Phages enrich for important genes
Rios Mesquites Stromatolites: No photosynthesis genes in phages
Pozas Azules Stromatolites: 5 different photosynthesis genes in phages

RNR is the most successful reaction in evolution

Outline
The envy is not mine
A tour around the world, thanks to phage
People suck
What is the most successful gene in evolution?
Is there a Future?

Computational Challenges
Sequence annotations and analysis
What is there? What is it doing? How is it doing it?
Gene predictions in unknowns: Lutz Krause (Bielefeld)
Sequence comparisons
BLAST
Other ways to rapidly compare short sequences
What happens when everyone is using 454 sequencing?

Sequence data from 21 libraries
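
The compute estimate on this slide (1,000 CPU hours of BLASTX for each of 21 libraries) is simple arithmetic, and a short sketch makes it easy to rerun with other inputs. The function name is hypothetical, and the 8,760-hour year is an assumption (365 days, no leap year).

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years


def blast_cpu_cost(libraries, cpu_hours_per_library):
    """Return (total CPU hours, total CPU years) for one search per library."""
    total_hours = libraries * cpu_hours_per_library
    return total_hours, total_hours / HOURS_PER_YEAR


hours, years = blast_cpu_cost(21, 1_000)
print(f"{hours:,} CPU hours = {years:.1f} CPU years")
```

Repeat runs or an additional TBLASTX pass scale the total linearly, so doubling the number of searches doubles the CPU years.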
600 million bp
6 million sequences
Each BLASTX search takes 1,000 CPU hours
21 libraries = 21,000 CPU hours or 2.4 CPU years
Users want repeat runs, TBLASTX, more analysis, more data, more, more, more, more

SDSU: Forest Rohwer, Beltran Rodriguez-Brito
USF: Mya Breitbart
Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
Stromatolites: Janet Seifert (Rice University), Valeria Souza (UNAM, Mexico)
ANL: Rick Stevens, Bob Olsen
CI Support
FIG: Veronika Vonstein, Ross Overbeek, Annotators
Also at SDSU: Anca Segall, Stanley Maloy
Math: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts, David Bangor, Steve Rayhawk, Jennifer Mueller
UBC: Curtis Suttle, Amy Chan
MIT: Ed DeLong