uc davis eve161 lecture 16 by @phylogenomics
DESCRIPTION
Slides for Lecture 16 in EVE 161 Course by Jonathan Eisen at UC DavisTRANSCRIPT
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Lecture 16:
EVE 161:Microbial Phylogenomics
!Lecture #16:
Era IV: Shotgun Metagenomics !
UC Davis, Winter 2014 Instructor: Jonathan Eisen
!1
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Where we are going and where we have been
!Previous lecture: ! 15: Era IV: Shotgun Metagenomics
! Current Lecture: ! 16: Era IV: Metagenomic Diversity and
Binning ! Next Lecture:
! 17: Era IV: Metagenomic Case Study
!2
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Era IV: Genomes in the environment
Era IV: Diversity from Metagenomics
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Era IV: Genomes in the environment
A Very Simple System
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Pierce’s Disease
!5
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
• Obligate xylem feeder
• Transmits Xylella between plants
• Much like mosquitoes transmit malarial pathogen
• Only animal listed as possible “bioterror” agent by US DHS
!6
Glassy winged sharpshooter
GLASSY-WINGEDSHARPSHOOTERA Serious Threat to California Agriculture
FROM THE
UNIVERSITY OF CALIFORNIA’S PIERCE’S DISEASE RESEARCH AND
EMERGENCY RESPONSE TASK FORCE
Glassy-winged sharpshooter eggs are laid together on theunderside of leaves, usually in groups of 10 to 12. The eggmasses appear as small, greenish blisters. These blisters areeasier to observe after the eggs hatch, when they appearas tan to brown scars on the leaves.
Parasitized egg masses are tan to brown in color withsmall, circular holes at one end of the eggs.
This informational brochure was produced by ANRCommunication Services for the University of Califor-nia Pierce’s Disease Research and EmergencyResponse Task Force. You may download a copy of thebrochure from the Division of Agriculture and NaturalResources web site at http://danr.ucop.edu or from theCommunication Services web site athttp://danrcs.ucdavis.edu.
Download a copy of this brochure from http://danr.ucop.edu or http://danrcs.ucdavis.edu
For local information, contact your UC CooperativeExtension farm advisor:
Adults
Egg masses
Glassy-winged SharpshooterGeneralized Lifecycle
100
80
60
40
20
0
Jan.
Mar
.
May
July
Sept
.
Nov.
Glassy-winged sharpshooters overwinter as adultsand begin laying egg masses in late Februarythrough May. This first generation matures asadults in late May through late August. Second-generation egg masses are laid starting in mid-June through late September, which develop intoover-wintering adults.
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !7
Xylem feeding insects also very successful
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Xylem and Phloem
From Lodish et al. 2000
!8
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Animal nutrition
• Xylem is frequently missing essential amino acids, vitamins and Co-Factors, and has only small amounts of carbon skeletons
!9
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Plant response to sap feeders
• Possible solutions to no aa, vitamins, etc in xylem ! Eat other things ! Evolve metabolic pathways to synthesize missing nutrients ! Find some poor sap to make the stuff for you
!10
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Moran N. A. PNAS 2007;104:8627-8633
©2007 by National Academy of Sciences !11
5
Sharpshooter:Cuerna sayi
bacteriomes
Sharpshooters harbor two obligatesymbionts in their bacteriomes
Moran et al. 2003 Environ. Microbiol.Moran et al. 2005 Appl. Environ. Microbiol.
Candidatus “Baumannia cicadellinicola” (Gammaproteobacteria)
Candidatus “Sulcia muelleri” (Bacteroidetes)
D Takiya
0.1mm
Bacteriome dissected from anterior abdomen of H. vitripennis
Orange-red portion- Baumannia only
Yellow portion- Baumannia and Sulcia
(Moran et al. 2003 Environmental Microbiology)
7
10!m
“Candidatus Baumannia cicadellinicola” (Gammaproteobacteria)
in “red” portion of bacteriome of Homalodisca vitripennis
N=host nucleus B=Bacteriocyte membrane E=Endosymbionts
Irregularly spherical
~2 !m diameter
Phylogeny of Sulcia muelleri from Auchenorrhyncha
(Hemiptera): the oldest insect symbiont
Moran et al. Appl Env Micro 2005
Permian
age fossils
(>270 myr)
•= 100% Bootstrap
•support, all methods
Broad congruence with host
relationships
Dates to the origins of vascular
plant-feeding in insects
Symbionts derived
from sharpshooters
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
DNA extraction
PCRSequence
rRNA genes
Sequence alignment = Data matrixPhylogenetic tree
PCR
rRNA1
Yeast
Makes lots of copies of the rRNA genes in sample
E. coli
Humans
A
T
T
A
G
A
A
C
A
T
C
A
C
A
A
C
A
G
G
A
G
T
T
CrRNA1
E. coli Humans
Yeast
!12
rRNA1 5’ ...TACAGTATAGGTGGAGCTAGCGATCGAT
CGA... 3’
PCR and phylogenetic analysis of rRNA genes
5
Sharpshooter:Cuerna sayi
bacteriomes
Sharpshooters harbor two obligatesymbionts in their bacteriomes
Moran et al. 2003 Environ. Microbiol.Moran et al. 2005 Appl. Environ. Microbiol.
Candidatus “Baumannia cicadellinicola” (Gammaproteobacteria)
Candidatus “Sulcia muelleri” (Bacteroidetes)
D Takiya
0.1mm
Bacteriome dissected from anterior abdomen of H. vitripennis
Orange-red portion- Baumannia only
Yellow portion- Baumannia and Sulcia
(Moran et al. 2003 Environmental Microbiology)
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Baumania is close relative of Buchnera symbionts of aphids
SharpshootersAphidsAphidsAphidsAntsFlies
!13Wu et al. 2006 PLoS Biology 4: e188.
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Baumania is close relative of Buchnera symbionts of aphids
SharpshootersAphidsAphidsAphidsAntsFlies
!14Wu et al. 2006 PLoS Biology 4: e188.
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !15
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !16
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Sharpshooter Shotgun Sequencing
shotgun
Wu et al. 2006 PLoS Biology 4: e188.Collaboration with Nancy Moran’s lab
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Higher Evolutionary Rates in Endosymbionts
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’ s Lab
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Variation in Evolution Rates Correlated with Repair Gene Presence
MutS MutL
+ +
+ +
+ +
+ +
_ _
_ _
Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’ s Lab
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Polymorphisms in Metapopulation
• Data from ~200 hosts –104 SNPs –2 indels
• PCR surveys show that this is between host variation
• Much lower ratio of transitions:transversions than in Blochmannia
• Consistent with absence of MMR from Blochmannia
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Predict metabolic networks from Genome
Wu et al. 2006 PLoS Biology 4: e188.
!24
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Baumannia is a Vitamin and Cofactor Producing Machine
Wu et al. 2006 PLoS Biology 4: e188.
!25
VITAMIN AND COFACTOR
PRODUCING MACHINE
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Baumannia is a Vitamin and Cofactor Producing Machine
Wu et al. 2006 PLoS Biology 4: e188.
!26
NO PATHWAYS FOR ESSENTIAL AMINO ACID SYNTHESIS
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !27
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
???????
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Commonly Used Binning Methods Did not Work Well
• Assembly –Only Baumannia generated good contigs
• Depth of coverage –Everything else 0-1X coverage
• Nucleotide composition –No detectible peaks in any vector we looked at
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
A B C D E F G
T U V W X Y Z
Binning challenge
!30
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
A B C D E F G
T U V W X Y Z
Binning challenge
Best binning method: reference genomes
!31
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
A B C D E F G
T U V W X Y Z
Binning challenge
No reference genome? What do you do? !Phylogeny ....
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
DNA extraction
PCRSequence
rRNA genes
Sequence alignment = Data matrixPhylogenetic tree
PCR
rRNA1
rRNA2
Makes lots of copies of the rRNA genes in sample
rRNA1 5’ ...ACACACATAGGTGGAGCTAGCGATCGAT
CGA... 3’
E. coli
Humans
A
T
T
A
G
A
A
C
A
T
C
A
C
A
A
C
A
G
G
A
G
T
T
CrRNA1
E. coli Humans
rRNA2
!33
rRNA2 5’ ...TACAGTATAGGTGGAGCTAGCGATCGAT
CGA... 3’
PCR and phylogenetic analysis of rRNA genes
!
!"#$%&"''()$*!"#$%& '&()
+#,()$-'.)&
!"#$%&"''()$&/"#$+'$/(0'/'+1-2#()&3.+-'4(& -4/(")-$/+#,()$-'.)&
5'$#4/#*+&,6/7889/-%.)$/%0+1)2$/3)/,05'$#4/#*+&,0+788:/455,0+-%.)$/%0+1)2$/3)/,0+
"#$%&%#'() *!"#$"%%&" '&'"()**&%&'+*", +,#--#./0'102#3'1/
"#$%&%#'() *-#*'&" $#)**).&, +5#3'1/0&%1'1)4
;/<#=-3#
678--
5#3'1/&0-1 %&))13'1%9:/0-9#$'1/&0/9#2%0-1$90:9/012&3.&4)%%&5
;/#$<1=/1%9.0/'&0$= !"#$"%%&" 0$>?9
@1>>0A9.0/'&0$= !"#$"%%&" #$%9-#*'&"
+B0/#$91'9#>79C66D96%2&.+%$)%3"*17&'.+8&+*+9:4
5
Sharpshooter:Cuerna sayi
bacteriomes
Sharpshooters harbor two obligatesymbionts in their bacteriomes
Moran et al. 2003 Environ. Microbiol.Moran et al. 2005 Appl. Environ. Microbiol.
Candidatus “Baumannia cicadellinicola” (Gammaproteobacteria)
Candidatus “Sulcia muelleri” (Bacteroidetes)
D Takiya
0.1mm
Bacteriome dissected from anterior abdomen of H. vitripennis
Orange-red portion- Baumannia only
Yellow portion- Baumannia and Sulcia
(Moran et al. 2003 Environmental Microbiology)
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !34
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014Wu et al. 2006 PLoS Biology 4: e188.
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
???????
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
CFB Phyla
Wu et al. 2006 PLoS Biology 4: e188.
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Sulcia makes essential amino acids
!38
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Sulcia makes essential amino acids
!39
ESSENTIAL AMINO ACID PRODUCING
MACHINE
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Wu et al. 2006 PLoS Biology 4: e188.
Baumannia makes vitamins and cofactors
Sulcia makes amino acids
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Era IV: Genomes in the environment
Phylogeny & Metagenomes
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
• Why analyze protein coding “marker” genes in metagenomic data?
•
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
AMPHORA: AutoMated PHylogenOmic infeRence
Wu and Eisen, 2008. Genome Biology 2008, 9:R151 !doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Choosing Marker Genes
• How choose the marker genes to use?
!
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Choosing Marker Genes
• How choose the marker genes to use?
!• “We selected these proteins because they are universally
distributed in bacteria; the vast majority of them exist as single copy genes within each genome; and they are housekeeping genes that are involved in information processing (replication, transcription, and translation) or central metabolism, and thus are thought to be relatively recalcitrant to lateral gene transfer [19].”
Wu and Eisen, 2008. Genome Biology 2008, 9:R151 !doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Identification and alignment of metagenomic reads?
• How to identify and rapidly align reads that correspond to each of the markers selected?
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Identification and alignment of metagenomic reads?
• How to identify and rapidly align reads that correspond to each of the markers selected?
!• Prealign representatives from genomes
• Manually curate to build a “mask”
• Create “Hidden markov model” of the alignment
• Use HMM to search for, and align from metagenomes
!• For more on HMMs see “What is a Hidden Markov Model” by Sean Eddy. http://www.nature.com/nbt/journal/v22/
n10/full/nbt1004-1315.html
Wu and Eisen, 2008. Genome Biology 2008, 9:R151 !doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
HMMs and Profiles
• Useful guide here http://compbio.soe.ucsc.edu/ismb99.handouts/KK185FP.html
Profile HMM
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 20140.2
Kineococcus radiotolerans SRS30216
Nitrosomonas eutropha C91
Flavobacterium psychrophilum JIP02 86
Saccharopolyspora erythraea NRRL 2338
Magnetospirillum magneticum AMB 1
Bacillus subtilis subsp subtilis str 168
Acidobacteria bacterium Ellin345
Rhodopirellula baltica SH 1
Mycobacterium bovis BCG str Pasteur 1173P2
Escherichia coli CFT073
Bacillus thuringiensis serovar konkukian str 97 27
Synechococcus sp CC9902
Leptospira interrogans serovar Lai str 56601
Rickettsia conorii str Malish 7
Helicobacter pylori J99
Shigella flexneri 5 str 8401
Bordetella pertussis Tohama I
Mycobacterium sp JLS
Colwellia psychrerythraea 34H
Lawsonia intracellularis PHE MN1 00
Prochlorococcus marinus str MIT 9303
Streptomyces coelicolor A3 2
Lactobacillus helveticus DPC 4571
Yersinia pestis Nepal516
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
Haemophilus influenzae 86 028NP
Campylobacter jejuni subsp jejuni 81116
Burkholderia pseudomallei K96243
Borrelia burgdorferi B31
Serratia proteamaculans 568
Tropheryma whipplei TW08 27
Clostridium perfringens SM101
Anaplasma phagocytophilum HZ
Psychrobacter arcticus 273 4
Fusobacterium nucleatum subsp nucleatum ATCC 25586
Rickettsia bellii OSU 85 389
Dehalococcoides ethenogenes 195
Buchnera aphidicola str Cc Cinara cedri
Candidatus Ruthia magnifica str Cm Calyptogena magnifica
Chlamydophila caviae GPIC
Roseobacter denitrificans OCh 114
Shewanella denitrificans OS217
Buchnera aphidicola str APS Acyrthosiphon pisum
Geobacillus thermodenitrificans NG80 2
Parabacteroides distasonis ATCC 8503
Corynebacterium jeikeium K411
Yersinia pseudotuberculosis IP 32953
Francisella tularensis subsp tularensis SCHU S4
Streptococcus pyogenes MGAS315
Zymomonas mobilis subsp mobilis ZM4
Bacillus anthracis str Ames
Shewanella frigidimarina NCIMB 400
Streptococcus gordonii str Challis substr CH1
Thiomicrospira crunogena XCL 2
Streptococcus pyogenes str Manfredo
Pseudomonas putida KT2440
Mycobacterium leprae TN
Staphylococcus saprophyticus subsp saprophyticus ATCC 15305
Shewanella sp MR 7
Rhodopseudomonas palustris HaA2
Geobacter sulfurreducens PCA
Lactococcus lactis subsp cremoris SK11
Nitrosococcus oceani ATCC 19707
Rhodopseudomonas palustris BisB18
Synechococcus elongatus PCC 6301
Pelotomaculum thermopropionicum SI
Synechocystis sp PCC 6803
Sorangium cellulosum So ce 56
Wolinella succinogenes DSM 1740
Moorella thermoacetica ATCC 39073
Gluconobacter oxydans 621H
Mycoplasma capricolum subsp capricolum ATCC 27343
Actinobacillus succinogenes 130Z
Baumannia cicadellinicola str Hc Homalodisca coagulata
Pseudomonas syringae pv syringae B728aHahella chejuensis KCTC 2396
Gramella forsetii KT0803
Brucella abortus biovar 1 str 9 941
Ochrobactrum anthropi ATCC 49188
Campylobacter fetus subsp fetus 82 40
Burkholderia mallei ATCC 23344
Anaplasma marginale str St Maries
Frankia sp CcI3
Dehalococcoides sp BAV1
Mycoplasma genitalium G37
Mycoplasma hyopneumoniae 7448
Synechococcus sp JA 3 3Ab
Parvibaculum lavamentivorans DS 1
Lactobacillus johnsonii NCC 533
Onion yellows phytoplasma OY M
Mycoplasma mycoides subsp mycoides SC str PG1
Desulfococcus oleovorans Hxd3
Escherichia coli 536
Methylobacterium extorquens PA1
Nitrosomonas europaea ATCC 19718
Campylobacter curvus 525 92
Nitrosospira multiformis ATCC 25196
Pseudomonas aeruginosa PAO1
Burkholderia multivorans ATCC 17616
Chlamydophila abortus S26 3
Shewanella loihica PV 4
Candidatus Blochmannia pennsylvanicus str BPEN
Xylella fastidiosa 9a5c
Chlamydophila felis Fe C 56
Caulobacter crescentus CB15
Orientia tsutsugamushi Boryong
Leptospira borgpetersenii serovar Hardjo bovis JB197
Rhodopseudomonas palustris BisA53
Marinomonas sp MWYL1
Chlamydophila pneumoniae TW 183
Brucella suis 1330
Renibacterium salmoninarum ATCC 33209
Citrobacter koseri ATCC BAA 895
Mycobacterium bovis AF2122 97
Lactococcus lactis subsp lactis Il1403
Campylobacter hominis ATCC BAA 381
Polynucleobacter sp QLW P1DMWA 1
Coxiella burnetii RSA 331
Bifidobacterium longum NCC2705
Rickettsia rickettsii str Iowa
Bacillus cereus ATCC 10987
Synechococcus sp CC9605
Burkholderia xenovorans LB400
Prosthecochloris vibrioformis DSM 265
Streptococcus pyogenes MGAS10270
Listeria monocytogenes EGD e
Burkholderia cenocepacia AU 1054
Trichodesmium erythraeum IMS101
Streptococcus thermophilus CNRZ1066
Haemophilus somnus 129PT
Bacteroides fragilis YCH46
Streptococcus pyogenes SSI 1
Shewanella sp MR 4
Chlamydophila pneumoniae CWL029
Myxococcus xanthus DK 1622
Pelobacter carbinolicus DSM 2380
Actinobacillus pleuropneumoniae serovar 3 str JL03
Rickettsia prowazekii str Madrid E
Listeria welshimeri serovar 6b str SLCC5334
Salmonella enterica subsp enterica serovar Paratyphi B str SPB7
Bartonella henselae str Houston 1
Shewanella oneidensis MR 1
Rubrobacter xylanophilus DSM 9941
Marinobacter aquaeolei VT8
Neisseria meningitidis FAM18
Acidovorax avenae subsp citrulli AAC00 1
Staphylococcus aureus subsp aureus N315
Prochlorococcus marinus str MIT 9301
Oenococcus oeni PSU 1
Clostridium botulinum A str ATCC 19397
Streptococcus sanguinis SK36
Burkholderia pseudomallei 1710b
Bradyrhizobium sp BTAi1
Helicobacter pylori HPAG1
Staphylococcus aureus subsp aureus MW2
Pseudomonas aeruginosa PA7
Arthrobacter aurescens TC1
Candidatus Sulcia muelleri GWSS
Propionibacterium acnes KPA171202
Clostridium botulinum A str Hall
Psychrobacter cryohalolentis K5
Ralstonia solanacearum GMI1000
Polaromonas naphthalenivorans CJ2
Mycoplasma synoviae 53
Neisseria meningitidis MC58
Campylobacter jejuni subsp jejuni 81 176
Lactobacillus reuteri F275
Prochlorococcus marinus subsp marinus str CCMP1375
Rickettsia canadensis str McKiel
Lactococcus lactis subsp cremoris MG1363
Prochlorococcus marinus str MIT 9515
Campylobacter concisus 13826
Bifidobacterium adolescentis ATCC 15703
Lactobacillus gasseri ATCC 33323
Pseudomonas putida F1
Borrelia garinii PBi
Ralstonia metallidurans CH34
Alcanivorax borkumensis SK2
Streptococcus thermophilus LMG 18311
Bacillus thuringiensis str Al Hakam
Herpetosiphon aurantiacus ATCC 23779
Azoarcus sp EbN1
Corynebacterium glutamicum R
Bacteroides thetaiotaomicron VPI 5482
Bordetella bronchiseptica RB50
Synechococcus sp JA 2 3B a 2 13
Xanthomonas oryzae pv oryzae MAFF 311018
Treponema denticola ATCC 35405
Carboxydothermus hydrogenoformans Z 2901
Campylobacter jejuni RM1221
Mycoplasma pulmonis UAB CTIP
Candidatus Protochlamydia amoebophila UWE25
Lactobacillus sakei subsp sakei 23K
Aeromonas hydrophila subsp hydrophila ATCC 7966
Shewanella sp W3 18 1
Thermus thermophilus HB8
Pseudomonas syringae pv phaseolicola 1448A
Rickettsia massiliae MTU5
Helicobacter hepaticus ATCC 51449
Mesoplasma florum L1
Mycoplasma hyopneumoniae 232
Bacillus cereus E33L
Staphylococcus aureus subsp aureus MSSA476
Azoarcus sp BH72
Silicibacter pomeroyi DSS 3
Campylobacter jejuni subsp doylei 269 97
Borrelia afzelii PKo
Ehrlichia canis str Jake
Bordetella petrii
Desulfotomaculum reducens MI 1
Gloeobacter violaceus PCC 7421
Escherichia coli O157 H7 EDL933
Staphylococcus aureus subsp aureus JH9
Nitrobacter hamburgensis X14
Rickettsia typhi str Wilmington
Novosphingobium aromaticivorans DSM 12444
Staphylococcus aureus subsp aureus USA300 TCH1516
Clostridium beijerinckii NCIMB 8052
Bacillus clausii KSM K16
Xanthomonas oryzae pv oryzae KACC10331
Clostridium botulinum F str Langeland
Lactobacillus salivarius subsp salivarius UCC118
Pediococcus pentosaceus ATCC 25745
Burkholderia mallei NCTC 10229
Francisella tularensis subsp tularensis FSC198
Symbiobacterium thermophilum IAM 14863
Pasteurella multocida subsp multocida str Pm70
Bacillus weihenstephanensis KBAB4
Sphingopyxis alaskensis RB2256
Chlamydia trachomatis A HAR 13
Solibacter usitatus Ellin6076
Delftia acidovorans SPH 1
Sinorhizobium medicae WSM419
Psychrobacter sp PRwf 1
Candidatus Pelagibacter ubique HTCC1062
Shewanella baltica OS195
Synechococcus sp CC9311
Aster yellows witches broom phytoplasma AYWB
Salinispora arenicola CNS 205
Dinoroseobacter shibae DFL 12
Pelodictyon luteolum DSM 273
Arthrobacter sp FB24
Prochlorococcus marinus str MIT 9211
Nocardioides sp JS614
Corynebacterium glutamicum ATCC 13032
Synechococcus sp WH 8102
Maricaulis maris MCS10
Methylococcus capsulatus str Bath
Clostridium botulinum A str ATCC 3502
Bacillus pumilus SAFR 032
Thiobacillus denitrificans ATCC 25259
Lactobacillus plantarum WCFS1
Thermoanaerobacter pseudethanolicus ATCC 33223
Bacillus halodurans C 125
Geobacter uraniireducens Rf4
Clostridium acetobutylicum ATCC 824
Thermotoga maritima MSB8
Francisella tularensis subsp tularensis WY96 3418
Anaeromyxobacter dehalogenans 2CP C
Thermobifida fusca YX
Erwinia carotovora subsp atroseptica SCRI1043
Yersinia pestis KIM
Leuconostoc mesenteroides subsp mesenteroides ATCC 8293
Staphylococcus aureus subsp aureus JH1
Photobacterium profundum SS9
Fervidobacterium nodosum Rt17 B1
Rhodobacter sphaeroides 2 4 1
Roseiflexus sp RS 1
Ehrlichia chaffeensis str Arkansas
Streptococcus agalactiae NEM316
Pseudomonas aeruginosa UCBPP PA14
Escherichia coli K12
Sulfurovum sp NBC37 1
Clostridium phytofermentans ISDg
Chlamydia trachomatis 434 Bu
Streptococcus pyogenes MGAS2096
Salmonella enterica subsp arizonae serovar 62 z4 z23
Hyphomonas neptunium ATCC 15444
Silicibacter sp TM1040
Streptococcus suis 05ZYH33
Francisella tularensis subsp holarctica OSU18
Mycobacterium ulcerans Agy99
Prochlorococcus marinus str NATL1A
Escherichia coli O157 H7 str Sakai
Mannheimia succiniciproducens MBEL55E
Chlorobium chlorochromatii CaD3
Yersinia pestis Antiqua
Shewanella sp ANA 3
Mesorhizobium loti MAFF303099
Shewanella putrefaciens CN 32
Wolbachia endosymbiont strain TRS of Brugia malayi
Mycoplasma hyopneumoniae J
Acidovorax sp JS42
Syntrophomonas wolfei subsp wolfei str Goettingen
Alkaliphilus oremlandii OhILAs
Deinococcus geothermalis DSM 11300
Pseudoalteromonas haloplanktis TAC125
Ralstonia eutropha H16
Burkholderia ambifaria AMMD
Streptococcus pyogenes MGAS8232
Wolbachia endosymbiont of Drosophila melanogaster
Acinetobacter baumannii ATCC 17978
Neisseria gonorrhoeae FA 1090
Herminiimonas arsenicoxydans
Clostridium perfringens str 13
Lactobacillus brevis ATCC 367
Prochlorococcus marinus str NATL2A
Shigella sonnei Ss046
Brucella melitensis 16M
Chlamydophila pneumoniae AR39
Xanthobacter autotrophicus Py2
Mycobacterium tuberculosis H37Rv
Enterobacter sakazakii ATCC BAA 894Enterobacter sp 638
Legionella pneumophila str Lens
Rickettsia felis URRWXCal2
Staphylococcus epidermidis RP62A
Jannaschia sp CCS1
Streptomyces avermitilis MA 4680
Bradyrhizobium japonicum USDA 110
Tropheryma whipplei str Twist
Magnetococcus sp MC 1
Verminephrobacter eiseniae EF01 2
Polaromonas sp JS666
Mycobacterium smegmatis str MC2 155
Treponema pallidum subsp pallidum str Nichols
Salinibacter ruber DSM 13855
Mycoplasma pneumoniae M129
Streptococcus pyogenes M1 GAS
Francisella tularensis subsp holarctica FTA
Desulfotalea psychrophila LSv54
Escherichia coli APEC O1
Ehrlichia ruminantium str Gardel
Leptospira interrogans serovar Copenhageni str Fiocruz L1 130
Lactobacillus delbrueckii subsp bulgaricus ATCC BAA 365
Desulfovibrio desulfuricans G20
Streptococcus pneumoniae D39
Brucella canis ATCC 23365
Chromohalobacter salexigens DSM 3043
Streptococcus agalactiae A909
Prochlorococcus marinus str MIT 9312
Haemophilus ducreyi 35000HP
Leptospira borgpetersenii serovar Hardjo bovis L550
Pseudomonas entomophila L48
Thermus thermophilus HB27
Thermotoga lettingae TMO
Salmonella enterica subsp enterica serovar Typhi str Ty2
Salmonella enterica subsp enterica serovar Choleraesuis str SC B67
Haemophilus influenzae PittGG
Bartonella bacilliformis KC583
Clostridium kluyveri DSM 555
Mycobacterium tuberculosis H37Ra
Burkholderia vietnamiensis G4
Syntrophus aciditrophicus SB
Streptococcus pyogenes MGAS6180
Helicobacter pylori 26695
Rhodobacter sphaeroides ATCC 17025
Mycobacterium vanbaalenii PYR 1
Candidatus Vesicomyosocius okutanii HA
Pseudoalteromonas atlantica T6c
Lactobacillus delbrueckii subsp bulgaricus ATCC 11842
Staphylococcus aureus subsp aureus str Newman
Mycoplasma agalactiae PG2
Bacillus anthracis str Ames Ancestor
Thermoanaerobacter sp X514
Staphylococcus aureus subsp aureus MRSA252
Bacillus licheniformis ATCC 14580
Salinispora tropica CNB 440
Legionella pneumophila str Paris
Arcobacter butzleri RM4018
Burkholderia mallei SAVP1
Pseudomonas stutzeri A1501
Xylella fastidiosa Temecula1
Rhodobacter sphaeroides ATCC 17029
Bdellovibrio bacteriovorus HD100
Haemophilus influenzae Rd KW20
Pseudomonas putida GB 1
Francisella tularensis subsp holarctica
Mycobacterium avium subsp paratuberculosis K 10
Rhodoferax ferrireducens T118
Clostridium difficile 630
Sodalis glossinidius str morsitans
Neisseria meningitidis Z2491
Clostridium tetani E88
Streptococcus thermophilus LMD 9
Vibrio vulnificus YJ016
Sphingomonas wittichii RW1
Staphylococcus aureus subsp aureus Mu50
Deinococcus radiodurans R1
Shigella boydii Sb227
Pseudomonas fluorescens Pf 5
Thermotoga petrophila RKU 1
Chlamydophila pneumoniae J138
Synechococcus sp RCC307
Rickettsia rickettsii str Sheila Smith
Lactobacillus acidophilus NCFM
Microcystis aeruginosa NIES 843
Desulfovibrio vulgaris subsp vulgaris str Hildenborough
Streptococcus suis 98HAH33
Burkholderia pseudomallei 1106a
Frankia sp EAN1pec
Mycobacterium tuberculosis CDC1551
Xanthomonas campestris pv vesicatoria str 85 10
Pseudomonas mendocina ymp
Chlorobium phaeobacteroides DSM 266
Brucella melitensis biovar Abortus 2308
Shigella dysenteriae Sd197
Photorhabdus luminescens subsp laumondii TTO1
Buchnera aphidicola str Sg Schizaphis graminum
Acaryochloris marina MBIC11017
Candidatus Blochmannia floridanus
Ehrlichia ruminantium str Welgevonden
Burkholderia cenocepacia HI2424
Syntrophobacter fumaroxidans MPOB
Bordetella parapertussis 12822
Salmonella typhimurium LT2
Sinorhizobium meliloti 1021
Streptococcus pyogenes MGAS10394
Bacteroides fragilis NCTC 9343
Rhodopseudomonas palustris BisB5
Burkholderia thailandensis E264
Gluconacetobacter diazotrophicus PAl 5
Agrobacterium tumefaciens str C58
Escherichia coli HS
Bacillus cereus subsp cytotoxis NVH 391 98
Shewanella baltica OS155
Haemophilus influenzae PittEE
Vibrio harveyi ATCC BAA 1116
Ralstonia eutropha JMP134
Petrotoga mobilis SJ95
Mycoplasma penetrans HF 2
Dechloromonas aromatica RCB
Staphylococcus aureus subsp aureus Mu3
Bartonella quintana str Toulouse
Campylobacter jejuni subsp jejuni NCTC 11168
Mycobacterium tuberculosis F11
Thermoanaerobacter tengcongensis MB4
Yersinia pestis biovar Microtus str 91001
Neorickettsia sennetsu str Miyayama
Xanthomonas axonopodis pv citri str 306
Leifsonia xyli subsp xyli str CTCB07
Desulfovibrio vulgaris subsp vulgaris DP4
Streptococcus pyogenes MGAS9429
Nitrobacter winogradskyi Nb 255
Methylobacillus flagellatus KT
Mycobacterium sp KMS
Staphylococcus haemolyticus JCSC1435
Pseudomonas fluorescens PfO 1
Shigella flexneri 2a str 2457T
Coxiella burnetii RSA 493
Neisseria meningitidis 053442
Coxiella burnetii Dugway 5J108 111
Bacillus amyloliquefaciens FZB42
Chlamydia trachomatis D UW 3 CX
Roseiflexus castenholzii DSM 13941
Salmonella enterica subsp enterica serovar Paratyphi A str ATCC 9150
Staphylococcus aureus subsp aureus COL
Azorhizobium caulinodans ORS 571
Yersinia enterocolitica subsp enterocolitica 8081
Prochlorococcus marinus str MIT 9313
Vibrio cholerae O395
Clostridium thermocellum ATCC 27405
Shigella flexneri 2a str 301
Anaeromyxobacter sp Fw109 5
Brucella ovis ATCC 25840
Paracoccus denitrificans PD1222
Pseudomonas syringae pv tomato str DC3000
Shewanella amazonensis SB2B
Bartonella tribocorum CIP 105476
Nitratiruptor sp SB155 2
Shewanella sediminis HAW EB3
Alkaliphilus metalliredigens QYMF
Bacillus cereus ATCC 14579
Burkholderia mallei NCTC 10247
Synechococcus elongatus PCC 7942
Burkholderia pseudomallei 668
Staphylococcus aureus RF122
Salmonella enterica subsp enterica serovar Typhi str CT18
Staphylococcus epidermidis ATCC 12228
Geobacillus kaustophilus HTA426
Streptococcus agalactiae 2603V R
Streptococcus mutans UA159
Lactobacillus casei ATCC 334
Acinetobacter sp ADP1
Chromobacterium violaceum ATCC 12472
Chlorobium tepidum TLS
Cytophaga hutchinsonii ATCC 33406
Streptococcus pyogenes MGAS5005
Rickettsia bellii RML369 C
Yersinia pseudotuberculosis IP 31758
Rhizobium etli CFN 42
Helicobacter acinonychis str Sheeba
Dichelobacter nodosus VCS1703A
Janthinobacterium sp Marseille
Alkalilimnicola ehrlichei MLHE 1
Idiomarina loihiensis L2TR
Mycoplasma gallisepticum R
Yersinia pestis CO92
Oceanobacillus iheyensis HTE831
Buchnera aphidicola str Bp Baizongia pistaciae
Yersinia pestis Angola
Aeromonas salmonicida subsp salmonicida A449
Porphyromonas gingivalis W83
Granulibacter bethesdensis CGDNIH1
Frankia alni ACN14a
Dehalococcoides sp CBDB1
Mycobacterium avium 104
Listeria monocytogenes str 4b F2365
Nostoc sp PCC 7120
Chlamydia muridarum Nigg
Methylibium petroleiphilum PM1
Streptococcus pyogenes MGAS10750
Enterococcus faecalis V583
Vibrio vulnificus CMCP6
Acholeplasma laidlawii PG 8A
Prochlorococcus marinus str AS9601
Halorhodospira halophila SL1
Xanthomonas campestris pv campestris str 8004
Escherichia coli UTI89
Shewanella pealeana ATCC 700345
Rickettsia akari str Hartford
Caldicellulosiruptor saccharolyticus DSM 8903
Staphylococcus aureus subsp aureus NCTC 8325
Prochlorococcus marinus subsp pastoris str CCMP1986
Vibrio parahaemolyticus RIMD 2210633
Psychromonas ingrahamii 37
Legionella pneumophila str Corby
Listeria innocua Clip11262
Klebsiella pneumoniae subsp pneumoniae MGH 78578
Staphylococcus aureus subsp aureus USA300
Yersinia pestis Pestoides F
Clostridium novyi NT
Rhodopseudomonas palustris CGA009
Bacillus anthracis str Sterne
Flavobacterium johnsoniae UW101
Streptococcus pneumoniae R6
Chlamydia trachomatis L2b UCH 1 proctitis
Sulfurimonas denitrificans DSM 1251
Bacteroides vulgatus ATCC 8482
Thermosipho melanesiensis BI429
Erythrobacter litoralis HTCC2594
Chloroflexus aurantiacus J 10 fl
Ureaplasma parvum serovar 3 str ATCC 700970
Rhodospirillum rubrum ATCC 11170
Mycoplasma mobile 163K
Aquifex aeolicus VF5
Acidothermus cellulolyticus 11B
Anabaena variabilis ATCC 29413
Vibrio cholerae O1 biovar eltor str N16961
Vibrio fischeri ES114
Burkholderia sp 383
Geobacter metallireducens GS 15
Desulfitobacterium hafniense Y51
Mycobacterium sp MCS
Acidiphilium cryptum JF 5
Brucella suis ATCC 23445
Corynebacterium diphtheriae NCTC 13129
Synechococcus sp WH 7803
Mycobacterium gilvum PYR GCK
Francisella tularensis subsp novicida U112
Rhodococcus sp RHA1
Clavibacter michiganensis subsp michiganensis NCPPB 382
Bradyrhizobium sp ORS278
Corynebacterium efficiens YS 314
Mesorhizobium sp BNC1
Thermosynechococcus elongatus BP 1
Nocardia farcinica IFM 10152
Shewanella baltica OS185
Escherichia coli E24377A
Saccharophagus degradans 2 40
Legionella pneumophila subsp pneumophila str Philadelphia 1
Actinobacillus pleuropneumoniae L20
Prochlorococcus marinus str MIT 9215
Xanthomonas campestris pv campestris str ATCC 33913
Rhizobium leguminosarum bv viciae 3841
Streptococcus pneumoniae TIGR4
Pelobacter propionicus DSM 2379
Clostridium perfringens ATCC 13124
Gammaproteobacteria
Betaproteobacteria
Alphaproteobacteria
Epsilonproteobacteria
Deltaproteobacteria
Acidobacteria
Bacteroidetes/Chlorobi
Chlamydiae/Planctomycetes
Spirochaetes
Firmicutes
Cyanobacteria
Chloroflexi
Actinobacteria
Aquificae
Thermotogae
Deinococcus/Thermus
Firmicutes
Fusobacteria
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
0.07
Sodalis glossinidius str. 'morsitans'
Thiomicrospira crunogena XCL-2
Acinetobacter sp. ADP1
Pseudoalteromonas haloplanktis TAC125
Marinobacter aquaeolei VT8
Colwellia psychrerythraea 34H
Pseudomonas stutzeri A1501
Haemophilus somnus 129PT
Alkalilimnicola ehrlichei MLHE-1
Mannheimia succiniciproducens MBEL55E
Psychromonas ingrahamii 37
Xanthomonas campestris pv. vesicatoria str. 85-10
Methylococcus capsulatus str. Bath
Shewanella pealeana ATCC 700345
Candidatus Ruthia magnifica str. Cm Calyptogena magnifica
Vibrio fischeri ES114
Erwinia carotovora subsp. atroseptica SCRI1043
Coxiella burnetii RSA 493
Buchnera aphidicola str. Cc Cinara cedri
Francisella tularensis subsp. novicida U112
Psychrobacter sp. PRwf-1
Xylella fastidiosa Temecula1
Hahella chejuensis KCTC 2396
Nitrosococcus oceani ATCC 19707
Chromohalobacter salexigens DSM 3043
Acinetobacter baumannii ATCC 17978
Haemophilus influenzae PittEE
Serratia proteamaculans 568
Escherichia coli O157H7 str. Sakai
Saccharophagus degradans 2-40
Photorhabdus luminescens subsp. laumondii TTO1
Candidatus Blochmannia floridanus
Aeromonas hydrophila subsp. hydrophila ATCC 7966
Dichelobacter nodosus VCS1703A
Marinomonas sp. MWYL1
Baumannia cicadellinicola str. Hc Homalodisca coagulataBuchnera aphidicola str. APS Acyrthosiphon pisum
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
Photobacterium profundum SS9
Actinobacillus pleuropneumoniae L20
Buchnera aphidicola str. Bp Baizongia pistaciae
Candidatus Vesicomyosocius okutanii HA
Candidatus Blochmannia pennsylvanicus str. BPEN
Buchnera aphidicola str. Sg Schizaphis graminum
Pseudoalteromonas atlantica T6c
Pasteurella multocida subsp. multocida str. Pm70
Halorhodospira halophila SL1
Haemophilus ducreyi 35000HP
Idiomarina loihiensis L2TR
Yersinia enterocolitica subsp. enterocolitica 8081
Legionella pneumophila str. Corby
Actinobacillus succinogenes 130Z
Alcanivorax borkumensis SK2
Aeromonas salmonicida subsp. salmonicida A449
Psychrobacter arcticus 273-4
54
37
10014
10
28
50
10048
25
45
99
98
100
100
52
37
100
83
100
10
43
24
100
100
8
94
86
100
75
100
37
45
56
31
49
100
64
22
94
23
38
7678
45
22
14100
72
4
66
2
0.1
Halorhodospira halophila SL1
Buchnera aphidicola str Sg Schizaphis graminum
Haemophilus somnus 129PT
Marinobacter aquaeolei VT8
Escherichia coli O157 H7 str Sakai
Photorhabdus luminescens subsp laumondii TTO1
Aeromonas hydrophila subsp hydrophila ATCC 7966
Candidatus Ruthia magnifica str Cm Calyptogena magnifica
Saccharophagus degradans 2 40
Pasteurella multocida subsp multocida str Pm70
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
Aeromonas salmonicida subsp salmonicida A449
Coxiella burnetii RSA 493
Alkalilimnicola ehrlichei MLHE 1
Psychrobacter sp PRwf 1
Mannheimia succiniciproducens MBEL55E
Idiomarina loihiensis L2TR
Xylella fastidiosa Temecula1
Candidatus Blochmannia floridanus
Sodalis glossinidius str morsitans
Psychromonas ingrahamii 37
Pseudoalteromonas atlantica T6c
Dichelobacter nodosus VCS1703A
Candidatus Blochmannia pennsylvanicus str BPEN
Baumannia cicadellinicola str Hc Homalodisca coagulata
Haemophilus influenzae PittEE
Buchnera aphidicola str Bp Baizongia pistaciae
Candidatus Vesicomyosocius okutanii HA
Serratia proteamaculans 568
Chromohalobacter salexigens DSM 3043
Buchnera aphidicola str Cc Cinara cedri
Acinetobacter sp ADP1
Haemophilus ducreyi 35000HP
Shewanella pealeana ATCC 700345
Alcanivorax borkumensis SK2
Hahella chejuensis KCTC 2396
Vibrio fischeri ES114
Acinetobacter baumannii ATCC 17978
Methylococcus capsulatus str Bath
Colwellia psychrerythraea 34H
Photobacterium profundum SS9
Erwinia carotovora subsp atroseptica SCRI1043
Legionella pneumophila str Corby
Actinobacillus pleuropneumoniae L20
Xanthomonas campestris pv vesicatoria str 85 10
Francisella tularensis subsp novicida U112
Pseudomonas stutzeri A1501
Yersinia enterocolitica subsp enterocolitica 8081
Nitrosococcus oceani ATCC 19707
Pseudoalteromonas haloplanktis TAC125
Marinomonas sp MWYL1
Psychrobacter arcticus 273 4
Thiomicrospira crunogena XCL 2
Actinobacillus succinogenes 130Z
Buchnera aphidicola str APS Acyrthosiphon pisum
20
26
100100
75
100
10
93
53
100
34
76
100
95
100
100
29
32
100
9
100
100
100
100
87
100
91
100
100
22
74100
100
93
100
6585
46
79
99
100
100
100
99
84
66
48
5
100
75
100
46
73
A.
B.
0.07
Sodalis glossinidius str. 'morsitans'
Thiomicrospira crunogena XCL-2
Acinetobacter sp. ADP1
Pseudoalteromonas haloplanktis TAC125
Marinobacter aquaeolei VT8
Colwellia psychrerythraea 34H
Pseudomonas stutzeri A1501
Haemophilus somnus 129PT
Alkalilimnicola ehrlichei MLHE-1
Mannheimia succiniciproducens MBEL55E
Psychromonas ingrahamii 37
Xanthomonas campestris pv. vesicatoria str. 85-10
Methylococcus capsulatus str. Bath
Shewanella pealeana ATCC 700345
Candidatus Ruthia magnifica str. Cm Calyptogena magnifica
Vibrio fischeri ES114
Erwinia carotovora subsp. atroseptica SCRI1043
Coxiella burnetii RSA 493
Buchnera aphidicola str. Cc Cinara cedri
Francisella tularensis subsp. novicida U112
Psychrobacter sp. PRwf-1
Xylella fastidiosa Temecula1
Hahella chejuensis KCTC 2396
Nitrosococcus oceani ATCC 19707
Chromohalobacter salexigens DSM 3043
Acinetobacter baumannii ATCC 17978
Haemophilus influenzae PittEE
Serratia proteamaculans 568
Escherichia coli O157H7 str. Sakai
Saccharophagus degradans 2-40
Photorhabdus luminescens subsp. laumondii TTO1
Candidatus Blochmannia floridanus
Aeromonas hydrophila subsp. hydrophila ATCC 7966
Dichelobacter nodosus VCS1703A
Marinomonas sp. MWYL1
Baumannia cicadellinicola str. Hc Homalodisca coagulataBuchnera aphidicola str. APS Acyrthosiphon pisum
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
Photobacterium profundum SS9
Actinobacillus pleuropneumoniae L20
Buchnera aphidicola str. Bp Baizongia pistaciae
Candidatus Vesicomyosocius okutanii HA
Candidatus Blochmannia pennsylvanicus str. BPEN
Buchnera aphidicola str. Sg Schizaphis graminum
Pseudoalteromonas atlantica T6c
Pasteurella multocida subsp. multocida str. Pm70
Halorhodospira halophila SL1
Haemophilus ducreyi 35000HP
Idiomarina loihiensis L2TR
Yersinia enterocolitica subsp. enterocolitica 8081
Legionella pneumophila str. Corby
Actinobacillus succinogenes 130Z
Alcanivorax borkumensis SK2
Aeromonas salmonicida subsp. salmonicida A449
Psychrobacter arcticus 273-4
54
37
10014
10
28
50
10048
25
45
99
98
100
100
52
37
100
83
100
10
43
24
100
100
8
94
86
100
75
100
37
45
56
31
49
100
64
22
94
23
38
7678
45
22
14100
72
4
66
2
0.1
Halorhodospira halophila SL1
Buchnera aphidicola str Sg Schizaphis graminum
Haemophilus somnus 129PT
Marinobacter aquaeolei VT8
Escherichia coli O157 H7 str Sakai
Photorhabdus luminescens subsp laumondii TTO1
Aeromonas hydrophila subsp hydrophila ATCC 7966
Candidatus Ruthia magnifica str Cm Calyptogena magnifica
Saccharophagus degradans 2 40
Pasteurella multocida subsp multocida str Pm70
Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
Aeromonas salmonicida subsp salmonicida A449
Coxiella burnetii RSA 493
Alkalilimnicola ehrlichei MLHE 1
Psychrobacter sp PRwf 1
Mannheimia succiniciproducens MBEL55E
Idiomarina loihiensis L2TR
Xylella fastidiosa Temecula1
Candidatus Blochmannia floridanus
Sodalis glossinidius str morsitans
Psychromonas ingrahamii 37
Pseudoalteromonas atlantica T6c
Dichelobacter nodosus VCS1703A
Candidatus Blochmannia pennsylvanicus str BPEN
Baumannia cicadellinicola str Hc Homalodisca coagulata
Haemophilus influenzae PittEE
Buchnera aphidicola str Bp Baizongia pistaciae
Candidatus Vesicomyosocius okutanii HA
Serratia proteamaculans 568
Chromohalobacter salexigens DSM 3043
Buchnera aphidicola str Cc Cinara cedri
Acinetobacter sp ADP1
Haemophilus ducreyi 35000HP
Shewanella pealeana ATCC 700345
Alcanivorax borkumensis SK2
Hahella chejuensis KCTC 2396
Vibrio fischeri ES114
Acinetobacter baumannii ATCC 17978
Methylococcus capsulatus str Bath
Colwellia psychrerythraea 34H
Photobacterium profundum SS9
Erwinia carotovora subsp atroseptica SCRI1043
Legionella pneumophila str Corby
Actinobacillus pleuropneumoniae L20
Xanthomonas campestris pv vesicatoria str 85 10
Francisella tularensis subsp novicida U112
Pseudomonas stutzeri A1501
Yersinia enterocolitica subsp enterocolitica 8081
Nitrosococcus oceani ATCC 19707
Pseudoalteromonas haloplanktis TAC125
Marinomonas sp MWYL1
Psychrobacter arcticus 273 4
Thiomicrospira crunogena XCL 2
Actinobacillus succinogenes 130Z
Buchnera aphidicola str APS Acyrthosiphon pisum
20
26
100100
75
100
10
93
53
100
34
76
100
95
100
100
29
32
100
9
100
100
100
100
87
100
91
100
100
22
74100
100
93
100
6585
46
79
99
100
100
100
99
84
66
48
5
100
75
100
46
73
A.
B.
Proteins rRNA
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
ORF00955
gi73748443
gi57234592
gi86609871
gi86605128
gi17229086
gi75910405
gi113476515
gi16329957
gi56752516
gi81300331
gi33862041
gi78779962
gi33865147
gi78213583
gi78184182
gi113953748
gi33863774
gi72382854
gi33241089
gi22298184
gi37521852
ZAVAM73TF
Thermomicrobium roseum DSM 5159
Dehalococcoides sp. CBDB1
Dehalococcoides ethenogenes 195
Synechococcus sp. JA 2 3Bʼa 2 13
Synechococcus sp. JA 3 3Ab
Nostoc sp. PCC 7120
Anabaena variabilis ATCC 29413
Trichodesmium erythraeum IMS101
Synechocystis sp. PCC 6803
Synechococcus elongatus PCC 6301
Synechococcus elongatus PCC 7942
Prochlorococcus marinus subsp. pastoris str. CCMP1986
Prochlorococcus marinus str. MIT 9312
Synechococcus sp. WH 8102
Synechococcus sp. CC9605
Synechococcus sp. CC9902
Synechococcus sp. CC9311
Prochlorococcus marinus str. MIT 9313
Prochlorococcus marinus str. NATL2A
Prochlorococcus marinus subsp. marinus str. CCMP1375
Thermosynechococcus elongatus BP 1
Gloeobacter violaceus PCC 7421
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Search with HMMs
!54
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Search with HMMs
!55
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
What Else Needed for This to Be Better?
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
What Else Needed for This to Be Better?
• More Phylogenetically Diverse Genomes
• More Markers esp. Taxon Specific Markers
• Better / Faster Searching
• Longer reads
• Many sequences at once
• Many genes at once
• Alpha and Beta Diversity
• Automated Masking
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Zorro - Automated Masking
ce to
Tru
e Tr
ee
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
200 400 800 1600 3200
Dist
ance
to T
rue
Tree
Sequence Length
200
no maskingzorrogblocks
200
Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
RepresentativeGenomes
ExtractProtein
Annotation
All v. AllBLAST
HomologyClustering
(MCL)
SFams
Align & Build
HMMs
HMMs
Screen forHomologs
NewGenomes
ExtractProtein
Annotation
Figure 1Sharpton et al. 2013
AB
C
��
�
�
�
�
�� �
�
�
�
�
�
��
��
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
��
�
�
�
��
��
�
�
��
�
�
� ��
�
�
��
��
�
��
�
�
�
�
�
�
�
�
�
�
�
�
��
��
�
�
�
�
�
� �
�
�
��
�
�
� �
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
� �
��
�
�
�
�
�
�
�
��
�
� �
�
�
�
�
�
�
�
�
�
� �
�
�
�
�
�
�
�
�
� �
�
��
�
�
��
�
�
� ��
�
��
�
��
�
�
��
�
��
�
��
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
��
��
�
�
� �
�
�
�
��
�
�
�
��
�
�
�
�
�
�
� �
�
�
��
�
�
�
��
�
�
�
�
�
�
�
�
�
�
� �
�
�
�
�
��
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
��
�
�
��
��
�
�
��
�
�
� �
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
���
�
�
�
�
�
�
�
�
�
��
�
�
�
��
� �
�
�
��
�
�
��
�
�
�
�
�
�
�
�
�
��
�
� ��
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
� �
�
�
�
�
�
��
�
�
�
�
�
�
��
�
�
�
� �
�
�
� �� �
�
�
�
�
� �
�
��
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
��
���
�
�
�
�
�
�
� �
�
�
��
� �
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
��
�
�
��
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
���
�
�
��
�
�
��
��
�
�
�
�
��
�
�
�
�
�
��
�
� �
�
�
��
�
��
�
��
�
��
�
��
��
�
�
�
�
�
�
�
���
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
��
�
�
�
�
�
�
�
��
�
�
��
�
�
�
�
�
��
� �
��
� �
�
�
�
�
�
�
��
�
�� �
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
��
� �
�
�
�
�
�
�
�
��
�
�
�
�
�
�
��
�
�
�
�
�
�
��
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
��
��
�
��
�
�
�
�
�
�
�
�
�� �
�
�
�
�
�
�
�
�
��
��
�
�
��
�
�
���
�
�
�
�
�
�
�
��
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
��
��
�
��
�
�
�
�
�
�
�
�
�
�
�
�� �
�
�
�
�
�
�
�
�
�
�
�
�
�
�� �
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
��
��
�
�
�
�
�
�
�
�
�
�
��
��
�
�
�
�
�
�
�
�
�
�� �
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
� ��
�
�
�
�
�
�
�
�
� �
��
�
�
�
�
�
�
��
�
�
��
�
�
�
�
�
�
� �
�
��
�
�
�
�
� �
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
� �
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
��
�
�
�
�
�
�
��
�
��
�
�
�
� �
�
�
��
�
�
�
��
��
� �
�
�
�
�
��
�
�
�
�
�
�
�
�
��
��
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
��
�
�
�
�
�
�
�
�
�
�
��
�
�
��
�
�
�
��
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
� �
��
�
�
�
�
��
�
��
��
��
�
�
�
�
�
�
� �
��
�
�
�
�
Improving II: More Families
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Kembel Combiner
cally defined by a sequence similarity threshold) in the sampleas equally related. Newer ! diversity measures that incorporatephylogenetic information are more powerful because they ac-count for the degree of divergence between sequences (13, 18,29, 30). Phylogenetic ! diversity measures can also be eitherquantitative or qualitative depending on whether abundance istaken into account. The original, unweighted UniFrac measure(13) is a qualitative measure. Unweighted UniFrac measuresthe distance between two communities by calculating the frac-tion of the branch length in a phylogenetic tree that leads todescendants in either, but not both, of the two communities(Fig. 1A). The fixation index (FST), which measures thedistance between two communities by comparing the geneticdiversity within each community to the total genetic diversity ofthe communities combined (18), is a quantitative measure thataccounts for different levels of divergence between sequences.The phylogenetic test (P test), which measures the significanceof the association between environment and phylogeny (18), istypically used as a qualitative measure because duplicate se-quences are usually removed from the tree. However, the Ptest may be used in a semiquantitative manner if all clones,even those with identical or near-identical sequences, are in-cluded in the tree (13).
Here we describe a quantitative version of UniFrac that wecall “weighted UniFrac.” We show that weighted UniFrac be-haves similarly to the FST test in situations where both are
applicable. However, weighted UniFrac has a major advantageover FST because it can be used to combine data in whichdifferent parts of the 16S rRNA were sequenced (e.g., whennonoverlapping sequences can be combined into a single treeusing full-length sequences as guides). We use two differentdata sets to illustrate how analyses with quantitative and qual-itative ! diversity measures can lead to dramatically differentconclusions about the main factors that structure microbialdiversity. Specifically, qualitative measures that disregard rel-ative abundance can better detect effects of different foundingpopulations, such as the source of bacteria that first colonizethe gut of newborn mice and the effects of factors that arerestrictive for microbial growth such as temperature. In con-trast, quantitative measures that account for the relative abun-dance of microbial lineages can reveal the effects of moretransient factors such as nutrient availability.
MATERIALS AND METHODS
Weighted UniFrac. Weighted UniFrac is a new variant of the original un-weighted UniFrac measure that weights the branches of a phylogenetic treebased on the abundance of information (Fig. 1B). Weighted UniFrac is thus aquantitative measure of ! diversity that can detect changes in how many se-quences from each lineage are present, as well as detect changes in which taxaare present. This ability is important because the relative abundance of differentkinds of bacteria can be critical for describing community changes. In contrast,the original, unweighted UniFrac (Fig. 1A) is a qualitative ! diversity measurebecause duplicate sequences contribute no additional branch length to the tree(by definition, the branch length that separates a pair of duplicate sequences iszero, because no substitutions separate them).
The first step in applying weighted UniFrac is to calculate the raw weightedUniFrac value (u), according to the first equation:
u ! !i
n
bi " "Ai
AT#
Bi
BT"
Here, n is the total number of branches in the tree, bi is the length of branch i,Ai and Bi are the numbers of sequences that descend from branch i in commu-nities A and B, respectively, and AT and BT are the total numbers of sequencesin communities A and B, respectively. In order to control for unequal samplingeffort, Ai and Bi are divided by AT and BT.
If the phylogenetic tree is not ultrametric (i.e., if different sequences in thesample have evolved at different rates), clustering with weighted UniFrac willplace more emphasis on communities that contain quickly evolving taxa. Sincethese taxa are assigned more branch length, a comparison of the communitiesthat contain them will tend to produce higher values of u. In some situations, itmay be desirable to normalize u so that it has a value of 0 for identical commu-nities and 1 for nonoverlapping communities. This is accomplished by dividing uby a scaling factor (D), which is the average distance of each sequence from theroot, as shown in the equation as follows:
D ! !j
n
dj " #Aj
AT$
Bj
BT$
Here, dj is the distance of sequence j from the root, Aj and Bj are the numbersof times the sequences were observed in communities A and B, respectively, andAT and BT are the total numbers of sequences from communities A and B,respectively.
Clustering with normalized u values treats each sample equally instead of
TABLE 1. Measurements of diversity
Measure Measurement of " diversity Measurement of ! diversity
Only presence/absence of taxa considered Qualitative (species richness) QualitativeAdditionally accounts for the no. of times that
each taxon was observedQuantitative (species richness and evenness) Quantitative
FIG. 1. Calculation of the unweighted and the weighted UniFracmeasures. Squares and circles represent sequences from two differentenvironments. (a) In unweighted UniFrac, the distance between thecircle and square communities is calculated as the fraction of thebranch length that has descendants from either the square or the circleenvironment (black) but not both (gray). (b) In weighted UniFrac,branch lengths are weighted by the relative abundance of sequences inthe square and circle communities; square sequences are weightedtwice as much as circle sequences because there are twice as many totalcircle sequences in the data set. The width of branches is proportionalto the degree to which each branch is weighted in the calculations, andgray branches have no weight. Branches 1 and 2 have heavy weightssince the descendants are biased toward the square and circles, respec-tively. Branch 3 contributes no value since it has an equal contributionfrom circle and square sequences after normalization.
VOL. 73, 2007 PHYLOGENETICALLY COMPARING MICROBIAL COMMUNITIES 1577
Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Phylosift - Mining the Global Metagenome
Jonathan Eisen
Students and other staff: - Eric Lowe, John Zhang, David Coil !Open source community: - BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc. etc. !PhyloSift is open source software: -http://phylosift.wordpress.org -http://github.com/gjospin/phylosift
Erick Matsen FHCRC
Todd Treangen BNBI, NBACC
Holly Bik
Tiffanie NelsonMark
BrownAaron Darling
Guillaume Jospin
Supported by DHS Grant
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Output 1: Taxonomy
Taxonomic summary plots in Krona (Ondov et al 2011)
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Tree Reconciliation in PhyloSift
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Tree Reconciliation in PhyloSift
Environmental Sequences
Named Taxa
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Output 2: Phylogenetic Tree of ReadsPlacement tree from 2 week old infant gut data
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Edge PCA: Identify lineages that explain most variation
among samples
Matsen and Evans 2013, Darling et al Submitted.
Output: Edge PCA
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
QIIME and Edge PCA on 110 fecal metagenomes from
Yatsunenko et al 2012 Nature.
Sequenced with 454, to about 150Mbp/metagenome
Darling et al Submitted.
Edge PCA vs. UNIFRAC PCA
Edge PCA: Matsen and Evans 2013
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Output 3: Edge PCA
" Edge PCA for exploratory data analysis (Matsen and Evans 2013)
" Given E edges and S samples:
− For each edge, calculate difference in placement mass on either side of edge
− Results in E x S matrix
− Calculate E x E covariance matrix
− Calculate eigenvectors, eigenvalues of covariance matrix
" Eigenvector: each value indicates how “important” an edge is in explaining differences among the S samples
Example calculating a matrix entry for an edge: This edge gets 5-2=3
mass=5 mass=2
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Phylosift/ pplacer Workflow
Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others
Input Sequences rRNA workflow
protein workflow
profile HMMs used to align candidates to reference alignment
Taxonomic Summaries
parallel option
hmmalign multiple alignment
LAST fast candidate search
pplacer phylogenetic placement
LAST fast candidate search
LAST fast candidate search
search input against references
hmmalign multiple alignment
hmmalign multiple alignment
Infernal multiple alignment
LAST fast candidate search
<600 bp
>600 bp
Sample Analysis & Comparison
Krona plots, Number of reads placed
for each marker gene
Edge PCA, Tree visualization, Bayes factor tests
each
inpu
t seq
uenc
e sc
anne
d ag
ains
t bot
h w
orkf
low
s
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Phylosift Markers
• PMPROK – Dongying Wu’s Bac/Arch markers
• Eukaryotic Orthologs – Parfrey 2011 paper • 16S/18S rRNA • Mitochondria - protein-coding genes • Viral Markers – Markov clustering on
genomes • Codon Subtrees – finer scale taxonomy • Extended Markers – plastids, gene families
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
More Markers
Phylogenetic group
Genome Number
Gene Number
Maker CandidateArchaea 62 145415 106
Actinobacteria 63 267783 136Alphaproteoba 94 347287 121Betaproteobac 56 266362 311Gammaproteo 126 483632 118Deltaproteoba 25 102115 206Epislonproteo 18 33416 455Bacteriodes 25 71531 286Chlamydae 13 13823 560Chloroflexi 10 33577 323Cyanobacteria 36 124080 590Firmicutes 106 312309 87Spirochaetes 18 38832 176Thermi 5 14160 974Thermotogae 9 17037 684
Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Better Reference Tree
Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Stalking the Fourth Domain in Metagenomic Data:Searching for, Discovering, and Interpreting Novel, DeepBranches in Marker Gene Phylogenetic TreesDongying Wu1, Martin Wu1,4, Aaron Halpern2,3, Douglas B. Rusch2,3, Shibu Yooseph2,3, Marvin Frazier2,3,
J. Craig Venter2,3, Jonathan A. Eisen1*
1 Department of Evolution and Ecology, Department of Medical Microbiology and Immunology, University of California Davis Genome Center, University of California
Davis, Davis, California, United States of America, 2 The J. Craig Venter Institute, Rockville, Maryland, United States of America, 3 The J. Craig Venter Institute, La Jolla,
California, United States of America, 4 University of Virginia, Charlottesville, Virginia, United States of America
Abstract
Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from dataassociated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, andculturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generateddirectly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as weargue here, in studies of very early events in the evolution of gene families and of species.
Methodology/Principal Findings: We designed and implemented new methods for analyzing metagenomic data and usedthem to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonlyused in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies.Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties inmaking robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novelbranches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as thesenovel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences.
Conclusions/Significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come fromuncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A thirdpossibility is that some come from novel cellular lineages that are only distantly related to any organisms for whichsequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the treeof life, we suggest that methods such as those described herein currently offer the best way to search for them.
Citation: Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, andInterpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011
Editor: Robert Fleischer, Smithsonian Institution National Zoological Park, United States of America
Received October 25, 2010; Accepted February 20, 2011; Published March 18, 2011
This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the publicdomain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
Funding: The development and main work on this project was supported by the National Science Foundation via an ‘‘Assembling the Tree of Life’’ grant(number 0228651) to to Jonathan A. Eisen and Naomi Ward. The final work on this project was funded by the Gordon and Betty Moore Foundation (throughgrants 0000951 and 0001660). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: [email protected]
Introduction
During the last 30 years, technological advances in nucleic acidsequencing have led to revolutionary changes in our perception ofthe evolutionary relationships among all species as visualized in thetree of life. The first revolution was spawned by the work of CarlWoese and colleagues who, through sequencing and phylogeneticanalysis of fragments of rRNA molecules, demonstrated how thediverse kinds of known cellular organisms could be placed on asingle tree of life [1,2,3]. Most significantly, their analyses revealedthe existence of a third major branch on the tree; the Archaea(then referred to as Archaebacteria) took their place along with theBacteria and the Eukaryota [2]. Several factors make rRNA genesexceptionally powerful for this purpose, the most important beingperhaps that highly conserved, homologous rRNA genes arepresent in all cellular lineages. To this day, analyses of rRNA genes
continue to clarify and extend our knowledge of the evolutionaryrelationships among all life forms [4,5].
For microbial organisms, this approach was restricted to theminority that could be grown in pure culture in the laboratoryuntil Norm Pace and colleagues showed that one could sequencerRNAs directly from environmental samples [6,7]. Initially, themethodology was cumbersome. However, this changed with thedevelopment of the polymerase chain reaction (PCR) methodology[8]. PCR generates many copies of a target segment of DNA,which in turn facilitates cloning and sequencing of that segment.However, delineation of the segment to be amplified requiresprimers, i.e., short segments of DNA whose nucleotide sequence iscomplementary to the DNA flanking the target. Because rRNAgenes contain regions that are very highly conserved, ‘‘universalprimers’’ can be used for PCR amplification of those genes even inenvironmental samples [9,10]. Thus, in principle, one can use
PLoS ONE | www.plosone.org 1 March 2011 | Volume 6 | Issue 3 | e18011
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Figure 1. Phylogenetic tree of the RecA superfamily. All RecA sequences were grouped into clusters using the Lek algorithm. Representativesof each cluster that contained .2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built by from this alignmentusing PHYML; bootstrap values are based on 100 replicas. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RecAsuperfamily are shaded and given a name on the right. Five of the proposed subfamilies contained only GOS sequences at the time of our initialanalysis (RecA-like SAR, Phage SAR1, Phage SAR2, Unknown 1 and Unknown 2) and are highlighted by colored shading. As noted on the tree and inthe text, sequences from two Archaea that were released after our initial analysis group in the Unknown 2 subfamily.doi:10.1371/journal.pone.0018011.g001
Stalking the Fourth Domain
PLoS ONE | www.plosone.org 5 March 2011 | Volume 6 | Issue 3 | e18011
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014
Subfamily RecA AccessionAccession ofLinked Gene Assembly ID Neighboring Gene Description Taxonomy Assignment
Unknown2 1096686533379 1096686533473 1096627390330 aryl-alcohol dehydrogenases relatedoxidoreductases
Eukaryota
Unknown2 1096686533379 1096686533505 1096627390330 snRNP Sm-like protein Chain A Eukaryota
Unknown2 1096689280551 1096689280549 1096627650434 S-adenosylmethionine synthetase Bacteria
RecA-like SAR1 1096683378299 1096683378297 1096627289467 DNA polymerase III alpha subunit Bacteria
Unknown1 1096694953057 1096694953059 1096520459783 FKBP-type peptidyl-prolyl cis-trans isomerase Archaea
Unknown1 1096665977449 1096665977451 1096627520210 single-stranded DNA binding protein Viruses/Phages
Unknown1 1096682182125 1096682182127 1096628394294 DNA polymerase I Bacteria
Five RecA subfamilies were identified as being novel (i.e., only seen in metagenomic data) in our initial analyses. GOS metagenome assemblies that encode members ofthese subfamilies were identified and the genes neighboring the novel RecAs were characterized. The neighboring gene descriptions are based on the top BLASTP hitsagainst the NRAA database; taxonomy assignments are based on their closest neighbor in phylogenetic trees built from the top NRAA BLASTP hits.doi:10.1371/journal.pone.0018011.t002
Table 2. Cont.
Figure 2. The largest assembly from the GOS data that encodes a novel RecA subfamily member (a representative of subfamilyUnknown 2). This GOS assembly (ID 1096627390330) encodes 33 annotated genes plus 16 hypothetical proteins, including several with similarity toknown archaeal genes (e.g., DNA primase, translation initiation factor 2, Table 2). The arrow indicates a novel recA homolog from the Unknown 2subfamily (cluster ID 9).doi:10.1371/journal.pone.0018011.g002
Stalking the Fourth Domain
PLoS ONE | www.plosone.org 7 March 2011 | Volume 6 | Issue 3 | e18011
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014Methods
Identification of deeply-branching ss-rRNA sequencesA data set of 340 representative ss-rRNA sequences from all
three domains was prepared. These sequences represented 134eukaryotic, 186 bacterial, and 20 archaeal species. Alignments for
these 340 sequences were extracted from the European RibosomalRNA database [66] and then manually curated to removecolumns with more than 90% gaps or with poor alignmentquality. Sorcerer II Global Ocean Sampling Expedition (GOS) ss-rRNA sequences were identified by the PhylOTU pipeline [67].Using MUSCLE [68,69], each GOS ss-rRNA sequence was
Figure 3. Phylogenetic tree of the RpoB superfamily. All RpoB sequences were grouped into clusters using the Lek algorithm. Representativesof each cluster that contained .2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built by from this alignmentusing PHYML; bootstrap values are based on 100 replicas. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RpoBsuperfamily are shaded and given a name on the right. The two novel RpoB clades that contain only GOS sequences are highlighted by the coloredpanels.doi:10.1371/journal.pone.0018011.g003
Stalking the Fourth Domain
PLoS ONE | www.plosone.org 9 March 2011 | Volume 6 | Issue 3 | e18011