uc davis eve161 lecture 16 by @phylogenomics

82
Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 Lecture 16: EVE 161: Microbial Phylogenomics Lecture #16: Era IV: Shotgun Metagenomics UC Davis, Winter 2014 Instructor: Jonathan Eisen 1

Upload: jonathan-eisen

Post on 10-May-2015

1.567 views

Category:

Education


1 download

DESCRIPTION

Slides for Lecture 16 in EVE 161 Course by Jonathan Eisen at UC Davis

TRANSCRIPT

Page 1: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Lecture 16:

EVE 161:Microbial Phylogenomics

!Lecture #16:

Era IV: Shotgun Metagenomics !

UC Davis, Winter 2014 Instructor: Jonathan Eisen

!1

Page 2: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Where we are going and where we have been

!Previous lecture: ! 15: Era IV: Shotgun Metagenomics

! Current Lecture: ! 16: Era IV: Metagenomic Diversity and

Binning ! Next Lecture:

! 17: Era IV: Metagenomic Case Study

!2

Page 3: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Era IV: Genomes in the environment

Era IV: Diversity from Metagenomics

Page 4: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Era IV: Genomes in the environment

A Very Simple System

Page 5: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Pierce’s Disease

!5

Page 6: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

• Obligate xylem feeder

• Transmits Xylella between plants

• Much like mosquitoes transmit malarial pathogen

• Only animal listed as possible “bioterror” agent by US DHS

!6

Glassy winged sharpshooter

GLASSY-WINGEDSHARPSHOOTERA Serious Threat to California Agriculture

FROM THE

UNIVERSITY OF CALIFORNIA’S PIERCE’S DISEASE RESEARCH AND

EMERGENCY RESPONSE TASK FORCE

Glassy-winged sharpshooter eggs are laid together on theunderside of leaves, usually in groups of 10 to 12. The eggmasses appear as small, greenish blisters. These blisters areeasier to observe after the eggs hatch, when they appearas tan to brown scars on the leaves.

Parasitized egg masses are tan to brown in color withsmall, circular holes at one end of the eggs.

This informational brochure was produced by ANRCommunication Services for the University of Califor-nia Pierce’s Disease Research and EmergencyResponse Task Force. You may download a copy of thebrochure from the Division of Agriculture and NaturalResources web site at http://danr.ucop.edu or from theCommunication Services web site athttp://danrcs.ucdavis.edu.

Download a copy of this brochure from http://danr.ucop.edu or http://danrcs.ucdavis.edu

For local information, contact your UC CooperativeExtension farm advisor:

Adults

Egg masses

Glassy-winged SharpshooterGeneralized Lifecycle

100

80

60

40

20

0

Jan.

Mar

.

May

July

Sept

.

Nov.

Glassy-winged sharpshooters overwinter as adultsand begin laying egg masses in late Februarythrough May. This first generation matures asadults in late May through late August. Second-generation egg masses are laid starting in mid-June through late September, which develop intoover-wintering adults.

Page 7: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !7

Xylem feeding insects also very successful

Page 8: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Xylem and Phloem

From Lodish et al. 2000

!8

Page 9: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Animal nutrition

• Xylem is frequently missing essential amino acids, vitamins and Co-Factors, and has only small amounts of carbon skeletons

!9

Page 10: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Plant response to sap feeders

• Possible solutions to no aa, vitamins, etc in xylem ! Eat other things ! Evolve metabolic pathways to synthesize missing nutrients ! Find some poor sap to make the stuff for you

!10

Page 11: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Moran N. A. PNAS 2007;104:8627-8633

©2007 by National Academy of Sciences !11

5

Sharpshooter:Cuerna sayi

bacteriomes

Sharpshooters harbor two obligatesymbionts in their bacteriomes

Moran et al. 2003 Environ. Microbiol.Moran et al. 2005 Appl. Environ. Microbiol.

Candidatus “Baumannia cicadellinicola” (Gammaproteobacteria)

Candidatus “Sulcia muelleri” (Bacteroidetes)

D Takiya

0.1mm

Bacteriome dissected from anterior abdomen of H. vitripennis

Orange-red portion- Baumannia only

Yellow portion- Baumannia and Sulcia

(Moran et al. 2003 Environmental Microbiology)

7

10!m

“Candidatus Baumannia cicadellinicola” (Gammaproteobacteria)

in “red” portion of bacteriome of Homalodisca vitripennis

N=host nucleus B=Bacteriocyte membrane E=Endosymbionts

Irregularly spherical

~2 !m diameter

Phylogeny of Sulcia muelleri from Auchenorrhyncha

(Hemiptera): the oldest insect symbiont

Moran et al. Appl Env Micro 2005

Permian

age fossils

(>270 myr)

•= 100% Bootstrap

•support, all methods

Broad congruence with host

relationships

Dates to the origins of vascular

plant-feeding in insects

Symbionts derived

from sharpshooters

Page 12: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

DNA extraction

PCRSequence

rRNA genes

Sequence alignment = Data matrixPhylogenetic tree

PCR

rRNA1

Yeast

Makes lots of copies of the rRNA genes in sample

E. coli

Humans

A

T

T

A

G

A

A

C

A

T

C

A

C

A

A

C

A

G

G

A

G

T

T

CrRNA1

E. coli Humans

Yeast

!12

rRNA1 5’ ...TACAGTATAGGTGGAGCTAGCGATCGAT

CGA... 3’

PCR and phylogenetic analysis of rRNA genes

5

Sharpshooter:Cuerna sayi

bacteriomes

Sharpshooters harbor two obligatesymbionts in their bacteriomes

Moran et al. 2003 Environ. Microbiol.Moran et al. 2005 Appl. Environ. Microbiol.

Candidatus “Baumannia cicadellinicola” (Gammaproteobacteria)

Candidatus “Sulcia muelleri” (Bacteroidetes)

D Takiya

0.1mm

Bacteriome dissected from anterior abdomen of H. vitripennis

Orange-red portion- Baumannia only

Yellow portion- Baumannia and Sulcia

(Moran et al. 2003 Environmental Microbiology)

Page 13: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Baumania is close relative of Buchnera symbionts of aphids

SharpshootersAphidsAphidsAphidsAntsFlies

!13Wu et al. 2006 PLoS Biology 4: e188.

Page 14: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Baumania is close relative of Buchnera symbionts of aphids

SharpshootersAphidsAphidsAphidsAntsFlies

!14Wu et al. 2006 PLoS Biology 4: e188.

Page 15: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !15

Page 16: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !16

Page 17: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Sharpshooter Shotgun Sequencing

shotgun

Wu et al. 2006 PLoS Biology 4: e188.Collaboration with Nancy Moran’s lab

Page 18: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 19: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 20: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 21: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Higher Evolutionary Rates in Endosymbionts

Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’ s Lab

Page 22: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Variation in Evolution Rates Correlated with Repair Gene Presence

MutS MutL

+ +

+ +

+ +

+ +

_ _

_ _

Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’ s Lab

Page 23: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Polymorphisms in Metapopulation

• Data from ~200 hosts –104 SNPs –2 indels

• PCR surveys show that this is between host variation

• Much lower ratio of transitions:transversions than in Blochmannia

• Consistent with absence of MMR from Blochmannia

Page 24: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Predict metabolic networks from Genome

Wu et al. 2006 PLoS Biology 4: e188.

!24

Page 25: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Baumannia is a Vitamin and Cofactor Producing Machine

Wu et al. 2006 PLoS Biology 4: e188.

!25

VITAMIN AND COFACTOR

PRODUCING MACHINE

Page 26: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Baumannia is a Vitamin and Cofactor Producing Machine

Wu et al. 2006 PLoS Biology 4: e188.

!26

NO PATHWAYS FOR ESSENTIAL AMINO ACID SYNTHESIS

Page 27: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !27

Page 28: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 29: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

???????

Page 30: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Commonly Used Binning Methods Did not Work Well

• Assembly –Only Baumannia generated good contigs

• Depth of coverage –Everything else 0-1X coverage

• Nucleotide composition –No detectible peaks in any vector we looked at

Page 31: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

A B C D E F G

T U V W X Y Z

Binning challenge

!30

Page 32: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

A B C D E F G

T U V W X Y Z

Binning challenge

Best binning method: reference genomes

!31

Page 33: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

A B C D E F G

T U V W X Y Z

Binning challenge

No reference genome? What do you do? !Phylogeny ....

Page 34: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

DNA extraction

PCRSequence

rRNA genes

Sequence alignment = Data matrixPhylogenetic tree

PCR

rRNA1

rRNA2

Makes lots of copies of the rRNA genes in sample

rRNA1 5’ ...ACACACATAGGTGGAGCTAGCGATCGAT

CGA... 3’

E. coli

Humans

A

T

T

A

G

A

A

C

A

T

C

A

C

A

A

C

A

G

G

A

G

T

T

CrRNA1

E. coli Humans

rRNA2

!33

rRNA2 5’ ...TACAGTATAGGTGGAGCTAGCGATCGAT

CGA... 3’

PCR and phylogenetic analysis of rRNA genes

!

!"#$%&"''()$*!"#$%& '&()

+#,()$-'.)&

!"#$%&"''()$&/"#$+'$/(0'/'+1-2#()&3.+-'4(& -4/(")-$/+#,()$-'.)&

5'$#4/#*+&,6/7889/-%.)$/%0+1)2$/3)/,05'$#4/#*+&,0+788:/455,0+-%.)$/%0+1)2$/3)/,0+

"#$%&%#'() *!"#$"%%&" '&'"()**&%&'+*", +,#--#./0'102#3'1/&#4

"#$%&%#'() *-#*'&" $#)**).&, +5#3'1/0&%1'1)4

;/<#=-3#

678--

5#3'1/&0-1 %&))13'1%9:/0-9#$'1/&0/9#2%0-1$90:9/012&3.&4)%%&5

;/#$<1=/1%9.0/'&0$= !"#$"%%&" 0$>?9

@1>>0A9.0/'&0$= !"#$"%%&" #$%9-#*'&"

+B0/#$91'9#>79C66D96%2&.+%$)%3"*17&'.+8&+*+9:4

5

Sharpshooter:Cuerna sayi

bacteriomes

Sharpshooters harbor two obligatesymbionts in their bacteriomes

Moran et al. 2003 Environ. Microbiol.Moran et al. 2005 Appl. Environ. Microbiol.

Candidatus “Baumannia cicadellinicola” (Gammaproteobacteria)

Candidatus “Sulcia muelleri” (Bacteroidetes)

D Takiya

0.1mm

Bacteriome dissected from anterior abdomen of H. vitripennis

Orange-red portion- Baumannia only

Yellow portion- Baumannia and Sulcia

(Moran et al. 2003 Environmental Microbiology)

Page 35: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014 !34

Page 36: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014Wu et al. 2006 PLoS Biology 4: e188.

Page 37: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 38: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

???????

Page 39: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

CFB Phyla

Wu et al. 2006 PLoS Biology 4: e188.

Page 40: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Sulcia makes essential amino acids

!38

Page 41: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Sulcia makes essential amino acids

!39

ESSENTIAL AMINO ACID PRODUCING

MACHINE

Page 42: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Wu et al. 2006 PLoS Biology 4: e188.

Baumannia makes vitamins and cofactors

Sulcia makes amino acids

Page 43: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Era IV: Genomes in the environment

Phylogeny & Metagenomes

Page 44: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

• Why analyze protein coding “marker” genes in metagenomic data?

Page 45: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

AMPHORA: AutoMated PHylogenOmic infeRence

Wu and Eisen, 2008. Genome Biology 2008, 9:R151  !doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151

Page 46: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Choosing Marker Genes

• How choose the marker genes to use?

!

Page 47: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Choosing Marker Genes

• How choose the marker genes to use?

!• “We selected these proteins because they are universally

distributed in bacteria; the vast majority of them exist as single copy genes within each genome; and they are housekeeping genes that are involved in information processing (replication, transcription, and translation) or central metabolism, and thus are thought to be relatively recalcitrant to lateral gene transfer [19].”

Wu and Eisen, 2008. Genome Biology 2008, 9:R151  !doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151

Page 48: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Identification and alignment of metagenomic reads?

• How to identify and rapidly align reads that correspond to each of the markers selected?

Page 49: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Identification and alignment of metagenomic reads?

• How to identify and rapidly align reads that correspond to each of the markers selected?

!• Prealign representatives from genomes

• Manually curate to build a “mask”

• Create “Hidden markov model” of the alignment

• Use HMM to search for, and align from metagenomes

!• For more on HMMs see “What is a Hidden Markov Model” by Sean Eddy. http://www.nature.com/nbt/journal/v22/

n10/full/nbt1004-1315.html

Wu and Eisen, 2008. Genome Biology 2008, 9:R151  !doi:10.1186/gb-2008-9-10-r151 http://genomebiology.com/2008/9/10/R151

Page 50: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

HMMs and Profiles

• Useful guide here http://compbio.soe.ucsc.edu/ismb99.handouts/KK185FP.html

Profile HMM

Page 51: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 52: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 20140.2

Kineococcus radiotolerans SRS30216

Nitrosomonas eutropha C91

Flavobacterium psychrophilum JIP02 86

Saccharopolyspora erythraea NRRL 2338

Magnetospirillum magneticum AMB 1

Bacillus subtilis subsp subtilis str 168

Acidobacteria bacterium Ellin345

Rhodopirellula baltica SH 1

Mycobacterium bovis BCG str Pasteur 1173P2

Escherichia coli CFT073

Bacillus thuringiensis serovar konkukian str 97 27

Synechococcus sp CC9902

Leptospira interrogans serovar Lai str 56601

Rickettsia conorii str Malish 7

Helicobacter pylori J99

Shigella flexneri 5 str 8401

Bordetella pertussis Tohama I

Mycobacterium sp JLS

Colwellia psychrerythraea 34H

Lawsonia intracellularis PHE MN1 00

Prochlorococcus marinus str MIT 9303

Streptomyces coelicolor A3 2

Lactobacillus helveticus DPC 4571

Yersinia pestis Nepal516

Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis

Haemophilus influenzae 86 028NP

Campylobacter jejuni subsp jejuni 81116

Burkholderia pseudomallei K96243

Borrelia burgdorferi B31

Serratia proteamaculans 568

Tropheryma whipplei TW08 27

Clostridium perfringens SM101

Anaplasma phagocytophilum HZ

Psychrobacter arcticus 273 4

Fusobacterium nucleatum subsp nucleatum ATCC 25586

Rickettsia bellii OSU 85 389

Dehalococcoides ethenogenes 195

Buchnera aphidicola str Cc Cinara cedri

Candidatus Ruthia magnifica str Cm Calyptogena magnifica

Chlamydophila caviae GPIC

Roseobacter denitrificans OCh 114

Shewanella denitrificans OS217

Buchnera aphidicola str APS Acyrthosiphon pisum

Geobacillus thermodenitrificans NG80 2

Parabacteroides distasonis ATCC 8503

Corynebacterium jeikeium K411

Yersinia pseudotuberculosis IP 32953

Francisella tularensis subsp tularensis SCHU S4

Streptococcus pyogenes MGAS315

Zymomonas mobilis subsp mobilis ZM4

Bacillus anthracis str Ames

Shewanella frigidimarina NCIMB 400

Streptococcus gordonii str Challis substr CH1

Thiomicrospira crunogena XCL 2

Streptococcus pyogenes str Manfredo

Pseudomonas putida KT2440

Mycobacterium leprae TN

Staphylococcus saprophyticus subsp saprophyticus ATCC 15305

Shewanella sp MR 7

Rhodopseudomonas palustris HaA2

Geobacter sulfurreducens PCA

Lactococcus lactis subsp cremoris SK11

Nitrosococcus oceani ATCC 19707

Rhodopseudomonas palustris BisB18

Synechococcus elongatus PCC 6301

Pelotomaculum thermopropionicum SI

Synechocystis sp PCC 6803

Sorangium cellulosum So ce 56

Wolinella succinogenes DSM 1740

Moorella thermoacetica ATCC 39073

Gluconobacter oxydans 621H

Mycoplasma capricolum subsp capricolum ATCC 27343

Actinobacillus succinogenes 130Z

Baumannia cicadellinicola str Hc Homalodisca coagulata

Pseudomonas syringae pv syringae B728aHahella chejuensis KCTC 2396

Gramella forsetii KT0803

Brucella abortus biovar 1 str 9 941

Ochrobactrum anthropi ATCC 49188

Campylobacter fetus subsp fetus 82 40

Burkholderia mallei ATCC 23344

Anaplasma marginale str St Maries

Frankia sp CcI3

Dehalococcoides sp BAV1

Mycoplasma genitalium G37

Mycoplasma hyopneumoniae 7448

Synechococcus sp JA 3 3Ab

Parvibaculum lavamentivorans DS 1

Lactobacillus johnsonii NCC 533

Onion yellows phytoplasma OY M

Mycoplasma mycoides subsp mycoides SC str PG1

Desulfococcus oleovorans Hxd3

Escherichia coli 536

Methylobacterium extorquens PA1

Nitrosomonas europaea ATCC 19718

Campylobacter curvus 525 92

Nitrosospira multiformis ATCC 25196

Pseudomonas aeruginosa PAO1

Burkholderia multivorans ATCC 17616

Chlamydophila abortus S26 3

Shewanella loihica PV 4

Candidatus Blochmannia pennsylvanicus str BPEN

Xylella fastidiosa 9a5c

Chlamydophila felis Fe C 56

Caulobacter crescentus CB15

Orientia tsutsugamushi Boryong

Leptospira borgpetersenii serovar Hardjo bovis JB197

Rhodopseudomonas palustris BisA53

Marinomonas sp MWYL1

Chlamydophila pneumoniae TW 183

Brucella suis 1330

Renibacterium salmoninarum ATCC 33209

Citrobacter koseri ATCC BAA 895

Mycobacterium bovis AF2122 97

Lactococcus lactis subsp lactis Il1403

Campylobacter hominis ATCC BAA 381

Polynucleobacter sp QLW P1DMWA 1

Coxiella burnetii RSA 331

Bifidobacterium longum NCC2705

Rickettsia rickettsii str Iowa

Bacillus cereus ATCC 10987

Synechococcus sp CC9605

Burkholderia xenovorans LB400

Prosthecochloris vibrioformis DSM 265

Streptococcus pyogenes MGAS10270

Listeria monocytogenes EGD e

Burkholderia cenocepacia AU 1054

Trichodesmium erythraeum IMS101

Streptococcus thermophilus CNRZ1066

Haemophilus somnus 129PT

Bacteroides fragilis YCH46

Streptococcus pyogenes SSI 1

Shewanella sp MR 4

Chlamydophila pneumoniae CWL029

Myxococcus xanthus DK 1622

Pelobacter carbinolicus DSM 2380

Actinobacillus pleuropneumoniae serovar 3 str JL03

Rickettsia prowazekii str Madrid E

Listeria welshimeri serovar 6b str SLCC5334

Salmonella enterica subsp enterica serovar Paratyphi B str SPB7

Bartonella henselae str Houston 1

Shewanella oneidensis MR 1

Rubrobacter xylanophilus DSM 9941

Marinobacter aquaeolei VT8

Neisseria meningitidis FAM18

Acidovorax avenae subsp citrulli AAC00 1

Staphylococcus aureus subsp aureus N315

Prochlorococcus marinus str MIT 9301

Oenococcus oeni PSU 1

Clostridium botulinum A str ATCC 19397

Streptococcus sanguinis SK36

Burkholderia pseudomallei 1710b

Bradyrhizobium sp BTAi1

Helicobacter pylori HPAG1

Staphylococcus aureus subsp aureus MW2

Pseudomonas aeruginosa PA7

Arthrobacter aurescens TC1

Candidatus Sulcia muelleri GWSS

Propionibacterium acnes KPA171202

Clostridium botulinum A str Hall

Psychrobacter cryohalolentis K5

Ralstonia solanacearum GMI1000

Polaromonas naphthalenivorans CJ2

Mycoplasma synoviae 53

Neisseria meningitidis MC58

Campylobacter jejuni subsp jejuni 81 176

Lactobacillus reuteri F275

Prochlorococcus marinus subsp marinus str CCMP1375

Rickettsia canadensis str McKiel

Lactococcus lactis subsp cremoris MG1363

Prochlorococcus marinus str MIT 9515

Campylobacter concisus 13826

Bifidobacterium adolescentis ATCC 15703

Lactobacillus gasseri ATCC 33323

Pseudomonas putida F1

Borrelia garinii PBi

Ralstonia metallidurans CH34

Alcanivorax borkumensis SK2

Streptococcus thermophilus LMG 18311

Bacillus thuringiensis str Al Hakam

Herpetosiphon aurantiacus ATCC 23779

Azoarcus sp EbN1

Corynebacterium glutamicum R

Bacteroides thetaiotaomicron VPI 5482

Bordetella bronchiseptica RB50

Synechococcus sp JA 2 3B a 2 13

Xanthomonas oryzae pv oryzae MAFF 311018

Treponema denticola ATCC 35405

Carboxydothermus hydrogenoformans Z 2901

Campylobacter jejuni RM1221

Mycoplasma pulmonis UAB CTIP

Candidatus Protochlamydia amoebophila UWE25

Lactobacillus sakei subsp sakei 23K

Aeromonas hydrophila subsp hydrophila ATCC 7966

Shewanella sp W3 18 1

Thermus thermophilus HB8

Pseudomonas syringae pv phaseolicola 1448A

Rickettsia massiliae MTU5

Helicobacter hepaticus ATCC 51449

Mesoplasma florum L1

Mycoplasma hyopneumoniae 232

Bacillus cereus E33L

Staphylococcus aureus subsp aureus MSSA476

Azoarcus sp BH72

Silicibacter pomeroyi DSS 3

Campylobacter jejuni subsp doylei 269 97

Borrelia afzelii PKo

Ehrlichia canis str Jake

Bordetella petrii

Desulfotomaculum reducens MI 1

Gloeobacter violaceus PCC 7421

Escherichia coli O157 H7 EDL933

Staphylococcus aureus subsp aureus JH9

Nitrobacter hamburgensis X14

Rickettsia typhi str Wilmington

Novosphingobium aromaticivorans DSM 12444

Staphylococcus aureus subsp aureus USA300 TCH1516

Clostridium beijerinckii NCIMB 8052

Bacillus clausii KSM K16

Xanthomonas oryzae pv oryzae KACC10331

Clostridium botulinum F str Langeland

Lactobacillus salivarius subsp salivarius UCC118

Pediococcus pentosaceus ATCC 25745

Burkholderia mallei NCTC 10229

Francisella tularensis subsp tularensis FSC198

Symbiobacterium thermophilum IAM 14863

Pasteurella multocida subsp multocida str Pm70

Bacillus weihenstephanensis KBAB4

Sphingopyxis alaskensis RB2256

Chlamydia trachomatis A HAR 13

Solibacter usitatus Ellin6076

Delftia acidovorans SPH 1

Sinorhizobium medicae WSM419

Psychrobacter sp PRwf 1

Candidatus Pelagibacter ubique HTCC1062

Shewanella baltica OS195

Synechococcus sp CC9311

Aster yellows witches broom phytoplasma AYWB

Salinispora arenicola CNS 205

Dinoroseobacter shibae DFL 12

Pelodictyon luteolum DSM 273

Arthrobacter sp FB24

Prochlorococcus marinus str MIT 9211

Nocardioides sp JS614

Corynebacterium glutamicum ATCC 13032

Synechococcus sp WH 8102

Maricaulis maris MCS10

Methylococcus capsulatus str Bath

Clostridium botulinum A str ATCC 3502

Bacillus pumilus SAFR 032

Thiobacillus denitrificans ATCC 25259

Lactobacillus plantarum WCFS1

Thermoanaerobacter pseudethanolicus ATCC 33223

Bacillus halodurans C 125

Geobacter uraniireducens Rf4

Clostridium acetobutylicum ATCC 824

Thermotoga maritima MSB8

Francisella tularensis subsp tularensis WY96 3418

Anaeromyxobacter dehalogenans 2CP C

Thermobifida fusca YX

Erwinia carotovora subsp atroseptica SCRI1043

Yersinia pestis KIM

Leuconostoc mesenteroides subsp mesenteroides ATCC 8293

Staphylococcus aureus subsp aureus JH1

Photobacterium profundum SS9

Fervidobacterium nodosum Rt17 B1

Rhodobacter sphaeroides 2 4 1

Roseiflexus sp RS 1

Ehrlichia chaffeensis str Arkansas

Streptococcus agalactiae NEM316

Pseudomonas aeruginosa UCBPP PA14

Escherichia coli K12

Sulfurovum sp NBC37 1

Clostridium phytofermentans ISDg

Chlamydia trachomatis 434 Bu

Streptococcus pyogenes MGAS2096

Salmonella enterica subsp arizonae serovar 62 z4 z23

Hyphomonas neptunium ATCC 15444

Silicibacter sp TM1040

Streptococcus suis 05ZYH33

Francisella tularensis subsp holarctica OSU18

Mycobacterium ulcerans Agy99

Prochlorococcus marinus str NATL1A

Escherichia coli O157 H7 str Sakai

Mannheimia succiniciproducens MBEL55E

Chlorobium chlorochromatii CaD3

Yersinia pestis Antiqua

Shewanella sp ANA 3

Mesorhizobium loti MAFF303099

Shewanella putrefaciens CN 32

Wolbachia endosymbiont strain TRS of Brugia malayi

Mycoplasma hyopneumoniae J

Acidovorax sp JS42

Syntrophomonas wolfei subsp wolfei str Goettingen

Alkaliphilus oremlandii OhILAs

Deinococcus geothermalis DSM 11300

Pseudoalteromonas haloplanktis TAC125

Ralstonia eutropha H16

Burkholderia ambifaria AMMD

Streptococcus pyogenes MGAS8232

Wolbachia endosymbiont of Drosophila melanogaster

Acinetobacter baumannii ATCC 17978

Neisseria gonorrhoeae FA 1090

Herminiimonas arsenicoxydans

Clostridium perfringens str 13

Lactobacillus brevis ATCC 367

Prochlorococcus marinus str NATL2A

Shigella sonnei Ss046

Brucella melitensis 16M

Chlamydophila pneumoniae AR39

Xanthobacter autotrophicus Py2

Mycobacterium tuberculosis H37Rv

Enterobacter sakazakii ATCC BAA 894Enterobacter sp 638

Legionella pneumophila str Lens

Rickettsia felis URRWXCal2

Staphylococcus epidermidis RP62A

Jannaschia sp CCS1

Streptomyces avermitilis MA 4680

Bradyrhizobium japonicum USDA 110

Tropheryma whipplei str Twist

Magnetococcus sp MC 1

Verminephrobacter eiseniae EF01 2

Polaromonas sp JS666

Mycobacterium smegmatis str MC2 155

Treponema pallidum subsp pallidum str Nichols

Salinibacter ruber DSM 13855

Mycoplasma pneumoniae M129

Streptococcus pyogenes M1 GAS

Francisella tularensis subsp holarctica FTA

Desulfotalea psychrophila LSv54

Escherichia coli APEC O1

Ehrlichia ruminantium str Gardel

Leptospira interrogans serovar Copenhageni str Fiocruz L1 130

Lactobacillus delbrueckii subsp bulgaricus ATCC BAA 365

Desulfovibrio desulfuricans G20

Streptococcus pneumoniae D39

Brucella canis ATCC 23365

Chromohalobacter salexigens DSM 3043

Streptococcus agalactiae A909

Prochlorococcus marinus str MIT 9312

Haemophilus ducreyi 35000HP

Leptospira borgpetersenii serovar Hardjo bovis L550

Pseudomonas entomophila L48

Thermus thermophilus HB27

Thermotoga lettingae TMO

Salmonella enterica subsp enterica serovar Typhi str Ty2

Salmonella enterica subsp enterica serovar Choleraesuis str SC B67

Haemophilus influenzae PittGG

Bartonella bacilliformis KC583

Clostridium kluyveri DSM 555

Mycobacterium tuberculosis H37Ra

Burkholderia vietnamiensis G4

Syntrophus aciditrophicus SB

Streptococcus pyogenes MGAS6180

Helicobacter pylori 26695

Rhodobacter sphaeroides ATCC 17025

Mycobacterium vanbaalenii PYR 1

Candidatus Vesicomyosocius okutanii HA

Pseudoalteromonas atlantica T6c

Lactobacillus delbrueckii subsp bulgaricus ATCC 11842

Staphylococcus aureus subsp aureus str Newman

Mycoplasma agalactiae PG2

Bacillus anthracis str Ames Ancestor

Thermoanaerobacter sp X514

Staphylococcus aureus subsp aureus MRSA252

Bacillus licheniformis ATCC 14580

Salinispora tropica CNB 440

Legionella pneumophila str Paris

Arcobacter butzleri RM4018

Burkholderia mallei SAVP1

Pseudomonas stutzeri A1501

Xylella fastidiosa Temecula1

Rhodobacter sphaeroides ATCC 17029

Bdellovibrio bacteriovorus HD100

Haemophilus influenzae Rd KW20

Pseudomonas putida GB 1

Francisella tularensis subsp holarctica

Mycobacterium avium subsp paratuberculosis K 10

Rhodoferax ferrireducens T118

Clostridium difficile 630

Sodalis glossinidius str morsitans

Neisseria meningitidis Z2491

Clostridium tetani E88

Streptococcus thermophilus LMD 9

Vibrio vulnificus YJ016

Sphingomonas wittichii RW1

Staphylococcus aureus subsp aureus Mu50

Deinococcus radiodurans R1

Shigella boydii Sb227

Pseudomonas fluorescens Pf 5

Thermotoga petrophila RKU 1

Chlamydophila pneumoniae J138

Synechococcus sp RCC307

Rickettsia rickettsii str Sheila Smith

Lactobacillus acidophilus NCFM

Microcystis aeruginosa NIES 843

Desulfovibrio vulgaris subsp vulgaris str Hildenborough

Streptococcus suis 98HAH33

Burkholderia pseudomallei 1106a

Frankia sp EAN1pec

Mycobacterium tuberculosis CDC1551

Xanthomonas campestris pv vesicatoria str 85 10

Pseudomonas mendocina ymp

Chlorobium phaeobacteroides DSM 266

Brucella melitensis biovar Abortus 2308

Shigella dysenteriae Sd197

Photorhabdus luminescens subsp laumondii TTO1

Buchnera aphidicola str Sg Schizaphis graminum

Acaryochloris marina MBIC11017

Candidatus Blochmannia floridanus

Ehrlichia ruminantium str Welgevonden

Burkholderia cenocepacia HI2424

Syntrophobacter fumaroxidans MPOB

Bordetella parapertussis 12822

Salmonella typhimurium LT2

Sinorhizobium meliloti 1021

Streptococcus pyogenes MGAS10394

Bacteroides fragilis NCTC 9343

Rhodopseudomonas palustris BisB5

Burkholderia thailandensis E264

Gluconacetobacter diazotrophicus PAl 5

Agrobacterium tumefaciens str C58

Escherichia coli HS

Bacillus cereus subsp cytotoxis NVH 391 98

Shewanella baltica OS155

Haemophilus influenzae PittEE

Vibrio harveyi ATCC BAA 1116

Ralstonia eutropha JMP134

Petrotoga mobilis SJ95

Mycoplasma penetrans HF 2

Dechloromonas aromatica RCB

Staphylococcus aureus subsp aureus Mu3

Bartonella quintana str Toulouse

Campylobacter jejuni subsp jejuni NCTC 11168

Mycobacterium tuberculosis F11

Thermoanaerobacter tengcongensis MB4

Yersinia pestis biovar Microtus str 91001

Neorickettsia sennetsu str Miyayama

Xanthomonas axonopodis pv citri str 306

Leifsonia xyli subsp xyli str CTCB07

Desulfovibrio vulgaris subsp vulgaris DP4

Streptococcus pyogenes MGAS9429

Nitrobacter winogradskyi Nb 255

Methylobacillus flagellatus KT

Mycobacterium sp KMS

Staphylococcus haemolyticus JCSC1435

Pseudomonas fluorescens PfO 1

Shigella flexneri 2a str 2457T

Coxiella burnetii RSA 493

Neisseria meningitidis 053442

Coxiella burnetii Dugway 5J108 111

Bacillus amyloliquefaciens FZB42

Chlamydia trachomatis D UW 3 CX

Roseiflexus castenholzii DSM 13941

Salmonella enterica subsp enterica serovar Paratyphi A str ATCC 9150

Staphylococcus aureus subsp aureus COL

Azorhizobium caulinodans ORS 571

Yersinia enterocolitica subsp enterocolitica 8081

Prochlorococcus marinus str MIT 9313

Vibrio cholerae O395

Clostridium thermocellum ATCC 27405

Shigella flexneri 2a str 301

Anaeromyxobacter sp Fw109 5

Brucella ovis ATCC 25840

Paracoccus denitrificans PD1222

Pseudomonas syringae pv tomato str DC3000

Shewanella amazonensis SB2B

Bartonella tribocorum CIP 105476

Nitratiruptor sp SB155 2

Shewanella sediminis HAW EB3

Alkaliphilus metalliredigens QYMF

Bacillus cereus ATCC 14579

Burkholderia mallei NCTC 10247

Synechococcus elongatus PCC 7942

Burkholderia pseudomallei 668

Staphylococcus aureus RF122

Salmonella enterica subsp enterica serovar Typhi str CT18

Staphylococcus epidermidis ATCC 12228

Geobacillus kaustophilus HTA426

Streptococcus agalactiae 2603V R

Streptococcus mutans UA159

Lactobacillus casei ATCC 334

Acinetobacter sp ADP1

Chromobacterium violaceum ATCC 12472

Chlorobium tepidum TLS

Cytophaga hutchinsonii ATCC 33406

Streptococcus pyogenes MGAS5005

Rickettsia bellii RML369 C

Yersinia pseudotuberculosis IP 31758

Rhizobium etli CFN 42

Helicobacter acinonychis str Sheeba

Dichelobacter nodosus VCS1703A

Janthinobacterium sp Marseille

Alkalilimnicola ehrlichei MLHE 1

Idiomarina loihiensis L2TR

Mycoplasma gallisepticum R

Yersinia pestis CO92

Oceanobacillus iheyensis HTE831

Buchnera aphidicola str Bp Baizongia pistaciae

Yersinia pestis Angola

Aeromonas salmonicida subsp salmonicida A449

Porphyromonas gingivalis W83

Granulibacter bethesdensis CGDNIH1

Frankia alni ACN14a

Dehalococcoides sp CBDB1

Mycobacterium avium 104

Listeria monocytogenes str 4b F2365

Nostoc sp PCC 7120

Chlamydia muridarum Nigg

Methylibium petroleiphilum PM1

Streptococcus pyogenes MGAS10750

Enterococcus faecalis V583

Vibrio vulnificus CMCP6

Acholeplasma laidlawii PG 8A

Prochlorococcus marinus str AS9601

Halorhodospira halophila SL1

Xanthomonas campestris pv campestris str 8004

Escherichia coli UTI89

Shewanella pealeana ATCC 700345

Rickettsia akari str Hartford

Caldicellulosiruptor saccharolyticus DSM 8903

Staphylococcus aureus subsp aureus NCTC 8325

Prochlorococcus marinus subsp pastoris str CCMP1986

Vibrio parahaemolyticus RIMD 2210633

Psychromonas ingrahamii 37

Legionella pneumophila str Corby

Listeria innocua Clip11262

Klebsiella pneumoniae subsp pneumoniae MGH 78578

Staphylococcus aureus subsp aureus USA300

Yersinia pestis Pestoides F

Clostridium novyi NT

Rhodopseudomonas palustris CGA009

Bacillus anthracis str Sterne

Flavobacterium johnsoniae UW101

Streptococcus pneumoniae R6

Chlamydia trachomatis L2b UCH 1 proctitis

Sulfurimonas denitrificans DSM 1251

Bacteroides vulgatus ATCC 8482

Thermosipho melanesiensis BI429

Erythrobacter litoralis HTCC2594

Chloroflexus aurantiacus J 10 fl

Ureaplasma parvum serovar 3 str ATCC 700970

Rhodospirillum rubrum ATCC 11170

Mycoplasma mobile 163K

Aquifex aeolicus VF5

Acidothermus cellulolyticus 11B

Anabaena variabilis ATCC 29413

Vibrio cholerae O1 biovar eltor str N16961

Vibrio fischeri ES114

Burkholderia sp 383

Geobacter metallireducens GS 15

Desulfitobacterium hafniense Y51

Mycobacterium sp MCS

Acidiphilium cryptum JF 5

Brucella suis ATCC 23445

Corynebacterium diphtheriae NCTC 13129

Synechococcus sp WH 7803

Mycobacterium gilvum PYR GCK

Francisella tularensis subsp novicida U112

Rhodococcus sp RHA1

Clavibacter michiganensis subsp michiganensis NCPPB 382

Bradyrhizobium sp ORS278

Corynebacterium efficiens YS 314

Mesorhizobium sp BNC1

Thermosynechococcus elongatus BP 1

Nocardia farcinica IFM 10152

Shewanella baltica OS185

Escherichia coli E24377A

Saccharophagus degradans 2 40

Legionella pneumophila subsp pneumophila str Philadelphia 1

Actinobacillus pleuropneumoniae L20

Prochlorococcus marinus str MIT 9215

Xanthomonas campestris pv campestris str ATCC 33913

Rhizobium leguminosarum bv viciae 3841

Streptococcus pneumoniae TIGR4

Pelobacter propionicus DSM 2379

Clostridium perfringens ATCC 13124

Gammaproteobacteria

Betaproteobacteria

Alphaproteobacteria

Epsilonproteobacteria

Deltaproteobacteria

Acidobacteria

Bacteroidetes/Chlorobi

Chlamydiae/Planctomycetes

Spirochaetes

Firmicutes

Cyanobacteria

Chloroflexi

Actinobacteria

Aquificae

Thermotogae

Deinococcus/Thermus

Firmicutes

Fusobacteria

Page 53: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 54: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

0.07

Sodalis glossinidius str. 'morsitans'

Thiomicrospira crunogena XCL-2

Acinetobacter sp. ADP1

Pseudoalteromonas haloplanktis TAC125

Marinobacter aquaeolei VT8

Colwellia psychrerythraea 34H

Pseudomonas stutzeri A1501

Haemophilus somnus 129PT

Alkalilimnicola ehrlichei MLHE-1

Mannheimia succiniciproducens MBEL55E

Psychromonas ingrahamii 37

Xanthomonas campestris pv. vesicatoria str. 85-10

Methylococcus capsulatus str. Bath

Shewanella pealeana ATCC 700345

Candidatus Ruthia magnifica str. Cm Calyptogena magnifica

Vibrio fischeri ES114

Erwinia carotovora subsp. atroseptica SCRI1043

Coxiella burnetii RSA 493

Buchnera aphidicola str. Cc Cinara cedri

Francisella tularensis subsp. novicida U112

Psychrobacter sp. PRwf-1

Xylella fastidiosa Temecula1

Hahella chejuensis KCTC 2396

Nitrosococcus oceani ATCC 19707

Chromohalobacter salexigens DSM 3043

Acinetobacter baumannii ATCC 17978

Haemophilus influenzae PittEE

Serratia proteamaculans 568

Escherichia coli O157H7 str. Sakai

Saccharophagus degradans 2-40

Photorhabdus luminescens subsp. laumondii TTO1

Candidatus Blochmannia floridanus

Aeromonas hydrophila subsp. hydrophila ATCC 7966

Dichelobacter nodosus VCS1703A

Marinomonas sp. MWYL1

Baumannia cicadellinicola str. Hc Homalodisca coagulataBuchnera aphidicola str. APS Acyrthosiphon pisum

Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis

Photobacterium profundum SS9

Actinobacillus pleuropneumoniae L20

Buchnera aphidicola str. Bp Baizongia pistaciae

Candidatus Vesicomyosocius okutanii HA

Candidatus Blochmannia pennsylvanicus str. BPEN

Buchnera aphidicola str. Sg Schizaphis graminum

Pseudoalteromonas atlantica T6c

Pasteurella multocida subsp. multocida str. Pm70

Halorhodospira halophila SL1

Haemophilus ducreyi 35000HP

Idiomarina loihiensis L2TR

Yersinia enterocolitica subsp. enterocolitica 8081

Legionella pneumophila str. Corby

Actinobacillus succinogenes 130Z

Alcanivorax borkumensis SK2

Aeromonas salmonicida subsp. salmonicida A449

Psychrobacter arcticus 273-4

54

37

10014

10

28

50

10048

25

45

99

98

100

100

52

37

100

83

100

10

43

24

100

100

8

94

86

100

75

100

37

45

56

31

49

100

64

22

94

23

38

7678

45

22

14100

72

4

66

2

0.1

Halorhodospira halophila SL1

Buchnera aphidicola str Sg Schizaphis graminum

Haemophilus somnus 129PT

Marinobacter aquaeolei VT8

Escherichia coli O157 H7 str Sakai

Photorhabdus luminescens subsp laumondii TTO1

Aeromonas hydrophila subsp hydrophila ATCC 7966

Candidatus Ruthia magnifica str Cm Calyptogena magnifica

Saccharophagus degradans 2 40

Pasteurella multocida subsp multocida str Pm70

Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis

Aeromonas salmonicida subsp salmonicida A449

Coxiella burnetii RSA 493

Alkalilimnicola ehrlichei MLHE 1

Psychrobacter sp PRwf 1

Mannheimia succiniciproducens MBEL55E

Idiomarina loihiensis L2TR

Xylella fastidiosa Temecula1

Candidatus Blochmannia floridanus

Sodalis glossinidius str morsitans

Psychromonas ingrahamii 37

Pseudoalteromonas atlantica T6c

Dichelobacter nodosus VCS1703A

Candidatus Blochmannia pennsylvanicus str BPEN

Baumannia cicadellinicola str Hc Homalodisca coagulata

Haemophilus influenzae PittEE

Buchnera aphidicola str Bp Baizongia pistaciae

Candidatus Vesicomyosocius okutanii HA

Serratia proteamaculans 568

Chromohalobacter salexigens DSM 3043

Buchnera aphidicola str Cc Cinara cedri

Acinetobacter sp ADP1

Haemophilus ducreyi 35000HP

Shewanella pealeana ATCC 700345

Alcanivorax borkumensis SK2

Hahella chejuensis KCTC 2396

Vibrio fischeri ES114

Acinetobacter baumannii ATCC 17978

Methylococcus capsulatus str Bath

Colwellia psychrerythraea 34H

Photobacterium profundum SS9

Erwinia carotovora subsp atroseptica SCRI1043

Legionella pneumophila str Corby

Actinobacillus pleuropneumoniae L20

Xanthomonas campestris pv vesicatoria str 85 10

Francisella tularensis subsp novicida U112

Pseudomonas stutzeri A1501

Yersinia enterocolitica subsp enterocolitica 8081

Nitrosococcus oceani ATCC 19707

Pseudoalteromonas haloplanktis TAC125

Marinomonas sp MWYL1

Psychrobacter arcticus 273 4

Thiomicrospira crunogena XCL 2

Actinobacillus succinogenes 130Z

Buchnera aphidicola str APS Acyrthosiphon pisum

20

26

100100

75

100

10

93

53

100

34

76

100

95

100

100

29

32

100

9

100

100

100

100

87

100

91

100

100

22

74100

100

93

100

6585

46

79

99

100

100

100

99

84

66

48

5

100

75

100

46

73

A.

B.

0.07

Sodalis glossinidius str. 'morsitans'

Thiomicrospira crunogena XCL-2

Acinetobacter sp. ADP1

Pseudoalteromonas haloplanktis TAC125

Marinobacter aquaeolei VT8

Colwellia psychrerythraea 34H

Pseudomonas stutzeri A1501

Haemophilus somnus 129PT

Alkalilimnicola ehrlichei MLHE-1

Mannheimia succiniciproducens MBEL55E

Psychromonas ingrahamii 37

Xanthomonas campestris pv. vesicatoria str. 85-10

Methylococcus capsulatus str. Bath

Shewanella pealeana ATCC 700345

Candidatus Ruthia magnifica str. Cm Calyptogena magnifica

Vibrio fischeri ES114

Erwinia carotovora subsp. atroseptica SCRI1043

Coxiella burnetii RSA 493

Buchnera aphidicola str. Cc Cinara cedri

Francisella tularensis subsp. novicida U112

Psychrobacter sp. PRwf-1

Xylella fastidiosa Temecula1

Hahella chejuensis KCTC 2396

Nitrosococcus oceani ATCC 19707

Chromohalobacter salexigens DSM 3043

Acinetobacter baumannii ATCC 17978

Haemophilus influenzae PittEE

Serratia proteamaculans 568

Escherichia coli O157H7 str. Sakai

Saccharophagus degradans 2-40

Photorhabdus luminescens subsp. laumondii TTO1

Candidatus Blochmannia floridanus

Aeromonas hydrophila subsp. hydrophila ATCC 7966

Dichelobacter nodosus VCS1703A

Marinomonas sp. MWYL1

Baumannia cicadellinicola str. Hc Homalodisca coagulataBuchnera aphidicola str. APS Acyrthosiphon pisum

Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis

Photobacterium profundum SS9

Actinobacillus pleuropneumoniae L20

Buchnera aphidicola str. Bp Baizongia pistaciae

Candidatus Vesicomyosocius okutanii HA

Candidatus Blochmannia pennsylvanicus str. BPEN

Buchnera aphidicola str. Sg Schizaphis graminum

Pseudoalteromonas atlantica T6c

Pasteurella multocida subsp. multocida str. Pm70

Halorhodospira halophila SL1

Haemophilus ducreyi 35000HP

Idiomarina loihiensis L2TR

Yersinia enterocolitica subsp. enterocolitica 8081

Legionella pneumophila str. Corby

Actinobacillus succinogenes 130Z

Alcanivorax borkumensis SK2

Aeromonas salmonicida subsp. salmonicida A449

Psychrobacter arcticus 273-4

54

37

10014

10

28

50

10048

25

45

99

98

100

100

52

37

100

83

100

10

43

24

100

100

8

94

86

100

75

100

37

45

56

31

49

100

64

22

94

23

38

7678

45

22

14100

72

4

66

2

0.1

Halorhodospira halophila SL1

Buchnera aphidicola str Sg Schizaphis graminum

Haemophilus somnus 129PT

Marinobacter aquaeolei VT8

Escherichia coli O157 H7 str Sakai

Photorhabdus luminescens subsp laumondii TTO1

Aeromonas hydrophila subsp hydrophila ATCC 7966

Candidatus Ruthia magnifica str Cm Calyptogena magnifica

Saccharophagus degradans 2 40

Pasteurella multocida subsp multocida str Pm70

Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis

Aeromonas salmonicida subsp salmonicida A449

Coxiella burnetii RSA 493

Alkalilimnicola ehrlichei MLHE 1

Psychrobacter sp PRwf 1

Mannheimia succiniciproducens MBEL55E

Idiomarina loihiensis L2TR

Xylella fastidiosa Temecula1

Candidatus Blochmannia floridanus

Sodalis glossinidius str morsitans

Psychromonas ingrahamii 37

Pseudoalteromonas atlantica T6c

Dichelobacter nodosus VCS1703A

Candidatus Blochmannia pennsylvanicus str BPEN

Baumannia cicadellinicola str Hc Homalodisca coagulata

Haemophilus influenzae PittEE

Buchnera aphidicola str Bp Baizongia pistaciae

Candidatus Vesicomyosocius okutanii HA

Serratia proteamaculans 568

Chromohalobacter salexigens DSM 3043

Buchnera aphidicola str Cc Cinara cedri

Acinetobacter sp ADP1

Haemophilus ducreyi 35000HP

Shewanella pealeana ATCC 700345

Alcanivorax borkumensis SK2

Hahella chejuensis KCTC 2396

Vibrio fischeri ES114

Acinetobacter baumannii ATCC 17978

Methylococcus capsulatus str Bath

Colwellia psychrerythraea 34H

Photobacterium profundum SS9

Erwinia carotovora subsp atroseptica SCRI1043

Legionella pneumophila str Corby

Actinobacillus pleuropneumoniae L20

Xanthomonas campestris pv vesicatoria str 85 10

Francisella tularensis subsp novicida U112

Pseudomonas stutzeri A1501

Yersinia enterocolitica subsp enterocolitica 8081

Nitrosococcus oceani ATCC 19707

Pseudoalteromonas haloplanktis TAC125

Marinomonas sp MWYL1

Psychrobacter arcticus 273 4

Thiomicrospira crunogena XCL 2

Actinobacillus succinogenes 130Z

Buchnera aphidicola str APS Acyrthosiphon pisum

20

26

100100

75

100

10

93

53

100

34

76

100

95

100

100

29

32

100

9

100

100

100

100

87

100

91

100

100

22

74100

100

93

100

6585

46

79

99

100

100

100

99

84

66

48

5

100

75

100

46

73

A.

B.

Proteins rRNA

Page 55: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

ORF00955

gi73748443

gi57234592

gi86609871

gi86605128

gi17229086

gi75910405

gi113476515

gi16329957

gi56752516

gi81300331

gi33862041

gi78779962

gi33865147

gi78213583

gi78184182

gi113953748

gi33863774

gi72382854

gi33241089

gi22298184

gi37521852

ZAVAM73TF

Thermomicrobium roseum DSM 5159

Dehalococcoides sp. CBDB1

Dehalococcoides ethenogenes 195

Synechococcus sp. JA 2 3Bʼa 2 13

Synechococcus sp. JA 3 3Ab

Nostoc sp. PCC 7120

Anabaena variabilis ATCC 29413

Trichodesmium erythraeum IMS101

Synechocystis sp. PCC 6803

Synechococcus elongatus PCC 6301

Synechococcus elongatus PCC 7942

Prochlorococcus marinus subsp. pastoris str. CCMP1986

Prochlorococcus marinus str. MIT 9312

Synechococcus sp. WH 8102

Synechococcus sp. CC9605

Synechococcus sp. CC9902

Synechococcus sp. CC9311

Prochlorococcus marinus str. MIT 9313

Prochlorococcus marinus str. NATL2A

Prochlorococcus marinus subsp. marinus str. CCMP1375

Thermosynechococcus elongatus BP 1

Gloeobacter violaceus PCC 7421

Page 56: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Search with HMMs

!54

Page 57: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Search with HMMs

!55

Page 58: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 59: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 60: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 61: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

What Else Needed for This to Be Better?

Page 62: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

What Else Needed for This to Be Better?

• More Phylogenetically Diverse Genomes

• More Markers esp. Taxon Specific Markers

• Better / Faster Searching

• Longer reads

• Many sequences at once

• Many genes at once

• Alpha and Beta Diversity

• Automated Masking

Page 63: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Zorro - Automated Masking

ce to

Tru

e Tr

ee

0.0

1.0

2.0

3.0

4.0

5.0

6.0

7.0

8.0

9.0

200 400 800 1600 3200

Dist

ance

to T

rue

Tree

Sequence Length

200

no maskingzorrogblocks

200

Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288

Page 64: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

RepresentativeGenomes

ExtractProtein

Annotation

All v. AllBLAST

HomologyClustering

(MCL)

SFams

Align & Build

HMMs

HMMs

Screen forHomologs

NewGenomes

ExtractProtein

Annotation

Figure 1Sharpton et al. 2013

AB

C

��

�� �

��

��

��

��

��

��

��

��

��

� ��

��

��

��

��

��

� �

��

� �

��

��

� �

��

��

� �

� �

� �

��

��

� ��

��

��

��

��

��

��

��

��

��

� �

��

��

� �

��

��

� �

��

��

��

��

��

��

��

� �

��

��

���

��

��

� �

��

��

��

� ��

��

� �

��

��

� �

� �� �

� �

��

��

��

��

���

� �

��

� �

��

��

��

��

��

��

��

���

��

��

��

��

��

� �

��

��

��

��

��

��

���

��

��

��

��

��

� �

��

� �

��

�� �

��

��

� �

��

��

��

��

��

��

��

��

�� �

��

��

��

���

��

��

��

��

��

�� �

�� �

��

��

��

��

��

�� �

��

� ��

� �

��

��

��

� �

��

� �

��

� �

��

��

��

��

��

� �

��

��

��

� �

��

��

��

��

��

��

��

��

��

� �

��

��

��

��

��

� �

��

Improving II: More Families

Page 65: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Kembel Combiner

cally defined by a sequence similarity threshold) in the sampleas equally related. Newer ! diversity measures that incorporatephylogenetic information are more powerful because they ac-count for the degree of divergence between sequences (13, 18,29, 30). Phylogenetic ! diversity measures can also be eitherquantitative or qualitative depending on whether abundance istaken into account. The original, unweighted UniFrac measure(13) is a qualitative measure. Unweighted UniFrac measuresthe distance between two communities by calculating the frac-tion of the branch length in a phylogenetic tree that leads todescendants in either, but not both, of the two communities(Fig. 1A). The fixation index (FST), which measures thedistance between two communities by comparing the geneticdiversity within each community to the total genetic diversity ofthe communities combined (18), is a quantitative measure thataccounts for different levels of divergence between sequences.The phylogenetic test (P test), which measures the significanceof the association between environment and phylogeny (18), istypically used as a qualitative measure because duplicate se-quences are usually removed from the tree. However, the Ptest may be used in a semiquantitative manner if all clones,even those with identical or near-identical sequences, are in-cluded in the tree (13).

Here we describe a quantitative version of UniFrac that wecall “weighted UniFrac.” We show that weighted UniFrac be-haves similarly to the FST test in situations where both are

applicable. However, weighted UniFrac has a major advantageover FST because it can be used to combine data in whichdifferent parts of the 16S rRNA were sequenced (e.g., whennonoverlapping sequences can be combined into a single treeusing full-length sequences as guides). We use two differentdata sets to illustrate how analyses with quantitative and qual-itative ! diversity measures can lead to dramatically differentconclusions about the main factors that structure microbialdiversity. Specifically, qualitative measures that disregard rel-ative abundance can better detect effects of different foundingpopulations, such as the source of bacteria that first colonizethe gut of newborn mice and the effects of factors that arerestrictive for microbial growth such as temperature. In con-trast, quantitative measures that account for the relative abun-dance of microbial lineages can reveal the effects of moretransient factors such as nutrient availability.

MATERIALS AND METHODS

Weighted UniFrac. Weighted UniFrac is a new variant of the original un-weighted UniFrac measure that weights the branches of a phylogenetic treebased on the abundance of information (Fig. 1B). Weighted UniFrac is thus aquantitative measure of ! diversity that can detect changes in how many se-quences from each lineage are present, as well as detect changes in which taxaare present. This ability is important because the relative abundance of differentkinds of bacteria can be critical for describing community changes. In contrast,the original, unweighted UniFrac (Fig. 1A) is a qualitative ! diversity measurebecause duplicate sequences contribute no additional branch length to the tree(by definition, the branch length that separates a pair of duplicate sequences iszero, because no substitutions separate them).

The first step in applying weighted UniFrac is to calculate the raw weightedUniFrac value (u), according to the first equation:

u ! !i

n

bi " "Ai

AT#

Bi

BT"

Here, n is the total number of branches in the tree, bi is the length of branch i,Ai and Bi are the numbers of sequences that descend from branch i in commu-nities A and B, respectively, and AT and BT are the total numbers of sequencesin communities A and B, respectively. In order to control for unequal samplingeffort, Ai and Bi are divided by AT and BT.

If the phylogenetic tree is not ultrametric (i.e., if different sequences in thesample have evolved at different rates), clustering with weighted UniFrac willplace more emphasis on communities that contain quickly evolving taxa. Sincethese taxa are assigned more branch length, a comparison of the communitiesthat contain them will tend to produce higher values of u. In some situations, itmay be desirable to normalize u so that it has a value of 0 for identical commu-nities and 1 for nonoverlapping communities. This is accomplished by dividing uby a scaling factor (D), which is the average distance of each sequence from theroot, as shown in the equation as follows:

D ! !j

n

dj " #Aj

AT$

Bj

BT$

Here, dj is the distance of sequence j from the root, Aj and Bj are the numbersof times the sequences were observed in communities A and B, respectively, andAT and BT are the total numbers of sequences from communities A and B,respectively.

Clustering with normalized u values treats each sample equally instead of

TABLE 1. Measurements of diversity

Measure Measurement of " diversity Measurement of ! diversity

Only presence/absence of taxa considered Qualitative (species richness) QualitativeAdditionally accounts for the no. of times that

each taxon was observedQuantitative (species richness and evenness) Quantitative

FIG. 1. Calculation of the unweighted and the weighted UniFracmeasures. Squares and circles represent sequences from two differentenvironments. (a) In unweighted UniFrac, the distance between thecircle and square communities is calculated as the fraction of thebranch length that has descendants from either the square or the circleenvironment (black) but not both (gray). (b) In weighted UniFrac,branch lengths are weighted by the relative abundance of sequences inthe square and circle communities; square sequences are weightedtwice as much as circle sequences because there are twice as many totalcircle sequences in the data set. The width of branches is proportionalto the degree to which each branch is weighted in the calculations, andgray branches have no weight. Branches 1 and 2 have heavy weightssince the descendants are biased toward the square and circles, respec-tively. Branch 3 contributes no value since it has an equal contributionfrom circle and square sequences after normalization.

VOL. 73, 2007 PHYLOGENETICALLY COMPARING MICROBIAL COMMUNITIES 1577

Kembel SW, Eisen JA, Pollard KS, Green JL (2011) The Phylogenetic Diversity of Metagenomes. PLoS ONE 6(8): e23214. doi:10.1371/journal.pone.0023214

Page 66: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Phylosift - Mining the Global Metagenome

Jonathan Eisen

Students and other staff: - Eric Lowe, John Zhang, David Coil !Open source community: - BLAST, LAST, HMMER, Infernal, pplacer, Krona, metAMOS, Bioperl, Bio::Phylo, JSON, etc. etc. !PhyloSift is open source software: -http://phylosift.wordpress.org -http://github.com/gjospin/phylosift

Erick Matsen FHCRC

Todd Treangen BNBI, NBACC

Holly Bik

Tiffanie NelsonMark

BrownAaron Darling

Guillaume Jospin

Supported by DHS Grant

Page 67: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Output 1: Taxonomy

Taxonomic summary plots in Krona (Ondov et al 2011)

Page 68: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Tree Reconciliation in PhyloSift

Page 69: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Tree Reconciliation in PhyloSift

Environmental Sequences

Named Taxa

Page 70: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Output 2: Phylogenetic Tree of ReadsPlacement tree from 2 week old infant gut data

Page 71: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Edge PCA: Identify lineages that explain most variation

among samples

Matsen and Evans 2013, Darling et al Submitted.

Output: Edge PCA

Page 72: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

QIIME and Edge PCA on 110 fecal metagenomes from

Yatsunenko et al 2012 Nature.

Sequenced with 454, to about 150Mbp/metagenome

Darling et al Submitted.

Edge PCA vs. UNIFRAC PCA

Edge PCA: Matsen and Evans 2013

Page 73: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Output 3: Edge PCA

" Edge PCA for exploratory data analysis (Matsen and Evans 2013)

" Given E edges and S samples:

− For each edge, calculate difference in placement mass on either side of edge

− Results in E x S matrix

− Calculate E x E covariance matrix

− Calculate eigenvectors, eigenvalues of covariance matrix

" Eigenvector: each value indicates how “important” an edge is in explaining differences among the S samples

Example calculating a matrix entry for an edge: This edge gets 5-2=3

mass=5 mass=2

Page 74: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Phylosift/ pplacer Workflow

Aaron Darling, Guillaume Jospin, Holly Bik, Erik Matsen, Eric Lowe, and others

Input Sequences rRNA workflow

protein workflow

profile HMMs used to align candidates to reference alignment

Taxonomic Summaries

parallel option

hmmalign multiple alignment

LAST fast candidate search

pplacer phylogenetic placement

LAST fast candidate search

LAST fast candidate search

search input against references

hmmalign multiple alignment

hmmalign multiple alignment

Infernal multiple alignment

LAST fast candidate search

<600 bp

>600 bp

Sample Analysis & Comparison

Krona plots, Number of reads placed

for each marker gene

Edge PCA, Tree visualization, Bayes factor tests

each

inpu

t seq

uenc

e sc

anne

d ag

ains

t bot

h w

orkf

low

s

Page 75: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Phylosift Markers

• PMPROK – Dongying Wu’s Bac/Arch markers

• Eukaryotic Orthologs – Parfrey 2011 paper • 16S/18S rRNA • Mitochondria - protein-coding genes • Viral Markers – Markov clustering on

genomes • Codon Subtrees – finer scale taxonomy • Extended Markers – plastids, gene families

Page 76: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

More Markers

Phylogenetic group

Genome Number

Gene Number

Maker CandidateArchaea 62 145415 106

Actinobacteria 63 267783 136Alphaproteoba 94 347287 121Betaproteobac 56 266362 311Gammaproteo 126 483632 118Deltaproteoba 25 102115 206Epislonproteo 18 33416 455Bacteriodes 25 71531 286Chlamydae 13 13823 560Chloroflexi 10 33577 323Cyanobacteria 36 124080 590Firmicutes 106 312309 87Spirochaetes 18 38832 176Thermi 5 14160 974Thermotogae 9 17037 684

Wu D, Jospin G, Eisen JA (2013) Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups. PLoS ONE 8(10): e77033. doi:10.1371/journal.pone.0077033

Page 77: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Better Reference Tree

Lang JM, Darling AE, Eisen JA (2013) Phylogeny of Bacterial and Archaeal Genomes Using Conserved Genes: Supertrees and Supermatrices. PLoS ONE 8(4): e62510. doi:10.1371/journal.pone.0062510

Page 78: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Page 79: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Stalking the Fourth Domain in Metagenomic Data:Searching for, Discovering, and Interpreting Novel, DeepBranches in Marker Gene Phylogenetic TreesDongying Wu1, Martin Wu1,4, Aaron Halpern2,3, Douglas B. Rusch2,3, Shibu Yooseph2,3, Marvin Frazier2,3,

J. Craig Venter2,3, Jonathan A. Eisen1*

1 Department of Evolution and Ecology, Department of Medical Microbiology and Immunology, University of California Davis Genome Center, University of California

Davis, Davis, California, United States of America, 2 The J. Craig Venter Institute, Rockville, Maryland, United States of America, 3 The J. Craig Venter Institute, La Jolla,

California, United States of America, 4 University of Virginia, Charlottesville, Virginia, United States of America

Abstract

Background: Most of our knowledge about the ancient evolutionary history of organisms has been derived from dataassociated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, andculturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generateddirectly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as weargue here, in studies of very early events in the evolution of gene families and of species.

Methodology/Principal Findings: We designed and implemented new methods for analyzing metagenomic data and usedthem to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonlyused in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies.Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties inmaking robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novelbranches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as thesenovel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences.

Conclusions/Significance: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come fromuncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A thirdpossibility is that some come from novel cellular lineages that are only distantly related to any organisms for whichsequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the treeof life, we suggest that methods such as those described herein currently offer the best way to search for them.

Citation: Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, et al. (2011) Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, andInterpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011

Editor: Robert Fleischer, Smithsonian Institution National Zoological Park, United States of America

Received October 25, 2010; Accepted February 20, 2011; Published March 18, 2011

This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the publicdomain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

Funding: The development and main work on this project was supported by the National Science Foundation via an ‘‘Assembling the Tree of Life’’ grant(number 0228651) to to Jonathan A. Eisen and Naomi Ward. The final work on this project was funded by the Gordon and Betty Moore Foundation (throughgrants 0000951 and 0001660). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

Introduction

During the last 30 years, technological advances in nucleic acidsequencing have led to revolutionary changes in our perception ofthe evolutionary relationships among all species as visualized in thetree of life. The first revolution was spawned by the work of CarlWoese and colleagues who, through sequencing and phylogeneticanalysis of fragments of rRNA molecules, demonstrated how thediverse kinds of known cellular organisms could be placed on asingle tree of life [1,2,3]. Most significantly, their analyses revealedthe existence of a third major branch on the tree; the Archaea(then referred to as Archaebacteria) took their place along with theBacteria and the Eukaryota [2]. Several factors make rRNA genesexceptionally powerful for this purpose, the most important beingperhaps that highly conserved, homologous rRNA genes arepresent in all cellular lineages. To this day, analyses of rRNA genes

continue to clarify and extend our knowledge of the evolutionaryrelationships among all life forms [4,5].

For microbial organisms, this approach was restricted to theminority that could be grown in pure culture in the laboratoryuntil Norm Pace and colleagues showed that one could sequencerRNAs directly from environmental samples [6,7]. Initially, themethodology was cumbersome. However, this changed with thedevelopment of the polymerase chain reaction (PCR) methodology[8]. PCR generates many copies of a target segment of DNA,which in turn facilitates cloning and sequencing of that segment.However, delineation of the segment to be amplified requiresprimers, i.e., short segments of DNA whose nucleotide sequence iscomplementary to the DNA flanking the target. Because rRNAgenes contain regions that are very highly conserved, ‘‘universalprimers’’ can be used for PCR amplification of those genes even inenvironmental samples [9,10]. Thus, in principle, one can use

PLoS ONE | www.plosone.org 1 March 2011 | Volume 6 | Issue 3 | e18011

Page 80: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Figure 1. Phylogenetic tree of the RecA superfamily. All RecA sequences were grouped into clusters using the Lek algorithm. Representativesof each cluster that contained .2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built by from this alignmentusing PHYML; bootstrap values are based on 100 replicas. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RecAsuperfamily are shaded and given a name on the right. Five of the proposed subfamilies contained only GOS sequences at the time of our initialanalysis (RecA-like SAR, Phage SAR1, Phage SAR2, Unknown 1 and Unknown 2) and are highlighted by colored shading. As noted on the tree and inthe text, sequences from two Archaea that were released after our initial analysis group in the Unknown 2 subfamily.doi:10.1371/journal.pone.0018011.g001

Stalking the Fourth Domain

PLoS ONE | www.plosone.org 5 March 2011 | Volume 6 | Issue 3 | e18011

Page 81: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014

Subfamily RecA AccessionAccession ofLinked Gene Assembly ID Neighboring Gene Description Taxonomy Assignment

Unknown2 1096686533379 1096686533473 1096627390330 aryl-alcohol dehydrogenases relatedoxidoreductases

Eukaryota

Unknown2 1096686533379 1096686533505 1096627390330 snRNP Sm-like protein Chain A Eukaryota

Unknown2 1096689280551 1096689280549 1096627650434 S-adenosylmethionine synthetase Bacteria

RecA-like SAR1 1096683378299 1096683378297 1096627289467 DNA polymerase III alpha subunit Bacteria

Unknown1 1096694953057 1096694953059 1096520459783 FKBP-type peptidyl-prolyl cis-trans isomerase Archaea

Unknown1 1096665977449 1096665977451 1096627520210 single-stranded DNA binding protein Viruses/Phages

Unknown1 1096682182125 1096682182127 1096628394294 DNA polymerase I Bacteria

Five RecA subfamilies were identified as being novel (i.e., only seen in metagenomic data) in our initial analyses. GOS metagenome assemblies that encode members ofthese subfamilies were identified and the genes neighboring the novel RecAs were characterized. The neighboring gene descriptions are based on the top BLASTP hitsagainst the NRAA database; taxonomy assignments are based on their closest neighbor in phylogenetic trees built from the top NRAA BLASTP hits.doi:10.1371/journal.pone.0018011.t002

Table 2. Cont.

Figure 2. The largest assembly from the GOS data that encodes a novel RecA subfamily member (a representative of subfamilyUnknown 2). This GOS assembly (ID 1096627390330) encodes 33 annotated genes plus 16 hypothetical proteins, including several with similarity toknown archaeal genes (e.g., DNA primase, translation initiation factor 2, Table 2). The arrow indicates a novel recA homolog from the Unknown 2subfamily (cluster ID 9).doi:10.1371/journal.pone.0018011.g002

Stalking the Fourth Domain

PLoS ONE | www.plosone.org 7 March 2011 | Volume 6 | Issue 3 | e18011

Page 82: UC Davis EVE161 Lecture 16 by @phylogenomics

Slides for UC Davis EVE161 Course Taught by Jonathan Eisen Winter 2014Methods

Identification of deeply-branching ss-rRNA sequencesA data set of 340 representative ss-rRNA sequences from all

three domains was prepared. These sequences represented 134eukaryotic, 186 bacterial, and 20 archaeal species. Alignments for

these 340 sequences were extracted from the European RibosomalRNA database [66] and then manually curated to removecolumns with more than 90% gaps or with poor alignmentquality. Sorcerer II Global Ocean Sampling Expedition (GOS) ss-rRNA sequences were identified by the PhylOTU pipeline [67].Using MUSCLE [68,69], each GOS ss-rRNA sequence was

Figure 3. Phylogenetic tree of the RpoB superfamily. All RpoB sequences were grouped into clusters using the Lek algorithm. Representativesof each cluster that contained .2 members were then selected and aligned using MUSCLE. A phylogenetic tree was built by from this alignmentusing PHYML; bootstrap values are based on 100 replicas. The Lek cluster ID precedes each sequence accession ID. Proposed subfamilies in the RpoBsuperfamily are shaded and given a name on the right. The two novel RpoB clades that contain only GOS sequences are highlighted by the coloredpanels.doi:10.1371/journal.pone.0018011.g003

Stalking the Fourth Domain

PLoS ONE | www.plosone.org 9 March 2011 | Volume 6 | Issue 3 | e18011