Phylogenies from Large Samples of Bacterial Genomes
Bernhard Haubold
MPI for Evolutionary Biology, Plön
June 10, 2016
Bernhard Haubold (MPI Plön) Whole Genome Phylogenies RaMi-NGS 2016 1 / 17
Overview
From genomes to phylogenies
Approximate alignments
From Genomes to Phylogenies—1
Genomes → Alignment → Distance Matrix → Tree
Genomes and alignment rows (top to bottom): S4, S3, S2, S1
Distance matrix:
      S1     S2     S3     S4
S1    0
S2    d2,1   0
S3    d3,1   d3,2   0
S4    d4,1   d4,2   d4,3   0
Tree tips (top to bottom): S3, S4, S1, S2
From Genomes to Phylogenies—2
Same figure as on the previous slide, now with the cost of each step:
Genomes → Alignment: slow
Alignment → Distance Matrix: fast
Distance Matrix → Tree: fast
From Genomes to Phylogenies—3
Same figure and step costs (slow, fast, fast) as on the previous slide, with the label "andi" added.
Approximate Alignment
Only consider pairs of sequences.
Figure: a pair of sequences, Q and S.
Anchors
Figure: anchors between the sequences Q and S.
Anchors:
Unique
Cannot be extended (maximal)
Longer than random match
Equidistant
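The four anchor properties listed above can be illustrated with a deliberately naive sketch. It is not andi's implementation: the function names, the toy sequences, and the fixed `min_len` threshold (standing in for "longer than random match") are assumptions made for illustration, and "unique" is read here as "occurs once in S".

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_anchors(q, s, min_len):
    """Return (q_start, s_start, length) for every exact match that is
    maximal, at least min_len long, and unique in s (quadratic time)."""
    anchors = []
    for i in range(len(q)):
        for j in range(len(s)):
            # Left-maximal: the preceding characters must differ or not exist.
            if i > 0 and j > 0 and q[i - 1] == s[j - 1]:
                continue
            # Extend to the right until the first mismatch or a sequence end.
            k = 0
            while i + k < len(q) and j + k < len(s) and q[i + k] == s[j + k]:
                k += 1
            if k >= min_len and count_occurrences(s, q[i:i + k]) == 1:
                anchors.append((i, j, k))
    return anchors


def equidistant_pairs(anchors):
    """Keep consecutive anchors whose spacing is the same in q and in s."""
    ordered = sorted(anchors)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b[0] - a[0] == b[1] - a[1]]


# Hypothetical toy sequences sharing two matches separated by one mismatch.
q = "CCGATTGCAATGGCTCAAG"
s = "TGATTGCACTGGCTCGT"
anchors = find_anchors(q, s, min_len=5)
pairs = equidistant_pairs(anchors)
print(anchors)  # [(2, 1, 7), (10, 9, 6)]
```

The nested loops make this quadratic in sequence length, which is fine for a toy example but not for genomes.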
Anchor Distance
g1 AATGCCACCGGGTGATGATAGCCTCGATAGGCCGCAGGTCTCGCGGGGAAATC
g2 GCGAGAGCGCACCAGCGGGTGATGATAGCCTGGATAGGCCGCAGGACGGT
$d_a = \frac{1}{20 + 13} \approx 0.03$
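The arithmetic on this slide, one mismatch over anchors of lengths 20 and 13, can be written as a small function. This is a sketch under assumptions: the name `pair_distance`, the anchor tuples, and the toy sequences are hypothetical, and dividing by the summed anchor lengths simply mirrors the numbers shown here rather than any documented andi formula.

```python
def pair_distance(q, s, left, right):
    """Mismatches between two equidistant anchors over their summed lengths.

    Each anchor is (q_start, s_start, length). The denominator follows the
    slide's arithmetic and is an assumption of this sketch.
    """
    (ql, sl, len_l), (qr, sr, len_r) = left, right
    gap_q = q[ql + len_l:qr]  # stretch of q between the two anchors
    gap_s = s[sl + len_l:sr]  # corresponding stretch of s
    if len(gap_q) != len(gap_s):
        raise ValueError("anchors are not equidistant")
    mismatches = sum(a != b for a, b in zip(gap_q, gap_s))
    return mismatches / (len_l + len_r)


# Hypothetical toy pair: anchors of length 7 and 6 around a single mismatch.
q = "CCGATTGCAATGGCTCAAG"
s = "TGATTGCACTGGCTCGT"
d = pair_distance(q, s, (2, 1, 7), (10, 9, 6))
print(round(d, 3))  # 1 / 13

# The slide's own numbers: one mismatch, anchor lengths 20 and 13.
slide_value = 1 / (20 + 13)
print(round(slide_value, 2))  # 0.03
```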
Searching
Figure: the sequences Q and S.
Compute index of S:
- Time- & memory-intensive step
- Parallelize
Search index of S with Q: Quick
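The split between an expensive indexing step and a quick search can be sketched with a plain suffix array. The slide does not name the index, so the suffix array is only a stand-in, and the function names and toy sequence are assumptions. The construction is deliberately naive, while each lookup is a binary search over the sorted suffixes.

```python
def build_suffix_array(s):
    """Start positions of all suffixes of s in lexicographic order.

    Naive construction by sorting the suffixes themselves: this is the
    time- and memory-intensive step.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def occurrences(s, sa, pattern):
    """All start positions of pattern in s, found by binary search on sa."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


# Hypothetical toy subject sequence; build the index once, query many times.
s = "TGATTGCACTGGCTCGT"
sa = build_suffix_array(s)
print(occurrences(s, sa, "GATTGCA"))  # [1]
print(occurrences(s, sa, "TG"))       # [0, 4, 9]
```

A pattern that returns exactly one position is unique in S, which is the check an anchor search needs.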
Implementation
Program: andi (ANchor DIstances)
Code: www.github.com/evolbioinf/andi
Accuracy
Figure: log-log plot of da against substitutions per site (K), for K from 10^-4 to 10^-1 and da from 10^-5 to 10^0. Curves: da, ideal.
Problems at High Substitution Rates
Figure: da (left axis, 0.1 to 1.0) and failed da estimation (right axis, 0 to 1) against substitutions per site (K), for K from 0.4 to 0.7. Curves: da, ideal, failed da.
29 Escherichia coli Genomes
mugsy: 2 h 29 min
andi: 16.7 s
Figure: two phylogenies, each with a scale bar of 0.002, showing the same 29 tips in the same order (top to bottom):
E. coli IAI1
E. coli SE11
E. coli E24377A
S. sonnei Ss046
S. boydii Sb227
S. boydii CDC 3083-94
S. flexneri 5 str. 8401
S. flexneri 2a str. 2457T
S. flexneri 2a str. 301
E. coli ATCC 8739
E. coli HS
E. coli str. K-12 substr. MG1655
E. coli str. K12 substr. W3110
E. coli str. K12 substr. DH10B
E. coli BW2952
S. dysenteriae Sd197
E. coli O55:H7 str. CB9615
E. coli O157:H7 EDL933
E. coli O157:H7 str. Sakai
E. coli UMN026
E. coli IAI39
E. coli SMS-3-5
E. coli O127:H6 E2348/69
E. coli 536
E. coli ED1a
E. coli CFT073
E. coli S88
E. coli UTI89
E. coli APEC O1
500-fold speedup
Time & Memory
Figure: time (s, left axis, 0 to 250) and memory (GB, right axis, 0 to 5) against number of processors (0 to 30). Labels: Laptop Zone, Time, Memory.
3085 Streptococcus pneumoniae Genomes (2.2 Mb)
4 h 37 min on 24-core computer; 9.2 GB RAM
Chewapreecha et al. (2014). Nature Genetics, 46:305–309.
Summary
From genomes to phylogenies
genome → alignment → distance matrix → tree
Approximate alignments
genome → alignment → distance matrix → tree
ANchor DIstances: andi
- accurate & scalable to thousands of genomes
- www.github.com/evolbioinf/andi
- Ubuntu 16.04 (Xenial Xerus)
Acknowledgments
Fabian Klötzl, Plön
Peter Pfaffelhuber, Freiburg