exploring the remarkable diversity of escherichia coli · 1/19/2020 · 74 coli (mg1655, k-12),...
TRANSCRIPT
Exploring the remarkable diversity of Escherichia coli 1
Phages in the Danish Wastewater Environment, Including 2
91 Novel Phage Species 3
Nikoline S. Olsen 1, Witold Kot 1,2* Laura M. F. Junco2 and Lars H. Hansen 1,2,* 4
1 Department of Environmental Science, Aarhus University, Frederiksborgvej 399, Roskilde, Denmark; 5
2 Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, 1871 7
Frederiksberg C, Denmark; [email protected], [email protected] 8
* Correspondence: [email protected] Phone: +45 28 75 20 53, [email protected] Phone: +45 35 33 38 77 9
10 Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation 11 AUFF Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018. 12 Competing interests: The authors declare no competing interests. 13
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
2
Abstract: Phages drive bacterial diversity - profoundly influencing diverse microbial communities, from 14
microbiomes to the drivers of global biogeochemical cycling. The vast genomic diversity of phages is 15
gradually being uncovered as >8000 phage genomes have now been sequenced. Aiming to broaden our 16
understanding of Escherichia coli (MG1655, K-12) phages, we screened 188 Danish wastewater samples (0.5 17
ml) and identified 136 phages of which 104 are unique phage species and 91 represent novel species, 18
including several novel lineages. These phages are estimated to represent roughly a third of the true diversity 19
of Escherichia phages in Danish wastewater. The novel phages are remarkably diverse and represent four 20
different families Myoviridae, Siphoviridae, Podoviridae and Microviridae. They group into 14 distinct clusters 21
and nine singletons without any substantial similarity to other phages in the dataset. Their genomes vary 22
drastically in length from merely 5 342 bp to 170 817 kb, with an impressive span of GC contents ranging 23
from 35.3% to 60.0%. Hence, even for a model host bacterium, in the go-to source for phages, substantial 24
diversity remains to be uncovered. These results expand and underlines the range of Escherichia phage 25
diversity and demonstrate how far we are from fully disclosing phage diversity and ecology. 26
Keywords: bacteriophage; wastewater; Escherichia coli; diversity; genomics; taxonomy; coliphage 27
28
1. Introduction 29
Phages are important ecological contributors, they renew organic matter supplies in nutrient cycles and 30
drive bacterial diversity by enabling co-existence of competing bacteria by “Killing the winner” and by serving 31
as genomic reservoirs and transport units [1,2]. Phage genomes are known to entail auxiliary metabolism 32
genes (AMGs), toxins, virulence factors and even antibiotic resistance genes [3–7] and through lysogeny and 33
transduction they can transfer metabolic traits to their hosts and even confer immunity against homologous 34
phages [1]. Still, in spite of their ecological role, potential as antimicrobials and the fact that they carry a 35
multitude of unknown genes with great potential for biotechnological applications, phages are vastly 36
understudied. Less than 9000 phage genomes have now been published, and though the number increases 37
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
3
rapidly, we may have merely scratched the surface of the expected diversity [8]. It has been estimated that at 38
least a billion bacterial species exist [9], hence only phages targeting a tiny fraction of potential hosts have been 39
reported. Efforts to disclose the range and diversity of phages targeting a single host, have revealed a stunning 40
display of diversity. The most scrutinized phage host is the Mycobacterium smegmatis, for which the Science 41
Education Alliance Phage Hunters program has isolated more than 4700 phages and fully sequenced 680 42
phages which represent 30 distinct phage clusters [10,11]. This endeavour has provided a unique insight into 43
viral and host diversity, evolution and genetics [12–15]. No other phage host has been equally targeted, but 44
Escherichia coli phages have been isolated in fairly high numbers. The International Committee on Taxonomy 45
of Viruses (ICTV) has currently recognised 158 phage species originally isolated on E. coli [16], while 2732 46
Escherichia phage genomes have been deposited in GenBank [17]. As phages are expected to have an 47
evolutionary potential to migrate across microbial populations, host species may not be an ideal indicator of 48
relatedness, but it serves as an excellent starting point to explore phage diversity. Hierarchical classification 49
of phages is complicated by the high degree of horizontal gene transfer, consequently several classification 50
systems have been proposed [18–20], and we may not yet have reached a point where it is reasonable to 51
establish the criteria for a universal system [20]. Nonetheless, a system enabling a mutual understanding and 52
exchange of knowledge is needed. Accordingly, we have in this study chosen to classify our novel phages 53
according to the ICTV guidelines [21]. 54
Here we aim to expand our understanding of Escherichia phage diversity. Earlier studies are few, have 55
relied on more laborious and time-consuming methods, or are based on in silico analyses of already published 56
phage genomes [8]. Accordingly, Korf et al., (2019) isolated 50 phages on 29 individual E. coli strains, while 57
Jurczak-Kurek et al., (2016) isolated 60 phages on a single E. coli strain, both finding a broad diversity of 58
Caudovirales and also representatives of novel phage lineages [22,23]. We hypothesized, that a high-throughput 59
screening of nearly 200 samples in time-series, using a single host, would expand the number of known phages 60
by facilitating the interception of the prevailing phage(s) of the given day. 61
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
4
2. Materials and Methods 62
The screening for Escherichia phages was performed with the microplate based High-throughput screening 63
method as described in (not published). With the exception that lysates of wells giving rise to plaques were 64
sequenced without further purification. In short, an overnight enrichment was performed in microplates with 65
host culture, media and wastewater (0.5 ml per well), followed by filtration (0.45 µm), a purification step by 66
re-inoculation, a second overnight incubation and filtration (0.45µm), and then a spot-test (soft-agar overlay) 67
to indicate positive wells. All procedures were performed under sterile conditions. 68
2.1.1 Samples Bacteria and media 69
Inlet Wastewater samples (188) were collected (40-50 ml) in time-series (2-4 days spanning 1-3 weeks), 70
from 48 Danish wastewater treatment facilities (rural and urban) geographically distributed on Zealand, 71
Funen and in Jutland, during July and August 2017. The samples were centrifuged (9000 x g, 4 °C, 10 min) and 72
the supernatant filtered (0.45 µm) before storage in aliquots (-20°C) until screening. The host bacterium is E. 73
coli (MG1655, K-12), and the media Lysogeny Broth (LB), amended with CaCl2 and MgCl2 (final concentration 74
50 mM). Collected phages were stored in SM-buffer [24] at 4°C. 75
2.1.2 Sequencing and genomic characterisation 76
DNA extractions, clean-up (ZR-96 Clean and Concentrator kit, Zymo Research, Irvine, CA USA) and 77
sequencing libraries (Nextera® XT DNA kit, Illumina, San Diego, CA USA) were performed according to 78
manufacturer’s protocol with minor modifications as described in Kot et al., (2014) [25]. The libraries were 79
sequenced as paired-end reads on an Illumina NextSeq platform with the Mid Output Kit v2 (300 cycles). The 80
obtained reads were trimmed and assembled in CLC Genomics Workbench version 10.1.1. (CLC BIO, Aarhus, 81
Denmark). Overlapping reads were merged with the following settings: mismatch cost: 2, minimum score: 15, 82
gap cost: 3 and maximum unaligned end mismatches: 0, and then assembled de novo. Additional control 83
assemblies were constructed using SPAdes version 3.12.0 [26]. Phage genomes were defined as contigs with 84
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
5
an average coverage > x 20 and a sequence length ≥ 90% of closest relative. Gene prediction and annotation 85
were performed using a customized RASTtk version 2.0 [27] workflow with GeneMark [28], with manual 86
curation and verification using BLASTP [29], HHpred [30] and Pfam version 32.0 [31], or de novo annotated 87
using VIGA version 0.11.0 [32] based on DIAMOND searches (RefSeq Viral protein database) and HMMer 88
searches (pVOG HMM database). All genomes were assessed for antibiotic resistance genes, bacterial 89
virulence genes, type I, II, III and IV restriction modification (RM) genes and auxiliary metabolism genes 90
(AMGs) using ResFinder 3.1 [33,34], VirulenceFinder 2.0 [35], Restriction-ModificationFinder 1.1 (REBASE) 91
[36] and VIBRANT version 1.0.1 [37], respectively. The 104 unique phage genomes were aligned to viromes of 92
BioProject PRJNA545408 in CLC Genomics Workbench and deposited in Genbank [17]. 93
2.1.3 Genetic analyses 94
Nucleotide (NT) and amino acid (AA) similarities were calculated using tools recommended by the ICTV 95
[38], i.e. BLAST [29] for identification of closest relative (BLASTn when possible, discontinuous megaBLAST 96
(word size 16) for larger genomes) and Gegenees version 2.2.1 [39] for assessing phylogenetic distances of 97
multiple genomes, for both NTs (BLASTn algorithm) and AAs (tBLASTx algorithm) a fragment size of 200 bp 98
and step size 100 bp was applied. NT similarity was determined as percentage query cover multiplied by 99
percentage NT identity. Novel phages are categorised according to ICTV taxonomy. The criterion of 95% DNA 100
sequence similarity for demarcation of species was applied to identify novel species representatives and to 101
determine uniqueness within the dataset. Evolutionary analyses for phylogenomic trees were conducted in 102
MEGA7 version 2.1 (default settings) [40]. These were based on the large terminase subunit (Caudovirales), a 103
gene commonly applied for phylogenetic analysis [41,42] and on the DNA replication gene (gpA) 104
(Microviridae). The NT sequences were aligned by MUSCLE [43] and the evolutionary history inferred by the 105
Maximum Likelihood method based on the Tamura-Nei model [44]. The trees with the highest log likelihood are 106
shown. Pairwise whole genome comparisons were performed with Easyfig 2.2.2 [45] (BLASTn algorithm), these 107
were curated by adding color-codes and identifiers in Inkscape version 0.92.2. The R package iNEXT [46,47] in 108
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
6
R studio version 1.1.456 [48] was used for rarefaction, species diversity (q = 0, datatype: incidence_raw), 109
extrapolation hereof (estimadeD) and estimation of sample coverage. The visualisation of genome sizes and 110
GC contents was prepared in Excel version 16.31. 111
3. Results and Discussion 112
3.2. Phylogenetics, taxonomy and species richness 113
The sequenced phages were analysed strictly in silico, focusing on their relatedness to known phages, 114
their taxonomy and any distinctive characteristics. Based on the confirmed morphology of closely related 115
phages, at least four different families are represented, Myoviridae (58%), Siphoviridae (25%), Podoviridae (7.4%) 116
and even the single stranded DNA (ssDNA) Microviridae (5.9%) (Table 1). Five (3.7%) of the phages are so 117
distinct that they could not with certainty be assigned to any family. A similar distribution was found by Korf 118
et al., (2019), i.e. 70% Myoviridae, 22% Siphoviridae and 8% Podoviridae, just as an analysis of the genomes of 119
Caudovirales infecting Enterobacteriaceae, by Grose & Casjens (2014), also identified more clusters belonging to 120
the Myoviridae, than the Siphoviridae and fewest of the Podoviridae [8,22]. Jurczak-Kurek et al., (2016) found more 121
siphoviruses than myovirus, but also found the Podoviridae to be the least abundant [23]. Nonetheless, these 122
distributions are just as likely to be caused by culture or isolation methods, as they are to reflect true 123
abundances. Based on DNA homology and phylogenetic analysis, the 104 unique phages identified in this 124
study group into 14 distinct clusters and nine singletons, having a <1% Gegenees score between clusters and 125
singletons, with the exceptions of cluster IV, V and Jahat which have inter-Gegenees scores of up to 20%, cluster 126
IX, X and XI which have inter-Gegenees scores of 1-8% and cluster XIII, Sortsne and Sortkaff which have inter-127
Gegenees scores of 6-14% (Figure 1, Table 2). Excluding one phage in cluster I and two phages in cluster XIV, 128
the intra-Gegenees score of all clusters is above 39% (Figure 1). 129
130
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
131
Figure 1 Phylogenetic tree (Maximum log Likelihood: -1325.01, large terminase subunit (Caudovirales) or DNA replication protein gpA (Microviridae)), scalebar: substitutions 132 per site, morphology is indicated by symbols (Myoviridae: ☐, Siphoviridae: ▽, Podoviridae: ○, Microviridae: ◇) and phylogenomic nucleotide distances (Gegenees, BLASTn: 133 fragment size: 200, step size: 100, threshold: 0%) 134
I
II
III
IV
V
VI
VII
VIIIIX
X
XI
XII
XIII
XIV
Organism 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
1: MG_194_C2_Esch_VpaE1 100 82 79 78 82 82 81 80 82 80 83 80 77 74 78 79 76 79 83 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: MG_196_C1_Esch_AYO145A 76 100 75 77 83 80 77 78 78 78 78 76 78 73 75 78 77 76 80 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: MG_191_C1_Esch_AYO145A 75 76 100 77 78 77 84 75 80 78 77 83 76 72 77 75 79 77 77 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: MG_189_C2_Esch_Alf5 73 77 76 100 78 76 76 75 77 77 77 77 78 71 75 76 76 75 77 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: MG_160_C2_Esch_AYO145A 78 83 77 79 100 83 80 84 79 82 81 81 80 73 76 83 79 78 83 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: MG_106_C1_Esch_VpaE1 80 84 79 79 85 100 81 83 83 82 82 81 81 76 79 82 81 82 84 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7: MG_079_C1_Esch_VpaE1 76 78 84 76 80 78 100 80 80 79 80 81 78 75 78 79 79 78 78 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8: MG_053_C1_Esch_Alf5 77 80 76 77 85 81 81 100 78 79 83 82 77 73 76 84 77 76 82 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9: MG_198_C3_Esch_VpaE1 77 79 80 78 79 80 80 78 100 79 81 80 78 73 79 80 80 80 81 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10: MG_049_C2_Esch_AYO145A 76 80 78 79 83 80 80 78 80 100 78 79 77 74 77 79 79 76 80 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11: MG_087_C3_Esch_Alf5 79 79 78 78 82 80 81 82 82 78 100 83 79 73 78 84 79 79 83 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12: MG_156_C2_Esch_O157 76 78 84 79 82 78 82 81 81 79 82 100 78 75 78 82 82 79 81 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13: MG_172_C1_Esch_Alf5 74 80 77 79 81 80 79 77 78 77 78 79 100 75 81 78 79 79 79 19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14: MG_047_C1_Sal_VSe11 74 78 76 75 78 77 80 76 78 77 77 79 79 100 80 76 80 76 77 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15: MG_131_C5_Esch_AYO145A 74 77 78 76 76 77 78 75 80 77 78 78 80 76 100 76 80 76 78 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16: MG_137_C3_Esch_Alf5 77 81 77 79 85 81 81 86 82 80 85 84 79 74 78 100 84 78 84 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17: MG_081_C1_Sal_Mushroom 74 79 80 79 81 81 81 79 82 80 80 83 80 77 81 83 100 80 81 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18: MG_084_C1_Esch_VpaE1 76 78 78 77 79 81 80 76 81 76 79 80 80 73 77 77 80 100 79 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19: MG_008_C1_Sal_Mushroom 78 81 76 77 83 81 78 81 81 79 82 81 78 73 78 81 79 78 100 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20: MG_017_C2_Esch_SUSP2 21 20 20 19 20 20 20 19 19 21 20 21 19 19 20 20 20 20 21 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21: MG_179_C1_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 92 88 92 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22: MG_187_C21_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 100 91 93 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23: MG_044_C1_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 88 90 100 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24: MG_107_C2_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 93 91 100 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25: MG_138_C36_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 88 89 92 89 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26: MG_185_C4_Esch_Murica 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 90 84 84 88 83 86 90 91 87 90 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27: MG_197_C2_Esch_slur12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 92 100 84 85 87 85 86 88 90 86 89 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
28: MG_132_C2_Esch_O157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 83 100 83 86 85 87 85 85 88 85 84 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
29: MG_086_C4_Esch_0157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 83 83 82 100 82 80 81 79 80 84 80 80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30: MG_097_C2_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 85 85 83 100 86 87 86 87 86 88 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
31: MG_120_C1_esch_2JES-2013 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 85 86 82 87 100 87 85 86 89 85 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32: MG_130_C1_Ent_FV3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 86 85 87 82 86 86 100 89 88 88 87 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
33: MG_150_C5_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 87 85 80 87 84 89 100 91 86 91 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
34: MG_160_C1_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 89 85 81 87 85 88 91 100 88 92 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35: MG_183_C17_Esch_APCEc02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 87 85 87 84 86 87 87 86 87 100 87 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36: MG_003_R1_C1_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 91 88 85 82 89 85 88 91 92 88 100 93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
37: MG_039_C5_Esch_FFH2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 88 84 81 88 85 88 90 92 87 93 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38: MG_184_C35_Esch_ST32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
39: MG_099_C3_Esch_SLUR25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
40: MG_001_C1_Sal_wksl3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
41: MG_106_C6_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 71 80 79 71 77 75 76 9 8 10 18 8 10 10 7 9 8 9 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
42: MG_182_C1_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 74 100 76 76 74 80 79 78 10 9 10 18 9 10 10 11 11 11 10 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43: MG_146_C2_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 77 68 100 82 70 73 72 74 8 7 8 16 7 9 8 7 9 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
44: MG_010_C12_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 76 69 83 100 73 74 76 76 6 7 7 15 7 8 9 8 9 9 9 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
45: MG_187_C4_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 70 74 77 100 73 75 76 11 10 10 18 9 11 11 10 11 9 10 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
46: MG_086_C1_Sal_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 75 74 74 74 70 100 78 84 7 8 9 17 8 9 9 7 10 9 9 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
47: MG_142_C100_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 74 73 74 77 72 78 100 80 9 8 9 17 8 9 9 8 10 8 9 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
48: MG_193_C1_Shi_pSf-1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 73 70 74 75 72 83 79 100 9 10 10 17 9 10 10 7 10 9 12 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
49: MG_145_C1_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 9 8 6 10 7 9 9 100 20 20 19 17 19 20 14 16 14 14 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50: MG_081_C2_Esch_SECphi17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 9 7 7 10 8 8 9 19 100 85 79 84 87 88 76 76 77 78 87 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
51: MG_164_C9_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 9 8 7 10 9 9 10 20 85 100 88 89 89 90 74 76 75 76 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
52: MG_187_C12_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 17 16 15 17 17 17 18 18 78 87 100 80 84 82 67 69 69 70 82 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
53: MG_126_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 9 7 7 9 8 8 9 17 86 90 82 100 91 93 76 77 76 78 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
54: MG_049_C5_Esch_Swan01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 9 9 8 10 9 9 10 19 87 89 84 89 100 91 73 76 76 77 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
55: MG_190_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 10 8 9 11 9 9 11 20 89 90 83 91 91 100 74 78 78 78 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
56: MG_022_C3_Esch_phAPEEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 10 7 8 10 7 8 7 14 77 74 68 75 73 74 100 84 79 83 74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
57: MG_086_C6_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 10 9 9 11 10 9 11 15 76 76 70 75 76 77 84 100 80 83 77 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
58: MG_135_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 10 8 9 9 9 9 9 14 76 74 69 73 75 76 78 79 100 81 76 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
59: MG_197_C3_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 10 8 9 10 9 9 12 15 78 76 70 76 76 78 82 82 81 100 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
60: MG_192_C21_Esch_Swan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 8 7 7 9 8 8 9 20 88 90 83 91 92 91 74 77 77 79 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
61: MG_022_C1_Esch_SECphi27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 84 82 85 78 4 5 4 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
62: MG_091_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 84 100 86 77 81 5 5 4 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
63: MG_136_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 81 85 100 80 82 5 4 5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
64: MG_187_C1_Esch_phAPEC8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 80 84 100 80 3 3 3 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
65: MG_145_C3_Esch_ESCO13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 78 80 82 77 100 4 3 4 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
66: MG_170_C16_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 100 92 91 91 91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
67: MG_188_C1_Ent_ECGD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 4 3 3 91 100 90 91 92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
68: MG_146_C3_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 5 3 4 90 91 100 88 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
69: MG_048_C2_Ent_ECGD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 90 91 88 100 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
70: MG_089_C1_Esch_PHB05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 3 4 91 93 90 91 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
71: MG_199_C3_Esch_ECD7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
72: MG_148_C1_Ent_RB49 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 89 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
73: MG_002_C1_Ent_JS10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 93 2 2 1 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
74: MG_193_C3_Ent_JS10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 100 2 3 2 3 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
75: MG_002_C2_Esch_O157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 100 88 88 89 8 8 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
76: MG_109_C1_Esch_APCEc01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 85 100 87 87 8 8 8 8 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
77: MG_107_C1_Esch_PhAPEC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 85 87 100 87 8 7 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
78: MG_135_C2_Ent_RB69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 3 86 87 86 100 8 8 8 8 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
79: MG_110_C1_Esch_UFV13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 9 8 8 100 87 84 86 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
80: MG_116_C2_Esch_slur13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 7 8 7 8 87 100 83 86 88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
81: MG_036_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 85 100 86 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
82: MG_044_C3_Esch_UFV13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 8 8 8 8 86 86 85 100 86 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
83: MG_178_C1_Yer_phiD1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 7 8 7 8 86 89 85 86 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
84: MG_038_C11_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
85: MG_099_C1_Esch_Gluttony 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 72 49 66 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
86: MG_180_C4_Esch_Pride 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 71 100 48 66 67 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
87: MG_124_C1_Esch_Sloth 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 48 48 100 46 48 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
88: MG_043_C4_Esch_SECphi18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 65 66 47 100 78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
89: MG_046_C1_Esch_slur05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 67 49 79 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
90: MG_144_C1_Halfdan 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0
91: MG_038_Ent_St-1C6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 39 0 13 6 0 0 0 0 0 0 0 0 0
92: MG_180_C3_Ent_IME_EC2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42 100 0 12 7 0 0 0 0 0 0 0 0 0
93: MG_053_C26_Skarpretter 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 0 0 0 0 0 0 0 0 0 0 0
94: MG_165_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 11 0 100 14 0 0 0 0 0 0 0 0 0
95: MG_162_C4_Kleb_IME279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 6 0 13 100 0 0 0 0 0 0 0 0 0
96: MG_126_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 22 13 13 15 15 14 15 17
97: MG_039_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 100 27 28 25 25 23 23 25
98: MG_130_C3_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 26 100 90 47 48 65 67 47
99: MG_144_C2_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 28 93 100 48 48 69 68 47
100: MG_200_C27_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 46 46 100 85 41 42 88
101: MG_116_C10_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 24 48 47 86 100 43 43 84
102: MG_182_C43_Ent_J8+65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 23 66 67 43 43 100 93 44
103: MG_140_C1_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 23 67 66 43 44 93 100 44
104: MG_168_C20_Ent_J8-65 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 24 46 46 88 83 43 44 100
Heat plot for: Comparison[ALL_090419], Alignment[analysis], Analysis[*default*] file:///Users/au574427/Desktop/All_coliphages/All_110419.html
1 of 1 11/04/2019, 11.49
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
2
Table 1 List of 104 unique Escherichia phages identified in 94 Danish wastewater samples. Novelty (*) is based on the 95% demarcation of species. Similarity is sequence 135 identity (%) times sequence coverage (%) to closest related (Blastn). Taxonomy is based on similarity (BLASTn) to closest related and the ICTV Master Species list. 136
Escherichia
Phage
Genome
(bp)
ORFs
(n)
tRNAs
(n)
GC
(%)
Novel
Sample
Location
Cluster
Taxonomy
Similarity
(%)
Accession
number
Tootiki 88257 128 22 39 * 3D Avedøre I Felixounavirus 90,2 MN850647
Mio 83431 121 18 39,1 * 13C Hadsten I Felixounavirus 89,7 MN850631
Allfine 86963 125 20 39 * 14A Hammel I Felixounavirus 91,2 MN850633
Bumzen 87360 126 20 39,1 * 14B, 14D, 15A Hammel, Hinnerup I Felixounavirus 92,5 MN850635
Dune 88511 129 20 39 * 19C Tørring I Felixounavirus 91,5 MN850636
Warpig 86106 127 17 39 * 20A Helsingør I Felixounavirus 93 MN850637
Radambza 86702 127 19 38,9 * 20D Helsingør I Felixounavirus 91,6 MN850639
Ekra 87282 128 20 38,9 * 21C Herning I Felixounavirus 92,9 MN850644
Humlepung 85311 119 19 39,1 * 12B. 25B, 37A Drøsbro, Gram, Bogense I Felixounavirus 92,1 MN850564
Finno 87554 129 20 38,9 * 31C Skovby I Felixounavirus 89,7 MN850619
Garuso 85798 130 20 38,9 * 29A, 33A Nustrup, Sommersted I Felixounavirus 90,9 MN850566
Momo 88168 130 20 39 * 38D Hofmansgave I Felixounavirus 90,7 MN850580
Heid 87590 126 20 39 * 8C, 24B, 24C, 34D, 37D, 38B Esbjerg Ø, Bevtoft, Vojens, Bogense,
Hofmansgave I Felixounavirus 91,2 MN850577
Skuden 87263 131 20 38,9 * 41D Odense NV I Felixounavirus 91,1 MN850585
Pinkbiff 88814 129 20 39 * 46A Marselisborg I Felixounavirus 93,9 MN850603
Fjerdesal 87715 128 21 39 * 3A, 39A, 46C Avedøre, Hårslev, Marselisborg I Felixounavirus 90,6 MN850605
Andreotti 83391 117 20 39,2 * 47B Egå I Felixounavirus 91,9 MN850610
Nataliec 89137 134 20 39 * 47D, 48C Egå, Viby I Felixounavirus 90,3 MN850611
Adrianh 88226 128 19 38,9 * 42C, 48B Otterup, Viby I Felixounavirus 91,1 MN850614
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
3
Escherichia
Phage
Genome
(bp)
ORFs
(n)
tRNAs
(n)
GC
(%)
Novel
Sample
Location
Cluster
Taxonomy
Similarity
(%)
Accession
number
Mistaenkt 86664 128 22 47,2 * 6A Kolding I Suspvirus 91,1 MN850587
Nimi 137039 213 5 43,7 * 11C Varde III Vequintavirus 93,3 MN850626
Navn 141707 224 4 43,6 * 21B Herning III Vequintavirus 91,1 MN850642
Nomine 137991 220 5 43,6 * 23A Lemvig III Vequintavirus 91,5 MN850649
Naswa 138583 222 5 43,6 * 45A Ålborg Ø III Vequintavirus 93,1 MN850595
naam 137129 215 5 43,7
31D Skovby III Vequintavirus 94,5 MN850630
Ime 137114 217 5 43,6 * 12C, 36B Drøsbo III Vequintavirus 93,1 MN850576
Magaca 135826 217 5 43,6
48A Viby III Vequintavirus 96 MN850612
Nom 136114 213 5 43,6 * 28D Jegerup III Vequintavirus 92,6 MN850646
Isim 138289 219 5 43,6 * 31B Skovby III Vequintavirus 93,8 MN850597
Nomo 137702 218 5 43,7 * 38D Hofmansgave III Vequintavirus 93,3 MN850578
Inoa 138710 220 5 43,6 * 44C Ålborg V III Vequintavirus 92 MN850593
Pangalan 136944 215 5 43,7
2A, 45D, 48D Grindsted, Egå, Viby III Vequintavirus 94,8 MN850627
Tuntematon 150473 279 11 39,1 * 7B, 7C Esbjerg V VI Myoviridae 89,6 MN850618
Anhysbys 149335 271 11 39,1 * 22C Hillerød VI Myoviridae 91,5 MN850648
Ukendt 150947 266 11 39 * 32D Skrydstrup VI Myoviridae 88,7 MN850565
Nepoznato 151514 265 10 38,9 * 19D, 35A, 48A, 48D Tørring, Årøsund, Viby VI Myoviridae 85,6 MN850571
Nieznany 144998 254 11 39,1 * 45C Ålborg Ø VI Myoviridae 88,9 MN850598
Muut 146307 243 13 37,4 * 35B Årøsund VII Myoviridae 92 MN850573
Alia 147009 246 13 37,5 * 13D Hadsten VII Myoviridae 93,1 MN850632
Outra 145482 246 13 37,4 * 22A Hillerød VII Myoviridae 93,8 MN850645
Inny 147483 247 13 37,4 * 45D Ålborg Ø VII Myoviridae 92,4 MN850601
Arall 145715 242 13 37,4
41B Odense NØ VII Myoviridae 94,6 MN850584
Kvi 163673 266 - 40,5 * 48C Viby VIII Krischvirus 94,2 MN850615
Kaaroe 163719 267 - 40,5
35D Årøsund VIII Krischvirus 94,7 MN850574
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
4
Escherichia
Phage
Genome
(bp)
ORFs
(n)
tRNAs
(n)
GC
(%)
Novel
Sample
Location
Cluster
Taxonomy
Similarity
(%)
Accession
number
Dhabil 165644 266 3 39,5 * 1B Billund IX Dhakavirus 87,5 MN850621
Dhaeg 170817 278 3 39,4 * 47A, 47C Egå IX Dhakavirus 87,4 MN850609
Mogra 168724 263 2 37,7 * 25C Gram X Mosigvirus 91,1 MN850579
Mobillu 163063 255 2 37,7
1B Billund X Mosigvirus 94,5 MN850622
Moha 168676 267 2 37,6
26A Haderslev X Mosigvirus 94,8 MN850590
Moskry 169410 269 2 37,6 * 32C Skrydstrup X Mosigvirus 93,2 MN850651
Teqskov 165017 257 6 35,4 * 10D Skovlund XI Tequatrovirus 91,7 MN895437
Teqdroes 166833 269 10 35,4 * 12D Drøsbro XI Tequatrovirus 88,6 MN895438
Teqhad 167892 270 10 35,3 * 26B Haderslev XI Tequatrovirus 90,1 MN895434
Teqhal 168070 266 11 35,4 * 27A, 27D Halk XI Tequatrovirus 93,9 MN895435
Teqsoen 166468 268 10 35,5 * 43B Søndersø XI Tequatrovirus 91,7 MN895436
Flopper 52092 78 1 44,2 * 44D Ålborg V - Myoviridae 87 MN850594
Damhaus 51154 89 - 44,1 * 4B Damhusåen IV Hanrivervirus 85,8 MN850602
Herni 50971 89 - 44,1 * 21B, 30D Herning, Over Jerstal IV Hanrivervirus 87,6 MN850640
Grams 49530 83 - 44,1 * 25B Gram IV Hanrivervirus 87,1 MN850567
Aaroes 51662 92 - 44,1 * 35B Årøsund IV Hanrivervirus 83 MN850572
Aalborv 46660 79 - 43,9 * 44B Ålborg V IV Hanrivervirus 86,9 MN850591
Haarsle 48613 85 - 44 * 39B, 45C Hårslev, Ålborg Ø IV Hanrivervirus 87,1 MN850600
Egaa 51643 87 - 44,1 * 47A Egå IV Hanrivervirus 89,7 MN850607
Vojen 50709 86 - 44,1 * 34B Vojens IV Hanrivervirus 89,7 MN850569
Tiwna 51014 85 - 44,6 * 21B Herning V Tunavirinae 87,2 MN850643
Tonijn 51627 86 - 44,6 * 32B, 32C Skrydstrup V Tunavirinae 88,4 MN850641
Tonnikala 51277 86 - 44,8 * 48A Viby V Tunavirinae 86,4 MN850613
Atuna 50732 88 - 44,6 * 7B Esbjerg V V Tunavirinae 84,9 MN850620
Tunus 51111 87 - 44,8 * 20A Helsingør V Tunavirinae 93,7 MN850638
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
5
Escherichia
Phage
Genome
(bp)
ORFs
(n)
tRNAs
(n)
GC
(%)
Novel
Sample
Location
Cluster
Taxonomy
Similarity
(%)
Accession
number
Orkinos 49798 81 - 44,6 * 30B, 30D Over Jerstal V Tunavirinae 91,3 MN850586
Ityhuna 50768 86 - 44,7 * 39D Hårslev V Tunavirinae 93,3 MN850582
Tonn 51012 87 - 44,5 * 8C, 45C Esbjerg Ø, Ålborg Ø V Tunavirinae 94 MN850596
Tinuso 50856 86 - 44,8
14A Hammel V Tunavirinae 97,3 MN850634
Tunzivis 50596 84 - 44,6
46B Marselisborg V Tunavirinae 94,5 MN850604
Tuinn 50505 86 - 44,7
46D Marselisborg V Tunavirinae 94,8 MN850606
Jahat 51101 87 - 45,7 * 35A Vojens - Tunavirinae 68,5 MK552105
Bob 45252 63 - 54,5 * 12C Drøsbro XII Dhillonvirus 88,6 MN850628
Mckay 44443 63 - 54,5 * 13B Hadsten XII Dhillonvirus 83,8 MN850629
Jat 44417 63 - 54,5 * 23C Lemvig XII Dhillonvirus 89,4 MN850650
Rolling 46017 64 - 54,2 * 29D Nustrup XII Dhillonvirus 80,2 MN850575
Welsh 45207 62 - 54,6 * 23A, 43D Lemvig, Søndersø XII Dhillonvirus 83,8 MN850589
Buks 40308 62 - 49,7 * 1A Billund - Jerseyvirus 91,3 MN850616
Skure 59474 92 - 44,6 * 23C Lemvig - Seuratvirus 90,4 MK672798
Halfdan 42858 57 - 53,7 * 34D Vojens - Siphoviridae 28,8 MH362766
Lilleen 5342 6 - 46,9 * 33A Sommersted II Gequatrovirus 93,8 MK629526
Lilleput 5490 6 - 47 * 12D Drøsbro II Gequatrovirus 93,4 MK629525
Lilleto 5492 6 - 46,8 * 43C, 43D, 44B Søndersø, Ålborg V II Gequatrovirus 92,7 MK629529
Lilledu 5483 6 - 47,2 * 25C Gram II Gequatrovirus 92,6 MK791318
Lillemer 5492 6 - 47,1
45C Ålborg Ø II Gequatrovirus 94,5
Lilleven 6090 9 - 44,4 * 11B Varde - Alphatrevirus 93,9 MK629527
Sortsyn 42116 61 - 59 * 11B Varde XIII Unclassified 92,3 MN850623
Sortregn 38200 53 - 59,3
43D Søndrsø XIII Unclassified 97,3 MN850588
Skarpretter 42042 63 - 55,8 * 15A Hinnerup - Unclassified 37,9 MK105855
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
6
Escherichia
Phage
Genome
(bp)
ORFs
(n)
tRNAs
(n)
GC
(%)
Novel
Sample
Location
Cluster
Taxonomy
Similarity
(%)
Accession
number
Sortkaff 42538 61 - 59,5 * 39B Hårslev - Unclassified 89,8 MN850581
Sortsne 41912 62 - 60 * 40A Odense NV - Unclassified 67,6 MK651787
Aldrigsur 42379 55 - 55,7 * 44B Ålborg V XIV Autographvirinae 71,9 MN850592
Altidsur 42197 53 - 55,7 * 33D Sommersted XIV Autographvirinae 71,8 MN850568
Forsur 42476 56 - 55,4 * 48D Viby XIV Autographvirinae 72 MN850617
Glasur 42507 56 - 55,4 * 40D Odense NV XIV Autographvirinae 72,3 MN850583
Lidtsur 42291 56 - 54,6 * 30B Over Jerstal XIV Autographvirinae 69 MK629528
Megetsur 42132 54 - 55,8 * 31B Skovby XIV Autographvirinae 73,1 MN850608
Mellemsur 40770 50 - 55,8 * 34D Vojens XIV Autographvirinae 76,4 MN850570
Smaasur 41110 50 - 55,4 * 11C Varde XIV Autographvirinae 93,3 MN850625
Usur 41906 51 - 55,4 * 27A, 27D Halk XIV Autographvirinae 73,3 MN850624
137 138
The high-throughput screening method favours easily culturable plaque-forming lytic phages. Still, we identified 104 unique Escherichia phages of 139
which only 16% were ≥95% similar (BLAST) to already published phages (Table 2). Phages were identified in wastewater samples from 43 of the 48 140
investigated treatment facilities. From the majority of positive samples (58) a single phage was sequenced, however in some samples the lysate held 141
more than one phage. Twenty-five of the lysates held two phages, eight lysates held three phages and one had as many as four phages. Of the 104 142
unique phages, 91 represent novel species (Table 1, 2). Of these, 51 differed by ≥10% from published phage genomes and some have NT similarities as 143
low as 29% (Table 2).144
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
These newly sequenced phages represent a substantial quota of divergent lytic Escherichia phages in Danish 145
wastewater, but are still far from disclosing the true diversity hereof (Figure 2). An extrapolation of species 146
richness (q = 0) predicts a total of 292 distinct species (requiring a sample size of ∼900 phages). The relatively 147
small sample-size in this study (n = 136), may subject the estimation to a large prediction bias. The sampling-148
method also introduces a bias by selecting for abundance and burst size, thereby potentially underestimating 149
diversity. Nonetheless, the results provide an indication of the minimal diversity of lytic Escherichia (MG1655, 150
K-12) phages in Danish wastewater, estimated to be as a minimum be in the range of 160 to 420 unique phage 151
species (Figure 2b). The novelty and diversity of these wastewater phages is truly remarkable and verifies our 152
hypothesis, as well as the efficiency of the High-throughput Screening Method for exploring diverse phages of a 153
single host (not published). 154
(a) (b)
Figure 2 (a) Sample-size-based rarefaction and extrapolation curve with confidence intervals (0.95). (b) Sample 155 completeness curve with confidence intervals (0.95). 156
157
0
100
200
300
400
0 250 500 750Number of sampling units
Spec
ies
dive
rsity
interpolated extrapolated
0.00
0.25
0.50
0.75
1.00
0 250 500 750Number of sampling units
Sam
ple
cove
rage
interpolated extrapolated
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
2
Table 2 Taxonomic distribution of phages identified in 94 Danish wastewater samples, 158 based on similarity to closest related and the ICTV Master Species list. 1. ≤95% similarity 159 to other phages in the dataset. 2. ≤95% similarity to other phages in the dataset and to 160 published phage genomes. 161
Taxonomy* total unique1 novel2
Caudovirales; Myoviridae; Ounavirinae; Felixounavirus 33 19 19
Caudovirales; Myoviridae; Ounavirinae; Suspvirus 1 1 1
Caudovirales; Myoviridae; Tevenvirinae; Krischvirus 2 2 1
Caudovirales; Myoviridae; Tevenvirinae; Dhakavirus 3 2 2
Caudovirales; Myoviridae; Tevenvirinae; Mosigvirus 4 4 2
Caudovirales; Myoviridae; Tevenvirinae; Tequatrovirus 6 5 5
Caudovirales; Myoviridae; Vequintavirinae; Vequintavirus 15 12 9
Caudovirales; Myoviridae 15 11 10
Caudovirales; Podoviridae; Autographivirinae 10 9 9
Caudovirales; Siphoviridae; Dhillonvirus 6 5 5
Caudovirales; Siphoviridae; Guernseyvirinae; Jerseyvirus 1 1 1
Caudovirales; Siphoviridae; Seuratvirus 1 1 1
Caudovirales; Siphoviridae; Tunavirinae; Hanrivervirus 10 8 8
Caudovirales; Siphoviridae; Tunavirinae 15 12 8
Caudovirales; Siphoviridae 1 1 1
Microviridae; Bullavirinae; Alphatrevirus 1 1 1
Microviridae; Bullavirinae; Gequatrovirus 7 5 4
Unclassified bacterial viruses 5 5 4
Total 136 104 91
162
3.2. Phage genome characteristics 163
The sequenced phage genomes range in sizes from 5 342 bp of the Microviridae lilleen to a span in the 164
Caudovirales from 38 200 bp of the unclassified sortregn to 170 817 bp of the Dhakavirus dhaeg (Figure 3, Table 165
2). GC contents also vary greatly, from only 35.3% (Tequatrovirus teqhad) and up to 60.0% (the unclassified 166
sortsne) (Figure 3,Table 2), heavily diverging from the host GC content of 50.79%. Phages often have a lower 167
GC contents than their host [49], as observed for the majority (81%) of the wastewater phages (Table 2). A 168
lower GC content of phage genomes tends to correlate with an increase in genome size [50], as is also the case 169
for the Caudovirales in this study (Table 2). Rocha & Danchin (2002) hypothesized that phages, along with 170
plasmids and insertion sequences, can be considered intracellular pathogens, and like host dependent bacterial 171
pathogens and symbionts, they experience competition for the energetically expensive and less abundant GTP 172
and CTP and as a consequence develop genomes with comparatively higher AT contents [49]. However, the 173
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
3
differences in AT richness reported by Rocha & Danchin (2002) for dsDNA and isometric ssDNA phages were 174
merely 4.2% and 5% [49], while the dsDNA and isometric ssDNA phages in this study deviate from their 175
isolation host by having up to 31% higher and up to 19% lower AT content (Table 2). Accordingly, the 176
considerable differences in GC/AT contents may instead primarily be a reflection of past host relations [14]. 177
The sequencing of lysates and not only plaquing phages, may have contributed in enabling the capture of such 178
a broad GC-content and overall diversity. 179
180 Figure 3 Bubble-diagram of the 104 unique Escherichia phages displaying genome size- and GC-content distribution. Area 181 of bubbles indicates number of phages. The yellow bubble is Podoviridae phages, green ones are Siphoviridae phage clusters, 182 dark-blue ones are Myoviridae phage clusters with tRNAs, the light-blue bubble is Myovridae phages without tRNAs, the 183 red bubble is Microviridae phages and the grey one is unclassified phages. 184
185
The genome screening algorithms identified no sequences coding for homologs of known virulence or 186
antibiotic resistance genes. Though not a definitive exclusion, this interprets as a reduced risk of presence, a 187
preferable trait for phage therapy application. Currently available tools for AMG screening of viromes did not 188
provide a comprehensive and exclusive assessment of the AMG pool in the dataset. The majority of genes 189
identified are not AMGs, but code for phage DNA modification pathways (Table S1). The function of some of 190
30
35
40
45
50
55
60
65
-5 5 15 25 35 45 55 65 75 85 95 105 115 125 135 145 155 165 175 185
GC
con
tent
(%)
Genome size (kb)
Podoviridae SiphoviridaeMyoviridae w. tRNAs Myoviridae no tRNAsMicroviridae Unclassifed
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
4
the suggested AMGs is unknown, these include a nicotinamide phosphoribosyltransferase (NAMPT) present 191
in cluster I (Felixounavirus and Suspvirus) and VII (unclassified Myoviridae), a 3-deoxy-7-phosphoheptulonate 192
(DAHP) synthase present in cluster X (Mosigvirus) and a complete dTDP-rhamnose biosynthesis pathway 193
present in cluster VI and VII (unclassified Myoviridae). The presence of a dTDP-rhamnose biosynthesis pathway 194
in the DNA metabolism region of phage genomes is peculiar, one possible explanation is, that these phages 195
utilize rhamnose for glycosylation of hydroxy-methylated nucleotides in the same manner as the T4 generated 196
glucosyl-hmC [51]. Dihydrofolate reductase, identified in six of the Myoviridae clusters, is involved in thymine 197
nucleotide biosynthesis, but is also a structural component in the tail baseplate of T-even phages [52]. Multiple 198
verified DNA modification genes were identified. Methyltransferases, some putative, were detected in all 199
Myoviridae of cluster III (vequintavirus), VI (unclassified), VII (unclassified), VIII (Krischvirus), in all Siphoviridae 200
of cluster IV (Hanrivervirus) and in the unclassified phages of cluster XIII. Indeed, cluster XIII and the singletons 201
skarpretter, sortsne, and sortkaff code for both DNA N-6-adenine-methyltransferases (dam) and DNA cytosine 202
methyltransferases (dcm). Finally, two novel epigenetic DNA hypermodification pathways were identified. 203
The mosigviruses of cluster X code for arabinosylation of hmC [51], while the novel seuratvirus Skure codes 204
for the recently verified complex 7-deazaguanine DNA hypermodification system [53,54]. 205
3.3. Forty-eight novel Myoviridae phages species 206
The Myoviridae genomes (56 unique, 49 novel) represents the greatest span in genome sizes in this study, 207
from the unclassified flopper of 52092 bp to the dhakavirus dhaeg of 170817 bp), all, except the krischviruses, 208
code for tRNAs (Figure 3, Table 2). The Myoviridae group into nine distinct clusters and four singletons, 209
representing at least three subfamilies; Tevenvirinae, Vequintavirinae and Ounavirinae in addition to 11 210
unclassified Myoviridae (cluster VI & VII) (Figure 1, Table 2). The eleven novel Tevenvirinae distributes into four 211
distinct genera, two of the Krischvirus (cluster VIII, 164kb, 40.5% GC, no tRNAs), two of the Dhakavirus (cluster 212
IX, 166-170 kb, 39.4-39.5% GC, 3 tRNAs), two of the Mosigvirus (cluster X, 169 kb, 37.6-37.7% GC, 2 tRNAs) and 213
five of the Tequatrovirus (cluster XI, 165-168 kb, 35.3-35.5% GC, 6-11 tRNAs), while the nine novel 214
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
5
Vequintavirinae are all vequintaviruses (cluster III, 136-142 kb, 43.6-43.7% GC, 5 tRNAs) closely related (91.1-215
93.8%, BLAST) to classified species. The vequintaviruses were identified in samples from 12 treatment 216
facilities. Only reads from two samples of a dataset of human gut viromes based on timeseries of faecal 217
samples from 10 healthy persons [55] mapped to the wastewater phages, and only to vequintaviruses. These 218
reads covered 43-54% and 13-26% of all the Vequintavirus genomes except pangalan and navn, respectively. 219
This suggests that the vequintaviruses are related to entero-phages. All but one of the Ounavirinae (cluster I, 220
82-89 kb, 38.9-39.2% GC, 17-22 tRNAs) are felixounaviruses (89.7-93.9%, BLAST) with an intra-Gegenees score 221
of 71-86% (Figure 1). The Felixounavirus is a relatively large genus with 17 recognized species. In this study, 222
felixounaviruses were identified in samples from 23 of the 48 facilities, indicating that they are both ubiquitous 223
in Danish wastewater, numerous and easily cultivated. The last ounavirus, mistaenkt (86.7 kb, 47.2% GC, 22 224
tRNAs) is a Suspvirus. The five cluster VI phages (144.9-151.5 kb, 39±0.1% GC, 10-11 tRNAs) belong to a 225
phylogenetic distinct clade, the recently proposed ´Phapecoctavirus´, and have substantial similarity (86-90%, 226
BLAST) with the anticipated type species Escherichia phage phAPEC8 (JX561091) [22,56]. Cluster VII have 227
significantly (p = 0.038) larger genomes (145.8-147.5 kb) with marginally, though not significantly (p = 0.786), 228
lower GC contents of 37.4-37.5%, all have 13 tRNAs (Table 2) and code for NAMPT not present in cluster VI 229
(Table S1). As a group, cluster VII are even more homogeneous than cluster VI and all are closely related (92-230
95%, BLAST) to the same five unclassified Enterobacteriaceae phages vB_Ecom_PHB05 (MF805809), 231
vB_vPM_PD06 (MH816848), ECGD1 (KU522583), phi92 (NC_023693) and vB_vPM_PD114 (MH675927) 232
[57,58], with whom they represent a novel unclassified genera. The distinctive and novel singleton flopper 233
only shares NT similarity (36-87%, BLAST) with six other phages, the Escherichia phages ST32 (MF04458) [59] 234
and phiEcoM_GJ1 (EF460875) [60], the Erwinia phages Faunus (MH191398) [61] and vB_EamM-Y2 235
(NC_019504) [62], and the Pectobacterium phages PM1 (NC_023865) [63] and PP101 (KY087898). This group of 236
unclassified phages all have genomes of 52.1-56.6 kb with 43.6-44.9% GC and while only flopper, phiEcoM_GJ1 237
and PM1 code for a single tRNA, all of them have exclusively unidirectional coding sequences (CDSs) and 238
code for RNA polymerases, characteristics of the Autographivirinae of the Podoviridae [64]. However, the 239
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
6
verified morphology of ST32, phiEcoM_GJ1, PM1 and PP101, icosahedral head, neck and a contractile tail with 240
tail fibres, classifies them as myoviruses [59,60,62,63]. Based on NT similarity and genome synteny, these seven 241
phages belong to the same, not yet classified, peculiar lineage first described by Jamalludeen et al., (2008) [60]. 242
3.4. Five novel Microviridae phages species 243
The singleton lilleven (6.1 kb, 44.4% GC, no tRNA) and the five (four novel) cluster II phages (5.3-5.5 kb, 244
46.9-47.2% GC, no tRNAs) are all Microviridae, a family of small ssDNA non-enveloped icosahedral phages 245
(Table 2). Lilleven is closely related to (93.9%, BLAST), have pronounced gene pattern synteny and high AA 246
similarities (89-90%), with the Alphatremicrovirus Enterobacteria phage St1 (Figure S2), and is a novel species 247
of the genus Alphatremicrovirus, subfamily Bullavirinae. The cluster II phages comprise four distinct species, 248
only differing by single nucleotide polymorphisms and in non-coding regions (Figure S2). They have high 249
intra-Gegenees score (88-92%) and share genomic organisation and extensive NT similarity (92.6-94.5-%, 250
BLAST) with the unclassified microvirus Escherichia phage SECphi17 (LT960607), primarily differing by 251
single nucleotide polymorphisms and in noncoding regions (Figure S2, Table 2). Cluster II are only related to 252
one recognised phage species Escherichia virus ID52 (63%, BLAST), genus Gequatrovirus, subfamily Bullavirinae, 253
with whom they predominantly differ in the region in and around the major spike protein (gpG), a distinctive 254
marker of the subfamily Bullavirinae involved in host attachment. The phylogenetic analysis separates cluster 255
II from the gequatroviruses, and both the Gegenees scores (<22%) and NT similarities (<65% BLAST) are 256
moderately low (Figure S2). Nonetheless, cluster II are still, based on pronounced genome organisation synteny 257
and a conserved AA similarity (62-64%, Gegenees) considered to be gequatroviruses (Figure S2). 258
The sequencing of the microviruses is peculiar, as library preparation with the Nextera® XT DNA kit 259
applies transposons targeting dsDNA. However, during microvirus infection the host polymerase converts 260
the viral ssDNA into an intermediate state of covalently closed dsDNA, which is then replicated in rolling 261
circle by viral replication proteins transcribed by the host RNA polymerase [65]. This intermediate state may 262
have enabled the library preparation. The presence of host DNA in the sequence results indicates an 263
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
7
insufficient initial DNase1 treatment, which can be attributed to chemical inhibition or inactivation of the 264
enzyme by adhesion to the sides of wells. Hence, it is reasonable to assume that the extracted microvirus DNA 265
was captured as free dsDNA inside host cells during ongoing infections. 266
3.5. Twenty-five novel Siphoviridae phage species 267
In spite of similar genome sizes, the large group of Siphoviridae, is the most diverse (28 unique, 24 novel) 268
in this study, with GC contents ranging from 43.9-54-6% (Figure 3, Table 2). The majority, clusters IV-V and 269
singleton jahat, are of the subfamily Tunavirinae, while the remaining are allocated into at least three divergent 270
genera, Dhillonvirus, Jerseyvirus and seuratvirus. 271
Cluster IV (46.7-51.6 kb, 43.9-44.1% GC, no tRNAs), V (49.8-51.6 kb, 44.6-44.8% GC, no tRNAs) and jahat 272
(51.1 kb, 45.7% GC, no tRNAs) have low inter-Gegenees scores (7-20%) (Figure 1), yet they form a 273
monophyletic clade and are all closely related (85-94%, BLAST) to published Tunavirinae. The Cluster IV phages 274
are notable, as they represent a significant increase to the small genus Hanrivervirus comprising only the type 275
species Shigella virus pSf-1 (51.8 kb, 44% GC, no tRNAs) isolated from the Han River in Korea [66]. The common 276
ancestry of Cluster IV and pSf-1 is evident by comparable genome sizes, GC contents, genomic organization 277
and substantial NT (86-90%, BLAST) and AA (77-85%, Gegenees) similarities (Table 2, Figure S3). During their 278
differentiation, many deletions and insertions of small hypothetical genes have occurred, most notable is a 279
unique version of a putative tail-spike protein in seven of the cluster IV hanriverviruses, indicating divergent 280
host ranges (Figure S3). All the hanriverviruses code for (putative) dam and Psf-1 is resistant towards at least 281
six restriction endonucleases [66], suggesting they employ DNA methylation as a defence strategy. 282
The five novel dhillonviruses of Cluster XII (44.4-46.0 kb, 54.2-54.6% GC, no tRNAs) have substantial NT 283
similarities with a group of unclassified dhillonviruses (80-89%, BLAST) (Table 2) and the type species 284
Escherichia virus HK578 (77-80%, BLAST). As with the hanriverviruses, and as observed by Korf et al., (2019) 285
[22], their genomes mainly differ in minor hypothetical genes and in putative tail-tip proteins, indicating 286
divergent host ranges (Figure S4). Based on NT similarity and the presence of the canonical 7-deazaguanine 287
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
8
operon the singleton skure (59.4 kb, 44.6% GC, no tRNAs) is a seuratvirus, while the singleton Buks (40.3 kb, 288
49.7% GC, no tRNAs) is assigned to the genus Jerseyvirus, subfamily Guernseyvirinae. 289
3.5.1 Escherichia phage halfdan, a novel lineage within the Siphoviridae 290
Interestingly, the singleton halfdan (42.8 kb, 53.7% GC, no tRNAs) has only miniscule similarity with 291
published phages (12-29%, BLAST). These entail two Pseudomonas phages vB_PaeS_SCUT-S3 (MK165657) and 292
Ab26 (HG962376) [67] both Septimatreviruses, two Acinetobacter phages of the Lokivirus IMEAB3 (KF811200) 293
and type species Acinetobacter virus Loki [68], and to a lesser degree the unclassified Achromobacter phage 294
phiAxp-1 (KP313532) [69]. They have a common genomic organization, yet their intra-Gegenees score is ≤1% 295
and NT similarity is negligent in roughly a third of halfdan’s 57 CDSs (Figure 4b, d). The phylogeny and AA 296
similarities (Gegenees) also indicate a distant relation, although grouping halfdan closer with the lokiviruses 297
(40-43%) than the septimatreviruses (33-34%) (Figure 4a, c). The genome of halfdan is mosaic, resembling the 298
lokiviruses in the structural region and the septimatreviruses in the replication region (Figure 4d). 299
Remarkably, halfdan has no NT similarity with known Escherichia phages, which could be an indication of E. 300
coli not being the natural host, and that halfdan may have to transcended species barriers. However, host 301
range, morphology and other physical characteristics are beyond the scope of this study, and will be 302
determined in future lab-based studies. Notably, both halfdan, Loki and Ab26 code for a (putative) MazG, 303
although there is negligible sequence similarity (Figure 4d). Phage encoded MazG is hypothesized to be 304
involved in restoring protein synthesis in a starved cell, in order to keep the cell alive, and ensure optimal 305
replication conditions [70]. The phylogeny, whole genome alignments and low NT and AA similarities 306
suggests that halfdan is distinct from other known phages (Figure 4). Accordingly, as per the ICTV guidelines, 307
halfdan is the first phage sequenced of a novel Siphoviridae genus. Yet, halfdan is also, by its mosaicism, an 308
indicator of the genetic continuum of phages questioning the validity of taxonomic interpretations. 309
310
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
9
(a) (b) (c)
(d)
Figure 4 (a) Phylogenetic tree (Maximum log Likelihood: -7678.71, large terminase subunit), scalebar: substitutions per 311 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 312 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 313 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 314 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 315 phages identified in this study and deposited in GenBank. 316
3.6. Nine novel Podoviridae phage species 317
The nine novel Podoviridae (cluster XIV, 40.7-42.5 kb, 55.4-55.7% GC, no tRNAs), all with the hallmarks of 318
the Autographvirinae i.e. unidirectionally encoded genes and RNA polymerases [64], have conserved genome 319
organisation with the type species of the Phikmvvirus, Pseudomonas virus phiKMV (42.5 kb, 62.3% GC, no 320
tRNAs) [71], but almost no NT similarity (<1%, BLAST). Besides the unclassified Autographvirinae 321
Enterobacteria phage J8-65 (NC_025445) (40.9 kb, 55.7% GC, no tRNAs), with which Cluster XIV has 322
considerable NT similarity (69-93%, BLAST) and similar GC content, Cluster XIV only share >5% nucleotide 323
similarity (40-42%, BLAST) with the Phikmvvirus Pantoea virus Limezero (43.0 kb, 55.4% GC, no tRNAs) [72] 324
Acinetobacter phage IME AB3
Acinetobacter phage vB AbaS Loki
Escherichia phage Halfdan
Pseudomonas phage vB PaeS SCH Ab26
Pseudomonas phage 73
Pseudomonas phage vB paeS SCUT-S3
Acinetobacter phage phiAxp0.050
Organism 1 2 3 4 5 6 7
1: Acinetobacter phage IME_AB3 100 2 0 0 0 0 0
2: Acinetobacter phage vB_AbaS_Loki 2 100 0 0 0 0 0
3: Escherichia phage halfdan 0 0 100 1 0 0 0
4: Pseudomonas phage vB_PaeS-SCH_Ab26 0 0 0 100 83 69 0
5: Pseudomonas phage 73 0 0 0 84 100 68 0
6: Pseudomonas phage vB_PaeS_SCUT-S3 0 0 0 69 69 100 0
7: Achromobacter phage phiAxp-1 0 0 0 0 0 0 100
Heat plot for: Comparison[Halfdan_and_friends_210319], Alig... file:///Users/au574427/Desktop/Coli_unclass/Halfdan/halfdan_n...
1 of 1 29/03/2019, 22.19
Organism 1 2 3 4 5 6 7
1: Acinetobacter phage IME_AB3 100 62 40 26 26 26 23
2: Acinetobacter phage vB_AbaS_Loki 64 100 43 28 27 27 24
3: Escherichia phage halfdan 40 42 100 33 33 33 27
4: Pseudomonas phage vB_PaeS-SCH_Ab26 26 27 33 100 89 77 25
5: Pseudomonas phage 73 26 27 34 89 100 76 24
6: Pseudomonas phage vB_PaeS_SCUT-S3 26 27 33 77 77 100 24
7: Achromobacter phage phiAxp-1 23 23 26 24 24 24 100
Heat plot for: Comparison[hlafdan_tblastx_210319], Alignment[... file:///Users/au574427/Desktop/Coli_unclass/Halfdan/halfdan_2...
1 of 1 29/03/2019, 22.19
100%
64%
Term. Port. Tape m. CoatTube Polym.MazG
Prim.
Lysis
IME_AB3
Loki
Halfdan
vB_PaeS_SCH_Ab26
73
vB_PaeS_SCUT-S3
phiAxp-1
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
10
(Figure 5, Table 2). The Phikmvvirus consists of four species, phiKMV, Limezero, Pantoea virus Limelight (44.5 kb, 325
54% GC, no tRNAs) [72] and Pseudomonas virus LKA1 (41.6 kb, 60.9% GC, no tRNAs) [73]. 326
(a) (b) (c)
(d)
Figure 5 (a) Phylogenetic tree (Maximum log Likelihood: -11728.26, large terminase subunit), scalebar: substitutions per 327 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 328 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 329 Pairwise alignment of phage genomes (Easyfig, BLASTn), the colour bars between genomes indicate percent pairwise 330 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 331 phages identified in this study and deposited in GenBank. 332
Escherichia phage lidtsurEscherichia phage smaasurEnterobacteria phage J8-65
Escerichia phage glasurEscherichia phage forsur
Escerichia phage usurEscherichia phage megetsurEschericia phage mellemsur
Escherichia phage altidsurEscherichia phage aldrigsur
Pantoea phage LIMEzeroPantoea phage LIMElight
Pseudomonas phage phiKMVPseudomonas phage LKA1-
0.10
Organism 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1: MG_126_C2_Ent_J8-65 100 22 22 17 15 15 13 13 15 14 1 0 0 0
2: MG_039_C2_Ent_J8-65 22 100 84 25 25 25 27 28 23 23 0 0 0 0
3: Enterobacteria_phage_J8-65 23 85 100 27 25 25 27 28 24 24 1 0 0 0
4: MG_168_C20_Ent_J8-65 17 24 26 100 88 83 46 46 44 43 1 0 0 0
5: MG_200_C27_Ent_J8-65 16 24 25 88 100 85 46 46 42 41 1 0 0 0
6: MG_116_C10_Ent_J8-65 16 24 25 84 86 100 48 47 43 43 1 0 0 0
7: MG_130_C3_Ent_J8-65 14 26 26 47 47 48 100 90 67 65 0 0 0 0
8: MG_144_C2_Ent_J8-65 14 28 28 47 48 48 93 100 68 69 1 0 0 0
9: MG_140_C1_Ent_J8-65 15 23 23 44 43 44 67 66 100 93 0 0 0 0
10: MG_182_C43_Ent_J8+65 14 23 23 44 43 43 66 67 93 100 1 0 0 0
11: Pantoea LIMEzero 1 0 1 1 1 1 0 0 0 0 100 0 0 0
12: LIMElight 0 0 0 0 0 0 0 0 0 0 0 100 0 0
13: phiKMV 0 0 0 0 0 0 0 0 0 0 0 0 100 0
14: LKA1 0 0 0 0 0 0 0 0 0 0 0 0 0 100
Heat plot for: Comparison[podos_genuse_290319], Alignment[a... file:///Users/au574427/Desktop/Coli_Podos/podos_genus_29031...
1 of 1 29/03/2019, 18.17
Organism 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1: MG_126_C2_Ent_J8-65 100 70 69 68 68 68 66 66 66 66 47 19 19 20
2: MG_039_C2_Ent_J8-65 72 100 91 72 71 71 72 72 71 71 47 19 20 20
3: Enterobacteria_phage_J8-65 70 91 100 72 72 72 73 72 72 72 47 19 20 20
4: MG_168_C20_Ent_J8-65 67 69 70 100 92 89 77 75 77 77 46 19 19 20
5: MG_200_C27_Ent_J8-65 67 69 70 92 100 90 77 77 77 77 46 19 19 20
6: MG_116_C10_Ent_J8-65 68 70 70 90 91 100 77 77 77 77 47 19 19 20
7: MG_130_C3_Ent_J8-65 66 71 71 78 78 77 100 92 84 84 47 19 19 20
8: MG_144_C2_Ent_J8-65 68 73 73 78 79 79 95 100 86 86 48 19 19 20
9: MG_140_C1_Ent_J8-65 65 70 71 77 77 77 84 83 100 95 46 19 19 20
10: MG_182_C43_Ent_J8+65 65 69 70 77 77 76 84 83 95 100 47 19 19 20
11: Pantoea LIMEzero 45 45 45 46 46 46 46 46 46 46 100 19 19 20
12: LIMElight 19 19 19 19 19 19 19 19 19 19 19 100 19 20
13: phiKMV 19 19 19 19 19 19 19 19 19 19 19 19 100 27
14: LKA1 20 20 20 20 20 20 20 20 20 20 20 20 27 100
Heat plot for: Comparison[podos_tblastx_290319], Alignment[a... file:///Users/au574427/Desktop/Coli_Podos/podos_tblastx_2903...
1 of 1 29/03/2019, 18.23
100%
64%
Term.TailspikePort.RNA pol.DNA pol. Tail Virion VirionLidtsur
Smaasur
J8-65
Glasur
Forsur
Usur
Megetsur
Mellemsur
Altidsur
Aldrigsur
LIMEzero
LIMElight
phiKMV
LKA1
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
11
Cluster XIV and J8-65 form a diverse monophyletic clade, with a substantial amount of deletions and 333
insertions between them, subdividing into three sub-clusters with intra-Gegenees scores ≤28% (Figure 5a, d). 334
Phage lidtsur is singled-out and also codes for a unique version of tailspike colanidase, smaasur resembles J8-335
65 phage and the rest group together (Figure 5). Still, the three sub-clusters have for a large part conserved AA 336
sequences (66-72%, Gegenees) (Figure 5b, c, d). Interestingly, this also applies to Limezero, with whom they 337
have a Gegenees NT score ≤1%, but an AA similarity of 45-48%, supporting the phylogenetic grouping of 338
Limezero and cluster XIV (Figure 5a, b, c). Based on phylogeny, limited NT and low AA similarities it is evident 339
that there is a very distant relation between cluster XIV (and J8-65) and the phikmvviruses (Figure 5). Hence, 340
cluster XIV (and J8-65) are not immediately, according to the ICTV guidelines, considered phikmvviruses. 341
However, the Phikmvvirus already includes phages with minuscule DNA homology infecting divergent hosts 342
[72,73]. The genus delimitation is based on overall genome architecture with the location of a single-subunit 343
RNA polymerase gene adjacent to the structural genes, and not in the early region as in T7-like phages [16]. 344
Another characteristic feature of the phikmvvirus is the presence of direct terminal repeats, though not yet 345
verified in all members [72]. Read abundance in non-coding regions in genomic ends of cluster XIV genomes 346
suggests they also have direct terminal repeats of a few hundred bp. In conclusion, we consider the cluster XIV 347
phages (and J8-65) to be of the same lineage as the phikmvviruses, but contemplate that this genus may at 348
some point be divided into at least two independent genera, as we learn more of the nature of the genes which 349
distinguish cluster XIV from the other phikmvviruses on both NT and AA level and account for more 50% of 350
their genomes. 351
3.7. Unclassified bacterial viruses represent two novel lineages 352
The four novel phages sortsyn of cluster XIII and the three singletons sortsne, sortkaff (41.9-42.5 kb, 59-353
60% GC, no tRNAs) and skarpretter (42.0 kb, 55.8% GC, no tRNAs) all have small genomes and high GC 354
contents (Figure 3) and are suggested to represent two distinct novel lineages. They only share considerable 355
(≥6%, BLAST) NT similarity with four phages, Enterobacteria phage IME_EC2 (KF591601)(41.5 kb, 59.2% GC, 356
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
12
no tRNAs) isolated from hospital sewage [74], Salmonella phage lumpael (MK125141)(41.4 kb, 59.5% GC, no 357
tRNAs) isolated from wastewater of the same sample-set in a previous study, Klebsiella phage 358
vB_KpnS_IME279 (MF614100) (42.5 kb, 59.3% GC, no tRNAs) and Escherichia phage C130_2 (MH363708)(41.7 359
kb, 55.4% GC, no tRNAs) isolated from cheese [75]. 360
(a) (b) (c)
(d)
Figure 6 (a) Phylogenetic tree (Maximum log Likelihood: -8023.43, large terminase subunit), scalebar: substitutions per 361 site. (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 362 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 363 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 364 similarity as illustrated by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for 365 phages identified in this study and deposited in GenBank. 366
367
Curiously, IME_EC2 is a confirmed (TEM) Podoviridae, with a short non-contractile tail [74], while C130_2 368
is a confirmed (TEM) Myoviridae with a long contractile tail [75]. Lumpael and vB_KpnS_IME279 are 369
uncharacterised, yet both are assigned as Podoviridae in GenBank [17]. Although sortsne, sortkaff, sortsyn, 370
Escherichia phage Sortkaff
Klebsiella phage vB KpnS IME279
Escherichia phage Sortsne
Escherichia phage sortsyn
Enterobacteria phage IME EC2
Salmonella phage Lumpael
Escherichia phage Skarpretter
Escherichia phage C130 2
0.050
Organism 1 2 3 4 5 6 7 8
1: Sortkaff_ny start 100 86 13 6 6 6 0 0
2: MF614100_IME279 86 100 14 5 5 6 0 0
3: Escherichia phage Sortsne 13 14 100 13 15 11 0 0
4: Sortsyn_new start 7 5 13 100 87 42 0 0
5: KF591601_IME_EC2 6 5 15 88 100 40 0 0
6: MK125141_Lumpael 7 7 12 42 40 100 0 0
7: MK105855_Skarpretter_new start 0 0 0 0 0 0 100 1
8: MH363708_C130_2 0 0 0 0 0 0 1 100
Heat plot for: Comparison[Sortsne_and_the_gang_200319], Ali... file:///Users/au574427/Desktop/Coli_unclass/Skarpretter/skarp_...
1 of 1 30/03/2019, 07.37
Organism 1 2 3 4 5 6 7 8
1: Sortkaff_ny start 100 91 60 50 50 50 33 32
2: MF614100_IME279 91 100 60 49 50 49 33 32
3: Escherichia phage Sortsne 61 61 100 54 55 54 33 31
4: Sortsyn_new start 51 50 54 100 91 67 33 33
5: KF591601_IME_EC2 51 50 56 92 100 67 33 33
6: MK125141_Lumpael 51 51 55 68 68 100 33 30
7: MK105855_Skarpretter_new start 33 33 32 33 33 32 100 43
8: MH363708_C130_2 32 32 31 33 33 30 43 100
Heat plot for: Comparison[skarp_and_freinds_tblastx], Alignmen... file:///Users/au574427/Desktop/Coli_unclass/Skarpretter/skarp_t...
1 of 1 30/03/2019, 07.38
100%
64%
Holin
Lysis
Methylationhflc Term.PortalTailspikeHelica. Coat
Sortkaff
vB_KonS_IME279
Sortsne
Sortsyn
IME_EC2
Lumpael
Skarpretter
C130_2
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
13
vB_KpnS_IME279, IME_EC2 and lumpael form a monophyletic clade, the Gegenees score (5-15%) between 371
sub-clusters is surprisingly low (Figure 6a, b). Still, these six phages have comparable genome sizes (41.5-42.5 372
kb) and organisation, including similar relatively small-sized structural proteins, dam and dcm genes, equally 373
high GC contents (59.5±0.5%) and relatively high AA similarities (49-91%) (Figure 6a, c, Table 2). Hence, they 374
are clearly of the same lineage and likely to resemble IME_EC2 in having Podoviridae morphology. This group 375
is also clearly distinct from all other known phages (<5%, BLAST) and as such constitute a novel genus, with 376
a delimitation to be determined by future physical characterisation. Skarpretter and C130_2 form a 377
monophyletic clade and both have slightly, though not significantly (P = 0.38), lower GC contents (55.6±0.2%), 378
indicating a common ancestry and host. The conserved genome organisation and clustering of skarpretter 379
with the other Podoviridae (Figure 1, 6d), suggests skarpretter also has Podoviridae morphology. Still, its closest 380
relative C130_2 is described as a Myoviridae [75], leaving the true morphology of skarpretter a conundrum 381
only resolved by TEM imaging. According to ICTV guidelines both skarpretter and C130_2 are representatives 382
of novel linages, as they are both clearly distinct from all other known phages. Skarpretter and C130_2 have 383
very low NT (1% Gegenees, 38% BLAST) and AA similarities (43%) (Figure 6b, c). Indeed, skarpretter has ≤20% 384
NT similarity (BLAST) with all other published phages. In addition, skarpretter codes for a putative hflc gene 385
possibly inhibiting proteolysis of essential phage proteins, located in the replication region (Figure 6d). This 386
gene, has to our knowledge never before been observed in phages, but occurs in almost all proteobacteria, 387
including E. coli MG1655 [76]. 388
4. Conclusion 389
By screening Danish wastewater, we identified no less than 104 unique Escherichia MG1655 phages, but 390
predict the species richness to be at least in the range of 160-420, and even higher if including phages infecting 391
other Escherichia species, though it is expected to fluctuate drastically over time and both within and between 392
treatment facilities. Among the unique phages 91 represent novel phage species of at least four different 393
families Myoviridae, Siphoviridae, Podoviridae and Microviridae. The diversity of these phages is striking, they 394
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
14
vary greatly in genome size and have a very broad GC-content range - possibly prompted by former or current 395
alternate hosts. These findings add to our growing understanding of phage ecology and diversity, and through 396
classification of these many phages we come yet another step closer to a more refined taxonomic 397
understanding of phages. Furthermore, the numerous and diverse phages isolated in this study, all lytic to the 398
same single strain, serve as an excellent opportunity to learn important phage-host interactions in future 399
studies. These include, but are not limited to lysogen induced phage immunity, host-range and anti-RE 400
systems. Finally, apart from substantial contributions to known genera, we came upon several unclassified 401
lineages, some completely novel, which in our opinion constitute novel phage genera. We consider sortregn, 402
sortkaff, sortsyn and sortsne together with lumpael, vB_kpnS_IME278 and IME_EC2 to constitute a novel 403
genus within the Podoviridae. The Myoviridae flopper joins an interesting group of not yet classified phages 404
with Myoviridae morphology and Autographvirinae characteristics, just as cluster VII together with five 405
unclassified phages represent a novel unclassified genus within the Myoviridae. Lastly, we consider the 406
Siphoviridae halfdan and the unclassified bacterial virus skarpretter to be the first sequenced representatives of 407
each their novel genus. In conclusion, this study shows that uncharted territory still remains for even well-408
studied phage-host couples. 409
Supplementary Materials: Figure S1: Microviridae, Figure S2: Hanriverviruses, Figure S3: Dhillonviruses, Table 1 410
Auxiliary metabolism genes. 411
Funding: This research was funded by Villum Experiment Grant 17595, Aarhus University Research Foundation AUFF 412
Grant E-2015-FLS-7-28 to Witold Kot and Human Frontier Science Program RGP0024/2018 413
Acknowledgments: This study had not been possible without the much-appreciated contribution by the many members 414
of the Danish Water and Wastewater Association (DANVA), who kindly supplied us with time-series of wastewater 415
samples from their treatment facilities. A special thanks to the Billund and Grindsted treatment facilities of Billund Vand 416
& Energi, Lynetten, Avedøre and Damhusåen treatment facilities of BIOFOS, Kolding treatment facility of BlueKolding, 417
Esbjerg Øst, Esbjerg Vest, Ribe, Varde og Skovlund treatment facilities of DIN Forsyning, Drøsbro, Hadsten, Hammel, 418
Hinnerup and Voldum treatment facilities of Favrskov Forsyning, Haerning treatment facility of Herning Vand, Hillerød 419
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
15
treatment facility of Hillerød Forsyning,Lemvig treatment facility of Lemvig Vand og Spildevand, Bevtoft, Gram, 420
Haderslev, Halk, Jegerup, Nustrup, Over Jerstal, Skovby, Skrydstrup, Sommersted, Vojens and Årøsund treatment 421
facilities of Provas, Marselisborg, Egå and Viby treatment facilities of Aarhus vand, Hedensted, Juelsminde and Tørring 422
treatment facilities of Hedensted Spildevand, Ejby Mølle, Bogense, Hofmansgave, Hårslev, Nordvest, Nordøst, Otterup 423
and Søndersø treatment facilities of VandCenterSyd, Øst and Vest treatment facilities of Aalborg Forsyning and finally 424
Helsingør treatment facility of Forsyning Helsingør. 425
Competing interests: The authors declare no competing interests. 426
References 427
1. Weinbauer, M.G.; Rassoulzadegan, F. Are viruses driving microbial diversification and diversity? Environ. 428
Microbiol. 2003, 6, 1–11. 429
2. Weitz, J.S.; Wilhelm, S.W. Ocean viruses and their effects on microbial communities and biogeochemical cycles. 430
F1000 Biol. Rep. 2012, 4, 17. 431
3. Crummett, L.T.; Puxty, R.J.; Weihe, C.; Marston, M.F.; Martiny, J.B.H. The genomic content and context of 432
auxiliary metabolic genes in marine cyanomyoviruses. Virology 2016, 499, 219–229. 433
4. Boyd, E.F. Bacteriophage-Encoded Bacterial Virulence Factors and Phage–Pathogenicity Island Interactions. Adv. 434
Virus Res. 2012, 82, 91–118. 435
5. Fortier, L.-C.; Sekulovic, O. Importance of prophages to evolution and virulence of bacterial pathogens. 436
https://doi.org/10.4161/viru.24498 2013. 437
6. Volkova, V. V; Lu, Z.; Besser, T.; Gröhn, Y.T. Modeling the infection dynamics of bacteriophages in enteric 438
Escherichia coli: estimating the contribution of transduction to antimicrobial gene spread. Appl. Environ. 439
Microbiol. 2014, 80, 4350–62. 440
7. Bearson, B.L.; Allen, H.K.; Brunelle, B.W.; Lee, I.S.; Casjens, S.R.; Stanton, T.B. The agricultural antibiotic 441
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
16
carbadox induces phage-mediated gene transfer in Salmonella. Front. Microbiol. 2014, 5, 52. 442
8. Grose, J.H.; Casjens, S.R. Understanding the enormous diversity of bacteriophages: the tailed phages that infect 443
the bacterial family Enterobacteriaceae. Virology 2014, 468–470, 421–443. 444
9. Dykhuizen, D. Species Numbers in Bacteria. Proc. Calif. Acad. Sci. 2005, 56, 62. 445
10. Hatfull, G.F.; Pedulla, M.L.; Jacobs-Sera, D.; Cichon, P.M.; Foley, A.; Ford, M.E.; Gonda, R.M.; Houtz, J.M.; 446
Hryckowian, A.J.; Kelchner, V.A.; et al. Exploring the Mycobacteriophage Metaproteome: Phage Genomics as an 447
Educational Platform. PLoS Genet. 2006, 2, e92. 448
11. Hatfull, G.F. Mycobacteriophages: Windows into Tuberculosis. PLoS Pathog. 2014, 10, e1003953. 449
12. Hatfull, G.F.; Jacobs-Sera, D.; Lawrence, J.G.; Pope, W.H.; Russell, D.A.; Ko, C.C.; Weber, R.J.; Patel, M.C.; 450
Germane, K.L.; Edgar, R.H.; et al. Comparative Genomic Analysis of 60 Mycobacteriophage Genomes: Genome 451
Clustering, Gene Acquisition, and Gene Size. J. Mol. Biol. 2010, 397, 119–143. 452
13. Dedrick, R.M.; Jacobs-Sera, D.; Bustamante, C.A.G.; Garlena, R.A.; Mavrich, T.N.; Pope, W.H.; Reyes, J.C.C.; 453
Russell, D.A.; Adair, T.; Alvey, R.; et al. Prophage-mediated defence against viral attack and viral counter-454
defence. Nat. Microbiol. 2017, 2, 16251. 455
14. Jacobs-Sera, D.; Marinelli, L.J.; Bowman, C.; Broussard, G.W.; Guerrero Bustamante, C.; Boyle, M.M.; Petrova, 456
Z.O.; Dedrick, R.M.; Pope, W.H.; Science Education Alliance Phage Hunters Advancing Genomics And 457
Evolutionary Science Sea-Phages Program, S.E.A.P.H.A.G. and E.S. (SEA-P.; et al. On the nature of 458
mycobacteriophage diversity and host preference. Virology 2012, 434, 187–201. 459
15. Hatfull, G.F. The Secret Lives of Mycobacteriophages. Adv. Virus Res. 2012, 82, 179–288. 460
16. Lefkowitz, E.J.; Dempsey, D.M.; Hendrickson, R.C.; Orton, R.J.; Siddell, S.G.; Smith, D.B. Virus taxonomy: the 461
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
17
database of the International Committee on Taxonomy of Viruses (ICTV). Nucleic Acids Res. 2018, 46, D708–D717. 462
17. Benson, D.A.; Cavanaugh, M.; Clark, K.; Karsch-Mizrachi, I.; Lipman, D.J.; Ostell, J.; Sayers, E.W. GenBank. 463
Nucleic Acids Res. 2017, 45, D37–D42. 464
18. Lawrence, J.G.; Hatfull, G.F.; Hendrix, R.W. Imbroglios of viral taxonomy: genetic exchange and failings of 465
phenetic approaches. J. Bacteriol. 2002, 184, 4891–905. 466
19. Aiewsakun, P.; Adriaenssens, E.M.; Lavigne, R.; Kropinski, A.M.; Simmonds, P. Evaluation of the genomic 467
diversity of viruses infecting bacteria, archaea and eukaryotes using a common bioinformatic platform: steps 468
towards a unified taxonomy. J. Gen. Virol. 2018, 99, 1331–1343. 469
20. Nelson, D. Phage taxonomy: we agree to disagree. J. Bacteriol. 2004, 186, 7029–31. 470
21. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9. 471
22. Korf; Meier-Kolthoff; Adriaenssens; Kropinski; Nimtz; Rohde; van Raaij; Wittmann Still Something to Discover: 472
Novel Insights into Escherichia coli Phage Diversity and Taxonomy. Viruses 2019, 11, 454. 473
23. Jurczak-Kurek, A.; Gąsior, T.; Nejman-Faleńczyk, B.; Bloch, S.; Dydecka, A.; Topka, G.; Necel, A.; Jakubowska-474
Deredas, M.; Narajczyk, M.; Richert, M.; et al. Biodiversity of bacteriophages: morphological and biological 475
properties of a large group of phages isolated from urban sewage. Sci. Rep. 2016, 6, 34338. 476
24. Sambrook, J.F.; Russell, D.W.; Editors Molecular cloning: A laboratory manual, third edition.; 2nd ed.; Cold Spring 477
Harbor Laboratory, 2000; ISBN 0 87969 309 6. 478
25. Kot, W.; Vogensen, F.K.; Sørensen, S.J.; Hansen, L.H. DPS – A rapid method for genome sequencing of DNA-479
containing bacteriophages directly from a single plaque. J. Virol. Methods 2014, 196, 152–156. 480
26. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; 481
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
18
Pham, S.; Prjibelski, A.D.; et al. SPAdes: a new genome assembly algorithm and its applications to single-cell 482
sequencing. J. Comput. Biol. 2012, 19, 455–77. 483
27. Brettin, T.; Davis, J.J.; Disz, T.; Edwards, R.A.; Gerdes, S.; Olsen, G.J.; Olson, R.; Overbeek, R.; Parrello, B.; Pusch, 484
G.D.; et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom 485
annotation pipelines and annotating batches of genomes. Sci. Rep. 2015, 5, 8365. 486
28. Besemer, J.; Borodovsky, M. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. 487
Nucleic Acids Res. 2005, 33, W451–W454. 488
29. Altschul, S.F.; Gish, W.; Miller, W.; Myers, E.W.; Lipman, D.J. Basic local alignment search tool. J. Mol. Biol. 1990, 489
215, 403–410. 490
30. Soding, J.; Biegert, A.; Lupas, A.N. The HHpred interactive server for protein homology detection and structure 491
prediction. Nucleic Acids Res. 2005, 33, W244–W248. 492
31. Finn, R.D.; Coggill, P.; Eberhardt, R.Y.; Eddy, S.R.; Mistry, J.; Mitchell, A.L.; Potter, S.C.; Punta, M.; Qureshi, M.; 493
Sangrador-Vegas, A.; et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids 494
Res. 2016, 44, D279–D285. 495
32. Enrique González-Tortuero1, Thomas David Sean Sutton1, Vimalkumar Velayudhan1, 4 Andrey Nikolaevich 496
Shkoporov1, Lorraine Anne Draper1, Stephen Robert Stockdale1, 2, Reynolds 5 Paul Ross1, 3, Colin Hill1, 3, 6 497
VIGA: a sensitive, precise and automatic de novo VIral Genome Annotator. bioRxiv 2018, 277509. 498
33. Zankari, E.; Hasman, H.; Cosentino, S.; Vestergaard, M.; Rasmussen, S.; Lund, O.; Aarestrup, F.M.; Larsen, M.V. 499
Identification of acquired antimicrobial resistance genes. J. Antimicrob. Chemother. 2012, 67, 2640–4. 500
34. Kleinheinz, K.A.; Joensen, K.G.; Larsen, M.V. Applying the ResFinder and VirulenceFinder web-services for easy 501
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
19
identification of acquired antibiotic resistance and E. coli virulence genes in bacteriophage and prophage 502
nucleotide sequences. Bacteriophage 2014, 4, e27943. 503
35. Joensen, K.G.; Scheutz, F.; Lund, O.; Hasman, H.; Kaas, R.S.; Nielsen, E.M.; Aarestrup, F.M. Real-Time Whole-504
Genome Sequencing for Routine Typing, Surveillance, and Outbreak Detection of Verotoxigenic Escherichia coli. 505
J. Clin. Microbiol. 2014, 52, 1501–1510. 506
36. Roberts, R.J.; Vincze, T.; Posfai, J.; Macelis, D. REBASE—a database for DNA restriction and modification: 507
enzymes, genes and genomes. Nucleic Acids Res. 2015, 43, D298–D299. 508
37. Kieft, K.; Zhou, Z.; Anantharaman, K. VIBRANT: Automated recovery, annotation and curation of microbial 509
viruses, and evaluation of virome function from genomic sequences. bioRxiv 2019, 855387. 510
38. Adriaenssens, E.; Brister, J.R. How to Name and Classify Your Phage: An Informal Guide. Viruses 2017, 9. 511
39. Ågren, J.; Sundström, A.; Håfström, T.; Segerman, B. Gegenees: Fragmented Alignment of Multiple Genomes for 512
Determining Phylogenomic Distances and Genetic Signatures Unique for Specified Target Groups. PLoS One 513
2012, 7, e39107. 514
40. Kumar, S.; Stecher, G.; Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger 515
Datasets. Mol. Biol. Evol. 2016, 33, 1870–1874. 516
41. Lopes, A.; Tavares, P.; Petit, M.-A.; Guérois, R.; Zinn-Justin, S. Automated classification of tailed bacteriophages 517
according to their neck organization. BMC Genomics 2014, 15, 1027. 518
42. Mercanti, D.J.; Rousseau, G.M.; Capra, M.L.; Quiberoni, A.; Tremblay, D.M.; Labrie, S.J.; Moineau, S. Genomic 519
Diversity of Phages Infecting Probiotic Strains of Lactobacillus paracasei. Appl. Environ. Microbiol. 2016, 82, 95–520
105. 521
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
20
43. Edgar, R.C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 522
2004, 32, 1792–7. 523
44. Tamura, K.; Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial 524
DNA in humans and chimpanzees. Mol. Biol. Evol. 1993, 10, 512–26. 525
45. Sullivan, M.J.; Petty, N.K.; Beatson, S.A. Easyfig: a genome comparison visualizer. Bioinformatics 2011, 27, 1009. 526
46. Hsieh, T.C.; Ma, K.H.; Chao, A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill 527
numbers). Methods Ecol. Evol. 2016, 7, 1451–1456. 528
47. Chao, A.; Gotelli, N.J.; Hsieh, T.C.; Sander, E.L.; Ma, K.H.; Colwell, R.K.; Ellison, A.M. Rarefaction and 529
extrapolation with Hill numbers: A framework for sampling and estimation in species diversity studies. Ecol. 530
Monogr. 2014, 84, 45–67. 531
48. RStudio Team RStudio: Integrated Development for R. 2016. 532
49. Rocha, E.P.C.; Danchin, A. Base composition bias might result from competition for metabolic resources. Trends 533
Genet. 2002, 18, 291–294. 534
50. Motlagh, A.M.; Bhattacharjee, A.S.; Coutinho, F.H.; Dutilh, B.E.; Casjens, S.R.; Goel, R.K. Insights of Phage-Host 535
Interaction in Hypersaline Ecosystem through Metagenomics Analyses. Front. Microbiol. 2017, 8, 352. 536
51. Thomas, J.A.; Orwenyo, J.; Wang, L.-X.; Black, L.W. The Odd "RB" Phage-Identification of 537
Arabinosylation as a New Epigenetic Modification of DNA in T4-Like Phage RB69. Viruses 2018, 10. 538
52. Purohit, S.; Bestwick, K.; Lasser, G.W.; Rogers, C.M.; Mathews, C.K. T4 Phage-coded Dihydrofolate Reductase. 539
1981, 256, 9121–9125. 540
53. Hutinet, G.; Kot, W.; Cui, L.; Hillebrand, R.; Balamkundu, S.; Gnanakalai, S.; Neelakandan, R.; Carstens, A.B.; 541
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
21
Lui, C.F.; Tremblay, D.; et al. 7-Deazaguanine modifications protect phage DNA from host restriction systems. 542
Nature 2019, 1–12. 543
54. Carstens, A.B.; Kot, W.; Lametsch, R.; Neve, H.; Hansen, L.H. Characterisation of a novel enterobacteria phage, 544
CAjan, isolated from rat faeces. Arch. Virol. 2016, 161, 2219–2226. 545
55. Shkoporov, A.N.; Clooney, A.G.; Sutton, T.D.S.; Ryan, F.J.; Daly, K.M.; Nolan, J.A.; McDonnell, S.A.; Khokhlova, 546
E. V.; Draper, L.A.; Forde, A.; et al. The Human Gut Virome Is Highly Diverse, Stable, and Individual Specific. 547
Cell Host Microbe 2019, 26, 527-541.e5. 548
56. Tsonos, J.; Adriaenssens, E.M.; Klumpp, J.; Hernalsteens, J.-P.; Lavigne, R.; De Greve, H. Complete genome 549
sequence of the novel Escherichia coli phage phAPEC8. J. Virol. 2012, 86, 13117–8. 550
57. Schwarzer, D.; Buettner, F.F.R.; Browning, C.; Nazarov, S.; Rabsch, W.; Bethe, A.; Oberbeck, A.; Bowman, V.D.; 551
Stummeyer, K.; Mühlenhoff, M.; et al. A multivalent adsorption apparatus explains the broad host range of 552
phage phi92: a comprehensive genomic and structural analysis. J. Virol. 2012, 86, 10384–98. 553
58. Schwarzer, D.; Browning, C.; Stummeyer, K.; Oberbeck, A.; Mühlenhoff, M.; Gerardy-Schahn, R.; Leiman, P.G. 554
Structure and biochemical characterization of bacteriophage phi92 endosialidase. Virology 2015, 477, 133–143. 555
59. Liu, H.; Geagea, H.; Rousseau, G.M.; Labrie, S.J.; Tremblay, D.M.; Liu, X.; Moineau, S. Characterization of the 556
Escherichia coli Virulent Myophage ST32. Viruses 2018, 10. 557
60. Jamalludeen, N.; Kropinski, A.M.; Johnson, R.P.; Lingohr, E.; Harel, J.; Gyles, C.L. Complete genomic sequence 558
of bacteriophage phiEcoM-GJ1, a novel phage that has myovirus morphology and a podovirus-like RNA 559
polymerase. Appl. Environ. Microbiol. 2008, 74, 516–25. 560
61. Thompson, D.W.; Casjens, S.R.; Sharma, R.; Grose, J.H. Genomic comparison of 60 completely sequenced 561
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
22
bacteriophages that infect Erwinia and/or Pantoea bacteria. Virology 2019, 535, 59–73. 562
62. Born, Y.; Fieseler, L.; Marazzi, J.; Lurz, R.; Duffy, B.; Loessner, M.J. Novel Virulent and Broad-Host-Range 563
Erwinia amylovora Bacteriophages Reveal a High Degree of Mosaicism and a Relationship to Enterobacteriaceae 564
Phages. Appl. Environ. Microbiol. 2011, 77, 5945–5954. 565
63. Lim, J.-A.; Shin, H.; Lee, D.H.; Han, S.-W.; Lee, J.-H.; Ryu, S.; Heu, S. Complete genome sequence of the 566
Pectobacterium carotovorum subsp. carotovorum virulent bacteriophage PM1. Arch. Virol. 2014, 159, 2185–2187. 567
64. Lavigne, R.; Seto, D.; Mahadevan, P.; Ackermann, H.-W.; Kropinski, A.M. Unifying classical and molecular 568
taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res. Microbiol. 2008, 159, 406–569
414. 570
65. Doore, S.M.; Fane, B.A. The microviridae: Diversity, assembly, and experimental evolution. Virology 2016, 491, 571
45–55. 572
66. Jun, J.W.; Kim, J.H.; Shin, S.P.; Han, J.E.; Chai, J.Y.; Park, S.C. Characterization and complete genome sequence of 573
the Shigella bacteriophage pSf-1. Res. Microbiol. 2013, 164, 979–986. 574
67. Essoh, C.; Latino, L.; Midoux, C.; Blouin, Y.; Loukou, G.; Nguetta, S.-P.A.; Lathro, S.; Cablanmian, A.; Kouassi, 575
A.K.; Vergnaud, G.; et al. Investigation of a Large Collection of Pseudomonas aeruginosa Bacteriophages 576
Collected from a Single Environmental Source in Abidjan, Côte d’Ivoire. PLoS One 2015, 10, e0130548. 577
68. Turner, D.; Wand, M.E.; Briers, Y.; Lavigne, R.; Sutton, J.M.; Reynolds, D.M. Characterisation and genome 578
sequence of the lytic Acinetobacter baumannii bacteriophage vB_AbaS_Loki. PLoS One 2017, 12, e0172303. 579
69. Ma, Y.; Li, E.; Qi, Z.; Li, H.; Wei, X.; Lin, W.; Zhao, R.; Jiang, A.; Yang, H.; Yin, Z.; et al. Isolation and molecular 580
characterisation of Achromobacter phage phiAxp-3, an N4-like bacteriophage. Sci. Rep. 2016, 6, 24776. 581
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
23
70. Bryan, M.J.; Burroughs, N.J.; Spence, E.M.; Clokie, M.R.J.; Mann, N.H.; Bryan, S.J. Evidence for the intense 582
exchange of MazG in marine cyanophages by horizontal gene transfer. PLoS One 2008, 3, 1–12. 583
71. Lavigne, R.; Burkal’tseva, M. V; Robben, J.; Sykilinda, N.N.; Kurochkina, L.P.; Grymonprez, B.; Jonckx, B.; 584
Krylov, V.N.; Mesyanzhinov, V. V; Volckaert, G. The genome of bacteriophage φKMV, a T7-like virus infecting 585
Pseudomonas aeruginosa. Virology 2003, 312, 49–59. 586
72. Adriaenssens, E.M.; Ceyssens, P.-J.; Dunon, V.; Ackermann, H.-W.; Van Vaerenbergh, J.; Maes, M.; De Proft, M.; 587
Lavigne, R. Bacteriophages LIMElight and LIMEzero of Pantoea agglomerans, belonging to the "phiKMV-588
like viruses". Appl. Environ. Microbiol. 2011, 77, 3443–50. 589
73. Ceyssens, P.-J.; Lavigne, R.; Mattheus, W.; Chibeu, A.; Hertveldt, K.; Mast, J.; Robben, J.; Volckaert, G. Genomic 590
analysis of Pseudomonas aeruginosa phages LKD16 and LKA1: establishment of the phiKMV subgroup within 591
the T7 supergroup. J. Bacteriol. 2006, 188, 6924–31. 592
74. Hua, Y.; An, X.; Pei, G.; Li, S.; Wang, W.; Xu, X.; Fan, H.; Huang, Y.; Zhang, Z.; Mi, Z.; et al. Characterization of 593
the morphology and genome of an Escherichia coli podovirus. Arch. Virol. 2014, 159, 3249–3256. 594
75. Sváb, D.; Falgenhauer, L.; Rohde, M.; Chakraborty, T.; Tóth, I. Complete genome sequence of C130_2, a novel 595
myovirus infecting pathogenic Escherichia coli and Shigella strains. Arch. Virol. 2019, 164, 321–324. 596
76. Bandyopadhyay, K.; Parua, P.K.; Datta, A.B.; Parrack, P. Escherichia coli HflK and HflC can individually inhibit 597
the HflB (FtsH)-mediated proteolysis of λCII in vitro. Arch. Biochem. Biophys. 2010, 501, 239–243. 598
599
600
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
24
SUPPLEMENTARY FIGURES AND TABLES 601
Figure S1 Microviridae 602
(a) (b) (c)
(d)
Figure S4 (a) Phylogenetic tree (Maximum log Likelihood: -6065.3, DNA replication protein gpA), scalebar: substitutions 603 per site (b) Phylogenomic nucleotide distances (Gegenees, BLASTn: fragment size: 200, step size: 100, threshold: 0%). (c) 604 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). (d) 605 Pairwise alignment of phage genomes, the colour bars between genomes indicate percent pairwise similarity as illustrated 606 by the colour bar in the upper right corner (Easyfig, BlASTn). Gene annotations are provided for phages identified in this 607 study and deposited in GenBank. 608 609
Coliphage G4Coliphage ID52
Escherichia phage SECphi17Escherichia phage Lillemer
Escherichia phage LilletoEscherichia phage Lilleduescherichia phage Lilleput
Escherichia phage LilleenEnterobacteria phage alpha3Escherichia phage Lilleven
Enterobacteria phage St-10.050
Organism 1 2 3 4 5 6 7 8 9 10 11
1: NC_001420_G4 100 62 20 19 19 21 17 21 1 1 1
2: DQ079877_ID52 63 100 15 20 19 20 17 19 2 2 2
3: Escherichia_phage_SECphi17 22 17 100 74 72 73 76 72 1 2 2
4: MG_187_C21_Esch_SECphi17 21 20 76 100 92 93 91 88 0 0 0
5: MG_179_C1_Esch_SECphi17 21 19 74 92 100 92 88 86 1 2 1
6: MG_107_C2_Esch_SECphi17 24 20 76 93 92 100 91 87 1 1 1
7: MG_044_C1_Esch_SECphi17 18 17 76 90 88 90 100 90 1 1 1
8: MG_138_C36_Esch_SECphi17 22 20 74 89 88 89 92 100 1 1 1
9: NC_001330_alpha3 2 2 1 0 1 1 1 1 100 63 66
10: MG_038_C11_Ent_St-1 2 2 1 0 1 1 1 1 67 100 83
11: Enterobacteria_phage_St-1 1 2 1 0 1 1 1 1 66 84 100
Heat plot for: Comparison[blastn_micro_190619], Alignment[analysis], Analysis[*default*] file:///Users/au574427/Desktop/blastn_micro_190619.html
1 of 1 20/06/2019, 14.04
Organism 1 2 3 4 5 6 7 8 9 10 11
1: NC_001420_G4 100 88 62 63 62 62 62 63 42 42 42
2: DQ079877_ID52 88 100 62 62 62 63 62 63 42 41 41
3: Escherichia_phage_SECphi17 62 62 100 89 88 90 90 86 42 42 42
4: MG_187_C21_Esch_SECphi17 62 62 89 100 95 95 94 90 41 41 41
5: MG_179_C1_Esch_SECphi17 62 62 89 95 100 94 93 90 42 42 41
6: MG_107_C2_Esch_SECphi17 62 63 90 96 94 100 93 90 42 42 42
7: MG_044_C1_Esch_SECphi17 62 62 90 94 93 93 100 92 42 42 41
8: MG_138_C36_Esch_SECphi17 64 64 88 92 92 91 94 100 43 43 42
9: NC_001330_alpha3 39 39 39 39 38 39 39 39 100 86 87
10: MG_038_C11_Ent_St-1 39 39 39 39 39 39 39 39 87 100 90
11: GQ149088_St1 39 39 39 39 38 39 39 39 87 89 100
Heat plot for: Comparison[tblastx_micor_190619], Alignment[analysis], Analysis[*default*] file:///Users/au574427/Desktop/tblastx_micro-190619.html
1 of 1 20/06/2019, 14.04
67%
100%
Minor spike (gpH)
Minor spike (gpH)
Minor spike (gpH)
Minor spike (gpH)
Minor spike (gpH)
Minor spike (gpH)
gpG
gpG
gpG
gpG
gpG
gpG
Major coat (gpF)
Major coat (gpF)
Major coat (gpF)
Major coat (gpF)
Major coat (gpF)
Major coat (gpF)
gpD
gpD
gpD
gpD
gpD
gpD
gpC
gpC
gpC
gpC
gpC
gpC
DNA replication (gpA)
DNA replication (gpA)
DNA replication (gpA)
DNA replication (gpA)
DNA replication (gpA)
DNA replication (gpA)
G4
ID52
SECphi17
Lilleto
Lillemer
Lilledu
Lilleput
Lilleen
Alpha3
Lilleven
St-1
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
25
Figure S2 Hanrivervirus 610
1 Damhaus
2 Herni
3 Grams
4 Vojen
5 Aaroes
6 Aalborv
7 Haarsle
8 Egaa
9 Shigella
phage pSf-1 611
612
Figure S2 Phylogenomic amino acid distances (Gegenees, tBLASTx: fragment size: 200, and step size: 100, threshold: 0%). 613 and Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent pairwise 614 similarity as illustrated by the colour bar in the lower right corner (Easyfig, BlASTn). Genome order is the same in both 615 analyses. 616
617
Organism 1 2 3 4 5 6 7 8 9
1: MG_010_C12_Shi_pSf-1 100 84 85 85 88 79 81 85 83
2: MG_086_C1_Sal_pSf-1 84 100 86 86 85 82 80 90 85
3: MG_106_C6_Shi_pSf-1 87 88 100 86 88 81 82 87 84
4: MG_142_C100_Shi_pSf-1 85 86 84 100 84 81 82 89 85
5: MG_146_C2_Shi_pSf-1 88 84 85 83 100 79 80 84 81
6: MG_182_C1_Shi_pSf-1 85 88 85 87 86 100 83 86 84
7: MG_187_C4_Shi_pSf-1 85 84 83 84 84 80 100 86 84
8: MG_193_C1_Shi_pSf-1 85 89 84 87 84 79 82 100 85
9: Shigella phage pSf-1 82 84 81 84 81 77 80 85 100
Heat plot for: Comparison[hanriver_tblastx-psf1_020419], Alig... file:///Users/au574427/Desktop/Coli_siphos/Hanriver/hanriver_p...
1 of 1 02/04/2019, 13.43
68%
100%
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
26
Figure S3 Dhillonvirus 618
619
Figure S3 Pairwise alignment of phage genomes (Easyfig, BlASTn), the colour bars between genomes indicate percent 620 pairwise similarity as illustrated by the colour bar in the lower right corner (Easyfig, BlASTn). 621
622
70%
100%
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint
Table S1 Auxiliary metabolism genes 623
624
Gene Protein KO number Function in bacteriophages Count phages with this gene
rmlC dTDP-4-dehydrorhamnose 3,5-epimerase K01790 dtdp rhamnose biosynthesis 10 Cluster VI and VII
rmlA glucose-1-phosphate thymidylyltransferase K00973 dtdp rhamnose biosynthesis 8 Cluster VII, ukendt and tuntematon
rmlB dTDP-glucose 4,6-dehydratase K01710 dtdp rhamnose biosynthesis 8 Cluster VII, ukendt and tuntematon
rmlD dTDP-4-dehydrorhamnose reductase K00067 dtdp rhamnose biosynthesis 10 Cluster VI and VII
dcm DNA (cytosine-5)-methyltransferase K17398 DNA modification 5 Cluster VII
NAMPT nicotinamide phosphoribosyltransferase K03462 unknown 25 Felixounavirus, Suspvirus and Cluster VII
mec CysO sulfur-carrier protein]-S-L-cysteine
hydrolase K21140 tail fiber protein 20 Hanrivervirus and unclassifed Tunavirinae
folA dihydrofolate reductase K00287 de novo synthesis of pyrimidines,
thymidylic acid, and certain amino acids 33
Felixounavirus, Suspvirus, Krischvirus, Dhakavirus, Mosigvirus
and Tequatrovirus
aroAG 3-deoxy-7-phosphoheptulonate synthase K01626 unknown 3 Three of the four Mosigvirus
AUP1 UDP-N-acetylglucosamine diphosphorylase K23144 DNA modification 4 Mosigvirus
KdsD arabinose-5-phosphate isomerase K06041 DNA modification 4 Mosigvirus
dcm DNA (cytosine-5)-methyltransferase K00558 DNA modification 1 Pangalan
queC 7-cyano-7-deazaguanine synthase K06920 DNA modification 1 Skure
queE 7-carboxy-7-deazaguanine synthase K10026 DNA modification 1 Skure
folE GTP cyclohydrolase K01495 DNA modification 1 Skure
queD 6-carboxy-5,6,7,8-tetrahydropterin synthase K01737 DNA modification 1 Skure
625
.CC-BY-NC-ND 4.0 International licenseperpetuity. It is made available under apreprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in
The copyright holder for thisthis version posted January 19, 2020. ; https://doi.org/10.1101/2020.01.19.911818doi: bioRxiv preprint