For publication in Journal of Virology

Toward Genetic-Based Taxonomy: Comparative Analysis of a Genetic-Based Classification and the Taxonomy of Picornaviruses

Chris Lauber1 and Alexander E. Gorbalenya*,1,2

1 Molecular Virology Laboratory, Department of Medical Microbiology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
2 Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, 119899 Moscow, Russia

* Corresponding author: Dr. Alexander E. Gorbalenya, Department of Medical Microbiology, Leiden University Medical Center, Albinusdreef 2, P.O. Box 9600, E4-P, 2300 RC Leiden, The Netherlands, Phone: +31-71-526-1652, Fax: +31-71-526-6761, E-mail: [email protected]

Key words: evolution, genomes, picornaviruses, phylogeny, species, virus discovery, taxonomy

Running title: Toward Genetic-Based Taxonomy of a Virus Family
Abstract: 250 words
Text: 5545 words

Copyright © 2012, American Society for Microbiology. All Rights Reserved. J. Virol. doi:10.1128/JVI.07174-11. JVI Accepts, published online ahead of print on 25 January 2012. Downloaded from http://jvi.asm.org/ on July 14, 2018 by guest.



Abstract

Virus taxonomy has received little attention from the research community despite its broad relevance. In Lauber & Gorbalenya (2012) JVI 86(X): xxxx-xxxx we have introduced a quantitative approach to hierarchically classify viruses of a family using pair-wise evolutionary distances (PEDs) as a measure of genetic divergence. When applied to the six most conserved proteins of the Picornaviridae it clustered 1234 genome sequences in groups at three hierarchical levels (the GENETIC classification). In this study we compare the GENETIC classification with the expert-based picornavirus taxonomy and outline differences in the underlying frameworks regarding the relation of virus groups and genetic diversity that represent, respectively, the structure and content of a classification. To facilitate the analysis we introduce two novel diagrams. The first connects the genetic diversity of taxa to both the PED distribution and the phylogeny of picornaviruses. The second depicts a classification and the accommodated genetic diversity in a standardized manner. Generally, we found striking agreement between the two classifications on species and genus taxa. The few disagreements concern the species Human rhinovirus A and Human rhinovirus C and the genus Aphthovirus, which were split in the GENETIC classification. Furthermore, we propose a new super-genus level and universal, level-specific PED thresholds, not reached yet by many taxa. Since the species threshold is approached mostly by taxa with large sampling sizes and those infecting multiple hosts, it may represent an upper limit on divergence beyond which homologous recombination in the six most conserved genes between two picornaviruses might not give viable progeny.


Introduction

Research in virology relies on virus taxonomy for providing a unified intellectual and practical framework for analysis, generalization and knowledge dissemination. Despite its broad relevance, taxonomy has received relatively little attention from the research community. Virus taxonomy is developed under the direction of the International Committee on Taxonomy of Viruses (ICTV) and recognizes five hierarchically arranged ranks: order, family, subfamily, genus and species (in the ascending order of inter-virus similarity), with order and subfamily levels being used less commonly. Virus species are of principal importance (60) and for their demarcation the so-called polythetic species concept (3,74) is applied. Accordingly, viruses are recognized as a single species if they share a broad range of characteristics while constituting a replicating lineage that occupies a particular ecological niche (36,75). These characteristics, so-called demarcation criteria, are devised for each genus separately and are revised periodically (16,35). To ensure that each virus is classified, they are allowed to vary greatly between and even within families, with no single unifying property being sought after (for review see (76)). Consequently, virus species are operational units that are delimited at the genus level. They can be contrasted to biological species that are commonly defined by shared gene pools and reproductive isolation. The lack of a mandatory common denominator of virus species casts uncertainty over the interpretation and generalization of results obtained across different genera.

We are interested in exploring the wealth of genomic information for improving the foundation of virus taxonomy. For this purpose we use the family Picornaviridae as a case study. Picornaviruses form one of the largest and most actively studied virus families with many human and societally important pathogens whose number is steadily growing (15,64). They employ a single-stranded RNA genome of positive sense (ssRNA+) with lengths in the range of 6500 to 9000 nucleotides of which about 90% encode a single polyprotein that is co- and post-translationally cleaved into eleven to thirteen mature proteins (50). In total six proteins, three of the capsid module (1B, 1C and 1D, known also as VP2, VP3 and VP1), and three of the replicase module (2C, 3C and 3D) are conserved family-wide to form the backbone of the genetic plan (20). Other proteins may be specific for different subsets of picornaviruses. Particularly, proteins known as L and 2A come in a large variety of molecular forms (20,40) most of which were implicated in functions that secure virus propagation in the host (1). The open reading frame that encodes the polyprotein (55) is flanked by the two untranslated regions, 5’-UTR and 3’-UTR. The 5’-UTR includes a highly structured internal ribosomal entry site (IRES) which is known to exist in four different molecular forms, from type I to IV (78). The expert-based classification (the ICTV taxonomy) of the Picornaviridae, devised by the Picornavirus Study Group (PSG), recognizes 28 species distributed among 12 genera, and no subfamilies (40). A growing number of picornaviruses either is tentatively classified in provisional taxa or remains unclassified. The PSG uses a complex set of rules to devise taxa and classify viruses. All genera form compact monophyletic clusters in separate trees of the conserved proteins as well as the capsid and replicative modules, respectively. The polyprotein sequences of viruses in different genera differ by at least 58% in amino acid residue (aa) identity (39,70). For genera that include multiple species (Enterovirus, Cardiovirus, Aphthovirus, Parechovirus, Kobuvirus, Sapelovirus) demarcation criteria that separate the species have been developed by the PSG. Most commonly, they define lower limits of pair-wise aa identity in the polyprotein and its two parts, the capsid and replicative modules. Additionally, the criteria may include restrictions on genome organization, genome base composition (G+C), host range, host cell receptor variety, and compatibility in processes that underlie the replicative cycle. Some taxa may be distinguished by the presence of a molecular marker that could be an L and/or a 2A protein (20,31), the type of IRES (24,78), the genome position of internal cis-replicative element directing the VPg synthesis (CRE) (9,71), or a combination thereof. For genera that include a single species (Hepatovirus, Erbovirus, Teschovirus, Senecavirus, Tremovirus, Avihepatovirus) no species demarcation criteria have been developed due to the lack of sufficient diversity in the available virus sampling.

In an accompanying paper we have introduced a quantitative approach for partitioning the genetic diversity of a virus family to build a hierarchical classification, which we named DEmARC (43). In contrast to the framework of virus taxonomy, DEmARC uses a sole demarcation criterion: inter-virus genetic divergence. When applied to the family Picornaviridae, DEmARC clustered 1234 genome sequences in groups at three hierarchical levels (the GENETIC classification). In this study, two of the three inferred levels in the GENETIC classification were found to correspond most closely to the species and genus ranks recognized by ICTV (40). The few deviations from the ICTV taxonomy concern assignments for the genus Aphthovirus (40,45) and species Human rhinovirus A and C (2,69). The third level has no counterpart in the current taxonomy. Furthermore, we found the family-wide conserved proteins to have almost universally accumulated fewer substitutions in viruses of the same species than in those belonging to different species, suggesting that picornavirus species are genetically separated. This also indicates that the objective discrimination between the genetic divergence within a taxon (intragroup) and that between taxa (intergroup) is attainable. Finally, we outline conceptual differences between the frameworks that underlie the two classifications. These differences concern the relation of genetic diversity, the content of a genetics-based classification, and virus groups representing its structure. To facilitate the comparison we introduce two novel diagrams that illustrate, respectively, the connection of new concepts developed in this study to conventional phylogenetic techniques already used in taxonomy, and the depiction of a classification and the associated genetic diversity in a standardized manner.

Materials and Methods

Virus sequences, multiple alignment and distance estimation

Complete genome sequences for 1234 picornaviruses available on April 15, 2010 at the National Center for Biotechnology Information GenBank/RefSeq (5) databases were downloaded using SARGENS (67) into the Viralis platform (21). A concatenated multiple amino acid alignment covering the family-wide conserved capsid proteins 1B, 1C, 1D and the non-structural proteins 2C, 3C and 3D of the 1234 picornaviruses (Fig. 1) was produced using the MUSCLE program (14) and poorly conserved columns were further manually refined. The alignment subsequently facilitated the calculation of pair-wise evolutionary distances (PEDs) using a maximum likelihood (ML) approach (7,17), as implemented in the Tree-Puzzle program (63). The WAG amino acid substitution matrix (77) was applied. PEDs serve as a measure of inter-virus genetic divergence.
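As a rough illustration of what a pair-wise distance table over an alignment looks like, the sketch below computes uncorrected p-distances (the fraction of differing residues) for a toy alignment. This is a deliberately simpler stand-in and not the procedure used in this study: PEDs were estimated by maximum likelihood under the WAG matrix in Tree-Puzzle, which accounts for multiple substitutions at a site, whereas p-distances do not. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differing / compared


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of sequences, keyed by sorted name pair."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment, not real picornavirus data.
toy_alignment = {
    "virus_1": "MKTAYIAK",
    "virus_2": "MKTAYLAK",
    "virus_3": "MRT-YLSK",
}
distances = pairwise_distances(toy_alignment)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

With n sequences the table holds n(n-1)/2 entries, which is why the number of pair-wise values grows quickly with the size of the dataset.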


Phylogeny reconstruction

Bayesian posterior probability trees were compiled utilizing the Beast software (12). Bayesian MCMC chains (2 independent runs per dataset) were run for 4 million steps (10% burn-in, sampled every 100 generations) under the WAG amino acid substitution matrix (77). Substitution rate heterogeneity among alignment sites was allowed as modeled via a gamma distribution with 4 categories. The uncorrelated relaxed molecular clock approach (lognormal distribution) (11) was used as it was strongly favored over the strict molecular clock (log Bayes factor of 56.7) and the relaxed molecular clock approach with exponential distribution (log Bayes factor of 14.6). Convergence of runs was verified using Tracer (13). ML trees were compiled utilizing the PhyML software (23). The WAG amino acid substitution matrix was applied and substitution rate heterogeneity among sites (4 categories) was allowed. Support values for internal nodes were obtained using the non-parametric bootstrap method with 1000 replicates or through SH-like approximate likelihood ratio tests.

Genetic-based virus classification

We have developed DEmARC, a quantitative procedure for hierarchical classification of a virus family based on inter-virus genetic divergence (43). It has been evaluated extensively for consistency and stability with respect to key parameters including the amount and/or diversity of the input data, the alignment construction method and the measure of inter-virus divergence. For brevity, we refer to the DEmARC-mediated picornavirus classification as the GENETIC classification.


Measures of quality

In the accompanying paper (43) we have introduced a cost measure to determine a threshold on intragroup genetic divergence at each classification level in a quantitative way. This cost is calculated as the cumulative violation of intragroup PED values to the respective threshold among all taxa of the level (see (43) for details). Hence, this cost, which is a nonnegative real number, is used as a quality measure for a classification level: the lower the cost, the higher the quality. Furthermore, analogs of the cost measure can be calculated for both a taxon and a single virus by summarizing over the respective violating PED values.

Another measure of quality of a taxon is the fraction of intraspecific pair-wise distances not exceeding the distance threshold of the respective level, to which we refer as cluster quality (cq). A taxon is considered complete if cq=1, and incomplete otherwise (0<cq<1).
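The two quality measures can be sketched in a few lines. The cq function follows the definition given in this section. The exact form of the cost is specified in (43) rather than here, so summing the excess of each violating PED over the threshold is an assumption made purely for illustration, as are the function names and the toy PED values. The value 0.37 is the family-wide species threshold reported in this study.

```python
from typing import Sequence


def cluster_quality(intragroup_peds: Sequence[float], threshold: float) -> float:
    """cq: fraction of intragroup pair-wise distances not exceeding the threshold."""
    if not intragroup_peds:
        # A singleton taxon has no pairs; how it is scored is not covered by this sketch.
        raise ValueError("at least one intragroup distance is required")
    within = sum(1 for d in intragroup_peds if d <= threshold)
    return within / len(intragroup_peds)


def violation_cost(intragroup_peds: Sequence[float], threshold: float) -> float:
    """Assumed cost form: total excess of violating PEDs over the threshold."""
    return sum(d - threshold for d in intragroup_peds if d > threshold)


SPECIES_THRESHOLD = 0.37  # species threshold reported in this study

# Hypothetical intragroup PEDs of one taxon (six pairs, i.e. four viruses).
toy_peds = [0.10, 0.25, 0.36, 0.38, 0.40, 0.42]

cq = cluster_quality(toy_peds, SPECIES_THRESHOLD)
cost = violation_cost(toy_peds, SPECIES_THRESHOLD)
print("cq =", cq, "complete" if cq == 1 else "incomplete")
print("cost =", round(cost, 3))
```

Under this reading a complete taxon has cq=1 and contributes nothing to the cost, while an incomplete taxon lowers cq and raises the cost in proportion to how far its violating pairs lie above the threshold.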

Results and Discussion

Phylogeny, PED distribution and classification of picornaviruses

Our dataset included 1234 genome sequences from picornaviruses whose taxonomic position at the start of this study was either already established as described above or remained provisional or uncertain due to the considerable time involved in taxa assignments (40). Using a concatenated multiple alignment of six conserved proteins of a representative set of 38 picornaviruses we reconstructed a phylogenetic tree under both a ML and a Bayesian framework. The two trees had a matching topology and included monophyletic branches corresponding to the taxa recognized by ICTV (black tree branches and names in Fig. 2). The phylogeny additionally comprised a number of new branches of different lengths accommodating a large number of relatively recently identified picornaviruses. We concluded that the alignment used in our study contains information compatible with taxonomy. Hence, we used this alignment as input for DEmARC in order to devise the GENETIC classification of picornaviruses (43). We identified the three statistically most strongly supported positions of discontinuity (thresholds) in the picornavirus PED distribution, which we assigned as defining the species, genus and super-genus levels of the classification, respectively.

Below, we compare the GENETIC classification and the ICTV taxonomy at each of these levels separately. To facilitate the comparison we devised a special plot (Fig. 3A, central panel), which connects the phylogeny (Fig. 3A, left) and the PED distribution (Fig. 3A, bottom-right) that are used in taxonomy and DEmARC, respectively. The plot (Fig. 3A, central) presents a two-dimensional partitioning of the inter-virus genetic diversity and reveals an association of a taxon in the tree and three ranges in the PED distribution that correspond to the three levels of the GENETIC classification. Thus, the phylogeny and the PED distribution represent complementary projections of the inter-virus genetic diversity that, when combined, reveal the most critical characteristics utilized in taxonomy. The availability of this plot empowers the reader with a tool to inspect the foundations and analyze implications of the proposed classification.


GENETIC classification versus ICTV taxonomy: species level

At the species level, the principal level in taxonomy, the GENETIC classification includes 38 clusters. Twenty-seven of them correspond one-to-one to species of the ICTV taxonomy (70), three clusters encompass a single species (Human rhinovirus C; HRV-C), and eight clusters comprise recently discovered viruses that were not yet formally classified at the start of the study. HRV-C was split in three species-like clusters provisionally named Human rhinovirus Cα (HRV-Cα), Human rhinovirus Cβ (HRV-Cβ), and Human rhinovirus Cγ (HRV-Cγ) (Figs. 2 and 3A, Table 1).

The 27 clusters corresponding to the recognized species include already classified viruses and some accommodate also recently discovered viruses, including simian enteroviruses joining Human enterovirus A or B (HEV-A and HEV-B, respectively) (53,54), Saffold virus grouping with Theilovirus (6,8,32,46), Possum enterovirus joining Bovine enterovirus (82), and Porcine kobuvirus being classified with Bovine kobuvirus (62) (Table 1). With the exception of Theilovirus the host range of these species was expanded as a result of this virus update. A recent phylogenetic study of RNA viruses from three families and two genera other than the Picornaviridae revealed that host switching by virus species is more frequent than previously thought (38).

The eight clusters encompassing exclusively novel viruses include: cosaviruses (4 clusters, CosV-A, CosV-B, CosV-C, CosV-D) (26,33), sealion picornavirus (1, AqV-A) (34,40), Human klasse- and saliviruses (hereafter referred to as saliviruses) (1, SaliV-A) (22,27), rhinoviruses close to but separated from Human rhinovirus A, HRV-A (1, provisionally named Human rhinovirus Aβ, HRV-Aβ) (9,56,57) and simian enteroviruses not belonging to Simian enterovirus A (1, SiEV-B) (52-54) (Table 1). There seems to be a good match between the GENETIC classification assignments listed above and those that are in the pipeline for approval by ICTV (39,40).

Thirty-two out of 38 species include more than one sequence (non-singleton) and they determine the PED range of all 38 species clusters which is defined as “intra-species” genetic divergence (Fig. 3A). Virus sampling for the 38 species varied considerably in the range of one (six species) to 260 (Foot-and-mouth disease virus, FMDV) sequences (Fig. 3A). The corresponding intragroup PED ranges (distances between virus pairs belonging to a single species) differed ~10-fold among the species with more than one non-identical sequence, with maxima varying from 0.04 (Avian encephalomyelitis virus, AvEMV) to 0.41 (HRV-A) (Fig. 3A). All except three species clusters were complete (each intragroup PED is below the species distance threshold) (Fig. 3A) (see (43)). The three incomplete species clusters include viruses that belong to HRV-A (96 viruses in total and 14 viruses define pairs with larger-than-threshold distances), Bovine kobuvirus (4 and 1) and the proposed species-like cluster HRV-Cγ (4 and 2) (Table 2; Fig. 4). In these species, respectively, 3.6%, 16.7% and 50% of intragroup PEDs exceeded the species threshold (Table 1, Fig. 3A). Combined they account for less than 0.19% (175 out of 93,857) of all intragroup PED values at this level. In respective classifications obtained with three evaluation datasets (43), Bovine kobuvirus was split in two clusters that observe the threshold and are host-restricted, which would be in line with the original proposal by the authors who identified the porcine kobuvirus (62).
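The reported fractions can be checked with simple pair counting. The sketch below assumes that all sequences within a cluster are distinct and that every unordered pair is counted once; the pair counts of 1 and 3 for the two four-virus clusters are not stated in the text and are chosen here only because they reproduce the reported 16.7% and 50%. The counts 175 and 93,857 are taken from the text.

```python
def n_pairs(n_sequences: int) -> int:
    """Number of unordered pairs among n_sequences distinct sequences."""
    return n_sequences * (n_sequences - 1) // 2


# Bovine kobuvirus and HRV-Cγ each contain four sequences, hence six pairs.
pairs_in_four = n_pairs(4)

# Assumed numbers of violating pairs (1 and 3), consistent with the reported percentages.
bovine_kobuvirus_percent = 100 * 1 / pairs_in_four
hrv_c_gamma_percent = 100 * 3 / pairs_in_four

# Counts stated in the text for all intragroup PEDs at the species level.
overall_percent = 100 * 175 / 93857

print("pairs among four sequences:", pairs_in_four)
print("Bovine kobuvirus: %.1f%%" % bovine_kobuvirus_percent)
print("HRV-Cγ: %.1f%%" % hrv_c_gamma_percent)
print("all species combined: %.3f%%" % overall_percent)
```

The combined figure confirms that violating pairs form a very small share of all intragroup comparisons, even though they make up a large share within the two small clusters.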


GENETIC classification versus ICTV taxonomy: rhinoviruses

Why do the GENETIC classification and the ICTV taxonomy differ so profoundly in respect to HRV-C while agreeing on the virus composition of all other species? Specifics of both HRV-C evolution and the two classification frameworks could play a role. The genetic diversity of these viruses in capsid (1A, also known as VP4, and 1D proteins) and non-structural (3D) regions was previously reported to exceed those of other rhinoviruses (51,69). In the 1D protein this difference is smallest and the entire HRV-C diversity was considered to be below the species limit, paving the way to the recognition of HRV-C as a single species. We have also observed HRV-C viruses to form a single species-like cluster in the DEmARC-mediated classification using the major capsid proteins only (43). However, in the analysis of the dataset comprising the six family-wide conserved proteins the observed maximum divergence of HRV-C considerably exceeded that of its most diverged subset (HRV-Cγ) and the family-wide species demarcation threshold: 0.424, 0.392 and 0.37, respectively. This was likely due to a combined effect of congruent phylogenetic signals from both the structural and non-structural proteins (Fig. 4 and data not shown). The virus divergence in HRV-C is so high that even half of intragroup distances in HRV-Cγ exceed the species threshold (Fig. 3A and Table 1). This low support for the HRV-Cγ species (Table 1), which is the lowest overall and only one of three below 100%, is even more striking given that the virus sampling in this provisional species and the two HRV-C sister taxa is very limited (one to four available genome sequences per cluster). Thus, it remains plausible that with the accumulation of sequenced genomes in the future, HRV-Cγ will be split further, increasing the number of provisional HRV-C species to at least four compared to the one currently recognized. Each of these species corresponds to a separate major lineage in the HRV-C phylogeny (51) (Fig. 4).

Furthermore, the GENETIC classification proposes the recognition of another potentially new rhinovirus species (HRV-Aβ). It is formed by three viruses and corresponds to the recently identified “clade D” rhinoviruses (57) (known otherwise as the cluster HRV-A2, (9)) that is a sister group to the species HRV-A (Fig. 4). Altogether, our analysis suggests that at least six (rather than three) human rhinovirus species may exist. Testing this more complex species structure in human rhinoviruses could facilitate research into the molecular basis of the observed clinical heterogeneity of rhinovirus infections in humans (2,28,56).

GENETIC classification and the recognition of virus species as biological entities

We have found that viruses belonging to a single species are usually separated by less than ~0.4 replacements per residue on average in the six most conserved proteins, while this distance is commonly exceeded in virus pairs representing different species (Fig. 3B). Furthermore, we observed a dependence of the largest intragroup genetic divergence (maximum intragroup PED) on the sampling size (number of viruses) in the 38 species: with increasing sampling size, a species’ maximum genetic divergence tends to approach the species distance threshold (Fig. 3B). Accordingly, the eleven species that constitute the upper ~25% of the maximum PED range are enriched with highly-sampled species. Additionally, host range may be another parameter of relevance to the genetic divergence of species: the upper ~25% of the maximum PED range is also enriched with species that infect multiple hosts (five out of six species of this kind) (Fig. 3B). This correlation is sensible biologically since host switching is expected to be accompanied by accelerated virus evolution.

The abovementioned correlations involve species that belong to four genera, indicating that they may be applicable to all picornavirus species. If so, we may expect that with a sufficient increase of the species sampling size the maximum divergence of all species in the Picornaviridae will approach the species threshold. This would indicate that the intragroup genetic divergence of species is constrained similarly in different lineages. Alternatively, some currently under-sampled lineages could accommodate a smaller natural diversity due to either stricter constraints or being a “young” species. For instance, HAV with its relatively large sampling size and two hosts (Fig. 3B) has an unusually small maximum genetic divergence (see also (4)). Thus, it remains possible that the inferred species threshold represents an upper limit on the maximum intragroup genetic divergence but that the actual limit may be smaller in some picornavirus species. Likewise, we may not exclude that viruses in some species may diverge above the threshold. This might happen due to position-specific variations of replacements in the six conserved proteins or involvement of virus lineages that are in the transition to establish separate species. The virus diversity known in taxonomy as the species HRV-A and HRV-C (Fig. 4) could represent such cases. Also, it is important to stress that the species distance threshold represents an average over 2446 positions in six conserved proteins (43) indicating that (lineage-specific) variations of maximum divergence for different proteins are likely (see below and also (44,68)). Further characterization of the natural diversity of picornavirus species, including the surveillance of novel hosts, could address this important aspect of the species delimitation in the GENETIC classification.


The existence of a species threshold on intragroup genetic divergence must be rationalized mechanistically. It may be a manifestation of speciation due to changes accumulated in either conserved proteins or other elements encoded in the picornavirus genome. To discuss the alternatives, it is important to recall that the divergence is a net result of contributions from several sources including mutation and homologous recombination. Although both promote diversity increase, they act in opposite directions concerning progeny divergence: on average, the progeny of two lineages diverged by mutation will be more separated than their parents while those generated through homologous recombination of parents will be closer to each other than their parents (49). In other words, recombination limits the maximum genetic divergence in an asexual population; without it the population will evolve into separate, more distantly related lineages after a sufficient time.
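A toy calculation may make the opposing effects of the two processes concrete. The sketch below is only an illustration under strong simplifications (a single crossover, Hamming distance on short hypothetical sequences, no selection) and is not a model used in this study: a single-crossover recombinant lies no farther from either parent than the parents lie from each other, whereas a new substitution at a site the parents share increases the separation. All names and sequences are hypothetical.

```python
def hamming(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def recombine(parent_a: str, parent_b: str, crossover: int) -> str:
    """Single-crossover recombinant: start of parent_a joined to the end of parent_b."""
    return parent_a[:crossover] + parent_b[crossover:]


def mutate(seq: str, position: int, residue: str) -> str:
    """Copy of seq with one residue replaced at the given 0-based position."""
    return seq[:position] + residue + seq[position + 1:]


# Hypothetical parental sequences that differ at four positions.
parent_a = "MKTAYIAKQR"
parent_b = "MRTSYLAKHR"
parental_distance = hamming(parent_a, parent_b)

# Recombination: the progeny sits between the parents.
recombinant = recombine(parent_a, parent_b, 5)
to_a = hamming(recombinant, parent_a)
to_b = hamming(recombinant, parent_b)

# Mutation: a descendant of parent_a gains a change at a site shared with parent_b.
mutant = mutate(parent_a, 9, "K")
mutant_to_b = hamming(mutant, parent_b)

print("parents:", parental_distance)
print("recombinant to each parent:", to_a, to_b)
print("mutant descendant to other parent:", mutant_to_b)
```

For a single crossover the two distances from the recombinant to its parents add up to the parental distance, so recombination on its own cannot push the maximum pair-wise distance in the population beyond what already exists.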

The inferred species threshold reflects the maximum amount of accumulated genetic differences in the six conserved proteins between two picornaviruses that remains compatible with the viability of progeny produced by homologous recombination, as argued below. The frequency of homologous recombination depends on the extent of base-pairing, with intratypic recombination being most common (37,72). Two picornaviruses that are separated by a distance approaching the species threshold would retain only relatively small stretches of identical orthologous residues in their genome because the threshold is so high; the lack of extensive base-pairing should impede homologous recombination. Even if recombination happens between these viruses, the resulting chimeric progeny will be viable only if the recombinant proteins, which all are essential for virus reproduction, remain functional. The protein functionality depends on the intra- and inter-protein compatibility of lineage-specific mutations that have been accumulated since the divergence of these viruses. The mutation spectrum is restricted by so-called epistatic interactions between different protein positions (66) making mutations outside this spectrum incompatible with the protein functioning. As two viruses diverge they will approach the species distance threshold beyond which accumulated mutations may become incompatible with progeny viability in any combination that could be generated in the recombinants. In this framework, the existence of the species threshold reflects the genetic separation of species. This model could be probed in experiments on virus chimeras involving the conserved backbone proteins. It is predicted that intra- but not inter-species chimeras must be viable. Results compatible with this model are available for Human enterovirus C (29,30). The viability of chimeric progeny may be determined not only by the distance between parents but also by the origins of combined parts (30), indicating that both forward and reciprocal chimeras must be characterized.

In the alternative model, other elements outside the conserved proteins could be implicated in the control of speciation. These elements include L and 2A proteins, which exist in a large variety of molecular forms in picornaviruses (1,20,40), or CRE whose location in the genome varies tremendously among picornaviruses (9,18,19,50,73,80,81), or other elements located in the 5’- and 3’-non-coding regions (71,78). For a number of picornaviruses the viability of inter-species chimeras carrying a non-cognate version of either L (58) and 2A protein (47), or CRE (73) and IRES (48,78) was demonstrated experimentally. Also, several picornaviruses with deleted L proteins were found to be viable (42,59) which is in line with their accessory “security” role in virus replication (1). Thus, picornaviruses could accept “gene flow” from other species in the case of elements that are not conserved family-wide. Consequently, an acquisition or loss or relocation of a non-conserved element by a picornavirus in vivo seems plausible. Furthermore, it is conceivable that such a newly acquired element might confer a function that would allow the virus to explore a new niche, eventually leading to its reproductive isolation from other lineages; in other words it would trigger speciation. However, this model does not provide a mechanistic explanation for the species genetic threshold other than that of the first model (see above).

Thus, in our opinion, non-conserved and conserved elements of the picornavirus genome may play distinct roles in speciation. The clear-cut relation between the species delimitation and the discontinuity in the inter-virus genetic distance distribution lends support to the notion that picornavirus species are biological entities rather than merely operational units.

GENETIC classification versus ICTV taxonomy: genus level

The GENETIC classification includes a genus level comprising 16 clusters. Eleven of them match ICTV genera, two clusters together correspond to a single genus (Aphthovirus), and three clusters comprise recently discovered viruses (Figs. 2 and 3A). The genus Aphthovirus was split into two clusters, one formed by the single species Equine rhinitis A virus (ERAV) (45) and the other by the two species FMDV (10,41) and Bovine rhinitis B virus (BRBV) (25). The minimum PED of 1.03 between viruses of these two clusters is considerably larger than the genus distance threshold of 0.905 and comparable to those between the closest virus pairs of other sister genera, e.g. Senecavirus and Cardiovirus or Enterovirus and Sapelovirus. In fact, the distance range between viruses of these two clusters fits within the limits of the next rank (super-genus), which is considered below. This result was also reproduced in classifications of two evaluation datasets (43) in which these viruses are present but which differed with respect to genome region and virus selection, respectively. We note that an L protein variety with a papain-like fold and proteolytic activity that is associated with this monophyletic virus group (40) could be considered a molecular marker of a larger group that also includes the sister genus Erbovirus (45,79). Thus, there is strong support for splitting the genus Aphthovirus into two genera in future revisions of the taxonomy.
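The comparison behind this recommendation can be sketched in a few lines of Python. The snippet is only an illustration and not part of the classification procedure itself: the function name `min_intercluster_ped` and the individual pairwise values are hypothetical, and only the minimum PED of 1.03 and the genus threshold of 0.905 are taken from the text.

```python
from itertools import product

GENUS_THRESHOLD = 0.905  # genus distance threshold quoted in the text


def min_intercluster_ped(ped, cluster_a, cluster_b):
    """Return the smallest pairwise distance between members of two clusters."""
    return min(ped[frozenset((a, b))] for a, b in product(cluster_a, cluster_b))


# Hypothetical pairwise PEDs between representatives of the two Aphthovirus
# clusters; only the minimum (1.03) is reported in the text.
ped = {
    frozenset(("ERAV", "FMDV")): 1.03,
    frozenset(("ERAV", "BRBV")): 1.10,
}

closest = min_intercluster_ped(ped, ["ERAV"], ["FMDV", "BRBV"])
# Even the closest pair exceeds the threshold, so the two clusters
# are separated at the genus level.
separate_genera = closest > GENUS_THRESHOLD
```

Using the minimum rather than the mean distance makes the check conservative: the clusters are treated as distinct genera only if every inter-cluster pair lies beyond the threshold.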

The three genus clusters that are formed by recently discovered viruses include cosaviruses (4 species), sealion picornavirus (1 species) and saliviruses (1 species), respectively. All genus clusters were complete with the exception of Enterovirus (Fig. 3A); as a result, less than 0.02% (21 out of 152,194) of intragroup PED values exceed the genus threshold (Table 2), all involving a single sequence of Enterovirus 71 (GenBank accession AF119795) from HEV-A. Seven out of 16 genera are non-singletons (include more than one species), and they determine the genus-specific PED range, which is defined as “inter-species intra-genus” genetic divergence (Fig. 3A).
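The proportion of threshold violations quoted above is simple arithmetic, which the following Python sketch reproduces. The helper `fraction_above` is a hypothetical name introduced for illustration and the toy distances are made up; only the counts 21 and 152,194 come from the text.

```python
def fraction_above(distances, threshold):
    """Fraction of pairwise distances that exceed a threshold."""
    distances = list(distances)
    return sum(d > threshold for d in distances) / len(distances)


# Counts reported in the text for the genus level.
violations, total = 21, 152_194
genus_violation_pct = 100 * violations / total  # roughly 0.014%, below 0.02%

# Toy usage of the helper with made-up distances (not data from the study).
toy_fraction = fraction_above([0.2, 0.5, 0.95, 0.7], threshold=0.905)
```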

GENETIC classification versus ICTV taxonomy: recognition of the new hierarchical level super-genus

The GENETIC classification recognizes an additional rank, provisionally called super-genus, that has no counterpart in virus taxonomy. The threshold support for this level is the strongest overall (43), indicating that it may reflect a clustering that is genetically and evolutionarily sensible. At this level we observe five non-singleton super-genera, each including more than one genus. Together they include viruses from 28 species and ten genera. Four of these super-genera represent unions of, respectively, Enterovirus with Sapelovirus, Cardiovirus with Senecavirus, Hepatovirus with Tremovirus, and Kobuvirus with the cluster formed by recently discovered saliviruses (Figs. 2 and 3A). The fifth non-singleton super-genus corresponds to the genus Aphthovirus of the ICTV taxonomy, which is split into two genera in the GENETIC classification (see above). The other six super-genera accommodate singleton genera including ten species in total. Four of these super-genera each include only a single ICTV genus: Avihepatovirus, Erbovirus, Parechovirus and Teschovirus, respectively. Two super-genera are formed by recently discovered cosaviruses and sealion picornavirus, respectively. All super-genus clusters are complete with the exception of the Enterovirus/Sapelovirus union (Fig. 3A); as a result, less than 0.25% (7 out of 2814) of intragroup PED values exceed the super-genus threshold (Table 2), all involving a single sequence of avian sapelovirus (RefSeq accession NC_006553) from AvSV. The five non-singleton super-genera determine the super-genus-specific PED range, which is defined as “inter-species inter-genus intra-super-genus” genetic divergence (Fig. 3A).
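The bookkeeping in this paragraph can be checked with a small Python sketch. The groupings are those named in the text; the labels used for clusters that lack an ICTV genus name (for example "salivirus cluster") and all variable names are informal placeholders introduced here.

```python
# Super-genus composition as described in the text (Figs. 2 and 3A).
non_singleton = {
    "Enterovirus/Sapelovirus": ["Enterovirus", "Sapelovirus"],
    "Cardiovirus/Senecavirus": ["Cardiovirus", "Senecavirus"],
    "Hepatovirus/Tremovirus": ["Hepatovirus", "Tremovirus"],
    "Kobuvirus/salivirus": ["Kobuvirus", "salivirus cluster"],
    "Aphthovirus (ICTV)": ["ERAV cluster", "FMDV/BRBV cluster"],
}
singleton = [
    "Avihepatovirus", "Erbovirus", "Parechovirus", "Teschovirus",
    "cosavirus cluster", "sealion picornavirus cluster",
]

n_super_genera = len(non_singleton) + len(singleton)
n_genera = sum(len(genera) for genera in non_singleton.values()) + len(singleton)
n_species = 28 + 10  # species counts quoted for the two groups of super-genera

# Share of intragroup PED values above the super-genus threshold (7 of 2814).
super_genus_violation_pct = 100 * 7 / 2814
```

The totals (11 super-genera and 16 genera) agree with the 16 genus-level clusters reported in the preceding section.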

Multimodality of the PED distribution and the evolution of picornaviruses

To our knowledge, nothing in evolutionary theory would predict the multimodality of the PED distribution of conserved proteins for a virus family. However, once observed, it requires an (evolutionary) explanation. The model of virus speciation outlined above may explain the existence of the PED discontinuity in which the species threshold resides. This threshold is expected to limit intragroup but not intergroup genetic divergence of lineages once they have crossed the threshold. This biological reasoning seems not to be applicable to the other areas of PED discontinuity, which are associated with the genus and super-genus thresholds, respectively. One plausible explanation for these discontinuities is that they reflect large-scale changes in the rates of birth and death that might have happened across all virus lineages. Cellular life forms are known to have gone through alternating periods of mass birth and mass death across lineages (61,65). If ancestral (picorna)viruses followed their hosts, alternating peaks and valleys in their PED distribution would reflect periods characterized predominantly by virus speciation and extinction, respectively. Thus, the genus and super-genus levels determined in this study would correspond to two major waves of speciation that are separated by two waves of extinction in the evolution of picornaviruses, possibly reflecting changes in the environment.

GENETIC classification and the taxonomy of picornaviruses: two different perspectives on known and unknown virus diversities

As shown above, there is striking agreement between the GENETIC classification and the ICTV taxonomy (70) of the Picornaviridae at the species and genus levels, with notable differences concerning the recognition of only a few taxa. The observed match is non-trivial (76) since the underlying decision-making frameworks seek to satisfy different criteria. To fully reveal the impact of these criteria in the two frameworks, which are either exclusively (DEmARC) or predominantly (ICTV) genetic-based, we sought to characterize their effect on partitioning the virus diversity, the primary target of classification and an important subject of research in virology. To this end, we have developed a circular diagram for presenting the classification of a virus family in a graphical form (Fig. 5). It depicts the proportions of the inter-virus genetic divergence that are partitioned and not partitioned by a classification, respectively. The circle radius is defined by the PED range observed in the family, with inter-virus genetic divergence increasing linearly from the perimeter (PED of zero) toward the center of the circle (maximum observed PED). Taxa are shown as boxes with heights (in the radial dimension) that correspond to the PED range of the respective classification level. Species form the outermost layer, followed by the genus layer and, for the GENETIC classification, the super-genus layer residing closest to the circle center. Within each taxon, the PED ranges that have been sampled and not sampled are colored according to the coloring scheme for classification ranks (Fig. 3) using bright and soft colors, respectively. The PED range that has not (yet) been partitioned by a classification (inner part of the circle) is shown in white.
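The radial scale described above can be written down explicitly. The following Python sketch is one possible reading of that description, not the code used to draw Fig. 5: the function names, the unit circle radius and the numerical values are assumptions made for illustration.

```python
def ped_to_radius(ped, max_ped, circle_radius=1.0):
    """Map a PED value to a radial coordinate.

    A PED of zero lies on the perimeter (r = circle_radius) and the maximum
    observed PED lies at the center (r = 0), with a linear scale in between.
    """
    if not 0 <= ped <= max_ped:
        raise ValueError("ped must lie within [0, max_ped]")
    return circle_radius * (1 - ped / max_ped)


def taxon_box(ped_low, ped_high, max_ped, circle_radius=1.0):
    """Return (inner_radius, outer_radius) of the box covering a PED range."""
    return (
        ped_to_radius(ped_high, max_ped, circle_radius),
        ped_to_radius(ped_low, max_ped, circle_radius),
    )


# Hypothetical numbers: a species-level box covering PEDs from 0 to 0.5
# in a family whose maximum observed PED is 2.0.
inner, outer = taxon_box(0.0, 0.5, max_ped=2.0)
```

With these assumed values the box occupies the outer quarter of the radius, which mirrors how the species layer sits at the perimeter of the diagram.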

To facilitate an unbiased comparison of the genetic foundations of both frameworks involving as many taxa as possible, the ICTV taxonomy in Fig. 5 was required to follow the GENETIC classification by accepting all taxa containing new viruses and those two (Aphthovirus, HRV-C) that were classified differently. As a result, the taxonomy and the GENETIC classification match each other in relation to the virus sampling per taxon (the outermost layer in Figs. 5A and 5B) and the species and genus structure. At the species level, the PSG applies demarcation criteria that are genus-specific and determined by the maximum observed intragroup genetic divergence among all sampled species of the genus. As a consequence, the limit on intragroup genetic divergence of species varies tremendously between genera. Accordingly, in the ICTV diagram only species of the same genus have equal heights (compare taxa 11.x with 12.x in Fig. 5A); for species that comprise a single virus the height is nil (no pair is available to produce a PED; see for instance taxon 16.1 in Fig. 5A). At the genus level, the PSG does not provide demarcation criteria for the quantification of maximum intragroup genetic divergence, and each genus is demarcated separately, usually by means of standard phylogenetic analyses. To reflect this approach, we represented genera as boxes whose heights correspond to the maximum observed intragroup genetic divergence (Fig. 5A). For genera comprising a single species the height is nil (see for instance taxon 15.1 in Fig. 5A). In contrast, in the DEmARC diagram (Fig. 5B) all species, genus or super-genus taxa have uniform, level-specific heights, since in this framework family-wide limits on intragroup genetic divergence are devised (compare for instance taxa 10.1 and 11.1 in Fig. 5B).
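The difference between the two ways of assigning species box heights can be illustrated with a short Python sketch. It is a simplified reading of the description above under stated assumptions: the function names, the toy species and genera, the PED values and the threshold of 0.40 are all hypothetical.

```python
def ictv_like_species_heights(max_intragroup_ped, genus_of):
    """Genus-specific heights: the largest intragroup PED seen in any species of the genus.

    Species represented by a single virus have no pairwise PED (None)
    and receive a height of zero.
    """
    genus_max = {}
    for species, ped in max_intragroup_ped.items():
        if ped is not None:
            genus = genus_of[species]
            genus_max[genus] = max(genus_max.get(genus, 0.0), ped)
    return {
        species: 0.0 if ped is None else genus_max[genus_of[species]]
        for species, ped in max_intragroup_ped.items()
    }


def uniform_heights(species, threshold):
    """Family-wide heights: every species receives the same level-specific limit."""
    return {s: threshold for s in species}


# Hypothetical species, genera and PED values (not data from the study).
genus_of = {"sp1": "G1", "sp2": "G1", "sp3": "G2", "sp4": "G2"}
observed = {"sp1": 0.10, "sp2": 0.30, "sp3": 0.05, "sp4": None}

ictv_like = ictv_like_species_heights(observed, genus_of)
demarc_like = uniform_heights(observed, threshold=0.40)
```

In the genus-specific scheme the heights differ between genera and vanish for single-virus species, whereas the family-wide scheme also assigns an expected divergence range to taxa that are still poorly sampled.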

As a consequence of using family-wide demarcation thresholds, the DEmARC framework partitions a larger share of the total PED space than the ICTV framework (compare white areas in Fig. 5A and 5B). The comparison also shows that for most taxa a fraction of the intragroup genetic divergence is yet to be described in field studies (soft-colored areas in Fig. 5B). Such predictions are not available in the ICTV framework. The diagrams also reveal that the most distant relations of viruses in the Picornaviridae remain totally unstructured (white central area in Fig. 5A and 5B). In the DEmARC framework, this area is partially partitioned by super-genera and could be partitioned further if the subfamily level is introduced (43).

Concluding Remarks

In a field lacking a gold standard, the striking agreement between the GENETIC classification and the expert-based taxonomy (39,70) of the Picornaviridae could be seen as a cross-validation of both. Of principal importance is that the observed agreement implies that genomes may contain the necessary and sufficient information to build a (picorna)virus taxonomy by using an approach (43) that employs a sole (rather than polythetic) demarcation criterion. The single criterion has additional benefits: its use provides consistency across all taxa, defines expected divergence ranges for poorly sampled taxa, reveals problematic taxa, and makes taxonomy fully genetic-based. We expect the latter to facilitate the interaction between taxonomy and fundamental and applied research. Genetically delimited taxa could be readily targeted for recognition by virus diagnostics. Furthermore, the validity of the species threshold could be probed in experiments involving homologous recombinants in the backbone genes, as well as through characterization of the natural virus diversity in already established and newly identified picornavirus species. The biological foundations of other, higher-rank thresholds could also be addressed. These advancements, combined with the application of DEmARC to other virus families, could bring virus taxonomy into the mainstream of research and pave the way to ultimately uniting it with the taxonomy of cellular life forms.

Acknowledgments

We are indebted to Igor Sidorov, Andrey Leontovich and Ivan Antonov for helpful discussions and suggestions, and to Dmitry Samborskiy, Igor Sidorov and Alexander Kravchenko for administering and advancing different Viralis modules. This work was partially supported by the Netherlands Bioinformatics Centre (BioRange SP 2.3.3), the European Union (FP6 IP Vizier LSHG-CT-2004-511960 and FP7 IP Silver HEALTH-2010-260644), the Collaborative Agreement in Bioinformatics between Leiden University Medical Center and Moscow State University (MoBiLe), and Leiden University Fund (Special Chair in Applied Bioinformatics in Virology).

References

1. Agol, V. I. and A. P. Gmyl. 2010. Viral security proteins: counteracting host defences. Nat. Rev. Microbiol. 8:867-878.

2. Arden, K. E. and I. M. Mackay. 2010. Newly identified human rhinoviruses: molecular methods heat up the cold viruses. Rev. Med. Virol. 20:156-176.

3. Beckner, M. 1959. The biological way of thought. Columbia University Press, New York.

4. Belalov, I. S., O. V. Isaeva, and A. N. Lukashev. 2011. Recombination in hepatitis A virus: evidence for reproductive isolation of genotypes. J. Gen. Virol. 92:860-872.

5. Benson, D. A., I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and E. W. Sayers. 2010. GenBank. Nucl Acids Res 38:D46-D51.

6. Blinkova, O., A. Kapoor, J. Victoria, M. Jones, N. Wolfe, A. Naeem, S. Shaukat, S. Sharif, M. M. Alam, M. Angez, S. Zaidi, and E. L. Delwart. 2009. Cardioviruses Are Genetically Diverse and Cause Common Enteric Infections in South Asian Children. J. Virol. 83:4631-4641.

7. Cavalli-Sforza, L. L. and A. W. F. Edwards. 1967. Phylogenetic Analysis Models and Estimation Procedures. Am J Hum Genet 19:233-257.

8. Chiu, C. Y., A. L. Greninger, K. Kanada, T. Kwok, K. F. Fischer, C. Runckel, J. K. Louie, C. A. Glaser, S. Yagi, D. P. Schnurr, T. D. Haggerty, J. Parsonnet, D. Ganem, and J. L. Derisi. 2008. Identification of cardioviruses related to Theiler's murine encephalomyelitis virus in human infections. Proc. Natl. Acad. Sci. U. S. A. 105:14124-14129.

9. Cordey, S., D. Gerlach, T. Junier, E. M. Zdobnov, L. Kaiser, and C. Tapparel. 2008. The cis-acting replication elements define human enterovirus and rhinovirus species. RNA 14:1568-1578.

10. Domingo, E., C. Escarmis, E. Baranowski, C. M. Ruiz-Jarabo, E. Carrillo, J. I. Nunez, and F. Sobrino. 2003. Evolution of foot-and-mouth disease virus. Virus Res. 91:47-63.

11. Drummond, A. J., S. Y. W. Ho, M. J. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol 4:699-710.

12. Drummond, A. J. and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7.

13. Drummond, A. J. and A. Rambaut. 2007. Tracer v1.4. Available from http://beast.bio.ed.ac.uk/Tracer.

14. Edgar, R. C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res 32:1792-1797.


15. Ehrenfeld, E., E. Domingo, and R. P. Roos (eds.). 2010. The Picornaviruses, p. 1-493. ASM Press, Washington.

16. Fauquet, C. M., M. A. Mayo, J. Maniloff, U. Desselberger, and L. A. Ball (eds.). 2005. Virus Taxonomy, Eighth Report of the International Committee on Taxonomy of Viruses, p. 1-1259. Elsevier Academic Press, Amsterdam.

17. Felsenstein, J. 1981. Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach. J Mol Evol 17:368-376.

18. Gerber, K., E. Wimmer, and A. V. Paul. 2001. Biochemical and Genetic Studies of the Initiation of Human Rhinovirus 2 RNA Replication: Identification of a cis-Replicating Element in the Coding Sequence of 2A(pro). J. Virol. 75:10979-90.

19. Goodfellow, I., Y. Chaudhry, A. Richardson, J. Meredith, J. W. Almond, W. Barclay, and D. J. Evans. 2000. Identification of a cis-acting replication element within the poliovirus coding region. J. Virol. 74:4590-4600.

20. Gorbalenya, A. E. and C. Lauber. 2010. Origin and Evolution of the Picornaviridae Proteome, p. 253-270. In E. Ehrenfeld, E. Domingo, and R. P. Roos (eds.), The Picornaviruses. ASM Press, Washington.

21. Gorbalenya, A. E., P. Lieutaud, M. R. Harris, B. Coutard, B. Canard, G. J. Kleywegt, A. A. Kravchenko, D. V. Samborskiy, I. A. Sidorov, A. M. Leontovich, and T. A. Jones. 2010. Practical application of bioinformatics by the multidisciplinary VIZIER consortium. Antivir Res 87:95-110.

22. Greninger, A. L., C. Runckel, C. Y. Chiu, T. Haggerty, J. Parsonnet, D. Ganem, and J. L. Derisi. 2009. The complete genome of klassevirus - a novel picornavirus in pediatric stool. Virol. J. 6.

23. Guindon, S. and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704.

24. Hellen, C. U. T. and S. de Breyne. 2007. A distinct group of hepacivirus/pestivirus-like internal ribosomal entry sites in members of diverse Picornavirus genera: Evidence for modular exchange of functional noncoding RNA elements by recombination. J. Virol. 81:5850-5863.

25. Hollister, J. R., A. Vagnozzi, N. J. Knowles, and E. Rieder. 2008. Molecular and phylogenetic analyses of bovine rhinovirus type 2 shows it is closely related to foot-and-mouth disease virus. Virology 373:411-425.

26. Holtz, L. R., S. R. Finkbeiner, C. D. Kirkwood, and D. Wang. 2008. Identification of a novel picornavirus related to cosaviruses in a child with acute diarrhea. Virol. J. 5.


27. Holtz, L. R., S. R. Finkbeiner, G. Y. Zhao, C. D. Kirkwood, R. Girones, J. M. Pipas, and D. Wang. 2009. Klassevirus 1, a previously undescribed member of the family Picornaviridae, is globally widespread. Virol. J. 6.

28. Jackson, D. J., R. E. Gangnon, M. D. Evans, K. A. Roberg, E. L. Anderson, T. E. Pappas, M. C. Printz, W. M. Lee, P. A. Shult, E. Reisdorf, K. T. Carlson-Dakes, L. P. Salazar, D. F. DaSilva, C. J. Tisler, J. E. Gern, and R. F. Lemanske. 2008. Wheezing rhinovirus illnesses in early life predict asthma development in high-risk children. Am J Resp Crit Care Med 178:667-672.

29. Jegouic, S., M. L. Joffret, C. Blanchard, F. B. Riquet, C. Perret, I. Pelletier, F. Colbere-Garapin, M. Rakoto-Andrianarivelo, and F. Delpeyroux. 2009. Recombination between Polioviruses and Co-Circulating Coxsackie A Viruses: Role in the Emergence of Pathogenic Vaccine-Derived Polioviruses. PLoS Pathog. 5.

30. Jiang, P., J. A. J. Faase, H. Toyoda, A. Paul, E. Wimmer, and A. E. Gorbalenya. 2007. Evidence for emergence of diverse polioviruses from C-cluster coxsackie A viruses and implications for global poliovirus eradication. Proc. Natl. Acad. Sci. U. S. A. 104:9457-9462.

31. Johansson, S., B. Niklasson, J. Maizel, A. E. Gorbalenya, and A. M. Lindberg. 2002. Molecular analysis of three Ljungan virus isolates reveals a new, close-to-root lineage of the Picornaviridae with a cluster of two unrelated 2A proteins. J. Virol. 76:8920-8930.

32. Jones, M. S., V. V. Lukashov, R. D. Ganac, and D. P. Schnurr. 2007. Discovery of a novel human picornavirus in a stool sample from a pediatric patient presenting with fever of unknown origin. J. Clin. Microbiol. 45:2144-2150.

33. Kapoor, A., J. Victoria, P. Simmonds, E. Slikas, T. Chieochansin, A. Naeem, S. Shaukat, S. Sharif, M. M. Alam, M. Angez, C. L. Wang, R. W. Shafer, S. Zaidi, and E. Delwart. 2008. A highly prevalent and genetically diversified Picornaviridae genus in South Asian children. Proc. Natl. Acad. Sci. U. S. A. 105:20482-20487.

34. Kapoor, A., J. Victoria, P. Simmonds, C. Wang, R. W. Shafer, R. Nims, O. Nielsen, and E. Delwart. 2008. A highly divergent picornavirus in a marine mammal. J. Virol. 82:311-320.

35. King, A. M. Q., M. J. Adams, E. B. Carstens, and E. J. Lefkowitz (eds.). 2012. Virus Taxonomy, Ninth Report of the International Committee on Taxonomy of Viruses, p. 1-1327. Elsevier Academic Press, Amsterdam.

36. Kingsbury, D. W. 1985. Species Classification Problems in Virus Taxonomy. Intervirology 24:62-70.


37. Kirkegaard, K. and D. Baltimore. 1986. The mechanism of RNA recombination in poliovirus. Cell 47:433-443.

38. Kitchen, A., L. A. Shackelton, and E. C. Holmes. 2011. Family level phylogenies reveal modes of macroevolution in RNA viruses. Proc. Natl. Acad. Sci. U. S. A. 108:238-243.

39. Knowles, N. J., T. Hovi, T. Hyypiä, A. M. Q. King, A. M. Lindberg, M. A. Pallansch, A. C. Palmenberg, P. Simmonds, T. Skern, G. Stanway, T. Yamashita, and R. Zell. 2012. Picornaviridae, p. 855-880. In A. M. Q. King, M. J. Adams, E. B. Carstens, and E. J. Lefkowitz (eds.), Virus Taxonomy, Ninth Report of the International Committee on Taxonomy of Viruses. Elsevier Academic Press, Amsterdam.

40. Knowles, N. J., T. Hovi, A. M. Q. King, and G. Stanway. 2010. Overview of Taxonomy, p. 19-32. In E. Ehrenfeld, E. Domingo, and R. P. Roos (eds.), The Picornaviruses. ASM Press, Washington.

41. Knowles, N. J. and A. R. Samuel. 2003. Molecular epidemiology of foot-and-mouth disease virus. Virus Res. 91:65-80.

42. Kong, W. P., G. D. Ghadge, and R. P. Roos. 1994. Involvement of Cardiovirus Leader in Host Cell-Restricted Virus Expression. Proc. Natl. Acad. Sci. U. S. A. 91:1796-1800.

43. Lauber, C. and A. E. Gorbalenya. 2012. Partitioning the Genetic Diversity of a Virus Family: Approach and Evaluation through a Case Study of Picornaviruses. J. Virol. 86:xxxx-yyyy.

44. Lewis-Rogers, N. and K. A. Crandall. 2010. Evolution of Picornaviridae: An examination of phylogenetic relationships and cophylogeny. Mol Phylogenet Evol 54:995-1005.

45. Li, F., G. F. Browning, M. J. Studdert, and B. S. Crabb. 1996. Equine rhinovirus 1 is more closely related to foot-and-mouth disease virus than to other picornaviruses. Proc. Natl. Acad. Sci. U. S. A. 93:990-995.

46. Liang, Z., A. S. M. Kumar, M. S. Jones, N. J. Knowles, and H. L. Lipton. 2008. Phylogenetic Analysis of the Species Theilovirus: Emerging Murine and Human Pathogens. J. Virol. 82:11545-11554.

47. Lu, H. H., X. Y. Li, A. Cuconati, and E. Wimmer. 1995. Analysis of Picornavirus 2A(Pro) Proteins: Separation of Proteinase from Translation and Replication Functions. J. Virol. 69:7445-7452.

48. Lu, H. H. and E. Wimmer. 1996. Poliovirus chimeras replicating under the translational control of genetic elements of hepatitis C virus reveal unusual properties of the internal ribosomal entry site of hepatitis C virus. Proc. Natl. Acad. Sci. U. S. A. 93:1412-1417.

49. Lukashev, A. N. 2010. Recombination among picornaviruses. Rev. Med. Virol. 20:327-337.


50. Martinez-Salas, E. and M. D. Ryan. 2010. Translation and protein processing, p. 141-161. In E. Ehrenfeld, E. Domingo, and R. P. Roos (eds.), The Picornaviruses. ASM Press, Washington.

51. McIntyre, C. L., E. C. M. Leitch, C. Savolainen-Kopra, T. Hovi, and P. Simmonds. 2010. Analysis of Genetic Diversity and Sites of Recombination in Human Rhinovirus Species C. J. Virol. 84:10297-10310.

52. Oberste, M. S., X. Jiang, K. Maher, W. A. Nix, and B. M. Jiang. 2008. The complete genome sequences for three simian enteroviruses isolated from captive primates. Arch Virol 153:2117-2122.

53. Oberste, M. S., K. Maher, and M. A. Pallansch. 2002. Molecular phylogeny and proposed classification of the simian picornaviruses. J. Virol. 76:1244-1251.

54. Oberste, M. S., K. Maher, and M. A. Pallansch. 2007. Complete genome sequences for nine simian enteroviruses. J. Gen. Virol. 88:3360-3372.

55. Palmenberg, A., D. Neubauer, and T. Skern. 2010. Genome Organization and Encoded Proteins, p. 3-17. In E. Ehrenfeld, E. Domingo, and R. P. Roos (eds.), The Picornaviruses. ASM Press, Washington.

56. Palmenberg, A. C., J. A. Rathe, and S. B. Liggett. 2010. Analysis of the complete genome sequences of human rhinovirus. Journal of Allergy and Clinical Immunology 125:1190-1199.

57. Palmenberg, A. C., D. Spiro, R. Kuzmickas, S. Wang, A. Djikeng, J. A. Rathe, C. M. Fraser-Liggett, and S. B. Liggett. 2009. Sequencing and Analyses of All Known Human Rhinovirus Genomes Reveal Structure and Evolution. Science 324:55-59.

58. Piccone, M. E., H. H. Chen, R. P. Roos, and M. J. Grubman. 1996. Construction of a chimeric Theiler's murine encephalomyelitis virus containing the leader gene of foot-and-mouth disease virus. Virology 226:135-139.

59. Piccone, M. E., E. Rieder, P. W. Mason, and M. J. Grubman. 1995. The Foot-and-Mouth Disease Virus Leader Proteinase Gene Is Not Required for Viral Replication. J. Virol. 69:5376-5382.

60. Pringle, C. R. 1991. The 20th Meeting of the Executive Committee of the International Committee on Virus Taxonomy: Virus Species, Higher Taxa, a Universal Virus Database, and Other Matters. Arch Virol 119:303-304.

61. Raup, D. M. 1994. The Role of Extinction in Evolution. Proc. Natl. Acad. Sci. U. S. A. 91:6758-6763.

62. Reuter, G., A. Boldizsar, and P. Pankovics. 2009. Complete nucleotide and amino acid sequences and genetic organization of porcine kobuvirus, a member of a new species in the genus Kobuvirus, family Picornaviridae. Arch Virol 154:101-108.


63. Schmidt, H. A., K. Strimmer, M. Vingron, and A. von Haeseler. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504.

64. Semler, B. L. and E. Wimmer. 2002. Molecular biology of picornaviruses. ASM Press, Washington, DC, U.S.A.

65. Sepkoski, J. J. 1998. Rates of speciation in the fossil record. Philos Trans R Soc Lond B Biol Sci 353:315-326.

66. Shapiro, B., A. Rambaut, O. G. Pybus, and E. C. Holmes. 2006. A phylogenetic method for detecting positive epistasis in gene sequences and its application to RNA virus evolution. Mol. Biol. Evol. 23:1724-1730.

67. Sidorov, I. A., D. V. Samborskiy, A. M. Leontovich, and A. E. Gorbalenya. 2009. SARGENS, Similarity-based Automatic Retrieval of Genetic Sequences. http://veb.lumc.nl/SARGENS.

68. Simmonds, P. 2006. Recombination and selection in the evolution of picornaviruses and other mammalian positive-stranded RNA viruses. J. Virol. 80:11124-11140.

69. Simmonds, P., C. McIntyre, C. Savolainen-Kopra, C. Tapparel, I. M. Mackay, and T. Hovi. 2010. Proposals for the classification of human rhinovirus species C into genotypically assigned types. J. Gen. Virol. 91:2409-2419.

70. Stanway, G., F. Brown, P. Christian, T. Hovi, T. Hyypiä, A. M. Q. King, N. J. Knowles, S. M. Lemon, P. D. Minor, M. A. Pallansch, A. C. Palmenberg, and T. Skern. 2005. Picornaviridae, p. 757-778. In C. M. Fauquet, M. A. Mayo, J. Maniloff, U. Desselberger, and L. A. Ball (eds.), Virus Taxonomy, Eighth Report of the International Committee on Taxonomy of Viruses. Elsevier Academic Press.

71. Steil, B. P. and D. J. Barton. 2009. Cis-active RNA elements (CREs) and picornavirus RNA replication. Virus Res. 139:240-252.

72. Tolskaya, E. A., L. A. Romanova, M. S. Kolesnikova, and V. I. Agol. 1983. Intertypic Recombination in Poliovirus: Genetic and Biochemical Studies. Virology 124:121-132.

73. van Ooij, M. J. M., D. A. Vogt, A. Paul, C. Castro, J. Kuijpers, F. J. M. van Kuppeveld, C. E. Cameron, E. Wimmer, R. Andino, and W. J. G. Melchers. 2006. Structural and functional characterization of the coxsackievirus B3 CRE(2C): role of CRE(2C) in negative- and positive-strand RNA synthesis. J. Gen. Virol. 87:103-113.

74. Van Regenmortel, M. H. V. 1989. Applying the Species Concept to Plant Viruses. Arch Virol 104:1-17.


75. Van Regenmortel, M. H. V. 2003. Viruses are real, virus species are man-made, taxonomic constructions. Arch Virol 148:2481-2488.

76. Van Regenmortel, M. H. V. 2007. Virus species and virus identification: Past and current controversies. Inf Genet Evol 7:133-144.

77. Whelan, S. and N. Goldman. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18:691-699.

78. Wimmer, E. and A. Paul. 2010. Making of a picornavirus genome, p. 33-55. In E. Ehrenfeld, E. Domingo, and R. P. Roos (eds.), The Picornaviruses. ASM Press, Washington.

79. Wutz, G., H. Auer, N. Nowotny, B. Grosse, T. Skern, and E. Kuechler. 1996. Equine rhinovirus serotypes 1 and 2: Relationship to each other and to aphthoviruses and cardioviruses. J. Gen. Virol. 77:1719-1730.

80. Yang, Y., R. Rijnbrand, K. L. McKnight, E. Wimmer, A. Paul, A. Martin, and S. M. Lemon. 2002. Sequence requirements for viral RNA replication and VPg uridylylation directed by the internal cis-acting replication element (cre) of human rhinovirus type 14. J. Virol. 76:7485-7494.

81. Yang, Y., M. K. Yi, D. J. Evans, P. Simmonds, and S. M. Lemon. 2008. Identification of a Conserved RNA Replication Element (cre) within the 3D(pol)-Coding Sequence of Hepatoviruses. J. Virol. 82:10118-10128.

82. Zheng, T. 2007. Characterisation of two enteroviruses isolated from Australian brushtail possums (Trichosurus vulpecula) in New Zealand. Arch Virol 152:191-198.

Figure Legends

Figure 1. Picornavirus genome organization. The organization of the picornavirus genome is shown using Porcine sapelovirus as an example. Products derived after cleavage of the encoded polyprotein are indicated by rectangles and names. They include structural proteins (dark-grey background) forming virus particles, non-structural/accessory proteins (light-grey) involved in replication and expression, and the leader protein (white), which is not found in all picornaviruses. Horizontal bars below highlight the six proteins conserved across the family; a concatenated, picornavirus-wide multiple alignment of these proteins forms the dataset of this study.

Figure 2. Phylogeny and GENETIC classification of the Picornaviridae. Shown is a maximum likelihood phylogeny of 38 picornaviruses representing species diversity based on the family-wide conserved proteins 1B, 1C, 1D, 2C, 3C and 3D. A Bayesian analysis resulted in an identical tree topology (data not shown). The part of the tree representing the ICTV-defined 28 species and 12 genera is drawn in black, and provisional or currently not recognized taxa in grey. Clusters equivalent to ICTV genera are highlighted by colored ovals. A split of Aphthovirus according to the GENETIC classification is indicated (white line). Genera with identical coloring unite into a total of 11 super-genera identified in this study. The viruses shown represent the following species (italics) or species-like clusters according to the GENETIC classification: Porcine sapelovirus (PSV), Simian sapelovirus (SiSV), Avian sapelovirus (AvSV), Human rhinovirus A (HRV-A), Human rhinovirus A2 (HRV-A2), Human rhinovirus B (HRV-B), Human rhinovirus C1 (HRV-C1), Human rhinovirus C2 (HRV-C2), Human rhinovirus C3 (HRV-C3), Human enterovirus A (HEV-A), Human enterovirus B (HEV-B), Human enterovirus C1 (HEV-C1), Human enterovirus D (HEV-D), Simian enterovirus A (SiEV-A), Simian enterovirus B (SiEV-B), Porcine enterovirus B (PEV-B), Bovine enterovirus (BEV), Bovine kobuvirus (BKoV), Aichi virus (AiV), Salivirus A (SaliV-A), Human parechovirus (HPeV), Ljungan virus (LjV), Duck hepatitis A virus (DuHV), Aquamavirus A (AqV-A), Hepatitis A virus (HAV), Avian encephalomyelitis virus (AvEMV), Foot-and-mouth disease virus (FMDV), Bovine rhinitis B virus (BRBV), Equine rhinitis A virus (ERAV), Equine rhinitis B virus (ERBV), Theilovirus (TMEV), Encephalomyocarditis virus (EMCV), Seneca Valley virus (SVV), Human cosavirus A (CosaV-A), Human cosavirus B (CosaV-B), Human cosavirus D (CosaV-D), Human cosavirus E (CosaV-E), Porcine teschovirus (PTeV). Numbers at branch points provide support values from 1000 non-parametric bootstraps. The scale bar represents 0.5 amino acid substitutions per site on average.

Figure 3. Intragroup genetic divergence and species sampling size. (A) Box-and-whisker graphs were used to plot distributions of distances between viruses from the same species (orange), between viruses from different species but the same genus (blue) and between viruses from different genera but the same super-genus (purple). The boxes span from the first to the third quartile and include the median (bold line), and the whiskers (dashed lines) extend to the extreme values. For name abbreviations see legend of Fig. 2; numbers in brackets correspond to the number of sequences per species; open and filled diamonds indicate single and multiple host species range, respectively. Genera and super-genera comprising only one species are not shown. The corresponding first half of the PED distribution (see (43)) is depicted below. Phylogenetic relationships of the 38 picornavirus species are shown by the cladogram to the left (following the topology in Fig. 2) with intra-genus relations collapsed. Colored shapes indicate those taxa that contribute to intragroup distances to the right. Species and genera currently not recognized by ICTV are marked with asterisks, and discrepancies between the ICTV taxonomy and the GENETIC classification (not caused by recently discovered viruses) are highlighted in red. (B) The relationship between sampling size and maximum intragroup genetic divergence is shown for each species.
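
The box-and-whisker summary described for panel A (first quartile, median, third quartile, and whiskers at the extreme values) can be sketched as follows. This is only an illustration: the function name `box_summary`, the example distances, and the "inclusive" quartile convention are assumptions, and the plotting software used for the figure may compute quartiles slightly differently.

```python
from statistics import quantiles


def box_summary(distances):
    """Return (minimum, Q1, median, Q3, maximum) for a list of pairwise distances.

    Whiskers extend to the extreme values, as stated in the legend.
    The quartile convention ("inclusive") is an assumption of this sketch.
    """
    if len(distances) < 2:
        raise ValueError("need at least two distances")
    ordered = sorted(distances)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical intra-species PED values, for illustration only.
example_peds = [0.10, 0.20, 0.30, 0.40, 0.50]
summary = box_summary(example_peds)
print(summary)
```

The same function could be applied separately to intra-species, intra-genus and intra-super-genus distances to obtain the three groups of boxes.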

Figure 4. Phylogeny of rhinoviruses. Shown is a ML phylogeny for 140 rhinoviruses based on the family-wide conserved proteins 1B, 1C, 1D, 2C, 3C and 3D. SH-like support values are shown for basal branching events. Species taxa recognized by the GENETIC classification are indicated (see also legend of Fig. 2). A minimal set of viruses sufficient to explain all PEDs that exceed the species distance threshold is highlighted by grey dots (see Table 2 for details on the viruses involved). The scale bar represents 0.1 amino acid substitutions per site on average.

Figure 5. Taxonomy diagram and comparison of classification frameworks. Shown is a taxonomy diagram for a classification under the ICTV framework (A) and the DEmARC framework (B). For simplicity, the GENETIC classification is visualized in both cases and super-genera are omitted for ICTV. Inter-virus genetic divergence (as PED) increases linearly (arrow) from the perimeter (PED of zero) toward the centre of the circle (maximum PED of 2.78). Applied distance thresholds are shown as black dots and the delimited taxa as rectangle-like shapes. Taxa are filled using the coloring scheme from Figure 3; the three basic colors represent, respectively, the species (orange), genus (blue) and super-genus (purple) level. Each color exists in two shadings that highlight, respectively, the limit on intragroup genetic divergence according to a distance threshold (soft shading) and the maximum observed intragroup genetic divergence (bright shading) of a taxon. Outside the circle, the relative density of virus sampling per species is shown as grey shadings from low (light) to high (dark) sampling, which is in the range of 1 (least sampled species) to 260 (most sampled species). For simplicity, species identities are indicated via a binary system where the first and the second number represent the genus and the species, respectively, as defined in the common legend below the circles. (A) ICTV treats each genus independently (different heights of genus shapes) and species must conform to genus-specific distance thresholds (equal heights of species shapes only within the same genus). (B) In the DEmARC framework taxa are treated equally at each level and they must conform to family-wide distance thresholds (equal, level-specific heights of taxon shapes). The space inside taxon shapes colored in soft shading highlights the genetic diversity that may be missed by the current picornavirus sampling, when assuming a universal, level-wide threshold that limits the actual diversity of each taxon.

Table 1. Differences between the GENETIC classification and the ICTV taxonomy at the species level.

| virus (a) | difference type (b) | ICTV (c) | GENETIC (d) | quality (e) |
|---|---|---|---|---|
| Simian picornavirus 17 | new | - | HEV-B | 1 |
| Simian picornavirus 13 | new | - | HEV-A | 1 |
| Simian enterovirus SV19, SV43 | new | - | HEV-A | 1 |
| Saffold virus | new | - | TheiloV | 1 |
| Possum enterovirus W1, W6 | new | - | BEV | 1 |
| Seal picornavirus type 1 | new | - | AqV-A* | - |
| Simian enterovirus N125, N203, SV6 | new | - | SiEV-B* | 1 |
| Enterovirus 103 isolate POo-1 | new | - | SiEV-B* | 1 |
| Human cosavirus A1, A2 | new | - | CosV-A* | 1 |
| Human cosavirus B | new | - | CosV-B* | - |
| Human cosavirus D | new | - | CosV-C* | - |
| Human cosavirus E | new | - | CosV-D* | - |
| Salivirus NG-J1, Human klassevirus 1 | new | - | SaliV-A* | 1 |
| Porcine kobuvirus S-1-HUN, K-30-HUN | new | - | BKoV | .833 |
| Human rhinovirus VR-1118, VR-1155, VR-1301 | new | - | HRV-Aβ* | 1 |
| Human rhinovirus C 026, NY-074, NAT001, QPM | mm | HRV-C | HRV-Cα* | 1 |
| Human rhinovirus C 025 | mm | HRV-C | HRV-Cβ* | - |
| Human rhinovirus C N4, N10, NAT045 | mm | HRV-C | HRV-Cγ* | .500 |

(a) Shown is the Definition field value in the GenBank annotation of one or several viruses.
(b) A virus was not available or was assigned to a tentative species at the time of the ICTV release (new); a mismatch was observed between the ICTV taxonomy and the GENETIC classification (mm).
(c) The species to which the virus is classified in the ICTV taxonomy; - if not available at the time.
(d) The species to which the virus was assigned in the GENETIC classification; new species proposed by the GENETIC classification are indicated using asterisks; for species abbreviations see legend of Fig. 2.
(e) The proportion of intra-species PED values not exceeding the species distance threshold; - for clusters with fewer than 3 viruses.
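
As an illustration of the quality column (footnote e), the proportion of intra-species PED values that do not exceed the species distance threshold could be computed as in the sketch below. The function name `cluster_quality`, the example distances, and the threshold value of 0.4 are hypothetical and are not taken from the study.

```python
def cluster_quality(intra_species_peds, species_threshold, n_viruses):
    """Proportion of pairwise PED values that do not exceed the threshold.

    Returns None for clusters with fewer than 3 viruses, which the
    table reports as "-".
    """
    if n_viruses < 3 or not intra_species_peds:
        return None
    within = sum(1 for ped in intra_species_peds if ped <= species_threshold)
    return within / len(intra_species_peds)


# Hypothetical cluster of 4 viruses, giving 6 pairwise PED values.
example_peds = [0.10, 0.20, 0.30, 0.35, 0.40, 0.50]
quality = cluster_quality(example_peds, species_threshold=0.4, n_viruses=4)
print(round(quality, 3))  # prints 0.833
```

In this sketch a value of 1 means that every pairwise distance within the cluster respects the threshold.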

Table 2. Violations of a distance threshold in the GENETIC classification.

| accession | virus (a) | threshold | violations (b) | cost (c) |
|---|---|---|---|---|
| FJ445152 | Human rhinovirus 71, ATCC VR-1181 | species | 33 | .902 |
| FJ445136 | Human rhinovirus 51, ATCC VR-1161 | species | 17 | .770 |
| GQ415052 | Human rhinovirus A, hrv-A101-v1 | species | 16 | .707 |
| FJ445147 | Human rhinovirus 65, ATCC VR-1175 | species | 14 | .577 |
| FJ445156 | Human rhinovirus 80, ATCC VR-1190 | species | 14 | .431 |
| GQ415051 | Human rhinovirus A, hrv-A101 | species | 13 | .434 |
| FJ445120 | Human rhinovirus 20, ATCC VR-1130 | species | 13 | .393 |
| DQ473507 | Human rhinovirus 53 | species | 11 | .285 |
| FJ445150 | Human rhinovirus 68, ATCC VR-1178 | species | 11 | .187 |
| DQ473508 | Human rhinovirus 28 | species | 10 | .255 |
| DQ473506 | Human rhinovirus 46 | species | 6 | .154 |
| FJ445183 | Human rhinovirus 78, ATCC VR-1188 | species | 6 | .149 |
| EF173418 | Human rhinovirus 78 | species | 6 | .130 |
| DQ473497 | Human rhinovirus 23 | species | 1 | .003 |
| NC_009996 | Human rhinovirus C | species | 2 | .100 |
| EF077280 | Human rhinovirus NAT045 | species | 1 | .049 |
| NC_004421 | Bovine kobuvirus | species | 1 | .011 |
| AF119795 | Enterovirus 71, TW/2272/98 | genus | 21 | .157 |
| NC_006553 | Avian sapelovirus | super-genus | 7 | .195 |

(a) Definition field value in the GenBank annotation; viruses of the same taxon are separated from others by an empty row; only the minimal subset of violating viruses sufficient to explain all violating PEDs is listed.
(b) Number of PEDs exceeding the respective distance threshold.
(c) Cumulative value of the disagreement of a virus with the respective distance threshold; calculated as the virus-specific clustering cost (see (43)) using the threshold as a unit.
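
The violations column (footnote b) counts PEDs that exceed a distance threshold. A minimal sketch of such a per-virus count is given below; the function name `count_violations`, the virus labels, the distances and the threshold are hypothetical. The sketch does not select the minimal subset of viruses listed in the table and does not compute the clustering cost, which is defined in (43).

```python
from collections import Counter


def count_violations(pairwise_peds, threshold):
    """Count, per virus, the pairwise PEDs within a taxon that exceed the threshold.

    pairwise_peds maps a (virus_a, virus_b) pair to its PED value.
    """
    counts = Counter()
    for (virus_a, virus_b), ped in pairwise_peds.items():
        if ped > threshold:
            counts[virus_a] += 1
            counts[virus_b] += 1
    return counts


# Hypothetical intra-species distances, for illustration only.
example_pairs = {
    ("v1", "v2"): 0.50,
    ("v1", "v3"): 0.60,
    ("v2", "v3"): 0.20,
}
violations = count_violations(example_pairs, threshold=0.4)
print(violations.most_common())
```

Here the hypothetical virus v1 takes part in both violating distances, so listing v1 alone would be sufficient to explain all violations in this small example.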

[Figure 1, Lauber & Gorbalenya. Genome map drawn 5' to 3' against a nucleotide scale from 1 to 8000 nt, showing the proteins L, 1A, 1B, 1C, 1D, 2A, 2B, 2C, 3A, 3B, 3C and 3D, grouped into structural genes and non-structural genes.]

[Figure 2, Lauber & Gorbalenya. Maximum likelihood tree with tips labeled by the virus abbreviations given in the legend of Fig. 2 and branch points labeled with bootstrap support values (out of 1000). Genus labels: Aphthovirus, Erbovirus, Cosavirus, Teschovirus, Aquamavirus, Avihepatovirus, Parechovirus, Hepatovirus, Tremovirus, Kobuvirus, Salivirus, Sapelovirus, Senecavirus, Cardiovirus, Enterovirus. Scale bar: 0.5.]

[Figure 3, Lauber & Gorbalenya. Panel A: box-and-whisker plots of PED (axis from 0 to 1.2) for each species, genus and super-genus, with the sampling size (n) of each species given in brackets; plot categories are intra-species; inter-species, intra-genus; and inter-species, inter-genus, intra-super-genus. Symbols distinguish single host species from multiple host species; an asterisk marks taxa not recognized by ICTV. Panel B: maximum PED (axis from 0 to 0.4) plotted against virus sampling size (axis from 0 to 250).]

[Figure 4, Lauber & Gorbalenya. ML tree of rhinoviruses with the clusters HRV-A, HRV-Aβ, HRV-B, HRV-Cα, HRV-Cβ and HRV-Cγ indicated and SH-like support values shown at basal branches. Scale bar: 0.1.]

[Figure 5, Lauber & Gorbalenya. Two circular taxonomy diagrams, panel A (ICTV) and panel B (DEmARC), each with an arrow indicating inter-virus divergence and with species labeled by the genus.species numbers defined in the common legend below.]

Common legend of Figure 5 (genus: species numbers and abbreviations; asterisks mark taxa not recognized by ICTV):

Hepatovirus: 1.1 - HAV
Tremovirus: 2.1 - AvEMV
Teschovirus: 3.1 - PTeV
Kobuvirus: 4.1 - AiV, 4.2 - BKoV
Salivirus*: 5.1 - SaliV-A*
Aphthovirus: 6.1 - FMDV, 6.2 - BRBV, 7.1 - ERAV
Parechovirus: 8.1 - HPeV, 8.2 - LjV
Cardiovirus: 9.1 - TheiloV, 9.2 - EMCV
Senecavirus: 10.1 - SVV
Enterovirus: 11.1 - HEV-C, 11.2 - HEV-A, 11.3 - HEV-B, 11.4 - BEV, 11.5 - SiEV-B*, 11.6 - PEV-B, 11.7 - HEV-D, 11.8 - HRV-A, 11.9 - HRV-B, 11.10 - HRV-Cα*, 11.11 - HRV-Cγ*, 11.12 - HRV-Cβ*, 11.13 - HRV-Aβ*, 11.14 - SiEV-A
Sapelovirus: 12.1 - SiSV, 12.2 - PSV, 12.3 - AvSV
Avihepatovirus: 13.1 - DuHV
Cosavirus*: 14.1 - CosV-A*, 14.2 - CosV-B*, 14.3 - CosV-D*, 14.4 - CosV-E*
Erbovirus: 15.1 - ERBV
Aquamavirus*: 16.1 - AqV-A*