domain architecture aware inference of orthologs … · domain-centric phyloinformatics pipeline....

23
1 PROTOCOL FOR THE ANALYSIS AND CLASSIFICATION OF VIRAL PROTEINS USING DOMAIN-ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS (DAIO) July 16, 2019 Publication describing approach: Zmasek CM, Knipe DM, Pellett PE, Scheuermann RH (2019) “Classification of human Herpesviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO)” Virology, 529, 29-42 [https://doi.org/10.1016/j.virol.2019.01.005] 1. Background We used a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO, see: https://sites.google.com/site/cmzmasek/home/software/forester/daio) for the analysis of genomes by combining phylogenetic and protein domain-architecture information. We performed a systematic phylogenetic and protein domain architecture-based study, encompassing the entire proteomes of Herpesviridae, Poxviridae, and Coronaviridae to define Strict Ortholog Groups (SOGs). Besides assessing the taxonomic distribution for each protein, we computationally inferred gene duplications and performed a protein domain architecture analysis for every protein family. These results allowed us to develop a novel classification system for viral proteins which clusters proteins into SOGs and allows to quickly infer the taxonomic distribution for any protein. Phylogenomics. Orthologs have been defined by Fitch in 1970 as homologous genes in different species that diverged from a common ancestral gene by speciation. Genes that, either in the same or different species, diverged by a gene duplication have been termed paralogs (Fitch, 2000, 1970). While the terms ortholog and paralog have no functional implications (Jensen, 2001), orthologs are oftentimes considered potentially more functionally similar than paralogs at the same level of sequence divergence. This has been termed the “ortholog conjecture” and remains a topic of active research (Altenhoff et al., 2012; Chen and Zhang, 2012; Nehrt et al., 2011; Rogozin et al., 2014), due to its importance for computational sequence functional analysis (Eisen, 1998; Zmasek and Eddy, 2002) and the essential significance of gene duplications for biological evolution by supplying the raw genetic precursor material [(Zhang, 2003) and references therein]. Orthologs (or groups/clusters of orthologs) are oftentimes inferred by indirect methods based on (reciprocal) pairwise highest similarities [e.g. (Remm et al., 2001; Tatusov et al., 1997)]. In this approach, we used explicit phylogenetic inference followed by comparison to a trusted species tree for orthology inference, as this approach is likely to yield more accurate results, albeit at a cost of significant time complexity (Zmasek and Eddy, 2002, 2001). Protein domains and domain architectures. Many eukaryotic proteins, and by extension – eukaryotic viruses, are composed of multiple domains, units with their own evolutionary history and, often, specific and conserved functions. The ordered

Upload: others

Post on 27-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

1

PROTOCOL FOR THE ANALYSIS AND CLASSIFICATION OF VIRAL PROTEINS USING

DOMAIN-ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS (DAIO)

July 16, 2019 Publication describing approach: Zmasek CM, Knipe DM, Pellett PE, Scheuermann RH (2019) “Classification of human Herpesviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO)” Virology, 529, 29-42 [https://doi.org/10.1016/j.virol.2019.01.005] 1. Background We used a computational approach called Domain-architecture Aware Inference of Orthologs (DAIO, see: https://sites.google.com/site/cmzmasek/home/software/forester/daio) for the analysis of genomes by combining phylogenetic and protein domain-architecture information. We performed a systematic phylogenetic and protein domain architecture-based study, encompassing the entire proteomes of Herpesviridae, Poxviridae, and Coronaviridae to define Strict Ortholog Groups (SOGs). Besides assessing the taxonomic distribution for each protein, we computationally inferred gene duplications and performed a protein domain architecture analysis for every protein family. These results allowed us to develop a novel classification system for viral proteins which clusters proteins into SOGs and allows to quickly infer the taxonomic distribution for any protein. Phylogenomics. Orthologs have been defined by Fitch in 1970 as homologous genes in different species that diverged from a common ancestral gene by speciation. Genes that, either in the same or different species, diverged by a gene duplication have been termed paralogs (Fitch, 2000, 1970). While the terms ortholog and paralog have no functional implications (Jensen, 2001), orthologs are oftentimes considered potentially more functionally similar than paralogs at the same level of sequence divergence. This has been termed the “ortholog conjecture” and remains a topic of active research (Altenhoff et al., 2012; Chen and Zhang, 2012; Nehrt et al., 2011; Rogozin et al., 2014), due to its importance for computational sequence functional analysis (Eisen, 1998; Zmasek and Eddy, 2002) and the essential significance of gene duplications for biological evolution by supplying the raw genetic precursor material [(Zhang, 2003) and references therein]. Orthologs (or groups/clusters of orthologs) are oftentimes inferred by indirect methods based on (reciprocal) pairwise highest similarities [e.g. (Remm et al., 2001; Tatusov et al., 1997)]. In this approach, we used explicit phylogenetic inference followed by comparison to a trusted species tree for orthology inference, as this approach is likely to yield more accurate results, albeit at a cost of significant time complexity (Zmasek and Eddy, 2002, 2001). Protein domains and domain architectures. Many eukaryotic proteins, and by extension – eukaryotic viruses, are composed of multiple domains, units with their own evolutionary history and, often, specific and conserved functions. The ordered

Page 2: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

2

arrangement of all domains in a given protein constitutes its architecture. Many domains can combine with different partner domains and, as a result, form a wide variety of domain combinations, often even within the same species (Moore et al., 2008). Bringing together multiple domains in one protein creates a distinct entity, combining functions of its constituents. The emergence of proteins with novel domain combinations is considered to be a major mechanism of evolution of new functionality in eukaryotic genomes (Itoh et al., 2007; Peisajovich et al., 2010). It is especially important in the evolution of pathways, where physical proximity of domains in multidomain proteins links different elements of the pathway; thus, emergence of a new domain combination may rearrange pathways or processes in the cell (Peisajovich et al., 2010). The modular structure of eukaryotic proteins provides a mechanism that promotes differentiation and variation of protein functions despite the existence of only a limited number of domains. Proteins can gain (or lose) new domains in genome rearrangements, creating (or removing) domain combinations (Patthy, 2003; Ye and Godzik, 2004). We use a systematic description and classification of the entire set of human Herpesviridae, Poxviridae, and Coronaviridae proteins using Domain-architecture Aware Inference of Orthologs (DAIO). Besides classifying proteins based on their evolutionary history, we equally consider their domain architectures for classification. In particular, our goal is to classify proteins into groups, which we call “Strict Ortholog Groups” (SOGs), in which all proteins exhibit the same domain architecture and are orthologous to each other (related by speciation events). Furthermore, we attempt to provide an informative name for each of these groups. A name that not only includes information about the protein’s function (if known), but also has a suffix that indicates the taxonomic distribution of a protein.

Page 3: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

3

2. Taxonomy Suffixes 2.1. Herpesviridae: Taxonomic Suffixes _ALPHA Alphaherpesvirinae (subfamily)

_BETA Betaherpesvirinae (subfamily)

_CMV Cytomegalovirus (genus)

_GAMMA Gammaherpesvirinae (subfamily)

_H6 Betaherpesvirus 6A and betaherpesvirus 6B (species)

_HERPESVIRIDAE Herpesviridae (family)

_LYMPHOCRYPTOVIRUS Lymphocryptovirus (genus)

_MACAVIRUS Macavirus (genus)

_MARDIVIRUS Mardivirus (genus)

_PERCAVIRUS Percavirus (genus)

_PROBOSCIVIRUS Proboscivirus (genus)

_RHADINOVIRUS Rhadinovirus (genus)

_ROSEOLOVIRUS Roseolovirus (genus)

_SIMPLEXVIRUS Simplexvirus (genus)

_VARICELLOVIRUS Varicellovirus (genus)

Uppercase suffixes mean that corresponding SOG members are found in each species of

a taxonomic unit, lowercase suffixes mean that corresponding SOG members are found

in some species of a taxonomic unit.

Numbers in square brackets are used to differentiate Herpesviridae SOGs with the same

(automatically inferred) name but different domain architectures.

Page 4: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

4

Herpesviridae taxonomic tree used in this work.

Page 5: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

5

2.2. Poxviridae: Taxonomic Suffixes _ALPHAENTOMOPOXVIRUS Alphaentomopoxvirus (genus)

_AVIPOXVIRUS Avipoxvirus (genus)

_BETAENTOMOPOXVIRUS Betaentomopoxvirus (genus)

_CAPRIPOXVIRUS Capripoxvirus (genus)

_CENTAPOXVIRUS Centapoxvirus (genus)

_CERVIDPOXVIRUS Cervidpoxvirus (genus)

_CHORDOPOXVIRINAE Chordopoxvirinae (subfamily)

_CROCODYLIDPOXVIRUS Crocodylidpoxvirus (genus)

_ENTOMOPOXVIRINAE Entomopoxvirinae (subfamily)

_LEPORIPOXVIRUS Leporipoxvirus (genus)

_MOLLUSCIPOXVIRUS Molluscipoxvirus (genus)

_ORTHOPOXVIRUS Orthopoxvirus (genus)

_PARAPOXVIRUS Parapoxvirus (genus)

_POXVIRIDAE Poxviridae (family)

_SUIPOXVIRUS Suipoxvirus (genus)

_YATAPOXVIRUS Yatapoxvirus (genus)

Uppercase suffixes mean that corresponding SOG members are found in each species of

a taxonomic unit, lowercase suffixes mean that corresponding SOG members are found

in some species of a taxonomic unit.

Numbers in square brackets are used to differentiate Poxviridae SOGs with the same

(automatically inferred) name but different domain.

Page 6: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

6

Poxviridae taxonomic tree used in this work.

Page 7: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

7

2.3. Coronaviridae: Taxonomic Suffixes _ALPHACORONAVIRUS Alphacoronavirus (genus)

_BAFINIVIRUS Bafinivirus (genus)

_BETACORONAVIRUS Betacoronavirus (genus)

_CORONAVIRIDAE Coronaviridae (family)

_CORONAVIRINAE Coronavirinae (subfamily)

_DELTACORONAVIRUS Deltacoronavirus (genus)

_GAMMACORONAVIRUS Gammacoronavirus (genus)

_TOROVIRINAE Torovirinae (subfamily)

_TOROVIRUS Torovirus (genus)

Uppercase suffixes mean that corresponding SOG members are found in each species of

a taxonomic unit, lowercase suffixes mean that corresponding SOG members are found

in some species of a taxonomic unit.

Numbers in square brackets are used to differentiate Coronaviridae SOGs with the

same (automatically inferred) name but different domain architectures.

Page 8: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

8

Coronaviridae taxonomic tree used in this work.

Page 9: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

9

3. Method Description Sequence retrieval. Individual protein sequences were downloaded from the ViPR database (Pickett et al., 2012), while entire proteomes were downloaded from UniProtKB (Bateman et al., 2017). Multiple sequence alignments. Multiple sequence alignments were calculated using MAFFT version 7.313 (with “localpair” and “maxiterate 1000” options) (Katoh and Standley, 2013; Kuraku et al., 2013). Prior to phylogenetic inference, multiple sequence alignment columns with more than 50% gaps were deleted; for comparison we also performed the analyses based on alignments for which we only deleted columns with more than 90% gaps. Protein domain analysis. Protein domains were analyzed using HMMER v3.1b2 (Eddy, 2011) and the Pfam 31.0 database (Finn et al., 2016). Phylogenetic analyses. Phylogenetic trees were calculated for individual domain architectures (not full-length sequences) except for US22 domain proteins, because US22 domain alignments lack phylogeneticly sufficient signal. Distance-based minimal evolution trees were inferred by FastME 2.0 (Desper and Gascuel, 2002) (with balanced tree swapping and “GME” initial tree options) based on pairwise distances calculated by TREE-PUZZLE 5.2 (Schmidt et al., 2002) (using the WAG substitution model (Whelan and Goldman, 2001), a uniform model of rate heterogeneity, estimation of amino acid frequencies from the dataset, and approximate parameter estimation using a Neighbor-Joining tree). For maximum likelihood approaches, we employed RAxML version 8.2.9 (Stamatakis et al., 2005) (using 100 bootstrapped data sets and the WAG substitution model). Gene duplication inferences were performed using the SDI and RIO methods (Zmasek and Eddy, 2002, 2001). Automated genome wide domain composition analysis was performed using a specialized software tool, Surfacing version 2.002 [Zmasek CM (2012), a tool for the functional analysis of domainome/genome evolution [available at https://sites.google.com/site/cmzmasek/home/software/forester/surfacing]. Phylogenomic analyses and development of novel naming schema using strict ortholog groups. The processes for defining and naming strict ortholog groups were formalized into a set of “rules” and then implemented into a semi-automatic domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture (DA) (Zmasek and Godzik, 2012, 2011). In Herpesviridae, most proteins have DAs consisting of only a single domain. For example, the UDG domain of Uracil DNA glycosylase is a single domain DA, whereas the combination of N-terminal DNA_pol_B_exo1 and C-terminal DNA_pol_B (denoted as DNA_pol_B_exo1––DNA_pol_B) of DNA polymerases is a DA with two domains. In this analysis we consider a given DA “present” in a given Herpesviridae species S if the DA is present under a set of thresholds in at least one strain of the species S. The

Page 10: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

10

rationale for this is that it is possible to miss a DA in a genome, due to incomplete or erroneous sequences, erroneous assembly and gene-predication (false negatives), and even recent, actual, gene loss. The opposite (false positive), on the other hand, is far less likely. For this work, we used two thresholds: a minimal domain length of 40% of the length set forth in the Pfam database (domain fragments are unlikely to be functionally equivalent to full length domains) and a BLAST E-value cutoff of E=10-6. For every domain architecture, a set of bootstrap resampled phylogenetic trees (gene trees) was calculated by RAxML (Stamatakis et al., 2005) using protein sequences from one representative for each of the nine human Herpesviridae species. For comparison and validation, we also calculated phylogenetic trees that included non-human hosted Herpesviridae. For illustrations, gene duplications were inferred by comparing the consensus gene trees to the species tree (Fig. 1) for Herpesviridae using the SDI (Speciation Duplication Inference) algorithm (Zmasek and Eddy, 2001). To obtain confidence values on orthology assignments (bootstrap support values), we employed the RIO approach (Resampled Inference of Orthologs) to compare sets of bootstrap resampled phylogenetic trees with the species tree for Herpesviridae (Zmasek and Eddy, 2002). In this work, we define a strict ortholog group (SOG) as sequences related by speciation events and exhibiting the same domain architecture (based on Pfam domains from Pfam 31.0, a length threshold of 40%, and E-value cutoff of E=10-6). Based on this approach for defining SOGs, we developed the following naming syntax. For protein families such as Uracil DNA glycosylase, which exhibit the same DA in all nine human Herpesviridae, and which are related by speciation events only, we use the “Recommended name” (under “Protein names”) from the UniProtKB database (Bateman et al., 2017) as the base name and add a case-sensitive suffix which indicates the taxonomic distribution - “ABG” in this case, since Uracil DNA glycosylase appears in each human Alpha-, Beta-, and Gammaherpesvirinae species. Therefore, the full name is “ABG/Uracil DNA glycosylase”. To indicate presence in some, but not all members of a subfamily, we use lower-case suffixes. “Ab/Replication origin-binding protein” implies that members of this SOG are present in all human Alphaherpesvirinae species (“A”), and in some (but not all) Betaherpesvirinae (“b”). While most of the human Herpesviridae protein families fall into these basic cases, families which have a (some) domain(s) in common but differ in their DA, are more difficult to rationally name. An example of such a family is Glycoprotein B described above. Because members of this family have different DAs, namely “Glycoprotein_B” and “HCMVantigenic_N—Glycoprotein_B”, it is composed of two SOGs (named “ABG.AbG/Glycoprotein B” and “ABG.b/Glycoprotein B”). In such cases, we split the suffix into two parts, separated by a period. The first part (“ABG”) indicates overall presence of common domain(s) for all members of this SOG, Glycoprotein_B in this case. The second part (after the period) relates to entire DAs. “.AbG” of “ABG.AbG/Glycoprotein B” means that the Glycoprotein_B DA is present in all human Alpha- and Gamma-, and some Betaherpesvirinae. “.b” of “ABG.b/Glycoprotein B” implies that the “HCMVantigenic_N—Glycoprotein_B” DA is present in some Betaherpesvirinae.

Page 11: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

11

References Altenhoff, A.M., Studer, R.A., Robinson-Rechavi, M., Dessimoz, C., 2012. Resolving the

ortholog conjecture: Orthologs tend to be weakly, but significantly, more similar in function than paralogs. PLoS Comput. Biol. 8, e1002514. https://doi.org/10.1371/journal.pcbi.1002514

Bateman, A., Martin, M.J., O’Donovan, C., Magrane, M., Alpi, E., Antunes, R., Bely, B., Bingley, M., Bonilla, C., Britto, R., Bursteinas, B., Bye-AJee, H., Cowley, A., Da Silva, A., De Giorgi, M., Dogan, T., Fazzini, F., Castro, L.G., Figueira, L., Garmiri, P., Georghiou, G., Gonzalez, D., Hatton-Ellis, E., Li, W., Liu, W., Lopez, R., Luo, J., Lussi, Y., MacDougall, A., Nightingale, A., Palka, B., Pichler, K., Poggioli, D., Pundir, S., Pureza, L., Qi, G., Rosanoff, S., Saidi, R., Sawford, T., Shypitsyna, A., Speretta, E., Turner, E., Tyagi, N., Volynkin, V., Wardell, T., Warner, K., Watkins, X., Zaru, R., Zellner, H., Xenarios, I., Bougueleret, L., Bridge, A., Poux, S., Redaschi, N., Aimo, L., ArgoudPuy, G., Auchincloss, A., Axelsen, K., Bansal, P., Baratin, D., Blatter, M.C., Boeckmann, B., Bolleman, J., Boutet, E., Breuza, L., Casal-Casas, C., De Castro, E., Coudert, E., Cuche, B., Doche, M., Dornevil, D., Duvaud, S., Estreicher, A., Famiglietti, L., Feuermann, M., Gasteiger, E., Gehant, S., Gerritsen, V., Gos, A., Gruaz-Gumowski, N., Hinz, U., Hulo, C., Jungo, F., Keller, G., Lara, V., Lemercier, P., Lieberherr, D., Lombardot, T., Martin, X., Masson, P., Morgat, A., Neto, T., Nouspikel, N., Paesano, S., Pedruzzi, I., Pilbout, S., Pozzato, M., Pruess, M., Rivoire, C., Roechert, B., Schneider, M., Sigrist, C., Sonesson, K., Staehli, S., Stutz, A., Sundaram, S., Tognolli, M., Verbregue, L., Veuthey, A.L., Wu, C.H., Arighi, C.N., Arminski, L., Chen, C., Chen, Y., Garavelli, J.S., Huang, H., Laiho, K., McGarvey, P., Natale, D.A., Ross, K., Vinayaka, C.R., Wang, Q., Wang, Y., Yeh, L.S., Zhang, J., 2017. UniProt: The universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169. https://doi.org/10.1093/nar/gkw1099

Chen, X., Zhang, J., 2012. The Ortholog Conjecture Is Untestable by the Current Gene Ontology but Is Supported by RNA Sequencing Data. PLoS Comput. Biol. 8, e1002784. https://doi.org/10.1371/journal.pcbi.1002784

Desper, R., Gascuel, O., 2002. Fast and Accurate Phylogeny Minimum-Evolution Principle. J. Comput. Biol. 9, 687–705.

Eddy, S.R., 2011. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195. https://doi.org/10.1371/journal.pcbi.1002195

Eisen, J. a., 1998. Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res. 8, 163–7. https://doi.org/10.1101/gr.8.3.163

Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., Salazar, G.A., Tate, J., Bateman,

Page 12: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

12

A., 2016. The Pfam protein families database: Towards a more sustainable future. Nucleic Acids Res. 44, D279–D285. https://doi.org/10.1093/nar/gkv1344

Fitch, W.M., 2000. Homology. Trends Genet. 16, 227–231. Fitch, W.M., 1970. Distinguishing Homologous from Analogous Proteins. Syst. Zool. 19,

99–113. https://doi.org/10.2307/2412448 Itoh, M., Nacher, J.C., Kuma, K., Goto, S., Kanehisa, M., 2007. Evolutionary history and

functional implications of protein domains and their combinations in eukaryotes. Genome Biol. 8, R121. https://doi.org/10.1186/gb-2007-8-6-r121

Jensen, R.A., 2001. Orthologs and paralogs - we need to get it right. Genome Biol. 2, INTERACTIONS1002.

Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–80. https://doi.org/10.1093/molbev/mst010

Kuraku, S., Zmasek, C.M., Nishimura, O., Katoh, K., 2013. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity. Nucleic Acids Res. 41, W22–W28. https://doi.org/10.1093/nar/gkt389

Moore, A.D., Björklund, Å.K., Ekman, D., Bornberg-Bauer, E., Elofsson, A., 2008. Arrangements in the modular evolution of proteins. Trends Biochem. Sci. 33, 444–451. https://doi.org/10.1016/j.tibs.2008.05.008

Nehrt, N.L., Clark, W.T., Radivojac, P., Hahn, M.W., 2011. Testing the ortholog conjecture with comparative functional genomic data from mammals. PLoS Comput. Biol. 7. https://doi.org/10.1371/journal.pcbi.1002073

Patthy, L., 2003. Modular assembly of genes and the evolution of new functions. Genetica 118, 217–31.

Peisajovich, S.G., Garbarino, J.E., Wei, P., Lim, W. a, 2010. Rapid diversification of cell signaling phenotypes by modular domain recombination. Science (80-. ). 328, 368–72. https://doi.org/10.1126/science.1182376

Pickett, B.E., Greer, D.S., Zhang, Y., Stewart, L., Zhou, L., Sun, G., Gu, Z., Kumar, S., Zaremba, S., Larsen, C.N., Jen, W., Klem, E.B., Scheuermann, R.H., 2012. Virus pathogen Database and Analysis Resource (ViPR): A comprehensive bioinformatics Database and Analysis Resource for the Coronavirus research community. Viruses 4, 3209–3226. https://doi.org/10.3390/v4113209

Remm, M., Storm, C.E.V., Sonnhammer, E.L.L., 2001. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314, 1041–1052. https://doi.org/10.1006/jmbi.2000.5197

Rogozin, I.B., Managadze, D., Shabalina, S.A., Koonin, E. V., 2014. Gene family level comparative analysis of gene expression n mammals validates the ortholog conjecture. Genome Biol. Evol. 6, 754–762. https://doi.org/10.1093/gbe/evu051

Schmidt, H. a, Strimmer, K., Vingron, M., von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502–4.

Stamatakis, a, Ludwig, T., Meier, H., 2005. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–63.

Page 13: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

13

https://doi.org/10.1093/bioinformatics/bti191 Tatusov, R.L., Koonin, E. V, Lipman, D.J., 1997. A genomic perspective on protein

families. Science (80-. ). 278, 631–637. https://doi.org/DOI 10.1126/science.278.5338.631

Whelan, S., Goldman, N., 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol. 18, 691–9.

Ye, Y., Godzik, A., 2004. Comparative analysis of protein domain organization. Genome Res. 14, 343–53. https://doi.org/10.1101/gr.1610504

Zhang, J., 2003. Evolution by gene duplication: An update. Trends Ecol. Evol. 18, 292–298. https://doi.org/10.1016/S0169-5347(03)00033-8

Zmasek, C.M., Eddy, S.R., 2002. RIO: analyzing proteomes by automated phylogenomics using resampled inference of orthologs. BMC Bioinformatics 3, 14.

Zmasek, C.M., Eddy, S.R., 2001. A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17, 821–828. https://doi.org/10.1093/bioinformatics/17.9.821

Zmasek, C.M., Godzik, A., 2012. This Déjà Vu Feeling—Analysis of Multidomain Protein Evolution in Eukaryotic Genomes. PLoS Comput. Biol. 8, e1002701. https://doi.org/10.1371/journal.pcbi.1002701

Zmasek, C.M., Godzik, A., 2011. Strong functional patterns in the evolution of eukaryotic genomes revealed by the reconstruction of ancestral protein domain repertoires. Genome Biol. 12, R4. https://doi.org/10.1186/gb-2011-12-1-r4

Page 14: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

14

Appendix: Taxonomic distribution of SOGs The columns are: 1. Taxonomy (see above) 2. Taxonomic coverage 3. Domain Architecture (Pfam domains) 4. SOG name base Herpesviridae ALPHA [28/28] Fusion_gly_K Envelope glycoprotein K ALPHA [28/28] Herpes_UL14 Tegument protein UL14 [2] ALPHA [28/28] Herpes_UL20 Envelope protein UL20 ALPHA [28/28] Herpes_UL4 Nuclear protein UL4 ALPHA [28/28] Herpes_UL46 Tegument protein VP11/12 ALPHA [28/28] Herpes_UL49_2 Tegument protein VP22 ALPHA [28/28] Herpes_gE Envelope glycoprotein E ALPHA [28/28] Marek_A Envelope glycoprotein C BETA [17/17] Cytomega_gL Envelope glycoprotein L BETA [17/17] DUF587--Herpes_UL87 Protein UL87 BETA [17/17] Herpes_U59 Tegument protein UL88 BETA [17/17] UL97 Tegument serine/threonine protein kinase CMV [6/6] Bax1-I Membrane protein US20 CMV [6/6] Cytomega_UL84 Protein UL84 CMV [6/6] Gp_UL130 Envelope glycoprotein UL130 CMV [6/6] Herpes_IE1 Regulatory protein IE1 CMV [6/6] UL141 Membrane protein UL14 GAMMA [16/16] DUF832 ORF 48 protein GAMMA [16/16] Herpes_BLRF2 ORF 52 protein GAMMA [16/16] Herpes_BMRF2 ORF 58 protein GAMMA [16/16] Herpes_BTRF1 ORF 23 protein GAMMA [16/16] Herpes_DNAp_acc BMRF1 GAMMA [16/16] Herpes_ORF11 ORF 10 protein GAMMA [16/16] Phage_glycop_gL Envelope glycoprotein L [4] GAMMA [16/16] Tegument_dsDNA--AIRS_C--GATase_5 BNRF1 H6 [3/3] Rep_N--Parvo_NS1 Protein U94 HERPESVIRIDAE [61/61] Herpes_Helicase DNA replication helicase HERPESVIRIDAE [61/61] Herpes_MCP Major capsid protein HERPESVIRIDAE [61/61] Herpes_UL16 Cytoplasmic envelopment protein 2 HERPESVIRIDAE [61/61] Herpes_UL17 Capsid vertex component 1 HERPESVIRIDAE [61/61] Herpes_UL24 Nuclear protein UL24 HERPESVIRIDAE [61/61] Herpes_UL25 Capsid vertex component 2 HERPESVIRIDAE [61/61] Herpes_UL52 DNA primase HERPESVIRIDAE [61/61] Herpes_UL6 Portal protein HERPESVIRIDAE [61/61] Herpes_V23 Triplex capsid protein 2 HERPESVIRIDAE [61/61] Herpes_alk_exo Alkaline nuclease HERPESVIRIDAE [61/61] Herpes_glycop Envelope glycoprotein M HERPESVIRIDAE [61/61] Herpes_glycop_H Envelope glycoprotein H HERPESVIRIDAE [61/61] PRTP Tripartite terminase subunit 1 HERPESVIRIDAE [61/61] Peptidase_S21 Capsid scaffolding protein HERPESVIRIDAE [61/61] UDG Uracil-DNA glycosylase HERPESVIRIDAE [61/61] Viral_DNA_bp Major DNA-binding protein LYMPHOCRYPTOVIRUS [4/4] EBV-NA1 EBNA-1 MACAVIRUS [4/4] AIRS_C--GATase_5 ORF 3 protein MARDIVIRUS [5/5] Marek_SORF3 SORF3 MUROMEGALOVIRUS [4/4] M157 M151 PERCAVIRUS [2/2] CARD Apoptosis regulator E10 ROSEOLOVIRUS [4/4] HHV6-IE U90 ROSEOLOVIRUS [4/4] Herpes_U15 U15

Page 15: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

15

ROSEOLOVIRUS [4/4] Herpes_U26 Protein U26 ROSEOLOVIRUS [4/4] Herpes_U47 U47 protein ROSEOLOVIRUS [4/4] Herpes_U55 U55 protein alpha [11/28] DNA_pol_B_exo1--DNA_pol_B--DNAPolymera_Pol DNA polymerase [2] alpha [11/28] Herpes_UL42 DNA polymerase processivity subunit [2] alpha [12/28] Herpes_UL1 Envelope glycoprotein L [2] alpha [13/28] DUF1314 Myristylated tegument protein CIRC alpha [13/28] Herpes_UL1--GlyL_C Envelope glycoprotein L [3] alpha [15/28] Herpes_UL42--Herpes_UL42 DNA polymerase processivity subunit [3] alpha [18/28] Herpes_UL49_5 Envelope glycoprotein N alpha [19/28] Alpha_TIF Transactivating tegument protein VP16 alpha [19/28] Herpes_US9 Membrane protein US9 alpha [21/28] Gene66 Virion protein US10 alpha [21/28] Herpes_UL43 Envelope protein UL43 alpha [22/28] UL45 Membrane protein UL45 alpha [23/28] Herpes_UL55 Nuclear protein UL55 alpha [23/28] US2 Virion protein US2 alpha [24/28] UL11 Cytoplasmic envelopment protein 3 [2] alpha [25/28] Herpes_ICP4_N--Herpes_ICP4_C Transcriptional regulator ICP4 [2] alpha [25/28] Herpes_glycop_D Envelope glycoprotein D alpha [26/28] Herpes_IE68 Regulatory protein ICP22 alpha [26/28] Herpes_UL35 Small capsomere-interacting protein [2] alpha [27/28] Herpes_UL21 Tegument protein UL21 alpha [27/28] Herpes_UL3 Nuclear protein UL3 alpha [27/28] Herpes_UL37_1 Inner tegument protein [2] alpha [27/28] Herpes_UL47 Tegument protein VP13/14 alpha [27/28] Herpes_UL51 Tegument protein UL51 [2] alpha [27/28] Herpes_gI Envelope glycoprotein I alpha [27/28] Herpes_teg_N--Herpes_UL36 Large tegument protein deneddylase [2] alpha [3/28] Herpes_ICP4_N Immediate-early transactivator protein alpha [5/28] Herpes_gp2 Envelope glycoprotein alpha [7/28] XPG_I UL41 alpha [8/28] zf-RING_2 Ubiquitin E3 ligase ICP0 [2] beta [10/17] Herpes_UL74 Envelope glycoprotein O beta [12/17] DUF2664 Tegument protein UL14 beta [12/17] HV_small_capsid Small capsomere-interacting protein beta [13/17] Herpes_U5 Protein UL27 beta [14/17] DUF570 Protein UL31 beta [14/17] Herpes_IE2_3 Protein UL117 beta [14/17] Herpes_PAP DNA polymerase processivity subunit beta [14/17] Herpes_UL37_2 Envelope glycoprotein UL37 beta [14/17] Herpes_UL82_83 Tegument protein pp71 beta [14/17] Herpes_pp85 Tegument protein UL35 beta [14/17] U79_P34 Protein UL112 beta [14/17] US22--US22 Tegument protein UL24 beta [16/17] Ribonuc_red_lgC Ribonucleoside-diphosphate reductase large subunit-like protein [2] beta [3/17] ig--C2-set_2 Membrane protein EE22A beta [5/17] Herpes_UL32 Tegument protein pp150 beta [5/17] Herpes_env--Herpes_env Packaging protein UL32 [2] cmv [2/6] Adeno_E3_CR1 Membrane RL1 protein3 cmv [2/6] An_peroxidase Rh10 cmv [2/6] C1-set Membrane protein A13 cmv [2/6] Cytomega_TRL10 Envelope glycoprotein RL10 cmv [2/6] Cytomega_UL20A Glycoprotein UL22A cmv [2/6] HCMV_UL139 Membrane glycoprotein UL139 cmv [2/6] HHV-5_US34A Protein US34A cmv [2/6] MHC_I Membrane glycoprotein UL18 cmv [2/6] MHC_I--C1-set Membrane protein A18 cmv [2/6] TNFR_c6 Membrane glycoprotein UL144 cmv [2/6] UL2 Protein UL2 cmv [2/6] UL40 Membrane glycoprotein UL40 cmv [3/6] US22--US22--US22--US22 Protein UL29 [2] cmv [4/6] Cytomega_US3 Membrane glycoprotein US2 cmv [4/6] DUF2677 Membrane protein UL121 cmv [4/6] US22--US22--US22 Protein UL29 cmv [4/6] gpUL132 Envelope glycoprotein UL132

Page 16: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

16

cmv [5/6] CMV_US Membrane glycoprotein US11 gamma [12/16] DUF717 ORF 30 protein gamma [14/16] DUF848 ORF 35 protein gamma [14/16] Herpes_BBRF1 BRRF1 gamma [14/16] Herpes_TAF50 ORF 50 protein gamma [15/16] DUF2733 Cytoplasmic envelopment protein 3 gamma [15/16] Herpes_TK--Herpes_TK_C Thymidine kinase [2] gamma [15/16] Herpes_capsid Small capsomere-interacting protein [3] gamma [4/16] DED FLICE inhibitory protein gamma [6/16] BALF1 BALF1 gamma [8/16] Herpes_HEPA--Herpes_heli_pri Helicase-primase subunit gamma [8/16] Herpes_heli_pri ORF 41 protein herpesviridae [12/61] IL10 Interleukin-10 herpesviridae [15/61] DNA_pack_N Packaging protein herpesviridae [15/61] US22 Tegument protein UL26 herpesviridae [16/61] DNA_pack_C DNA packaging terminase subunit 1 herpesviridae [17/61] Herpes_UL87 ORF 24 protein herpesviridae [2/61] ig Membrane protein EE22 herpesviridae [22/61] 7tm_1 Envelope glycoprotein UL33 herpesviridae [29/61] Pkinase Serine/threonine protein kinase US3 herpesviridae [3/61] AAA_31 Plasmid-partitioning protein SopA herpesviridae [3/61] CAT Chloramphenicol acetyltransferase herpesviridae [3/61] ParBc Plasmid partition protein B herpesviridae [3/61] Phage_integrase Cre herpesviridae [3/61] Rep_3 Replication initiation protein RepE herpesviridae [3/61] ig--ig Membrane protein EE51 herpesviridae [30/61] Herpes_U30 Inner tegument protein herpesviridae [31/61] Herpes_UL79 Protein UL79 herpesviridae [32/61] Herpes_TK Thymidine kinase herpesviridae [32/61] Herpes_U44 Tegument protein UL51 herpesviridae [33/61] Herpes_UL49_1 Protein UL49 herpesviridae [33/61] Herpes_UL92 Protein UL92 herpesviridae [33/61] Herpes_UL95 Protein UL95 herpesviridae [33/61] Herpes_teg_N Large tegument protein deneddylase herpesviridae [34/61] Herpes_UL73 Envelope glycoprotein N [2] herpesviridae [35/61] Herpes_ori_bp DNA replication origin-binding helicase herpesviridae [4/61] Branch Beta-1,3-galactosyl-O-glycosyl-glycoprotein beta-1,6-N-acetylglucosaminyltransferase herpesviridae [4/61] Sema Membrane protein TE7 herpesviridae [43/61] Ribonuc_red_lgN--Ribonuc_red_lgC Ribonucleoside-diphosphate reductase large subunit herpesviridae [46/61] Ribonuc_red_sm Ribonucleoside-diphosphate reductase small subunit herpesviridae [5/61] IL8 Chemokine vCXCL6 herpesviridae [5/61] Lectin_C B168.5 herpesviridae [5/61] V-set OX-2 membrane glycoprotein homolog herpesviridae [50/61] DNA_pol_B_exo1--DNA_pol_B DNA polymerase herpesviridae [52/61] dUTPase Deoxyuridine 5'-triphosphate nucleotidohydrolase herpesviridae [54/61] DNA_pack_N--DNA_pack_C Tripartite terminase subunit 3 herpesviridae [55/61] Herpes_HEPA DNA helicase/primase complex-associated protein herpesviridae [56/61] Herpes_UL69 Multifunctional expression regulator [2] herpesviridae [57/61] Herpes_UL7 Cytoplasmic envelopment protein 1 herpesviridae [57/61] Herpes_env Packaging protein UL32 herpesviridae [58/61] Herpes_UL33 Tripartite terminase subunit 2 herpesviridae [6/61] Bcl-2 Apoptosis regulator BHRF1 herpesviridae [60/61] Glycoprotein_B Envelope glycoprotein B herpesviridae [60/61] Herpes_U34 Nuclear egress protein 2 herpesviridae [60/61] Herpes_UL31 Nuclear egress protein 1 herpesviridae [60/61] Herpes_VP19C Triplex capsid protein 1 herpesviridae [9/61] Thymidylat_synt Thymidylate synthase lymphocryptovirus [3/4] EBV-NA3 EBNA-3C lymphocryptovirus [3/4] Herpes_BLLF1 Envelope glycoprotein gp350 lymphocryptovirus [3/4] Herpes_LMP2 C7 macavirus [3/4] bZIP_2 A2 mardivirus [3/5] Herpes_pp38 Pp38 mardivirus [4/5] DUF1509 ORF 437 protein rhadinovirus [2/6] Cyclin_N--Herp-Cyclin V-cyclin rhadinovirus [2/6] Cyclin_N--K-cyclin_vir_C ORF 72 protein rhadinovirus [2/6] IRF--IRF-3 K9

Page 17: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

17

rhadinovirus [2/6] dUTPase--dUTPase Deoxyuridine 5'-triphosphate nucleotidohydrolase [2] rhadinovirus [3/6] DHFR_1 Dihydrofolate reductase rhadinovirus [3/6] RINGv E3 ubiquitin-protein ligase MIR1 rhadinovirus [3/6] Sushi--Sushi--Sushi Complement control protein homolog ccph roseolovirus [2/4] US22--Herpes_U5 U7 simplexvirus [3/9] HHV-1_VABD--Herpes_UL69 Multifunctional expression regulator simplexvirus [4/9] Alpha_GJ Envelope glycoprotein J simplexvirus [5/9] Herpes_US12 TAP transporter inhibitor ICP47 simplexvirus [6/9] Herpes_UL56 Membrane protein UL56 simplexvirus [8/9] Alpha_TIF--HSV_VP16_C Transactivating tegument protein VP16 [2] varicellovirus [6/12] zf-C3HC4 Transcriptional regulator varicellovirus [8/12] Herpes_IR6 IR6

Page 18: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

18

Coronaviridae CORONAVIRINAE [42/42] Corona_M Membrane protein GAMMACORONAVIRUS [2/2] IBV_3C Envelope protein IBV_3C TOROVIRUS [2/2] DUF2632 Membrane glycoprotein TOROVIRUS [2/2] Macro--Corona_NS2A Replicase [1] TOROVIRUS [2/2] Nucleocapsid-N Nucleocapsid phosphoprotein TOROVIRUS [2/2] Spike_torovirin Spike glycoprotein [5] alphacoronavirus [15/16] Corona_NS3b Non-structural protein 3 alphacoronavirus [2/16] CNV-Replicase_N CNV-Replicase_N protein alphacoronavirus [2/16] CNV-Replicase_N--Viral_protease--Macro--Viral_protease--Corona_NSP4_C--Peptidase_C30--nsp7--nsp8--nsp9--NSP10 Replicase 1a alphacoronavirus [2/16] Corona_7 Non-structural protein 7 alphacoronavirus [2/16] Macro--Viral_protease--Corona_NSP4_C--Peptidase_C30--nsp7--nsp8--nsp9--NSP10--Corona_RPol_N--AAA_30--AAA_12--NSP11--NSP13 Replicase [2] alphacoronavirus [2/16] Viral_protease--Macro--Viral_protease--Corona_NSP4_C--Peptidase_C30--nsp7--nsp8--nsp9--NSP10--Corona_RPol_N--AAA_30--AAA_12--NSP11--NSP13 ORF 1ab protein [4] protein alphacoronavirus [4/16] Corona_6B_7B Non-structural protein 7b alphacoronavirus [4/16] Viral_protease--Macro--Viral_protease--Corona_NSP4_C--Peptidase_C30--nsp7--nsp8--nsp9--NSP10 ORF 1a protein [6] protein alphacoronavirus [6/16] Viral_protease--Macro--Viral_protease ORF 1 protein betacoronavirus [2/14] Corona_NS1 4.9 kDa non-structural protein betacoronavirus [2/14] Macro--SUD-M--Viral_protease--NAR Macro SUD-M Viral_protease NAR protein betacoronavirus [3/14] DUF3477 Replicase polyprotein 1ab [3] betacoronavirus [3/14] Peptidase_C16--Macro--Viral_protease--NAR Replicase polyprotein 1ab [9] betacoronavirus [4/14] Corona_NS4 4.8 kDa non-structural protein betacoronavirus [5/14] Corona_I Internal protein betacoronavirus [5/14] Corona_NS2A Non-structural protein 2a betacoronavirus [6/14] Corona_NS2 Non-structural protein betacoronavirus [7/14] Spike_NTD--Spike_rec_bind--Corona_S2 Spike glycoprotein [3] betacoronavirus [8/14] Spike_rec_bind--Corona_S2 Spike glycoprotein [4] coronaviridae [8/46] Hema_esterase Hemagglutinin-esterase [2] coronavirinae [12/42] Corona_RPol_N Polyprotein [1] coronavirinae [14/42] AAA_30--AAA_12 Replicase polyprotein 1ab [15] coronavirinae [15/42] Corona_NSP4_C Replicase polyprotein 1ab [1] coronavirinae [16/42] NSP10 Replicase polyprotein 1ab [5] coronavirinae [16/42] NSP11 Replicase polyprotein 1ab [6] coronavirinae [16/42] NSP13 Replicase polyprotein 1ab [7] coronavirinae [16/42] Peptidase_C30 Replicase polyprotein 1ab [10] coronavirinae [16/42] nsp7 Replicase polyprotein 1ab [11] coronavirinae [16/42] nsp8 Replicase polyprotein 1ab [12] coronavirinae [16/42] nsp9 Replicase polyprotein 1ab [13] coronavirinae [2/42] Macro--Viral_protease--Corona_NSP4_C--Peptidase_C30--nsp7--nsp8--nsp9--NSP10 ORF 1a protein [3] protein coronavirinae [2/42] Viral_helicase1 ORF 1ab protein [8] protein coronavirinae [26/42] Corona_S1--Corona_S2 Spike glycoprotein [1] coronavirinae [29/42] NS3_envE Envelope protein NS3_envE coronavirinae [3/42] AAA_30 Replicase polyprotein 1ab [14] coronavirinae [3/42] Corona_RPol_N--AAA_30--AAA_12--NSP11--NSP13 Replicase 1b coronavirinae [39/42] Corona_nucleoca Nucleoprotein [1] coronavirinae [4/42] Corona_S2 Spike glycoprotein [2] coronavirinae [4/42] Corona_nucleoca--Corona_nucleoca Nucleoprotein [2] coronavirinae [4/42] Macro--Viral_protease Polyprotein [2] coronavirinae [5/42] Corona_RPol_N--RdRP_1 Replicase polyprotein 1ab [2]

Page 19: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

19

Poxviridae AVIPOXVIRUS [5/5] Ank_2--Ank_2--PRANC Ankyrin repeat protein [4] AVIPOXVIRUS [5/5] Ank_2--Ank_4--PRANC Ankyrin repeat protein [6] AVIPOXVIRUS [5/5] Bcl-2 Bcl-2 AVIPOXVIRUS [5/5] DFP SWPV2-ORF092 AVIPOXVIRUS [5/5] ELO GNS1/SUR4 AVIPOXVIRUS [5/5] Flavoprotein HAL3 domain protein AVIPOXVIRUS [5/5] TANGO2 T10 protein AVIPOXVIRUS [5/5] dNK Deoxycytidine kinase CAPRIPOXVIRUS [3/3] Pox_Ag35--Pox_Ag35 Late transcription factor VLTF-4 [2] CERVIDPOXVIRUS [2/2] Endothelin Endothelin CERVIDPOXVIRUS [2/2] IL1 IL-1 receptor antagonist CHORDOPOXVIRINAE [33/33] Chordopox_A20R DNA polymerase processivity factor CHORDOPOXVIRINAE [33/33] Chordopox_G2 Late transcription elongation factor CHORDOPOXVIRINAE [33/33] Chordopox_L2 Crescent membrane and immature virion formation protein CHORDOPOXVIRINAE [33/33] DNAP_B_exo_N--DNA_pol_B_exo1--DNA_pol_B_3--DNA_pol_B DNA polymerase [2] CHORDOPOXVIRINAE [33/33] DUF678 Zinc finger-like protein [1] CHORDOPOXVIRINAE [33/33] PLDc_3 Palmytilated EEV membrane protein CHORDOPOXVIRINAE [33/33] Peptidase_M44 Metalloendopeptidase CHORDOPOXVIRINAE [33/33] Pox_A6 Virion morphogenesis protein CHORDOPOXVIRINAE [33/33] Pox_A8 Intermediate transcription factor VITF-3 [1] CHORDOPOXVIRINAE [33/33] Pox_D2 Virion protein [1] CHORDOPOXVIRINAE [33/33] Pox_E2-like Iev morphogenesis protein CHORDOPOXVIRINAE [33/33] Pox_E6 Virion protein [2] CHORDOPOXVIRINAE [33/33] Pox_E8 Membrane protein [1] CHORDOPOXVIRINAE [33/33] Pox_F12L EEV maturation protein CHORDOPOXVIRINAE [33/33] Pox_G7 Virion core protein [4] CHORDOPOXVIRINAE [33/33] Pox_H7 Late protein H7 CHORDOPOXVIRINAE [33/33] Pox_I1 Telomere-binding protein I1 CHORDOPOXVIRINAE [33/33] Pox_I3 DNA-binding phosphoprotein [2] CHORDOPOXVIRINAE [33/33] Pox_J1 Virion protein [3] CHORDOPOXVIRINAE [33/33] Pox_L5 Membrane protein [2] CHORDOPOXVIRINAE [33/33] Pox_P21 IMV membrane protein [10] CHORDOPOXVIRINAE [33/33] Pox_P35 IMV heparin binding surface protein CHORDOPOXVIRINAE [33/33] Pox_RNA_Pol_19 DNA-directed RNA polymerase 19 kDa subunit CHORDOPOXVIRINAE [33/33] Pox_RNA_pol RNA polymerase subunit RPO18 CHORDOPOXVIRINAE [33/33] Pox_TAP Late transcription factor VLTF-1 CHORDOPOXVIRINAE [33/33] Pox_VP8_L4R Core protein vp8 CHORDOPOXVIRINAE [33/33] Pox_mRNA-cap mRNA capping enzyme small subunit CHORDOPOXVIRINAE [33/33] RNA_pol_Rpb1_1--RNA_pol_Rpb1_2--RNA_pol_Rpb1_3--RNA_pol_Rpb1_4--RNA_pol_Rpb1_5 DNA-dependent RNA polymerase subunit rpo147 CHORDOPOXVIRINAE [33/33] rpo30_N--TFIIS_C DNA-directed RNA polymerase 30 kDa polypeptide ENTOMOPOXVIRINAE [7/7] AIG2_2 Gamma-glutamyl cyclotransferase like ENTOMOPOXVIRINAE [7/7] EF-hand_7 Putative calcium binding protein ENTOMOPOXVIRINAE [7/7] Glyco_transf_25 Putative glycosyl transferase ENTOMOPOXVIRINAE [7/7] Lipase_3 Triacylglycerol lipase ENTOMOPOXVIRINAE [7/7] MSV199 N1R/p28-like protein [2] ENTOMOPOXVIRINAE [7/7] MSV199--T5orf172 N1R/p28-like protein [3] ENTOMOPOXVIRINAE [7/7] PP2C Protein phosphatase 2C ENTOMOPOXVIRINAE [7/7] Spheroidin Spheroidin ORTHOPOXVIRUS [10/10] Orthopox_F14 Protein F14 ORTHOPOXVIRUS [10/10] Orthopox_F7 CPXV054 protein PARAPOXVIRUS [5/5] PDGF VEGF-like protein POXVIRIDAE [40/40] Chordopox_G3 Entry/fusion complex component POXVIRIDAE [40/40] L1R_F9L IMV membrane protein [4] POXVIRIDAE [40/40] Peptidase_C57 Viral core cysteine proteinase POXVIRIDAE [40/40] Pox_A21 IMV membrane protein [6] POXVIRIDAE [40/40] Pox_A22 Holliday junction resolvase POXVIRIDAE [40/40] Pox_A28 Envelope protein A28 homolog POXVIRIDAE [40/40] Pox_A32 ATPase [1] POXVIRIDAE [40/40] Pox_ATPase-GT--Pox_MCEL mRNA capping enzyme large subunit POXVIRIDAE [40/40] Pox_E10 Sulfhydryl oxidase POXVIRIDAE [40/40] Pox_G5 FEN1-like nuclease

Page 20: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

20

POXVIRIDAE [40/40] Pox_G9-A16 Myristylated protein POXVIRIDAE [40/40] Pox_L3_FP4 Internal virion protein POXVIRIDAE [40/40] Pox_LP_H2 Entry-fusion complex essential component POXVIRIDAE [40/40] Pox_P4B Virion core protein P4b POXVIRIDAE [40/40] Pox_Rap94 RAP94 POXVIRIDAE [40/40] Pox_Rif Rifampicin resistance protein POXVIRIDAE [40/40] Pox_VERT_large Early transcription factor 82 kDa subunit POXVIRIDAE [40/40] Pox_ser-thr_kin Serine/threonine-protein kinase avipoxvirus [2/5] Ank_2--Ank_2--Ank_4--Ank_2--PRANC Ankyrin repeat family protein avipoxvirus [2/5] Ank_2--Ank_4--Ank_2--PRANC Ankyrin repeat protein [5] avipoxvirus [2/5] Ank_2--Ank_5--PRANC CNPV234 ankyrin repeat protein avipoxvirus [2/5] Ank_4--Ank_2--Ank_4--Ank_2--PRANC CNPV042 ankyrin repeat protein avipoxvirus [2/5] Ank_5--Ank_2--Ank_2--PRANC SWPV1-282 avipoxvirus [2/5] Arrestin_N--Arrestin_C CNPV149 putative thioredoxin binding protein avipoxvirus [2/5] IL8 CNPV283 CC chemokine-like protein avipoxvirus [2/5] Phosphodiest Alkaline phosphodiesterase [1] avipoxvirus [2/5] RNA_helicase CNPV200 Rep-like protein avipoxvirus [2/5] VHL_C VHL_C protein avipoxvirus [3/5] Ank_2--Ank_2--Ank_2 Ankyrin repeat protein [2] avipoxvirus [3/5] Ank_2--Ank_2--Ank_5--PRANC CNPV009 ankyrin repeat protein avipoxvirus [3/5] DUF2661 SWPV1-270 avipoxvirus [3/5] NGF CNPV279 beta-NGF-like protein avipoxvirus [3/5] TspO_MBR Fp9.037 protein avipoxvirus [3/5] UL45 CNPV307 C-type lectin-like protein avipoxvirus [4/5] Ank_4--Ank_2--Ank_2--PRANC Ankyrin repeat protein [9] avipoxvirus [4/5] Ank_4--Ank_2--PRANC Ankyrin repeat protein [11] avipoxvirus [4/5] DNase_II Deoxyribonuclease II avipoxvirus [4/5] PTH2 Putative peptidyl-tRNA hydrolase avipoxvirus [4/5] Phosphodiest--Endonuclease_NS Alkaline phosphodiesterase [2] avipoxvirus [4/5] SNAP AlphaSNAP betaentomopoxvirus [2/5] Kunitz_BPTI AMV007 betaentomopoxvirus [2/5] MADF_DNA_bdg AMV266 betaentomopoxvirus [2/5] MSV199--MSV199--T5orf172 MSV199 MSV199 T5orf172 protein betaentomopoxvirus [2/5] Methyltransf_23 Methyltransf_23 protein betaentomopoxvirus [2/5] Methyltransf_31 AMV004 betaentomopoxvirus [2/5] NMN_transporter AMV001 betaentomopoxvirus [2/5] RNA_ligase AMV019 betaentomopoxvirus [2/5] RNA_pol_Rpb1_2--RNA_pol_Rpb1_4--RNA_pol_Rpb1_5 RNA polymerase RPO147 betaentomopoxvirus [2/5] zf-C3HC4_3 Inhibitor of apoptosis 4 betaentomopoxvirus [3/5] Endonuclea_NS_2 Endonuclea_NS_2 protein betaentomopoxvirus [3/5] P35 p35 apoptosis inhibitor betaentomopoxvirus [3/5] ResIII--Helicase_C--NPHI_C Nucleoside triphosphatase I betaentomopoxvirus [4/5] BIR--BIR--zf-C3HC4_3 AMV021 chordopoxvirinae [10/33] BACK--Kelch_1 Kelch-like protein [2] chordopoxvirinae [10/33] BACK--Kelch_1--Kelch_1 Kelch-like protein [3] chordopoxvirinae [10/33] BTB--BACK--Kelch_1 Kelch-like protein [6] chordopoxvirinae [10/33] DUF1406 Virulence protein chordopoxvirinae [10/33] DUF2701 M016L chordopoxvirinae [10/33] Hydrolase_4 Putative monoglyceride lipase chordopoxvirinae [10/33] Orthopox_A47 Immunoprevalent protein chordopoxvirinae [10/33] TNFR_c6--DUF1406 Crm-B secreted TNF-alpha-receptor-like protein chordopoxvirinae [11/33] DUF1500 Ankyrin-like protein [6] chordopoxvirinae [11/33] Glutaredoxin Glutaredoxin-1 chordopoxvirinae [11/33] Orthopox_A36R IEV transmembrane phosphoprotein chordopoxvirinae [11/33] Orthopox_A43R Putative type-I membrane glycoprotein chordopoxvirinae [11/33] Orthopox_A49R Putative phosphotransferase/anion transport protein chordopoxvirinae [11/33] Orthopox_A5L 39kDa core protein chordopoxvirinae [11/33] Orthopox_F6 CPXV053 protein chordopoxvirinae [11/33] Sema Semaphorin chordopoxvirinae [12/33] ATP-cone--Ribonuc_red_lgN--Ribonuc_red_lgC Ribonucleoside-diphosphate reductase [2] chordopoxvirinae [12/33] PLDc--PLDc_3 Phospholipase-D-like protein chordopoxvirinae [12/33] V-set Hemagglutinin chordopoxvirinae [13/33] 7tm_1 CC chemokine receptor-like protein chordopoxvirinae [13/33] Carb_anhydrase IMV membrane protein [1] chordopoxvirinae [13/33] Orthopox_F8 Cytoplasmic protein chordopoxvirinae [13/33] Pkinase_Tyr Tyrosine protein kinase-like protein

Page 21: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

21

chordopoxvirinae [14/33] ig IL-1-beta-inhibitor chordopoxvirinae [15/33] Pox_M2 NFkB inhibitor chordopoxvirinae [15/33] Pox_T4_N--Pox_T4_C Intracellular viral protein chordopoxvirinae [16/33] IFNGR1 Soluble interferon-gamma receptor-like protein chordopoxvirinae [16/33] Pox_I6 36kDa major membrane protein chordopoxvirinae [17/33] Orthopox_35kD Chemokine binding protein chordopoxvirinae [17/33] Sushi--Sushi EEV type-I membrane glycoprotein chordopoxvirinae [19/33] Ank_2 Ankyrin-like protein [3] chordopoxvirinae [19/33] Ank_2--PRANC Ankyrin repeat protein [8] chordopoxvirinae [19/33] Pox_A31 CPXV166 protein chordopoxvirinae [2/33] Ank_4--Ank_2 Putative ankyrin-repeat protein; c-terminal chordopoxvirinae [2/33] DNA_pol_B_3--DNA_pol_B DNA polymerase [4] chordopoxvirinae [2/33] Kelch_1--Kelch_1--Kelch_1 Gp155R chordopoxvirinae [2/33] PYRIN PYRIN protein chordopoxvirinae [2/33] Pox_D5 NTPase [2] chordopoxvirinae [2/33] Pox_MCEL PP209 chordopoxvirinae [2/33] Pox_TAA1 Late transcription factor VLTF-2 [1] chordopoxvirinae [2/33] Poxvirus_B22R--Poxvirus_B22R_C Surface glycoprotein [1] chordopoxvirinae [21/33] 3Beta_HSD Hydroxysteroid dehydrogenase chordopoxvirinae [21/33] M11L Caspase-9 inhibitor chordopoxvirinae [21/33] Pox_A30L_A26L A-type inclusion protein [1] chordopoxvirinae [21/33] Poxvirus Putative tlr signaling inhibitor chordopoxvirinae [21/33] V-set_CD47--CD47 CD47-like protein chordopoxvirinae [21/33] VD10_N MutT motif protein chordopoxvirinae [22/33] PRANC Ankyrin repeat protein [12] chordopoxvirinae [22/33] Pox_A51 M137R chordopoxvirinae [22/33] z-alpha--dsrm Double-stranded RNA binding protein chordopoxvirinae [23/33] KilA-N--zf-RING_2 Zinc finger-like protein [2] chordopoxvirinae [23/33] Pox_C4_C10 C4L/C10L-like family protein chordopoxvirinae [24/33] DNA_ligase_A_N--DNA_ligase_A_M--DNA_ligase_A_C DNA ligase [2] chordopoxvirinae [25/33] DUF1235 CPXV173 protein chordopoxvirinae [25/33] Pox_C7_F8A Host range protein chordopoxvirinae [25/33] Pox_F11 MC018 chordopoxvirinae [25/33] Pox_F16 Nonfunctional serine recombinase chordopoxvirinae [26/33] Chordopox_A13L IMV membrane protein [2] chordopoxvirinae [26/33] Chordopox_A15 Core protein [1] chordopoxvirinae [26/33] Chordopox_A35R MHC class II antigen presentation inhibitor chordopoxvirinae [26/33] Lectin_C EEV glycoprotein [2] chordopoxvirinae [26/33] Pox_I6--DUF3746 Telomere-binding protein chordopoxvirinae [26/33] Serpin Serpin chordopoxvirinae [27/33] Chordopox_A33R EEV glycoprotein [1] chordopoxvirinae [27/33] NPH-II--Helicase_C RNA helicase NPH-II [2] chordopoxvirinae [27/33] Pox_D3 Virion core protein [3] chordopoxvirinae [27/33] Poxvirus_B22R_N--Poxvirus_B22R--Poxvirus_B22R_C B22R family protein chordopoxvirinae [28/33] Chordopox_E11 Virion core protein [1] chordopoxvirinae [28/33] Pox_Ag35 Late transcription factor VLTF-4 [1] chordopoxvirinae [28/33] Pox_F15 Protein F15 chordopoxvirinae [29/33] Vac_Fusion 14 kDa protein chordopoxvirinae [3/33] Ank_2--Ank_5 SWPV1-207 chordopoxvirinae [3/33] BTB--Kelch_1 M6 chordopoxvirinae [3/33] Baculo_p26 CPXV197 protein chordopoxvirinae [3/33] I-set Interleukin-1 receptor-like protein chordopoxvirinae [3/33] Kelch_1--Kelch_1 Kelch-like protein [15] chordopoxvirinae [3/33] Pox_T4_N 6kDa intracellular viral protein [2] chordopoxvirinae [3/33] Profilin Profilin chordopoxvirinae [3/33] US22 SORF2 domain protein chordopoxvirinae [31/33] Chordopox_A30L Imv protein chordopoxvirinae [31/33] Chordopox_RPO7 DNA-directed RNA polymerase 7 kDa subunit chordopoxvirinae [31/33] DUF1029 IMV membrane protein [3] chordopoxvirinae [31/33] Pox_EPC_I2-L1 IMV membrane protein [8] chordopoxvirinae [31/33] zf-FCS--Pox_TAA1 Late transcription factor VLTF-2 [2] chordopoxvirinae [32/33] D5_N--Pox_D5 NTPase [1] chordopoxvirinae [32/33] DUF836 Glutaredoxin-2 chordopoxvirinae [32/33] Pox_A12 Core protein [2] chordopoxvirinae [32/33] Pox_A14 IMV membrane protein [5] chordopoxvirinae [32/33] Pox_A3L S-S bond formation pathway protein

Page 22: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

22

chordopoxvirinae [32/33] Pox_A9 IMV membrane protein [7] chordopoxvirinae [32/33] Pox_F17 DNA-binding phosphoprotein [1] chordopoxvirinae [32/33] Pox_I5 IMV membrane protein [9] chordopoxvirinae [32/33] Pox_RNA_Pol_22 DNA-directed RNA polymerase subunit chordopoxvirinae [32/33] Pox_RNA_pol_35 DNA-directed RNA polymerase 35 kDa subunit chordopoxvirinae [32/33] rpo132--RNA_pol_Rpb2_3--RNA_pol_Rpb2_6--RNA_pol_Rpb2_7 DNA-dependent RNA polymerase subunit rpo132 chordopoxvirinae [4/33] Ank_4--PRANC Ankyrin repeat-containing protein chordopoxvirinae [4/33] Ank_5--Ank_2 Ankyrin/F-box protein [3] chordopoxvirinae [4/33] BTB--BACK--Kelch_1--Kelch_1--Kelch_1 M140R chordopoxvirinae [4/33] BTB--BACK--Kelch_6 Kelch-like protein [11] chordopoxvirinae [4/33] BTB--Kelch_1--Kelch_1--Kelch_1 CPXV215 protein [2] chordopoxvirinae [4/33] GSHPx Glutathione peroxidase chordopoxvirinae [4/33] Ig_3 Putative IFN-alpha/beta binding protein chordopoxvirinae [4/33] Poxvirus_B22R_C Surface glycoprotein [2] chordopoxvirinae [4/33] Poxvirus_B22R_N B22R-like protein chordopoxvirinae [4/33] Poxvirus_B22R_N--Poxvirus_B22R CPXV219 protein chordopoxvirinae [4/33] Tissue_fac--IFNGR1 Soluble IFN-g receptor chordopoxvirinae [5/33] BEN Virosome component chordopoxvirinae [5/33] BTB--Kelch_1--Kelch_1 CPXV215 protein [1] chordopoxvirinae [5/33] Glyco_transf_29 "A(2,3)-sialyltransferase" chordopoxvirinae [5/33] Pox_A30L_A26L--Vac_Fusion MC131 chordopoxvirinae [5/33] Sushi Putative EEV host range protein chordopoxvirinae [5/33] Sushi--Sushi--Sushi CPXV199 protein chordopoxvirinae [5/33] dsrm M29L chordopoxvirinae [6/33] Ank_5 Ankyrin-like protein [5] chordopoxvirinae [6/33] Ank_5--PRANC Ankyrin [3] chordopoxvirinae [6/33] DUF2718 M3.2 chordopoxvirinae [6/33] Kelch_1 Kelch-like protein [14] chordopoxvirinae [6/33] NUDIX MuT motif expression regulator chordopoxvirinae [6/33] TGF_beta Transforming growth factor B chordopoxvirinae [6/33] VD10_N--NUDIX NPH-PPH downregulator chordopoxvirinae [7/33] Ank_2--Ank_2 Ankyrin repeat protein [1] chordopoxvirinae [7/33] Ank_4 Ankyrin [2] chordopoxvirinae [7/33] BTB--BACK--Kelch_1--Kelch_1 Kelch-like protein [7] chordopoxvirinae [7/33] RINGv E3 ubiquitin-protein ligase LAP chordopoxvirinae [8/33] TNFR_c6 Secreted TNF-receptor-like protein chordopoxvirinae [9/33] Ank_2--Ank_2--Ank_2--PRANC Ankyrin repeat protein [3] chordopoxvirinae [9/33] BACK Kelch-like protein [1] chordopoxvirinae [9/33] BTB--BACK Kelch-like protein [5] chordopoxvirinae [9/33] IL10 Interleukin-10-like protein chordopoxvirinae [9/33] NPH-II RNA helicase NPH-II [1] chordopoxvirinae [9/33] Pox_vIL-18BP Interleukin-18-binding protein entomopoxvirinae [2/7] BIR--zf-C3HC4_3 Inhibitor of apoptosis 3 entomopoxvirinae [2/7] DNA_ligase_aden--DNA_ligase_OB AMV199 entomopoxvirinae [3/7] LRR_4 LRR_4 protein entomopoxvirinae [3/7] RNA_pol_Rpb1_2--RNA_pol_Rpb1_3--RNA_pol_Rpb1_4 Putative RNA polymerase largest subunit entomopoxvirinae [4/7] Bro-N Putative antirepressor [1] entomopoxvirinae [5/7] Bro-N--DUF3627 Putative antirepressor [2] entomopoxvirinae [5/7] DNA_ligase_OB NAD-dependent DNA ligase entomopoxvirinae [5/7] LPMO_10 Fusolin entomopoxvirinae [5/7] Peptidase_M10 Zinc-dependent metalloprotease entomopoxvirinae [5/7] RVT_1 Reverse transcriptase entomopoxvirinae [6/7] DNA_pol_B_thumb Putative DNA polymerase beta/AP endonuclease entomopoxvirinae [6/7] NMT Putative N-myristoyl transferase orthopoxvirus [2/10] ATP-cone CPXV083 protein orthopoxvirus [2/10] B3R Protein B3 orthopoxvirus [2/10] BTB--BACK--Kelch_1--Kelch_6 Kelch-like protein [9] orthopoxvirus [2/10] DNA_ligase_A_N--DNA_ligase_A_M DNA ligase [1] orthopoxvirus [2/10] Ig_2--ig Ifn-alpha/beta receptor glycoprotein [1] orthopoxvirus [2/10] Kelch_6 64.7k Kelch-like protein f2 orthopoxvirus [2/10] Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc Cowpox A-type inclusion protein [4] orthopoxvirus [2/10] Pox_polyA_pol_N CPXV067 protein orthopoxvirus [2/10] RNA_pol_Rpb1_1 CPXV109 protein

Page 23: DOMAIN ARCHITECTURE AWARE INFERENCE OF ORTHOLOGS … · domain-centric phyloinformatics pipeline. Any arrangement of single or multiple Pfam domains is considered a domain architecture

23

orthopoxvirus [2/10] z-alpha CPXV069 protein orthopoxvirus [3/10] BTB--BACK--Kelch_1--Kelch_1--Kelch_6 Kelch-like protein [8] orthopoxvirus [3/10] Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc Cowpox A-type inclusion protein [5] orthopoxvirus [3/10] Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc A-type inclusion protein [2] orthopoxvirus [4/10] B3R--AlbA_2 Schlafen [1] orthopoxvirus [4/10] Bax1-I NMDA receptor-like protein orthopoxvirus [4/10] Pox_A_type_inc--Pox_A_type_inc Cowpox A-type inclusion protein [2] orthopoxvirus [5/10] Ank--PRANC Ankyrin-like protein [2] orthopoxvirus [5/10] Pox_A_type_inc Cowpox A-type inclusion protein [1] orthopoxvirus [5/10] Pox_A_type_inc--Pox_A_type_inc--Pox_A_type_inc Cowpox A-type inclusion protein [3] orthopoxvirus [6/10] BEN--BEN CPXV071 protein orthopoxvirus [6/10] Baculo_p26--B3R--AlbA_2 Schlafen [2] orthopoxvirus [6/10] Orthopox_B11R CPXV205 protein orthopoxvirus [7/10] Ank Ankyrin-like protein [1] orthopoxvirus [7/10] S1 Interferon resistance protein orthopoxvirus [8/10] Guanylate_kin Guanylate kinase orthopoxvirus [8/10] Sushi--Sushi--Sushi--Sushi Complement binding orthopoxvirus [9/10] DUF1231 CPXV210 protein orthopoxvirus [9/10] Ig_2 IFN-alpha/beta-receptor-like secreted glycoprotein parapoxvirus [2/5] Ank_2--Ank Ankyrin/F-box protein [1] parapoxvirus [2/5] Orthopox_C10L ORF virus homologue of retroviral pseudoprotease protein poxviridae [10/40] ubiquitin Ubiquitin poxviridae [11/40] BTB Kelch-like protein [4] poxviridae [12/40] DNA_photolyase Photolyase poxviridae [12/40] KilA-N N1R/p28 family protein poxviridae [14/40] Sod_Cu Superoxide dismutase-like protein poxviridae [15/40] Thymidylate_kin Thymidylate kinase poxviridae [22/40] UDG Uracil DNA glycosylase poxviridae [26/40] Ribonuc_red_sm Ribonucleotide reductase small subunit poxviridae [3/40] DUF3627 Putative antirepressor [3] poxviridae [3/40] RNA_pol_Rpb1_2--RNA_pol_Rpb1_3--RNA_pol_Rpb1_4--RNA_pol_Rpb1_5 TJ7R [1] poxviridae [3/40] RNA_pol_Rpb2_3--RNA_pol_Rpb2_6--RNA_pol_Rpb2_7 Putative RNA polymerase subunit RPO132 poxviridae [3/40] VirDNA-topo-I_N CPXV115 protein [2] poxviridae [3/40] zf-RING_2 Zinc finger-like poxviridae [32/40] TK Thymidine kinase poxviridae [33/40] Pkinase Ser/thr kinase poxviridae [34/40] A2L_zn_ribbon--Pox_VLTF3 Late transcription factor VLTF-3 [1] poxviridae [34/40] Pox_P4A Virion core protein P4a poxviridae [34/40] Pox_int_trans Intermediate transcription factor VITF-3 [2] poxviridae [35/40] Pox_polyA_pol_N--Pox_polyA_pol--Pox_polyA_pol_C Poly(A) polymerase catalytic subunit [3] poxviridae [36/40] ResIII DNA helicase poxviridae [36/40] dUTPase dUTPase poxviridae [37/40] Peptidase_C92 Nlpc/p60 superfamily protein poxviridae [37/40] SNF2_N--Helicase_C--NPHI_C ATPase [2] poxviridae [38/40] DSPc Tyr/ser protein phosphatase poxviridae [38/40] Pox_A11 Viral membrane formation protein poxviridae [39/40] PARP_regulatory Cap-specific mRNA poxviridae [39/40] VirDNA-topo-I_N--Topoisom_I DNA topoisomerase type I poxviridae [5/40] D5_N Putative NTPase poxviridae [5/40] ResIII--Helicase_C "DNA helicase, transcript release factor" poxviridae [6/40] Pox_polyA_pol--Pox_polyA_pol_C Poly(A) polymerase catalytic subunit [1] poxviridae [7/40] Pox_VLTF3 Late transcription factor VLTF-3 [2] poxviridae [7/40] RNA_pol_Rpb2_6--RNA_pol_Rpb2_7 RNA polymerase RPO132 poxviridae [8/40] DNA_pol_B_exo1--DNA_pol_B_3--DNA_pol_B DNA polymerase [6] poxviridae [8/40] KilA-N--DUF3627 N1R/p28-like protein [1] poxviridae [8/40] NPH-II--DEAD--Helicase_C "RNA helicase, DExH-NPH-II domain" poxviridae [8/40] SNF2_N Putative early transcription factor small subunit