massive expansion of ubiquitination-related gene families

16
HAL Id: hal-01419559 https://hal.archives-ouvertes.fr/hal-01419559 Submitted on 19 Dec 2016 HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. Massive Expansion of Ubiquitination-Related Gene Families within the Chlamydiae Daryl Domman, Astrid Collingro, Ilias Lagkouvardos, Lena Gehre, Thomas Weinmaier, Thomas Rattei, Agathe Subtil, Matthias Horn To cite this version: Daryl Domman, Astrid Collingro, Ilias Lagkouvardos, Lena Gehre, Thomas Weinmaier, et al.. Massive Expansion of Ubiquitination-Related Gene Families within the Chlamydiae. Molecular Biology and Evolution, Oxford University Press (OUP), 2014, 31 (11), pp.2890 - 2904. 10.1093/molbev/msu227. hal-01419559

Upload: others

Post on 24-Dec-2021

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Massive Expansion of Ubiquitination-Related Gene Families

HAL Id hal-01419559httpshalarchives-ouvertesfrhal-01419559

Submitted on 19 Dec 2016

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents whether they are pub-lished or not The documents may come fromteaching and research institutions in France orabroad or from public or private research centers

Lrsquoarchive ouverte pluridisciplinaire HAL estdestineacutee au deacutepocirct et agrave la diffusion de documentsscientifiques de niveau recherche publieacutes ou noneacutemanant des eacutetablissements drsquoenseignement et derecherche franccedilais ou eacutetrangers des laboratoirespublics ou priveacutes

Massive Expansion of Ubiquitination-Related GeneFamilies within the Chlamydiae

Daryl Domman Astrid Collingro Ilias Lagkouvardos Lena Gehre ThomasWeinmaier Thomas Rattei Agathe Subtil Matthias Horn

To cite this versionDaryl Domman Astrid Collingro Ilias Lagkouvardos Lena Gehre Thomas Weinmaier et al MassiveExpansion of Ubiquitination-Related Gene Families within the Chlamydiae Molecular Biology andEvolution Oxford University Press (OUP) 2014 31 (11) pp2890 - 2904 101093molbevmsu227hal-01419559

Article

Massive Expansion of Ubiquitination-Related Gene Familieswithin the Chlamydiae

Daryl Domman1 Astrid Collingro1 Ilias Lagkouvardos1 Lena Gehre2 Thomas Weinmaier1

Thomas Rattei1 Agathe Subtil2 and Matthias Horn1

1Department of Microbiology and Ecosystem Science University of Vienna Vienna Austria2Unite de Biologie des Interactions Cellulaires Institut Pasteur Paris France

Corresponding author E-mail hornmicrobial-ecologynet

Associate editor Howard Ochman

Abstract

Gene loss gain and transfer play an important role in shaping the genomes of all organisms however the interplay ofthese processes in isolated populations such as in obligate intracellular bacteria is less understood Despite a generaltrend towards genome reduction in these microbes our phylogenomic analysis of the phylum Chlamydiae revealed thatwithin the family Parachlamydiaceae gene family expansions have had pronounced effects on gene content We discov-ered that the largest gene families within the phylum are the result of rapid gene birth-and-death evolution These largegene families are comprised of members harboring eukaryotic-like ubiquitination-related domains such as F-box andBTB-box domains marking the largest reservoir of these proteins found among bacteria A heterologous type III secretionsystem assay suggests that these proteins function as effectors manipulating the host cell The large disparity in copynumber of members in these families between closely related organisms suggests that nonadaptive processes mightcontribute to the evolution of these gene families Gene birth-and-death evolution in concert with genomic drift mightrepresent a previously undescribed mechanism by which isolated bacterial populations diversify

Key words gene families birth and death model intracellular bacteria effector proteins F-box

IntroductionThe genomes of organisms reveal complex histories of genetransfer loss gain and rearrangement The extent that theseprocesses play in shaping gene families of both prokaryotesand eukaryotes are markedly different Gene gain within eu-karyotes is largely driven by intragenomic duplication events(Lynch and Conery 2000 Koonin et al 2002 Yang et al 2003Kaessmann 2010) and although duplication certainly shapesbacterial genomes most gains are the result of horizontalgene transfer (HGT) events (Ochman et al 2000 Lerat et al2005 Treangen and Rocha 2011) Estimates of the contribu-tion gene duplication processes play across domains of lifevary from 65 to 30 in the genomes of Arabidopsis andEscherichia coli respectively (Zhang 2003) Although geneticinnovation typically arises through gene acquisition from for-eign sources gene duplication events are increasingly beingrecognized as an important driver of bacterial genome evo-lution (Goldman et al 2006 McLeod et al 2006 Cho et al2007)

Comparisons of closely related organisms have revealed ahighly dynamic landscape of gene families in which the copynumber between species can vary substantially (Pushker et al2004 Lerat et al 2005) Given this background an intriguingevolutionary backdrop to study gene family evolution iswithin obligate intracellular bacteria In these populationsthe fixation of mutations is strongly affected by geneticdrift with a propensity in these genomes for deletion

(Kuo and Ochman 2009a) and thus gene family expansionswithin these genomes are generally rare (Hooper and Berg2003 Gevers et al 2004) Insightful analysis on gene familyevolution is best approached when comparing multiple ge-nomes from closely related species facilitating identificationof paralogs (homologous genes resulting from duplication)orthologs (homologous genes resulting from speciation) orxenologs (homologous genes derived from HGT) In thisregard the phylum Chlamydiae offers an ensemble of fullysequenced genomes across multiple families

All members of the phylum Chlamydiae are obligate in-tracellular bacteria and represent one of the most ancient andsuccessful lineages associated with eukaryotes (Horn 2008Subtil et al 2014) These organisms all share a characteristicbiphasic developmental cycle consisting of an infectious ex-tracellular state and an intracellular replicative state Thephylum can be divided into two major phylogenetic group-ings The family Chlamydiaceae which encompass wellknown animal and human pathogens such as Chlamydiatrachomatis and C pneumoniae and a group of families com-prising the environmentally distributed chlamydiae such asSimkaniaceae Waddliaceae and Parachlamydiaceae collec-tively referred to as environmental chlamydiae Recently itwas shown that the diversity of the phylum is tremendouslygreater with perhaps over 200 families spanning nearly everyenvironment (Lagkouvardos et al 2013) All members of theChlamydiae show notable genomic reductions and truncated

The Author 2014 Published by Oxford University Press on behalf of the Society for Molecular Biology and EvolutionThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby40) which permits unrestricted reuse distribution and reproduction in any medium provided the original work isproperly cited Open Access2890 Mol Biol Evol 31(11)2890ndash2904 doi101093molbevmsu227 Advance Access publication July 28 2014

metabolic pathways including the inability to synthesizemany amino acids and nucleotides (Stephens et al 1998Kalman et al 1999 Horn et al 2004 Bertelli et al 2010Collingro et al 2011 Myers et al 2012)

In this study we set out to determine how gene familieshave evolved in members of the phylum Chlamydiae Wepresent four new genome sequences for members of thefamily Parachlamydiaceae which include two genome se-quences for the genus Neochlamydia We show that organ-isms within the Parachlamydiaceae have unprecedentednumbers of proteins harboring domains typically found ineukaryotes the majority of which are related to eukaryoticubiquitination pathways We show that these genes haveundergone rapid expansions and form the largest gene fam-ilies within the phylum We demonstrate that many of theselarge gene families are evolving under a gene-birth-deathmodel (Nei and Rooney 2005) and that differences betweenclosely related organisms may be explained by genomic drift

Results

Genome Sequencing of Novel Members of theChlamydiae

Currently there are nine described families within theChlamydiae however the majority of available genomesequences come from a single family the pathogenic Chla-mydiaceae To deepen our insights into a family outside of theChlamydiaceae we sequenced the genomes of four membersof the family Parachlamydiaceae which include two membersof the genus Neochlamydia and two additional genomes ofProtochlamydia and Parachlamydia All of the newly

sequenced Parachlamydiaceae members were isolated fromfree-living amoeba With the exception of Neochlamydia spEPS4 the isolates have been described previously (Fritscheet al 2000 Heinz et al 2007 Schmitz-Esser et al 2008) Thedraft genomes represent nearly complete genome sequencesbased on paired end read data (90ndash96) and the presence ofconserved single-copy marker genes (98ndash100 supplemen-tary table S1 Supplementary Material online) Using theseadditional genome sequences we first aimed to construct aphylogenetic framework of the phylum Chlamydiae usingconcatenated alignments of 32 marker proteins (supplemen-tary table S2 Supplementary Material online) Phylogenetictrees obtained with different methods confirmed themonophyly of the Chlamydiaceae and the Parachlamydiaceaewith strong support (fig 1) The Chlamydiaceae can besubdivided in two previously recognized groups and withinthe Parachlamydiaceae the genera Protochlamydia Neochla-mydia and Parachlamydia were recovered with highconfidence

All members of the Chlamydiaceae show highly similargenomes in terms of gene content and synteny (Myerset al 2012) however between chlamydial families rearrange-ments have played a major role in genome evolution(Collingro et al 2011) Whole-genome alignments of mem-bers of the Parachlamydiaceae clearly illustrate that within thegenera Protochlamydia Neochlamydia and Parachlamydiathere are few rearrangements and the genomes are highlysyntenic (fig 1) Between these genera however there havebeen extensive genome rearrangements demonstrating thesurprising dynamic nature of these reduced genomes

FIG 1 Phylogeny of the Chlamydiae and rearrangement history of genomes within the Parachlamydiaceae Phylogeny of the Chlamydiae based on 32phylogenetic marker proteins A Bayesian analysis using MrBayes (Ronquist and Huelsenbeck 2003) was performed on a set of 24 ribosomal proteins inaddition to GyrB RecA RpoB RpoC and EF-Tu from 19 sequenced members of the phylum (supplementary table S2 Supplementary Material online)Members of the Planctomycetes (Blastopirellula marina DSM 3645 Candidatus Kuenenia stuttgartiensis and Gemmata obscuriglobus UQM 2246) andthe Verrucomicrobia (Akkermansia muciniphila MucT Lentisphaera araneosa HTCC2155 Opitutus terrae PB90-1 and Verrucomicrobium spinosumDSM 4136) were used as outgroups (not shown) Colors denote family level classification Posterior probability scores are indicated only if below 100To the right conserved synteny and rearrangement history of genomes within the Parachlamydiaceae are shown The genomes of six members of thefamily were aligned using MAUVE to elucidate synteny between genomes and visualized using genoPlotR Extensive rearrangements are apparentbetween members of different genera whereas within genus comparisons show little rearrangements with a notable exception in the Protochlamydiawhere a large block has been rearranged

2891

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

The Gene Family Landscape of the Chlamydiae

To explore gene family evolution among members of theChlamydiae we first identified gene families using clustersof orthologous groups of proteins within the predicted pro-teomes from 19 chlamydial genomes (supplementary tableS2 Supplementary Material online) We then searched forgene families that contain expansion events that is thosehaving multiple members from one organism Previouswork has demonstrated that genome size correlates withthe number of paralogs with larger genomes containingmore paralogs than smaller ones (Bratlie et al 2010) Takinginto account all gene family members regardless of whetherthey originate from duplication processes or transfer eventsthis trend is generally observed within chlamydial genomes(fig 2a)

As described previously gene family expansions are sparsewithin the genomes of the Chlamydiaceae (Kalman et al1999 Kamneva et al 2012) with C pneumoniae CWL029harboring the largest number (n = 24) In line with previousobservations the largest gene families identified in our studyencode the polymorphic membrane proteins (PMPs)(Grimwood and Stephens 1999 Gomes et al 2006) includingnine members from C pneumoniae LPCoLN and two fromthe C trachomatis serovars The observed split of PMPsamong several smaller gene families in our analysis is an in-dication that our approach is rather conservative in assigninga protein to a gene family

The total number of expansion events (n = 277) detectedin the genome of Simkania negevensis represents a 10-foldincrease when compared with the Chlamydiaceae As thesegroup into many small gene families the extended number ofgene copies in S negevensis is the result of many small-scaleduplication or transfer events (fig 2b) This situation is similarin Waddlia chondrophila In stark contrast roughly half of thetotal of genes resulting from expansion events in Neochlamy-dia and Protochlamydia are the contribution of only few genefamilies

Large Gene Family Expansions in theParachlamydiaceae

The detection of large gene families in Neochlamydia andProtochlamydia indicates that there have been several large-scale expansion events within the Parachlamydiaceae Nota-bly different gene families are expanded in Neochlamydia andProtochlamydia (fig 2b) These represent the four largest genefamilies (containing between 27 and 138 members) foundwithin the phylum and include two gene families specific toNeochlamydia and two restricted to Protochlamydia

Intrigued by these four large-scale lineage-specific expan-sion events between the species pairs of Protochlamydia andNeochlamydia we sought to better characterize these genefamilies as most of their members are yet unknown withrespect to their functional role (ie they are classified as hy-pothetical proteins) Remarkably despite being in differentgene families from different organisms there are several sim-ilarities between these proteins Firstly they all encompassproteinndashprotein interaction domains such as leucine-rich

repeats (LRRs) or tetratricopeptide repeats (TPRs) Secondlymany of them contain additional domains typically found ineukaryotes such as F-boxes BTB-boxes and RINGU-boxeswhich are associated with eukaryotic ubiquitination pathways(Angot et al 2007)

A

B

FIG 2 The paralogous gene landscape of the Chlamydiae (A) Numberof paralogous genes within the Chlamydiae The number of paralogousgenes not including multiple copies is plotted against genome sizealong with a linear regression line (y = 4787x ndash 429 R2 = 070 blackline) The dashed gray line is plotted as a reference from 200 prokaryoticgenomes (Bratlie et al 2010) (B) Distribution of chlamydial gene familiesper genome with two or more members The number of genes withineach family is plotted for representative genomes The genomes of theChlamydiaceae have relatively small gene family sizes The polymorphicouter membrane proteins comprise the largest gene families in theChlamydiaceae and can be seen as the two largest blocks in theChlamydia pneumoniae LPCoLN (Cpn) bar The size distribution ofgene families is ordered from smallest to greatest and the appearanceof a solid ldquoblack boxrdquo at the base is merely an effect of the spacing ofmany small gene families There are several extensive gene families (la-beled) within members of Neochlamydia (NEX1 NEX2) andProtochlamydia (PEX1 PEX2)

2892

Domman et al doi101093molbevmsu227 MBE

Eukaryotic Ubiquitination-Associated DomainsPredominate Large Gene Families

The largest gene family termed Neochlamydia expansion 1(NEX1) in the phylum comprised a total of 138 memberswhich are contributed by the two Neochlamydia genomesThe domain architecture within this large gene family is het-erogeneous however all proteins contain various C-terminalrepetitions of LRR domains (fig 3) We have identified twosubfamilies that we delineate NEX1a and NEX1b within theNEX1 family The majority of members fall into the NEX1asubfamily in which they have a highly conserved N-terminalF-box or F-box-like domain The members have an average of77 sequence similarity among each other and the F-box-likedomain is 57 and 45 similar to Acanthamoeba castellaniiand human F-box-like domains respectively The smallerNEX1b family in contrast has a conserved RINGU-box atthe N-terminus Both the F-box and RINGU-box domains areassociated with eukaryotic E3 ubiquitin ligase complexes(Willems et al 2004) Between all members in the NEX1family there is a region of roughly 50 amino acids betweenthe predicted N-terminal domains and the LRRs that is highlyconserved However we failed to detect any known domainsin this region nor was there any homology to known proteins

The second large gene family within the Neochlamydiarepresents the third largest family of the phylum This largegene family termed NEX2 comprised 50 proteins (fig 3)

Similar to NEX1 the prevailing domain architecture is thatof eukaryotic-like E3 ubiquitin ligase-associated domainspaired with repeat domains This family is defined by thepresence of multiple TPR domains at the C-terminus and ageneral conservation of an F-box domain at the N-terminus inthe majority of members In most members there is also aconserved DUF294 domain located mid protein which is aputative nucleotidyltransferase In ten members there is anovarian tumor (OTU) (Balakirev et al 2003) domain directlyfollowing the F-box domain followed by the DUF294 domain

The other large gene families occur primarily in the mem-bers of the Protochlamydia and represent the second andfourth largest gene families of the phylum The largest genefamily in the Protochlamydia (PEX1) comprised 73 membersin total (fig 3) Intriguingly another E3 ubiquitin ligase-relateddomain the BTB domain is present in all but three membersat the N-terminus The BTB domain is then coupled toC-terminal LRR domains in all members The PEX2 familycomprised a total of 27 proteins that despite no detectabledomain at the N-terminus share multiple TPR domains in themiddle of the protein followed by a CHAT domain(Sakakibara and Hattori 2000) at the C-terminus

In summary the four largest gene families represent a sur-prisingly diverse armada of proteins which most likely func-tion within eukaryotic host cells where they potentiallyinterfere with the ubiquitination pathway The heterogeneity

FIG 3 Protein domain architecture of largest gene families The domain architecture of Neochlamydia (NEX1 NEX2) and Protochlamydia (PEX1 PEX2)gene families are shown The range of the number of domain repeats and functional assignments of the detected domains are indicated NEX1 can bedivided into two subfamilies based on phylogeny and domain presenceabsence A role of these proteins in the context of eukaryotic cells can bepostulated based on the presence of domains otherwise found in eukaryotes

2893

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

in domain architecture among these proteins interestinglymirrors that of their eukaryotic counterparts (Perez-Torradoet al 2006 Xu 2006)

A Pool of Putative Effector Proteins

If the members of the largest gene families in the phylumChlamydiae serve as effector proteins for host manipulationthey would need to be secreted and transported to the hostcell cytosol This is typically achieved through a type III secre-tion system a well-conserved virulence mechanism amongthe Chlamydiae which has been shown to translocate severalcharacterized effectors (Peters et al 2007 Betts et al 2009)Indeed many of the proteins found within the expandedNeochlamydia and Protochlamydia gene families are pre-dicted by computational analysis to be secreted by the typeIII secretion system and to be extracellular host associatedWithin the NEX1 family 66 of 138 (49) members are pre-dicted to be secreted A total of 37 (51) and 10 (37) werepredicted to be secreted from within the PEX1 and PEX2 genefamilies respectively The NEX2 gene family had the fewestpredicted with only two members

As the identification of the signal for secretion via the typeIII secretion system is inherently difficult (Arnold et al 2009)we tested representatives of each of the four largest genefamilies in vitro using a heterologous type III secretion sub-strate assay with Shigella flexneri as a host for protein expres-sion This assay has been used to successfully characterize typeIII secretion effector proteins from the chlamydiae before(Subtil et al 2001) and because of the lack of routine genetictools the Sh flexneri system is an attractive surrogate methodfor analyzing type III secretion in chlamydiae in vivo Thisexperiment demonstrated that the tested members ofNEX1 NEX2 PEX1 and PEX2 contain a functional type IIIsecretion recognition signal (supplementary fig S1Supplementary Material online) Together with the presenceof eukaryotic-like domains and computational predictionsthis strongly indicates that these gene families are largepools of effector proteins

Molecular Evolution of Large Gene Families

To better understand how these large gene families may haveevolved we reconstructed their phylogenetic relationshipsGene family trees were calculated using conserved sitesamong the protein alignment (supplementary figs S2ndashS6Supplementary Material online) The average amino acididentity between members ranges from 45 to 64 withthe most closely related sequences belonging to PEX2 Treetopologies suggest that the members of all four gene familieshave rapidly diverged as indicated by their long branchlengths Although the number of LRR and TPR domainsvaries dramatically between 1 and 39 this had no apparenteffect on the phylogenetic placement

We find clear cases in which the orthologs of two speciesgroup together indicating expansions have occurred beforespeciation (supplementary figs S2ndashS6 SupplementaryMaterial online) Alternatively expansions post speciation isapparent in all gene families Reconciliation of gene family

trees with the species tree indicates that in addition to ex-pansions many gene losses have occurred for each genefamily (supplementary table S3 Supplementary Materialonline) For instance for the NEX1 family there have been78 expansion events whereas 33 losses have occurred attrib-uted to 13 and 20 losses in Neochlamydia spp EPS4 andTUME1 respectively There were nearly equal losses betweenEPS4 and TUME1 (13 and 10) in NEX2 and a total of 46expansions Similarly PEX1 consists of 52 expansions and15 and 9 losses in Protochlamydia amoebophila EI2 andUWE25 respectively

A Birth-and-Death Model of Evolution

A pattern of differential gain loss and maintenance of genefamily members is strongly indicative of these gene familiesevolving according to a birth-and-death model (Nei andRooney 2005) Because of this differential maintenance ofgene family members the hallmarks of the birth-and-deathmodel are interspecies clustering of members in the phyloge-netic trees and the presence of pseudogenes from degradedmembers (Nei 2007) As we observe interspecies clustering forthe PEX and NEX gene families (supplementary figs S2ndashS6Supplementary Material online) we also tested for pseudo-genization events in the intergenic regions of the P amoebo-phila UWE25 genome (the draft Neochlamydia genomes areless suitable for this analysis) By utilizing BLAST we searchedfor matches to the predicted proteome using all intergenicregions as a query We then mapped the best BLAST hitsrepresenting 116 pseudogenes to their respective gene fam-ilies to get a picture of a given familiesrsquo representation in theintergenic regions The most represented gene family in theintergenic regions surprisingly was PEX1 (16 pseudogenes)For PEX2 one pseudogenized fragment was detected Theobserved presence of interspecies clustering and pseudo-genes and the dynamic history of gains and losses withinthe gene families are indicative of a birth-and-death modelof evolution

In contrast if these families were evolving via concertedevolution the phylogenetic trees would depict intraspeciesclustering that is that members of a gene family will be morehomologous to the other members from the same organismthan to that of other species Intraspecies clustering occursdue to repeated recombination among gene family memberswithin a genome leading to an overall high sequence similar-ity of all members a process known as gene conversion(Santoyo and Romero 2005) We thus tested for the possibil-ity of recombination within the gene families using the meth-ods implemented in the RDP4 software suite (Martin et al2010) Care must be taken however when assessing theimpact of recombination among divergent proteins as therecombination signal is quite error prone when proteins shareless than 70 similarity and these methods are heavily de-pendent on the alignment (Martin et al 2010) We detectsome recombination events between members within thePEX and NEX gene families however the majority of thepredictions are only marginally significant (the consensusscores are below the confidence threshold of 06) In the

2894

Domman et al doi101093molbevmsu227 MBE

NEX1a family the portion of the sequences most identified asrecombinant is the F-box domain (supplementary fig S8Supplementary Material online) This should be judiciouslyinterpreted as this might represent sequence similarity due topurifying selection operating on this domain Overall we didnot find convincing evidence that recombination has played amajor role in shaping the evolution of the PEX and NEX genefamilies

Gene Duplications and Purifying Selection

Gene family expansions can be the result of either gene du-plication or HGT The disentangling of these events is nottrivial and in fact may be impossible in the case of the genefamilies investigated here due to lineage-specific evolutionand the absence of clear homologs in other bacterial taxa(Kuo and Ochman 2009b) However several lines of evidencesuggest that regardless of the initial origin of these genes geneduplication processes have played a clear role in the evolutionthe large Parachlamydiaceae gene families First a hallmark ofgene duplication is the presence of tandem arrays of genecopies In this regard we find several large tandem arrays withmembers of the large gene families in Protochlamydia andNeochlamydia including examples for recent duplications(fig 4 supplementary fig S7 Supplementary Materialonline) In the P amoebophila UWE25 genome we detected47 tandem duplication events (distance within 10 genes) rep-resented by only 13 gene families Nearly half of these are thecontribution of PEX1 (n = 20) however they appear in severalclusters spread throughout the genome PEX2 demonstratesthe most dramatic case of a tandem array consisting of 13members After the PEX gene families the third largesttandem array only comprised three members intriguinglyalso F-box domain containing proteins NEX1 has 50 instances(36) where members are found within five genes of eachother Second the majority of the large gene family membersmeet the generally used criterion for identification of paralogsthat is they show at least 30 amino acid sequence identityover at least 60 of the protein length Thirdly phylogenetictrees show several species-specific expansion events

(including those genes still arranged in tandem arrays)which are best explained by gene duplication

Effector proteins have been shown to be among the fastestevolving proteins in a number of pathogen genomes(Nogueira et al 2012) and are often shown to be under pos-itive selection To assess the mode of selection acting onmembers of the expanded gene families we calculated theratio of nonsynonymous (dN) versus synonymous (dS) sub-stitution rates for each of the four gene families A dNdS ratioequal to 1 indicates a neutral state of selection whereas valueshigher or lower than 1 indicate positive or purifying selectionrespectively Global estimates of dNdS under the Model M0from CodeML (Yang 2007) ranged from 029 (NEX2) to 063(NEX1a) Additionally using pairwise sequence comparisonsfor dNdS calculation we did not find support that these genefamilies are currently evolving under positive selectionHowever the probability of these gene families evolvingunder purifying selection was highly significant (Plt 001)Therefore purifying selection is currently the dominantforce driving the evolution of these gene families and thusfacilitates their maintenance

Massive Expansion of Ubiquitination-AssociatedProteins

Driven by the discovery that members of the four largestParachlamydiaceae gene families showed rapid divergenceand are kept in chlamydial genomes despite apparent func-tional redundancy we asked whether there are additionalgenes not included in these gene families but encoding similarfunctional domains The common theme of the largeParachlamydiaceae gene families is the presence of domainsthat serve in the recruitment of target proteins to the eukary-otic ubiquitination machinery We therefore extracted all pro-teins containing F-boxF-box-like BTBPOZ and RINGU-boxdomains by scanning all chlamydial proteomes with eachrespective HMM profile We found no RINGU-box contain-ing proteins apart from those identified earlier as members ofNEX1b and we detected only few additional proteins inNeochlamydia and Parachlamydia harboring a BTBPOZdomain similar to those of PEX1 However our search

FIG 4 Example of duplication of BTB-box proteins in the Protochlamydia A new duplicate has arisen in Protochlamydia amoebophila UWE25 (shownvia the arrow) after the split from P amoebophila EI2 BTB-box proteins are indicated in purple The phylogenetic placement of these proteins(supplementary fig S5 Supplementary Material online) supports this scenario Orthologous proteins between the Protochlamydia are indicated byconnecting blocks

2895

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 2: Massive Expansion of Ubiquitination-Related Gene Families

Article

Massive Expansion of Ubiquitination-Related Gene Familieswithin the Chlamydiae

Daryl Domman1 Astrid Collingro1 Ilias Lagkouvardos1 Lena Gehre2 Thomas Weinmaier1

Thomas Rattei1 Agathe Subtil2 and Matthias Horn1

1Department of Microbiology and Ecosystem Science University of Vienna Vienna Austria2Unite de Biologie des Interactions Cellulaires Institut Pasteur Paris France

Corresponding author E-mail hornmicrobial-ecologynet

Associate editor Howard Ochman

Abstract

Gene loss gain and transfer play an important role in shaping the genomes of all organisms however the interplay ofthese processes in isolated populations such as in obligate intracellular bacteria is less understood Despite a generaltrend towards genome reduction in these microbes our phylogenomic analysis of the phylum Chlamydiae revealed thatwithin the family Parachlamydiaceae gene family expansions have had pronounced effects on gene content We discov-ered that the largest gene families within the phylum are the result of rapid gene birth-and-death evolution These largegene families are comprised of members harboring eukaryotic-like ubiquitination-related domains such as F-box andBTB-box domains marking the largest reservoir of these proteins found among bacteria A heterologous type III secretionsystem assay suggests that these proteins function as effectors manipulating the host cell The large disparity in copynumber of members in these families between closely related organisms suggests that nonadaptive processes mightcontribute to the evolution of these gene families Gene birth-and-death evolution in concert with genomic drift mightrepresent a previously undescribed mechanism by which isolated bacterial populations diversify

Key words gene families birth and death model intracellular bacteria effector proteins F-box

IntroductionThe genomes of organisms reveal complex histories of genetransfer loss gain and rearrangement The extent that theseprocesses play in shaping gene families of both prokaryotesand eukaryotes are markedly different Gene gain within eu-karyotes is largely driven by intragenomic duplication events(Lynch and Conery 2000 Koonin et al 2002 Yang et al 2003Kaessmann 2010) and although duplication certainly shapesbacterial genomes most gains are the result of horizontalgene transfer (HGT) events (Ochman et al 2000 Lerat et al2005 Treangen and Rocha 2011) Estimates of the contribu-tion gene duplication processes play across domains of lifevary from 65 to 30 in the genomes of Arabidopsis andEscherichia coli respectively (Zhang 2003) Although geneticinnovation typically arises through gene acquisition from for-eign sources gene duplication events are increasingly beingrecognized as an important driver of bacterial genome evo-lution (Goldman et al 2006 McLeod et al 2006 Cho et al2007)

Comparisons of closely related organisms have revealed ahighly dynamic landscape of gene families in which the copynumber between species can vary substantially (Pushker et al2004 Lerat et al 2005) Given this background an intriguingevolutionary backdrop to study gene family evolution iswithin obligate intracellular bacteria In these populationsthe fixation of mutations is strongly affected by geneticdrift with a propensity in these genomes for deletion

(Kuo and Ochman 2009a) and thus gene family expansionswithin these genomes are generally rare (Hooper and Berg2003 Gevers et al 2004) Insightful analysis on gene familyevolution is best approached when comparing multiple ge-nomes from closely related species facilitating identificationof paralogs (homologous genes resulting from duplication)orthologs (homologous genes resulting from speciation) orxenologs (homologous genes derived from HGT) In thisregard the phylum Chlamydiae offers an ensemble of fullysequenced genomes across multiple families

All members of the phylum Chlamydiae are obligate in-tracellular bacteria and represent one of the most ancient andsuccessful lineages associated with eukaryotes (Horn 2008Subtil et al 2014) These organisms all share a characteristicbiphasic developmental cycle consisting of an infectious ex-tracellular state and an intracellular replicative state Thephylum can be divided into two major phylogenetic group-ings The family Chlamydiaceae which encompass wellknown animal and human pathogens such as Chlamydiatrachomatis and C pneumoniae and a group of families com-prising the environmentally distributed chlamydiae such asSimkaniaceae Waddliaceae and Parachlamydiaceae collec-tively referred to as environmental chlamydiae Recently itwas shown that the diversity of the phylum is tremendouslygreater with perhaps over 200 families spanning nearly everyenvironment (Lagkouvardos et al 2013) All members of theChlamydiae show notable genomic reductions and truncated

The Author 2014 Published by Oxford University Press on behalf of the Society for Molecular Biology and EvolutionThis is an Open Access article distributed under the terms of the Creative Commons Attribution License (httpcreativecommonsorglicensesby40) which permits unrestricted reuse distribution and reproduction in any medium provided the original work isproperly cited Open Access2890 Mol Biol Evol 31(11)2890ndash2904 doi101093molbevmsu227 Advance Access publication July 28 2014

metabolic pathways including the inability to synthesizemany amino acids and nucleotides (Stephens et al 1998Kalman et al 1999 Horn et al 2004 Bertelli et al 2010Collingro et al 2011 Myers et al 2012)

In this study we set out to determine how gene familieshave evolved in members of the phylum Chlamydiae Wepresent four new genome sequences for members of thefamily Parachlamydiaceae which include two genome se-quences for the genus Neochlamydia We show that organ-isms within the Parachlamydiaceae have unprecedentednumbers of proteins harboring domains typically found ineukaryotes the majority of which are related to eukaryoticubiquitination pathways We show that these genes haveundergone rapid expansions and form the largest gene fam-ilies within the phylum We demonstrate that many of theselarge gene families are evolving under a gene-birth-deathmodel (Nei and Rooney 2005) and that differences betweenclosely related organisms may be explained by genomic drift

Results

Genome Sequencing of Novel Members of theChlamydiae

Currently there are nine described families within theChlamydiae however the majority of available genomesequences come from a single family the pathogenic Chla-mydiaceae To deepen our insights into a family outside of theChlamydiaceae we sequenced the genomes of four membersof the family Parachlamydiaceae which include two membersof the genus Neochlamydia and two additional genomes ofProtochlamydia and Parachlamydia All of the newly

sequenced Parachlamydiaceae members were isolated fromfree-living amoeba With the exception of Neochlamydia spEPS4 the isolates have been described previously (Fritscheet al 2000 Heinz et al 2007 Schmitz-Esser et al 2008) Thedraft genomes represent nearly complete genome sequencesbased on paired end read data (90ndash96) and the presence ofconserved single-copy marker genes (98ndash100 supplemen-tary table S1 Supplementary Material online) Using theseadditional genome sequences we first aimed to construct aphylogenetic framework of the phylum Chlamydiae usingconcatenated alignments of 32 marker proteins (supplemen-tary table S2 Supplementary Material online) Phylogenetictrees obtained with different methods confirmed themonophyly of the Chlamydiaceae and the Parachlamydiaceaewith strong support (fig 1) The Chlamydiaceae can besubdivided in two previously recognized groups and withinthe Parachlamydiaceae the genera Protochlamydia Neochla-mydia and Parachlamydia were recovered with highconfidence

All members of the Chlamydiaceae show highly similargenomes in terms of gene content and synteny (Myerset al 2012) however between chlamydial families rearrange-ments have played a major role in genome evolution(Collingro et al 2011) Whole-genome alignments of mem-bers of the Parachlamydiaceae clearly illustrate that within thegenera Protochlamydia Neochlamydia and Parachlamydiathere are few rearrangements and the genomes are highlysyntenic (fig 1) Between these genera however there havebeen extensive genome rearrangements demonstrating thesurprising dynamic nature of these reduced genomes

FIG 1 Phylogeny of the Chlamydiae and rearrangement history of genomes within the Parachlamydiaceae Phylogeny of the Chlamydiae based on 32phylogenetic marker proteins A Bayesian analysis using MrBayes (Ronquist and Huelsenbeck 2003) was performed on a set of 24 ribosomal proteins inaddition to GyrB RecA RpoB RpoC and EF-Tu from 19 sequenced members of the phylum (supplementary table S2 Supplementary Material online)Members of the Planctomycetes (Blastopirellula marina DSM 3645 Candidatus Kuenenia stuttgartiensis and Gemmata obscuriglobus UQM 2246) andthe Verrucomicrobia (Akkermansia muciniphila MucT Lentisphaera araneosa HTCC2155 Opitutus terrae PB90-1 and Verrucomicrobium spinosumDSM 4136) were used as outgroups (not shown) Colors denote family level classification Posterior probability scores are indicated only if below 100To the right conserved synteny and rearrangement history of genomes within the Parachlamydiaceae are shown The genomes of six members of thefamily were aligned using MAUVE to elucidate synteny between genomes and visualized using genoPlotR Extensive rearrangements are apparentbetween members of different genera whereas within genus comparisons show little rearrangements with a notable exception in the Protochlamydiawhere a large block has been rearranged

2891

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

The Gene Family Landscape of the Chlamydiae

To explore gene family evolution among members of theChlamydiae we first identified gene families using clustersof orthologous groups of proteins within the predicted pro-teomes from 19 chlamydial genomes (supplementary tableS2 Supplementary Material online) We then searched forgene families that contain expansion events that is thosehaving multiple members from one organism Previouswork has demonstrated that genome size correlates withthe number of paralogs with larger genomes containingmore paralogs than smaller ones (Bratlie et al 2010) Takinginto account all gene family members regardless of whetherthey originate from duplication processes or transfer eventsthis trend is generally observed within chlamydial genomes(fig 2a)

As described previously gene family expansions are sparsewithin the genomes of the Chlamydiaceae (Kalman et al1999 Kamneva et al 2012) with C pneumoniae CWL029harboring the largest number (n = 24) In line with previousobservations the largest gene families identified in our studyencode the polymorphic membrane proteins (PMPs)(Grimwood and Stephens 1999 Gomes et al 2006) includingnine members from C pneumoniae LPCoLN and two fromthe C trachomatis serovars The observed split of PMPsamong several smaller gene families in our analysis is an in-dication that our approach is rather conservative in assigninga protein to a gene family

The total number of expansion events (n = 277) detectedin the genome of Simkania negevensis represents a 10-foldincrease when compared with the Chlamydiaceae As thesegroup into many small gene families the extended number ofgene copies in S negevensis is the result of many small-scaleduplication or transfer events (fig 2b) This situation is similarin Waddlia chondrophila In stark contrast roughly half of thetotal of genes resulting from expansion events in Neochlamy-dia and Protochlamydia are the contribution of only few genefamilies

Large Gene Family Expansions in theParachlamydiaceae

The detection of large gene families in Neochlamydia andProtochlamydia indicates that there have been several large-scale expansion events within the Parachlamydiaceae Nota-bly different gene families are expanded in Neochlamydia andProtochlamydia (fig 2b) These represent the four largest genefamilies (containing between 27 and 138 members) foundwithin the phylum and include two gene families specific toNeochlamydia and two restricted to Protochlamydia

Intrigued by these four large-scale lineage-specific expan-sion events between the species pairs of Protochlamydia andNeochlamydia we sought to better characterize these genefamilies as most of their members are yet unknown withrespect to their functional role (ie they are classified as hy-pothetical proteins) Remarkably despite being in differentgene families from different organisms there are several sim-ilarities between these proteins Firstly they all encompassproteinndashprotein interaction domains such as leucine-rich

repeats (LRRs) or tetratricopeptide repeats (TPRs) Secondlymany of them contain additional domains typically found ineukaryotes such as F-boxes BTB-boxes and RINGU-boxeswhich are associated with eukaryotic ubiquitination pathways(Angot et al 2007)

A

B

FIG 2 The paralogous gene landscape of the Chlamydiae (A) Numberof paralogous genes within the Chlamydiae The number of paralogousgenes not including multiple copies is plotted against genome sizealong with a linear regression line (y = 4787x ndash 429 R2 = 070 blackline) The dashed gray line is plotted as a reference from 200 prokaryoticgenomes (Bratlie et al 2010) (B) Distribution of chlamydial gene familiesper genome with two or more members The number of genes withineach family is plotted for representative genomes The genomes of theChlamydiaceae have relatively small gene family sizes The polymorphicouter membrane proteins comprise the largest gene families in theChlamydiaceae and can be seen as the two largest blocks in theChlamydia pneumoniae LPCoLN (Cpn) bar The size distribution ofgene families is ordered from smallest to greatest and the appearanceof a solid ldquoblack boxrdquo at the base is merely an effect of the spacing ofmany small gene families There are several extensive gene families (la-beled) within members of Neochlamydia (NEX1 NEX2) andProtochlamydia (PEX1 PEX2)

2892

Domman et al doi101093molbevmsu227 MBE

Eukaryotic Ubiquitination-Associated DomainsPredominate Large Gene Families

The largest gene family termed Neochlamydia expansion 1(NEX1) in the phylum comprised a total of 138 memberswhich are contributed by the two Neochlamydia genomesThe domain architecture within this large gene family is het-erogeneous however all proteins contain various C-terminalrepetitions of LRR domains (fig 3) We have identified twosubfamilies that we delineate NEX1a and NEX1b within theNEX1 family The majority of members fall into the NEX1asubfamily in which they have a highly conserved N-terminalF-box or F-box-like domain The members have an average of77 sequence similarity among each other and the F-box-likedomain is 57 and 45 similar to Acanthamoeba castellaniiand human F-box-like domains respectively The smallerNEX1b family in contrast has a conserved RINGU-box atthe N-terminus Both the F-box and RINGU-box domains areassociated with eukaryotic E3 ubiquitin ligase complexes(Willems et al 2004) Between all members in the NEX1family there is a region of roughly 50 amino acids betweenthe predicted N-terminal domains and the LRRs that is highlyconserved However we failed to detect any known domainsin this region nor was there any homology to known proteins

The second large gene family within the Neochlamydiarepresents the third largest family of the phylum This largegene family termed NEX2 comprised 50 proteins (fig 3)

Similar to NEX1 the prevailing domain architecture is thatof eukaryotic-like E3 ubiquitin ligase-associated domainspaired with repeat domains This family is defined by thepresence of multiple TPR domains at the C-terminus and ageneral conservation of an F-box domain at the N-terminus inthe majority of members In most members there is also aconserved DUF294 domain located mid protein which is aputative nucleotidyltransferase In ten members there is anovarian tumor (OTU) (Balakirev et al 2003) domain directlyfollowing the F-box domain followed by the DUF294 domain

The other large gene families occur primarily in the mem-bers of the Protochlamydia and represent the second andfourth largest gene families of the phylum The largest genefamily in the Protochlamydia (PEX1) comprised 73 membersin total (fig 3) Intriguingly another E3 ubiquitin ligase-relateddomain the BTB domain is present in all but three membersat the N-terminus The BTB domain is then coupled toC-terminal LRR domains in all members The PEX2 familycomprised a total of 27 proteins that despite no detectabledomain at the N-terminus share multiple TPR domains in themiddle of the protein followed by a CHAT domain(Sakakibara and Hattori 2000) at the C-terminus

In summary the four largest gene families represent a sur-prisingly diverse armada of proteins which most likely func-tion within eukaryotic host cells where they potentiallyinterfere with the ubiquitination pathway The heterogeneity

FIG 3 Protein domain architecture of largest gene families The domain architecture of Neochlamydia (NEX1 NEX2) and Protochlamydia (PEX1 PEX2)gene families are shown The range of the number of domain repeats and functional assignments of the detected domains are indicated NEX1 can bedivided into two subfamilies based on phylogeny and domain presenceabsence A role of these proteins in the context of eukaryotic cells can bepostulated based on the presence of domains otherwise found in eukaryotes

2893

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

in domain architecture among these proteins interestinglymirrors that of their eukaryotic counterparts (Perez-Torradoet al 2006 Xu 2006)

A Pool of Putative Effector Proteins

If the members of the largest gene families in the phylumChlamydiae serve as effector proteins for host manipulationthey would need to be secreted and transported to the hostcell cytosol This is typically achieved through a type III secre-tion system a well-conserved virulence mechanism amongthe Chlamydiae which has been shown to translocate severalcharacterized effectors (Peters et al 2007 Betts et al 2009)Indeed many of the proteins found within the expandedNeochlamydia and Protochlamydia gene families are pre-dicted by computational analysis to be secreted by the typeIII secretion system and to be extracellular host associatedWithin the NEX1 family 66 of 138 (49) members are pre-dicted to be secreted A total of 37 (51) and 10 (37) werepredicted to be secreted from within the PEX1 and PEX2 genefamilies respectively The NEX2 gene family had the fewestpredicted with only two members

As the identification of the signal for secretion via the typeIII secretion system is inherently difficult (Arnold et al 2009)we tested representatives of each of the four largest genefamilies in vitro using a heterologous type III secretion sub-strate assay with Shigella flexneri as a host for protein expres-sion This assay has been used to successfully characterize typeIII secretion effector proteins from the chlamydiae before(Subtil et al 2001) and because of the lack of routine genetictools the Sh flexneri system is an attractive surrogate methodfor analyzing type III secretion in chlamydiae in vivo Thisexperiment demonstrated that the tested members ofNEX1 NEX2 PEX1 and PEX2 contain a functional type IIIsecretion recognition signal (supplementary fig S1Supplementary Material online) Together with the presenceof eukaryotic-like domains and computational predictionsthis strongly indicates that these gene families are largepools of effector proteins

Molecular Evolution of Large Gene Families

To better understand how these large gene families may haveevolved we reconstructed their phylogenetic relationshipsGene family trees were calculated using conserved sitesamong the protein alignment (supplementary figs S2ndashS6Supplementary Material online) The average amino acididentity between members ranges from 45 to 64 withthe most closely related sequences belonging to PEX2 Treetopologies suggest that the members of all four gene familieshave rapidly diverged as indicated by their long branchlengths Although the number of LRR and TPR domainsvaries dramatically between 1 and 39 this had no apparenteffect on the phylogenetic placement

We find clear cases in which the orthologs of two speciesgroup together indicating expansions have occurred beforespeciation (supplementary figs S2ndashS6 SupplementaryMaterial online) Alternatively expansions post speciation isapparent in all gene families Reconciliation of gene family

trees with the species tree indicates that in addition to ex-pansions many gene losses have occurred for each genefamily (supplementary table S3 Supplementary Materialonline) For instance for the NEX1 family there have been78 expansion events whereas 33 losses have occurred attrib-uted to 13 and 20 losses in Neochlamydia spp EPS4 andTUME1 respectively There were nearly equal losses betweenEPS4 and TUME1 (13 and 10) in NEX2 and a total of 46expansions Similarly PEX1 consists of 52 expansions and15 and 9 losses in Protochlamydia amoebophila EI2 andUWE25 respectively

A Birth-and-Death Model of Evolution

A pattern of differential gain loss and maintenance of genefamily members is strongly indicative of these gene familiesevolving according to a birth-and-death model (Nei andRooney 2005) Because of this differential maintenance ofgene family members the hallmarks of the birth-and-deathmodel are interspecies clustering of members in the phyloge-netic trees and the presence of pseudogenes from degradedmembers (Nei 2007) As we observe interspecies clustering forthe PEX and NEX gene families (supplementary figs S2ndashS6Supplementary Material online) we also tested for pseudo-genization events in the intergenic regions of the P amoebo-phila UWE25 genome (the draft Neochlamydia genomes areless suitable for this analysis) By utilizing BLAST we searchedfor matches to the predicted proteome using all intergenicregions as a query We then mapped the best BLAST hitsrepresenting 116 pseudogenes to their respective gene fam-ilies to get a picture of a given familiesrsquo representation in theintergenic regions The most represented gene family in theintergenic regions surprisingly was PEX1 (16 pseudogenes)For PEX2 one pseudogenized fragment was detected Theobserved presence of interspecies clustering and pseudo-genes and the dynamic history of gains and losses withinthe gene families are indicative of a birth-and-death modelof evolution

In contrast if these families were evolving via concertedevolution the phylogenetic trees would depict intraspeciesclustering that is that members of a gene family will be morehomologous to the other members from the same organismthan to that of other species Intraspecies clustering occursdue to repeated recombination among gene family memberswithin a genome leading to an overall high sequence similar-ity of all members a process known as gene conversion(Santoyo and Romero 2005) We thus tested for the possibil-ity of recombination within the gene families using the meth-ods implemented in the RDP4 software suite (Martin et al2010) Care must be taken however when assessing theimpact of recombination among divergent proteins as therecombination signal is quite error prone when proteins shareless than 70 similarity and these methods are heavily de-pendent on the alignment (Martin et al 2010) We detectsome recombination events between members within thePEX and NEX gene families however the majority of thepredictions are only marginally significant (the consensusscores are below the confidence threshold of 06) In the

2894

Domman et al doi101093molbevmsu227 MBE

NEX1a family the portion of the sequences most identified asrecombinant is the F-box domain (supplementary fig S8Supplementary Material online) This should be judiciouslyinterpreted as this might represent sequence similarity due topurifying selection operating on this domain Overall we didnot find convincing evidence that recombination has played amajor role in shaping the evolution of the PEX and NEX genefamilies

Gene Duplications and Purifying Selection

Gene family expansions can be the result of either gene du-plication or HGT The disentangling of these events is nottrivial and in fact may be impossible in the case of the genefamilies investigated here due to lineage-specific evolutionand the absence of clear homologs in other bacterial taxa(Kuo and Ochman 2009b) However several lines of evidencesuggest that regardless of the initial origin of these genes geneduplication processes have played a clear role in the evolutionthe large Parachlamydiaceae gene families First a hallmark ofgene duplication is the presence of tandem arrays of genecopies In this regard we find several large tandem arrays withmembers of the large gene families in Protochlamydia andNeochlamydia including examples for recent duplications(fig 4 supplementary fig S7 Supplementary Materialonline) In the P amoebophila UWE25 genome we detected47 tandem duplication events (distance within 10 genes) rep-resented by only 13 gene families Nearly half of these are thecontribution of PEX1 (n = 20) however they appear in severalclusters spread throughout the genome PEX2 demonstratesthe most dramatic case of a tandem array consisting of 13members After the PEX gene families the third largesttandem array only comprised three members intriguinglyalso F-box domain containing proteins NEX1 has 50 instances(36) where members are found within five genes of eachother Second the majority of the large gene family membersmeet the generally used criterion for identification of paralogsthat is they show at least 30 amino acid sequence identityover at least 60 of the protein length Thirdly phylogenetictrees show several species-specific expansion events

(including those genes still arranged in tandem arrays)which are best explained by gene duplication

Effector proteins have been shown to be among the fastestevolving proteins in a number of pathogen genomes(Nogueira et al 2012) and are often shown to be under pos-itive selection To assess the mode of selection acting onmembers of the expanded gene families we calculated theratio of nonsynonymous (dN) versus synonymous (dS) sub-stitution rates for each of the four gene families A dNdS ratioequal to 1 indicates a neutral state of selection whereas valueshigher or lower than 1 indicate positive or purifying selectionrespectively Global estimates of dNdS under the Model M0from CodeML (Yang 2007) ranged from 029 (NEX2) to 063(NEX1a) Additionally using pairwise sequence comparisonsfor dNdS calculation we did not find support that these genefamilies are currently evolving under positive selectionHowever the probability of these gene families evolvingunder purifying selection was highly significant (Plt 001)Therefore purifying selection is currently the dominantforce driving the evolution of these gene families and thusfacilitates their maintenance

Massive Expansion of Ubiquitination-AssociatedProteins

Driven by the discovery that members of the four largestParachlamydiaceae gene families showed rapid divergenceand are kept in chlamydial genomes despite apparent func-tional redundancy we asked whether there are additionalgenes not included in these gene families but encoding similarfunctional domains The common theme of the largeParachlamydiaceae gene families is the presence of domainsthat serve in the recruitment of target proteins to the eukary-otic ubiquitination machinery We therefore extracted all pro-teins containing F-boxF-box-like BTBPOZ and RINGU-boxdomains by scanning all chlamydial proteomes with eachrespective HMM profile We found no RINGU-box contain-ing proteins apart from those identified earlier as members ofNEX1b and we detected only few additional proteins inNeochlamydia and Parachlamydia harboring a BTBPOZdomain similar to those of PEX1 However our search

FIG 4 Example of duplication of BTB-box proteins in the Protochlamydia A new duplicate has arisen in Protochlamydia amoebophila UWE25 (shownvia the arrow) after the split from P amoebophila EI2 BTB-box proteins are indicated in purple The phylogenetic placement of these proteins(supplementary fig S5 Supplementary Material online) supports this scenario Orthologous proteins between the Protochlamydia are indicated byconnecting blocks

2895

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 3: Massive Expansion of Ubiquitination-Related Gene Families

metabolic pathways including the inability to synthesizemany amino acids and nucleotides (Stephens et al 1998Kalman et al 1999 Horn et al 2004 Bertelli et al 2010Collingro et al 2011 Myers et al 2012)

In this study we set out to determine how gene familieshave evolved in members of the phylum Chlamydiae Wepresent four new genome sequences for members of thefamily Parachlamydiaceae which include two genome se-quences for the genus Neochlamydia We show that organ-isms within the Parachlamydiaceae have unprecedentednumbers of proteins harboring domains typically found ineukaryotes the majority of which are related to eukaryoticubiquitination pathways We show that these genes haveundergone rapid expansions and form the largest gene fam-ilies within the phylum We demonstrate that many of theselarge gene families are evolving under a gene-birth-deathmodel (Nei and Rooney 2005) and that differences betweenclosely related organisms may be explained by genomic drift

Results

Genome Sequencing of Novel Members of theChlamydiae

Currently there are nine described families within theChlamydiae however the majority of available genomesequences come from a single family the pathogenic Chla-mydiaceae To deepen our insights into a family outside of theChlamydiaceae we sequenced the genomes of four membersof the family Parachlamydiaceae which include two membersof the genus Neochlamydia and two additional genomes ofProtochlamydia and Parachlamydia All of the newly

sequenced Parachlamydiaceae members were isolated fromfree-living amoeba With the exception of Neochlamydia spEPS4 the isolates have been described previously (Fritscheet al 2000 Heinz et al 2007 Schmitz-Esser et al 2008) Thedraft genomes represent nearly complete genome sequencesbased on paired end read data (90ndash96) and the presence ofconserved single-copy marker genes (98ndash100 supplemen-tary table S1 Supplementary Material online) Using theseadditional genome sequences we first aimed to construct aphylogenetic framework of the phylum Chlamydiae usingconcatenated alignments of 32 marker proteins (supplemen-tary table S2 Supplementary Material online) Phylogenetictrees obtained with different methods confirmed themonophyly of the Chlamydiaceae and the Parachlamydiaceaewith strong support (fig 1) The Chlamydiaceae can besubdivided in two previously recognized groups and withinthe Parachlamydiaceae the genera Protochlamydia Neochla-mydia and Parachlamydia were recovered with highconfidence

All members of the Chlamydiaceae show highly similargenomes in terms of gene content and synteny (Myerset al 2012) however between chlamydial families rearrange-ments have played a major role in genome evolution(Collingro et al 2011) Whole-genome alignments of mem-bers of the Parachlamydiaceae clearly illustrate that within thegenera Protochlamydia Neochlamydia and Parachlamydiathere are few rearrangements and the genomes are highlysyntenic (fig 1) Between these genera however there havebeen extensive genome rearrangements demonstrating thesurprising dynamic nature of these reduced genomes

FIG 1 Phylogeny of the Chlamydiae and rearrangement history of genomes within the Parachlamydiaceae Phylogeny of the Chlamydiae based on 32phylogenetic marker proteins A Bayesian analysis using MrBayes (Ronquist and Huelsenbeck 2003) was performed on a set of 24 ribosomal proteins inaddition to GyrB RecA RpoB RpoC and EF-Tu from 19 sequenced members of the phylum (supplementary table S2 Supplementary Material online)Members of the Planctomycetes (Blastopirellula marina DSM 3645 Candidatus Kuenenia stuttgartiensis and Gemmata obscuriglobus UQM 2246) andthe Verrucomicrobia (Akkermansia muciniphila MucT Lentisphaera araneosa HTCC2155 Opitutus terrae PB90-1 and Verrucomicrobium spinosumDSM 4136) were used as outgroups (not shown) Colors denote family level classification Posterior probability scores are indicated only if below 100To the right conserved synteny and rearrangement history of genomes within the Parachlamydiaceae are shown The genomes of six members of thefamily were aligned using MAUVE to elucidate synteny between genomes and visualized using genoPlotR Extensive rearrangements are apparentbetween members of different genera whereas within genus comparisons show little rearrangements with a notable exception in the Protochlamydiawhere a large block has been rearranged

2891

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

The Gene Family Landscape of the Chlamydiae

To explore gene family evolution among members of theChlamydiae we first identified gene families using clustersof orthologous groups of proteins within the predicted pro-teomes from 19 chlamydial genomes (supplementary tableS2 Supplementary Material online) We then searched forgene families that contain expansion events that is thosehaving multiple members from one organism Previouswork has demonstrated that genome size correlates withthe number of paralogs with larger genomes containingmore paralogs than smaller ones (Bratlie et al 2010) Takinginto account all gene family members regardless of whetherthey originate from duplication processes or transfer eventsthis trend is generally observed within chlamydial genomes(fig 2a)

As described previously gene family expansions are sparsewithin the genomes of the Chlamydiaceae (Kalman et al1999 Kamneva et al 2012) with C pneumoniae CWL029harboring the largest number (n = 24) In line with previousobservations the largest gene families identified in our studyencode the polymorphic membrane proteins (PMPs)(Grimwood and Stephens 1999 Gomes et al 2006) includingnine members from C pneumoniae LPCoLN and two fromthe C trachomatis serovars The observed split of PMPsamong several smaller gene families in our analysis is an in-dication that our approach is rather conservative in assigninga protein to a gene family

The total number of expansion events (n = 277) detectedin the genome of Simkania negevensis represents a 10-foldincrease when compared with the Chlamydiaceae As thesegroup into many small gene families the extended number ofgene copies in S negevensis is the result of many small-scaleduplication or transfer events (fig 2b) This situation is similarin Waddlia chondrophila In stark contrast roughly half of thetotal of genes resulting from expansion events in Neochlamy-dia and Protochlamydia are the contribution of only few genefamilies

Large Gene Family Expansions in theParachlamydiaceae

The detection of large gene families in Neochlamydia andProtochlamydia indicates that there have been several large-scale expansion events within the Parachlamydiaceae Nota-bly different gene families are expanded in Neochlamydia andProtochlamydia (fig 2b) These represent the four largest genefamilies (containing between 27 and 138 members) foundwithin the phylum and include two gene families specific toNeochlamydia and two restricted to Protochlamydia

Intrigued by these four large-scale lineage-specific expan-sion events between the species pairs of Protochlamydia andNeochlamydia we sought to better characterize these genefamilies as most of their members are yet unknown withrespect to their functional role (ie they are classified as hy-pothetical proteins) Remarkably despite being in differentgene families from different organisms there are several sim-ilarities between these proteins Firstly they all encompassproteinndashprotein interaction domains such as leucine-rich

repeats (LRRs) or tetratricopeptide repeats (TPRs) Secondlymany of them contain additional domains typically found ineukaryotes such as F-boxes BTB-boxes and RINGU-boxeswhich are associated with eukaryotic ubiquitination pathways(Angot et al 2007)

A

B

FIG 2 The paralogous gene landscape of the Chlamydiae (A) Numberof paralogous genes within the Chlamydiae The number of paralogousgenes not including multiple copies is plotted against genome sizealong with a linear regression line (y = 4787x ndash 429 R2 = 070 blackline) The dashed gray line is plotted as a reference from 200 prokaryoticgenomes (Bratlie et al 2010) (B) Distribution of chlamydial gene familiesper genome with two or more members The number of genes withineach family is plotted for representative genomes The genomes of theChlamydiaceae have relatively small gene family sizes The polymorphicouter membrane proteins comprise the largest gene families in theChlamydiaceae and can be seen as the two largest blocks in theChlamydia pneumoniae LPCoLN (Cpn) bar The size distribution ofgene families is ordered from smallest to greatest and the appearanceof a solid ldquoblack boxrdquo at the base is merely an effect of the spacing ofmany small gene families There are several extensive gene families (la-beled) within members of Neochlamydia (NEX1 NEX2) andProtochlamydia (PEX1 PEX2)

2892

Domman et al doi101093molbevmsu227 MBE

Eukaryotic Ubiquitination-Associated DomainsPredominate Large Gene Families

The largest gene family termed Neochlamydia expansion 1(NEX1) in the phylum comprised a total of 138 memberswhich are contributed by the two Neochlamydia genomesThe domain architecture within this large gene family is het-erogeneous however all proteins contain various C-terminalrepetitions of LRR domains (fig 3) We have identified twosubfamilies that we delineate NEX1a and NEX1b within theNEX1 family The majority of members fall into the NEX1asubfamily in which they have a highly conserved N-terminalF-box or F-box-like domain The members have an average of77 sequence similarity among each other and the F-box-likedomain is 57 and 45 similar to Acanthamoeba castellaniiand human F-box-like domains respectively The smallerNEX1b family in contrast has a conserved RINGU-box atthe N-terminus Both the F-box and RINGU-box domains areassociated with eukaryotic E3 ubiquitin ligase complexes(Willems et al 2004) Between all members in the NEX1family there is a region of roughly 50 amino acids betweenthe predicted N-terminal domains and the LRRs that is highlyconserved However we failed to detect any known domainsin this region nor was there any homology to known proteins

The second large gene family within the Neochlamydiarepresents the third largest family of the phylum This largegene family termed NEX2 comprised 50 proteins (fig 3)

Similar to NEX1 the prevailing domain architecture is thatof eukaryotic-like E3 ubiquitin ligase-associated domainspaired with repeat domains This family is defined by thepresence of multiple TPR domains at the C-terminus and ageneral conservation of an F-box domain at the N-terminus inthe majority of members In most members there is also aconserved DUF294 domain located mid protein which is aputative nucleotidyltransferase In ten members there is anovarian tumor (OTU) (Balakirev et al 2003) domain directlyfollowing the F-box domain followed by the DUF294 domain

The other large gene families occur primarily in the mem-bers of the Protochlamydia and represent the second andfourth largest gene families of the phylum The largest genefamily in the Protochlamydia (PEX1) comprised 73 membersin total (fig 3) Intriguingly another E3 ubiquitin ligase-relateddomain the BTB domain is present in all but three membersat the N-terminus The BTB domain is then coupled toC-terminal LRR domains in all members The PEX2 familycomprised a total of 27 proteins that despite no detectabledomain at the N-terminus share multiple TPR domains in themiddle of the protein followed by a CHAT domain(Sakakibara and Hattori 2000) at the C-terminus

In summary the four largest gene families represent a sur-prisingly diverse armada of proteins which most likely func-tion within eukaryotic host cells where they potentiallyinterfere with the ubiquitination pathway The heterogeneity

FIG 3 Protein domain architecture of largest gene families The domain architecture of Neochlamydia (NEX1 NEX2) and Protochlamydia (PEX1 PEX2)gene families are shown The range of the number of domain repeats and functional assignments of the detected domains are indicated NEX1 can bedivided into two subfamilies based on phylogeny and domain presenceabsence A role of these proteins in the context of eukaryotic cells can bepostulated based on the presence of domains otherwise found in eukaryotes

2893

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

in domain architecture among these proteins interestinglymirrors that of their eukaryotic counterparts (Perez-Torradoet al 2006 Xu 2006)

A Pool of Putative Effector Proteins

If the members of the largest gene families in the phylumChlamydiae serve as effector proteins for host manipulationthey would need to be secreted and transported to the hostcell cytosol This is typically achieved through a type III secre-tion system a well-conserved virulence mechanism amongthe Chlamydiae which has been shown to translocate severalcharacterized effectors (Peters et al 2007 Betts et al 2009)Indeed many of the proteins found within the expandedNeochlamydia and Protochlamydia gene families are pre-dicted by computational analysis to be secreted by the typeIII secretion system and to be extracellular host associatedWithin the NEX1 family 66 of 138 (49) members are pre-dicted to be secreted A total of 37 (51) and 10 (37) werepredicted to be secreted from within the PEX1 and PEX2 genefamilies respectively The NEX2 gene family had the fewestpredicted with only two members

As the identification of the signal for secretion via the typeIII secretion system is inherently difficult (Arnold et al 2009)we tested representatives of each of the four largest genefamilies in vitro using a heterologous type III secretion sub-strate assay with Shigella flexneri as a host for protein expres-sion This assay has been used to successfully characterize typeIII secretion effector proteins from the chlamydiae before(Subtil et al 2001) and because of the lack of routine genetictools the Sh flexneri system is an attractive surrogate methodfor analyzing type III secretion in chlamydiae in vivo Thisexperiment demonstrated that the tested members ofNEX1 NEX2 PEX1 and PEX2 contain a functional type IIIsecretion recognition signal (supplementary fig S1Supplementary Material online) Together with the presenceof eukaryotic-like domains and computational predictionsthis strongly indicates that these gene families are largepools of effector proteins

Molecular Evolution of Large Gene Families

To better understand how these large gene families may haveevolved we reconstructed their phylogenetic relationshipsGene family trees were calculated using conserved sitesamong the protein alignment (supplementary figs S2ndashS6Supplementary Material online) The average amino acididentity between members ranges from 45 to 64 withthe most closely related sequences belonging to PEX2 Treetopologies suggest that the members of all four gene familieshave rapidly diverged as indicated by their long branchlengths Although the number of LRR and TPR domainsvaries dramatically between 1 and 39 this had no apparenteffect on the phylogenetic placement

We find clear cases in which the orthologs of two speciesgroup together indicating expansions have occurred beforespeciation (supplementary figs S2ndashS6 SupplementaryMaterial online) Alternatively expansions post speciation isapparent in all gene families Reconciliation of gene family

trees with the species tree indicates that in addition to ex-pansions many gene losses have occurred for each genefamily (supplementary table S3 Supplementary Materialonline) For instance for the NEX1 family there have been78 expansion events whereas 33 losses have occurred attrib-uted to 13 and 20 losses in Neochlamydia spp EPS4 andTUME1 respectively There were nearly equal losses betweenEPS4 and TUME1 (13 and 10) in NEX2 and a total of 46expansions Similarly PEX1 consists of 52 expansions and15 and 9 losses in Protochlamydia amoebophila EI2 andUWE25 respectively

A Birth-and-Death Model of Evolution

A pattern of differential gain loss and maintenance of genefamily members is strongly indicative of these gene familiesevolving according to a birth-and-death model (Nei andRooney 2005) Because of this differential maintenance ofgene family members the hallmarks of the birth-and-deathmodel are interspecies clustering of members in the phyloge-netic trees and the presence of pseudogenes from degradedmembers (Nei 2007) As we observe interspecies clustering forthe PEX and NEX gene families (supplementary figs S2ndashS6Supplementary Material online) we also tested for pseudo-genization events in the intergenic regions of the P amoebo-phila UWE25 genome (the draft Neochlamydia genomes areless suitable for this analysis) By utilizing BLAST we searchedfor matches to the predicted proteome using all intergenicregions as a query We then mapped the best BLAST hitsrepresenting 116 pseudogenes to their respective gene fam-ilies to get a picture of a given familiesrsquo representation in theintergenic regions The most represented gene family in theintergenic regions surprisingly was PEX1 (16 pseudogenes)For PEX2 one pseudogenized fragment was detected Theobserved presence of interspecies clustering and pseudo-genes and the dynamic history of gains and losses withinthe gene families are indicative of a birth-and-death modelof evolution

In contrast if these families were evolving via concertedevolution the phylogenetic trees would depict intraspeciesclustering that is that members of a gene family will be morehomologous to the other members from the same organismthan to that of other species Intraspecies clustering occursdue to repeated recombination among gene family memberswithin a genome leading to an overall high sequence similar-ity of all members a process known as gene conversion(Santoyo and Romero 2005) We thus tested for the possibil-ity of recombination within the gene families using the meth-ods implemented in the RDP4 software suite (Martin et al2010) Care must be taken however when assessing theimpact of recombination among divergent proteins as therecombination signal is quite error prone when proteins shareless than 70 similarity and these methods are heavily de-pendent on the alignment (Martin et al 2010) We detectsome recombination events between members within thePEX and NEX gene families however the majority of thepredictions are only marginally significant (the consensusscores are below the confidence threshold of 06) In the

2894

Domman et al doi101093molbevmsu227 MBE

NEX1a family the portion of the sequences most identified asrecombinant is the F-box domain (supplementary fig S8Supplementary Material online) This should be judiciouslyinterpreted as this might represent sequence similarity due topurifying selection operating on this domain Overall we didnot find convincing evidence that recombination has played amajor role in shaping the evolution of the PEX and NEX genefamilies

Gene Duplications and Purifying Selection

Gene family expansions can be the result of either gene du-plication or HGT The disentangling of these events is nottrivial and in fact may be impossible in the case of the genefamilies investigated here due to lineage-specific evolutionand the absence of clear homologs in other bacterial taxa(Kuo and Ochman 2009b) However several lines of evidencesuggest that regardless of the initial origin of these genes geneduplication processes have played a clear role in the evolutionthe large Parachlamydiaceae gene families First a hallmark ofgene duplication is the presence of tandem arrays of genecopies In this regard we find several large tandem arrays withmembers of the large gene families in Protochlamydia andNeochlamydia including examples for recent duplications(fig 4 supplementary fig S7 Supplementary Materialonline) In the P amoebophila UWE25 genome we detected47 tandem duplication events (distance within 10 genes) rep-resented by only 13 gene families Nearly half of these are thecontribution of PEX1 (n = 20) however they appear in severalclusters spread throughout the genome PEX2 demonstratesthe most dramatic case of a tandem array consisting of 13members After the PEX gene families the third largesttandem array only comprised three members intriguinglyalso F-box domain containing proteins NEX1 has 50 instances(36) where members are found within five genes of eachother Second the majority of the large gene family membersmeet the generally used criterion for identification of paralogsthat is they show at least 30 amino acid sequence identityover at least 60 of the protein length Thirdly phylogenetictrees show several species-specific expansion events

(including those genes still arranged in tandem arrays)which are best explained by gene duplication

Effector proteins have been shown to be among the fastestevolving proteins in a number of pathogen genomes(Nogueira et al 2012) and are often shown to be under pos-itive selection To assess the mode of selection acting onmembers of the expanded gene families we calculated theratio of nonsynonymous (dN) versus synonymous (dS) sub-stitution rates for each of the four gene families A dNdS ratioequal to 1 indicates a neutral state of selection whereas valueshigher or lower than 1 indicate positive or purifying selectionrespectively Global estimates of dNdS under the Model M0from CodeML (Yang 2007) ranged from 029 (NEX2) to 063(NEX1a) Additionally using pairwise sequence comparisonsfor dNdS calculation we did not find support that these genefamilies are currently evolving under positive selectionHowever the probability of these gene families evolvingunder purifying selection was highly significant (Plt 001)Therefore purifying selection is currently the dominantforce driving the evolution of these gene families and thusfacilitates their maintenance

Massive Expansion of Ubiquitination-AssociatedProteins

Driven by the discovery that members of the four largestParachlamydiaceae gene families showed rapid divergenceand are kept in chlamydial genomes despite apparent func-tional redundancy we asked whether there are additionalgenes not included in these gene families but encoding similarfunctional domains The common theme of the largeParachlamydiaceae gene families is the presence of domainsthat serve in the recruitment of target proteins to the eukary-otic ubiquitination machinery We therefore extracted all pro-teins containing F-boxF-box-like BTBPOZ and RINGU-boxdomains by scanning all chlamydial proteomes with eachrespective HMM profile We found no RINGU-box contain-ing proteins apart from those identified earlier as members ofNEX1b and we detected only few additional proteins inNeochlamydia and Parachlamydia harboring a BTBPOZdomain similar to those of PEX1 However our search

FIG 4 Example of duplication of BTB-box proteins in the Protochlamydia A new duplicate has arisen in Protochlamydia amoebophila UWE25 (shownvia the arrow) after the split from P amoebophila EI2 BTB-box proteins are indicated in purple The phylogenetic placement of these proteins(supplementary fig S5 Supplementary Material online) supports this scenario Orthologous proteins between the Protochlamydia are indicated byconnecting blocks

2895

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 4: Massive Expansion of Ubiquitination-Related Gene Families

The Gene Family Landscape of the Chlamydiae

To explore gene family evolution among members of theChlamydiae we first identified gene families using clustersof orthologous groups of proteins within the predicted pro-teomes from 19 chlamydial genomes (supplementary tableS2 Supplementary Material online) We then searched forgene families that contain expansion events that is thosehaving multiple members from one organism Previouswork has demonstrated that genome size correlates withthe number of paralogs with larger genomes containingmore paralogs than smaller ones (Bratlie et al 2010) Takinginto account all gene family members regardless of whetherthey originate from duplication processes or transfer eventsthis trend is generally observed within chlamydial genomes(fig 2a)

As described previously gene family expansions are sparsewithin the genomes of the Chlamydiaceae (Kalman et al1999 Kamneva et al 2012) with C pneumoniae CWL029harboring the largest number (n = 24) In line with previousobservations the largest gene families identified in our studyencode the polymorphic membrane proteins (PMPs)(Grimwood and Stephens 1999 Gomes et al 2006) includingnine members from C pneumoniae LPCoLN and two fromthe C trachomatis serovars The observed split of PMPsamong several smaller gene families in our analysis is an in-dication that our approach is rather conservative in assigninga protein to a gene family

The total number of expansion events (n = 277) detectedin the genome of Simkania negevensis represents a 10-foldincrease when compared with the Chlamydiaceae As thesegroup into many small gene families the extended number ofgene copies in S negevensis is the result of many small-scaleduplication or transfer events (fig 2b) This situation is similarin Waddlia chondrophila In stark contrast roughly half of thetotal of genes resulting from expansion events in Neochlamy-dia and Protochlamydia are the contribution of only few genefamilies

Large Gene Family Expansions in theParachlamydiaceae

The detection of large gene families in Neochlamydia andProtochlamydia indicates that there have been several large-scale expansion events within the Parachlamydiaceae Nota-bly different gene families are expanded in Neochlamydia andProtochlamydia (fig 2b) These represent the four largest genefamilies (containing between 27 and 138 members) foundwithin the phylum and include two gene families specific toNeochlamydia and two restricted to Protochlamydia

Intrigued by these four large-scale lineage-specific expan-sion events between the species pairs of Protochlamydia andNeochlamydia we sought to better characterize these genefamilies as most of their members are yet unknown withrespect to their functional role (ie they are classified as hy-pothetical proteins) Remarkably despite being in differentgene families from different organisms there are several sim-ilarities between these proteins Firstly they all encompassproteinndashprotein interaction domains such as leucine-rich

repeats (LRRs) or tetratricopeptide repeats (TPRs) Secondlymany of them contain additional domains typically found ineukaryotes such as F-boxes BTB-boxes and RINGU-boxeswhich are associated with eukaryotic ubiquitination pathways(Angot et al 2007)

A

B

FIG 2 The paralogous gene landscape of the Chlamydiae (A) Numberof paralogous genes within the Chlamydiae The number of paralogousgenes not including multiple copies is plotted against genome sizealong with a linear regression line (y = 4787x ndash 429 R2 = 070 blackline) The dashed gray line is plotted as a reference from 200 prokaryoticgenomes (Bratlie et al 2010) (B) Distribution of chlamydial gene familiesper genome with two or more members The number of genes withineach family is plotted for representative genomes The genomes of theChlamydiaceae have relatively small gene family sizes The polymorphicouter membrane proteins comprise the largest gene families in theChlamydiaceae and can be seen as the two largest blocks in theChlamydia pneumoniae LPCoLN (Cpn) bar The size distribution ofgene families is ordered from smallest to greatest and the appearanceof a solid ldquoblack boxrdquo at the base is merely an effect of the spacing ofmany small gene families There are several extensive gene families (la-beled) within members of Neochlamydia (NEX1 NEX2) andProtochlamydia (PEX1 PEX2)

2892

Domman et al doi101093molbevmsu227 MBE

Eukaryotic Ubiquitination-Associated DomainsPredominate Large Gene Families

The largest gene family termed Neochlamydia expansion 1(NEX1) in the phylum comprised a total of 138 memberswhich are contributed by the two Neochlamydia genomesThe domain architecture within this large gene family is het-erogeneous however all proteins contain various C-terminalrepetitions of LRR domains (fig 3) We have identified twosubfamilies that we delineate NEX1a and NEX1b within theNEX1 family The majority of members fall into the NEX1asubfamily in which they have a highly conserved N-terminalF-box or F-box-like domain The members have an average of77 sequence similarity among each other and the F-box-likedomain is 57 and 45 similar to Acanthamoeba castellaniiand human F-box-like domains respectively The smallerNEX1b family in contrast has a conserved RINGU-box atthe N-terminus Both the F-box and RINGU-box domains areassociated with eukaryotic E3 ubiquitin ligase complexes(Willems et al 2004) Between all members in the NEX1family there is a region of roughly 50 amino acids betweenthe predicted N-terminal domains and the LRRs that is highlyconserved However we failed to detect any known domainsin this region nor was there any homology to known proteins

The second large gene family within the Neochlamydiarepresents the third largest family of the phylum This largegene family termed NEX2 comprised 50 proteins (fig 3)

Similar to NEX1 the prevailing domain architecture is thatof eukaryotic-like E3 ubiquitin ligase-associated domainspaired with repeat domains This family is defined by thepresence of multiple TPR domains at the C-terminus and ageneral conservation of an F-box domain at the N-terminus inthe majority of members In most members there is also aconserved DUF294 domain located mid protein which is aputative nucleotidyltransferase In ten members there is anovarian tumor (OTU) (Balakirev et al 2003) domain directlyfollowing the F-box domain followed by the DUF294 domain

The other large gene families occur primarily in the mem-bers of the Protochlamydia and represent the second andfourth largest gene families of the phylum The largest genefamily in the Protochlamydia (PEX1) comprised 73 membersin total (fig 3) Intriguingly another E3 ubiquitin ligase-relateddomain the BTB domain is present in all but three membersat the N-terminus The BTB domain is then coupled toC-terminal LRR domains in all members The PEX2 familycomprised a total of 27 proteins that despite no detectabledomain at the N-terminus share multiple TPR domains in themiddle of the protein followed by a CHAT domain(Sakakibara and Hattori 2000) at the C-terminus

In summary the four largest gene families represent a sur-prisingly diverse armada of proteins which most likely func-tion within eukaryotic host cells where they potentiallyinterfere with the ubiquitination pathway The heterogeneity

FIG 3 Protein domain architecture of largest gene families The domain architecture of Neochlamydia (NEX1 NEX2) and Protochlamydia (PEX1 PEX2)gene families are shown The range of the number of domain repeats and functional assignments of the detected domains are indicated NEX1 can bedivided into two subfamilies based on phylogeny and domain presenceabsence A role of these proteins in the context of eukaryotic cells can bepostulated based on the presence of domains otherwise found in eukaryotes

2893

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

in domain architecture among these proteins interestinglymirrors that of their eukaryotic counterparts (Perez-Torradoet al 2006 Xu 2006)

A Pool of Putative Effector Proteins

If the members of the largest gene families in the phylumChlamydiae serve as effector proteins for host manipulationthey would need to be secreted and transported to the hostcell cytosol This is typically achieved through a type III secre-tion system a well-conserved virulence mechanism amongthe Chlamydiae which has been shown to translocate severalcharacterized effectors (Peters et al 2007 Betts et al 2009)Indeed many of the proteins found within the expandedNeochlamydia and Protochlamydia gene families are pre-dicted by computational analysis to be secreted by the typeIII secretion system and to be extracellular host associatedWithin the NEX1 family 66 of 138 (49) members are pre-dicted to be secreted A total of 37 (51) and 10 (37) werepredicted to be secreted from within the PEX1 and PEX2 genefamilies respectively The NEX2 gene family had the fewestpredicted with only two members

As the identification of the signal for secretion via the typeIII secretion system is inherently difficult (Arnold et al 2009)we tested representatives of each of the four largest genefamilies in vitro using a heterologous type III secretion sub-strate assay with Shigella flexneri as a host for protein expres-sion This assay has been used to successfully characterize typeIII secretion effector proteins from the chlamydiae before(Subtil et al 2001) and because of the lack of routine genetictools the Sh flexneri system is an attractive surrogate methodfor analyzing type III secretion in chlamydiae in vivo Thisexperiment demonstrated that the tested members ofNEX1 NEX2 PEX1 and PEX2 contain a functional type IIIsecretion recognition signal (supplementary fig S1Supplementary Material online) Together with the presenceof eukaryotic-like domains and computational predictionsthis strongly indicates that these gene families are largepools of effector proteins

Molecular Evolution of Large Gene Families

To better understand how these large gene families may haveevolved we reconstructed their phylogenetic relationshipsGene family trees were calculated using conserved sitesamong the protein alignment (supplementary figs S2ndashS6Supplementary Material online) The average amino acididentity between members ranges from 45 to 64 withthe most closely related sequences belonging to PEX2 Treetopologies suggest that the members of all four gene familieshave rapidly diverged as indicated by their long branchlengths Although the number of LRR and TPR domainsvaries dramatically between 1 and 39 this had no apparenteffect on the phylogenetic placement

We find clear cases in which the orthologs of two speciesgroup together indicating expansions have occurred beforespeciation (supplementary figs S2ndashS6 SupplementaryMaterial online) Alternatively expansions post speciation isapparent in all gene families Reconciliation of gene family

trees with the species tree indicates that in addition to ex-pansions many gene losses have occurred for each genefamily (supplementary table S3 Supplementary Materialonline) For instance for the NEX1 family there have been78 expansion events whereas 33 losses have occurred attrib-uted to 13 and 20 losses in Neochlamydia spp EPS4 andTUME1 respectively There were nearly equal losses betweenEPS4 and TUME1 (13 and 10) in NEX2 and a total of 46expansions Similarly PEX1 consists of 52 expansions and15 and 9 losses in Protochlamydia amoebophila EI2 andUWE25 respectively

A Birth-and-Death Model of Evolution

A pattern of differential gain loss and maintenance of genefamily members is strongly indicative of these gene familiesevolving according to a birth-and-death model (Nei andRooney 2005) Because of this differential maintenance ofgene family members the hallmarks of the birth-and-deathmodel are interspecies clustering of members in the phyloge-netic trees and the presence of pseudogenes from degradedmembers (Nei 2007) As we observe interspecies clustering forthe PEX and NEX gene families (supplementary figs S2ndashS6Supplementary Material online) we also tested for pseudo-genization events in the intergenic regions of the P amoebo-phila UWE25 genome (the draft Neochlamydia genomes areless suitable for this analysis) By utilizing BLAST we searchedfor matches to the predicted proteome using all intergenicregions as a query We then mapped the best BLAST hitsrepresenting 116 pseudogenes to their respective gene fam-ilies to get a picture of a given familiesrsquo representation in theintergenic regions The most represented gene family in theintergenic regions surprisingly was PEX1 (16 pseudogenes)For PEX2 one pseudogenized fragment was detected Theobserved presence of interspecies clustering and pseudo-genes and the dynamic history of gains and losses withinthe gene families are indicative of a birth-and-death modelof evolution

In contrast if these families were evolving via concertedevolution the phylogenetic trees would depict intraspeciesclustering that is that members of a gene family will be morehomologous to the other members from the same organismthan to that of other species Intraspecies clustering occursdue to repeated recombination among gene family memberswithin a genome leading to an overall high sequence similar-ity of all members a process known as gene conversion(Santoyo and Romero 2005) We thus tested for the possibil-ity of recombination within the gene families using the meth-ods implemented in the RDP4 software suite (Martin et al2010) Care must be taken however when assessing theimpact of recombination among divergent proteins as therecombination signal is quite error prone when proteins shareless than 70 similarity and these methods are heavily de-pendent on the alignment (Martin et al 2010) We detectsome recombination events between members within thePEX and NEX gene families however the majority of thepredictions are only marginally significant (the consensusscores are below the confidence threshold of 06) In the

2894

Domman et al doi101093molbevmsu227 MBE

NEX1a family the portion of the sequences most identified asrecombinant is the F-box domain (supplementary fig S8Supplementary Material online) This should be judiciouslyinterpreted as this might represent sequence similarity due topurifying selection operating on this domain Overall we didnot find convincing evidence that recombination has played amajor role in shaping the evolution of the PEX and NEX genefamilies

Gene Duplications and Purifying Selection

Gene family expansions can be the result of either gene du-plication or HGT The disentangling of these events is nottrivial and in fact may be impossible in the case of the genefamilies investigated here due to lineage-specific evolutionand the absence of clear homologs in other bacterial taxa(Kuo and Ochman 2009b) However several lines of evidencesuggest that regardless of the initial origin of these genes geneduplication processes have played a clear role in the evolutionthe large Parachlamydiaceae gene families First a hallmark ofgene duplication is the presence of tandem arrays of genecopies In this regard we find several large tandem arrays withmembers of the large gene families in Protochlamydia andNeochlamydia including examples for recent duplications(fig 4 supplementary fig S7 Supplementary Materialonline) In the P amoebophila UWE25 genome we detected47 tandem duplication events (distance within 10 genes) rep-resented by only 13 gene families Nearly half of these are thecontribution of PEX1 (n = 20) however they appear in severalclusters spread throughout the genome PEX2 demonstratesthe most dramatic case of a tandem array consisting of 13members After the PEX gene families the third largesttandem array only comprised three members intriguinglyalso F-box domain containing proteins NEX1 has 50 instances(36) where members are found within five genes of eachother Second the majority of the large gene family membersmeet the generally used criterion for identification of paralogsthat is they show at least 30 amino acid sequence identityover at least 60 of the protein length Thirdly phylogenetictrees show several species-specific expansion events

(including those genes still arranged in tandem arrays)which are best explained by gene duplication

Effector proteins have been shown to be among the fastestevolving proteins in a number of pathogen genomes(Nogueira et al 2012) and are often shown to be under pos-itive selection To assess the mode of selection acting onmembers of the expanded gene families we calculated theratio of nonsynonymous (dN) versus synonymous (dS) sub-stitution rates for each of the four gene families A dNdS ratioequal to 1 indicates a neutral state of selection whereas valueshigher or lower than 1 indicate positive or purifying selectionrespectively Global estimates of dNdS under the Model M0from CodeML (Yang 2007) ranged from 029 (NEX2) to 063(NEX1a) Additionally using pairwise sequence comparisonsfor dNdS calculation we did not find support that these genefamilies are currently evolving under positive selectionHowever the probability of these gene families evolvingunder purifying selection was highly significant (Plt 001)Therefore purifying selection is currently the dominantforce driving the evolution of these gene families and thusfacilitates their maintenance

Massive Expansion of Ubiquitination-AssociatedProteins

Driven by the discovery that members of the four largestParachlamydiaceae gene families showed rapid divergenceand are kept in chlamydial genomes despite apparent func-tional redundancy we asked whether there are additionalgenes not included in these gene families but encoding similarfunctional domains The common theme of the largeParachlamydiaceae gene families is the presence of domainsthat serve in the recruitment of target proteins to the eukary-otic ubiquitination machinery We therefore extracted all pro-teins containing F-boxF-box-like BTBPOZ and RINGU-boxdomains by scanning all chlamydial proteomes with eachrespective HMM profile We found no RINGU-box contain-ing proteins apart from those identified earlier as members ofNEX1b and we detected only few additional proteins inNeochlamydia and Parachlamydia harboring a BTBPOZdomain similar to those of PEX1 However our search

FIG 4 Example of duplication of BTB-box proteins in the Protochlamydia A new duplicate has arisen in Protochlamydia amoebophila UWE25 (shownvia the arrow) after the split from P amoebophila EI2 BTB-box proteins are indicated in purple The phylogenetic placement of these proteins(supplementary fig S5 Supplementary Material online) supports this scenario Orthologous proteins between the Protochlamydia are indicated byconnecting blocks

2895

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 5: Massive Expansion of Ubiquitination-Related Gene Families

Eukaryotic Ubiquitination-Associated DomainsPredominate Large Gene Families

The largest gene family termed Neochlamydia expansion 1(NEX1) in the phylum comprised a total of 138 memberswhich are contributed by the two Neochlamydia genomesThe domain architecture within this large gene family is het-erogeneous however all proteins contain various C-terminalrepetitions of LRR domains (fig 3) We have identified twosubfamilies that we delineate NEX1a and NEX1b within theNEX1 family The majority of members fall into the NEX1asubfamily in which they have a highly conserved N-terminalF-box or F-box-like domain The members have an average of77 sequence similarity among each other and the F-box-likedomain is 57 and 45 similar to Acanthamoeba castellaniiand human F-box-like domains respectively The smallerNEX1b family in contrast has a conserved RINGU-box atthe N-terminus Both the F-box and RINGU-box domains areassociated with eukaryotic E3 ubiquitin ligase complexes(Willems et al 2004) Between all members in the NEX1family there is a region of roughly 50 amino acids betweenthe predicted N-terminal domains and the LRRs that is highlyconserved However we failed to detect any known domainsin this region nor was there any homology to known proteins

The second large gene family within the Neochlamydiarepresents the third largest family of the phylum This largegene family termed NEX2 comprised 50 proteins (fig 3)

Similar to NEX1 the prevailing domain architecture is thatof eukaryotic-like E3 ubiquitin ligase-associated domainspaired with repeat domains This family is defined by thepresence of multiple TPR domains at the C-terminus and ageneral conservation of an F-box domain at the N-terminus inthe majority of members In most members there is also aconserved DUF294 domain located mid protein which is aputative nucleotidyltransferase In ten members there is anovarian tumor (OTU) (Balakirev et al 2003) domain directlyfollowing the F-box domain followed by the DUF294 domain

The other large gene families occur primarily in the mem-bers of the Protochlamydia and represent the second andfourth largest gene families of the phylum The largest genefamily in the Protochlamydia (PEX1) comprised 73 membersin total (fig 3) Intriguingly another E3 ubiquitin ligase-relateddomain the BTB domain is present in all but three membersat the N-terminus The BTB domain is then coupled toC-terminal LRR domains in all members The PEX2 familycomprised a total of 27 proteins that despite no detectabledomain at the N-terminus share multiple TPR domains in themiddle of the protein followed by a CHAT domain(Sakakibara and Hattori 2000) at the C-terminus

In summary the four largest gene families represent a sur-prisingly diverse armada of proteins which most likely func-tion within eukaryotic host cells where they potentiallyinterfere with the ubiquitination pathway The heterogeneity

FIG 3 Protein domain architecture of largest gene families The domain architecture of Neochlamydia (NEX1 NEX2) and Protochlamydia (PEX1 PEX2)gene families are shown The range of the number of domain repeats and functional assignments of the detected domains are indicated NEX1 can bedivided into two subfamilies based on phylogeny and domain presenceabsence A role of these proteins in the context of eukaryotic cells can bepostulated based on the presence of domains otherwise found in eukaryotes

2893

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

in domain architecture among these proteins interestinglymirrors that of their eukaryotic counterparts (Perez-Torradoet al 2006 Xu 2006)

A Pool of Putative Effector Proteins

If the members of the largest gene families in the phylumChlamydiae serve as effector proteins for host manipulationthey would need to be secreted and transported to the hostcell cytosol This is typically achieved through a type III secre-tion system a well-conserved virulence mechanism amongthe Chlamydiae which has been shown to translocate severalcharacterized effectors (Peters et al 2007 Betts et al 2009)Indeed many of the proteins found within the expandedNeochlamydia and Protochlamydia gene families are pre-dicted by computational analysis to be secreted by the typeIII secretion system and to be extracellular host associatedWithin the NEX1 family 66 of 138 (49) members are pre-dicted to be secreted A total of 37 (51) and 10 (37) werepredicted to be secreted from within the PEX1 and PEX2 genefamilies respectively The NEX2 gene family had the fewestpredicted with only two members

As the identification of the signal for secretion via the typeIII secretion system is inherently difficult (Arnold et al 2009)we tested representatives of each of the four largest genefamilies in vitro using a heterologous type III secretion sub-strate assay with Shigella flexneri as a host for protein expres-sion This assay has been used to successfully characterize typeIII secretion effector proteins from the chlamydiae before(Subtil et al 2001) and because of the lack of routine genetictools the Sh flexneri system is an attractive surrogate methodfor analyzing type III secretion in chlamydiae in vivo Thisexperiment demonstrated that the tested members ofNEX1 NEX2 PEX1 and PEX2 contain a functional type IIIsecretion recognition signal (supplementary fig S1Supplementary Material online) Together with the presenceof eukaryotic-like domains and computational predictionsthis strongly indicates that these gene families are largepools of effector proteins

Molecular Evolution of Large Gene Families

To better understand how these large gene families may haveevolved we reconstructed their phylogenetic relationshipsGene family trees were calculated using conserved sitesamong the protein alignment (supplementary figs S2ndashS6Supplementary Material online) The average amino acididentity between members ranges from 45 to 64 withthe most closely related sequences belonging to PEX2 Treetopologies suggest that the members of all four gene familieshave rapidly diverged as indicated by their long branchlengths Although the number of LRR and TPR domainsvaries dramatically between 1 and 39 this had no apparenteffect on the phylogenetic placement

We find clear cases in which the orthologs of two speciesgroup together indicating expansions have occurred beforespeciation (supplementary figs S2ndashS6 SupplementaryMaterial online) Alternatively expansions post speciation isapparent in all gene families Reconciliation of gene family

trees with the species tree indicates that in addition to ex-pansions many gene losses have occurred for each genefamily (supplementary table S3 Supplementary Materialonline) For instance for the NEX1 family there have been78 expansion events whereas 33 losses have occurred attrib-uted to 13 and 20 losses in Neochlamydia spp EPS4 andTUME1 respectively There were nearly equal losses betweenEPS4 and TUME1 (13 and 10) in NEX2 and a total of 46expansions Similarly PEX1 consists of 52 expansions and15 and 9 losses in Protochlamydia amoebophila EI2 andUWE25 respectively

A Birth-and-Death Model of Evolution

A pattern of differential gain loss and maintenance of genefamily members is strongly indicative of these gene familiesevolving according to a birth-and-death model (Nei andRooney 2005) Because of this differential maintenance ofgene family members the hallmarks of the birth-and-deathmodel are interspecies clustering of members in the phyloge-netic trees and the presence of pseudogenes from degradedmembers (Nei 2007) As we observe interspecies clustering forthe PEX and NEX gene families (supplementary figs S2ndashS6Supplementary Material online) we also tested for pseudo-genization events in the intergenic regions of the P amoebo-phila UWE25 genome (the draft Neochlamydia genomes areless suitable for this analysis) By utilizing BLAST we searchedfor matches to the predicted proteome using all intergenicregions as a query We then mapped the best BLAST hitsrepresenting 116 pseudogenes to their respective gene fam-ilies to get a picture of a given familiesrsquo representation in theintergenic regions The most represented gene family in theintergenic regions surprisingly was PEX1 (16 pseudogenes)For PEX2 one pseudogenized fragment was detected Theobserved presence of interspecies clustering and pseudo-genes and the dynamic history of gains and losses withinthe gene families are indicative of a birth-and-death modelof evolution

In contrast if these families were evolving via concertedevolution the phylogenetic trees would depict intraspeciesclustering that is that members of a gene family will be morehomologous to the other members from the same organismthan to that of other species Intraspecies clustering occursdue to repeated recombination among gene family memberswithin a genome leading to an overall high sequence similar-ity of all members a process known as gene conversion(Santoyo and Romero 2005) We thus tested for the possibil-ity of recombination within the gene families using the meth-ods implemented in the RDP4 software suite (Martin et al2010) Care must be taken however when assessing theimpact of recombination among divergent proteins as therecombination signal is quite error prone when proteins shareless than 70 similarity and these methods are heavily de-pendent on the alignment (Martin et al 2010) We detectsome recombination events between members within thePEX and NEX gene families however the majority of thepredictions are only marginally significant (the consensusscores are below the confidence threshold of 06) In the

2894

Domman et al doi101093molbevmsu227 MBE

NEX1a family the portion of the sequences most identified asrecombinant is the F-box domain (supplementary fig S8Supplementary Material online) This should be judiciouslyinterpreted as this might represent sequence similarity due topurifying selection operating on this domain Overall we didnot find convincing evidence that recombination has played amajor role in shaping the evolution of the PEX and NEX genefamilies

Gene Duplications and Purifying Selection

Gene family expansions can be the result of either gene du-plication or HGT The disentangling of these events is nottrivial and in fact may be impossible in the case of the genefamilies investigated here due to lineage-specific evolutionand the absence of clear homologs in other bacterial taxa(Kuo and Ochman 2009b) However several lines of evidencesuggest that regardless of the initial origin of these genes geneduplication processes have played a clear role in the evolutionthe large Parachlamydiaceae gene families First a hallmark ofgene duplication is the presence of tandem arrays of genecopies In this regard we find several large tandem arrays withmembers of the large gene families in Protochlamydia andNeochlamydia including examples for recent duplications(fig 4 supplementary fig S7 Supplementary Materialonline) In the P amoebophila UWE25 genome we detected47 tandem duplication events (distance within 10 genes) rep-resented by only 13 gene families Nearly half of these are thecontribution of PEX1 (n = 20) however they appear in severalclusters spread throughout the genome PEX2 demonstratesthe most dramatic case of a tandem array consisting of 13members After the PEX gene families the third largesttandem array only comprised three members intriguinglyalso F-box domain containing proteins NEX1 has 50 instances(36) where members are found within five genes of eachother Second the majority of the large gene family membersmeet the generally used criterion for identification of paralogsthat is they show at least 30 amino acid sequence identityover at least 60 of the protein length Thirdly phylogenetictrees show several species-specific expansion events

(including those genes still arranged in tandem arrays)which are best explained by gene duplication

Effector proteins have been shown to be among the fastestevolving proteins in a number of pathogen genomes(Nogueira et al 2012) and are often shown to be under pos-itive selection To assess the mode of selection acting onmembers of the expanded gene families we calculated theratio of nonsynonymous (dN) versus synonymous (dS) sub-stitution rates for each of the four gene families A dNdS ratioequal to 1 indicates a neutral state of selection whereas valueshigher or lower than 1 indicate positive or purifying selectionrespectively Global estimates of dNdS under the Model M0from CodeML (Yang 2007) ranged from 029 (NEX2) to 063(NEX1a) Additionally using pairwise sequence comparisonsfor dNdS calculation we did not find support that these genefamilies are currently evolving under positive selectionHowever the probability of these gene families evolvingunder purifying selection was highly significant (Plt 001)Therefore purifying selection is currently the dominantforce driving the evolution of these gene families and thusfacilitates their maintenance

Massive Expansion of Ubiquitination-AssociatedProteins

Driven by the discovery that members of the four largestParachlamydiaceae gene families showed rapid divergenceand are kept in chlamydial genomes despite apparent func-tional redundancy we asked whether there are additionalgenes not included in these gene families but encoding similarfunctional domains The common theme of the largeParachlamydiaceae gene families is the presence of domainsthat serve in the recruitment of target proteins to the eukary-otic ubiquitination machinery We therefore extracted all pro-teins containing F-boxF-box-like BTBPOZ and RINGU-boxdomains by scanning all chlamydial proteomes with eachrespective HMM profile We found no RINGU-box contain-ing proteins apart from those identified earlier as members ofNEX1b and we detected only few additional proteins inNeochlamydia and Parachlamydia harboring a BTBPOZdomain similar to those of PEX1 However our search

FIG 4 Example of duplication of BTB-box proteins in the Protochlamydia A new duplicate has arisen in Protochlamydia amoebophila UWE25 (shownvia the arrow) after the split from P amoebophila EI2 BTB-box proteins are indicated in purple The phylogenetic placement of these proteins(supplementary fig S5 Supplementary Material online) supports this scenario Orthologous proteins between the Protochlamydia are indicated byconnecting blocks

2895

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 6: Massive Expansion of Ubiquitination-Related Gene Families

in domain architecture among these proteins interestinglymirrors that of their eukaryotic counterparts (Perez-Torradoet al 2006 Xu 2006)

A Pool of Putative Effector Proteins

If the members of the largest gene families in the phylumChlamydiae serve as effector proteins for host manipulationthey would need to be secreted and transported to the hostcell cytosol This is typically achieved through a type III secre-tion system a well-conserved virulence mechanism amongthe Chlamydiae which has been shown to translocate severalcharacterized effectors (Peters et al 2007 Betts et al 2009)Indeed many of the proteins found within the expandedNeochlamydia and Protochlamydia gene families are pre-dicted by computational analysis to be secreted by the typeIII secretion system and to be extracellular host associatedWithin the NEX1 family 66 of 138 (49) members are pre-dicted to be secreted A total of 37 (51) and 10 (37) werepredicted to be secreted from within the PEX1 and PEX2 genefamilies respectively The NEX2 gene family had the fewestpredicted with only two members

As the identification of the signal for secretion via the typeIII secretion system is inherently difficult (Arnold et al 2009)we tested representatives of each of the four largest genefamilies in vitro using a heterologous type III secretion sub-strate assay with Shigella flexneri as a host for protein expres-sion This assay has been used to successfully characterize typeIII secretion effector proteins from the chlamydiae before(Subtil et al 2001) and because of the lack of routine genetictools the Sh flexneri system is an attractive surrogate methodfor analyzing type III secretion in chlamydiae in vivo Thisexperiment demonstrated that the tested members ofNEX1 NEX2 PEX1 and PEX2 contain a functional type IIIsecretion recognition signal (supplementary fig S1Supplementary Material online) Together with the presenceof eukaryotic-like domains and computational predictionsthis strongly indicates that these gene families are largepools of effector proteins

Molecular Evolution of Large Gene Families

To better understand how these large gene families may haveevolved we reconstructed their phylogenetic relationshipsGene family trees were calculated using conserved sitesamong the protein alignment (supplementary figs S2ndashS6Supplementary Material online) The average amino acididentity between members ranges from 45 to 64 withthe most closely related sequences belonging to PEX2 Treetopologies suggest that the members of all four gene familieshave rapidly diverged as indicated by their long branchlengths Although the number of LRR and TPR domainsvaries dramatically between 1 and 39 this had no apparenteffect on the phylogenetic placement

We find clear cases in which the orthologs of two speciesgroup together indicating expansions have occurred beforespeciation (supplementary figs S2ndashS6 SupplementaryMaterial online) Alternatively expansions post speciation isapparent in all gene families Reconciliation of gene family

trees with the species tree indicates that in addition to ex-pansions many gene losses have occurred for each genefamily (supplementary table S3 Supplementary Materialonline) For instance for the NEX1 family there have been78 expansion events whereas 33 losses have occurred attrib-uted to 13 and 20 losses in Neochlamydia spp EPS4 andTUME1 respectively There were nearly equal losses betweenEPS4 and TUME1 (13 and 10) in NEX2 and a total of 46expansions Similarly PEX1 consists of 52 expansions and15 and 9 losses in Protochlamydia amoebophila EI2 andUWE25 respectively

A Birth-and-Death Model of Evolution

A pattern of differential gain loss and maintenance of genefamily members is strongly indicative of these gene familiesevolving according to a birth-and-death model (Nei andRooney 2005) Because of this differential maintenance ofgene family members the hallmarks of the birth-and-deathmodel are interspecies clustering of members in the phyloge-netic trees and the presence of pseudogenes from degradedmembers (Nei 2007) As we observe interspecies clustering forthe PEX and NEX gene families (supplementary figs S2ndashS6Supplementary Material online) we also tested for pseudo-genization events in the intergenic regions of the P amoebo-phila UWE25 genome (the draft Neochlamydia genomes areless suitable for this analysis) By utilizing BLAST we searchedfor matches to the predicted proteome using all intergenicregions as a query We then mapped the best BLAST hitsrepresenting 116 pseudogenes to their respective gene fam-ilies to get a picture of a given familiesrsquo representation in theintergenic regions The most represented gene family in theintergenic regions surprisingly was PEX1 (16 pseudogenes)For PEX2 one pseudogenized fragment was detected Theobserved presence of interspecies clustering and pseudo-genes and the dynamic history of gains and losses withinthe gene families are indicative of a birth-and-death modelof evolution

In contrast if these families were evolving via concertedevolution the phylogenetic trees would depict intraspeciesclustering that is that members of a gene family will be morehomologous to the other members from the same organismthan to that of other species Intraspecies clustering occursdue to repeated recombination among gene family memberswithin a genome leading to an overall high sequence similar-ity of all members a process known as gene conversion(Santoyo and Romero 2005) We thus tested for the possibil-ity of recombination within the gene families using the meth-ods implemented in the RDP4 software suite (Martin et al2010) Care must be taken however when assessing theimpact of recombination among divergent proteins as therecombination signal is quite error prone when proteins shareless than 70 similarity and these methods are heavily de-pendent on the alignment (Martin et al 2010) We detectsome recombination events between members within thePEX and NEX gene families however the majority of thepredictions are only marginally significant (the consensusscores are below the confidence threshold of 06) In the

2894

Domman et al doi101093molbevmsu227 MBE

NEX1a family the portion of the sequences most identified asrecombinant is the F-box domain (supplementary fig S8Supplementary Material online) This should be judiciouslyinterpreted as this might represent sequence similarity due topurifying selection operating on this domain Overall we didnot find convincing evidence that recombination has played amajor role in shaping the evolution of the PEX and NEX genefamilies

Gene Duplications and Purifying Selection

Gene family expansions can be the result of either gene du-plication or HGT The disentangling of these events is nottrivial and in fact may be impossible in the case of the genefamilies investigated here due to lineage-specific evolutionand the absence of clear homologs in other bacterial taxa(Kuo and Ochman 2009b) However several lines of evidencesuggest that regardless of the initial origin of these genes geneduplication processes have played a clear role in the evolutionthe large Parachlamydiaceae gene families First a hallmark ofgene duplication is the presence of tandem arrays of genecopies In this regard we find several large tandem arrays withmembers of the large gene families in Protochlamydia andNeochlamydia including examples for recent duplications(fig 4 supplementary fig S7 Supplementary Materialonline) In the P amoebophila UWE25 genome we detected47 tandem duplication events (distance within 10 genes) rep-resented by only 13 gene families Nearly half of these are thecontribution of PEX1 (n = 20) however they appear in severalclusters spread throughout the genome PEX2 demonstratesthe most dramatic case of a tandem array consisting of 13members After the PEX gene families the third largesttandem array only comprised three members intriguinglyalso F-box domain containing proteins NEX1 has 50 instances(36) where members are found within five genes of eachother Second the majority of the large gene family membersmeet the generally used criterion for identification of paralogsthat is they show at least 30 amino acid sequence identityover at least 60 of the protein length Thirdly phylogenetictrees show several species-specific expansion events

(including those genes still arranged in tandem arrays)which are best explained by gene duplication

Effector proteins have been shown to be among the fastestevolving proteins in a number of pathogen genomes(Nogueira et al 2012) and are often shown to be under pos-itive selection To assess the mode of selection acting onmembers of the expanded gene families we calculated theratio of nonsynonymous (dN) versus synonymous (dS) sub-stitution rates for each of the four gene families A dNdS ratioequal to 1 indicates a neutral state of selection whereas valueshigher or lower than 1 indicate positive or purifying selectionrespectively Global estimates of dNdS under the Model M0from CodeML (Yang 2007) ranged from 029 (NEX2) to 063(NEX1a) Additionally using pairwise sequence comparisonsfor dNdS calculation we did not find support that these genefamilies are currently evolving under positive selectionHowever the probability of these gene families evolvingunder purifying selection was highly significant (Plt 001)Therefore purifying selection is currently the dominantforce driving the evolution of these gene families and thusfacilitates their maintenance

Massive Expansion of Ubiquitination-AssociatedProteins

Driven by the discovery that members of the four largestParachlamydiaceae gene families showed rapid divergenceand are kept in chlamydial genomes despite apparent func-tional redundancy we asked whether there are additionalgenes not included in these gene families but encoding similarfunctional domains The common theme of the largeParachlamydiaceae gene families is the presence of domainsthat serve in the recruitment of target proteins to the eukary-otic ubiquitination machinery We therefore extracted all pro-teins containing F-boxF-box-like BTBPOZ and RINGU-boxdomains by scanning all chlamydial proteomes with eachrespective HMM profile We found no RINGU-box contain-ing proteins apart from those identified earlier as members ofNEX1b and we detected only few additional proteins inNeochlamydia and Parachlamydia harboring a BTBPOZdomain similar to those of PEX1 However our search

FIG 4 Example of duplication of BTB-box proteins in the Protochlamydia A new duplicate has arisen in Protochlamydia amoebophila UWE25 (shownvia the arrow) after the split from P amoebophila EI2 BTB-box proteins are indicated in purple The phylogenetic placement of these proteins(supplementary fig S5 Supplementary Material online) supports this scenario Orthologous proteins between the Protochlamydia are indicated byconnecting blocks

2895

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 7: Massive Expansion of Ubiquitination-Related Gene Families

NEX1a family the portion of the sequences most identified asrecombinant is the F-box domain (supplementary fig S8Supplementary Material online) This should be judiciouslyinterpreted as this might represent sequence similarity due topurifying selection operating on this domain Overall we didnot find convincing evidence that recombination has played amajor role in shaping the evolution of the PEX and NEX genefamilies

Gene Duplications and Purifying Selection

Gene family expansions can be the result of either gene du-plication or HGT The disentangling of these events is nottrivial and in fact may be impossible in the case of the genefamilies investigated here due to lineage-specific evolutionand the absence of clear homologs in other bacterial taxa(Kuo and Ochman 2009b) However several lines of evidencesuggest that regardless of the initial origin of these genes geneduplication processes have played a clear role in the evolutionthe large Parachlamydiaceae gene families First a hallmark ofgene duplication is the presence of tandem arrays of genecopies In this regard we find several large tandem arrays withmembers of the large gene families in Protochlamydia andNeochlamydia including examples for recent duplications(fig 4 supplementary fig S7 Supplementary Materialonline) In the P amoebophila UWE25 genome we detected47 tandem duplication events (distance within 10 genes) rep-resented by only 13 gene families Nearly half of these are thecontribution of PEX1 (n = 20) however they appear in severalclusters spread throughout the genome PEX2 demonstratesthe most dramatic case of a tandem array consisting of 13members After the PEX gene families the third largesttandem array only comprised three members intriguinglyalso F-box domain containing proteins NEX1 has 50 instances(36) where members are found within five genes of eachother Second the majority of the large gene family membersmeet the generally used criterion for identification of paralogsthat is they show at least 30 amino acid sequence identityover at least 60 of the protein length Thirdly phylogenetictrees show several species-specific expansion events

(including those genes still arranged in tandem arrays)which are best explained by gene duplication

Effector proteins have been shown to be among the fastestevolving proteins in a number of pathogen genomes(Nogueira et al 2012) and are often shown to be under pos-itive selection To assess the mode of selection acting onmembers of the expanded gene families we calculated theratio of nonsynonymous (dN) versus synonymous (dS) sub-stitution rates for each of the four gene families A dNdS ratioequal to 1 indicates a neutral state of selection whereas valueshigher or lower than 1 indicate positive or purifying selectionrespectively Global estimates of dNdS under the Model M0from CodeML (Yang 2007) ranged from 029 (NEX2) to 063(NEX1a) Additionally using pairwise sequence comparisonsfor dNdS calculation we did not find support that these genefamilies are currently evolving under positive selectionHowever the probability of these gene families evolvingunder purifying selection was highly significant (Plt 001)Therefore purifying selection is currently the dominantforce driving the evolution of these gene families and thusfacilitates their maintenance

Massive Expansion of Ubiquitination-AssociatedProteins

Driven by the discovery that members of the four largestParachlamydiaceae gene families showed rapid divergenceand are kept in chlamydial genomes despite apparent func-tional redundancy we asked whether there are additionalgenes not included in these gene families but encoding similarfunctional domains The common theme of the largeParachlamydiaceae gene families is the presence of domainsthat serve in the recruitment of target proteins to the eukary-otic ubiquitination machinery We therefore extracted all pro-teins containing F-boxF-box-like BTBPOZ and RINGU-boxdomains by scanning all chlamydial proteomes with eachrespective HMM profile We found no RINGU-box contain-ing proteins apart from those identified earlier as members ofNEX1b and we detected only few additional proteins inNeochlamydia and Parachlamydia harboring a BTBPOZdomain similar to those of PEX1 However our search

FIG 4 Example of duplication of BTB-box proteins in the Protochlamydia A new duplicate has arisen in Protochlamydia amoebophila UWE25 (shownvia the arrow) after the split from P amoebophila EI2 BTB-box proteins are indicated in purple The phylogenetic placement of these proteins(supplementary fig S5 Supplementary Material online) supports this scenario Orthologous proteins between the Protochlamydia are indicated byconnecting blocks

2895

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 8: Massive Expansion of Ubiquitination-Related Gene Families

unveiled an astonishing number of proteins harboring F-boxF-box-like domains with over 370 proteins within thephylum Nearly 300 of the F-box proteins are the contributionof the two Neochlamydia species (129 in TUME1 and 158 inEPS4) To characterize the relationships among this F-boxsuperfamily we constructed a phylogenetic tree based on adomain alignment This analysis shows that many of the ad-ditional F-box proteins found in Neochlamydia cluster witheither NEX1a or NEX2 (fig 5a) We also see several lineage-specific expansions of F-box proteins within Protochlamydiaand Parachlamydia species Reconciliation of the F-boxsuperfamily tree with the chlamydial species tree confirmsan extremely dynamic history of large-scale gene birth anddeath events (fig 5b) mirroring the evolutionary pattern seen

for the large Parachlamydiaceae gene families (supplementaryfigs S2ndashS6 and table S3 Supplementary Material online)

Furthermore we searched for additional chlamydial pro-teins containing domains for proteinndashprotein interactionsuch as LRR TPR or ankyrin repeats which are often associ-ated with F-boxF-box-like BTBPOZ and RINGU-box do-mains in the large gene families This search identified a vastnumber of proteins for each domain For instance chlamydialgenomes encode 409 proteins with LRR domains nearlydoubling the amount of LRR proteins contributed by thelarge Parachlamydiaceae gene families Taken together in ad-dition to the four large gene families there is an even greaterpool of chlamydial proteins with a putative role in hostinteraction

FIG 5 The phylogeny and evolutionary history of the F-box superfamily within the Chlamydiae (A) The phylogeny of 376 proteins within theChlamydiae that harbor an F-boxF-box-like domain This domain was extracted from each protein and aligned using MAFFT Maximum-likelihoodreconstruction of the phylogeny of the superfamily was performed with FastTree2 (B) The F-box domain superfamily gene tree was reconciled with thechlamydial species tree to reconstruct the evolutionary history of this group for members of the Chlamydiae The nodes in blue indicate the predictednumber of F-box proteins and numbers on the branches depict the gains and losses The extant species are indicated with their respective counts forF-box proteins The Neochlamydia have undergone massive gains and losses after the divergence from Protochlamydia

2896

Domman et al doi101093molbevmsu227 MBE

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 9: Massive Expansion of Ubiquitination-Related Gene Families

To gain a broader overview of the occurrence of F-boxF-box-like and BTBPOZ domains among other prokaryotesand eukaryotes we extracted domain abundance data fromPfam (Finn et al 2013) and included the current counts forthe Chlamydiae genomes present in this study (fig 6) Thisrevealed a striking pattern that the few bacterial groupsencoding proteins with F-box and BTB domains are almostexclusively amoeba-associated organisms These includemembers of the Legionellales (Gammaproteobacteria) theRickettsiales (Alphaproteobacteria) and the amoeba symbi-ont Amoebophilus asiaticus (Bacteroidetes) When thenumber of F-box proteins is normalized against the totalnumber of species in a given taxon the Chlamydiae lead inthe number of F-box proteins found in bacteria and evenharbor more than several lineages of eukaryotes includingthe Amoebozoa For the BTB proteins the Chlamydiaeappear to be the only bacterial lineage that harbors thisdomain It is intriguing that many of the large double-stranded DNA viruses namely the amoeba-infecting giantviruses contain many proteins with an F-box or BTB domain

Discussion

Large Gene Families Are Rare within ReducedBacterial Genomes

Chlamydial genomes are among the smallest known for pro-karyotes due to genome degradation consistent with long-term obligate associations with eukaryotic organisms(McCutcheon and Moran 2011) The genomes of host-asso-ciated bacteria such as Coxiella Mycoplasma Rickettsia spe-cies and members of the Chlamydiaceae tend to have smallor single copy gene families (Gevers et al 2004) and genefamily expansions either by gene duplication or HGT appearto have less effect on shaping the genomes of these obligatehost-associated bacteria (Bordenstein and Reznikoff 2005)However there is evidence that HGT may be more prevalentthan once thought in these organisms (Blanc et al 2007)A notable exception to this paradigm is the genome of theobligate intracellular pathogen Orientia tsutsugamushi(Rickettsiales) in which there has been massive expansionof type IV secretion system (over 350 tra-related genes) andhostndashmicrobe interaction genes (Cho et al 2007) Howeverthese expansions are thought to be the result of copiousplasmid integration and the genome is also littered withmobile elements a scenario not shared within chlamydialgenomes Here we have shown that several members ofthe Chlamydiae harbor gene families that have expanded atimmense magnitudes especially when compared with otherintracellular bacteria (fig 2) We find strong support for du-plication processes contributing to the expansion of thesefamilies thus highlighting that innovation through gene

FIG 6 Taxonomic profile of F-box and BTB domains The distributionof (A) the F-box clan and (B) the BTB-box throughout sequenced or-ganisms The size of the node indicates the number of species harboringproteins with the domain Thus larger node size indicates a largernumber of species in which a domain is found within a taxon Nodesare ordered from least to greatest by the total number of proteins thatcontain the domain within the taxon This is different than the numberof species as one species can have many proteins harboring a givendomain To reflect this disparity and to facilitate comparisons we com-puted a normalized value for each taxon that represents the number oftotal proteins divided by the number of species This normalizationvalue is represented by the width of the arc in the diagram For instancethe chlamydiae are represented by few species (small node size) but areamong the taxa containing the largest numbers of proteins with F-Boxand BTB domains (position on vertical axis) and show a high number ofproteins with these domains per species (arc width) All bacterial taxa

FIG 6 Continuedare plotted in purple and selected major eukaryotic taxa in blue TheChlamydiae are labeled in red and double-stranded DNA viruses areshown in green The data were obtained from the Pfam database foreach domain and counts were updated to reflect findings in this study

2897

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 10: Massive Expansion of Ubiquitination-Related Gene Families

duplication has had pronounced effects in shaping these chla-mydial genomes

Chlamydial Proteins Putatively Involved inInterference with Eukaryotic Ubiquitination Pathways

We demonstrated that the largest chlamydial gene familiesharbor proteins with domains associated with eukaryotic ubi-quitination pathways and that the chlamydial F-box super-family in particular is tremendous in size (figs 3 and 5) Arecent survey of prokaryotes found a total of 74 F-box pro-teins distributed in 22 species (Price and Kwaik 2010) whichmeans the number of these proteins present in a singleNeochlamydia genome is twice that of all previously knownbacterial F-box proteins combined The 76-member BTB su-perfamily is also remarkable in that the Parachlamydiaceae arethe only prokaryotes known to harbor this domain (fig 6)

In eukaryotes ubiquitin plays a pivotal regulatory role as aposttranslational modification that includes targeting pro-teins for degradation The process of adding ubiquitin to aprotein occurs when an assembled E3 ubiquitin ligase com-plex carrying a target protein is bound to the ubiquitin con-jugating enzyme E2 The ubiquitin ligase is a multiproteincomplex termed the Skp1-Cullin-F-box protein complex bywhich F-box proteins recruit target proteins and subsequentlybind to Skp1 which is linked to a Cullin protein (Zheng et al2002) A RINGU-box protein then serves as a linker of E2 tothe newly formed ubiquitin ligase and the transfer of theubiquitin moiety to the target occurs BTB-box proteins canhave multiple functions but chief among them is a functionalequivalent to the F-box protein in recruiting targets by bind-ing to the E3 ubiquitin ligase complex (Perez-Torrado et al2006) Proteinndashprotein interaction domains such as ankyrinkelch WD-40 LRR or TPR repeat domains are coupled toF-box and BTB-box domains and confer the specificity fortarget proteins

Given the conservation and essentiality of the ubiquitina-tion pathway throughout eukaryotes it should come as nosurprise that bacterial pathogens have engineered ways tomanipulate this pathway The use of F-box proteins appearsto be a common feature among plant pathogens such asAgrobacterium tumefaciens and Ralstonia solanacearumwhose genomes encode one and four F-box proteins respec-tively (Magori and Citovsky 2011) Intriguingly F-box proteinsseem to be a common feature of amoeba-associated bacteriaand viruses The amoeba symbiont A asiaticus a Bacteroi-detes is predicted to harbor 15 F-box proteins and until nowwas the largest known pool of these proteins among se-quenced genomes (Schmitz-Esser et al 2010) AdditionallyLegionella pneumophila exports an F-box protein coupledto ankyrin repeats termed AnkB that is essential for infectionof both human cell lines and Acanthamoeba (Price et al 2009Lomma et al 2010) AnkB blocks host proteosomal degrada-tion and thus generates increased levels of required aminoacids (Price et al 2009 Price et al 2011) Several other F-boxproteins secreted by L pneumophila have been shown tointeract with host E3 ligase complexes (Ensminger andIsberg 2010) Among the members of the Chlamydiaceae

we could not detect an F-box or BTB-box domain Howeverthere are several proteins within the Chlamydiaceae thatfunction as deubiquinating proteases such as the C tracho-matis ChlaDub1 (Misaghi et al 2006) and the recently de-scribed ChlaOTU characterized in Chlamydia caviae (Furtadoet al 2013)

We have shown experimentally that representative mem-bers of the investigated chlamydial gene families contain func-tional type III secretion signals (supplementary fig S1Supplementary Material online) Thus they likely representan extensive pool of effector proteins with a putative role inhijacking the host ubiquitination machinery perhaps in amanner similar to AnkB from Legionella by increasing nutri-ent availability In the absence of direct proteinndashprotein in-teraction data however we can only speculate as to what theinteraction partner(s) are for the members of the PEX andNEX gene families

Birth-and-Death Evolution Has Shaped LargeParachlamydiaceae Gene Families

Large gene families are thought to generally either evolve viaconcerted evolution or according to a birth-and-death model(Nei and Rooney 2005) When gene families are evolving con-certedly all members experience the same evolutionary pres-sure and evolve as a unit The gene family is marked byrecombination between members that leads to a homogeni-zation of all members and thus in the phylogenetic analysisone observes intraspecies clustering of gene family membersAlthough we do find minor evidence that recombination hasoccurred between members in the PEX and NEX gene families(supplementary fig S8 Supplementary Material online) theeffect does not appear to be that of homogenization In con-trast long branch lengths in the trees and moderate overallsequence similarity indicate these proteins have divergedquite extensively (supplementary figs S2ndashS6 SupplementaryMaterial online) As we do not observe a dominance of in-traspecies clustering in the phylogenetic analysis these genefamilies are not evolving in a fashion as would be predicted viaconcerted evolution

We provide clear evidence supporting birth-and-deathevolution of the PEX and NEX gene families which ismarked by independent gains and losses of members (Neiand Rooney 2005) We detected frequent lineage-specific du-plication and loss events leading to high rates of variation incopy number between closely related organisms (supplemen-tary table S3 Supplementary Material online) In the phylo-genetic trees of the PEX and NEX gene families we findinterspecies clustering of members which is a hallmark ofthe birth-and-death model (supplementary figs S2ndashS6Supplementary Material online) Additionally we detectedpseudogenized gene fragments of members related to thelarge gene families which is another hallmark of this modeof evolution To our knowledge this mode of evolution has sofar only been described once for a bacterial gene family(Rooney and Ward 2008)

It has been proposed that gene families that control phe-notypic characters are generally subject to birth-and-death

2898

Domman et al doi101093molbevmsu227 MBE

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 11: Massive Expansion of Ubiquitination-Related Gene Families

evolution (Nei and Rooney 2005 Nei 2007) In the case of theexpansions seen in the Parachlamydiaceae this character islikely the ability to effectively infect and replicate in protisthosts The advantage of the birth-and-death scenario as op-posed to concerted evolution is that individual members ofthe gene family are able to functionally differentiate from eachother and thus might facilitate the adaptation to new eco-logical niches (Nei 2007) Therefore a possible driver for thebirth-and-death model for the PEX and NEX families is theexploitation of new ecological niches which in this case ismost likely new host(s) species or a novel way to subvert hostcell machinery

A Confluence of Drift and Selection

Birth-and-death evolution of gene families occurs by bothadaptive processes and chance events such as genetic drift(Nei et al 2008) This is due to the fact that although geneduplications are intrinsically stochastic events their fixation isinfluenced by both selection and genetic drift If a duplicate isfixed functional divergence can occur due to relaxed selec-tion or diversifying selection in one of the gene copies aprocess known as neofunctionalization (Lynch and Conery2000) Once these genes have diverged the new functionsare then maintained in the genome via purifying selectionWe envision an evolutionary scenario where the PEX and NEXgene family members rapidly diverged either due to positivediversifying or relaxed selection after duplication likely leadingto functional diversification The window for detecting theseearly diversification processes is very small but our analysisindicates that the gene family members are now being main-tained via purifying selection

The retention of such large gene families in organisms thattypically undergo extreme genome reduction is perplexingespecially when considering the variation in copy numberbetween organisms The large number of gains and lossesfor F-box domain containing proteins across chlamydial line-ages indicate substantial fluctuations in gene content some-times over short evolutionary time (fig 5b) One describedcorollary of gene families evolving via a birth-and-deathmodel is that the copy number variation of memberswithin a gene family may vary due to chance duplicationloss events both between and within species a process coinedldquogenomic drift (Nei et al 2008) Genomic drift has been in-voked to explain the large copy number variations in severallarge gene families in eukaryotes including animal chemosen-sory receptors (Nozawa et al 2007) homeobox genes (Namand Nei 2005) and fatty-acid reductases (Eirın-Lopez et al2012)

Remarkably eukaryotic F-box and BTB-box gene familiesdemonstrate the same dramatic evolution following a rapidbirth-and-death model and extensive copy number variationis seen both between and within eukaryotic species (Xu 2006Navarro-Quezada et al 2013) This is especially true in higherplant species (Stogios et al 2005 Xu et al 2009) whereArabidopsis thaliana and Oryza sativa harbor upwards of800 F-box proteins (Xu et al 2009) and large expansionshave also been described in some nematode lineages

(Thomas 2006) A case has been made that genomic driftalso influences the evolution of these eukaryotic F-boxBTB-box protein families (Xu et al 2009) Genomic drift there-fore seems a plausible mechanism describing some aspects ofevolution of the PEX and NEX gene families in theParachlamydiaceae Genome sequences from other chlamyd-ial genomes and data from within populations would allowfurther testing of this hypothesis

Another possibility is that the expansion of these families isthe result of selection pressure due to a coevolutionary armsrace with host counterpart protein(s) Under this scenariothe expansion and diversification of these families are in directresponse to changes in target proteins These evolutionarydynamics also referred to as the Red Queen hypothesis(Valen 1974) have been most exemplified in bacteriawithin plantndashpathogen relationships most notably betweenthe pathogen Pseudomonas syringae and its host Ar thaliana(Ma et al 2006 Baltrus et al 2011) Copy number variation ofeffector proteins has been shown for Ps syringae and it isspeculated that these differences confer differential hostranges (Baltrus et al 2011) It is plausible therefore that var-iations in copy number between the PEX and NEX gene fam-ilies is influenced by the host range of particular chlamydiallineages This would imply that these proteins interact withmany targets from a narrow host range or that there are alimited number of targets for a large number of possible hostsIt seems most parsimonious that the latter was the case andthat these genes serve as accessory virulence factors allowingthese chlamydiae to expand their host range However thefailure to detect positive selection as the major force acting onthe PEX and NEX gene family members casts doubts on acoevolutionary arms race as the sole mechanism for the evo-lution of these gene families

Another conceivable scenario might be that selection pres-sure for increased gene dosage has contributed to the expan-sion of the large chlamydial gene families In reduced genomeswhere the pool of regulatory proteins is limited a path forincreased protein expression might be gene duplication Thismode of ldquogene amplification has been described for instancefor the increase of antibiotic resistance genes in E coli thecholera toxin gene in Vibrio cholerae and capsule biosynthesisgenes in Haemophilus influenza (reviewed in Andersson andHughes 2009) Although a gene dosage scenario cannot befully ruled out for the PEX and NEX gene families it seemsunlikely as the high divergence of their members makes acompletely overlapping function highly improbableAdditionally gene amplifications are in response to strongselection pressure and would therefore be under strong pos-itive selection It is also known that once these selection pres-sures are removed the amplified gene copies are rapidly lostwithin the population often within several generations(Andersson and Hughes 2009)

We favor a hypothesis that incorporates both selectionand chance into the equation We envision that the expan-sion of these gene families is reflective of their role in hostndashmicrobe interaction where they are interfering with the hostubiquitination pathway The expansions may reflect either achange in the environmental niche perhaps the ability to

2899

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 12: Massive Expansion of Ubiquitination-Related Gene Families

infect a new host organism or they may be in response to agrowing number of targets within a current host Because thePEX and NEX gene families have respectively expanded in theProtochlamydia and Neochlamydia they appear to providelineage-specific functions related to particular host interac-tions The PEX and NEX variation in copy number betweenclosely related organisms may be reflective of genomic drift inwhich the independent gain and loss of members within afamily has been determined to some extent by chanceevents The fact that members of the Chlamydiae includingpathogenic Chlamydia also harbor various proteins to sub-vert the host ubiquitination pathway indicates a case of con-vergent evolution toward exploitation of this system withinthis phylum

To summarize the Chlamydiae harbor several lineage-spe-cific gene families which are the largest among intracellularmicrobes with small genomes Experimental evidence andcomputational analysis strongly indicate that members ofthese large gene families function as effector proteins involvedin manipulating the ubiquitination machinery of their eukary-otic host cells The large chlamydial gene families follow abirth-and-death model of evolution where genomic driftmay influence copy number variation This might representa previously undescribed mechanism by which organismswith limited exposure to larger gene pools generate geneticdiversity

Materials and Methods

Genome Sequencing

We sequenced the genomes of P amoebophila EI2 (Schmitz-Esser et al 2008) Parachlamydia acanthamoebae OEW1(Heinz et al 2007) Neochlamydia sp TUME1 (Fritsche et al2000) and Neochlamydia sp EPS4 The Acanthamoeba spharboring the latter was isolated from pond sediment fromElba Italy Cells and DNA were prepared as described previ-ously (Schmitz-Esser et al 2010) All four genomes were se-quenced using 454 technology and assemblies wereperformed using Newbler 26 We implemented an in-housepipeline for genome annotation that combines multipleapproaches for gene calling and function prediction Genecalling was performed by combining ab initio predictionsfrom GeneMark (Besemer et al 2001) Glimmer v302 (Del-cher et al 2007) Prodigal (Hyatt et al 2010) and Critica v105(Badger and Olsen 1999) with homology information derivedfrom a BLAST search against National Center for Biotechnol-ogy Information nonredundant protein database (NCBIResource Coordinators 2014) RNA genes were called bytRNAscan-SE (Lowe and Eddy 1997) RNAmmer (Lagesenet al 2007) and Rfam (Griffiths-Jones et al 2005) Functionprediction was performed via BLAST against the UniProtdatabase (UniProt Consortium 2014) and domain predictionwas performed via InterProScan 5 (Jones et al 2014)Completeness estimates were performed based on the pres-ence of single-copy marker genes (n = 54) found in 99 of allbacterial genomes Sequences have been deposited at Gen-bankEMBLDDBJ under accession numbers PRJNA242498PRJNA242497 PRJNA242499 and PRJNA242500

Comparative Genome Analysis

Orthologous protein groups were calculated using OrthoMCL(Li et al 2003) with default parameters using the predictedproteomes from 19 chlamydial organisms (supplementarytable S2 Supplementary Material online) For those gene fam-ilies under analysis membership to a given family was furtherevaluated by alignment and assessment of the phylogenetictrees Members that had obvious major differences in thealignment and trees were dubbed spurious and likely tohave been grouped due to homology in the repeat regionand thus were removed from the gene family Whole-genomealignments were performed using the MAUVE progressive-Mauve algorithm (Darling et al 2010) The MAUVE align-ments and local synteny plots were visualized in R (v 301)with the genoPlotR package (Guy et al 2010)

Species Tree Construction

Phylogeny of the Chlamydiae was reconstructed using 32phylogenetic marker proteins (supplementary table S2Supplementary Material online) Multiple sequence align-ments were performed with MAFFT (Katoh and Standley2013) using the settings ldquondashmaxiterate 1000 and all align-ments subsequently concatenated using FASconCAT v10(Keurouck and Meusemann 2010) Maximum-likelihood analysiswas performed with RAxML (Stamatakis 2006) with 1000bootstrap iterations under the PROTGAMMAGTR modeland Bayesian inference was performed with MrBayes v32(Ronquist and Huelsenbeck 2003) using the mixed aminoacid model and standard settings via the CIPRES science gate-way (Miller et al 2010) Alignment and tree files are availableas supplementary material Supplementary Material online

Gene Family Analysis

As multiple sequence alignments of repeat containing pro-teins are not trivial we employed several methods to obtainreliable alignments We compared the alignments producedby MUSCLE (Edgar 2004) MAFFT using the ldquogenapairsoption (Katoh and Standley 2013) and DIALIGN-PFAM (AlAit et al 2013) all with and without trimming by Gblocksusing relaxed parameters ldquo-b5=h (Castresana 2000) In nearlyall cases the MAFFT alignment combined with Gblocksyielded the best alignment as judged by manual inspectionThe exceptions were that DIALIGN-PFAM with Gblocks wasthe best method for the NEX1b and PEX1 alignments Toensure robustness we calculated neighbor joining maximumlikelihood and Bayesian trees for each of the alignment datasets and selected the most supported tree for the final anal-ysis In all cases the Bayesian trees yielded the most supportwhich were calculated on the CIPRES gateway (Miller et al2010) with MrBayes as mentioned above for the species treeFor protein domain phylogenies we extracted and aligned thedomains using the hmmalign program from HMMER3 pack-age (Eddy 2011) and phylogenetic trees for the domains werecalculated using FastTree2 (Price et al 2010) Reconciliationsof gene trees to the species tree to infer gene gain and losswere performed in Notung v26 (Stolzer et al 2012) and theroot for the gene trees was assigned within the program to

2900

Domman et al doi101093molbevmsu227 MBE

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 13: Massive Expansion of Ubiquitination-Related Gene Families

achieve the lowest duplication-to-loss ratio Phylogenetictrees were visualized using iTOL (Letunic and Bork 2007) orthe ETE2 toolkit (Huerta-Cepas et al 2010) Alignment andtree files are available as supplementary materialSupplementary Material online

Detection of Selection and Recombination

Whole-protein alignments were converted to codon align-ments via the Pal2Nal program (Suyama et al 2006) Todetect the mode of selection acting on these gene familieswe used CodeML from the PAML package (Yang 2007) Onlysequences that were nearly full length and not contig fusionswere used for the CodeML analysis We additionally used themodified NeindashGojobori method for codon selection from theMEGA v51 (Tamura et al 2011) program with 1000 boot-straps and treating missing data with pairwise deletions Weused the RDP4 software suite (Martin et al 2010) to detectrecombination events which is an amalgamation of manyindividual recombination programs linked into one softwarearchitecture We employed seven recombination detectionprograms that include RDP (Martin and Rybicki 2000)GENECONV (Sawyer 1989) MaxChi (Smith 1992) BootScan(Martin et al 2005) Chimaera (Posada and Crandall 2001)SiScan (Gibbs et al 2000) and 3Seq (Boni et al 2007) Onlyevents that were predicted with more than four programswere considered

Intergenic Sequences Analysis

The proteome of Protochlamydia was used as query(tBLASTn) against the intergenic regions The hits were con-servatively filtered based on size (4 50 aa) bitscore (4 50)and E value (lt1010) For each intergenic region when twoquery proteins overlap only the best hit was considered Inaddition if different parts of the same protein had hits in oneintergenic space only the top scoring (bitscore) was consid-ered Putative domains in the intergenic regions were de-tected using InterProScan 5 (Jones et al 2014)

Domain Distribution

For each domain of interest we downloaded the Pfam HMMmodel (Finn et al 2013) and scanned all chlamydial prote-omes using the hmmscan program from the HMMER3 suite(Eddy 2011) In the case of the F-box and F-box-like domainsthe results were combined into a nonredundant list The datafor the other taxa were obtained through the Pfam websiteunder the species distribution tab for each domain or clan asin the case for the F-box The networks were created using theldquoarcdiagramrdquo package in R

Type III Secretion Analysis

Fusion proteins containing the 50-part of the genes of interest(including the first 20 codons) and the adenylate cyclase Cyawere expressed in Sh Flexneri SF401 and SF620 derivatives ofthe wild-type strain M90T in which the mxiD and ipaB geneshave been inactivated (Allaoui et al 1993) The 50-part of thetarget genes were amplified by polymerase chain reaction andcloned in the puc19cya vector as described (Subtil et al 2001)

Secretion assays were performed on 30 ml of exponentiallygrown cultures as described previously (Subtil et al 2001)Antibodies against CRP a cytosolic marker were used to es-timate the contamination of supernatant fractions with bac-terial proteins as a result of bacterial lysis Antibodies againstIpaD a type-III-secreted protein of Shigella were used to verifythat type III secretion occurred normally in the transformedstrains A monoclonal antibody against Cya and polyclonalantibodies against CRP and IpaD were generously provided byDrs N Guiso A Ullmann and C Parsot respectively (InstitutPasteur Paris) Prediction of type III secretion was performedvia the Effective database (Jehl et al 2011) and the PSORTbwebserver (Yu et al 2010)

Supplementary MaterialSupplementary tables S1ndashS3 and figures S1ndashS8 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

Thomas Penz and Gabriele Schwammel are acknowledged forisolating genomic DNA This work was supported byAustrian Science Fund FWF grants Y277-B06 and I1628-B22European Research Council ERC StG EVOCHLAMY (281633)and Marie Curie Initial Training Network SymbiomicsGenome sequencing was performed at the NorwegianSequencing Center

ReferencesAl Ait L Yamak Z Morgenstern B 2013 DIALIGN at GOBICSmdashmultiple

sequence alignment using various sources of external informationNucleic Acids Res 41W3ndashW7

Allaoui A Sansonetti PJ Parsot C 1993 MxiD an outer membraneprotein necessary for the secretion of the Shigella flexneri lpa inva-sins Mol Microbiol 759ndash68

Andersson DI Hughes D 2009 Gene amplification and adaptive evolu-tion in bacteria Annu Rev Genet 43167ndash195

Angot A Vergunst A Genin S Peeters N 2007 Exploitationof eukaryotic ubiquitin signaling pathways by effectorstranslocated by bacterial type III and type IV Secretion SystemsPLoS Pathog 3e3

Arnold R Brandmaier S Kleine F Tischler P Heinz E Behrens SNiinikoski A Mewes HW Horn M Rattei T 2009 Sequence-based prediction of type III secreted proteins PLoS Pathog 5e1000376

Badger JH Olsen GJ 1999 CRITICA coding region identification toolinvoking comparative analysis Mol Biol Evol 16512ndash524

Balakirev MY Tcherniuk SO Jaquinod M Chroboczek J 2003 Otubainsa new family of cysteine proteases in the ubiquitin pathway EMBORep 4517ndash522

Baltrus DA Nishimura MT Romanchuk A Chang JH Mukhtar MSCherkis K Roach J Grant SR Jones CD Dangl JL 2011 Dynamicevolution of pathogenicity revealed by sequencing and comparativegenomics of 19 Pseudomonas syringae isolates PLoS Pathog 7e1002132

Bertelli C Collyn F Croxatto A Reurouckert C Polkinghorne A Kebbi-Beghdadi C Goesmann A Vaughan L Greub G 2010 TheWaddlia genome a window into chlamydial biology PLoS One 5e10890

2901

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 14: Massive Expansion of Ubiquitination-Related Gene Families

Besemer J Lomsadze A Borodovsky M 2001 GeneMarkS a self-trainingmethod for prediction of gene starts in microbial genomesImplications for finding sequence motifs in regulatory regionsNucleic Acids Res 292607ndash2618

Betts HJ Wolf K Fields KA 2009 Effector protein modulation of hostcells examples in the Chlamydia spp arsenal Curr Opin Microbiol1281ndash87

Blanc G Ogata H Robert C Audic S Claverie J-M Raoult D 2007Lateral gene transfer between obligate intracellular bacteria evi-dence from the Rickettsia massiliae genome Genome Res 171657ndash1664

Boni MF Posada D Feldman MW 2007 An exact nonparametricmethod for inferring mosaic structure in sequence tripletsGenetics 1761035ndash1047

Bordenstein SR Reznikoff WS 2005 Mobile DNA in obligate intracellularbacteria Nat Rev Microbiol 3688ndash699

Bratlie MS Johansen J Sherman BT Huang DW Lempicki RA Drabloslashs F2010 Gene duplications in prokaryotes can be associated with en-vironmental adaptation BMC Genomics 11588

Castresana J 2000 Selection of conserved blocks from multiple align-ments for their use in phylogenetic analysis Mol Biol Evol 17540ndash552

Cho N-H Kim H-R Lee J-H Kim SY Kim J Cha S Kim SY Darby ACFuxelius HH Yin J et al 2007 The Orientia tsutsugamushi genome re-veals massive proliferation of conjugative type IV secretion systemand host-cell interaction genes Proc Natl Acad Sci U S A 1047981ndash7986

Collingro A Tischler P Weinmaier T Penz T Heinz E Brunham RC ReadTD Bavoil PM Sachse K Kahane S et al 2011 Unity in varietymdashthe pan-genome of the Chlamydiae Mol Biol Evol 283253ndash3270

Darling AE Mau B Perna NT 2010 progressiveMauve multiple genomealignment with gene gain loss and rearrangement PLoS One 5e11147

Delcher AL Bratke KA Powers EC Salzberg SL 2007 Identifying bacterialgenes and endosymbiont DNA with Glimmer Bioinformatics 23673ndash679

Eddy SR 2011 Accelerated profile hmm searches PLoS Comput Biol 7e1002195

Edgar RC 2004 MUSCLE multiple sequence alignmentwith high accuracy and high throughput Nucleic Acids Res 321792ndash1797

Eirın-Lopez JM Rebordinos L Rooney AP Rozas J 2012 The birth-and-death evolution of multigene families revisited Genome Dyn7170ndash196

Ensminger AW Isberg RR 2010 E3 ubiquitin ligase activity and targetingof Bat3 by multiple Legionella pneumophila translocated substratesInfect Immun 783905ndash3919

Finn RD Bateman A Clements J Coggill P Eberhardt RY Eddy SR HegerA Hetherington K Holm L Mistry J et al 2013 Pfam the proteinfamilies database Nucleic Acids Res 42D222ndashD230

Fritsche TR Horn M Wagner M Herwig RP Schleifer K-H Gautom RK2000 Phylogenetic diversity among geographically dispersedChlamydiales endosymbionts recovered from clinical and environ-mental isolates of Acanthamoeba spp Appl Environ Microbiol 662613ndash2619

Furtado AR Essid M Perrinet S Bala~na ME Yoder N Dehoux P Subtil A2013 The chlamydial OTU domain-containing protein ChlaOTU isan early type III secretion effector targeting ubiquitin and NDP52Cell Microbiol 152064ndash2079

Gevers D Vandepoele K Simillon C Van de Peer Y 2004 Gene dupli-cation and biased functional retention of paralogs in bacterial ge-nomes Trends Microbiol 12148ndash154

Gibbs MJ Armstrong JS Gibbs AJ 2000 Sister-scanning a Monte Carloprocedure for assessing signals in recombinant sequencesBioinformatics 16573ndash582

Goldman BS Nierman WC Kaiser D Slater SC Durkin AS Eisen JARonning CM Barbazuk WB Blanchard M Field C et al 2006Evolution of sensory complexity recorded in a myxobacterialgenome Proc Natl Acad Sci U S A 10315200ndash15205

Gomes JP Nunes A Bruno WJ Borrego MJ Florindo C Dean D 2006Polymorphisms in the nine polymorphic membrane proteins ofChlamydia trachomatis across all serovars evidence for serovar Darecombination and correlation with tissue tropism J Bacteriol 188275ndash286

Griffiths-Jones S Moxon S Marshall M Khanna A Eddy SR Bateman A2005 Rfam annotating non-coding RNAs in complete genomesNucleic Acids Res 33D121ndashD124

Grimwood J Stephens RS 1999 Computational analysis of thepolymorphic membrane protein superfamily of Chlamydia tracho-matis and Chlamydia pneumoniae Microb Comp Genomics 4187ndash201

Guy L Kultima JR Andersson SGE 2010 genoPlotR comparative geneand genome visualization in R Bioinformatics 262334ndash2335

Heinz E Kolarov I Keuroastner C Toenshoff ER Wagner M Horn M 2007An Acanthamoeba sp containing two phylogenetically differentbacterial endosymbionts Environ Microbiol 91604ndash1609

Hooper SD Berg OG 2003 On the nature of gene innovation duplica-tion patterns in microbial genomes Mol Biol Evol 20945ndash954

Horn M 2008 Chlamydiae as symbionts in eukaryotes Annu RevMicrobiol 62113ndash131

Horn M Collingro A Schmitz-Esser S Beier CL Purkhold U Fartmann BBrandt P Nyakatura GJ Droege M Frishman D et al 2004Illuminating the evolutionary history of chlamydiae Science 304728ndash730

Huerta-Cepas J Dopazo J Gabaldon T 2010 ETE a python environmentfor tree exploration BMC Bioinformatics 1124

Hyatt D Chen G-L LoCascio PF Land ML Larimer FW Hauser LJ 2010Prodigal prokaryotic gene recognition and translation initiation siteidentification BMC Bioinformatics 11119

Jehl M-A Arnold R Rattei T 2011 Effectivemdasha database ofpredicted secreted bacterial proteins Nucleic Acids Res 39D591ndashD595

Jones P Binns D Chang H-Y Fraser M Li W McAnulla C McWilliam HMaslen J Mitchell A Nuka G et al 2014 InterProScan 5 genome-scale protein function classification Bioinformatics 301236ndash1240

Kaessmann H 2010 Origins evolution and phenotypic impact of newgenes Genome Res 201313ndash1326

Kalman S Mitchell W Marathe R Lammel C Fan J Hyman RW OlingerL Grimwood J Davis RW Stephens RS 1999 Comparative genomesof Chlamydia pneumoniae and C trachomatis Nat Genet 21385ndash389

Kamneva OK Knight SJ Liberles DA Ward NL 2012 Analysis of genomecontent evolution in PVC bacterial super-phylum assessment ofcandidate genes associated with cellular organization and lifestyleGenome Biol Evol 41375ndash1390

Katoh K Standley DM 2013 MAFFT multiple sequence alignment soft-ware version 7 improvements in performance and usability Mol BiolEvol 30772ndash780

Koonin EV Wolf YI Karev GP 2002 The structure of the protein uni-verse and genome evolution Nature 420218ndash223

Keurouck P Meusemann K 2010 FASconCAT convenient handling of datamatrices Mol Phylogenet Evol 561115ndash1118

Kuo C-H Ochman H 2009a Deletional bias across the three domains oflife Genome Biol Evol 1145ndash152

Kuo C-H Ochman H 2009b The fate of new bacterial genes FEMSMicrobiol Rev 3338ndash43

Lagesen K Hallin P Roslashdland EA Staefeldt H-H Rognes T Ussery DW2007 RNAmmer consistent and rapid annotation of ribosomalRNA genes Nucleic Acids Res 353100ndash3108

Lagkouvardos I Weinmaier T Lauro FM Cavicchioli R Rattei T Horn M2013 Integrating metagenomic and amplicon databases to re-solve the phylogenetic and ecological diversity of the ChlamydiaeISME J 8115ndash125

Lerat E Daubin V Ochman H Moran NA 2005 Evolutionary origins ofgenomic repertoires in bacteria PLoS Biol 3e130

Letunic I Bork P 2007 Interactive Tree Of Life (iTOL) an online tool forphylogenetic tree display and annotation Bioinformatics 23127ndash128

2902

Domman et al doi101093molbevmsu227 MBE

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 15: Massive Expansion of Ubiquitination-Related Gene Families

Li L Stoeckert CJ Jr Roos DS 2003 OrthoMCL identification ofortholog groups for eukaryotic genomes Genome Res 132178ndash2189

Lomma M Dervins-Ravault D Rolando M Nora T Newton HJ SansomFM Sahr T Gomez-Valero L Jules M Hartland EL et al 2010The Legionella pneumophila F-box protein Lpp2082 (AnkB) modu-lates ubiquitination of the host protein parvin B and promotesintracellular replication Cell Microbiol 121272ndash1291

Lowe TM Eddy SR 1997 tRNAscan-SE a program for improved detec-tion of transfer RNA genes in genomic sequence Nucleic Acids Res25955ndash964

Lynch M Conery JS 2000 The evolutionary fate and consequences ofduplicate genes Science 2901151ndash1155

Ma W Dong FFT Stavrinides J Guttman DS 2006 Type III effec-tor diversification via both pathoadaptation and horizontaltransfer in response to a coevolutionary arms race PLoS Genet 2e209

Magori S Citovsky V 2011 Hijacking of the host SCF ubiquitin ligasemachinery by plant pathogens Front Plant Sci 287

Martin D Rybicki E 2000 RDP detection of recombination amongstaligned sequences Bioinformatics 16562ndash563

Martin DP Lemey P Lott M Moulton V Posada D Lefeuvre P 2010RDP3 a flexible and fast computer program for analyzing recombi-nation Bioinformatics 262462ndash2463

Martin DP Posada D Crandall KA Williamson C 2005 A modifiedbootscan algorithm for automated identification of recombinantsequences and recombination breakpoints AIDS Res HumRetroviruses 2198ndash102

McCutcheon JP Moran NA 2011 Extreme genome reduction in sym-biotic bacteria Nat Rev Microbiol 1013ndash26

McLeod MP Warren RL Hsiao WWL Araki N Myhre M Fernandes CMiyazawa D Wong W Lillquist AL Wang D et al 2006 Thecomplete genome of Rhodococcus sp RHA1 provides insightsinto a catabolic powerhouse Proc Natl Acad Sci U S A 10315582ndash15587

Miller MA Pfeiffer W Schwartz T 2010 Creating the CIPRES sciencegateway for inference of large phylogenetic trees In Proceedings ofthe Gateway Computing Environments Workshop (GCE) 2010 p1ndash8

Misaghi S Balsara ZR Catic A Spooner E Ploegh HL Starnbach MN2006 Chlamydia trachomatis-derived deubiquitinating enzymes inmammalian cells during infection Mol Microbiol 61142ndash150

Myers GSA Crabtree J Huot CH 2012 Deep and wide comparativegenomics of Chlamydia In Tan M Bavoil PM editors Intracellularpathogens Chlamydiales Washington (DC) ASM Press p 27ndash50

Nam J Nei M 2005 Evolutionary change of the numbers of homeoboxgenes in bilateral animals Mol Biol Evol 222386ndash2394

Navarro-Quezada A Schumann N Quint M 2013 Plant F-box proteinevolution is determined by lineage-specific timing of major genefamily expansion waves PLoS One 8e68672

NCBI Resource Coordinators 2014 Database resources of the NationalCenter for Biotechnology Information Nucleic Acids Res 42D7ndashD17

Nei M 2007 The new mutation theory of phenotypic evolution ProcNatl Acad Sci U S A 10412235ndash12242

Nei M Niimura Y Nozawa M 2008 The evolution of animal chemo-sensory receptor gene repertoires roles of chance and necessity NatRev Genet 9951ndash963

Nei M Rooney AP 2005 Concerted and birth-and-death evolution ofmultigene families Annu Rev Genet 39121ndash152

Nogueira T Touchon M Rocha EPC 2012 Rapid evolution of the se-quences and gene repertoires of secreted proteins in bacteria PLoSOne 7e49403

Nozawa M Kawahara Y Nei M 2007 Genomic drift and copy numbervariation of sensory receptor genes in humans Proc Natl Acad SciU S A 10420421ndash20426

Ochman H Lawrence JG Groisman EA 2000 Lateral gene transfer andthe nature of bacterial innovation Nature 405299ndash304

Perez-Torrado R Yamada D Defossez P-A 2006 Born to bind the BTBproteinndashprotein interaction domain BioEssays 281194ndash1202

Peters J Wilson DP Myers G Timms P Bavoil PM 2007 Type III secre-tion a la Chlamydia Trends Microbiol 15241ndash251

Posada D Crandall KA 2001 Evaluation of methods for detecting re-combination from DNA sequences computer simulations Proc NatlAcad Sci U S A 9813757ndash13762

Price CT Kwaik YA 2010 Exploitation of host polyubiquitination ma-chinery through molecular mimicry by eukaryotic-like bacterial F-box effectors Cell Infect Microbiol 1122

Price CT Al-Khodor S Al-Quadan T Santic M Habyarimana F Kalia AKwaik YA 2009 Molecular mimicry by an F-box effector ofLegionella pneumophila hijacks a conserved polyubiquitinationmachinery within macrophages and protozoa PLoS Pathog 5e1000704

Price CTD Al-Quadan T Santic M Rosenshine I Kwaik YA 2011 Hostproteasomal degradation generates amino acids essential for intra-cellular bacterial growth Science 3341553ndash1557

Price MN Dehal PS Arkin AP 2010 FastTree 2mdashapproximately maxi-mum-likelihood trees for large alignments PLoS One 5e9490

Pushker R Mira A Rodriguez-Valera F 2004 Comparative geno-mics of gene-family size in closely related bacteria Genome Biol 5R27

Ronquist F Huelsenbeck JP 2003 MrBayes 3 Bayesian phylogeneticinference under mixed models Bioinformatics 191572ndash1574

Rooney AP Ward TJ 2008 Birth-and-death evolution of the internalinmultigene family in Listeria Gene 427124ndash128

Sakakibara A Hattori S 2000 Chat a CasHEF1-associated adaptor pro-tein that integrates multiple signaling pathways J Biol Chem 2756404ndash6410

Santoyo G Romero D 2005 Gene conversion and concerted evolutionin bacterial genomes FEMS Microbiol Rev 29169ndash183

Sawyer S 1989 Statistical tests for detecting gene conversion Mol BiolEvol 6526ndash538

Schmitz-Esser S Tischler P Arnold R Montanaro J Wagner M Rattei THorn M 2010 The genome of the amoeba symbiont ldquoCandidatusAmoebophilus asiaticusrdquo reveals common mechanisms for host cellinteraction among amoeba-associated bacteria J Bacteriol 1921045ndash1057

Schmitz-Esser S Toenshoff ER Haider S Heinz E Hoenninger VMWagner M Horn M 2008 Diversity of bacterial endosymbionts ofenvironmental Acanthamoeba isolates Appl Environ Microbiol 745822ndash5831

Smith JM 1992 Analyzing the mosaic structure of genes J Mol Evol 34126ndash129

Stamatakis A 2006 RAxML-VI-HPC maximum likelihood-based phylo-genetic analyses with thousands of taxa and mixed modelsBioinformatics 222688ndash2690

Stephens RS Kalman S Lammel C Fan J Marathe R Aravind L MitchellW Olinger L Tatusov RL Zhao Q et al 1998 Genome sequence ofan obligate intracellular pathogen of humans Chlamydia trachoma-tis Science 282754ndash759

Stogios PJ Downs GS Jauhal JJ Nandra SK Prive GG 2005Sequence and structural analysis of BTB domain proteins GenomeBiol 6R82

Stolzer M Lai H Xu M Sathaye D Vernot B Durand D 2012Inferring duplications losses transfers and incomplete line-age sorting with nonbinary species trees Bioinformatics 28i409ndashi415

Subtil A Collingro A Horn M 2014 Tracing the primordial Chlamydiaeextinct parasites of plants Trends Plant Sci 1936ndash43

Subtil A Parsot C Dautry-Varsat A 2001 Secretion of predicted Incproteins of Chlamydia pneumoniae by a heterologous type III ma-chinery Mol Microbiol 39792ndash800

Suyama M Torrents D Bork P 2006 PAL2NAL robust conversion ofprotein sequence alignments into the corresponding codon align-ments Nucleic Acids Res 34W609ndashW612

2903

Ubiquitination-Related Gene Families in Chlamydiae doi101093molbevmsu227 MBE

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE

Page 16: Massive Expansion of Ubiquitination-Related Gene Families

Tamura K Peterson D Peterson N Stecher G Nei M Kumar S 2011MEGA5 molecular evolutionary genetics analysis using maximumlikelihood evolutionary distance and maximum parsimony meth-ods Mol Biol Evol 282731ndash2739

Thomas JH 2006 Adaptive evolution in two large families of ubiquitin-ligase adapters in nematodes and plants Genome Res 161017ndash1030

Treangen TJ Rocha EPC 2011 Horizontal transfer not duplicationdrives the expansion of protein families in prokaryotes PLoSGenet 7e1001284

UniProt Consortium 2014 Activities at the Universal Protein Resource(UniProt) Nucleic Acids Res 42D191ndashD198

Valen LV 1974 Molecular evolution as predicted by natural selectionJ Mol Evol 389ndash101

Willems AR Schwab M Tyers M 2004 A hitchhikerrsquos guide to the cullinubiquitin ligases SCF and its kin Biochim Biophys Acta 1695133ndash170

Xu G Ma H Nei M Kong H 2009 Evolution of F-box genes in plantsdifferent modes of sequence divergence and their relationships withfunctional diversification Proc Natl Acad Sci U S A 106835ndash840

Xu J 2006 Microbial ecology in the age of genomics andmetagenomics concepts tools and recent advances Mol Ecol 151713ndash1731

Yang J Lusk R Li W-H 2003 Organismal complexity protein complex-ity and gene duplicability Proc Natl Acad Sci U S A 10015661ndash15665

Yang Z 2007 PAML 4 phylogenetic analysis by maximum likelihoodMol Biol Evol 241586ndash1591

Yu NY Wagner JR Laird MR Melli G Rey S Lo R Dao P Sahinalp SCEster M Foster LJ et al 2010 PSORTb 30 improved protein sub-cellular localization prediction with refined localization subcate-gories and predictive capabilities for all prokaryotes Bioinformatics261608ndash1615

Zhang J 2003 Evolution by gene duplication an update Trends EcolEvol 18292ndash298

Zheng N Schulman BA Song L Miller JJ Jeffrey PD Wang P Chu CKoepp DM Elledge SJ Pagano M et al 2002 Structure of the Cul1ndashRbx1ndashSkp1ndashF boxSkp2 SCF ubiquitin ligase complex Nature 416703ndash709

2904

Domman et al doi101093molbevmsu227 MBE