evolutionary genetics of the capsular locus of serogroup 6 … · other sts in the group from the...

12
JOURNAL OF BACTERIOLOGY, Dec. 2004, p. 8181–8192 Vol. 186, No. 24 0021-9193/04/$08.000 DOI: 10.1128/JB.186.24.8181–8192.2004 Copyright © 2004, American Society for Microbiology. All Rights Reserved. Evolutionary Genetics of the Capsular Locus of Serogroup 6 Pneumococci Angeliki Mavroidi, 1 Daniel Godoy, 1 David M. Aanensen, 1 D. Ashley Robinson, 2 Susan K. Hollingshead, 3 and Brian G. Spratt 1 * Department of Infectious Disease Epidemiology, Imperial College London, St. Mary’s Campus, London, 1 and Department of Biology and Biochemistry, University of Bath, Bath, 2 United Kingdom, and Department of Microbiology, University of Alabama at Birmingham, Birmingham, Alabama 3 Received 18 July 2004/Accepted 6 September 2004 The evolution of the capsular biosynthetic (cps) locus of serogroup 6 Streptococcus pneumoniae was investi- gated by analyzing sequence variation within three serotype-specific cps genes from 102 serotype 6A and 6B isolates. Sequence variation within these cps genes was related to the genetic relatedness of the isolates, determined by multilocus sequence typing, and to the inferred patterns of recent evolutionary descent, explored using the eBURST algorithm. The serotype-specific cps genes had a low percent GC, and there was a low level of sequence diversity in this region among serotype 6A and 6B isolates. There was also little sequence diver- gence between these serotypes, suggesting a single introduction of an ancestral cps sequence, followed by slight divergence to create serotypes 6A and 6B. A minority of serotype 6B isolates had cps sequences (class 2 sequences) that were 5% divergent from those of other serotype 6B isolates (class 1 sequences) and which may have arisen by a second, more recent introduction from a related but distinct source. Expression of a serotype 6A or 6B capsule correlated perfectly with a single nonsynonymous polymorphism within wciP, the rhamnosyl transferase gene. In addition to ample evidence of the horizontal transfer of the serotype 6A and 6B cps locus into unrelated lineages, there was evidence for relatively frequent changes from serotype 6A to 6B, and vice versa, among very closely related isolates and examples of recent recombinational events between class 1 and 2 cps serogroup 6 sequences. The capsular biosynthetic loci (cps) of encapsulated bacteria typically have a modular construction, with central genes that specify the immunochemical differences between capsular se- rotypes flanked by genes common to all or several serotypes (2, 10). In pneumococci, the central cps genes have a low percent GC content compared to the flanking cps genes and the rest of the genome and may have been acquired by horizontal gene transfer from distantly related species (2, 10). Although it appears that the serotype-specific regions of the cps locus have been introduced into pneumococci on multiple occasions from unknown sources, the number of times this may have hap- pened, and the time scale over which such events occurred, are obscure. Thus, we do not know if some serotypes are very much younger than others and whether new pneumococcal serotypes arise at intervals of a few decades, a few centuries, or many millennia. It is also not known why there are so many (90) different capsular types in pneumococci (12) or whether the emergence of new serotypes is positively selected for by the host immune system and, if so, the intensity of this selection. Similarly, we know little about the evolutionary relationships among the capsular loci of the immunologically cross-reactive serotypes within a serogroup. Are the serological similarities of the serotypes within a single serogroup a reflection of their divergence from a recent common ancestral serotype, or does convergence occur, in which the introduction of very different cps sequences has resulted in immunochemically similar capsular polysaccharides? The answers to many of these ques- tions should start to emerge now that the sequences of the complete cps loci of all 90 serotypes are becoming available (1, 7, 11, 13, 14, 17, 20–25, 31, 32; http://www.sanger.ac.uk/Projects /S_pneumoniae/CPS/). The evolution of individual serotypes can be addressed by exploring the patterns of sequence diversity among isolates of a single serotype and relating these to the levels of genetic relatedness and patterns of evolutionary descent of the iso- lates. The evolution of the cps locus is likely to be complicated. Many of the cps genes are common to several serotypes, so a history of recombination among these homologs may confound attempts to discern ancestry. However, one or more genes within the serotype-specific region of the cps locus often have no significant nucleotide sequence similarity with the cps genes of other pneumococcal serotypes or serogroups or with any other genes in the sequence databases, although their gene products may have homology with enzymes involved in poly- saccharide biosynthesis in other species (18). Unlike the flank- ing cps genes, in which sequence variation can arise by point mutation or by homologous recombinational replacements from the homologs in other serotypes, the accumulation of variation within a serotype-specific gene can occur initially only by point mutation and, once some variation has accumulated, by recom- bination between different isolates of the same serotype. Serotype-specific cps genes that have no significant sequence similarity to genes in other pneumococcal serotypes or sero- groups can therefore be used to study the evolution of a sero- type or of the multiple serotypes within a multitypic serogroup. In this paper, we explore the origins of the two serotypes (6A and 6B) within pneumococcal serogroup 6 by analyzing pat- * Corresponding author. Mailing address: Department of Infectious Disease Epidemiology, Imperial College London, Room G22, Old Medical School Building, St. Mary’s Hospital, Norfolk Place, London W2 1PG, United Kingdom. Phone: 44 20 7594 3629. Fax: 44 20 7594 3693. E-mail: [email protected]. † Present address: Department of Microbiology and Immunology, New York Medical College, Valhalla, NY 10595. 8181 on February 21, 2021 by guest http://jb.asm.org/ Downloaded from

Upload: others

Post on 06-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

JOURNAL OF BACTERIOLOGY, Dec. 2004, p. 8181–8192 Vol. 186, No. 240021-9193/04/$08.00�0 DOI: 10.1128/JB.186.24.8181–8192.2004Copyright © 2004, American Society for Microbiology. All Rights Reserved.

Evolutionary Genetics of the Capsular Locus of Serogroup 6 PneumococciAngeliki Mavroidi,1 Daniel Godoy,1 David M. Aanensen,1 D. Ashley Robinson,2†

Susan K. Hollingshead,3 and Brian G. Spratt1*Department of Infectious Disease Epidemiology, Imperial College London, St. Mary’s Campus, London,1

and Department of Biology and Biochemistry, University of Bath, Bath,2 United Kingdom, andDepartment of Microbiology, University of Alabama at Birmingham, Birmingham, Alabama3

Received 18 July 2004/Accepted 6 September 2004

The evolution of the capsular biosynthetic (cps) locus of serogroup 6 Streptococcus pneumoniae was investi-gated by analyzing sequence variation within three serotype-specific cps genes from 102 serotype 6A and 6Bisolates. Sequence variation within these cps genes was related to the genetic relatedness of the isolates,determined by multilocus sequence typing, and to the inferred patterns of recent evolutionary descent, exploredusing the eBURST algorithm. The serotype-specific cps genes had a low percent G�C, and there was a low levelof sequence diversity in this region among serotype 6A and 6B isolates. There was also little sequence diver-gence between these serotypes, suggesting a single introduction of an ancestral cps sequence, followed by slightdivergence to create serotypes 6A and 6B. A minority of serotype 6B isolates had cps sequences (class 2sequences) that were �5% divergent from those of other serotype 6B isolates (class 1 sequences) and whichmay have arisen by a second, more recent introduction from a related but distinct source. Expression of aserotype 6A or 6B capsule correlated perfectly with a single nonsynonymous polymorphism within wciP, therhamnosyl transferase gene. In addition to ample evidence of the horizontal transfer of the serotype 6A and 6Bcps locus into unrelated lineages, there was evidence for relatively frequent changes from serotype 6A to 6B, andvice versa, among very closely related isolates and examples of recent recombinational events between class 1and 2 cps serogroup 6 sequences.

The capsular biosynthetic loci (cps) of encapsulated bacteriatypically have a modular construction, with central genes thatspecify the immunochemical differences between capsular se-rotypes flanked by genes common to all or several serotypes (2,10). In pneumococci, the central cps genes have a low percentG�C content compared to the flanking cps genes and the restof the genome and may have been acquired by horizontal genetransfer from distantly related species (2, 10). Although itappears that the serotype-specific regions of the cps locus havebeen introduced into pneumococci on multiple occasions fromunknown sources, the number of times this may have hap-pened, and the time scale over which such events occurred, areobscure. Thus, we do not know if some serotypes are verymuch younger than others and whether new pneumococcalserotypes arise at intervals of a few decades, a few centuries, ormany millennia. It is also not known why there are so many(90) different capsular types in pneumococci (12) or whetherthe emergence of new serotypes is positively selected for by thehost immune system and, if so, the intensity of this selection.Similarly, we know little about the evolutionary relationshipsamong the capsular loci of the immunologically cross-reactiveserotypes within a serogroup. Are the serological similarities ofthe serotypes within a single serogroup a reflection of theirdivergence from a recent common ancestral serotype, or doesconvergence occur, in which the introduction of very different

cps sequences has resulted in immunochemically similarcapsular polysaccharides? The answers to many of these ques-tions should start to emerge now that the sequences of thecomplete cps loci of all 90 serotypes are becoming available (1,7, 11, 13, 14, 17, 20–25, 31, 32; http://www.sanger.ac.uk/Projects/S_pneumoniae/CPS/).

The evolution of individual serotypes can be addressed byexploring the patterns of sequence diversity among isolates ofa single serotype and relating these to the levels of geneticrelatedness and patterns of evolutionary descent of the iso-lates. The evolution of the cps locus is likely to be complicated.Many of the cps genes are common to several serotypes, so ahistory of recombination among these homologs may confoundattempts to discern ancestry. However, one or more geneswithin the serotype-specific region of the cps locus often haveno significant nucleotide sequence similarity with the cps genesof other pneumococcal serotypes or serogroups or with anyother genes in the sequence databases, although their geneproducts may have homology with enzymes involved in poly-saccharide biosynthesis in other species (18). Unlike the flank-ing cps genes, in which sequence variation can arise by pointmutation or by homologous recombinational replacements fromthe homologs in other serotypes, the accumulation of variationwithin a serotype-specific gene can occur initially only by pointmutation and, once some variation has accumulated, by recom-bination between different isolates of the same serotype.

Serotype-specific cps genes that have no significant sequencesimilarity to genes in other pneumococcal serotypes or sero-groups can therefore be used to study the evolution of a sero-type or of the multiple serotypes within a multitypic serogroup.In this paper, we explore the origins of the two serotypes (6Aand 6B) within pneumococcal serogroup 6 by analyzing pat-

* Corresponding author. Mailing address: Department of InfectiousDisease Epidemiology, Imperial College London, Room G22, OldMedical School Building, St. Mary’s Hospital, Norfolk Place, LondonW2 1PG, United Kingdom. Phone: 44 20 7594 3629. Fax: 44 20 75943693. E-mail: [email protected].

† Present address: Department of Microbiology and Immunology,New York Medical College, Valhalla, NY 10595.

8181

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 2: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

terns of sequence variation within the central cps region andrelating the sequence variation within these genes to the ge-netic relatedness and inferred patterns of recent evolutionarydescent of the isolates. The structures of the capsular polysac-charides of serotype 6A and 6B are identical, except for thelinkage between the L-rhamnopyranosyl and D-ribitol residuesin the tetrasaccharide repeat unit, which is 133 in serotype 6Abut 134 in serotype 6B (15, 16, 27). In addition to showingsurprisingly frequent interconversion between serotypes 6Aand 6B among closely related isolates and the spread of sero-group 6 sequences to distantly related pneumococcal lineages,we demonstrate that the difference between serotypes 6A and6B correlates with a single nonsynonymous substitution in theputative rhamnosyl transferase gene (wciP).

MATERIALS AND METHODS

Bacterial isolates. A group of 56 serogroup 6 isolates were from the study byRobinson et al. (28), selected to cover the diversity of serotype 6A and 6Bisolates among those clonal complexes in which isolates of both serotypes wereobserved and all available examples of serogroup 6 isolates that were identical,or very closely related, in genotype by multilocus sequence typing (MLST) butwhich differed in serotype (6A versus 6B). That study used an MLST scheme thatcharacterized loci different from those in the standard pneumococcal MLSTscheme, and the selected isolates from the study were therefore retyped using thestandard MLST scheme of Enright and Spratt (8). The matrix of pairwise dif-ferences in the allelic profiles of all isolates of serotypes 6A and 6B within thepublic MLST database (http://spneumoniae.mlst.net), and of the additional iso-lates we selected and characterized by MLST from the study of Robinson et al.(28), was used to construct a dendrogram (data not shown). The implied geneticrelatedness of these isolates was used to select additional isolates from ourcollection at Imperial College that further covered the diversity of sequencetypes (STs) of these two serotypes. In addition, we analyzed all isolates ofserogroup 6 in the Imperial College collection that were identical by MLST ordiffered at only one MLST locus (single-locus variants [SLVs]) but whose sero-types varied. The final set of 102 serogroup 6 isolates chosen for further studyincluded 43 isolates of serotype 6A and 59 isolates of serotype 6B.

Characterization of isolates. Chromosomal DNA from each isolate was pre-pared, and MLST was carried out using the seven pneumococcal housekeepingloci and primers described previously (8). For each locus, sequences were ob-tained on both DNA strands using an ABI3700 DNA sequencer (Applied Bio-systems, Warrington, United Kingdom). Alleles at each locus were assigned byinterrogating the pneumococcal MLST website (http://spneumoniae.mlst.net).The trace files of sequences that were different from those of all known alleleswere carefully checked and were assigned new allele numbers. All of the isolateshave been deposited in the MLST database. Serotyping was carried out using theQuellung reaction with sera purchased from the Statens Seruminstitut (Copen-hagen, Denmark). Some isolates were cross-checked for serotype by inhibitionenzyme-linked immunosorbent assay, as previously described (29). This test usedtwo antibodies: Hyp6BM8 recognizes a cross-reactive epitope and binds withequal avidity to serotype 6A and 6B capsular polysaccharides, while Hyp6BM1recognizes a serotype 6B-specific epitope and fails to bind to serotype 6A poly-saccharide. The results were all concordant between the two serotyping systems.

The relatedness of isolates was examined by constructing a dendrogram fromthe matrix of pairwise differences in the allelic profiles of all isolates using theunweighted pair-group method with arithmetic averages (UPGMA). Nonover-lapping groups of related STs were identified using eBURST, with the defaultsetting for the definition of groups (9; http://eburst.mlst.net/). With this defini-tion, all STs assigned to the same eBURST group have alleles at �6 of the 7MLST loci in common with at least one other member of the group, and theeBURST group is equated with a clonal complex in which all STs have probablydescended from a single recent common ancestor or founding ST. The predictedfounding ST of each eBURST group, and the predicted patterns of descent of allother STs in the group from the founder, were computed by eBURST anddisplayed as eBURST diagrams (9). STs that are directly linked to each other ineBURST diagrams differ at only a single locus (i.e., they are SLVs). For each STin an eBURST group, the statistical support for being the founder of the groupwas assessed by the percent recovery of the ST as the founder in 1,000 bootstrapresamplings (9).

Analysis of the serogroup 6 cps locus. The annotated sequences of the cps lociof two serotype 6A isolates deposited in GenBank (accession numbersAY078347 and AF246898) and of three serotype 6B isolates in GenBank(AF316640 [14], AF298581, and AF246897), and additional sequences of aserotype 6A and a 6B isolate from the Sanger Institute cps website (http://www.sanger.ac.uk/Projects/S_pneumoniae/CPS/), were aligned using the ClustalWalgorithm within MEGA version 3.0 (19). Primers were designed from thesealigned sequences that allowed internal fragments of three of the central cpsgenes of serotype 6A and 6B isolates to be amplified from each isolate by PCRand sequenced. The primers used both for PCR amplification and for sequencingwere wciP-up, 5�-ATGGTGAGAGATATTTGTCAC-3�, and wciP-down, 5�-AGCATGATGGTATATAAGCC-3�, for the wciP gene; S6-wzy-F, 5�-CCTAAAGTGGAGGGAATTTCG-3�, and S6-wzy-R, 5�-CCTCCCATATAACGAGTGATG-3�, for the wzy gene; and S6-wzx-F, 5�-TTCGAATGGGAATTCAATGG-3�,and S6-wzx-R, 5�-GCGAGCCAAATCGGTAAGTA-3�, for the wzx gene. ThePCRs were performed in a total volume of 50 �l, consisting of 2 �l of DNAlysate, 50 pmol of each primer, 200 �M deoxynucleoside triphosphates, and 2.5U of Taq DNA polymerase in 1� PCR buffer (QIAGEN Ltd., Crawley, UnitedKingdom). The conditions used for amplification of the PCR products were adenaturation step at 95°C for 5 min and 30 cycles of 95°C for 1 min, 58°C for 30 s,and 72°C for 1 min, followed by a final extension at 72°C for 10 min in a PTC-200Thermal Cycler (MJ Research Inc., Waltham, Mass.). Sequencing was carriedout as for the MLST gene fragments.

For each cps gene fragment, the sequences from the different isolates werecompared, and each different sequence was assigned as a distinct allele and givena different allele number. The allele numbers assigned for the three cps genefragments (in the order wciP-wzy-wzx) defined the cps profile of the isolate. Thesequences of the three cps fragments were concatenated, maintaining the �1reading frame, and a tree was constructed from the concatenated sequences(1,614 bp), and from the individual gene fragments, by the neighbor-joiningmethod using MEGA version 3.0 (19). Variable nucleotides and amino acidswithin the cps sequences and translated products were displayed using MEGAversion 3.0.

One serotype 6B isolate (GenBank accession number AF246897) displayed adivergent sequence compared with the other serogroup 6 isolates, and in thisisolate there was an �300-bp indel between the wciN and wciO genes. Thepresence or absence of the indel was established by the size of the intergenicfragment amplified using primers at the end of wciN (WCI-up, 5�-ATTTGGTGTACTTCCTCC-3�) and the start of wciO (WCI-down, 5�-CCATCCTTCGAGTATTGC-3�). The PCRs were performed as described above. A predicted 958-bpfragment was obtained for isolates without the indel, and a 1,267-bp fragmentwas obtained for those with the indel.

The structures of the cps loci of serotype 6A and 6B isolates, the similaritiesbetween the sequences, and the percent G�C content were visualized using theArtemis Comparison Tool developed by Kim Rutherford at the Sanger Institute(http://www.sanger.ac.uk/Software/ACT/).

RESULTS

Genetic relatedness of serotype 6A and 6B isolates. Fig. 1shows a UPGMA dendrogram based on the pairwise differ-ences in the MLST allelic profiles of the 102 serotype 6A and6B isolates selected for further study. There were 78 differentSTs. Many of the isolates were relatively distantly related toeach other, but there were several clusters of isolates withclosely related STs. Four STs included isolates of both sero-types 6A and 6B, and there were several further cases in whichisolates of both serotypes were present in STs that were SLVs,differing at only a single MLST locus (Fig. 1).

The central cps region of serogroup 6 isolates. Figure 2shows the structures and percent G�C contents of the cps lociof serotype 6A and two serotype 6B isolates. The central re-gion of the cps locus has a low-average percent G�C content(28.5%) compared to that of the flanking genes and to thattypical of the pneumococcal genome (39.7%). Within the cen-tral region of low percent G�C are the wciN, wciO, wciP, wzy,and wzx genes. The wciO, wciP, wzy, and wzx sequences of theserotype 6A and 6B isolates showed �90% nucleotide se-

8182 MAVROIDI ET AL. J. BACTERIOL.

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 3: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

FIG. 1. Relatedness of serogroup 6 isolates of S. pneumoniae. A UPGMA tree was constructed from the matrix of pairwise differences in theallelic profiles of the 102 isolates. The serotype of each isolate is shown, followed by the isolate name, ST number, and cps profile. The presenceof the indel between the wciN and wciO genes is also indicated. STs that include isolates of both serotypes 6A and 6B are indicated by asterisks.

8183

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 4: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

8184 MAVROIDI ET AL. J. BACTERIOL.

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 5: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

quence identity, and the first three genes had �65% identityto the sequences of any of the genes in the cps loci of thoseserogroups available at GenBank or at the S. pneumoniaecps website at the Sanger Institute (http://www.sanger.ac.uk/Projects/S_pneumoniae/CPS/) and to all other sequences atGenBank. The wzx gene of the serogroup 6 isolates alsoshowed �65% identity to the cps regions of other pneumococ-cal serogroups and to other sequences in GenBank, except thesequences of serotype 19A and 19F pneumococci, where thesimilarity was �82%. In contrast, the upstream wzg, wzh, wzd,wze, wchA, and wciN genes and the downstream rhamnosebiosynthetic genes (rmlA-rmlD) of the serotype 6A and 6B cpsloci had high levels of sequence similarity with genes within thecps loci of pneumococci of several other serogroups.

Sequences of internal fragments of three genes within theserotype-specific region of the cps locus (wciP, wzy, and wzx)were obtained from the 102 serotype 6A and 6B isolates, andthe alleles at each cps locus and the cps allelic profile wereassigned for each isolate, as described in Materials and Meth-ods.

Sequences of the wciP gene. The wciP product is homologousto rhamnosyl transferases in other species, and since the sero-group 6 capsule contains rhamnose, it has been assigned thisfunction. A 645-bp internal region of the gene was sequenced,and 11 distinct alleles were distinguished among the 102 iso-lates (Fig. 3). The alleles in serotype 6A isolates and in themajority of serogroup 6B isolates were relatively similar. How-ever, two divergent alleles (wciP8 and wciP12), which differedfrom each other at only a single nonsynonymous site, werefound in 14 serotype 6B isolates (Fig. 1). These divergentalleles differed at �3% of sites from the alleles in the majorityof serotype 6B isolates. A single serotype 6A isolate (ACH-C2)had an allele that was a perfect mosaic (wciP7), in which thefront 537 bp were identical to an allele found in two serotype6A isolates (wciP9) and the rest of the fragment was identicalto a divergent allele (wciP8) from serotype 6B isolates. ThewciP1, -2, -7, -9, and -11 alleles were found exclusively amongserotype 6A isolates, whereas wciP3, -4, -5, and -6 and themore divergent wciP8 and -12 were restricted to serotype 6Bisolates. In no case was the same wciP allele found to bepresent in isolates of both serotypes 6A and 6B.

Sequences of the wzy gene. The wzy gene product showshomology to polysaccharide polymerases. A 492-bp internalfragment of wzy was sequenced from the 102 isolates. Therewere nine alleles, two of which (wzy-1 and wzy-7) were found inboth serotype 6A and 6B isolates (Fig. 3). Two alleles weremore divergent (wzy-7 and wzy-9), and with one exception,these were found only in serotype 6B isolates. The exceptionwas wzy-7, which was also present in the serotype 6A isolateACH-C2. One allele (wzy-10) had a 6-bp deletion.

Sequences of the wzx gene. The wzx gene product has beenassigned by homology as a polysaccharide transporter or flip-pase. A 477-bp internal fragment of wzx was sequenced fromthe 102 isolates. There were eight alleles, four of which werepresent in both serotype 6A and 6B isolates (Fig. 3). Two of thealleles (wzx-6 and -7) were divergent, and a further allele wasa perfect mosaic (wzx-5), the front 228 bp being identical to thedivergent alleles wzx-6 and -7, whereas the rest of the fragmentwas identical to wzx-4. The divergent alleles were all present in

FIG

.2.

cps

regi

ons

ofse

roty

pe6A

and

6BS.

pneu

mon

iae.

The

sim

ilari

ties

betw

een

the

cps

loci

ofa

sero

type

6Ais

olat

ean

da

sero

type

6Bis

olat

ew

itha

clas

s1

cps

sequ

ence

(bot

hfr

omth

eSa

nger

Inst

itute

web

site

[htt

p://w

ww

.san

ger.

ac.u

k/Pr

ojec

ts/S

_pne

umon

iae/

CPS

/])an

da

sero

type

6Bis

olat

ew

itha

clas

s2

cps

sequ

ence

from

Gen

Ban

k(A

F24

897)

wer

edi

spla

yed

usin

gth

eA

rtem

isC

ompa

riso

nT

ool(

http

://w

ww

.san

ger.

ac.u

k/So

ftw

are/

AC

T/)

.The

aver

age,

max

imum

,and

min

imum

perc

ent

G�

Cco

nten

tsal

ong

the

cps

locu

san

dth

elo

catio

nsof

the

cps

gene

sar

esh

own.

The

�30

0-bp

inde

lbet

wee

nth

ew

ciN

and

wci

Oge

nes

isin

dica

ted

bya

stri

ped

box.

Blo

cks

ofre

dco

lor

indi

cate

sequ

ence

hom

olog

ybe

twee

npa

irs

ofse

quen

ces.

VOL. 186, 2004 EVOLUTION OF THE SEROGROUP 6 cps LOCUS 8185

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 6: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

FIG. 3. Allelic variation in the cps genes. (A) Neighbor-joining trees showing the relatedness among the alleles at wciP, wzy, and wzx. The scalesrepresent a genetic distance of 0.5%. (B) The polymorphic nucleotide sites are shown for all alleles of the three cps genes. The nucleotides arenumbered in vertical format. The nucleotide at each variable site is shown for the first allele, and only the nucleotides that differ from those in thissequence are shown. The dashes show the deletion in wzy-10. Alleles marked by an asterisk were not found in this study and are from publishedsequences of the cps genes deposited at GenBank (wciP10 in AF298581, wzy-8 and wzx-9 in AY078347, and wzx-10 in AF316640). (C) Thepolymorphic amino acids in the translated nucleotide sequences are shown. Dashes show the deletion in Wzy-10.

8186 MAVROIDI ET AL. J. BACTERIOL.

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 7: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

serotype 6B isolates, except the serotype 6A isolate ACH-C2,which possessed wzx-6 (Fig. 1).

Sequences of the central region of the cps locus of serotype6A and 6B isolates. The sequences of the wciP, wzy, and wzxfragments were joined end to end (concatenated). Twenty-onedifferent sequences (cps profiles) were present among the 102serotype 6A and 6B isolates. The most prevalent cps profileswere 2-1-1 among the serotype 6A isolates (23 out of 43) and

4-2-2 and 8-7-7 among the serotype 6B isolates (31 and 9 outof 59). The polymorphic sites within the concatenated se-quences (1,614 bp), and a neighbor-joining tree illustrating therelatedness among these sequences, are shown in Fig. 4.

The majority of the sequences were very similar, and therewas clustering of sequences according to the serotypes of theisolates. Within the main cluster of sequences (Fig. 4C) therewas only one sequence (cps profile 3-1-1, present within two

FIG. 4. Concatenated sequences of the cps genes of serogroup 6 isolates. (A) Polymorphic nucleotide sites within the concatenated wciP, wzy,and wzx genes are displayed as in Fig. 3. The cps profile corresponding to each sequence is shown. The first block of sequences are from serotype6A isolates, and all of the sequences including and below the 3-1-1 profile are from serotype 6B isolates. The last block shows the divergent class2 sequences. One serotype 6A sequence (cps profile 7-7-6) and one serotype 6B sequence (cps profile 8-7-5) are perfect mosaics between class 1and class 2 sequences. (B) The polymorphic amino acids in the translated concatenated sequences are shown. The dashes in the cps profile 9-10-1in panels A and B represent deleted nucleotides or amino acids. (C) Neighbor-joining tree constructed from the concatenated nucleotidesequences; taxa are labeled according to their cps profiles. Œ, serotype 6A isolates; ■, serotype 6B isolates with class 1 sequences; F, serotype 6Bisolates with class 2 sequences; ‚, serotype 6B isolates APH-10 and AAU-19 (cps profile 3-1-1); E, serotype 6B isolates APO-445 and APO-446(cps profile 8-7-5); A, serotype 6A isolate ACH-C2 (cps profile 7-7-6).

VOL. 186, 2004 EVOLUTION OF THE SEROGROUP 6 cps LOCUS 8187

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 8: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

isolates, APH-10 and AAU-19) that clustered anomalously;these isolates were serotype 6B, but their cps region clusteredwithin sequences restricted to serotype 6A isolates. Inspectionof the sequence showed that this serotype 6B sequence differedat only one nucleotide site (a nonsynonymous substitution inwciP) from the most prevalent sequence among the serotype6A isolates (cps profile 2-1-1).

Twelve serotype 6B isolates had very closely related se-quences (cps profiles 8-7-7, 12-7-7, 8-7-6, and 8-9-7) that were�5% divergent from those of the other serotype 6B isolates(Fig. 4A and C). These 12 isolates all had divergent alleles atwciP, wzy, and wzx, indicating that their cps regions were likelyto be divergent throughout the whole central cps region. Nineof these serotype 6B isolates had the same sequence (cps pro-file 8-7-7), and the other three each had a different sequence,but each differed from the prevalent class 2 sequence at only asingle nucleotide site. We refer to these divergent sequences asclass 2 sequences to distinguish them from the class 1 se-quences in the majority of serotype 6B isolates and in all of theserotype 6A isolates we examined. One further sequence (cpsprofile 8-7-5), present in two serotype 6B isolates (APO-445and APO-446), clustered on the tree between the class 1 and 2sequences and was a perfect mosaic (Fig. 4C); the sequencethroughout the wciP and wzy fragments, and the first half ofwzx, was identical to that in the predominant class 2 sequenceof serotype 6B isolates, whereas the rest of the wzx fragmentwas identical to a class 1 sequence in several serotype 6Bisolates (Fig. 4A).

One serotype 6A isolate (ACH-C2) also had a sequence (cpsprofile 7-7-6) that clustered on the tree between the class 1 andclass 2 sequences and was a mosaic (Fig. 4C). As noted earlier,the first part of the wciP fragment in ACH-C2 was identical tothat in some of the serotype 6A isolates, whereas the sequenceof the rest of this fragment, and of the entire wzx and wzyfragments, was identical to the class 2 sequences in a serotype6B isolate (AIS-C31).

The average pairwise diversity among the eight distinct con-catenated sequences from serotype 6A isolates (excluding themosaic sequence) was 0.4%; the sequences differed on averageat 6.2 nucleotide sites. The class 1 serotype 6B sequences weresimilarly uniform (0.3% average pairwise diversity; differencesat an average of 4.5 sites), and those of class 2 serotype 6Bisolates (excluding the mosaic sequence) were extremely uni-form (0.1% diversity), with the three unique class 2 sequenceseach differing from the predominant class 2 sequence at only asingle nucleotide site. The average divergence between theserotype 6A sequences and the class 1 serotype 6B sequenceswas only 0.6%, whereas the divergence between both the se-rotype 6A and the class 1 serotype 6B sequences and the class2 serotype 6B sequences was 5.4%.

Distribution of the wciN-wciO intergenic indel. The presenceor absence of the indel between the wciN and wciO genes wasexamined in all serotype 6A and 6B isolates by PCR. The indelwas restricted to those isolates with class 2 cps sequences (all ofwhich were serotype 6B) and to the one serotype 6A isolate(ACH-C2) and two serotype 6B isolates (APO-445 and APO-446) that had mosaic cps sequences (Fig. 1). Only two of theserotype 6B isolates with class 2 cps sequences lacked this indel(ASA-20 and AIS-C31) (Fig. 1).

Horizontal spread of the serotype 6A and 6B cps genes intodivergent pneumococcal lineages. Figure 1 shows the consid-erable diversity of genotypes among the 102 serogroup 6 iso-lates. The majority of the serogroup 6 isolates that were dis-tantly related to all other serogroup 6 isolates were divergentlineages, as they were also not closely related to isolates ofother serogroups in the MLST database. A history of horizon-tal transfer was apparent from the presence of the same cpsprofiles in several distantly related genetic backgrounds. Al-though many cps profiles were found only in isolates of a singleST or were restricted to isolates that were closely related byMLST, 7 of 21 (33%) cps profiles were present in pneumococcithat were distantly related and differed at �6 of the 7 MLSTloci (Fig. 1). This was particularly true of the cps profiles 2-1-1,4-2-2, and 8-7-7 that predominated, respectively, among sero-type 6A and serotype 6B isolates with class 1 and 2 sequences.The lack of any sequence variation in the three cps gene frag-ments in divergent isolates that had sequence differences at all(or most) MLST loci is most readily explained by the recenthorizontal spread of the serotype 6A and 6B cps locus throughthe pneumococcal population.

In a few cases, serogroup 6 isolates were similar in genotypeto isolates of other serotypes. For example, STs 947 and 948were SLVs of each other and were both serotype 6B and hadthe same cps profile (5-4-1), but they were not closely relatedto any other serogroup 6 isolates in the MLST database. How-ever, using eBURST on the entire MLST database showedthat they were within a large clonal complex whose predictedfounder (9) was ST429, where almost all the other isolateswere serogroup 23 (data not shown). These serotype 6B iso-lates therefore appear to have arisen from a serogroup 23isolate by the replacement of its cps region with that from aserotype 6B isolate. Similarly, ST460 and ST529 (serotype 6A)were SLVs of each other and had the same cps profile (1-1-1)but were within a clonal complex whose predicted founder(ST97), and most SLVs, were serotype 10A, suggesting theintroduction of serotype 6A cps sequences into a serotype 10Agenetic background. In several other cases there also appearedto have been horizontal transfer of the serotype 6A or 6B cpsgenes into isolates of clonal complexes descended from found-ing genotypes expressing a different capsular serotype.

Variation in cps profiles within clonal complexes. The relat-edness of the serogroup 6 isolates inferred from the dendro-gram showed several examples of isolates with very similargenotypes that differed in serotype, and there were four caseswhere serogroup 6 isolates of the same ST differed in serotype(Fig. 1). This suggested recent changes between serotypes 6Aand 6B occurring within individual clones or clonal complexes.Two of the examples where isolates of the same ST differed inserotype could be attributed to recombination, as the cps pro-files of the serotype 6A and 6B isolates within both ST1094 andST315 were completely different, whereas those in the othertwo STs (ST361 and ST473) differed at only a single nonsyn-onymous site in wciP, and the differences were more likely tobe due to point mutations (see below).

The eBURST algorithm was used to identify nonoverlappinggroups of related STs (clonal complexes) and to predict thefounding ST of each clonal complex and the patterns of evo-lutionary descent of all STs in the clonal complex from thepredicted founding ST (9). The cps profiles and the sequences

8188 MAVROIDI ET AL. J. BACTERIOL.

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 9: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

at the wciP, wzy, and wzx genes were then mapped onto thepredicted patterns of descent of the isolates to explore theextents and mechanisms of serotype change among closelyrelated isolates.

eBURST identified nine clonal complexes among the 78 STsthat were resolved among the 102 serogroup 6 isolates (Fig. 5),and there were 29 singleton STs that differed in allelic profilefrom all other STs at two or more of the seven MLST loci. Fiveclonal complexes included only two STs, and in all but onecase, the isolates of each ST (which are SLVs of each other)had identical serotypes and cps profiles. The other pair of STs(ST1288 and ST1289) included one serotype 6A isolate withthe cps profile 2-1-8 and a serotype 6B isolate with a divergentclass 2 cps profile (8-7-7). These isolates are therefore veryclosely related in overall genotype, differing by MLST at onlya single locus, but their cps regions are very different, implyinga change from serotype 6A to 6B (or vice versa) due to recom-bination at the cps locus.

One clonal complex included three STs, two of which wereserotype 6B, and the single isolate of each of these STs sharedthe same class 2 cps profile (8-7-7). The third ST (ST315) wasrepresented by one serotype 6B isolate with the cps profile12-7-7 which differed at a single nonsynonymous substitution inwciP and by a serotype 6A isolate with a completely differentcps profile (11-1-4). In the MLST database, there were anadditional 18 isolates of ST315, all but one of which was sero-type 6B, and 12 of the 13 SLVs of ST315 were also serotype 6B.

The founder of this clonal complex was predicted to be ST315(100% bootstrap support) and was almost certainly serotype6B, and the serotype 6A isolate of this ST presumably arose byrecombination at the cps locus.

There were four STs (seven isolates) within a small clonalcomplex whose predicted founder was ST473 (Fig. 5). Apply-ing eBURST to the whole MLST database, combined with theadditional isolates characterized here, confirmed that ST473was the predicted founder (bootstrap support, 99%) and iden-tified five further SLVs of ST473, all of which included onlyserogroup 6 isolates. Three of the four isolates of ST473, andthose of two of its three SLVs, had the same cps profile (2-1-1)and were all serotype 6A (as were all isolates of the additionalSLVs of ST473 in the MLST database). The ancestral state forthis clonal complex was therefore almost certainly serotype 6Awith the cps profile 2-1-1. However, the fourth isolate of ST473(AAU-19) was serotype 6B, and its central cps region differedfrom that of the serotype 6A isolates by only a single nonsyn-onymous substitution in wciP, resulting in a change from the2-1-1 to the 3-1-1 cps profile. The serotype of AAU-19 wasrechecked and confirmed to be 6B. AAU-19 was from Austra-lia, as was one of the serotype 6A isolates of ST473 with theancestral cps profile 2-1-1. The isolate of the third SLV ofST473 (ST399) was also serotype 6B but had a completelydifferent cps profile (4-2-2). Since the cps regions of this sero-type 6B isolate differed from those of the other isolates in thisclonal complex at multiple sites and at each of the three se-

FIG. 5. eBURST groups among the serogroup 6 isolates. The eBURST algorithm (9) was applied to the 102 serogroup 6 isolates. STs that werenot part of clonal complexes (singleton STs) are not shown, except in the case of ST1094, where the three isolates of this ST varied in serotype.The circles represent the individual STs, and the size of the circle indicates the abundance of the ST in the input data. STs linked by a line differat a single MLST locus (i.e., they are SLVs). The isolate name is shown for each ST, followed by the serotype and cps profile. The presence ofthe indel between the wciO and wciN genes is also indicated. Where there are multiple isolates of an ST, the serotypes and cps profiles are thesame unless otherwise indicated. Isolates of the same ST but different serotypes are underlined. ST1095 is an SLV of ST361, as well as of ST171,and taking account of the cps profiles of the isolates, the descent from ST361 to ST1095 (shown by the dotted line) is more parsimonious than thealternative descent from ST171 to ST1095 suggested by the eBURST algorithm. The dashed lines in the ST490 clonal complex identify STs thatare double-locus variants, and taking account of the observed cps profiles, suggest possible alternative pathways of descent where the postulatedintermediate SLVs are lacking.

VOL. 186, 2004 EVOLUTION OF THE SEROGROUP 6 cps LOCUS 8189

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 10: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

quenced cps loci, it was considered to have arisen by a recom-binational replacement at the cps locus, resulting in a changefrom serotype 6A to 6B.

The ST490 clonal complex included 11 STs and 21 isolates(Fig. 5). ST490 was identified as the founding ST of the ST490clonal complex (bootstrap support, 97%) by using eBURST onthe 102 isolates combined with all other isolates in the MLSTdatabase. Using the combined data set, there were a further 10STs compared to Fig. 5, and with a single exception, all isolateswere serogroup 6. The eight isolates of ST490 among the 102serogroup 6 isolates, as well as two of the SLVs of ST490, wereserotype 6A and had the same cps profile (2-1-1). However, inthis clonal complex there were five STs represented by sero-type 6B isolates that had four different cps profiles that in-cluded both class 1 and class 2 sequences and a mosaic se-quence. All four of these serotype 6B cps profiles differed atmultiple nucleotide sites from each other and from the 2-1-1profile found in all of the serotype 6A isolates within this clonalcomplex. Although it is difficult to provide a single evolution-ary path from ST490 to all of its assumed descendants that isconsistent with the changes in the cps profiles, it is apparentthat changes from serotype 6A to serotype 6B (or from oneserotype 6B cps profile to another) have occurred on at leastfour occasions within this one clonal complex.

The largest clonal complex identified by eBURST amongthe 102 isolates included 28 isolates and 21 STs; the greatmajority of them were serotype 6B and had the cps profile4-2-2. By applying eBURST to the whole MLST database,ST176 was the predicted founder (bootstrap support, 92%) ofthis large clonal complex. Among the isolates of the clonalcomplex, variation in serotype and in cps profile was apparentwithin only one lineage descended from ST171, an SLV ofST176 (Fig. 5). ST361 was predicted to be descended fromST171, and two of the three isolates of this ST were serotype6A (cps profile 2-1-1), and a descendant SLV of ST361 (ST1095)was also serotype 6A and had the same cps profile (Fig. 5). Theother isolate of ST361 (APH-10) was serotype 6B, with the cpsprofile 3-1-1, which differed by only a single nonsynonymoussubstitution in wciP from the cps profile of the serotype 6Aisolates of ST361. This substitution in the wciP gene of APH-10(and in AAU-19 within the ST473 clonal complex, which hasthe same unusual cps profile) is believed to determine whetheran isolate is serotype 6A or 6B (see Discussion). The mostparsimonious explanation for the origin of APH-10 is that theancestral cps profile of this clonal complex (4-2-2) changed byrecombination to 2-1-1, resulting in a change from serotype 6Bto 6A, and that the single substitution in wciP changed a sero-type 6A isolate of ST361 back to serotype 6B, changing its cpsprofile to 3-1-1. Repeat serotyping of APH-10 confirmed thatit was serotype 6B rather than 6A. The serotype 6B isolateAPH-10, and serotype 6A isolates of this ST with the cpsprofile 2-1-1, were recovered in the Philippines.

DISCUSSION

The central cps region provides the most reliable informa-tion about the evolution of serogroups. This region in serotype6A and 6B isolates has a lower percent G�C content than theflanking genes, which suggests that it has been introducedrelatively recently, on an evolutionary timescale, from an un-

known foreign source. The presence in pneumococci of manycompletely nonhomologous polysaccharide polymerases andglycosyl transferases in the serotype-specific region of the cpsloci of the 90 different pneumococcal serotypes or serogroupsalso supports the view that these genes have been introducedon multiple occasions by horizontal gene transfer.

The sequence variation within the central cps region of allserotype 6A and 6B isolates was sufficiently low (�6%) toindicate that the sequences are all derived from a single recentcommon ancestral sequence, although this ancestral sequencemay not necessarily have been present within a pneumococcus.The most likely scenario is that an ancestral serogroup 6 cpssequence was introduced into the pneumococcus from anotherspecies and that this ancestral sequence diverged, through ran-dom genetic drift or under selection for antigenic variationimposed by the host immune system. This resulted in twosubfamilies of cps sequences that produced capsular polysac-charide structures that differed slightly in their immunochem-istry and their rhamnose-ribitol linkages and which we nowrecognize as serotypes 6A and 6B. Although this scenario ex-plains the high levels of sequence similarity within and betweenmost of the serotype 6A and 6B isolates, it does not explain theorigins of the more divergent class 2 cps sequences. The diver-gence between the class 1 and class 2 sequences is substantial(5.4%) and was greater than that found in typical housekeep-ing genes of the pneumococcus (8). Furthermore, the twoclasses of serotype 6B cps sequences are far more divergentfrom each other than class 1 serotype 6B sequences are fromserotype 6A sequences.

Our favored scenario is that the class 2 sequence appearedby an independent introduction from a similar but unknownsource. The class 2 cps sequences were more uniform thanthose of class 1; there were four class 2 cps profiles, and threeof these differed from the predominant (presumably ancestral)sequence (the 8-7-7 cps profile) at only a single nucleotide site.The low level of sequence diversity among class 2 sequences,and the presence of perfect mosaics between class 1 and 2 se-quences, is consistent with the recent appearance of the class 2sequence in the pneumococcus and its recent recombinationwith class 1 sequences. Subsequently, the class 2 sequence hasspread horizontally within the pneumococcal population and isnow found in several serotype 6B isolates that appear fromMLST to be only distantly related.

The introduction of genes encoding a structurally novel cap-sular polysaccharide from a distantly related species should befavored by natural selection, since it produces an antigenicvariant of the pneumococcus against which there is no existingnatural immunity. A more difficult question is whether immuneselection drives the divergence of a single serotype into tworelated serotypes or favors the observed mutational or recom-binational events that appear to have relatively frequentlychanged the serotype of isolates from 6A to 6B or vice versa.Selection for such events would require that the natural anti-body response against the capsule of one of these serotypes isnot fully protective against the subtly different capsule of theother serotype or that serotypes 6A and 6B differ in some otherway that affects transmission to new hosts. There are no robustquantitative measures of the degree of cross protection fromnatural immunity between serotypes 6A and 6B, but studies ofimmunity induced by the conjugate vaccines, which include

8190 MAVROIDI ET AL. J. BACTERIOL.

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 11: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

serotype 6B but not serotype 6A capsular polysaccharide, in-dicate a significantly less effective antibody response againstserotype 6A isolates than against 6B isolates (30). Also, a studyof hybridoma antibodies elicited by serotype 6B showed thatwhile half of the antibodies bound with equal avidity to, andwere able to opsonize, both 6A and 6B, the other half bound toserotype 6B alone and failed to opsonize serotype 6A isolates(29). Antibodies elicited by one serotype may provide reducedprotection against acquisition or carriage of the other. Naturalselection could therefore favor switches of serotype between6A and 6B or the original divergence that appears to have splitthe ancestral serotype 6 into serotypes 6A and 6B.

There was considerable evidence for recombination at thecps locus. In addition to evidence for the dissemination of theserotype 6A and 6B cps locus into distantly related lineages,resulting in changes of serogroup by recombination, a phenom-enon well documented elsewhere (3–6, 26), there were a sur-prising number of examples of recombination between the cpsregions of serotypes 6A and 6B leading to changes from one ofthe serotypes to the other. Many of the recombinational eventsthat have spread the serotype 6A and 6B capsule betweenstrains appear to be relatively recent; a third of the serogroup6 cps profiles were found in very different genotypes, indicatingthat no sequence changes within the three cps genes had oc-curred since the horizontal-transfer events. Similarly, the mo-saic cps sequences appear to have arisen recently, as no nucle-otide differences have occurred in either the putative donor orrecipient sequences since their formation. These results there-fore provide supporting molecular evidence for changes be-tween serotype 6A and 6B within clonal complexes that wereproposed by Robinson et al. (28)

We analyzed only the central cps region and cannot addressthe typical size of the cps region that is replaced when isolateschange by recombination from serotype 6A to 6B or when theserogroup 6 cps locus spreads horizontally into other lineages.Recombinational crossover points in pneumococci that havechanged serotype have been identified previously, and changesof serotype by recombination appear to often include most orall of the cps locus (4, 5, 25). The majority of the recombina-tional replacements that introduced the class 2 sequences intoother lineages must have been relatively large (at least 4.5 kb),since in all but two isolates the characteristic indel betweenwciN and wciO has been cotransferred along with wciP, wzy,and wzx.

Comparisons of the published complete sequences of the cpsloci of serotype 6A and 6B isolates identify those nonsynony-mous substitutions in the serotype-specific region of the locusthat correlate with an isolate being serotype 6A or 6B. Withinthis region (wciN-wzx) of the serotype 6A and 6B isolates therewere several nonsynonymous substitutions that correlated withserotype. Our examination of sequence variation in a muchlarger set of diverse serotype 6A and 6B isolates establishedthat only one nonsynonymous substitution in wciP correlatedperfectly with serotype; isolates with serine at residue 195 ofWciP were serotype 6A, whereas those with asparagine wereserotype 6B (Fig. 4B). The importance of wciP in determiningserotype was also supported by examination of the class 2 cpssequences that, with one exception, were found in serotype 6Bisolates and that also have asparagine at position 195. The oneclass 2 cps region present in a serotype 6A isolate (ACH-C2)

was a mosaic that appears to have arisen by a recombinationalreplacement that introduced the front half of wciP (introduc-ing the serine residue into WciP) from a serotype 6A isolate.

The most convincing evidence for the key role of residue 195comes from the wciP3 allele, which was found in two serotype6B isolates and was identical to the wciP2 allele of serotype 6Aisolates except at one nucleotide site, which changes residue195. The two isolates (AAU-19 and APH10) were distantlyrelated and appear to have independently changed from sero-type 6A to 6B by the same single-nucleotide change in wciP.In both cases, the unusual serotype 6B isolates were iden-tical by MLST to multiple isolates of serotype 6A and (ex-cept for the single-nucleotide change) had the same wciP-wzy-wzx sequence. Based on these observations, we conclude thatthese isolates independently changed from serotype 6A to 6Bby a point mutation within wciP rather than by recombination.

There is therefore strong evidence that residue 195 withinWciP determines the serotype within serogroup 6, althoughexperimental evidence is required to establish conclusively thatmutagenesis of this single nucleotide in wciP changes the se-rotype. A key role for wciP in determining whether an isolateis serotype 6A or 6B is consistent with the functional assign-ment of its gene product (rhamnosyl transferase), since theircapsular polysaccharides differ only in the nature of the chem-ical linkage of rhamnose to ribitol (15, 16, 27), which is cata-lyzed by a rhamnosyl transferase.

ACKNOWLEDGMENTS

We thank Moon Nahm at the University of Alabama for providingmonoclonal antibodies Hyp6BM1 and Hyp6BM8 and Pat Coan forserotyping and shipping of strains. We are grateful to Angela Bruegge-mann and Dai Griffith at the John Radcliffe Hospital, Oxford, UnitedKingdom, for confirming the serotypes of some of the isolates andStephen Bentley and Julian Parkhill at the Sanger Institute, Cam-bridge, United Kingdom, for providing annotated sequences of theserotype 6A and 6B cps loci. We thank all those who contributedisolates to this study and to the MLST database and Jay Butler andKaren Rudolph for providing replacements for isolates lost in a freezeraccident.

This work was supported by the Wellcome Trust. B.G.S. is a Prin-cipal Research Fellow of the Wellcome Trust.

REFERENCES

1. Arrecubieta, C., R. Lopez, and E. Garcia. 1994. Molecular characterizationof cap3A, a gene from the operon required for the synthesis of the capsuleof Streptococcus pneumoniae type 3: sequencing of mutations responsible forthe unencapsulated phenotype and localization of the capsular cluster on thepneumococcal chromosome. J. Bacteriol. 176:6375–6383.

2. Barrett, B., L. Ebah, and I. S. Roberts. 2002. Genomic structure of capsulardeterminants. Curr. Top. Microbiol. Immunol. 264:137–155.

3. Coffey, T. J., C. G. Dowson, M. Daniels J. Zhou, C. Martin, B. G. Spratt, andJ. M. Musser. 1991. Horizontal transfer of multiple penicillin-binding pro-tein genes, and capsular biosynthetic genes, in natural populations of Strep-tococcus pneumoniae. Mol. Microbiol. 5:2255–2260.

4. Coffey, T. J., M. C. Enright, M. Daniels, J. K. Morona, R. Morona, W.Hryniewicz, J. C. Paton, and B. G. Spratt. 1998. Recombinational exchangesat the capsular polysaccharide biosynthetic locus lead to frequent serotypechanges among natural isolates of Streptococcus pneumoniae. Mol. Micro-biol. 27:73–83.

5. Coffey, T. J., M. Daniels, M. C. Enright, and B. G. Spratt. 1999. Serotype 14variants of the Spanish penicillin-resistant serotype 9V clone of Streptococcuspneumoniae arose by large recombinational replacements of the cpsA-pbp1aregion. Microbiology 145:2023–2031.

6. Coffey, T. J., M. C. Enright, M. Daniels, P. Wilkinson, S. Berron, A. Fenoll,and B. G. Spratt. 1998. Serotype 19A variants of the Spanish serotype 23Fmultiresistant clone of Streptococcus pneumoniae. Microb. Drug Resist. 4:51–55.

7. Dillard, J. P., M. W. Vandersea, and J. Yother. 1995. Characterisation of thecassette containing genes for type 3 capsular polysaccharide biosynthesis inStreptococcus pneumoniae. J. Exp. Med. 181:973–983.

VOL. 186, 2004 EVOLUTION OF THE SEROGROUP 6 cps LOCUS 8191

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from

Page 12: Evolutionary Genetics of the Capsular Locus of Serogroup 6 … · other STs in the group from the founder, were computed by eBURST and displayed as eBURST diagrams (9). STs that are

8. Enright, M. C., and B. G. Spratt. 1998. A multilocus sequence typing schemefor Streptococcus pneumoniae: identification of clones associated with seriousinvasive disease. Microbiology 144:3049–3060.

9. Feil, E. J., B. Li, D. M. Aanensen, W. P. Hanage, and B. G. Spratt. 2004.eBURST: Inferring patterns of evolutionary descent among clusters of re-lated bacterial genotypes from multilocus sequence typing data. J. Bacteriol.186:1518–1530.

10. Garcia, E., D. Llull, R. Munoz, M. Mollerach, and R. Lopez. 2000. Currenttrends in capsular polysaccharide biosynthesis of Streptococcus pneumoniae.Res. Microbiol. 151:429–435.

11. Guidolin, A., J. K. Morona, R. Morona, D. Hansman, and J. C. Paton. 1994.Nucleotide sequence analysis of genes essential for capsular polysaccharidebiosynthesis in Streptococcus pneumoniae type 19F. Infect. Immun. 62:5384–5396.

12. Henrichsen, J. 1995. Six newly recognized types of Streptococcus pneu-moniae. J. Clin. Microbiol. 33:2759–2762.

13. Iannelli, F., B. J. Pearce, and G. Pozzi. 1999. The type 2 capsule locus ofStreptococcus pneumoniae. J. Bacteriol. 181:2652–2654.

14. Jiang, S. M., L. Wang, and P. R. Reeves. 2001. Molecular characterisation ofStreptococcus pneumoniae type 4, 6B, 8, and 18C capsular polysaccharidegene clusters. Infect. Immun. 69:1244–1255.

15. Kamerling, J. P. 2000. Pneumococcal polysaccharides: a chemical view, p.81–114. In A. Tomasz (ed.), Streptococcus pneumoniae: molecular biologyand mechanisms of disease. Mary Ann Liebert, Inc., Larchmont, N.Y.

16. Kenne, L., B. Lindberg, and J. K. Madden. 1979. Structural studies of thecapsular antigen from Streptococcus pneumoniae type 26. Carbohydr. Res.73:175–182.

17. Kolkman, M. A., W. Wakarchuk, P. J. Nuijten, and B. A. van der Zeijst.1997. Capsular polysaccharide synthesis in Streptococcus pneumoniae sero-type 14: molecular analysis of the complete cps locus and identification ofgenes encoding glycosyltransferases required for the biosynthesis of thetetrasaccharide subunit. Mol. Microbiol. 26:197–208.

18. Kolkman, M. A., B. A. van der Zeijst, and P. J. Nuijten. 1998. Diversity ofcapsular polysaccharide synthesis gene clusters in Streptococcus pneumoniae.J. Biochem. 123:937–945.

19. Kumar, S., K. Tamura, and M. Nei. 2004. MEGA3: integrated software formolecular evolutionary genetics analysis and sequence alignment. Brief.Bioinform. 5:150–163.

20. Llull, D., R. Lopez, E. Garcia, and R. Munoz. 1998. Molecular structure of

the gene cluster responsible for the synthesis of the polysaccharide capsule ofStreptococcus pneumoniae type 33F. Biochim. Biophys. Acta 1443:217–224.

21. Llull, D., R. Munoz, R. Lopez, and E. Garcia. 1999. A single gene (tts)located outside the cap locus directs the formation of Streptococcus pneu-moniae type 37 capsular polysaccharide. Type 37 pneumococci are natural,genetically binary strains. J. Exp. Med. 190:241–251.

22. Morona, J. K., R. Morona, and J. C. Paton. 1997. Characterization of thelocus encoding the Streptococcus pneumoniae type 19F capsular polysaccha-ride biosynthetic pathway. Mol. Microbiol. 23:751–763.

23. Morona, J. K., R. Morona, and J. C. Paton. 1997. Molecular and geneticcharacterization of the capsule biosynthesis locus of Streptococcus pneu-moniae type 19B. J. Bacteriol. 179:4953–4958.

24. Munoz, R., M. Mollerach, R. Lopez, and E. Garcia. 1999. Characterisation ofthe type 8 capsular gene cluster of Streptococcus pneumoniae. J. Bacteriol.181:6214–6219.

25. Ramirez, M., and A. Tomasz. 1998. Molecular characterisation of the com-plete 23F capsular polysaccharide locus of Streptococcus pneumoniae. J.Bacteriol. 180:5273–5278.

26. Ramirez, M., and A. Tomasz. 1999. Acquisition of new capsular genes amongclinical isolates of antibiotic-resistant Streptococcus pneumoniae. Microb.Drug Resist. 5:241–246.

27. Rebers, P. A., and M. Heidelberger. 1961. The specific polysaccharide of typeVI pneumococcus. II. The repeating unit. J. Am. Chem. Soc. 83:3056–3059.

28. Robinson, D. A., D. E. Briles, M. J. Crain, and S. K. Hollingshead. 2002.Evolution and virulence of serogroup 6 pneumococci on a global scale. J.Bacteriol. 184:6367–6375.

29. Sun, Y., Y. Hwang, and M. H. Nahm. 2001. Avidity, potency, and cross-reactivity of monoclonal antibodies to pneumococcal capsular polysaccha-ride serotype 6B. Infect. Immun. 69:336–344.

30. Vakevainen, M., C. Eklund, and J. Eskola. 2001. Cross-reactivity of antibod-ies to type 6B and 6A polysaccharide of Streptococcus pneumoniae, evoked bypneumococcal conjugate vaccines. J. Infect. Dis. 184:789–793.

31. van Selm, S., M. A. Kolkman, B. A. van der Zeijst, K. A. Zwaagstra, W.Gaastra, and J. P. van Putten. 2002. Organization and characterization ofthe capsule biosynthesis locus of Streptococcus pneumoniae serotype 9V.Microbiology 148:1747–1755.

32. van Selm, S., L. M. van Cann, M. A. Kolkman, B. A. van der Zeijst, and J. P.van Putten. 2003. Genetic basis for the structural difference between Strep-tococcus pneumoniae serotype 15B and 15C capsular polysaccharides. Infect.Immun. 71:6192–6198.

8192 MAVROIDI ET AL. J. BACTERIOL.

on February 21, 2021 by guest

http://jb.asm.org/

Dow

nloaded from