Molecular Microbiology (2003) 49(2), 277–300
doi:10.1046/j.1365-2958.2003.03580.x
© 2003 Blackwell Publishing Ltd

MicroReview

Prophages and bacterial genomics: what have we learned so far?

Sherwood Casjens

Department of Pathology, University of Utah Medical School, 30 North 1900 East, Salt Lake City, UT 84132-2501, USA.

Accepted 3 April, 2003. *For correspondence. E-mail [email protected]; Tel. (+1) 801 581 5980; Fax (+1) 801 581 3607.

Epigraph

There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.

Mark Twain 1883, Life on the Mississippi

Summary

Bacterial genome nucleotide sequences are being completed at a rapid and increasing rate. Integrated virus genomes (prophages) are common in such genomes. Fifty-one of the 82 such genomes published to date carry prophages, and these contain 230 recognizable putative prophages. Prophages can constitute as much as 10–20% of a bacterium’s genome and are major contributors to differences between individuals within species. Many of these prophages appear to be defective and are in a state of mutational decay. Prophages, including defective ones, can contribute important biological properties to their bacterial hosts. Therefore, if we are to comprehend bacterial genomes fully, it is essential that we are able to recognize accurately and understand their prophages from nucleotide sequence analysis. Analysis of the evolution of prophages can shed light on the evolution of both bacteriophages and their hosts. Comparison of the Rac prophages in the sequenced genomes of three Escherichia coli strains and the Pnm prophages in two Neisseria meningitidis strains suggests that some prophages can lie in residence for very long times, perhaps millions of years, and that recombination events have occurred between related prophages that reside at different locations in a bacterium’s genome. In addition, many genes in defective prophages remain functional, so a significant portion of the temperate bacteriophage gene pool resides in prophages.

Prophage biology

The genomes of cellular organisms are often littered with both functional and defunct viral chromosomes. For example, the human genome is about 8% retrovirus genes (Lander et al., 2001), and some bacterial genomes may be composed of as much as 20% bacteriophage genes (Casjens et al., 2000). Clearly, in order to understand these genomes completely, we must be able to recognize these viral genes and understand any effects they may have on the host cells.

Bacteriophages, the viruses that infect bacteria, are extremely varied. Different types of phage virions may carry single- or double-stranded (ds)DNA or RNA, and the details of their replication cycles reflect this diversity. The dsDNA phages, the subject of this review, can be grossly divided into lytic and temperate virus groups, each of which is extremely diverse. Lytic dsDNA phages infect bacterial cells and always programme the synthesis of progeny virions, which are then released from the dead, infected cell. Temperate dsDNA phages, on the other hand, although they are able to propagate lytically under some circumstances, are also able to establish a stable relationship with their host bacteria in which the phage DNA is replicated in concert with the host’s chromosome, and virus genes that are detrimental to the host are not expressed. This long-term, apparently benign, association of bacteriophages with bacterial cells was first described in the 1920s (Gildmeister and Herzberg, 1924; Bail, 1925; Bordet, 1925), but its acceptance and an understanding of the real nature of this association took many years (Lwoff, 1953; 1966). Subsequent work has shown that, during this association, the phage DNA (now called the ‘prophage’) is usually physically integrated into one of the native replicons of the host (Campbell, 1962; Freifelder and Meselson, 1970); however, a few phages, such as P1,
N15, LE1, φ20 and φBB-1, are not integrated and exist as circular or linear plasmids (Ikeda and Tomizowa, 1968; Ravin and Shulga, 1970; Inal and Karunakaran, 1996; Eggers et al., 2000; Girons et al., 2000). Different individuals of a given integrating temperate phage always have the same unique integration site on the phage chromosome, but may or may not always integrate their DNA at precisely the same site in the bacterial chromosome. In Escherichia coli, for example, phage λ DNA normally integrates at only one site, phage P2 DNA can quite readily integrate into at least 10 sites (Barreiro and Haggard-Ljungquist, 1992), and phage Mu DNA integrates essentially randomly into host DNA (Harshey, 1988).

Bacteriophage virions can be released from cells containing an intact prophage by a process called induction, during which prophage genes required for lytic growth are turned on and progeny virions are produced and released from the cell. Cells carrying a prophage are called ‘lysogens’ because of this potential to induce and lyse. Induction can happen spontaneously and randomly in a small fraction of the bacteria that harbour a given prophage, or specific environmental signals can cause simultaneous induction of a particular prophage in many cells. A number of the important ‘model system’ dsDNA tailed phages were first discovered after they were released from lysogenic bacteria in the laboratory; for example, phages λ (Lederberg, 1951), P22 (Zinder and Lederberg, 1952), P1 and P2 from the same E. coli strain (Bertani, 1951), P4 (Six, 1963) and N15 (Ravin, 1968) were originally isolated in this manner.

Most genes, including those required for lytic growth and virion production, are turned off in integrated prophages but, in the few studied cases, plasmid prophages typically express most of their non-lysis, non-virion assembly genes. Some of the genes that are expressed from the prophage in a lysogen are ‘lysogenic conversion’ genes, which alter the properties of the host bacterium. The products of these genes can have very important effects on the host bacterium, which range from protection against further phage infection to increasing the virulence of a pathogenic host. This subject has been frequently and recently reviewed and will not be covered in depth here (see Bishai and Murphy, 1988; Cheetham and Katz, 1995; Waldor, 1998; Miao and Miller, 1999; Boyd et al., 2001; Banks et al., 2002; Boyd and Brussow, 2002; Wagner and Waldor, 2002; Casjens and Hendrix, 2003). The presence or absence of prophages can account for a large fraction of the variation among individuals within a bacterial species, and phages are likely to be important vehicles for horizontal transfer of genetic information between bacteria (Ohnishi et al., 2001; Banks et al., 2002; Casjens and Hendrix, 2003). Clearly, in order fully to understand the information in bacterial whole-genome nucleotide sequences, it is essential that we be able to recognize and understand prophages when they are present. The medical and evolutionary importance of prophages makes this all the more urgent.

Types of prophages and related entities

Fully functional prophages can be induced to initiate a round of lytic growth; however, not all prophage-like entities in bacterial genomes encode functional bacteriophages. Four additional types of prophage-related entities have been characterized: defective and satellite prophages, bacteriocins and gene transfer agents.

(i) Defective prophages (sometimes called ‘cryptic prophages’, although in theory this term could include fully functional prophages that have never been induced to lytic growth) are prophages that are in a state of mutational decay. Although they may still harbour functional genes, defective prophages are unable to programme the full phage replication cycle (reviewed by Campbell, 1994; 1996). Several defective prophages in E. coli K-12, Rac (Kaiser and Murray, 1979), e14 (Greener and Hill, 1980), DLP12 (Lindsey et al., 1989) and QIN (Espion et al., 1983) (Table 1), and in Bacillus subtilis 168, PBSX (Krogh et al., 1996) and SKIN (Takemaru et al., 1995; Mizuno et al., 1996), were discovered before genomic sequencing became possible and have been studied in some detail. Each of these harbours some functional genes. For example, Rac encodes the RecE homologous recombination system (Kaiser and Murray, 1979), QIN harbours intact cell lysis genes (Espion et al., 1983), and PBSX encodes the synthesis of a virion-like particle (Okamato et al., 1968).

(ii) Satellite phages are otherwise functional phages that do not carry their own virion structural protein genes, and have chromosomes that have been evolutionarily designed to be encapsidated by the virion proteins of other specific phages. The best understood example of such a parasitic relationship is that between satellite phage P4 and fully functional phage P2 (see also Ruzin et al., 2001). P4 carries genes that encode proteins that replicate its own DNA, turn on the virion protein genes of the P2 prophage and modify the P2 head so that it is smaller and can accommodate only the smaller P4 chromosome (Bertani and Six, 1988).

(iii) Some bacteria produce bacteriocins (devices that kill other bacteria) that resemble phage tails (e.g. Gratia, 1989; Thaler et al., 1995; Zink et al., 1995; Nguyen et al., 1999; Nakayama et al., 2000). Two of these that have been characterized, the type F and R bacteriocins of Pseudomonas aeruginosa PAO1, are similar to phage λ tails and phage P2 tails respectively (Nakayama et al., 2000). The gene clusters encoding them have nearly complete sets of λ and P2 tail gene homologues in nearly the same order as they are found in those phages.

(iv) Finally, gene transfer agents (GTAs) are encoded by some bacterial genomes (Yen et al., 1979; Starich et al., 1985; Rapp and Wall, 1987; Humphrey et al., 1997). GTAs are tailed phage-like particles that encapsidate random fragments of the bacterial genome. These particles cannot propagate as viruses, as the vast majority of the particles do not carry the genes that encode the GTA and, in the cases that have been studied, those that do contain a DNA fragment that is too short to include the full set of GTA genes. These virion-like particles can deliver their DNA payload into another bacterium of the same species, where the DNA can replace the resident cognate chromosomal region by homologous recombination. The best characterized GTA is encoded by a cluster of genes on the Rhodobacter capsulatus chromosome (Lang et al., 2000; Lang and Beatty, 2001). Although not all the proteins encoded by the genes in this GTA cluster have been characterized in detail, the number of genes involved makes it likely that the cluster contains the genes for the structural components of the virion-like particles and little else.

Do the tail-like bacteriocins and GTAs have a positively selected function or are they simply defective prophages that happen by chance to be able to perform these functions that serve no important purpose for the host? There are several arguments for such a selected function. (i) They are often universally present in species that harbour them. The Brachyspira hyodysenteriae GTA has been found in every isolate of that species that has been examined (T. Stanton and G. Thompson, personal communication), as has the R. capsulatus GTA (Wall et al., 1975), and the F and R bacteriocins were present in all of the nine P. aeruginosa strains examined (Nakayama et al., 2000). (ii) They do not appear to be in a state of evolutionary decay, as pseudogenes (used here to mean any mutationally inactivated gene) have not been identified within them; and (iii) expression of their genes appears to be regulated differently from the phages to which they are related (Nakayama et al., 2000).

In spite of this accumulated knowledge, it is often not possible to distinguish among functional prophages and these prophage-like entities by simply examining their nucleotide sequences. For example, a tail gene cluster in a bacterial chromosome could encode a bacteriocin or simply be what remains of a partly deleted prophage. Induced PBSX encapsulates host DNA, and its virion-like particles kill B. subtilis cells that do not carry PBSX (McDonnell et al., 1994) (it has not been demonstrated to be able to transduce other bacteria with its packaged DNA, but this remains a possibility). Is PBSX a GTA, a bacteriocin or a decaying prophage? Because of such current unknowables, in this discussion I will usually not attempt to distinguish fully functional prophages from defective prophages, satellite prophages, GTAs or phage-like bacteriocins and will include them all within the term ‘prophage.’ I will only consider the temperate dsDNA tailed phages of bacteria, although temperate phages with ssDNA-containing filamentous virions are known that integrate as dsDNA prophages (Waldor and Mekalanos, 1996; Chang et al., 1998; Davis et al., 1999; Lin et al., 2001; da Silva et al., 2002), and not yet well-studied lytic and temperate dsDNA tailed phages that infect Archaea are known (for example, see Pfister et al., 1998; Klein et al., 2002; Tang et al., 2002).

Table 1. Prophages in three E. coli genomes.

E. coli K-12 (a)  E. coli O157 EDL933 (a)  E. coli O157 Sakai (a)  Phage type
CP4-6 (b)         CP-933I, CP-933H         Sp1, Sp2                Lambdoid, P4-like
DLP-12            –                        –                       Lambdoid
λ (c)             CP-933K                  Sp3                     Lambdoid
–                 CP-933M                  Sp4                     Lambdoid
–                 933W                     Sp5                     Lambdoid
–                 CP-933N                  Sp6                     Lambdoid
–                 CP-933C                  Sp7                     Unstudied type
e14               CP-933X (2?)             Sp8                     Lambdoid
–                 CP-933O (2–4)            Sp9                     Lambdoid
Rac               CP-933R                  Sp10                    Lambdoid
QIN               CP-933P                  Sp11, Sp12              Lambdoid
–                 CP-933T                  Sp13                    Somewhat P2-like
–                 CP-933U                  Sp14                    Lambdoid
CP4-44 (b)        –                        SpLE2                   Unclear
PR-X              –                        –                       P2-like, highly deleted
–                 CP-933V                  Sp15                    Lambdoid
CPS-53 (d)        CP-22 (d)                Sp16                    P22-like, highly deleted
Eut (c)           –                        –                       P22-like, highly deleted
CP4-57 (b)        CP-933Y                  Sp17                    Lambdoid
–                 –                        Sp18                    Mu-like

a. Each row represents a different integration site. In some cases (e.g. QIN site), rearrangements have made it difficult to tell whether they have identical attachment sites. This list was compiled from the following publications and references therein: Blattner et al. (1997); Rudd (1999); Hayashi et al. (2001b); Ohnishi et al. (2001); Perna et al. (2001; 2002). Elements are listed in order clockwise around the standard E. coli map; see Table S1 in Supplementary material for a list of the genes thought to lie within each prophage. Only λ and 933W have been shown to be fully functional phage genomes. Duplicate morphogenesis functions suggest that CP-933X may be evolved from two original prophages. The correspondence between Sp9 and Sp11 + Sp12 and CP-933O and CP-933P, respectively, is complex because of an inversion in the EDL933 strain lineage that involved these prophages, and other rearrangements that have occurred among the prophages (Perna et al., 2002).

b. These elements are possibly phage derived, but do not carry any uniquely phage-derived genes. CP4-6, CP4-44 and CP4-57 of K-12 and SpLE2 of Sakai are probably phage derived, but convincing proof of this is lacking (see text); in Sakai, SpLE1 and SpLE4, not shown in this table, have some similarity to the CP4 elements (Blattner et al., 1997; Rudd, 1999; Hayashi et al., 2001b). The CP4 elements are not closely related to the prophages at the same location in the other strains.

c. Phage λ was cured from the sequenced version of E. coli K-12 (Blattner et al., 1997); Eut [also called CPZ-55 (Rudd, 1999) and ‘CP-unnamed’ (Hayashi et al., 2001b)] is missing from some extant K-12 laboratory strains (Kofoid et al., 1999); Rac and e14 are also excisable (Evans et al., 1979; Brody et al., 1985).

d. CP-22 is a provisional name for a region not formally identified as a prophage by Perna et al. (2001). CPS-53 (Rudd, 1999) has also been called KpLE1.

Prophage abundance

Should we expect prophages to be present in bacterial genome sequences and, if so, how many? In addition to the anecdotal observation that many of the phages currently under study were isolated after their release from lysogenic bacteria, more systematic studies have indicated that prophages can be very common. Osawa et al. (2000) found that 51 different functional phages were released from 27 E. coli strains, and Schicklmaier et al. (1998) found that 83 of 107 E. coli strains released at least one functional phage type. Schmieger and coworkers (Schicklmaier et al., 1998; Schmieger and Schicklmaier, 1999) examined 173 Salmonella enterica (serovar Typhimurium) isolates and found that 136 released functional phages. Indeed, the LT2 isolate of S. enterica that is commonly used in laboratory studies carries four intact, fully functional prophages (Yamamoto, 1967; 1969; Figueroa-Bossi and Bossi, 1999; McClelland et al., 2001). Mitomycin C was found to induce the synthesis of functional phages from seven of 170 Yersinia strains (Popp et al., 2000) and phages or phage-like particles from 38 of 68 Gram-positive dairy Streptococcus strains (Huggins and Sandine, 1977). Of course, all such searches find a minimum number of functional prophages, as they depend upon successful induction and use of permissive indicator strains.

Other studies have asked about the presence of particular prophage features in multiple isolates of the same bacterial species. In the E. coli chromosome, the attachment site of the λ-like (lambdoid) phage 21 is occupied by phage-like sequences in 28 of 77 strains examined (Wang et al., 1997), the lambdoid phage Atlas attachment site is occupied in 23 of 72 strains examined (Milkman and Bridges, 1990; Sandt and Hill, 2000), and four of 33 strains examined have something (probably λ-like in two cases) inserted at the phage λ attachment site (Kuhn and Campbell, 2001). Hybridization of DNA from various bacterial strains with authentic phage or prophage DNA probes has shown that related prophages are often present in a substantial fraction of other isolates of the same species [a few of the many such analyses are as follows: Gram-negative enterobacteria (Anilionis et al., 1980; Lindsey et al., 1989; Faubladier and Bouche, 1994; Agron et al., 2001), Wolbachia (Masui et al., 2000) and Haemophilus (Chang et al., 2000); spirochaete Borrelias (Casjens et al., 1997); Gram-positive Streptococcus (Ramirez et al., 1999; Beres et al., 2002; Smoot et al., 2002) and diphtheria-causing Corynebacterium (Pappenheimer and Murphy, 1983)]. Finally, a substantial fraction of searches for strain-specific bacterial sequences for use in the typing of related bacterial isolates have found prophage sequences [e.g. enterobacteria (Emmerth et al., 1999; McClelland et al., 2000), Campylobacter (Dep et al., 2001), Neisseria (Klee et al., 2000), and Lactobacillus (Brandt et al., 2001)]. Clearly, prophages are common in many, widely diverse bacterial species.
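The survey counts quoted in this section are easier to compare as proportions. The following is a minimal Python sketch of that arithmetic; the counts are those given in the text, whereas the labels, the `SURVEYS` list and the helper names are illustrative assumptions and are not taken from the cited studies.

```python
# Counts as quoted in the text: (label, strains positive, strains examined).
# The labels and this data layout are illustrative assumptions.
SURVEYS = [
    ("E. coli releasing functional phage (Schicklmaier et al., 1998)", 83, 107),
    ("S. enterica Typhimurium releasing functional phage", 136, 173),
    ("Yersinia induced with mitomycin C (Popp et al., 2000)", 7, 170),
    ("Dairy Streptococcus (Huggins and Sandine, 1977)", 38, 68),
    ("E. coli phage 21 attachment site occupied", 28, 77),
    ("E. coli phage Atlas attachment site occupied", 23, 72),
    ("E. coli phage lambda attachment site occupied", 4, 33),
]


def percent(positive, examined):
    """Return the percentage of examined strains that scored positive."""
    return 100.0 * positive / examined


def summarize(surveys):
    """Return (label, percentage rounded to one decimal) for each survey."""
    return [(label, round(percent(p, n), 1)) for label, p, n in surveys]


if __name__ == "__main__":
    for label, pct in summarize(SURVEYS):
        print(f"{pct:5.1f}%  {label}")
```

On these counts the proportions range from about 4% (Yersinia) to about 79% (S. enterica). As the text notes, each value is a lower bound, because detection depends on successful induction and on a permissive indicator strain.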

A plethora of putative prophages in bacterial genome sequences

In spite of this anecdotal evidence that prophages can be common, their abundance in bacterial genome sequences came as a bit of a surprise to many microbiologists. In the 14 published γ-Proteobacteria genomes, the bacterial phylum with phages that are the best studied and in which prophages are therefore most easily recognized, the number of convincing prophages is high. Eleven of these genomes, those of S. enterica serovars Typhi and Typhimurium, two Yersinia pestis strains, Shigella flexneri, two Xylella fastidiosa strains and four E. coli strains, each carry between seven and 20 prophages (Blattner et al., 1997; Simpson et al., 2000; Hayashi et al., 2001a; McClelland et al., 2001; Parkhill et al., 2001a,b; Perna et al., 2001; Deng et al., 2002; Jin et al., 2002; Welch et al., 2002; Van Sluys et al., 2003), and the Shewanella oneidensis, Xanthomonas axonopodis and Xanthomonas campestris genomes contain three, two and one recognized prophages respectively (Heidelberg et al., 2002; da Silva et al., 2002).

Bacteria from other phyla also often harbour multiple prophages. For example, among the Gram-positive bacteria, the sequenced genomes of B. subtilis, Clostridium acetobutylicum, Clostridium perfringens, Clostridium tetani, Lactococcus lactis, Listeria innocua, Listeria monocytogenes, Staphylococcus aureus and Streptococcus pyogenes strains all carry multiple, easily recognizable and, in many cases, largely intact prophages (Kunst et al., 1997; Bolotin et al., 2001; Ferretti et al., 2001; Glaser et al., 2001; Kuroda et al., 2001; Nolling et al., 2001; Beres et al., 2002; Shimizu et al., 2002; Smoot et al., 2002; Bruggemann et al., 2003). The phages that infect B. subtilis and L. lactis are the best studied in this rather diverse group. B. subtilis 168 contains three very convincing and largely intact prophages plus at least two smaller possible prophage remnants. Of its three unambiguous prophages, one, SPβ, is a fully functional 134 kbp phage genome (Lazarevic et al., 1999; it is the largest known temperate phage), whereas the other two, PBSX and SKIN, are defective (Krogh et al., 1996; Mizuno et al., 1996). At least two of the six L. lactis IL1403 prophages are fully functional (Chopin et al., 1989; 2001). Prophages can make up a significant fraction of these genomes; E. coli O157 Sakai’s 18 recognized prophages make up about 12% of its chromosome (Ohnishi et al., 2001), and the six prophages in Streptococcus pyogenes M3 MGAS315 make up about 12% of its chromosome (Beres et al., 2002). Phages of other phyla have been studied in less detail, but the spirochaete Borrelia burgdorferi B31’s multiple plasmid prophages may constitute as much as 20% of its genome (Casjens et al., 2000). I emphasize that, although it is clear that many prophages are present in bacterial genomes, our current knowledge is far from complete, and some of the interpretations made here may have to be revised in the future.

Although prophages are common in bacterial genomes, they have not been found in every individual or in every species. Among the 82 currently published and annotated bacterial genome sequences, 51 harbour apparent prophages and, of these, all but two have integrated prophages. At least 230 prophages are currently recognizable in these 51 genomes. These prophages are listed, along with the genes that they encompass, in Table S1 in the Supplementary material. As even the most conserved phage-specific genes (below) are not always recognizable with current methods or might have been deleted, this is a minimum estimate, especially in bacterial phyla in which phages have not been studied in detail.

The 31 bacterial genome sequences that contain no recognized prophages are largely clustered at the lower end of the bacterial genome size range (Fig. 1). Two of the smallest genomes that have prophages, B. burgdorferi B31 and Chlamydia pneumoniae AR39, are ‘exceptions that prove the rule’, in that the prophages they harbour are plasmids (Casjens et al., 2000; Read et al., 2000). The absence of integrated prophages in small-genome bacteria could reflect the evolutionary pressure to remove non-essential chromosomal DNA that led to the reduction in the size of their genomes (Lawrence et al., 2001). A few of the larger bacterial genome sequences, for example those of the high G+C Gram-positive bacteria such as Mycobacterium (4.4 Mbp) and Streptomyces (9.07 Mbp), have relatively few convincing prophages (Fleischmann et al., 2002; Bentley et al., 2002). In addition, P. aeruginosa PAO1 (6.3 Mbp) carries only two tail-like bacteriocins, and Sinorhizobium meliloti 1021 (6.7 Mbp) has no recognized prophages (Stover et al., 2000; Galibert et al., 2001). In some cases, temperate phages that infect these bacteria are known, making it less likely (but not impossible) that prophages are present in the genomes but remain unrecognized. For example, temperate phage φC31 of Streptomyces has been characterized (Smith et al., 1999), and P. aeruginosa phages are known that are similar to the well-studied E. coli phages λ and P2 [e.g. phages D3 (Kropinski, 2000) and φCTX (Nakayama et al., 1999) respectively]. Perhaps some bacteria have devised mechanisms to avoid such parasites or, by chance, individuals with no integrated phage genomes were chosen for sequencing. It should also be noted that, if laboratory bacterial growth conditions cause frequent induction of a resident prophage, this will impose an artificial selection for derivatives that have lost the prophage. This has apparently happened for the prophages Gifsy-1 and Gifsy-2 in some laboratory strains of S. enterica LT2 (Bunny et al., 2002).

Fig. 1. Putative prophages in sequenced bacterial genomes. The number of recognizable prophages in each of the 82 published bacterial genome sequences is indicated. Closed circles represent genomes with only integrated prophages, and open circles indicate genomes with prophage plasmids (Borrelia burgdorferi B31, 12 prophages; Chlamydia pneumoniae AR39, one prophage). These probably represent minimum prophage numbers, as some may not be currently recognizable. The individual prophages in each genome sequence are delineated in Table S1 (Supplementary material).
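The genome census quoted in this section (82 genomes, 51 with prophages, at least 230 prophages, two genomes whose prophages are plasmids) implies a few derived figures that the text uses without stating the arithmetic. The sketch below, in Python, works them out; the constant names and the `census_summary` helper are illustrative assumptions, and the averages are only as good as the minimum counts they are built on.

```python
# Figures quoted in the text; names are illustrative assumptions.
GENOMES_TOTAL = 82            # published, annotated bacterial genomes
GENOMES_WITH_PROPHAGES = 51   # genomes with at least one apparent prophage
PROPHAGES_RECOGNIZED = 230    # minimum number of recognizable prophages
GENOMES_PLASMID_ONLY = 2      # genomes whose prophages are plasmids


def census_summary(total, with_prophage, prophages, plasmid_only):
    """Derive simple summary statistics from the census counts."""
    return {
        "genomes_without_prophages": total - with_prophage,
        "genomes_with_integrated_prophages": with_prophage - plasmid_only,
        "fraction_with_prophages": with_prophage / total,
        "mean_prophages_per_carrier": prophages / with_prophage,
        "mean_prophages_per_genome": prophages / total,
    }


if __name__ == "__main__":
    summary = census_summary(
        GENOMES_TOTAL,
        GENOMES_WITH_PROPHAGES,
        PROPHAGES_RECOGNIZED,
        GENOMES_PLASMID_ONLY,
    )
    for name, value in summary.items():
        print(f"{name}: {value:.2f}" if isinstance(value, float) else f"{name}: {value}")
```

The subtraction reproduces the 31 prophage-free genomes mentioned in the text. About 62% of the genomes carry prophages, and the prophage-bearing genomes average roughly 4.5 recognizable prophages each. A mean hides the strong skew visible in Fig. 1, so it should be read as a rough summary only.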

The genetic structure of prophages

As nearly 100 complete sequences of fully functional dsDNA tailed phage genomes have been determined, it might seem to be a trivial exercise to search for homologues of known phage genes in bacterial genome sequences and thus identify prophages; however, there are confounding factors. The most important of these factors is the extreme diversity of the dsDNA tailed phages (e.g. Casjens et al., 1992; Hendrix et al., 1999). The phages that infect the enteric bacteria E. coli and Salmonella are the most intensively studied. Yet even today, the sequence of a ‘new’ phage that is closely related to their well-characterized phages is expected to have novel genes. For example, our recently determined sequence of the genome of phage ES18, a typical lambdoid phage that infects S. enterica (serovar Typhimurium), has about 20 novel genes out of 75 total predicted genes (M. Pedulla, R. Hendrix, G. Hatfull and S. Casjens, unpublished). Prophages in less well-studied bacterial phyla can be expected to contain a majority of novel genes (e.g. 40 of 52 predicted genes in the convincing prophage RadMu in the Deinococcus radiodurans R1 genome have no known homologue; Morgan et al., 2002).

The genomes of most phages that are closely related to one another can be described as having a mosaic relationship, as comparison of any two individuals shows patches of (sometimes very high) sequence similarity separated by non-homologous regions. The notion that such mosaicism has arisen by horizontal transfer of genetic material among the tailed phages has been discussed extensively (Susskind and Botstein, 1978; Botstein, 1980; Campbell and Botstein, 1983; Casjens et al., 1992; Campbell, 1994; 1996; Hendrix et al., 1999; Lucchini et al., 1999; Juhala et al., 2000; Moreira, 2000; Desiere et al., 2001; Brussow and Hendrix, 2002; Lawrence et al., 2002). Such mosaicism is strikingly demonstrated by the relationships among the well-studied phages λ, P22 and N15, all of which have historically been included in the lambdoid phage group. Figure 2 shows that P22 and λ have similar but mosaically related right halves (early regions) but very different left halves (late operon/virion protein genes), whereas N15 and λ have very similar left halves and little similarity in their right halves. A curious result of this is that P22 and N15 are both considered to be lambdoid phages, but they are almost completely non-homologous and only distantly related in their few homologous genes (Ravin et al., 2000). The genetic diversity of phages has only been studied among those that infect the Gram-negative γ-Proteobacteria and the Gram-positive Firmicutes, and these are both far from attaining ‘saturation’. Nonetheless, comparison of phages with very similar transcriptional programmes that infect γ-Proteobacteria, such as the lambdoid phages of E. coli, phages P22, Gifsy-1, Gifsy-2, Fels1 and ES18 of S. enterica (McClelland et al., 2000; Pedulla et al., 2003; S. Casjens, R. Hendrix and M. Pedulla, unpublished), Sf6 and SfV of S. flexneri (Allison et al., 2002; S. Casjens, A. J. Clark, W. Inwood and R. Moreno, unpublished), prophages XfP1 and XfP2 of X. fastidiosa (Simpson et al., 2000), prophage λSo of Shewanella oneidensis (Heidelberg et al., 2002) and phage D3 of P. aeruginosa (Kropinski, 2000), suggests that exchanges among them have taken place such that quite similar genes can be present even in distantly related phages within this group. There have also been recent exchanges of genetic material between very different phages that infect the same host. For example, the E. coli temperate phage λ and large lytic phage T4 have tail fibre assembly genes that are similar in sequence and functionally interchangeable (George et al., 1983; Montag and Henning, 1987). Although genes can be exchanged among distantly related phages with the same host and among phages with different host species, two phages of the same type are more likely (but not guaranteed) to have a higher proportion of more closely related genes if they infect closely related hosts. The lessons for this discussion are that (i) horizontal exchanges are common among the dsDNA tailed phages, so it will not be surprising to find similar mosaic relationships among prophages that are found in bacterial genome sequences; and (ii) prophages in the chromosomes of bacteria that are distantly related to the above two phyla may be very different from known phages and so be much more difficult to recognize.

Fig. 2. Temperate phage genome mosaicism – three ‘unrelated’ lambdoid phages. The genes on phage P22, λ and N15 virion chromosomes are shown with rectangles representing genes; grey rectangles are genes that are transcribed rightward and white are transcribed leftward. The three lytic operons are indicated by arrows below each genome. The ends of each phage’s circularly permuted prophage are marked by black vertical lines. Sequence homology is indicated by the light grey areas between genomes.

Recognizing prophages in bacterial genome nucleotide sequence

Some, but not all, phage genome sequences per se have unique properties. For example, some prophages have different G+C contents, oligonucleotide frequencies or codon usage from their host’s genome, but this type of analysis has not progressed to the point that it can unequivocally identify prophage sequences (Blaisdell et al., 1996). We must therefore identify prophages in bacterial genome sequences by the similarity of their genes to known phage genes. In spite of the fact that the dsDNA tailed phage genomes encompass an enormous amount of sequence diversity, there are genes that appear to be more highly conserved than others (below). These have served and will continue to serve as ‘cornerstones’ for the identification of prophages in bacterial genomes (the range of diversity makes it imperative that sequence searches be done at the encoded protein level, and not at the DNA level). It would be useful if the phage gene families used to identify new prophages in DNA sequence did not have non-phage-encoded members that perform non-phage functions, so that the mere presence of such cornerstone genes could prove that a region of a bacterial genome is phage derived.
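The compositional comparison mentioned above can be illustrated with a minimal sketch that scans a sequence in sliding windows and reports windows whose G+C fraction departs from the whole-sequence value. The function names, window size, step and threshold are illustrative assumptions, not values from this review, and, as noted above, such a scan can only suggest candidate regions; it cannot by itself identify a prophage.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # no unambiguous bases, e.g. a run of N
    return (seq.count("G") + seq.count("C")) / counted


def atypical_windows(genome: str, window: int = 5000, step: int = 1000,
                     threshold: float = 0.05):
    """Return (start, gc) for each full window whose G+C fraction differs
    from the whole-sequence G+C fraction by at least `threshold`.

    window, step and threshold are hypothetical defaults chosen only
    for illustration.
    """
    overall = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) >= threshold:
            flagged.append((start, gc))
    return flagged
```

A flagged window would then be examined at the protein level for homologues of known phage genes, which is the approach taken in the rest of this section.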

Genes in prophages that do not encode virion components

Should phage genes such as those involved in integration, lysis, regulation of gene expression or DNA replication be considered prophage cornerstone genes? Integrases are usually sufficiently conserved to be recognizable, but plasmid prophages do not integrate, and non-phage elements such as plasmids, pathogenicity islands and integrons can carry integrase genes for their own purposes. Thus, although most temperate phages carry an integrase gene, its presence is neither necessary nor sufficient to prove the existence of a prophage. Phage lysis enzymes are often true homologues of chicken egg white lysozyme but may be of other types, such as phage λ endolysin or phage amidases, or may have similarity to other polysaccharide-degrading enzymes such as chitinases (Mediavilla et al., 2000). These proteins can be quite similar, even among distantly related phages, but some bacteria encode ‘autolysins’ that are homologues of phage lysis enzymes. Autolysin genes often appear not to be in a prophage context (e.g. Whatmore and Dowson, 1999; Smith et al., 2000), and such enzymes might be used in normal bacterial cell wall remodelling. It is unknown whether these are ancient prophage relics that have now become useful parts of the bacterial genomes. Every host and many temperate phages encode their own DNA-binding proteins, nucleases, helicases and/or DNA polymerases that function in DNA metabolism and regulatory proteins that control gene expression. The existence of non-prophage bacterial homologues to nearly all these genes shows that they also do not uniquely mark prophages (e.g. Lewis et al., 1998). No host homologue of the transcriptional antiterminators of the λ gene Q family is known, so these might mark some prophages.

Families of homologous phage genes involved in the above processes may or may not form discrete phylogenetic clusters that are separable from their bacterial homologues; however, a very close relationship to a bona fide phage gene is likely to signify that a gene in question is part of a prophage. We will consider two examples, the phage-borne replicon-partitioning proteins and the single-strand DNA-binding proteins (SSBs). The sopA-family plasmid-partitioning genes on the prophage plasmids of E. coli phages N15 and P1 are not particularly close relatives; the N15 SopA protein is 60–75% identical to SopAs encoded by several non-prophage plasmids of enteric bacteria but is only 25% identical to its phage P1 homologue. On the other hand, the S. flexneri lambdoid phage Sf6 SSB protein (S. Casjens, unpublished) is a very close relative (93% identity) of the E. coli phage 1639 SSB (GenBank accession no. AJ304858), but is only moderately closely related to SSBs of E. coli phage P1 (60%) and the non-phage SSBs of enterobacteria (58–62%); it is only distantly related to SSBs of Gram-positive bacteria (22–30%) and their phages (A118, 29%; φPVL, 32%). Thus, when members of the same gene family are used in both phage and non-phage contexts, the phage and bacterial genes often do not fall into well-separated lineages. On account of these issues, and of the variation among phages in DNA metabolism, gene regulation, lysis mechanisms, etc., the presence of genes for these processes should be considered supportive but not sufficient evidence for the existence of a prophage.

Virion protein genes as prophage indicator cornerstones

On the other hand, one might expect the genes that encode proteins involved in building the virion to be unique to phages, as bacterial cells are not known to make similar structures for their own purposes (again here I include GTAs and tail-like bacteriocins as ‘prophages’), and this is indeed the case; phage morphogenetic genes usually do not have homologues that are known to perform unrelated functions in other contexts. Therefore, the presence of genes that are closely related to known phage morphogenetic genes in a bacterial genome is, at our current state of knowledge, a virtually unassailable indication of a prophage.

The icosahedral heads of the different tailed phages are extremely similar in physical appearance, although they do have different sizes and some are elongated. Similarly, tails are only known in three general morphotypes – short (e.g. phages P22 and T7), long, contractile (P2 and Mu) and long, non-contractile (λ), although details of tail structure sometimes allow recognition of subtypes within these general tail types (for reviews of phage virion structure and assembly, see Casjens and Hendrix, 1988; Casjens, 1997). However, the proteins that build the various structurally similar virions are at first glance startlingly diverse. For example, scaffolding protein (required catalytically for head shell assembly), proteins at the head–tail junction and proteins at the tail tip/baseplate are very often not recognizably similar among different phages. Even central virion assembly players such as the coat proteins (building block of the icosahedral head shell) are often not recognizably similar. For example, the coat proteins of the very well-studied enterobacterial phages λ, P2, P22, HK97, Mu and T7 are not recognizably homologous even though their heads are virtually indistinguishable in appearance in the electron microscope. It is not known whether such diversity indicates that these are all truly unrelated proteins or whether these proteins are ancient homologues that have diverged to the point of having no recognizable amino acid sequence similarity. The recent observation that HK97 and P22 coat proteins have similar folds supports the latter idea for these two coat proteins (Jian et al., 2003).

Nonetheless, some phage virion assembly proteins are more highly conserved than others, and homology of these genes can often be recognized between phage types. These are as follows: (i) the larger of the two subunits of terminase, the enzyme that cleaves virion-length molecules from concatemeric replicating DNA and is probably part of the motor that drives DNA into the preformed protein capsid; (ii) portal protein, which forms the hole through which DNA is packaged into the capsid and is also part of the packaging motor; (iii) head maturation protease – the assembly of some but not all phage heads is accompanied by assembly-controlled proteolytic cleavage of virion proteins; (iv) coat protein (above); (v) the proteins that build the tail shaft; (vi) tail tapemeasure protein, which determines the length of the tail shaft in the long-tailed phages; and (vii) tail fibres – tail tip proteins that make the initial contact between the virion and bacterial surface. Although the above proteins appear to be more highly conserved than other virion assembly proteins, in no case have all known members of one of these functional protein types been shown to form a single protein sequence family. It is possible that some or all of these may coalesce into single groups as more phage genome sequences are determined.

How confident can we be that weak or tenuous matches to virion assembly genes identify a prophage? The tail fibre proteins and tapemeasure proteins adopt extended, fibrous conformations, and they often contain imperfect amino acid sequence repeats that reflect these structures. These repeats are sometimes found to match other ‘unrelated’ extended proteins such as myosin, collagen, etc., as well as long coiled-coil proteins. For example, some phage tail fibres contain substantial numbers of the collagen Gly-X-Y repeat (Smith et al., 1998). In addition, the sequences of coat proteins, tail shaft proteins and the head maturation proteases are somewhat more variable than the other proteins in this ‘conserved protein’ list. Protease motifs can often be recognized in the latter, but such motifs are not phage specific. For all three of these protein types, similarity is sometimes found between distantly related phages, yet it is not uncommon to find no substantive similarity between otherwise rather close relatives. Probably the most universally conserved and therefore best cornerstone proteins for prophage identification are the large terminase subunit and portal protein. If PSI-BLAST (Altschul et al., 1997) is used to build up related families of terminase and portal homologues from the current sequence database, a small number of currently unconnected families accumulate in both cases, and no convincing matches to these proteins are found that have a known non-phage function. Yet there are a few ‘orphan’ homologues of terminase and portal genes present in bacterial genomes that have no other unequivocal phage genes nearby. For example, the Sinorhizobium meliloti 1021 genome contains an isolated, excellent homologue (gene SMc04187) of the phage P22 large terminase subunit (Galibert et al., 2001), and an orphan portal homologue (gene Spy0555) is present in the Streptococcus pyogenes M1 SF370 genome (Ferretti et al., 2001). The functions of these particular genes have not been studied. Are these all that remains of once functional prophages, or might they have other, as yet unknown, non-phage-related roles in these cases? At present, we do not know the answer to this question, but current information suggests that such a lone homologue may well be a relict prophage.
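The notion of an ‘orphan’ cornerstone homologue can be made concrete with a small sketch. It assumes that each gene in a genome has already been annotated, by whatever search method, as phage-like or not and as a cornerstone (large terminase or portal homologue) or not. The `Gene` record, the `flank` parameter and the gene names used with it are hypothetical; the review itself defines no numerical neighbourhood size.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    phage_like: bool           # has a convincing phage homologue
    cornerstone: bool = False  # large terminase or portal homologue


def orphan_cornerstones(genes, flank=10):
    """Names of cornerstone genes with no other phage-like gene within
    `flank` genes on either side (genes are given in chromosomal order;
    flank=10 is an arbitrary illustrative choice)."""
    orphans = []
    for i, gene in enumerate(genes):
        if not gene.cornerstone:
            continue
        neighbours = genes[max(0, i - flank):i] + genes[i + 1:i + 1 + flank]
        if not any(n.phage_like for n in neighbours):
            orphans.append(gene.name)
    return orphans
```

Whether a gene reported this way is a relict prophage or has some non-phage role remains, as stated above, an open question; the sketch only separates isolated hits from those embedded in a phage-like neighbourhood.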

Subjective prophage criteria

Given the immense variation among phages and our incomplete knowledge of that variation, recognition of prophages can be a rather subjective and delicate art, especially as satellite prophages and partly deleted defective prophages may contain no morphogenetic ‘cornerstone’ genes. However, there are less objective criteria that can contribute substantially to our confidence in prophage identification. In spite of their diversity, the temperate phages appear to have settled on a limited number of transcriptional arrangements, and they tend to have operons that are longer than the average E. coli operon, presumably to allow turn-off of the lytic genes by repression at a small number of operators. The latter can be confidence building for prophage identification in bacteria such as E. coli, which have more or less randomly oriented genes, but is less useful in genomes with genes that are largely oriented in the direction of DNA replication such as Clostridium (Shimizu et al., 2002) and Thermoanaerobacter (Bao et al., 2002).
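As a rough illustration of the operon-length criterion, the sketch below finds long runs of consecutive genes transcribed in the same direction. It assumes gene strands are supplied in chromosomal order as "+" or "-"; the function name and the `min_genes` cut-off are illustrative assumptions, and a co-oriented run is only a crude proxy for an operon.

```python
from itertools import groupby


def cooriented_runs(strands, min_genes=10):
    """Return (first_index, last_index, strand) for each run of at least
    `min_genes` consecutive genes on the same strand.

    min_genes=10 is a hypothetical cut-off chosen for illustration.
    """
    runs = []
    index = 0
    for strand, group in groupby(strands):
        length = len(list(group))
        if length >= min_genes:
            runs.append((index, index + length - 1, strand))
        index += length
    return runs
```

In a genome whose genes mostly follow the direction of replication, nearly every region would be reported, which is why this criterion carries little weight there.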

More importantly, phage genomes show striking gene clustering according to general function and ordering according to detailed function within some of the clusters, and genes that encode DNA-interacting proteins usually lie near the DNA target of those proteins. For example, prophage integrase genes are essentially always adjacent to or very near the attachment (integration) site on the phage chromosome, and so they typically mark one end of integrated prophages. Of particular interest here is the observation that, within the gene cluster that encodes the virion assembly proteins, there exists a striking conservation of gene order (Casjens and Hendrix, 1988; Casjens et al., 1992; Hendrix and Duda, 1998). Recombination, replication and control functions are not found in this cluster, although a small number of non-assembly genes appear to have been relatively recently inserted into this operon in some temperate phages (Hendrix et al., 2000). In nearly every tailed phage and prophage with a gene order that is known, the order is ‘terminase – portal – protease – scaffold – major head shell (coat) protein – head/tail-joining proteins – tail shaft protein – tapemeasure protein – tail tip/baseplate proteins – tail fibre’ (listed in the order of transcription). The large lytic phages such as those typified by T4 often have some rearrangements relative to this order, but the order is especially well conserved in the temperate phages. This is shown for the most highly conserved genes in some of the best-characterized phages in Fig. 3. Fifteen to 25 proteins are typically used to build a temperate tailed phage’s virion, so the more highly conserved proteins are typically embedded in this order in an apparent operon of this size. The lysis genes usually lie in the same orientation, adjacent to and at either end of the virion protein cluster.
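The conserved order quoted above lends itself to a simple collinearity check. Given the functional labels of tentative hits listed in the order of transcription, the sketch counts the largest subset of hits that appears in the canonical order (a longest non-decreasing subsequence over the ranks). The label strings and the function name are illustrative assumptions; labels outside the list are ignored, and a caller would reverse the input for a cluster transcribed in the opposite direction.

```python
from bisect import bisect_right

# Order taken from the text above, listed in the order of transcription.
CANONICAL_ORDER = [
    "terminase", "portal", "protease", "scaffold", "coat",
    "head-tail joining", "tail shaft", "tapemeasure",
    "tail tip/baseplate", "tail fibre",
]
RANK = {label: i for i, label in enumerate(CANONICAL_ORDER)}


def collinear_hits(labels):
    """Size of the largest subset of labelled hits that occurs in the
    canonical order. Repeated labels (e.g. several head-tail joining
    proteins in a row) count as collinear."""
    ranks = [RANK[label] for label in labels if label in RANK]
    tails = []  # tails[k] = smallest final rank of an ordered subset of size k+1
    for rank in ranks:
        pos = bisect_right(tails, rank)
        if pos == len(tails):
            tails.append(rank)
        else:
            tails[pos] = rank
    return len(tails)
```

A high count relative to the number of hits does not quantify confidence, but it expresses the idea that weak matches gain credibility when they fall in the expected relative positions.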

This is biology, so there are of course exceptions to any rules we might attempt to derive. Some temperate phages such as P22 have short tails and so have no tapemeasure or tail shaft proteins, and the well-studied E. coli phage P2 and its close relatives have inverted terminase and portal genes relative to other phages, and their lysis genes lie between tail genes. But, overall, the above conserved morphogenetic gene order has relatively few exceptions and, when weak matches are present in this order, credence can be lent to otherwise uncertain similarities. An instructive case in point is the family of 30–32 kbp circular ‘cp32’ plasmids found in the spirochaete B. burgdorferi. Each of these plasmids carries a similar, very poorly expressed 22-gene-long putative operon, which at the time of sequencing contained only novel genes (Fraser et al., 1997; Casjens et al., 2000; Ojaimi et al., 2003). As the phage sequence database grew, a moderately weak match (protein BLAST e-value = 3 × 10⁻⁸) was found between the second gene from the beginning of these Borrelia operons and a Streptococcus phage φO1205 gene (Stanley et al., 1997). This φO1205 gene, which is located near the promoter-proximal end of the putative morphogenetic operon (the expected position for a terminase gene), is a moderately weak match (e = 5.5 × 10⁻⁷) to the well-characterized terminase of B. subtilis phage SPP1. [The transitive nature of such sequence families (A matches B, B matches C, but A does not readily match C) is often a feature of relationships between distantly related phage virion proteins, and transitive matches should be accepted in such searches (see Gerstein, 1998).] Later, when the X. fastidiosa genome was sequenced (Simpson et al., 2000), the protein encoded by the adjacent, transcriptionally downstream Borrelia cp32 gene was found to match very weakly (e = 0.13) a protein encoded at the portal position (immediately downstream of the putative large terminase gene) in X. fastidiosa’s convincing prophages XfP3 and XfP4. After two additional rounds of PSI-BLAST alignment, a family of proteins accumulates that includes the putative Borrelia portal proteins (now e = 3 × 10⁻⁷⁷) and proteins encoded at the portal position by very unambiguous prophages in S. enterica, Haemophilus influenzae and L. innocua, but no connection to experimentally proven portal proteins is made. In addition, a novel gene near the 3′ end of this Borrelia gene cluster was found to be able functionally to replace a phage λ lysis (holin) gene (Damman et al., 2000). Any of these observations alone does not constitute a very convincing argument that these Borrelia plasmids are or harbour prophages, but the fact that each of these three matches is at the expected location within a phage late operon (see Fig. 3) makes the argument considerably stronger. Finally, Eggers and Samuels (1999) found that cp32 plasmid DNA is present in tailed phage-like particles released from Borrelia, considerably strengthening the argument that these plasmids are indeed prophages (even though 90% of the genes in these putative virion assembly operons have no recognized homologues, and none has been studied in more detail). Although it is impossible to quantify the increase in confidence one obtains when such weak matches occur in the relative positions expected for a phage genome, anecdotal observations like this suggest that increased confidence is nonetheless at least partly justified and can certainly provide impetus for further directed experimental studies.
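The transitive relationship described in the bracketed note (A matches B, B matches C, but A does not readily match C) can be sketched as single-linkage grouping of pairwise hits. This is a deliberately simplified stand-in, not a description of how PSI-BLAST builds its profiles; the function name, the e-value cut-off and the protein labels are hypothetical.

```python
def build_families(hits, max_evalue=1e-5):
    """Group proteins into families by accepting transitive matches.

    hits: iterable of (protein_a, protein_b, evalue) pairwise matches.
    Two proteins share a family if a chain of hits, each with
    evalue <= max_evalue, connects them (single linkage, union-find).
    max_evalue=1e-5 is an arbitrary illustrative cut-off.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        root_a, root_b = find(a), find(b)  # registers both proteins
        if evalue <= max_evalue and root_a != root_b:
            parent[root_a] = root_b

    families = {}
    for protein in list(parent):
        families.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in families.values())
```

With hits such as `("A", "B", 3e-8)` and `("B", "C", 5.5e-7)`, A and C end up in one family even if their direct comparison scores poorly. Single linkage of this kind can also chain unrelated proteins together through a shared repeat or domain, which is one reason positional evidence such as the gene order above remains valuable.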

Highly deleted defective prophages

The evolutionary history of strain-specific elements that have no remaining virion assembly genes can be difficult to deduce, and it may never be possible to know unambiguously if they are in fact really prophage relics. Even in the E. coli K-12 genome, there are elements with origins that remain uncertain. For example, the 22-kbp-long CP4-57 element is inserted into the tmRNA gene, a site at which other more obvious prophages often lie in other bacteria (Table 1) (Kirby et al., 1994; Retallack et al., 1994). It contains an integrase, a functional homologue of the satellite phage P4 orf88 regulatory gene, no obviously non-phage genes and no recognizable homologues to virion protein genes. Similarly, the 34 kbp CP4-6 and 13 kbp CP4-44 elements in K-12 are possible prophages (Blattner et al., 1997; Rudd, 1999). CP4-6 carries an integrase gene at one end, several transposon parts, the arginine metabolism argF gene and a glycosyl hydrolase (the last two have been argued to have arrived in E. coli by relatively recent horizontal transfer; Van Vliet et al., 1988; Garcia-Vallve et al., 1999). Genes in these three regions have a similar codon usage that is different from E. coli (Perna et al., 2002), and these elements are not present in other E. coli strains. All three elements contain genes of unknown function that are homologous to one another and are similarly arranged. These CP4s have been called prophages without qualification in the literature, but their only overt phage homologies are integrase and control genes (Blattner et al., 1997; Garcia-Vallve et al., 1999; Rudd, 1999); genuine proof of phage ancestry awaits the discovery of a true phage with a genome structure that is similar to the CP4s. In the genomes of less well-studied bacteria, it is even more difficult to recognize partly deleted or satellite prophages that contain none of the prophage cornerstone genes.

Fig. 3. Conserved genes and gene order in temperate phage morphogenetic operons. The most highly conserved genes in the morphogenetic (late) operons of temperate phages are shown as coloured rectangles; rectangle colours indicate similar functions as labelled. Identical colours do not necessarily indicate sequence similarity; phages are sufficiently diverse that not all proteins of similar function are recognizably homologous (see text). Black circles indicate the location of packaging initiation sites where this is known. A gap between rectangles indicates that there is a gene(s) between them that is not shown in the figure. The black arrow above indicates the direction of transcription for all the genes in the figure except two phage P2 genes, which are indicated to be transcribed in the opposite direction. The functions of most of the indicated E. coli and S. enterica phage genes have been determined directly, whereas the functions of most of the genes of the other phages shown in the figure have been deduced by sequence homology.

Prophage evolution and genetic exchange between prophages

Prophages and the bacteria they inhabit have a somewhat precarious mutual existence. From the prophage perspective, many of its genes are not in use and so are not under selection for function. Therefore, mutations, including deleterious ones, can accumulate in these genes, resulting in a defective prophage. The host bacterium is under threat of death by prophage induction, and it seems that, in the long term, it would be advantageous from the bacterium’s perspective if the prophages were to suffer debilitating mutations, especially if those mutations blocked the ability of the prophage to express its potentially lethal genes (Lawrence et al., 2001). It is therefore not surprising that a large fraction of the prophages that have been identified in bacterial genome sequences appear to be defective (only nine of the more than 200 prophages in Table S1 have been shown experimentally to be fully functional phages). To begin to understand the evolutionary processes that work on prophage DNA, it is instructive to examine specific cases. Two cases will be considered here – the Rac prophages of E. coli (Table 1) and the Pnm prophages of Neisseria meningitidis. These are both Proteobacteria, and they may not be representative of all other bacterial phyla. For example, the sequenced Gram-positive Lactococcus, Lactobacillus and Streptococcus genomes contain multiple prophages, but very highly decayed prophages have not been identified there (nonetheless, possible defective prophages such as SF370.4 do exist in Streptococcus pyogenes; Canchaya et al., 2002). It is not yet clear whether this is a sampling difference or if some species might carry only relatively newly arrived prophages and/or have ways of avoiding the accumulation of defective prophages (see also above).

The Rac prophages

Figure 4 diagrammatically compares the prophage entities that lie at the Rac attachment site in the three sequenced E. coli chromosomes. Rac was the first defective prophage to be discovered in E. coli K-12 (Low, 1973; Kaiser and Murray, 1979). In this strain, it was shown that, although no Rac virions were ever produced upon induction, (i) parts of Rac can be picked up by the phage λ chromosome through homologous recombination (Zissler et al., 1971; Kaiser and Murray, 1979); (ii) the Rac prophage can be excised upon induction (Evans et al., 1979; Brikun et al., 1994); (iii) Rac is lethal to the host if expression of its genes is induced, and this lethality results from an inhibitor of host cell division that is homologous to the λ Kil protein (Feinstein and Low, 1982; Conter et al., 1996); (iv) mutations (called sbcA) in Rac can restore homologous recombination in recB⁻ recC⁻ mutants by expressing the prophage’s RecE function (Fouts et al., 1983; Willis et al., 1985); and (v) these sbcA mutants also express a function, Lar, that enhances EcoKI-mediated DNA methylation (similar to λ Ral function) (King and Murray, 1995). The sbcA mutations are thought to turn on the non-lethal part of the Rac prophage early left operon rather than altering RecE and Lar functions directly (Mahajan et al., 1990), indicating that the lar and recE genes are functional but unexpressed in the Rac prophage. The K-12 genome sequence confirmed that Rac is indeed a lambdoid prophage that has lost about 60% of its original DNA (Blattner et al., 1997). Its early left operon contains the recE gene at the position in which other lambdoid phages carry their genes for homologous recombination. More recently, the fully functional Salmonella phages, Gifsy-1 and Gifsy-2, have been found to carry recE homologues in similar positions in their early left operons (McClelland et al., 2001), suggesting that the recE gene is most likely an authentic part of the original Rac phage. In addition, it is likely that the Rac repressor and integrase still function, as conjugational transfer induces gene expression from the prophage and causes excision (Evans et al., 1979; Feinstein and Low, 1982). Rac’s right arm has not fared as well (Fig. 4); deletions have removed at least (i) the region between the DNA replication and lysis genes; (ii) the head and upstream tail genes (equivalent to λ genes nu1 to G-T); and (iii) the tail tip genes (equivalent to λ M to J). In addition, two transposons now reside in its right arm, one of which disrupts a homologue of the λ lom lysogenic conversion gene. There are four obvious pseudogenes in the right arm, the interrupted lom gene and truncated b1361, tail tapemeasure (H) and lysis (Rz) genes. Of course, it is not possible to tell whether any open reading frame that has not been studied experimentally but is approximately full length relative to other homologues is in fact functional, so this is the minimum number of defective genes.

Fig. 4. Three λ-like E. coli Rac prophages. Prophages Rac (E. coli strain K-12), Sp10 (strain Sakai) and CP-933R (strain EDL933) are located at identical positions in the three genomes. Genes and predicted genes are indicated by rectangles; black, genes outside the prophage; white, prophage genes that are transcribed to the left; grey, prophage genes that are transcribed to the right; cross-hatched, genes that currently have no homologues in other phages or prophages and so could in theory have been inserted since the original phage genome integrated at this site (see text). Below is a scale in kbp and arrows that indicate the major operons of the prophages as predicted by homology with other better characterized lambdoid phages. Cross-hatching between the three prophages marks regions of nucleotide sequence similarity; in some sections, the percentage identity is given. The labels for the various genes indicate known function or putative function as deduced from homology relationships. Open circles indicate apparent pseudogenes that have obviously been inactivated by mutation; closed circles indicate genes that have been shown to be functional in Rac; and closed triangles indicate deletions relative to known lambdoid infectious phages.

Curiously, immediately to the right of Rac’s Rz homologue, the trkG gene for potassium uptake (Dosch et al., 1991; Schlosser et al., 1991) lies in a region that is very variable among the lambdoid phages and is not known to carry essential genes (for the phage). Was the trkG gene part of the original prophage or was it moved into this location subsequent to the phage’s original integration? To date, no functional phage is known that carries a trkG homologue. The huge diversity of phages makes it difficult to even guess whether such a putative prophage gene, which has not yet been found on other phages, was or was not part of the phage that integrated to form the original prophage. The trkG gene in Rac and the argF homologue in CP4-6 (above) are such cases in point, but both are redundant to other genes with the same function in K-12 and so may be recent arrivals. Our (admittedly not exhaustive) analysis of the prophages in Table S1 suggests that there are few compelling examples of putative non-phage genes that have moved into a prophage after its integration. It seems inevitable that some non-phage genes would end up inside defective prophages during rearrangements that might accompany the decay process, and the frequency of such events could vary among hosts but, nevertheless, such events appear to be rare in prophages that have not yet decayed into unrecognizability.

More recently, the genomes of two closely related O157-type E. coli strains, EDL933 and Sakai, have been sequenced (Hayashi et al., 2001a; Perna et al., 2001); both carry a prophage located precisely at the Rac attachment site (Table 1). In EDL933, it was named CP-933R and, in Sakai, it was named Sp10 (Fig. 4). In a fourth E. coli strain, CFT073 (Welch et al., 2002), all that remains at this attachment site is 320 bp (including a C-terminal fragment of an integrase gene) that are 98.4% identical to the left end of the above three prophages. It thus appears that a related prophage once occupied the Rac attachment site in CFT073 but, as it has been nearly completely deleted, it will not be discussed further here. CP-933R and Sp10 are similar to one another, but are not identical. Both have lengths similar to known lambdoid phages (which range from about 39 kbp to 62 kbp). They are typically mosaic lambdoid genomes, with many homologues of known lambdoid phage genes arranged with the ‘correct’ clustering, order and orientation. Neither contains any genes that are clearly related to ‘non-phage’ genes, and both contain a few obvious pseudogenes. Among the essential virion assembly genes, the Sp10 putative coat protein gene contains a frameshift relative to several other prophages in these strains. CP-933R has head and tail genes that are similar to phage λ and, using λ gene nomenclature, its essential genes E, V, H, I and J are truncated or contain frameshifting mutations, and genes FI, FII, Z and U are missing. Thus, neither prophage is expected to be able to produce viable virions upon induction, and they appear to have had different mutational histories since their arrival at this location. As in Rac, the left arm of these two prophages appears, at this level of analysis, to be largely intact. The leftmost 21 kbp are >99.9% identical in CP-933R and Sp10, and their leftmost 8 kbp are 99.0% identical to the K-12 Rac prophage.

Are Rac, CP-933R and Sp10 the result of integration by different phages at the same bacterial attachment site, or are they descendants of the same progenitor prophage? Independently isolated phages with identical integration specificities are known so, at first glance, the former scenario seems plausible, as the central regions of Sp10 and CP-933R are not closely related. The head genes of Sp10 are very similar to those of prophages Sp6, Sp9 and Sp12 (which are not close relatives of any experimentally studied phage). Those of CP-933R are very similar to head genes of phages λ and 21 and also closely related to genes in the CP-933Od portion of the complex CP-933O prophage (see Table S1). (‘Sp’ prophages are in E. coli strain Sakai and ‘CP’ prophages are in strain EDL933.) This could be interpreted as evidence for independent origins for Sp10 and CP-933R; however, the lambdoid phages are so diverse that very rarely, if ever, have any two independently isolated infectious lambdoid phages been found that are so nearly identical over such an extended region as Sp10 and CP-933R are at their left and right ends. HK97, 434 and λ integrate at the same site, as do Sf6 and HK620, and these phages do not have this ‘similar prophage ends with different central regions’ relationship; they are typically mosaically related with nearly identical integrase genes (Juhala et al., 2000; S. Casjens, A. J. Clark, W. Inwood and R. Moreno, unpublished). Thus, independent integration at the Rac attachment site by two different progenitor phages with such similar genomes seems an unlikely event.

The deletion in CP-933R that has end-points in its λ E and V gene homologues (Fig. 4) contributes to a stronger argument that genes have, in fact, been exchanged among prophages within these bacteria. This deletion is also present with exactly the same end-points (between genes Z2136 and Z2137) in the EDL933 prophage CP-933Od. It is very unlikely that identical deletions happened independently in CP-933R and CP-933Od, so one of these head regions was apparently replaced by a copy of the other after the deletion occurred. It is also unlikely that this deletion (which removes six essential genes) would be present in an infecting phage virion’s DNA. We cannot be absolutely sure, but it therefore seems most reasonable to propose that CP-933R and Sp10 are in fact descendants of the same original prophage, and that either (i) in EDL933, the head genes of the original prophage at this site were replaced by a copy of the deletion-carrying head genes from CP-933Od; or (ii) in Sakai, the phage λ-like head genes of the original prophage were replaced by a copy of those from Sp6, Sp9 or Sp12. Although such recombination acts could be seen as ‘homogenizing’, the recipient carries a new overall combination of alleles not present in the parent prophages. As there is a very low probability of two phage DNAs of independent origin having such extended regions of nearly identical nucleotide sequence integrating into the same chromosome, such identity, when present, could conceivably constitute tentative evidence for such duplicative exchanges. For example, the 14317 bp of identity between prophages XfP3 and XfP4 in X. fastidiosa 9a5c and the over 4000 bp of identity between the Gifsy-1 and Gifsy-2 prophages’ DNA replication–Nin regions in S. enterica LT2 suggest that such exchanges may also have occurred in these cases.
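The kind of evidence invoked here, a long run of identical sequence shared by two prophages in one chromosome, can be illustrated with a minimal sketch. The function name `longest_shared_segment` and the toy sequences are hypothetical and are not part of the original analysis; the quadratic dynamic programme is only suitable for short inputs, and genome-scale comparisons would normally rely on an alignment program.

```python
from typing import Tuple


def longest_shared_segment(a: str, b: str) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest exact shared run.

    Longest-common-substring dynamic programming, O(len(a) * len(b)).
    Only the same strand is compared; reverse complements are ignored.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the exact match that ended at the previous positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_a = i - best_len
                    best_b = j - best_len
        prev = curr
    return best_len, best_a, best_b


# Toy sequences invented for illustration (not real prophage DNA).
prophage_1 = "TTTTACGTACGGAAAA"
prophage_2 = "CCACGTACGGCC"
length, start_1, start_2 = longest_shared_segment(prophage_1, prophage_2)
print(length, prophage_1[start_1:start_1 + length])  # prints: 8 ACGTACGG
```

A long exact run found this way is only suggestive; as the text notes, it is the improbability of such identity arising from independent integrations that gives it weight.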

Even more surprising is the observation that the same type of relationship as is seen between CP-933R and Sp10 (extremely similar outside regions with very different central regions) is found to be common when other cognate prophage pairs in EDL933 and Sakai are compared. Prophage pairs Sp14/CP-933U, Sp4/CP-933M and Sp15/CP-933V all have this type of relationship (Fig. 5). For example, lambdoid prophages Sp14 and CP-933U are both integrated into the same site within the serU tRNA gene. These two prophages have about 12 kbp of 99.2% identity at their tail gene ends and 16 kbp of 99.9% identity at their integrase ends. Between these long terminal similarities, they have >10 kbp of sequence where little similarity can be found. This central part of Sp14 contains an 8 kbp section of the head genes that is 99.8% identical to the head gene region of Sp4. If it is unlikely that two phage genomes with this type of sequence relationship happened to have integrated independently at the Rac attachment site in these two strains (above), then it is all the more unlikely that these four prophage pairs would also have such a relationship. Furthermore, the identical deletion of DNA between the coat and tail shaft genes that is present in CP-933R and CP-933Od in EDL933 is present in Sp4 and Sp14 in Sakai. As none of these four is a cognate prophage (i.e. integrated at the same site in these two strains), this deletion appears to have occurred in a common ancestor of EDL933 and Sakai and then moved between prophages several times after their divergence. The relative abundance of this type of duplicative rearrangement between prophages in these two isolates suggests that interprophage homologous recombination may occur much more frequently than previously imagined, and that such events could well be an important route by which new temperate phage allele combinations are formed.

Fig. 5. Central region shuffling among E. coli O157 prophages. Top. Five E. coli O157 Sakai prophages are indicated by coloured rectangles. Bottom. The E. coli O157 EDL933 prophages integrated at cognate sites are similarly indicated. The host gene at the site of integration is shown between cognate prophages. All five cognate pairs have outer regions that are extremely similar (in most cases >99% identical). The colours of the central sections of the prophages indicate their sequence relationships in the head gene regions, and the asterisk (*) indicates the presence of the deletion that ends in the coat and tail shaft protein genes (see text). Similar colours indicate nucleotide sequences that are >93% identical. The central (head) regions indicated by different colours are not close sequence relatives; the closest is about two-thirds of the Sp15 head region, which is about 75% identical to that of CP-933U, and the others are much more distantly related. Rectangle sizes are not proportional to DNA length, and the situation is actually more complex than the diagram indicates in that some of the non-head gene regions of the central non-homologous parts of cognate prophages have different relationships from the indicated head genes. Prophage CP-933X contains the remaining unsequenced section of the strain EDL933 genome.
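The ‘similar ends, different centre’ relationship described for the cognate prophage pairs can be made concrete with a windowed identity profile along a pairwise alignment. The sketch below is illustrative only: the function name `window_identity`, the window size and the toy aligned strings are assumptions, and the input is presumed to be two already-aligned, equal-length sequences with `-` marking gaps.

```python
from typing import List, Tuple


def window_identity(aln_a: str, aln_b: str,
                    window: int = 10, step: int = 10) -> List[Tuple[int, float]]:
    """Percent identity in successive windows of a pairwise alignment.

    Columns containing a gap ('-') in either sequence are skipped; a window
    made up entirely of gapped columns is reported as 0.0.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(aln_a) - window + 1, step):
        columns = [(x, y)
                   for x, y in zip(aln_a[start:start + window],
                                   aln_b[start:start + window])
                   if x != "-" and y != "-"]
        matches = sum(x == y for x, y in columns)
        identity = 100.0 * matches / len(columns) if columns else 0.0
        profile.append((start, identity))
    return profile


# Toy alignment: identical ends flanking an unrelated centre (invented data).
cognate_a = "ACGTACGTAC" + "AAAAAAAAAA" + "GGCCGGCCGG"
cognate_b = "ACGTACGTAC" + "CCCCCCCCCC" + "GGCCGGCCGG"
for start, identity in window_identity(cognate_a, cognate_b):
    print(start, identity)
```

With real prophage pairs the windows would be hundreds of base pairs wide, and a profile that is near 100% at both ends but drops sharply in the middle would correspond to the pattern drawn in Fig. 5.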

Pnm2 and Pnm3 prophages

Neisseria meningitidis cognate prophages Pnm2 in strain Z2491 and NeisMu1 in strain MC58 (Parkhill et al., 2000; Tettelin et al., 2000) are mosaic relatives of the Mu-like group of phages [E. coli phage Mu and three largely intact prophages, FluMu, Sp18 and Pnm1, present in H. influenzae Rd, E. coli Sakai and N. meningitidis Z2491, respectively, have been completely sequenced (Fleischmann et al., 1995; Parkhill et al., 2000; Hayashi et al., 2001a; Morgan et al., 2002); ‘NeisMu1’ is a provisional name used here as the original annotators did not name this element]. This type of phage integrates essentially randomly by a transposition mechanism (reviewed by Harshey, 1988). Thus, as the number of potential integration targets in any genome is huge, natural prophages of this type that are found at identical positions in the genomes of two independently isolated bacteria are extremely likely to be descendants of the same past phage integration event. Pnm2 and NeisMu1 occupy precisely the same integration site within an ABC-type transporter gene (Fig. 6). In both prophages, the reading frames of the two transporter gene halves seem to be essentially intact (97.4% identical in nucleotide sequence); however, the N-terminal fragment of the strain MC58 gene contains a frameshift mutation. These prophages are both certainly defective, and their deletion histories are different. For example, Pnm2 appears to have suffered an ≈9 kbp deletion in the tail region, and NeisMu1 has a major deletion in its middle gene region and a shorter deletion of the putative coat protein gene.

As in the case of Sp10 and CP-933R above, differential DNA replacements appear to have occurred after integration. An example of such a replacement is near the left end of the two prophages, where Pnm2 and NeisMu1 have unrelated genes, the best matches of which are other transcriptional repressors, at the position where other Mu-like phages encode repressors. In general, the homologous genes in Pnm2 and NeisMu1 are about as different from each other as are chromosomal backbone genes in N. meningitidis MC58 and Z2491. Sections A and B (Fig. 6) of Pnm2 and NeisMu1 are 99.3% and 97.4% identical in nucleotide sequence, respectively, and the three intact chromosomal genes adjacent to the left end of NeisMu1 in MC58 are 97% identical to the same genes in strain Z2491. This is consistent with the notion that NeisMu1 and Pnm2 have been diverging for about the same length of time as the chromosomes in which they reside. Sections A and B are in the head and tail gene clusters, respectively, neither of which should be under selection for function in the prophage. N. meningitidis is a naturally competent bacterium, in which DNA uptake is mediated through a DNA uptake sequence (Goodman and Scocca, 1988). This sequence is greatly over-represented in both Pnm2 and NeisMu1, so it is impossible to know whether one of the putative repressor genes entered the prophage from an infecting phage, another prophage (now gone) or from transforming phage or prophage DNA.

Fig. 6. Defective Mu-like Neisseria meningitidis prophages. Defective prophages Pnm2 and Pnm3 (strain Z2491) and NeisMu1 and NeisMu2 (strain MC58) of N. meningitidis are shown as in Fig. 4. Below each prophage, selected genes are marked by the gene number of the homologous phage Mu gene and/or a predicted function (Morgan et al., 2002). Grey arrows connect genes that have similar predicted function but not sequence similarity. Black bars marked A, B or C denote regions where more detailed comparisons were made (see text).

Also present at identical locations in N. meningitidis Z2491 and MC58 is another region that is probably a more highly decayed Mu-like prophage, called Pnm3 and NeisMu2 in the two strains respectively (Fig. 6). These are much more highly deleted than Pnm2 and NeisMu1 (they retain only 15–20% of their putative original DNA), and so are likely to have been decaying for a longer period of time; yet they are 97.5% identical to each other in region C (Fig. 6). This is consistent with the divergence of Z2491 and MC58 after this element started to decay. These two prophages highlight the use of gene order and clustering in recognizing highly deleted prophages. The only match to an authentic phage gene in Pnm3 and NeisMu2 is a homologue of Mu gene 16 (also called gemA). The presence of a single phage-like gene, especially a regulatory gene (Ghelardini et al., 1994) such as this one, cannot be considered unequivocal evidence of a prophage. However, there are a number of genes in Pnm3/NeisMu2 that are similar to otherwise novel open reading frames present in Pnm2/NeisMu1 (and Pnm1, another largely intact Mu-like prophage in Z2491; Klee et al., 2000). As homologues to these genes are not present outside these prophages, and as they are present in the same order in each of the putative prophages, it can be rather firmly concluded that Pnm3 and NeisMu2 are real but highly deleted prophages.
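The gene-order argument used here can be expressed as a small check: given the ordered gene families of a candidate region and of a better-characterized prophage, find the largest set of shared families that occur in the same relative order in both. The sketch below is a hedged illustration, not the procedure used in this review; the function name and all gene labels other than gemA are hypothetical placeholders, and each family is assumed to occur at most once per element.

```python
from typing import List


def colinear_shared_genes(candidate: List[str], reference: List[str]) -> List[str]:
    """Longest chain of shared gene families kept in the same order in both lists.

    Shared families are mapped to their positions in `reference`, and a
    longest strictly increasing subsequence of those positions is extracted
    (simple O(n^2) dynamic programming). Only one orientation is tested.
    """
    ref_pos = {family: i for i, family in enumerate(reference)}
    shared = [family for family in candidate if family in ref_pos]
    positions = [ref_pos[family] for family in shared]
    n = len(positions)
    if n == 0:
        return []
    best = [1] * n    # best[i]: length of the longest chain ending at i
    prev = [-1] * n   # back-pointers used to rebuild the chain
    for i in range(n):
        for j in range(i):
            if positions[j] < positions[i] and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    end = max(range(n), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(shared[end])
        end = prev[end]
    return chain[::-1]


# Hypothetical gene-family labels; only 'gemA' is a name taken from the text.
intact_prophage = ["orfA", "orfB", "gemA", "orfC", "orfD", "orfE"]
remnant = ["orfB", "gemA", "hostX", "orfD", "orfE"]
print(colinear_shared_genes(remnant, intact_prophage))
```

A remnant that shares several otherwise novel families with a known prophage, in conserved order, is a much stronger candidate than one that shares a single regulatory gene.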

The complex decay of prophages

It might have been expected that derelict prophage DNAs would be in a straightforward mutational ‘free fall’ in which inactivating mutations occur at random until the prophage is completely eliminated. Lysogenic conversion (or possibly other) integrated prophage genes may be advantageous to the host and be kept functional by selection as the rest of the prophage decays into oblivion, and so they may eventually be appropriated as integral parts of the host chromosome. Examples of possible intermediates in this assimilation process might be some pathogenicity islands and the Shigella dysenteriae Shiga toxin that is encoded by a small prophage remnant (McDonough and Butterton, 1999). Likewise, plasmid prophages might evolve into plasmid replicons.

However, the situation is certainly much more complex than this. Understanding prophage evolution and decay is significantly complicated by possible excision and subsequent replacement by another, possibly related phage genome, as well as by homologous recombination with infecting phage genomes and other prophages in the same cell. Infecting phages can clearly acquire genetic information from prophages in cells they infect (e.g. Kaiser, 1980; Espion et al., 1983; Bouchard and Moineau, 2000). However, as prophages express immunity and superinfection exclusion systems that can allow cell survival after a superinfecting phage has injected its DNA (Susskind et al., 1974; Susskind and Botstein, 1980), transfer of information from infecting phage DNA to related prophages might occur as well. Studies of the DNA sequences present in different E. coli isolates at the phage 21, λ and Atlas attachment sites have found evidence for different entities being present at each site in some different strains (Milkman and Bridges, 1990; Wang et al., 1997; Kuhn and Campbell, 2001), so complete excision and replacement is certainly plausible. Although such findings could be interpreted to support the idea that, in a given bacterial lineage, prophages come and go (perhaps frequently?; Campbell, 1996), the arguments presented above suggest that complete replacement may be less common than other types of genetic interactions. Some prophages may in fact spend rather long times in residence in bacterial chromosomes before being completely removed. At least large parts of Proteobacterial prophages such as Rac and Pnm2, which are still quite far from complete assimilation, appear to have been in residence at least long enough for bacterial genes to diverge about 1.5 and 3% respectively. (The genes of E. coli K-12, for example, are on average 98.5% and 98.3% identical to those of EDL933 and Sakai respectively; Perna et al., 2002.) If estimates of divergence rates in bacteria are correct (Ochman and Wilson, 1987; Reid et al., 2000), this suggests that these prophages may have been in place for as long as several million years. This does not mean that all prophages have such long residence times, and the suggestion of such antiquity for some is rather speculative. Comparison of additional genome sequences will help to decide upon its accuracy.
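The arithmetic behind such a residence-time estimate can be sketched as follows. This is a rough illustration only: it assumes a constant molecular clock, ignores multiple substitutions at the same site, and uses a placeholder divergence rate that is not taken from Ochman and Wilson (1987) or Reid et al. (2000). The function and constant names are hypothetical.

```python
# Placeholder: percent pairwise sequence divergence accumulated per million
# years. This is an ASSUMED, illustrative value, not a figure from the
# studies cited in the text; substitute a rate appropriate to the genes used.
ASSUMED_PAIRWISE_RATE_PCT_PER_MYR = 1.0


def min_residence_time_myr(percent_identity: float,
                           rate_pct_per_myr: float = ASSUMED_PAIRWISE_RATE_PCT_PER_MYR) -> float:
    """Rough lower bound (million years) on how long a shared prophage has resided.

    The host lineages are taken to have split after the prophage integrated,
    so the time needed for host genes to diverge bounds the residence time.
    """
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must lie between 0 and 100")
    if rate_pct_per_myr <= 0.0:
        raise ValueError("rate_pct_per_myr must be positive")
    divergence_pct = 100.0 - percent_identity
    return divergence_pct / rate_pct_per_myr


# Host-gene identities quoted in the text: about 98.5% for the E. coli
# strains carrying Rac-site prophages, about 97% for the N. meningitidis pair.
for label, identity in [("Rac", 98.5), ("Pnm2", 97.0)]:
    print(label, min_residence_time_myr(identity), "million years (assumed rate)")
```

The result scales inversely with the assumed rate, which is why the text treats the ‘several million years’ figure as speculative.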


Analysis of such decaying prophages clearly shows that point mutations, transposon insertions and deletions all occur. Interestingly, it appears that, as prophages decay, prophage-debilitating deletions can accumulate more rapidly than gene-inactivating point mutations, as numerous genes, even in moderately highly deleted prophages such as Rac, remain functionally intact. The functionality of many normally unexpressed genes in defective prophages has been demonstrated in the laboratory through mutations that turn on their expression (Willis et al., 1985; Blasband et al., 1986; Bejar et al., 1988; Mahajan et al., 1990), recombination onto a related phage that depends upon that function (Kaiser, 1980; Espion et al., 1983; Bouchard and Moineau, 2000) or expression of functional proteins from a cloning vector (Morimyo et al., 1992; King and Murray, 1995; Jin et al., 1996; Mahdi et al., 1996). This lack of debilitating point mutations could be the result of random failure to be inactivated by mutation or be due to selection for function if the genes are in fact weakly expressed and have a function in the lysogen. The latter seems unlikely, except for lysogenic conversion genes, given current knowledge about gene expression from prophages. On the other hand, inactivated genes could be repaired to full functionality by recombination with other prophages or with infecting phages.

It has long been known that homologous recombination between lambdoid prophages is possible in the laboratory (Meselson, 1967; Redfield and Campbell, 1987), but simple, single break-and-join recombination events between non-tandem prophages integrated in the same chromosome would result in inversion or deletion of the intervening DNA. Such events could be detrimental to the host; however, one such inversion event does appear to have occurred that involved prophages CP-933O and CP-933P in E. coli strain EDL933 (Perna et al., 2002). Non-reciprocal double break-and-join or long gene conversion events could replace parts of one prophage with sequences from another prophage. Either mechanism could create relationships such as those observed between the prophages in Fig. 5, where multiple prophages within a bacterium contain sections of nearly identical sequence. Such duplicative replacement events among prophages should not distinguish between functional and non-functional genes, and so would be just as likely to replace a functional gene with a non-functional one as vice versa. On the other hand, replacement of part of a prophage by part of an infecting phage genome would be more likely to repair damaged prophage genes to functionality, as genes on an infectious phage genome have presumably been under recent selection for functionality. Nonetheless, at present, we cannot know whether (for example) the apparently functional left early operon of Rac was left intact by chance, was somehow selected to remain functional or is currently functional because of recent repair from another prophage (since lost) or an infecting phage. We can, however, conclude that, even if a prophage is defective, it is not necessary that all its genes are doomed to be lost forever. As many of their genes retain functionality and remain accessible to the phage population, and as phage virions may only be in 10-fold excess over bacterial cells in the environment (Bergh et al., 1989), prophage genes constitute a significant portion of the ‘phage gene pool’ in the earth’s biosphere.

Comments on the identification and annotation of prophages

In order to understand fully the true nature of bacterial genomes, we must be able to recognize prophages in nucleotide sequence; however, the extreme variability of phage nucleotide sequences makes it quite possible that unrecognized prophages still lurk in bacterial genome sequences. The ‘gold standard’ of prophage recognition is and should remain high similarity of sequence and gene organization to authentic temperate phages that infect the same bacterial species. In addition, (i) recognition of the conserved nature of some dsDNA tailed phage morphogenetic proteins such as portal and terminase; and (ii) the observation that these proteins do not have homologues with known non-phage functions have made the recognition of many prophages, even in distantly related bacterial genome sequences, quite unambiguous.

Can our ability to recognize prophages and annotate their sequences be improved? Yes. Most importantly, the study of additional infectious tailed phages, especially those that infect the less well-studied phylogenetic branches of bacteria, will help to ‘fill in the current gaps in sequence space’ and so make prophages more easily recognizable in those phyla. Hopefully, this will eventually lead to a situation in which at least the most highly conserved phage proteins will form one or a few (transitive) sets of related sequences that will include, for example, all the known terminases or portal proteins and will contain recognizable homologues of all subsequently sequenced members of those families. But such cornerstone genes may not be present in authentic but defective prophages or satellite prophages; how can we recognize these with higher accuracy and confidence? Several simple things can be done now.

(i) As relatively few ‘non-phage’ genes appear to have moved into the known prophages after integration, it seems justified at this point for bacterial genome annotators to indicate that ‘hypothetical’ (novel) or ‘conserved hypothetical’ (have a homologue of unknown function in the database) genes within apparent prophages are ‘putative prophage genes’. To date, some bacterial genomes have been annotated in this manner, whereas many have not. If this were universally done, it would be much easier to determine whether conserved hypothetical genes in new prophages are present elsewhere in other prophages. If they are, and especially if they are present in the same order, they probably represent the remains of another prophage. When this logic is applied to the X. fastidiosa 9a5c genome, for example, at least five more highly deleted putative prophage remnants are found in addition to the four prophages identified in the original genome report (Simpson et al., 2000) (Table S1).

(ii) Many phage genes have specific, very well-understood functions, so possible prophage genes with homology to these should be annotated with their specific presumed function, not just ‘phage-related protein’ as is often done currently.

(iii) Prophages very often ‘repair’ the bacterial gene into which they integrate by carrying a similar replacement part on the phage genome that gets fused to the target gene upon integration (Campbell et al., 1992; Campbell, 1994). Even when this does not occur, the identity between the phage and bacterial attachment sites is usually ≥10 bp long. Thus, there are typically exact direct repeats tens of base pairs long at the prophage boundaries [e.g. 10–148 bp in the various E. coli Sakai prophages (Hayashi et al., 2001b); such repeats can be even longer and need not be perfect throughout their length (Campbell et al., 1992)]. Genome annotators and analysers should attempt to locate and report such repeats, as finding these features identifies the outside boundaries of the prophage with precision.

(iv) A strong argument for integrated phage DNA (or any mobile DNA element) is its absence in some other strains. This may be subject to exception if there has been a recent population bottleneck in a species or if phages are so abundant that ancestors of every extant bacterium acquired a prophage at a given attachment site. As many genome sequencing operations have an interest in using their sequence information to examine genomic variation within species, this author recommends that, whenever possible, if a prophage is tentatively identified in a new bacterial genome sequence, the sequencers check for its absence in other strains. This can be done by DNA array analysis, but this approach has the disadvantage that it can be fooled by the not unlikely occurrence of similar prophages at different locations in other strains. A more informative approach is polymerase chain reaction amplification across the putative attachment site in other strains and sequencing the amplified product, if it is made, that is expected when no prophage is present. This would both help to confirm a sequence region as a prophage and precisely locate the prophage attachment site and prophage ends with confidence, which in turn would make the assignment of prophage genes much more robust.

(v) Finally, annotators should give names to putative prophage elements in bacterial genome sequences. This may seem a trivial point, but it has not been done in many of the published genome sequences, and the lack of names makes it difficult for others (who are reluctant to name them themselves) to deal with them in print. Prophages at cognate sites in different strains of the same species should not be given the same name, as they are probably not identical.

If these were universally implemented, it would make global analysis of prophage sequences much easier, which in turn would make annotation much more accurate and our understanding much more sophisticated.
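Recommendation (iii) above, locating the direct repeats that flank a prophage, lends itself to a short sketch. The code below is a minimal illustration that assumes the annotator already has approximate left and right boundary windows; the function name, the window coordinates and the toy sequence are hypothetical. It searches only for perfect repeats, whereas the text notes that real attachment-site repeats need not be perfect throughout their length.

```python
from typing import Optional, Tuple


def find_flanking_direct_repeat(seq: str,
                                left_window: Tuple[int, int],
                                right_window: Tuple[int, int],
                                min_len: int = 10) -> Optional[Tuple[str, int, int]]:
    """Longest exact direct repeat present in both boundary windows.

    Windows are (start, end) slices of `seq`. Returns the repeat and its
    0-based start coordinate in each window, or None if no repeat of at
    least `min_len` bases is shared. Brute force; fine for short windows.
    """
    left = seq[left_window[0]:left_window[1]]
    right = seq[right_window[0]:right_window[1]]
    # Try the longest possible repeat first and shrink until one is found.
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        for i in range(len(left) - length + 1):
            candidate = left[i:i + length]
            j = right.find(candidate)
            if j != -1:
                return candidate, left_window[0] + i, right_window[0] + j
    return None


# Toy 'genome' (invented): a 12 bp repeat flanking a 20 bp stand-in prophage.
att = "TTGACCTGAAGC"
genome = "CCGGA" + att + "AC" * 10 + att + "GGTTC"
hit = find_flanking_direct_repeat(genome, (0, 20), (len(genome) - 20, len(genome)))
print(hit)  # prints: ('TTGACCTGAAGC', 5, 37)
```

Reporting the repeat coordinates alongside the prophage annotation would give later readers the precise element boundaries that the recommendation asks for.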

Prophage sequences and bacteriophage diversity

For those who are interested in understanding the range of diversity of phages on earth, the sequenced prophages represent a wealth of information that cannot be ignored, as more prophage sequences have been determined than have sequences of bona fide infectious phages. For example, it is currently possible to use the prophage sequences to learn about the different types of non-homologous (convergent) gene modules that are used for a particular function by a group of temperate phages. A few such cases are as follows. (i) The E. coli RecE-type homologous recombination function was first found in the defective Rac prophage and only subsequently found in other infectious lambdoid phages (above). This recombination system uses the recE and recT genes of prophage Rac but, on other phages, we can recognize only a recT homologue [e.g. B. subtilis phage SPP1 (Alonso et al., 1997) and L. monocytogenes phage A118 (Loessner et al., 2000)] or only a recE homologue (e.g. S. enterica phages Gifsy-1 and Gifsy-2; McClelland et al., 2001). This suggests that these phages may have another non-homologous protein that replaces the missing partner. (ii) The lambdoid phages P22 and λ have convergent replication genes: the λ gene P protein recruits the host DnaB helicase to the replication initiation complex, whereas the cognate, non-homologous P22 gene 12 protein is a homologue of the host DnaB protein that does the helicase job itself (Wickner, 1984a,b). An E. coli dnaC homologue was first seen in the Rac prophage at this location and, recently, the lambdoid phages Gifsy-1 and Gifsy-2 have been found to carry a clear dnaC homologue in their DNA replication region. E. coli DnaC protein is a helicase loader, so perhaps these DnaC homologues act in the same way as phage λ P protein? In addition, the lambdoid S. enterica LT2 prophage Fels-1 encodes a novel protein that contains a primase motif in its replication gene position; does Fels-1 use a new, unstudied type of lambdoid phage replication initiation? (iii) Most lambdoid phages carry a homologue of the λ Rz lysis gene adjacent and downstream of their endolysin gene. The K-12 prophage QIN has a different, novel gene in this position, and phage N15 was subsequently found to have a homologue of this QIN protein in the same location (Ravin et al., 2000). Does this gene represent a functional alternative to Rz function? (iv) Several prophages in the E. coli genome sequences that are lambdoid in other respects have head and/or tail genes (as deduced from their position within the prophage) that are unrelated in sequence to any previously studied virion assembly genes, and lambdoid prophages found in the genomes of Wolbachia (Masui et al., 2000; 2001) and X. fastidiosa (Simpson et al., 2000) have tail genes that are homologous to genes that encode contractile tails in other phages (all previously characterized lambdoid phages had non-contractile or short tails). More recently, E. coli and S. flexneri lambdoid phages φP27 and SfV were found to have contractile tail genes (Allison et al., 2002; Recktenwald and Schmidt, 2002). Clearly, the sequenced prophages are an excellent place to find variations on temperate phage lifestyle themes.

Finally, we can learn about the overall variety of types of temperate phages from the examination of prophage sequences. A dramatic example of this may be indicated by genes homologous to the RNA polymerase gene of virulent E. coli phage T7 in the X. axonopodis 903 genome (da Silva et al., 2002) and in the Pseudomonas putida KT2440 genome (Nelson et al., 2002). In both cases, homologues of phage head and tail genes lie nearby, supporting the notion that these putative RNA polymerase genes are parts of prophages PP03 in P. putida and XacP2 in X. axonopodis (Table S1, Supplementary material). If true, this would be a completely new type of temperate phage, as no temperate phage is currently known to encode its own RNA polymerase. Many such discoveries no doubt await the careful analysis of the numerous prophages present in bacterial genome sequences.

Acknowledgements

The author’s research is supported by NSF grant MCB990526 and NIH grant AI49003. I thank Roger Hendrix and Jeff Lawrence for reading this manuscript and for many productive discussions of phage biology and evolution, and Thad Stanton, Kenn Rudd, Guy Plunkett and Nicole Perna for access to unpublished information.

Supplementary material

The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/mmi/mmi3580/mmi3580sm.htm.

Table S1. Prophages and phage-like objects in 82 published bacterial complete genomes.

References

Agron, P.G., Walker, R.L., Kinde, H., Sawyer, S.J., Hayes, D.C., Wollard, J., et al. (2001) Identification by subtractive hybridization of sequences specific for Salmonella enterica serovar Enteritidis. Appl Environ Microbiol 67: 4984–4991.

Allison, G.E., Angeles, D., Tran-Dinh, N., and Verma, N.K. (2002) Complete genomic sequence of SfV, a serotype-converting temperate bacteriophage of Shigella flexneri. J Bacteriol 184: 1974–1987.

Alonso, J.C., Luder, G., Stiege, A.C., Chai, S., Weise, F., and Trautner, T.A. (1997) The complete nucleotide sequence and functional organization of Bacillus subtilis bacteriophage SPP1. Gene 204: 201–212.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402.

Anilionis, A., Ostapchuk, P., and Riley, M. (1980) Identification of a second cryptic lambdoid prophage locus in the E. coli K12 chromosome. Mol Gen Genet 180: 479–481.

Bail, O. (1925) Der kolistamm 88 von Gildmeister und Herzberg. Med Klin (Munich) 21: 1271–1273.

Banks, D.J., Beres, S.B., and Musser, J.M. (2002) The fundamental contribution of phages to GAS evolution, genome diversification and strain emergence. Trends Microbiol 10: 515–521.

Bao, Q., Tian, Y., Li, W., Xu, Z., Xuan, Z., Hu, S., et al. (2002) A complete sequence of the T. tengcongensis genome. Genome Res 12: 689–700.

Barreiro, V., and Haggard-Ljungquist, E. (1992) Attachment sites for bacteriophage P2 on the Escherichia coli chromosome: DNA sequences, localization on the physical map, and detection of a P2-like remnant in E. coli K-12 derivatives. J Bacteriol 174: 4086–4093.

Bejar, S., Bouche, F., and Bouche, J.P. (1988) Cell division inhibition gene dicB is regulated by a locus similar to lambdoid bacteriophage immunity loci. Mol Gen Genet 212: 11–19.

Bentley, S., Chater, K., Cerdeno-Tarrage, A., Challis, G., Thomson, R., James, K., et al. (2002) Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417: 141–147.

Beres, S.B., Sylva, G.L., Barbian, K.D., Lei, B., Hoff, J.S., Mammarella, N.D., et al. (2002) Genome sequence of a serotype M3 strain of group A Streptococcus: phage-encoded toxins, the high-virulence phenotype, and clone emergence. Proc Natl Acad Sci USA 99: 10078–10083.

Bergh, O., Borsheim, K., Bratbak, G., and Heldal, M. (1989) High abundance of viruses found in aquatic environments. Nature 340: 467–468.

Bertani, G. (1951) Studies on lysogenesis. I. The mode of phage liberation by lysogenic Escherichia coli. J Bacteriol 62: 293–299.

Bertani, E., and Six, E. (1988) The P2-like phages and their parasite P4. In The Bacteriophages, Vol. 2. Calendar, R. (ed.). New York: Plenum Press, pp. 73–143.

Bishai, W., and Murphy, J. (1988) Bacteriophage gene products that cause human disease. In The Bacteriophages, Vol. 2. Calendar, R. (ed.). New York: Plenum Press, pp. 683–724.

Blaisdell, B.E., Campbell, A.M., and Karlin, S. (1996) Similarities and dissimilarities of phage genomes. Proc Natl Acad Sci USA 93: 5854–5859.

Blasband, A.J., Marcotte, W.R., Jr, and Schnaitman, C.A. (1986) Structure of the lc and nmpC outer membrane porin protein genes of lambdoid bacteriophage. J Biol Chem 261: 12723–12732.

Blattner, F.R., Plunkett, G., III, Bloch, C.A., Perna, N.T., Burland, V., Riley, M., et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277: 1453–1474.

Bolotin, A., Wincker, P., Mauger, S., Jaillon, O., Malarme, K., Weissenbach, J., et al. (2001) The complete genome sequence of the lactic acid bacterium Lactococcus lactis ssp. lactis IL1403. Genome Res 11: 731–753.

Bordet, J. (1925) Le problème de l’autolyse microbienne transmissible ou du bactériophage. Ann Inst Pasteur 39: 711–763.

Botstein, D. (1980) A theory of modular evolution in bacteriophages. Ann NY Acad Sci 354: 484–491.

Bouchard, J.D., and Moineau, S. (2000) Homologous recombination between a lactococcal bacteriophage and the chromosome of its host strain. Virology 270: 65–75.

Boyd, E.F., and Brussow, H. (2002) Common themes among bacteriophage-encoded virulence factors and diversity among the bacteriophages involved. Trends Microbiol 10: 521–529.

Boyd, E.F., Davis, B.M., and Hochhut, B. (2001) Bacteriophage–bacteriophage interactions in the evolution of pathogenic bacteria. Trends Microbiol 9: 137–144.

Brandt, K., Tilsala-Timisjarvi, A., and Alatossava, T. (2001) Phage-related DNA polymorphism in dairy and probiotic Lactobacillus. Micron 32: 59–65.

Brikun, I., Suziedelis, K., and Berg, D.E. (1994) DNA sequence divergence among derivatives of Escherichia coli K-12 detected by arbitrary primer PCR (random amplified polymorphic DNA) fingerprinting. J Bacteriol 176: 1673–1682.

Brody, H., Greener, A., and Hill, C.W. (1985) Excision and reintegration of the Escherichia coli K-12 chromosomal element e14. J Bacteriol 161: 1112–1117.

Bruggemann, H., Baumer, S., Fricke, W.F., Wiezer, A., Liesegang, H., Decker, I., et al. (2003) The genome sequence of Clostridium tetani, the causative agent of tetanus disease. Proc Natl Acad Sci USA 100: 1316–1321.

Brussow, H., and Hendrix, R.W. (2002) Phage genomics: small is beautiful. Cell 108: 13–16.

Bunny, K., Liu, J., and Roth, J. (2002) Phenotypes of lexA mutations in Salmonella enterica: evidence for a lethal lexA null phenotype due to the Fels-2 prophage. J Bacteriol 184: 6235–6249.

Campbell, A. (1962) The episomes. Adv Genet 11: 101–118.

Campbell, A. (1994) Comparative molecular biology of lambdoid phages. Annu Rev Microbiol 48: 193–222.

Campbell, A. (1996) Cryptic prophages. In Escherichia coli and Salmonella: Cellular and Molecular Biology. Neidhardt, F. (ed.). Washington, DC: American Society for Microbiology Press, pp. 2041–2046.

Campbell, A., and Botstein, D. (1983) Evolution of the lambdoid phages. In Lambda II. Hendrix, R., Roberts, J.W., Stahl, F.W., and Weisberg, R. (eds). Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 365–380.

Campbell, A., Schneider, S.J., and Song, B. (1992) Lambdoid phages as elements of bacterial genomes (integrase/phage 21/Escherichia coli K-12/icd gene). Genetica 86: 259–267.

Canchaya, C., Desiere, F., McShan, W., Ferretti, J., Parkhill, J., and Brussow, H. (2002) Genome analysis of an inducible prophage and prophage remnants integrated into Streptococcus pyogenes strain SF370. Virology 302: 245–258.

Casjens, S. (1997) Principles of virion structure, function and assembly. In Structural Biology of Viruses. Chiu, W., Burnett, R., and Garcea, R. (eds). Oxford: Oxford University Press, pp. 3–37.

Casjens, S., and Hendrix, R. (1988) Control mechanisms in dsDNA bacteriophage assembly. In The Bacteriophages, Vol. 1. Calendar, R. (ed.). New York: Plenum Press, pp. 15–91.

Casjens, S., and Hendrix, R. (2003) Bacteriophage roles in bacterial chromosome evolution. In The Bacterial Chromosome. Higgins, P. (ed.). Washington, DC: American Society for Microbiology Press (in press).

Casjens, S., Hatfull, G., and Hendrix, R. (1992) Evolution of dsDNA tailed-bacteriophage genomes. Semin Virol 3: 383–397.

Casjens, S., van Vugt, R., Tilly, K., Rosa, P.A., and Stevenson, B. (1997) Homology throughout the multiple 32-kilobase circular plasmids present in Lyme disease spirochetes. J Bacteriol 179: 217–227.

Casjens, S., Palmer, N., Van Vugt, R., Mun Huang, W., Stevenson, B., Rosa, P., et al. (2000) A bacterial genome in flux: the twelve linear and nine circular extrachromosomal DNAs in an infectious isolate of the Lyme disease spirochaete Borrelia burgdorferi. Mol Microbiol 35: 490–516.

Chang, C.C., Gilsdorf, J.R., DiRita, V.J., and Marrs, C.F. (2000) Identification and genetic characterization of Haemophilus influenzae genetic island 1. Infect Immun 68: 2630–2637.

Chang, K.H., Wen, F.S., Tseng, T.T., Lin, N.T., Yang, M.T., and Tseng, Y.H. (1998) Sequence analysis and expression of the filamentous phage φLf gene I encoding a 48-kDa protein associated with host cell membrane. Biochem Biophys Res Commun 245: 313–318.

Cheetham, B.F., and Katz, M.E. (1995) A role for bacteriophages in the evolution and transfer of bacterial virulence determinants. Mol Microbiol 18: 201–208.

Chopin, M.C., Chopin, A., Rouault, A., and Galleron, N. (1989) Insertion and amplification of foreign genes in the Lactococcus lactis subsp. lactis chromosome. Appl Environ Microbiol 55: 1769–1774.

Chopin, A., Bolotin, A., Sorokin, A., Ehrlich, S.D., and Chopin, M. (2001) Analysis of six prophages in Lactococcus lactis IL1403: different genetic structure of temperate and virulent phage populations. Nucleic Acids Res 29: 644–651.

Conter, A., Bouche, J.P., and Dassain, M. (1996) Identification of a new inhibitor of essential division gene ftsZ as the kil gene of defective prophage Rac. J Bacteriol 178: 5100–5104.

Damman, C.J., Eggers, C.H., Samuels, D.S., and Oliver, D.B. (2000) Characterization of Borrelia burgdorferi BlyA and BlyB proteins: a prophage-encoded holin-like system. J Bacteriol 182: 6791–6797.

Davis, B.M., Kimsey, H.H., Chang, W., and Waldor, M.K. (1999) The Vibrio cholerae O139 Calcutta bacteriophage CTXφ is infectious and encodes a novel repressor. J Bacteriol 181: 6779–6787.

Deng, W., Burland, V., Plunkett, G., III, Boutin, A., Mayhew, G.F., Liss, P., et al. (2002) Genome sequence of Yersinia pestis KIM. J Bacteriol 184: 4601–4611.

Dep, M.S., Mendz, G.L., Trend, M.A., Coloe, P.J., Fry, B.N., and Korolik, V. (2001) Differentiation between Campylobacter hyoilei and Campylobacter coli using genotypic and phenotypic analyses. Int J Syst Evol Microbiol 51: 819–826.

Desiere, F., Mahanivong, C., Hillier, A.J., Chandry, P.S., Davidson, B.E., and Brussow, H. (2001) Comparative genomics of lactococcal phages: insight from the complete genome sequence of Lactococcus lactis phage BK5-T. Virology 283: 240–252.

Dosch, D.C., Helmer, G.L., Sutton, S.H., Salvacion, F.F., and Epstein, W. (1991) Genetic analysis of potassium transport loci in Escherichia coli: evidence for three constitutive systems mediating uptake potassium. J Bacteriol 173: 687–696.

Eggers, C.H., and Samuels, D.S. (1999) Molecular evidence for a new bacteriophage of Borrelia burgdorferi. J Bacteriol 181: 7308–7313.

Eggers, C.H., Casjens, S., Hayes, S.F., Garon, C.F., Damman, C.J., Oliver, D.B., et al. (2000) Bacteriophages of spirochetes. J Mol Microbiol Biotechnol 2: 365–373.

Emmerth, M., Goebel, W., Miller, S.I., and Hueck, C.J. (1999) Genomic subtraction identifies Salmonella typhimurium prophages, F-related plasmid sequences, and a novel fimbrial operon, stf, which are absent in Salmonella typhi. J Bacteriol 181: 5652–5661.

Espion, D., Kaiser, K., and Dambly-Chaudiere, C. (1983) A third defective lambdoid prophage of Escherichia coli K12 defined by the lambda derivative, lambdaqin111. J Mol Biol 170: 611–633.

Evans, R., Seeley, N.R., and Kuempel, P.L. (1979) Loss of rac locus DNA in merozygotes of Escherichia coli K12. Mol Gen Genet 175: 245–250.

Faubladier, M., and Bouche, J.P. (1994) Division inhibition gene dicF of Escherichia coli reveals a widespread group of prophage sequences in bacterial genomes. J Bacteriol 176: 1150–1156.

Feinstein, S.I., and Low, K.B. (1982) Zygotic induction of the rac locus can cause cell death in E. coli. Mol Gen Genet 187: 231–235.

Ferretti, J.J., McShan, W.M., Ajdic, D., Savic, D.J., Savic, G., Lyon, K., et al. (2001) Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc Natl Acad Sci USA 98: 4658–4663.

Figueroa-Bossi, N., and Bossi, L. (1999) Inducible prophages contribute to Salmonella virulence in mice. Mol Microbiol 33: 167–176.

Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness, E.F., Kerlavage, A.R., et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496–512.

Fleischmann, R., Alland, D., Eisen, J., Carpenter, L., White, O., Peterson, J., et al. (2002) Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184: 5479–5490.

Fouts, K.E., Wasie-Gilbert, T., Willis, D.K., Clark, A.J., and Barbour, S.D. (1983) Genetic analysis of transposon-induced mutations of the Rac prophage in Escherichia coli K-12 which affect expression and function of recE. J Bacteriol 156: 718–726.

Fraser, C.M., Casjens, S., Huang, W.M., Sutton, G.G., Clayton, R., Lathigra, R., et al. (1997) Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390: 580–586.

Freifelder, D., and Meselson, M. (1970) Topological relationship of prophage lambda to the bacterial chromosome in lysogenic cells. Proc Natl Acad Sci USA 65: 200–205.

Galibert, F., Finan, T.M., Long, S.R., Puhler, A., Abola, P., Ampe, F., et al. (2001) The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293: 668–672.

Garcia-Vallve, S., Palau, J., and Romeu, A. (1999) Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis. Mol Biol Evol 16: 1125–1134.

George, D.G., Yeh, L.S., and Barker, W.C. (1983) Unexpected relationships between bacteriophage lambda hypothetical proteins and bacteriophage T4 tail-fiber proteins. Biochem Biophys Res Commun 115: 1061–1068.

Gerstein, M. (1998) Measurement of the effectiveness of transitive sequence comparison, through a third ‘intermediate’ sequence. Bioinformatics 14: 707–714.

Ghelardini, P., La Valle, R., and Paolozzi, L. (1994) The Mu gem operon: its role in gene expression, recombination and cell cycle. Genetica 94: 151–156.

Gildmeister, E., and Herzberg, K. (1924) Zur theorie der bakteriophagen (d’Herelle Lysine). 6. Mitteilung über das d’Herellesche phanomen. Zentr Bakteriol Parasitenk I Abt Orig 93: 402–420.

Girons, I.S., Bourhy, P., Ottone, C., Picardeau, M., Yelton, D., Hendrix, R.W., et al. (2000) The LE1 bacteriophage replicates as a plasmid within Leptospira biflexa: construction of an L. biflexa–Escherichia coli shuttle vector. J Bacteriol 182: 5700–5705.

Glaser, P., Frangeul, L., Buchrieser, C., Rusniok, C., Amend, A., Baquero, F., et al. (2001) Comparative genomics of Listeria species. Science 294: 849–852.

Goodman, S.D., and Scocca, J.J. (1988) Identification and arrangement of the DNA sequence recognized in specific transformation of Neisseria gonorrhoeae. Proc Natl Acad Sci USA 85: 6982–6986.

Gratia, J.P. (1989) Products of defective lysogeny in Serratia marcescens SMG 38 and their activity against Escherichia coli and other Enterobacteria. J Gen Microbiol 135: 25–35.

Greener, A., and Hill, C.W. (1980) Identification of a novel genetic element in Escherichia coli K-12. J Bacteriol 144: 312–321.

Harshey, R. (1988) Phage Mu. In The Bacteriophages, Vol. 1. Calendar, R. (ed.). New York: Plenum Press, pp. 193–234.

Hayashi, T., Makino, K., Ohnishi, M., Kurokawa, K., Ishii, K., Yokoyama, K., et al. (2001a) Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 8: 11–22.

Hayashi, T., Makino, K., Ohnishi, M., Kurokawa, K., Ishii, K., Yokoyama, K., et al. (2001b) Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 8 (Suppl.): 47–52.

Heidelberg, J.F., Paulsen, I.T., Nelson, K.E., Gaidos, E.J., Nelson, W.C., Read, T.D., et al. (2002) Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis. Nature Biotechnol 20: 1118–1123.

Hendrix, R.W., and Duda, R.L. (1998) Bacteriophage HK97 head assembly: a protein ballet. Adv Virus Res 50: 235–288.

Hendrix, R.W., Smith, M.C., Burns, R.N., Ford, M.E., and Hatfull, G.F. (1999) Evolutionary relationships among diverse bacteriophages and prophages: all the world’s a phage. Proc Natl Acad Sci USA 96: 2192–2197.

Hendrix, R.W., Lawrence, J.G., Hatfull, G.F., and Casjens, S. (2000) The origins and ongoing evolution of viruses. Trends Microbiol 8: 504–508.

Huggins, A.R., and Sandine, W.E. (1977) Incidence and properties of temperate bacteriophages induced from lactic streptococci. Appl Environ Microbiol 33: 184–191.

Humphrey, S.B., Stanton, T.B., Jensen, N.S., and Zuerner, R.L. (1997) Purification and characterization of VSH-1, a generalized transducing bacteriophage of Serpulina hyodysenteriae. J Bacteriol 179: 323–329.

Ikeda, H., and Tomizawa, J. (1968) Prophage P1, an extrachromosomal replication unit. Cold Spring Harb Symp Quant Biol 33: 791–798.

Inal, J.M., and Karunakaran, K.V. (1996) φ20, a temperate bacteriophage isolated from Bacillus anthracis exists as a plasmidial prophage. Curr Microbiol 32: 171–175.

Jiang, W., Li, Z., Zhang, Z., Baker, M., Prevelige, P., and Chiu, W. (2003) Coat protein fold and maturation transition of bacteriophage P22 seen at sub-nanometer resolution. Nature Struct Biol 10: 131–135.

Jin, Q., Yuan, Z., Xu, J., Wang, Y., Shen, Y., Lu, W., et al. (2002) Genome sequence of Shigella flexneri 2a: insights into pathogenicity through comparison with genomes of Escherichia coli K12 and O157. Nucleic Acids Res 30: 4432–4441.

Jin, S., Chen, Y., Christie, G.E., and Benedik, M.J. (1996) Regulation of the Serratia marcescens extracellular nuclease: positive control by a homolog of P2 Ogr encoded by a cryptic prophage. J Mol Biol 256: 264–278.

Juhala, R.J., Ford, M.E., Duda, R.L., Youlton, A., Hatfull, G.F., and Hendrix, R.W. (2000) Genomic sequences of bacteriophages HK97 and HK022: pervasive genetic mosaicism in the lambdoid bacteriophages. J Mol Biol 299: 27–51.

Kaiser, K. (1980) The origin of Q-independent derivatives of phage lambda. Mol Gen Genet 179: 547–554.

Kaiser, K., and Murray, N.E. (1979) Physical characterisation of the ‘Rac prophage’ in E. coli K12. Mol Gen Genet 175: 159–174.

King, G., and Murray, N.E. (1995) Restriction alleviation and modification enhancement by the Rac prophage of Escherichia coli K-12. Mol Microbiol 16: 769–777.

Kirby, J., Trempy, J., and Gottesman, S. (1994) Excision of a P4-like cryptic prophage leads to Alp protease expression in Escherichia coli. J Bacteriol 176: 2068–2081.

Klee, S.R., Nassif, X., Kusecek, B., Merker, P., Beretti, J.L., Achtman, M., et al. (2000) Molecular and biological analysis of eight genetic islands that distinguish Neisseria meningitidis from the closely related pathogen Neisseria gonorrhoeae. Infect Immun 68: 2082–2095.

Klein, R., Baranyl, U., Rossler, N., Greineder, B., Scholz, H., and Witte, A. (2002) Natrialba magadii virus φCh1: first complete nucleotide sequence and functional organization of a virus infecting a haloalkaliphilic archaeon. Mol Microbiol 45: 851–863.

Kofoid, E., Rappleye, C., Stojiljkovic, I., and Roth, J. (1999) The 17-gene ethanolamine (eut) operon of Salmonella typhimurium encodes five homologues of carboxysome shell proteins. J Bacteriol 181: 5317–5329.

Krogh, S., O’Reilly, M., Nolan, N., and Devine, K.M. (1996) The phage-like element PBSX and part of the skin element, which are resident at different locations on the Bacillus subtilis chromosome, are highly homologous. Microbiology 142: 2031–2040.

Kropinski, A.M. (2000) Sequence of the genome of the temperate, serotype-converting, Pseudomonas aeruginosa bacteriophage D3. J Bacteriol 182: 6066–6074.

Kuhn, J., and Campbell, A. (2001) The bacteriophage lambda attachment site in wild strains of Escherichia coli. J Mol Evol 53: 607–614.

Kunst, F., Ogasawara, N., Moszer, I., Albertini, A.M., Alloni, G., Azevedo, V., et al. (1997) The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390: 249–256.

Kuroda, M., Ohta, T., Uchiyama, I., Baba, T., Yuzawa, H., Kobayashi, I., et al. (2001) Whole genome sequencing of meticillin-resistant Staphylococcus aureus. Lancet 357: 1225–1240.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860–921.

Lang, A.S., and Beatty, J.T. (2001) The gene transfer agent of Rhodobacter capsulatus and ‘constitutive transduction’ in prokaryotes. Arch Microbiol 175: 241–249.

Lang, A.S., Beatty, J.T., LeBlanc, H., Towers, G., Harris, J., Lang, G., et al. (2000) Genetic analysis of a bacterial genetic exchange element: the gene transfer agent of Rhodobacter capsulatus. Proc Natl Acad Sci USA 97: 859–864.

Lawrence, J.G., Hendrix, R.W., and Casjens, S. (2001) Where are the bacterial pseudogenes? Trends Microbiol 9: 535–540.

Lawrence, J.G., Hatfull, G., and Hendrix, R. (2002) The imbroglios of viral taxonomy: genetic exchange and the failings of phenetic approaches. J Bacteriol 184: 4891–4905.

Lazarevic, V., Dusterhoft, A., Soldo, B., Hilbert, H., Mauel, C., and Karamata, D. (1999) Nucleotide sequence of the Bacillus subtilis temperate bacteriophage SPβc2. Microbiology 145: 1055–1067.

Lederberg, E. (1951) Lysogenicity in E. coli K-12. Genetics 36: 560.

Lewis, R.J., Brannigan, J.A., Offen, W.A., Smith, I., and Wilkinson, A.J. (1998) An evolutionary link between sporulation and prophage induction in the structure of a repressor: anti-repressor complex. J Mol Biol 283: 907–912.

Lin, N.T., Chang, R.Y., Lee, S.J., and Tseng, Y.H. (2001) Plasmids carrying cloned fragments of RF DNA from the filamentous phage φLf can be integrated into the host chromosome via site-specific integration and homologous recombination. Mol Gen Genet 266: 425–435.

Lindsey, D.F., Mullin, D.A., and Walker, J.R. (1989) Characterization of the cryptic lambdoid prophage DLP12 of Escherichia coli and overlap of the DLP12 integrase gene with the tRNA gene argU. J Bacteriol 171: 6197–6205.

Loessner, M.J., Inman, R.B., Lauer, P., and Calendar, R. (2000) Complete nucleotide sequence, molecular analysis and genome structure of bacteriophage A118 of Listeria monocytogenes: implications for phage evolution. Mol Microbiol 35: 324–340.

Low, K.B. (1973) Restoration of the rac locus of recombinant forming ability in recB– and recC– merozygotes of Escherichia coli K12. Mol Gen Genet 122: 119–130.

Lucchini, S., Desiere, F., and Brussow, H. (1999) Comparative genomics of Streptococcus thermophilus phage species supports a modular evolution theory. J Virol 73: 8647–8656.

Lwoff, A. (1953) Lysogeny. Bacteriol Rev 17: 269–337.

Lwoff, A. (1966) The prophage and I. In Phage and the Origins of Molecular Biology. Cairns, J., Stent, G., and Watson, J. (eds). Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 88–99.

McClelland, M., Florea, L., Sanderson, K., Clifton, S.W., Parkhill, J., Churcher, C., et al. (2000) Comparison of the Escherichia coli K-12 genome with sampled genomes of a Klebsiella pneumoniae and three Salmonella enterica serovars, Typhimurium, Typhi and Paratyphi. Nucleic Acids Res 28: 4974–4986.

McClelland, M., Sanderson, K.E., Spieth, J., Clifton, S.W., Latreille, P., Courtney, L., et al. (2001) Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 413: 852–856.

McDonnell, G.E., Wood, H., Devine, K.M., and McConnell, D.J. (1994) Genetic control of bacterial suicide: regulation of the induction of PBSX in Bacillus subtilis. J Bacteriol 176: 5820–5830.

McDonough, M.A., and Butterton, J.R. (1999) Spontaneous tandem amplification and deletion of the shiga toxin operon in Shigella dysenteriae 1. Mol Microbiol 34: 1058–1069.

Mahajan, S.K., Chu, C.C., Willis, D.K., Templin, A., and Clark, A.J. (1990) Physical analysis of spontaneous and mutagen-induced mutants of Escherichia coli K-12 expressing DNA exonuclease VIII activity. Genetics 125: 261–273.

Mahdi, A.A., Sharples, G.J., Mandal, T.N., and Lloyd, R.G. (1996) Holliday junction resolvases encoded by homologous rusA genes in Escherichia coli K-12 and phage 82. J Mol Biol 257: 561–573.

Masui, S., Kamoda, S., Sasaki, T., and Ishikawa, H. (2000) Distribution and evolution of bacteriophage WO in Wolbachia, the endosymbiont causing sexual alterations in arthropods. J Mol Evol 51: 491–497.

Masui, S., Kuroiwa, H., Sasaki, T., Inui, M., Kuroiwa, T., and Ishikawa, H. (2001) Bacteriophage WO and virus-like particles in Wolbachia, an endosymbiont of arthropods. Biochem Biophys Res Commun 283: 1099–1104.

Mediavilla, J., Jain, S., Kriakov, J., Ford, M.E., Duda, R.L., Jacobs, W.R., Jr, et al. (2000) Genome organization and characterization of mycobacteriophage Bxb1. Mol Microbiol 38: 955–970.

Meselson, M. (1967) Reciprocal recombination in prophage lambda. J Cell Physiol 70 (Suppl. 1): 113–118.

Miao, E.A., and Miller, S.I. (1999) Bacteriophages in the evolution of pathogen–host interactions. Proc Natl Acad Sci USA 96: 9452–9454.

Milkman, R., and Bridges, M.M. (1990) Molecular evolution of the Escherichia coli chromosome. III. Clonal frames. Genetics 126: 505–517.

Mizuno, M., Masuda, S., Takemaru, K., Hosono, S., Sato, T., Takeuchi, M., et al. (1996) Systematic sequencing of the 283 kb 210 degrees-232 degrees region of the Bacillus subtilis genome containing the skin element and many sporulation genes. Microbiology 142: 3103–3111.

Montag, D., and Henning, U. (1987) An open reading frame in the Escherichia coli bacteriophage lambda genome encodes a protein that functions in assembly of the long tail fibers of bacteriophage T4. J Bacteriol 169: 5884–5886.

Moreira, D. (2000) Multiple independent horizontal transfers of informational genes from bacteria to plasmids and phages: implications for the origin of bacterial replication machinery. Mol Microbiol 35: 1–5.

Morgan, G., Hatfull, G., Casjens, S., and Hendrix, R. (2002) Bacteriophage Mu genome sequence: analysis and comparison with Mu-like prophages in Haemophilus, Neisseria and Deinococcus. J Mol Biol 317: 337–359.

Morimyo, M., Hongo, E., Hama-Inaba, H., and Machida, I. (1992) Cloning and characterization of the mvrC gene of Escherichia coli K-12 which confers resistance against methyl viologen toxicity. Nucleic Acids Res 20: 3159–3165.

Nakayama, K., Kanaya, S., Ohnishi, M., Terawaki, Y., and Hayashi, T. (1999) The complete nucleotide sequence of φCTX, a cytotoxin-converting phage of Pseudomonas aeruginosa: implications for phage evolution and horizontal gene transfer via bacteriophages. Mol Microbiol 31: 399–419.

Nakayama, K., Takashima, K., Ishihara, H., Shinomiya, T., Kageyama, M., Kanaya, S., et al. (2000) The R-type pyocin of Pseudomonas aeruginosa is related to P2 phage, and the F-type is related to lambda phage. Mol Microbiol 38: 213–231.

Nelson, K.E., Weinel, C., Paulsen, I.T., Dodson, R.J., Hilbert, H., Martins dos Santos, V.A., et al. (2002) Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol 4: 799–808.

Nguyen, A.H., Tomita, T., Hirota, M., Sato, T., and Kamio, Y. (1999) A simple purification method and morphology and component analyses for carotovoricin Er, a phage-tail-like bacteriocin from the plant pathogen Erwinia carotovora Er. Biosci Biotechnol Biochem 63: 1360–1369.

Nolling, J., Breton, G., Omelchenko, M.V., Makarova, K.S., Zeng, Q., Gibson, R., et al. (2001) Genome sequence and comparative analysis of the solvent-producing bacterium Clostridium acetobutylicum. J Bacteriol 183: 4823–4838.

Ochman, H., and Wilson, A.C. (1987) Evolution in bacteria: evidence for a universal substitution rate in cellular genomes. J Mol Evol 26: 74–86.

Ohnishi, M., Kurokawa, K., and Hayashi, T. (2001) Diversification of Escherichia coli genomes: are bacteriophages the major contributors? Trends Microbiol 9: 481–485.

Ojaimi, C., Brooks, C., Casjens, S., Rosa, P., Elias, A., Barbour, A., et al. (2003) Profiling temperature-induced changes in Borrelia burgdorferi gene expression using whole genome arrays. Infect Immun 71: 1689–1705.

Okamoto, K., Mudd, J., Mangon, J., Huang, W.M., and Marmur, J. (1968) Properties of the defective phage of Bacillus subtilis. J Mol Biol 34: 413–428.

Osawa, R., Iyoda, S., Nakayama, S.I., Wada, A., Yamai, S., and Watanabe, H. (2000) Genotypic variations of Shiga toxin-converting phages from enterohaemorrhagic Escherichia coli O157:H7 isolates. J Med Microbiol 49: 565–574.

Pappenheimer, A.M., Jr, and Murphy, J.R. (1983) Studies on the molecular epidemiology of diphtheria. Lancet 2: 923–926.

Parkhill, J., Achtman, M., James, K.D., Bentley, S.D., Churcher, C., Klee, S.R., et al. (2000) Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature 404: 502–506.

Parkhill, J., Dougan, G., James, K.D., Thomson, N.R., Pickard, D., Wain, J., et al. (2001a) Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature 413: 848–852.

Parkhill, J., Wren, B.W., Thomson, N.R., Titball, R.W., Holden, M.T., Prentice, M.B., et al. (2001b) Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413: 523–527.

Pedulla, M.L., Ford, M.E., Karthikeyan, T., Houtz, J.M., Hendrix, R.W., Hatfull, G.F., et al. (2003) Corrected sequence of the bacteriophage P22 genome. J Bacteriol 185: 1475–1477.

Perna, N.T., Plunkett, G., III, Burland, V., Mau, B., Glasner, J.D., Rose, D.J., et al. (2001) Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409: 529–533.

Perna, N., Glasner, J., Burland, V., and Plunkett, G. III (2002) The Genomes of Escherichia coli K-12 and Pathogenic E. coli. In Escherichia coli: Virulence Mechanisms of a Versatile Pathogen. Donnenberg, M. (ed.). San Diego, CA: Academic Press, pp. 3–53.

Pfister, P., Wasserfallen, A., Stettler, R., and Leisinger, T. (1998) Molecular genomics of methanobacterium phage ψM2. Mol Microbiol 30: 233–244.

Popp, A., Hertwig, S., Lurz, R., and Appel, B. (2000) Comparative study of temperate bacteriophages isolated from Yersinia. Syst Appl Microbiol 23: 469–478.

Ramirez, M., Severina, E., and Tomasz, A. (1999) A high incidence of prophage carriage among natural isolates of Streptococcus pneumoniae. J Bacteriol 181: 3618–3625.

Rapp, B., and Wall, J. (1987) Genetic transfer in Desulfovibrio desulfuricans. Proc Natl Acad Sci USA 84: 9128–9130.

Ravin, V.K. (1968) The functioning of the genes of temperate bacteriophage in lysogenic cells. Genetika 4: 119–124 (in Russian).

Ravin, V., and Shulga, M.G. (1970) The evidence of extrachromosomal location phage prophage N15. Virology 40: 800–805.

Ravin, V., Ravin, N., Casjens, S., Ford, M.E., Hatfull, G.F., and Hendrix, R.W. (2000) Genomic sequence and analysis of the atypical temperate bacteriophage N15. J Mol Biol 299: 53–73.

Read, T.D., Brunham, R.C., Shen, C., Gill, S.R., Heidelberg, J.F., White, O., et al. (2000) Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39. Nucleic Acids Res 28: 1397–1406.

Recktenwald, J., and Schmidt, H. (2002) The nucleotide sequence of Shiga toxin (Stx) 2e-encoding phage φP27 is not related to other Stx phage genomes, but the modular genetic structure is conserved. Infect Immun 70: 1896–1908.

Redfield, R.J., and Campbell, A. (1987) Structure of cryptic λ prophages. J Mol Biol 198: 393–404.

Reid, S.D., Herbelin, C.J., Bumbaugh, A.C., Selander, R.K., and Whittam, T.S. (2000) Parallel evolution of virulence in pathogenic Escherichia coli. Nature 406: 64–67.

Retallack, D.M., Johnson, L.L., and Friedman, D.I. (1994) Role for 10Sa RNA in the growth of lambda-P22 hybrid phage. J Bacteriol 176: 2082–2089.

Rudd, K.E. (1999) Novel intergenic repeats of Escherichia coli K-12. Res Microbiol 150: 653–664.

Ruzin, A., Lindsay, J., and Novick, R.P. (2001) Molecular genetics of SaPI1 – a mobile pathogenicity island in Staphylococcus aureus. Mol Microbiol 41: 365–377.

Sandt, C.H., and Hill, C.W. (2000) Four different genes responsible for nonimmune immunoglobulin-binding activities within a single strain of Escherichia coli. Infect Immun 68: 2205–2214.

Schicklmaier, P., Moser, E., Wieland, T., Rabsch, W., and Schmieger, H. (1998) A comparative study on the frequency of prophages among natural isolates of Salmonella and Escherichia coli with emphasis on generalized transducers. Antonie Van Leeuwenhoek 73: 49–54.

Schlosser, A., Kluttig, S., Hamann, A., and Bakker, E.P. (1991) Subcloning, nucleotide sequence, and expression of trkG, a gene that encodes an integral membrane protein involved in potassium uptake via the Trk system of Escherichia coli. J Bacteriol 173: 3170–3176.

Schmieger, H., and Schicklmaier, P. (1999) Transduction of multiple drug resistance of Salmonella enterica serovar typhimurium DT104. FEMS Microbiol Lett 170: 251–256.

Shimizu, T., Ohtani, K., Hirakawa, H., Ohshima, K., Yamashita, A., Shiba, T., et al. (2002) Complete genome sequence of Clostridium perfringens, an anaerobic flesh-eater. Proc Natl Acad Sci USA 99: 996–1001.

da Silva, A.C., Ferro, J.A., Reinach, F.C., Farah, C.S., Furlan, L.R., Quaggio, R.B., et al. (2002) Comparison of the genomes of two Xanthomonas pathogens with differing host specificities. Nature 417: 459–463.

Simpson, A.J., Reinach, F.C., Arruda, P., Abreu, F.A., Acencio, M., Alvarenga, R., et al. (2000) The genome sequence of the plant pathogen Xylella fastidiosa. Nature 406: 151–157.

Six, E. (1963) A defective phage depending on phage P2. Bacteriol Proc 80: 138.

Smith, M.C., Burns, N., Sayers, J.R., Sorrell, J.A., Casjens, S.R., and Hendrix, R.W. (1998) Bacteriophage collagen. Science 279: 1834.

Smith, M.C., Burns, R.N., Wilson, S.E., and Gregory, M.A. (1999) The complete sequence of the Streptomyces temperate phage φC31: evolutionary relationships to other viruses. Nucleic Acids Res 27: 2145–2155.

Smith, T.J., Blackman, S.A., and Foster, S.J. (2000) Autolysins of Bacillus subtilis: multiple enzymes with multiple functions. Microbiology 146: 249–262.

Smoot, J.C., Barbian, K.D., Van Gompel, J.J., Smoot, L.M., Chaussee, M.S., Sylva, G.L., et al. (2002) Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc Natl Acad Sci USA 99: 4668–4673.

Stanley, E., Fitzgerald, G.F., Le Marrec, C., Fayard, B., and van Sinderen, D. (1997) Sequence analysis and characterization of φO1205, a temperate bacteriophage infecting Streptococcus thermophilus CNRZ1205. Microbiology 143: 3417–3429.

Starich, T., Cordes, P., and Zissler, J. (1985) Transposon tagging to detect a latent virus in Myxococcus xanthus. Science 230: 541–543.

Stover, C.K., Pham, X.Q., Erwin, A.L., Mizoguchi, S.D., Warrener, P., Hickey, M.J., et al. (2000) Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature 406: 959–964.

Susskind, M.M., and Botstein, D. (1978) Molecular genetics of bacteriophage P22. Microbiol Rev 42: 385–413.

Susskind, M.M., and Botstein, D. (1980) Superinfection exclusion by lambda prophage in lysogens of Salmonella typhimurium. Virology 100: 212–216.

Susskind, M.M., Botstein, D., and Wright, A. (1974) Superinfection exclusion by P22 prophage in lysogens of Salmonella typhimurium. III. Failure of superinfecting phage DNA to enter sieA+ lysogens. Virology 62: 350–366.

Takemaru, K., Mizuno, M., Sato, T., Takeuchi, M., and Kobayashi, Y. (1995) Complete nucleotide sequence of a skin element excised by DNA rearrangement during sporulation in Bacillus subtilis. Microbiology 141: 323–327.

Tang, S., Nutthall, S., Ngui, K., Fisher, C., Lopez, P., and Dyall-Smith, M. (2002) HF2: a double-stranded DNA tailed haloarcheal virus with a mosaic genome. Mol Microbiol 44: 283–296.

Tettelin, H., Saunders, N.J., Heidelberg, J., Jeffries, A.C., Nelson, K.E., Eisen, J.A., et al. (2000) Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science 287: 1809–1815.

Thaler, J.O., Baghdiguian, S., and Boemare, N. (1995) Purification and characterization of xenorhabdicin, a phage tail-like bacteriocin, from the lysogenic strain F1 of Xenorhabdus nematophilus. Appl Environ Microbiol 61: 2049–2052.

Van Sluys, M.A., de Oliveira, M.C., Monteiro-Vitorello, C.B., Miyaki, C.Y., Furlan, L.R., Camargo, L.E., et al. (2003) Comparative analyses of the complete genome sequences of Pierce’s disease and citrus variegated chlorosis strains of Xylella fastidiosa. J Bacteriol 185: 1018–1026.

Van Vliet, F., Boyen, A., and Glansdorff, N. (1988) On interspecies gene transfer: the case of the argF gene of Escherichia coli. Ann Inst Pasteur Microbiol 139: 493–496.

Wagner, P.L., and Waldor, M.K. (2002) Bacteriophage control of bacterial virulence. Infect Immun 70: 3985–3993.

Waldor, M.K. (1998) Bacteriophage biology and bacterial virulence. Trends Microbiol 6: 295–297.

Waldor, M.K., and Mekalanos, J.J. (1996) Lysogenic conversion by a filamentous phage encoding cholera toxin. Science 272: 1910–1914.

Wall, J.D., Weaver, P.F., and Gest, H. (1975) Gene transfer agents, bacteriophages, and bacteriocins of Rhodopseudomonas capsulata. Arch Microbiol 105: 217–224.

Wang, F.S., Whittam, T.S., and Selander, R.K. (1997) Evolutionary genetics of the isocitrate dehydrogenase gene (icd) in Escherichia coli and Salmonella enterica. J Bacteriol 179: 6551–6559.

Welch, R.A., Burland, V., Plunkett, G., III, Redford, P., Roesch, P., Rasko, D., et al. (2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA 99: 17020–17024.

Whatmore, A.M., and Dowson, C.G. (1999) The autolysin-encoding gene (lytA) of Streptococcus pneumoniae displays restricted allelic variation despite localized recombination events with genes of pneumococcal bacteriophage encoding cell wall lytic enzymes. Infect Immun 67: 4551–4556.

Wickner, S. (1984a) Oligonucleotide synthesis by Escherichia coli dnaG primase in conjunction with phage P22 gene 12 protein. J Biol Chem 259: 14044–14047.

Wickner, S. (1984b) DNA-dependent ATPase activity associated with phage P22 gene 12 protein. J Biol Chem 259: 14038–14043.

Willis, D.K., Satin, L.H., and Clark, A.J. (1985) Mutation-dependent suppression of recB21 recC22 by a region cloned from the Rac prophage of Escherichia coli K-12. J Bacteriol 162: 1166–1172.

Yamamoto, K. (1967) The origin of bacteriophage P221. Virology 33: 545–547.

Yamamoto, N. (1969) Genetic evolution of bacteriophage. I. Hybrids between unrelated bacteriophages P22 and Fels2. Proc Natl Acad Sci USA 62: 63–69.

Yen, H.C., Hu, N.T., and Marrs, B.L. (1979) Characterization of the gene transfer agent made by an overproducer mutant of Rhodopseudomonas capsulata. J Mol Biol 131: 157–168.

Zinder, N., and Lederberg, J. (1952) Genetic exchange in Salmonella. J Bacteriol 64: 679–699.

Zink, R., Loessner, M.J., and Scherer, S. (1995) Characterization of cryptic prophages (monocins) in Listeria and sequence analysis of a holin/endolysin gene. Microbiology 141: 2577–2584.

Zissler, J., Signer, E., and Schaefer, F. (1971) The role of recombination in growth of bacteriophage lambda. In Bacteriophage Lambda. Hershey, A.D. (ed.). Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, pp. 455–475.