Chapter 1
General Introduction
1.1 Importance of Shigellosis
Diarrheal diseases claim the lives of at least five million children per year in developing
countries (Rohde, 1984), and shigellosis or bacillary dysentery is responsible for
approximately 10% of these deaths (Stoll, 1982). Shigella is an important human
pathogen, responsible for the majority of cases of endemic bacillary dysentery
prevalent in developing nations (Kotloff, 1999). In the developing world, it is estimated
that 113 million episodes of shigellosis occur annually, resulting in more than 400,000
deaths (Kotloff, 1999). Shigellosis is predominantly a pediatric disease of the developing
world, where it is about 100 times more prevalent than in industrialized countries
(Bennish et al., 1990). Shigellosis is common among children
less than five years of age in developing countries and in persons who travel from
industrialized to less developed countries (Shlim, 1999). Industrialized countries also
report outbreaks of Shigella infections among high-risk populations such as children
attending day care (Mohle-Boetani, 1995; Pickering and Evans, 1981), persons with
human immunodeficiency virus/acquired immunodeficiency syndrome (Baer, 1999; van
Oosterhout, 1994), and inmates of custodial institutions (Mahoney, 1993). In developed
countries, outbreaks usually involve S. sonnei (Black et al., 1978). The infectious dose is
low: as few as 10-100 organisms ingested orally can cause disease, including severe
acute complications (Bennish et al., 1990).
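The Kotloff (1999) figures quoted above can be used for a rough check on scale: dividing deaths by episodes gives an overall case-fatality ratio. This is a back-of-envelope derivation from the two numbers in the text, not a figure reported by the source. Because the death count is stated as a lower bound ("more than 400,000"), the resulting ratio is also a lower bound:

```latex
\mathrm{CFR} \gtrsim \frac{4.0 \times 10^{5}\ \text{deaths}}{1.13 \times 10^{8}\ \text{episodes}}
\approx 3.5 \times 10^{-3} \approx 0.35\%
```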
1.2 Historical Perspective of Shigellosis
Bacillary dysentery was first differentiated from amoebic dysentery in 1887 and an
etiologic agent, Bacillus dysenteriae, was isolated and described by Shiga in 1898. The
subsequent painstaking process of epidemiological, physiological, and serological
characterization of related dysentery bacilli culminated with the recommendations of
the 1950 Congress of the International Association of Microbiologists Shigella
Commission that Shigella be adopted as the generic name and that species subgroups be
designated A (Shigella dysenteriae), B (S. flexneri), C (S. boydii), and D (S. sonnei)
(Gerber et al., 1960). However, more recent studies argue that Shigella
emerged from Escherichia coli on multiple independent occasions and thus
may not constitute a separate genus (Pupo et al., 2000). The next milestone was the
characterization of the basic virulence mechanism of shigellosis. By the late 1950s, it
had been shown that shigellae can infect the corneal epithelium of guinea pigs (this is
the basis of the Sereny test), and it was also known that virulent organisms can
grow intracellularly in cultured mammalian cells (Gerber et al., 1960). Nonetheless, it
was the prevailing view as late as 1960 that shigellae cause disease by elaborating
endotoxin while adhering to the surface of the intestinal epithelium (Watkins, 1960). In
1964, however, it was conclusively demonstrated that S. flexneri causes disease by
penetrating the intestinal mucosa (LaBrec et al., 1964; Voino-Yasenetsky and Khavkin,
1964). During the late 1960s and early 1970s the pathogenic mechanism of shigellosis
was studied further, and the genetic basis of virulence was analyzed by constructing
intergeneric hybrids. A puzzling result of this later work was the finding that essentially the
entire chromosome of S. flexneri could be transferred to E. coli without reconstituting the
virulence phenotype of the donor. This enigma was resolved by the seminal work of P.
J. Sansonetti and colleagues in the early 1980s showing that virulence in Shigella
species is dependent upon a family of large plasmids (Watkins, 1960; Sansonetti,
1982). Litwin et al. (1991) found a correlation between serotype and plasmid pattern,
suggesting that plasmid profiles can help identify epidemic clones of S. flexneri as they
are introduced into a population. Genetic variability between serotypes complicates
vaccine development, because immunity to Shigella is serotype-specific and protection
will depend on the serotypes included in the vaccine (Noriega et al., 1996). Studies of
shigellosis in Bangladesh have shown S. flexneri serotype 1 to be the second most
prevalent group after type 2. The prevalence of serotype 1c increased from 0 to 8.2%
between 1978 and 2000, whereas that of 1a decreased from 13.1% to 0.4%, a shift that
underscores the need for continuous monitoring of S. flexneri serotype distributions
(Talukder et al., 2003). Studies of antibiotic resistance patterns in these serotypes
identify the emergence of antibiotic resistance as a major public health problem in
developing countries, and show that multidrug resistance can be conferred by plasmid
transfer (Hossain et al., 1998).
1.3 Microbiology of Shigella
Shigellae are Gram-negative, non-sporulating, facultatively anaerobic bacilli of the family
Enterobacteriaceae. They are the causative agents of shigellosis, or bacillary dysentery
(Sansonetti, 2001), an invasive disease of the human colonic epithelium marked by an
intense inflammatory reaction and subsequent mucosal destruction (Dramsi and
Cossart, 1998). Shigella spp. can be subdivided into four serogroups: S. sonnei,
S. boydii, S. flexneri and S. dysenteriae
(http://www.who.int/infectious-disease-report accessed on 20th April 2004). Each
serogroup contains multiple serotypes based on the structure of the O-antigen component
of the lipopolysaccharide present in the outer membrane of the cell wall (Simmons and
Romanowska, 1987). The four species of Shigella are so closely related to Escherichia
coli that all of these bacteria could be considered members of a single species. They
share greater than 90% homology by DNA–DNA re-association analysis (Brenner et
al., 1969). Infection is transmitted via the fecal-oral route and is characterized by
excretion of stools containing white cells and blood (DuPont et al., 1989). Shigellae are
pathogenic primarily because of their ability to invade intestinal epithelial cells. Their
virulence factors include a smooth lipopolysaccharide cell wall antigen, which is
responsible for the invasive features, and a toxin (Shiga toxin), which is both cytotoxic
and neurotoxic and causes watery diarrhea (Dipika et al., 2004).
1.4 Shigella Serogroups and Serotypes
Shigellosis is caused by Shigella spp., which can be subdivided into four serogroups:
S. sonnei, S. boydii, S. flexneri and S. dysenteriae (Brenner et al., 1969). Shigella strains
have been further divided into 38 serotypes based on ‘O’ antigen variation: 13 in S.
dysenteriae, 18 in S. boydii, 6 in S. flexneri and 1 in S. sonnei. Shigella strains have
also been clustered into three groups: cluster 1 contains the majority of S. dysenteriae and
S. boydii serotypes, cluster 2 contains smaller groups of S. boydii serotypes together with
S. dysenteriae type 2, and cluster 3 contains all the S. flexneri serotypes except serotypes
6 and 6A (Lan et al., 2001). The S. flexneri serotypes nevertheless show some degree of
antigenic relatedness, attributable to a common repeating tetrasaccharide unit to which
α-D-glucopyranosyl and O-acetyl groups are added, providing the basis for their type and
group antigenic factors. The variable antigenicity of the lipopolysaccharide is mainly due
to the chemical and structural diversity of the O-polysaccharides: the addition of glucosyl
and/or O-acetyl groups to the common tetrasaccharide O-repeat unit results in different
group- and type-specific antigens, and the O-polysaccharide is encoded by the chromosomal
rfb gene cluster (Yao and Valvano, 1994).
Uncommon serotypes and subserotypes of Shigella spp., particularly of S. flexneri, are
now frequently isolated, and it is not always possible to type such isolates with the
present classification system (Talukder et al., 2001). Despite the antigenic variability in
the structure of the ‘O’ antigen, type and group specificity is retained because the type
and group antigenic factors display some degree of antigenic relatedness. However,
reports indicate a multitude of epitopes in Shigella flexneri that are not covered by
agglutination reactions with commercial antisera (Edwards and Ewing, 1972).
1.5 Clinical Features
The clinical manifestations of Shigella infection vary from short-lasting watery diarrhea
to acute inflammatory bowel disease characterized by fever, intestinal cramps and
bloody diarrhea with mucopurulent feces (Sansonetti, 2001). Neurologic symptoms
such as lethargy, confusion, severe headache and convulsions are the most common
extraintestinal manifestations of shigellosis (Ashkenazi et al., 1990). Some infections are
asymptomatic, while others produce mild to moderate dysentery or even fulminant
dysentery with fever, severe abdominal cramps and rectal pain. Children may have high
fever (104 °F, 40 °C) with convulsions and rectal prolapse, and may later develop
malnutrition. Shigella sonnei usually produces mild dysentery, whereas S. flexneri and,
particularly, S. dysenteriae type 1 typically produce severe dysentery (Dipika et al., 2004).
1.6 Epidemiology of Shigellosis
1.6.1 Reservoirs and modes of transmission
Humans are the only natural hosts for Shigella. The predominant mode of transmission
is by faecal-oral contact, and the low infectious inoculum (as few as 10 organisms)
renders shigellae highly contagious (DuPont et al., 1989). In developing countries,
shigellosis is most common in children less than 5 years old (Black et al., 1978).
Persons symptomatic with diarrhoea are primarily responsible for transmission (Centers
for Disease Control, 1986). Less commonly, transmission is related to contaminated
food and water or fomites; however, the organism generally survives poorly in the
environment. In certain settings where disposal of human faeces is inadequate,
houseflies can serve as a mechanical vector for transmission (Levine et al., 1991).
Overcrowded conditions and water supplies that are inadequately protected from
sewage contamination contribute to the high incidence of infection. In developed
countries, common-source outbreaks, usually involving S. sonnei, occur sporadically,
and the source of such outbreaks is often uncooked food such as a salad that contains
carbohydrates or proteins (Black et al., 1961). Homosexual men are also at risk for
direct transmission of Shigella infections, and recurrent shigellosis complicating human
immunodeficiency virus infection can occur (Blaser et al., 1989). Direct fecal-oral
contamination can contribute to endemic shigellosis in institutional settings such
as mental hospitals, day care centers, nursing homes and prisons, and at outdoor gatherings.
For example, an outbreak of S. sonnei among 12,700 attendees at an outdoor
conference was characterized by an attack rate of greater than 50% (Wharton et al.,
1990).
1.6.2 Distribution of serogroups and serotypes
The predominant serogroup of Shigella circulating in a community appears to be
related to the level of socioeconomic development (Kotloff et al., 1999). Three
predominant strains are responsible for the majority of shigellosis cases, viz. S. sonnei,
S. flexneri 2a and S. dysenteriae type 1. S. dysenteriae type 1, which produces severe
disease, may cause life-threatening complications, is usually multidrug-resistant and
can cause large epidemics and even pandemics with high morbidity and mortality
(Brenner et al., 1969). S. flexneri is the main serogroup found in developing countries
(median 60% of isolates), with S. sonnei being the next most common (median 15%). S.
dysenteriae and S. boydii occur with equal frequency (median 6%). In contrast, data
from Spain, Israel and the United States consistently demonstrate that S. sonnei is the
most common serogroup found in industrialized countries (median 77%), followed by
S. flexneri (median 16%), S. boydii (median 2%) and finally S. dysenteriae (median
1%) (Kotloff et al., 1999). Industrialized countries also report outbreaks of Shigella
infections among high-risk populations such as children attending day care
(Mohle-Boetani et al., 1995; Pickering et al., 1984) and persons with human
immunodeficiency virus/acquired immunodeficiency syndrome (Baer et al., 1999;
van Oosterhout et al., 1994). According to current estimates, over two-thirds of all episodes
of shigellosis and four-fifths of all deaths from shigellosis occur in children under five years old.
Among children, the risk of death from shigellosis is greatest in infants and those who
are severely malnourished (Khan et al., 1985; Bennish et al., 1990). S. dysenteriae 1,
the agent of epidemic shigellosis, is responsible for extensive outbreaks in Central
Africa, Southeast Asia, and the Indian subcontinent. S. dysenteriae 1 is also isolated
from up to 30% of dysentery patients in endemic areas (Bennish et al., 1990).
The provisional S. flexneri serotype 1c, first identified in Bangladesh, was later
found in rural Egypt (Gendy et al., 1999). Similarly, serotype 4c was isolated in
Russia (Pryamukhina and Khomenko, 1988). In Taiwan, S. flexneri and S. sonnei
have been reported to be the major causative agents of shigellosis, whereas
S. dysenteriae and S. boydii are seen only in cases of imported disease (Pan, 1997).
Cases of shigellosis caused by S. sonnei have also been reported in the United Kingdom
and Ireland (Delappe et al., 2003). S. boydii strains are less frequently isolated, and in
developed countries they are considered to be imported (Rowe et al., 1974); however,
S. boydii has been found to be indigenous to some southern European countries. In 1990,
the isolation rate of S. boydii serotype 2 in Bulgaria increased sharply, and the strains,
which originated from different geographic locations, were reported as sporadic (Prats et al.,
1985). S. sonnei is typically associated with mild, self-limiting infection; however, it has
become the most prevalent serogroup in the developed world. Shigellosis has been reported
to be the third leading bacterial gastrointestinal disease in the United States (Cimmons, 2000).
1.6.3 Shigella and HIV infection
The intersection of Shigella infections and the human immunodeficiency virus (HIV)
epidemic has had serious consequences. Both chronic diarrhea and dysentery are
common among persons infected with HIV (Colebunders et al., 1987; van Oosterhout
and van der Hoek, 1994). Although it is not known whether concomitant HIV infection
increases the risk of acquiring shigellosis (Angulo et al., 1995), HIV-associated
immunodeficiency appears to lead to more severe clinical manifestations of
Shigella infection. Patients with HIV infection may develop persistent or recurrent
intestinal Shigella infections, even in the presence of adequate antimicrobial therapy.
They also face an increased risk of Shigella bacteraemia, which can be recurrent, severe
or even fatal (Dougherty et al., 1996; Batchelor et al., 1996).
1.6.4 Shigella epidemics and pandemics
During 1967-70, epidemic bacillary dysentery was first reported in Central American countries
(Mendizabal-Morris et al., 1969). Since then, spread of this infection has been reported
from many Asian countries such as Bangladesh (1972-78, 2003), Sri Lanka (1976), the
Maldives (1982), Nepal (1984-85), Bhutan (1984-85) and Myanmar (1984-85) (Pal et
al., 1989; Naheed et al., 2004). In India, epidemics were mainly encountered in
southern India (Vellore, 1972-73 and 1997-2001; Mathan et al., 1984; Jesudason et al.,
1997), eastern India (1984; Pal et al., 1984; Datta et al., 1987) and the Andaman and
Nicobar Islands (1986; Sen et al., 1987; Bhattacharya et al., 1988). Recent outbreaks
(2002-03) of multidrug-resistant S. dysenteriae type 1 have been reported from Siliguri,
Diamond Harbour, Kolkata and Aizawl in India, and from Bangladesh (Sarkar et al. and
Bhattacharya, 2003). When pandemic S. dysenteriae type 1 strains invade these
vulnerable populations, the attack rates are high and dysentery often becomes a leading
cause of death (Ries et al., 1994). The pandemic that began in Central Africa in 1979
progressed to East Africa and has since become particularly problematic among refugee
populations (Centers for Disease Control and Prevention, 1994).
1.7 Drug Resistance
Over the last 50 years, Shigella has demonstrated extraordinary prowess in acquiring
plasmid-encoded resistance to the antimicrobial drugs that previously constituted first-line
therapy. Sulfonamides, tetracycline, ampicillin and
trimethoprim-sulfamethoxazole initially appeared to be highly efficacious drugs, only to
become ineffective in the face of emerging resistance (Sack et al., 1997). By the 1990s,
few reliable options existed to treat multiresistant Shigella infections, particularly in
developing countries where cost and practicality are paramount considerations
(Gangarosa et al., 1970; Ries et al., 1994). Treatment of shigellosis has been
confounded by widespread resistance to commonly used antibiotics such as
ampicillin, co-trimoxazole, tetracycline and nalidixic acid, and more recently to norfloxacin
and ciprofloxacin. Resistance can also spread through clonal expansion of
particular strains, as observed in S. dysenteriae type 1 (Dutta et al., 2003). Plasmid
fingerprinting of 25 strains of S. boydii serotype 2 from Bulgaria showed high
genetic relatedness and the presence of antibiotic resistance genes (Bratoeva et al., 1992).
Conjugative plasmids encoding resistance to multiple antibiotics have been detected in
S. sonnei (Barg et al., 1995).
1.8 Shigella Pathogenicity
Shigella invasion is a multistep process consisting of entry into
epithelial cells by induced phagocytosis, escape from the phagocytic vacuole,
multiplication and spread within the epithelial cell cytoplasm, passage into adjacent
epithelial cells by finger-like protrusions from the cell surface, and killing of host cells
(Vasselon et al., 1992). The three essential steps for Shigella virulence are invasion of
epithelial cells, intracellular multiplication, and spread of the invading bacteria into
adjacent cells (Parsot, 1994).
1.8.1 Entry
Shigella directs its own uptake into the colonic mucosa through membrane ruffling and
macropinocytosis (Adam et al., 1995). Shigella strains are unusual among enteric
bacteria in their ability to gain access to the epithelial cell cytosol, where they replicate
and spread directly into adjacent cells (Sansonetti et al., 1982). Preliminary
experiments indicated that shigellae entered mammalian cells through endocytosis. No
leakage of macromolecules from recipient cells could be observed during entry (Hale
and Formal, 1980). Cytochalasin D, which inhibits microfilament functions, blocked
the entry process (Hale et al., 1979). Use of an antimyosin monoclonal antibody and of
7-nitrobenz-2-oxa-1,3-diazole phallacidin, a fluorescent dye that stains polymerized
F-actin, demonstrated accumulation of myosin and F-actin, two major components of the
cell cytoskeleton, underneath the cytoplasmic membrane at the site of bacterial entry
(Baudry et al., 1987). The DNA sequence of the virulence plasmid indicated that the genes
necessary for entry of bacteria into epithelial cells are clustered within a 31-kb region of
the plasmid (Philpott et al., 2000).
1.8.2 Intracellular multiplication
When the pathogen reaches the colon, it crosses the epithelial barrier through M cells.
In the underlying tissue, Shigella infects resident macrophages and induces their death. The
infected macrophages release large amounts of interleukin-1β, which leads to a strong
inflammatory response (Zychlinsky et al., 1994). Bacteria released from the dying
macrophages then enter enterocytes via the basolateral surface by inducing membrane
ruffling and macropinocytosis. Each bacterium is initially enclosed in a phagocytic vacuole,
but it disrupts the vacuolar membrane and escapes into the cytoplasm, where it multiplies
and moves by inducing actin polymerization at one pole of the bacterium, allowing intracellular
spread within the cytoplasm and into adjacent epithelial cells (Makino et al., 1986). A
sequential electron-microscopic study of infected HeLa cells demonstrated that invasive
S. flexneri induced lysis of the phagocytic membrane shortly after penetration into cells.
By 30 min after centrifugation-induced penetration, all bacteria were lying free within
the cytoplasm of host cells (Sansonetti et al., 1986). Similar invasiveness was observed
with E. coli K-12 carrying pWR100. On the other hand, Salmonella typhimurium, whose
lysis of the phagocytic membrane is late and inefficient, grows poorly intracellularly. A
plasmid-mediated contact-hemolytic activity demonstrated in virulent shigellae
provides a likely mechanism for lysis of the phagosome (Sansonetti et al., 1986). By
contrast, no correlation has been observed between rapid intracellular growth of shigellae
and the level of Shiga toxin or Shiga-like toxin (SLT) production (Sansonetti et al., 1986).
The iron chelator aerobactin might be expected to be critical for bacterial replication within
cells, in which iron is immobilized by ferritin; however, independent studies have shown that
mutants of S. flexneri that no longer produce aerobactin show no significant alteration in
their capacity to multiply intracellularly (Lawlor et al., 1987; Nassif et al., 1963).
1.8.3 Early killing of host cells
Metabolic events that mediate early killing have been demonstrated to include a rapid
drop in the intracellular concentration of ATP, an increase in pyruvate concentration,
and arrest of lactate production (Sansonetti and Mounier, 1987). Plasmid genes are also
involved in early killing of host cells. In a study of the intracellular fate of both an
invasive strain and a noninvasive, plasmidless derivative of S. flexneri, plasmid
pWR100 appeared to mediate killing of host cells (the continuous macrophage cell line
J774) within 4 h. The bacteria had to be intracellular to express this activity, since
macrophages were protected by cytochalasin D, which blocks bacterial entry. Although both strains produced
equivalent levels of SLT and inhibited protein synthesis of macrophages within 2 hr,
only invasive bacteria were able to kill host cells. Damage to macrophages correlated
with the ability of invasive bacteria to rapidly and efficiently lyse the membrane of the
phagocytic vacuole (Clerc et al., 1987).
1.8.4 Continuous reinfection of adjacent cells
In order to spread from one cell to another, the bacterium forms a finger-like
protrusion from the surface of the infected cell; around the site of exit, a major
rearrangement of the cytoskeleton occurs, with the formation of many tiny villus-like
projections. The protrusion elongates and penetrates the surface membrane of the adjacent
cell. The bacterium lyses these membranes and is released into the cytoplasm of the adjacent cell
(Prevost et al., 1992). A region (virG) of the S. flexneri virulence plasmid is considered
to be necessary for continuous reinfection of adjacent cells (Makino et al., 1986). virG
mutants can invade cells and multiply intracellularly but do not spread to adjacent cells.
Within epithelia, bacteria tend to localize within the cytoplasm and convert to a
spherical morphology before being eliminated (Anthony et al., 1988). After engulfment,
the bacteria are enclosed in a membrane-bound host vacuole, but they rapidly lyse
the vacuole and are released into the cytosol. There they divide, grow and
become coated with filamentous actin, which ultimately forms an actin tail at one pole
of the bacterium. The tail propels the bacterium through the cytoplasm until it
reaches the plasma membrane, where it forms a long protrusion into the
neighbouring cell, which internalizes the microbe. This process allows Shigella to move
from cell to cell without coming into contact with the extracellular milieu (Sansonetti et
al., 1999).
1.8.5 Shiga and Shiga-like toxins
Since the beginning of the twentieth century it has been known that Shigella
dysenteriae type 1 produces a potent protein, Shiga toxin (Conradi, 1903; O'Brien and
Holmes, 1989). Its activity as a neurotoxin, cytotoxin, and enterotoxin has been well
described (Keusch, 1988), and cytotoxins with similar biological properties have been
identified from a variety of bacteria, including Escherichia coli and Vibrio species
(O'Brien et al., 1984). The toxin is composed of two types of subunit. The A subunit
(32 kDa) possesses the biological activity and is combined with five molecules of the
B subunit (7.7 kDa), which are responsible for binding to cell-surface receptors
(Donohue-Rolfe et al., 1984; Seidah, 1986). The toxin binds to Galα1-4Galβ (galabiose)
glycolipid receptors (Lindberg et al., 1987) and inhibits mammalian protein synthesis by
cleaving the N-glycosidic bond at adenine 4324 in 28S rRNA, a toxic mechanism
identical to that of the plant toxin ricin (Endo et al., 1988; Jackson, 1990). Some strains
of S. flexneri and S. sonnei produce low levels of Shiga-like toxin, which is
neutralizable by anti-Shiga toxin sera (Keusch and Jacewicz, 1977; O’Brien et al.,
1977).
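Taking the subunit masses quoted above at face value, the one-A-plus-five-B stoichiometry implies an approximate mass for the assembled holotoxin. This is a simple illustrative sum, not a measured value from the cited work:

```latex
M_{\mathrm{holotoxin}} \approx M_A + 5\,M_B
= 32\ \mathrm{kDa} + 5 \times 7.7\ \mathrm{kDa}
= 32 + 38.5 \approx 70.5\ \mathrm{kDa}
```

On these figures the receptor-binding B pentamer (about 38.5 kDa) accounts for slightly more than half of the total mass.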
1.8.6 Genetic bases of Shigella pathogenicity
The genetic bases for several aspects of the pathogenic process and intracellular
lifestyle of Shigella, including the mechanisms of species specificity, tissue tropism,
and restriction of the immune response, are still poorly understood and probably
involve chromosomally encoded proteins. In common with other enteric bacteria,
Shigella survives the proteases and acids of the intestinal tract by uncertain means.
Highly tissue-specific disease results from a very low infectious dose (10 to 100
bacteria), even though the organism lacks flagellum-based motility (LaBrec et al., 1964).
Plasmid- or bacteriophage-mediated horizontal transfer of genes may lead to the emergence
of virulent Shigella strains from closely related avirulent precursors (Faruque et al.,
2002). Virulence is often multifactorial and coordinately regulated, and virulence genes
tend to be clustered in the genome (Hacker et al., 1997). The genetic determinants of
virulence are located mainly on the virulence plasmids found in shigellae, although
some genes are located on the chromosome. Most of the work on the molecular pathogenesis
of Shigella has been carried out in S. flexneri serotypes 2a and 5a (Maurelli et al.,
1998). The ability of Shigella spp. to invade epithelial cells and cause enteric disease is
dependent on the presence of a family of large, low-copy-number plasmids called pINV
(Makino et al., 1988). Release of bacterial factors upon contact with host cells is a
property of the type III secretion systems found in a growing number of bacterial
pathogens; these systems comprise approximately 20 genes. IcsA, or VirG, a 120-kDa
outer membrane protein, hydrolyzes ATP and is localized to one pole of the
bacterium at the junction between the microbe and the actin tail. The surface-exposed
VirG α-domain recruits vinculin and N-WASP (neural Wiskott-Aldrich syndrome protein)
through its glycine-rich repeats. Vinculin then interacts with actin
filaments and VASP (vasodilator-stimulated phosphoprotein), which contributes to
actin polymerization. IcsA is proteolytically cleaved by the bacterial protease SopA (IcsP),
which is required for the polarized distribution of IcsA on the bacterial surface and for
proper actin-based motility of Shigella in infected cells (Egile et al., 1997).
1.9 Assays to Distinguish Steps in Shigella Virulence
Several assay systems are employed to distinguish the different steps in Shigella spp.
virulence. These include the Sereny test, which measures infection and
destruction of mucosal surfaces resulting in keratoconjunctivitis in guinea pigs (Sereny,
1955; Oaks et al., 1985), and the HeLa cell invasion assay (Hale and Formal, 1981).
The use of these various assay systems and the application of classical genetic
techniques have demonstrated that Shigella spp. virulence is a multigenic phenomenon.
Genes encoded on the 220-kilobase (kb) invasion plasmid (Sansonetti et al., 1981) and
several unlinked chromosomal loci (Sansonetti et al., 1983) are essential for the
expression of a complete virulence phenotype. In addition, expression of Shigella spp.
virulence is regulated by growth temperature: Shigella strains that are phenotypically
virulent when cultured at 37°C become phenotypically avirulent when cultured at 30°C
(Maurelli et al., 1984).
1.10 Shigella Virulence Genes
Virulence genes may be encoded on plasmids or on the chromosome; because virulence is
multifactorial, these genes tend to be clustered in the genome (Hacker et
al., 1997). Genes required for entry of bacteria into epithelial cells and the induction of
apoptosis in infected macrophages are clustered on a 30-kb region (designated the entry
region) of the virulence plasmid (VP). This region encodes components of a type III secretion (TTS)
apparatus (the Mxi-Spa TTS apparatus), substrates of this secretion apparatus (the
translocators IpaB and IpaC and the effectors IpaD, IpgB1, IpgD and IcsB) and their
dedicated chaperones (IpgA, IpgC, IpgE and Spa15) and two transcriptional activators
(VirB and MxiE) (Hale et al., 1983; Sansonetti et al., 1983). The capacity of Shigella to
enter cells is governed by proteins encoded by a subset of genes within three
contiguous operons (ipa, mxi, and spa) in a 30-kb region of the 230-kb pWR100
virulence plasmid (Parsot, 1994). The Ipa proteins (invasins) are essential for the
invasion of epithelial cells, and their secretion is mediated by the proteins encoded at
the mxi and spa loci, whose products constitute a type III secretion apparatus (TTSS, or
secreton) (Blocker et al., 1999; Ménard et al., 1994).
The entry region is a pathogenicity island-like cluster (see below) that contains: a) the
mxi and spa genes encoding components of a type III secretion apparatus; b) the ipaA,
B, C and D and ipgD genes encoding proteins secreted by this machinery; c) the ipgC
and ipgE genes encoding cytoplasmic chaperones required for stability of IpaB and
IpaC, and IpgD, respectively; d) the virB gene encoding a protein required for
transcription of the mxi, spa and ipa genes; and e) additional genes of unknown
function. Outside of the entry region, other genes associated with virulence have been
identified. They include: a) the icsA (virG) gene encoding an outer membrane protein
that is directly responsible for the ability of the bacteria to move within the cytoplasm
of infected cells; b) the virF gene encoding a transcriptional activator that controls
expression of icsA and virB; and c) the sepA gene, which encodes a secreted serine
protease of the autotransporter family. In addition, the virulence plasmid contains two
copies of the shet2 gene encoding a putative enterotoxin, and genes encoding several
secreted proteins, which include virA, ipaH4.5, ipaH7.8, ipaH9.8 and six
uncharacterized genes designated osp (outer Shigella proteins): ospB, ospC1, ospD1,
ospE1, ospF and ospG. The proteins encoded on this plasmid are directly involved in
entry into epithelial cells and in the invasive phenotype of Shigella strains
(Alfredo, 2004).
1.11 Invasion Plasmid Antigen Proteins
A set of bacterial gene products, called invasion plasmid antigen (Ipa) proteins, is
secreted by the type III pathway of Shigella and triggers a eukaryotic membrane
ruffling process responsible for mediating entry (Ménard et al., 1996). Genetic and
biochemical analyses implicate four Ipa invasins (IpaA through IpaD) and a type III
secretion system consisting of up to twenty Mxi-Spa proteins (Hueck, 1998;
Ménard et al., 1996). The ipa and mxi-spa loci are located within closely linked operons
found on the 230-kb virulence-associated plasmid of Shigella (Parsot and Sansonetti,
1996). The major function of TTSSs is to transport proteins from the bacterial
cytoplasm into the host cell plasma membrane or cytoplasm upon contact with host
cells (Bleves and Cornelis, 2000; Cornelis and Denecker, 2001). In Shigella flexneri,
the mxi, the spa and the ipa operons are expressed at 37°C, but Ipa proteins remain in
the bacterial cytoplasm until the secretion machinery is activated by host cell contact or
by external, presumably surrogate, signals such as serum or a small amphipathic Congo
red (CR) dye molecule (Bahrani et al., 1997; Ménard et al., 1994; Parsot et al., 1995).
Physical contact between the bacterium and the host cell induces insertion of two Ipas
(IpaB and IpaC) into the host membrane to form a 25-Å pore that might be used to
translocate the other invasins into target cells (Blocker et al., 1999). The Ipas then
catalyze the formation of a localized actin-rich, macropinocytic-like ruffle on the host
cell surface, which internalizes the bacterium (Bourdet-Sicard et al., 1999; Tran Van
Nhieu et al., 1999). The current model of the TTS pathway proposes that, upon contact
of bacteria with host cells, translocators insert into the membrane of the host cell to
form a pore through which effectors transit to reach the cell cytoplasm (Hueck, 1998).
Other substrates of the TTS apparatus are encoded by genes scattered throughout
the virulence plasmid (VP), such as virA, ospB, C, D, E, F and G, and the ipaH genes (Buchrieser et al., 2000).
Several of these putative effectors are encoded by multigene families, with five
ipaH, four ospC, three ospD and two ospE genes carried by the VP. Genes encoding
components of the TTS apparatus and its substrates exhibit a similarly low G+C content
(approx. 34 mol%), suggesting that the entire TTS system was acquired once by lateral transfer.
In addition, the VP encodes at least five other proteins that are involved in virulence,
including IcsA, IcsP, VirK, MsbB2 and SepA. IcsA (VirG) is an outer-membrane protein
directly involved in promoting actin polymerization at one pole of intracellular bacteria
(Buchrieser et al., 2000; Goldberg and Theriot, 1995). IcsP (SopA) is an outer-
membrane protease involved in the release of a certain proportion of surface-exposed
IcsA (Egile et al., 1997; Shere et al., 1997).
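The low G+C content (approx. 34 mol%) cited above is a simple base-composition percentage. A minimal Python sketch of the calculation follows; it assumes a plain A/C/G/T string, and the function name `gc_content` is illustrative rather than taken from any cited tool.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence in mol% (percent of G plus C
    among unambiguous A/C/G/T bases). Ambiguity codes such as N are skipped."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    # Toy 10-bp sequence (not a real Shigella gene): 4 of 10 bases are G or C.
    print(f"G+C = {gc_content('ATGCATGCAT'):.1f} mol%")
```

Applied gene by gene and compared with the value for the whole replicon, a markedly lower figure is the kind of signal that the text interprets as evidence of lateral acquisition.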
1.11.1 VirF (Positive regulator of the plasmid virulence regulon)
The virF locus was first identified by a spontaneous deletion in SalI fragment F that
resulted in the simultaneous loss of four virulence-associated phenotypes: Pcr (Congo
red binding), Inv (invasion of tissue culture cells in vitro), Ser (Sereny test), and Igr
(inhibition of growth) (Sasakawa et al., 1986). The functional gene product of virF is a
30-kDa protein, but a 24-amino-acid signal peptide-like sequence may be cleaved
during passage through the inner membrane, yielding a 27-kDa protein that has been
detected in minicells (Sakai et al., 1986). The virF gene product plays a central role in
positive regulation of the plasmid virulence regulon. It directly activates transcription
of the virG gene (Sakai et al., 1988) and it indirectly activates ipaABCD (Sakai et al.,
1988; Watanabe, 1988) and invAKJHF (Watanabe, 1988).
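The distinction between direct activation (virG) and indirect activation (ipaABCD and invAKJHF, via VirB) can be made concrete by treating the regulon as a directed graph and measuring the path length from virF. The sketch below is a deliberately simplified, hypothetical model: its only edges are the activations named in this section and in Table 1.2, and the names `ACTIVATES` and `targets` are illustrative.

```python
from collections import deque

# Hypothetical, simplified activation edges taken from the text:
# VirF activates virG (icsA) and virB; VirB activates ipaABCD and invAKJHF.
ACTIVATES = {
    "virF": ["virG", "virB"],
    "virB": ["ipaABCD", "invAKJHF"],
}


def targets(regulator: str) -> dict[str, int]:
    """Breadth-first search from a regulator. Returns each downstream locus
    with its distance: 1 = direct activation, >1 = indirect via intermediates."""
    dist = {regulator: 0}
    queue = deque([regulator])
    while queue:
        node = queue.popleft()
        for nxt in ACTIVATES.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[regulator]  # the regulator is not its own target
    return dist


if __name__ == "__main__":
    for locus, steps in sorted(targets("virF").items()):
        kind = "direct" if steps == 1 else "indirect"
        print(f"{locus}: {kind} ({steps} step(s))")
```

Breadth-first search is used so that each locus is reported with its shortest activation path; real regulation also involves repressors such as H-NS (Table 1.3), which this toy graph ignores.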
1.11.2 VirB (invE, ipaR)
A regulatory locus necessary for expression of the invasive phenotype is located within
SalI fragment B of the S. flexneri 2a virulence plasmid, and the cloned gene has been
designated virB (Adler et al., 1989). Analogous genes have been designated ipaR in
S. flexneri 5 (Buysse et al., 1990) and invE in S. sonnei (Watanabe et al., 1990). The
mobility of the protein product of virB in sodium dodecyl sulfate (SDS)-polyacrylamide
gels indicates a molecular mass of 33 kDa (Adler et al., 1989), whereas the IpaR
protein has been estimated at 34 kDa (Buysse, 1990) and the InvE protein at
35 kDa (Watanabe et al., 1990).
1.11.3 IpaABCD, ippI, and invGF
Immunoblots with serum from monkeys or humans infected with Shigella species
demonstrate a consistent serum immune response that recognizes five plasmid-encoded
proteins (Watanabe et al., 1990; Sasakawa et al., 1989; Kato et al., 1989). The largest
of these proteins is the product of the virG (icsA) locus, which is located outside the
invasion region. The other proteins are encoded by the ipa locus, which corresponds to
invasion region 2 in the S. flexneri 2a virulence plasmid (Sasakawa et al., 1988).
These proteins were originally designated a (78 kDa), b (62 kDa), c (43 kDa), and d
(38 kDa) in order of descending molecular mass (Hale et al., 1985), and the corresponding
genes have accordingly been named ipaABCD (Buysse et al., 1987).
1.11.4 InvAKJH (mxiAB)
Tn3-lac fusion inserts within the S. sonnei invasion plasmid have defined four
transcribed genes, designated invAKJH, that are necessary for expression of the
invasive phenotype (Watanabe, 1988; Watanabe et al., 1990). Restriction maps suggest
that these genes correspond to invasion regions 3 and 5 of the S. flexneri 2a plasmid
(Sasakawa et al., 1988) designated mxiA (mxi, membrane expression of Ipa)
(Hromockyj and Maurelli, 1989). Rabbit antiserum raised against the BS260 fusion
protein recognizes a 76-kDa protein in immunoblots of an S. flexneri 5 whole-cell
lysate (la). Published restriction maps suggest that invA should map within invasion
region 5 and invKJ should map within invasion region 3 on the S. flexneri 2a plasmid.
Restriction analysis also indicates that an S. sonnei gene designated invH should map at
the junction of invasion region 3 and region 2 in the S. flexneri 2a virulence plasmid
(Watanabe, 1988 and 1990; Sasakawa et al., 1988). Since results of precise mapping
and sequencing of invasion regions 3, 4, and 5 have yet to be published, the molecular
mass of invKJH gene products is unknown. However, invA insertion mutants are
complemented by a cloned fragment from the S. sonnei plasmid which expresses a 38-
kDa protein (Watanabe and Nakamura, 1986).
1.11.5 VirG (icsA) (Plasmid gene associated with intercellular bacterial spread)
The virG (icsA) gene product was originally identified as the fifth invasion plasmid
antigen of S. flexneri 5 (Oaks et al., 1986) and extrinsic radioiodination of whole cells
has shown that this protein is exposed on the bacterial surface (Lett et al., 1989). The
virG (icsA) gene product has been reported to have an apparent molecular mass in
SDS-polyacrylamide gels of 140 kDa (Oaks et al., 1986; Pal et al., 1989), 130 kDa (Lett et al.,
1989), or 120 kDa (Bernardini et al., 1989). The ORF of the cloned virG gene of S.
flexneri 2a suggests a protein of approximately 117 kDa (Lett et al., 1989). However,
minicell analysis of the product(s) of the cloned virG (icsA) gene reveals at least nine
nonvector polypeptides of 130 kDa or less (Bernardini et al., 1989; Lett et al., 1989)
and it has been suggested that these polypeptides are the products of internal initiation
codons (Lett et al., 1989).
1.11.6 IpaH (Multicopy invasion plasmid antigen gene)
The ipaH gene is unique in that five complete or partial copies are present on the
invasion plasmids of the various S. flexneri serotypes and multiple copies are also
found on the invasion plasmids of other Shigella species and EIEC. The copies of ipaH
have been characterized by Southern hybridization of HindIII digests of S. flexneri 5
plasmid DNA with a λgt11::ipaH probe. These genes have been designated ipaH9.8,
ipaH7.8, ipaH4.5, ipaH2.5, and ipaH.4 on the basis of the size of the hybridizing DNA
fragment (Hartman et al., 1990). ipaH7.8 and ipaH4.5 have been mapped to SalI fragment
B between virG and ipaR (within 10 kb of the latter gene). Northern blot analysis
indicates that both ipaH7.8 and ipaH4.5 are transcribed in vitro, whereas ipaH2.5 and
ipaH.4 are unexpressed, truncated sequences (Venkatesan et al., 1991).
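The ipaH copies are named after the size, in kilobases, of the HindIII fragment that hybridizes with the probe. For a sequenced plasmid, comparable fragment sizes can be predicted by an in silico digest. The sketch below assumes a linear sequence, complete digestion and the HindIII cut site A^AGCTT; the helper names are hypothetical, and because a real plasmid is circular its first and last fragments would in fact be joined.

```python
import re

HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence (palindromic)
CUT_OFFSET = 1           # HindIII cuts after the first base: A^AGCTT


def hindiii_fragments(seq: str) -> list[int]:
    """Fragment lengths (bp), in order, from a complete HindIII digest of a
    linear sequence. Because the site is palindromic, one strand suffices."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(HINDIII_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def kb_label(length_bp: int) -> str:
    """Format a fragment length in kb, the way the ipaH copies are labelled."""
    return f"{length_bp / 1000:.1f}"


if __name__ == "__main__":
    toy = "GG" + "AAGCTT" + "CCCC" + "AAGCTT" + "TT"  # toy sequence, not real ipaH DNA
    print(hindiii_fragments(toy))
    print("ipaH" + kb_label(9800))
```

Only fragments that actually carry ipaH sequence would correspond to the named copies; identifying them requires aligning the probe sequence, which this sketch does not attempt.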
1.11.7 VirK
VirK is required for production of IcsA by an unknown mechanism (Nakata et al.,
1992). MsbB2 is an acyl transferase that, in conjunction with the product of the
chromosomal msbB1 gene, acts to produce full acyl-oxy-acylation of the myristate at
the 3' position of the lipid A glucosamine disaccharide (d'Hauteville et al., 2002).
Table 1.1 Chromosomal loci associated with the virulence of Shigella
Locus | Function
T-locus | Integration site for incorporation of lysogenic phage encoding type-specific somatic antigen
kcpA | Positive regulation of virG (icsA)
virR | Repression of plasmid invasion loci ipaABCD in response to temperature
stx | Synthesis of Shiga toxin
rfb | Synthesis of group-specific somatic antigen
rfd | Synthesis of somatic antigen basal core
sodB | Superoxide dismutase to inactivate superoxide radicals produced by respiratory burst in phagocytes
ompR-envZ | Induction of plasmid invasion loci
iucABCD-iutA | Synthesis of aerobactin and 76-kDa aerobactin receptor protein
rfa | Synthesis of somatic antigen basal core
102a | Decreased intercellular spread in infected tissue culture monolayers
(Hale, 1991)
Table 1.2 Plasmid genes associated with the virulence of Shigella
Gene | Function
stb | Necessary for stable maintenance of the high-molecular-weight virulence plasmid
rep | Necessary for replication of the high-molecular-weight plasmid
virF | Positive regulation of virB and virG genes
virB (invE, ipaR) | Positive regulation of ipaABCD and invAKJHFG
virG (icsA) | Associated with intra- and intercellular bacterial spread
icsB | Protrusions at surface of infected cells
invA (mxiB), invK (mxiA), invJ, invH, invF | Necessary for invasion
ipaB | Induces apoptosis of macrophages
ipaC, ipaD, ipaA | Mediate endocytic uptake of bacteria
mxi, spa | Components of type III secretion system
ipgC | Molecular chaperone
(inv genes are found in S. sonnei)
Table 1.3 Gene products influencing expression of plasmid-linked virulence genes of Shigella flexneri
Gene product | Gene | Location | Description
CpxA/CpxR | cpxA, cpxR | Chr | Response regulator and sensor (two-component system)
DNA gyrase | gyrA, gyrB | Chr | DNA topoisomerase II; introduces negative supercoils
EnvZ/OmpR | envZ, ompR | Chr | Response regulator and sensor
H-NS | hns | Chr | Nucleoid-associated protein
IHF | ihfA, ihfB | Chr | DNA-binding protein; repressor of virB transcription, regulates virB and virF transcription
IspA | ispA | Chr | Possible role in cell division
Mia | mia | Chr | tRNA N6-isopentenyladenosine synthetase
MxiB | mxiB | VP | AraC-like protein
Rho | rho | Chr | Transcription termination factor; regulates transcription of the virB gene
StpA | stpA | Chr | Analogue of H-NS; can repress virulence gene expression when overproduced
TopoI | topA | Chr | DNA topoisomerase I; relaxes negative supercoils
TopoIV | parC, parE | Chr | DNA topoisomerase IV; relaxes negative supercoils
TyrT | trpT | Chr | Tyrosyl tRNA
VacB | vacB | Chr | Post-transcriptional regulation of ipa and icsA (virG) genes
VacC | vacC | Chr | tRNA guanine transglycosylase; post-transcriptional regulation of virF
VacJ | vacJ | Chr | Protein needed for intracellular spreading
VacM | vacM | Chr | Transcription of ipa genes
VirB | virB | VP | Vassal regulator; activates the main virulence structural gene operons
VirF | virF | VP | AraC-like transcription regulator; activates transcription of virB and icsA (virG) genes
VirK | virK | VP | Required for post-transcriptional control of the icsA (virG) gene
Chr: chromosomal; VP: virulence plasmid
1.12 Pathogenicity islands and “black holes”
Pathogenicity islands are regions on the genomes of certain pathogenic bacteria that
are absent in nonpathogenic strains of the same or closely related species and contain
contiguous blocks of virulence genes. Horizontal spread of virulence genes through the
acquisition of pathogenicity islands is an important element in the evolution of newly
emerging pathogens. A complementary adaptation towards pathogenicity is the deletion
of genes whose products are harmful to pathogenicity; such deletions have been termed
“black holes”. The 90% homology exhibited between Shigella and Escherichia coli is
suggestive of a high efficiency of gene transfer by conjugation or transduction (Brenner
et al., 1969). In general, pathogenicity islands (PAIs) are large and unstable genetic
elements, acquired by lateral gene transfer, whose G+C content differs from that of the
rest of the genome; they are often associated with tRNA genes and contribute to the
virulence of bacterial pathogens. The concept of PAIs was developed on the basis of data
on genome structure and pathogenicity of enteric organisms, especially pathogenic E. coli,
but it is now applied broadly to other gram-negative and gram-positive pathogens
(Alfredo, 2004).
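The G+C criterion in this definition can be turned into a simple screen: slide a window along a sequence and flag runs whose G+C content falls well below the replicon average. The sketch below illustrates that single criterion only (tRNA adjacency, mobility genes and absence from non-pathogenic relatives would also be needed to call a PAI); the window size, step and threshold are arbitrary illustrative parameters, not values from the cited work.

```python
def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start, G+C %) for each full window along the sequence.
    Any non-G/C character (including N) counts as non-G+C here."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        result.append((start, 100.0 * gc / window))
    return result


def low_gc_regions(seq: str, window: int, step: int,
                   genome_gc: float, drop: float) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent windows whose G+C is at least `drop`
    percentage points below `genome_gc` into (start, end) intervals."""
    regions: list[tuple[int, int]] = []
    for start, gc in gc_windows(seq, window, step):
        if gc <= genome_gc - drop:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Toy sequence: 50% G+C "backbone" flanking a 50-bp A/T-only "island".
    toy = "GCAT" * 25 + "AT" * 25 + "GCAT" * 25
    print(low_gc_regions(toy, window=50, step=25, genome_gc=50.0, drop=15.0))
```

On the toy input the flagged interval (75 to 175) is wider than the true island (100 to 150), because any window that overlaps the island far enough is flagged; boundaries from such a scan are therefore approximate.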
1.13 IS-elements
Bacterial insertion sequences were initially identified during studies of model genetic
systems by their capacity to generate mutations as a result of their translocation. Interest
in antibiotic resistance and transmissible plasmids subsequently revealed an important
role for these mobile elements in dissemination
of resistance genes and in promotion of gene acquisition. In particular, it was observed
that several different elements were often clustered in “islands” within plasmid genomes
and served to promote plasmid integration and excision (Bukhari et al., 1977).
The most striking feature of the Shigella genomes is their highly dynamic nature due to
the presence of hundreds of IS-elements in each of the genomes. IS-elements are
capable of causing many kinds of DNA rearrangements (Schneider et al., 2000) and the
presence of many rearrangements (deletions as well as translocations and inversions)
is likely the result of the copious numbers of IS-elements. The Sd197 genome shows
the most rearrangements and is considerably smaller than the MG1655 genome due to a
large number of deletions. The genome of this Shigella strain also possesses the greatest
number of IS-elements, mainly in the form of IS1N, which may be responsible for many
of these rearrangements (Yang et al., 2005). Compared with MG1655, Shigella
strains not only have many more copies of IS-elements but also have additional IS-
species, such as IS1N, IS600 and IS629. Within the Shigella genomes, IS1 is
predominant in the Sf301, Sb227 and Ss046 chromosomes whereas IS1N is copiously
present in the Sd197 chromosome. Intact IS21 and IS630 are present only in Ss046,
while the newly identified ISSbo6 is found mainly in the Sb227 chromosome. ISSbo6 is
similar to ISEc8 found adjacent to the locus of enterocyte effacement (LEE)
pathogenicity island in EHEC (Perna et al., 1998). Furthermore, most copies of the
ISSbo6 are located within SHI-1, SHI-2 and ipaH islands (see below) in the Sb227
genome. The virulence plasmids and chromosomes share most of the IS-species,
suggesting that inter- and intra-replicon translocation and replication have occurred,
leading to large numbers of IS-elements in the genomes (Yang et al., 2005).
The virulence plasmids also display a dynamic nature with many IS-mediated deletions,
translocations and inversions (Shepherd et al., 2000).
1.14 Virulence plasmids
Large plasmids were first detected in S. flexneri 2a (Kopecko, 1979),
and the essential role of plasmids in virulence was established shortly thereafter in both
S. sonnei (Sansonetti et al., 1981) and S. flexneri (Sansonetti et al., 1982). Subsequently
it was shown that virulence plasmids are also present in other serotypes of S. flexneri, S.
dysenteriae, S. boydii, and EIEC. Endonuclease digestion and Southern hybridization
indicate that the virulence plasmids of Shigella species and EIEC are essentially
homologous but restriction sites vary with the species and serotype (Hale et al., 1983;
Sansonetti et al., 1983). The virulence plasmids pWR100 in S. flexneri serotype 5,
pMYSH6000 in S. flexneri serotype 2a, and pSS120 in S. sonnei, together with those of
other Shigella bacteria, have been shown to carry determinants for invasiveness and the
ability to cause disease. These large plasmids are collectively termed pINV plasmids
(Hale, 1991) and are also present in EIEC strains. The cell invasion capacity of
Shigella-EIEC is determined by a cluster of 38 genes within a 32-kb segment of the
pINV plasmid, often referred to as the entry or invasion region, which includes genes
for invasins, molecular chaperones, motility, regulation, and a specialized type III
secretion apparatus (Parsot and Sansonetti, 1996). The pINV plasmids have been
classified into two relatively homogeneous sequence forms, pINV A and pINV B. pINV A
plasmids are found in S. flexneri F6 and F6A, S. boydii B1, B4, B9, B10, B14 and B15, and
S. dysenteriae D3, D4, D6, D8, D9, D10 and D13. pINV B plasmids are present in S. flexneri
F1A, F2A, F3A, F3C, F4A and FY, S. boydii B11 and B12, and S. sonnei. The clustering of
Shigella strains by plasmid type and form is consistent with their clustering based on
chromosomal gene sequences. Variations in plasmid sequences have been attributed to
horizontal gene transfer and IS elements (Lan et al., 2001).
1.15 Shigella Species
1.15.1 Shigella boydii: GenBank Taxonomy No.: 621
Description: This species is uncommon except in India, where it was first isolated. The
18 known serotypes are antigenically distinct, expressing a diverse range of toxins in
addition to a Shigella-specific toxin. Progression to clinical dysentery occurs in most
patients infected with this organism (NCBI Entrez Genome Project).
1.15.1.1 Variant(s): Shigella boydii BS512. GenBank Taxonomy No: 344609.
Parent: Shigella boydii.
Description: This strain (strain BS512; serotype 18) was originally isolated from a 12-
year-old boy in Arizona by Dr. Nancy Stockbine. It is a member of Group 1 as
determined by limited sequence analysis and can invade HeLa cells. Pathogenicity and
virulence have been verified during in vitro experimentation, and multiple plasmids are
present in this strain (NCBI Entrez Genome Project).
1.15.1.2 Shigella boydii Sb227:
GenBank Taxonomy No.: 300268. Parent: Shigella boydii.
Description: This strain is an isolate from an epidemic that took place in China in the
1950s (NCBI Entrez Genome Project).
1.15.2 Shigella dysenteriae:
GenBank Taxonomy No.: 622
Description: Since the late 1960s, pandemic waves of Shiga (S. dysenteriae type 1)
dysentery have appeared in Central America, south and south-east Asia and sub-Saharan
Africa, often affecting populations in areas of political upheaval and natural disaster
(Kotloff et al., 1999). Synonyms: Shigella shigae, Eberthella dysenteriae, Bacillus
shigae, Bacillus dysenteriae, Bacillus dysentericus (NCBI Taxonomy).
Variant(s):
1.15.2.1 Shigella dysenteriae 1012: GenBank Taxonomy No.: 358708. Parent:
Shigella dysenteriae.
Description: This strain is representative of the type 4 group of S. dysenteriae that is
becoming more prevalent in human infections. This shift is towards the type 2 and type
4 serotypes, which were not previously associated with outbreaks, and away from the
type 1 serotype, which was implicated in widespread epidemics in Asia, Central
America, and Africa. Pathogenicity has been confirmed in human challenge experiments
and strain 1012 has been shown to be one of the most virulent S. dysenteriae strains
identified by WRAIR/NMRC to date. This strain contains multiple plasmids thought to
be involved in virulence (NCBI Entrez Genome Project).
1.15.2.2 Shigella dysenteriae M131649:
GenBank Taxonomy No.: 216598. Parent: Shigella dysenteriae.
Description: This strain was isolated from a patient in 1970 in Guatemala (NCBI
Entrez Genome Project).
1.15.2.3 Shigella dysenteriae Sd197:
GenBank Taxonomy No.: 300267. Parent: Shigella dysenteriae.
Description: This strain is an isolate from an epidemic in China in the 1950s (NCBI
Entrez Genome Project).
1.15.3 Shigella flexneri: GenBank Taxonomy No.: 623
Description: S. flexneri is endemic in most developing countries and causes more
mortality than any other Shigella species. The predominant serotypes of S. flexneri in
developing countries are serotypes 1b, 2a, 3a, 4a and 6, whilst in industrialized countries
most isolates are 2a (Jennison and Verma, 2004). Synonym: Shigella paradysenteriae
(NCBI Taxonomy).
1.15.3.1 Variant(s): Shigella flexneri 2a: GenBank Taxonomy No.: 42897. Parent:
Shigella flexneri.
Description: In developing countries, the predominant serotype of S. flexneri is 2a,
followed by 1b, 3a, 4a, and 6. In industrialized countries, most isolates are S. flexneri 2a
or other unspecified type 2 strains (Kotloff et al., 1999). Synonym: Shigella flexneri
serotype 2a (NCBI Taxonomy).
Shigella flexneri 2a str. 2457T: GenBank Taxonomy No.: 198215. Parent: Shigella
flexneri 2a.
Description: This is a highly virulent strain that has been widely used for genetic
and clinical research. It is similar to pathogenic Escherichia coli except for its more
numerous insertion sequences, and it contains four plasmids, pINV-2457T, pSf2, pSf4
and pSf-R27, that are similar to pWR100, pWR501, pCP301, and R27, respectively
(NCBI Entrez Genome Project).
Shigella flexneri 2a str. 301: GenBank Taxonomy No.: 198214. Parent: Shigella
flexneri 2a.
Description: This strain was isolated in 1984 from a patient in Beijing, China. It is
similar to pathogenic Escherichia coli except for its more numerous insertion
sequences, and it contains a virulence plasmid (pCP301) (NCBI Entrez Genome Project).
1.15.3.2 Shigella flexneri 5: GenBank Taxonomy No.: 373383. Parent: Shigella
flexneri.
Description: This organism, along with Shigella sonnei, is the major cause of
shigellosis in industrialized countries and is responsible for endemic infections (NCBI
Genome Project).
Shigella flexneri 5 str. 8401: GenBank Taxonomy No.: 373383. Parent: Shigella
flexneri 5.
Description: This organism is a strain of serogroup 5 and will be used for comparative
analysis (NCBI Genome Project).
1.15.4 Shigella sonnei: GenBank Taxonomy No.: 624
Description: Synonym: Bacterium sonnei (NCBI Taxonomy). Shigella dysenteriae and
Shigella flexneri are the predominant species in the tropics, while S. sonnei is the
predominant species in industrialized countries (Alcoba-Florez et al., 2005).
Variant(s): Shigella sonnei 53G: GenBank Taxonomy No.: 216599. Parent: Shigella
sonnei.
Description: Isolated from a 5-year-old patient in Japan (NCBI Entrez Genome Project).
1.15.4.1 Shigella sonnei Ss046: GenBank Taxonomy No.: 300269. Parent: Shigella
sonnei.
Description: This strain is an isolate from an epidemic in China in the 1950s
(NCBI Entrez Genome Project).
1.16 Genome Summary
1.16.1 Genome of Shigella boydii
Chromosome (NCBI Entrez Genome): GenBank Accession Number: NC_007613.
Size: 4,519,823 bp (NCBI Entrez Genome). Gene Count: 4466 Genes. 4136 Proteins
(NCBI Entrez Genome).
Plasmid pSB4_227 (NCBI Entrez Genome): GenBank Accession Number:
NC_007608. Size: 126,697 bp (NCBI Entrez Genome). Gene Count: 149 Genes. 148
Proteins (NCBI Entrez Genome).
1.16.1.1 Genome of Shigella boydii BS512
Description: The Shigella boydii BS512 whole genome shotgun (WGS) project has the
project accession NZ_AAKA00000000. This version of the project (01) has the
accession number NZ_AAKA01000000, and consists of sequences
NZ_AAKA01000001-NZ_AAKA01000079 (NCBI Entrez Genome).
Chromosome (NCBI Entrez Genome): GenBank Accession Number:
NZ_AAKA00000000. Size: 4,900,244 bp (NCBI Entrez Genome). Gene Count: 4680 Genes.
4680 Proteins (NCBI Entrez Genome).
1.16.2 Genome of Shigella dysenteriae
1.16.2.1 Genome of Shigella dysenteriae Sd197
Chromosome (NCBI Entrez Genome): GenBank Accession Number: NC_007606.
Size: 4,369,232 bp (NCBI Entrez Genome). Gene Count: 4664 Genes. 4274 Proteins
(NCBI Entrez Genome).
Plasmid pSD1_197 (NCBI Entrez Genome): GenBank Accession Number:
NC_007607. Size: 182,726 bp (NCBI Entrez Genome). Gene Count: 224 Genes. 223
Proteins (NCBI Entrez Genome).
1.16.2.2 Genome of Shigella dysenteriae 1012
Chromosome (NCBI Entrez Genome): GenBank Accession Number:
NZ_AAMJ00000000. Size: 3,013,140 bp (NCBI Entrez Genome). Gene Count: 2782
Genes. 2782 Proteins (NCBI Entrez Genome).
1.16.3 Genome of Shigella flexneri
1.16.3.1 Genome of Shigella flexneri 2a
1.16.3.1.1 Genome of Shigella flexneri 2a str. 2457T
Description: The genome exhibits the backbone and island mosaic structure of E. coli
pathogens, albeit with much less horizontally transferred DNA and lacking 357 genes
present in E. coli. The strain is distinctive in its large complement of insertion
sequences, with several genomic rearrangements mediated by insertion sequences, 12
cryptic prophages, 372 pseudogenes, and 195 S. flexneri-specific genes (Wei et al.,
2003).
Chromosome (NCBI Entrez Genome): GenBank Accession Number: NC_004741.
Size: 4,599,354 bp (NCBI Entrez Genome). Gene Count: 4577 Genes. 4068 Proteins
(NCBI Entrez Genome).
Description: The genome consists of a single circular chromosome of 4,599,354 bp
with a G+C content of 50.9%. The genome is slightly smaller than that of K-12
(4,639,221 bp), and its organization is roughly similar to that described for pathogenic
E. coli strain O157:H7 EDL933 and the uropathogen CFT073, with large regions of
collinear E. coli backbone punctuated by islands of sequence presumably acquired by
horizontal transfer. The number of islands is smaller than those in CFT073 and
O157:H7, and a larger proportion of the genome is backbone (82% versus 75% for
O157:H7 and CFT073) (Wei et al., 2003).
1.16.3.1.2 Genome of Shigella flexneri 2a str. 301
Description: The whole genome is composed of a 4,607,203 bp chromosome and a
221,618 bp virulence plasmid, designated pCP301 (Jin et al., 2002).
Chromosome (NCBI Entrez Genome): GenBank Accession Number: NC_004337.
Size: 4,607,203 bp (NCBI Entrez Genome). Gene Count: 4566 Genes. 4182 Proteins
(NCBI Entrez Genome).
Description: While the plasmid shows minor divergence from that sequenced in
serotype 5a, striking characteristics of the chromosome have been revealed. The S.
flexneri chromosome has, astonishingly, 314 IS elements, more than seven times as many as
are possessed by its close relatives, the non-pathogenic K12 strain and enterohemorrhagic
O157:H7 strain of Escherichia coli (Jin et al., 2002).
1.16.3.1.3 Plasmid pCP301 (NCBI Entrez Genome): GenBank Accession Number:
NC_004851. Size: 221,618 bp (NCBI Entrez Genome). Gene Count: 267 Genes. 261
Proteins (NCBI Entrez Genome).
Description: pCP301 is a mosaic of potential virulence-related genes, IS elements,
maintenance genes and functionally unknown ORFs. All the previously identified
virulence genes are present in pCP301. These include the primary invasion genes ipa
and mxi-spa (encoding the invasion plasmid antigens and the type III secretion system,
respectively), virG/IcsA (required for polymerizing host actin to provide propelling force
for intra- and inter-cellular spread) and virF (necessary for regulating virulence gene
expression). The replication origin (R100-like) ori and G site (single-strand initiation
site) in pCP301 are identical to those of pWR501 and pWR100. pCP301 also has
maintenance genes, repA, copA and copB, for replication; parA and parB for
partitioning; and ccdA and ccdB for post-segregation killing. The noticeable difference
between pCP301 and the plasmids from serotype 5a is the presence of more IS-related
DNA in pCP301, making its size close to pWR501 (221,851 bp), which is larger than
pWR100 because of a Tn501 (8360 bp) insertion (Venkatesan et al., 2001).
1.16.3.1.4 Plasmid p2457TS2 (NCBI Entrez Genome): GenBank Accession Number:
NC_002773. Size: 3,179 bp (NCBI Entrez Genome). Description: Submitted (03-MAR-2001).
Wang,H., Feng,E., Liao,X., Su,G. and Huang,C. Unpublished (NCBI Entrez
Genome).
1.16.3.2 Genome of Shigella flexneri 5
1.16.3.2.1 Plasmid pWR501 (NCBI Entrez Genome): GenBank Accession Number:
NC_002698. Size: 221,851 bp (NCBI Entrez Genome). Gene Count: 293 Genes. 293
Proteins (NCBI Entrez Genome).
Description: The 210-kb Shigella flexneri 5a virulence plasmid (pWR501) is a mosaic
of potential pathogenesis-associated genes, IS elements, maintenance genes, and
unknown ORFs. Of the 286 Shigella-derived potential ORFs, 54 (19%) encode known
Shigella proteins. Thirty-seven of these are located within a 32-kb cluster of
uninterrupted ORFs, previously described, constituting the ipa-mxi-spa loci or
pathogenicity island (Tran van Nhieu and Sansonetti, 1999). The remaining 17 are
distributed throughout the plasmid and include five alleles of ipaH and one allele each
of icsA (virG), virA, icsP (sopA), virF, virK, msbB, sepA, ipgH, shet2, phoN-Sf, trcA, and
an apyrase gene. Most virulence-associated genes, including the ipa-mxi-spa operons,
the virG gene, and the ShET2 toxin gene, are flanked by one or more such mosaics of IS
element ORFs. In addition, many of the unknown ORFs are flanked by IS element
ORFs and have G+C content of less than 40%. Based on this genetic organization, the
recombination events that led to the acquisition of many or most genetic loci and the
assembly of the large virulence plasmid almost certainly involved IS-mediated events
(Venkatesan et al., 2001). The average G+C content of pWR501 is 47.6% but all virulence
associated genes on the plasmid have G+C composition of 30-35% (Lan et al., 2001). In
pWR501, the impCAB operon is missing; only the first 176 bp are present, beginning at
sequence coordinate 157595 (Philpott et al., 2000).
1.16.4 Genome of Shigella sonnei
1.16.4.1 Genome of Shigella sonnei Ss046
Chromosome (NCBI Entrez Genome): GenBank Accession Number: NC_007384.
Size: 4,825,265 bp (NCBI Entrez Genome). Gene Count: 4553 Genes. 4223 Proteins
(NCBI Entrez Genome).
1.16.4.2 Plasmid pSS (NCBI Entrez Genome): GenBank Accession Number:
NC_007385. Size: 214,396 bp (NCBI Entrez Genome).
Gene Count: 241 Genes. 248 Proteins (NCBI Entrez Genome).
Description: The complete sequence of pSS, which is the large virulence plasmid of
Shigella sonnei, was determined. The 214-kb plasmid is composed of segments of
virulence-associated genes, the O-antigen gene clusters, a range of replication and
maintenance genes, and large numbers of insertion sequence (IS) elements. The pSS
plasmid is a mixture of genes with different origins and functions. The sequence
suggests a remarkable history of IS-mediated recombination and acquisition of DNA
across a range of bacterial species (Jiang et al., 2005). Similar to the other three groups
of Shigella, the virulence plasmid of S. sonnei, designated as pSS, is sufficient for
entering, replicating, and disseminating within epithelial cells. However, pSS is unstable
and tends to be lost at a high frequency, unlike other large unicopy plasmids (Jiang et
al., 2005).
1.16.4.3 Colicins
Description: Colicins are plasmid-encoded toxic exoproteins that are produced by
colicinogenic strains of Escherichia coli and some related species of the family
Enterobacteriaceae. To date, at least 23 colicin types have been described in detail
(Smajs and Weinstock, 2001). Colicin Js was originally described as a bacteriocin of
Shigella sonnei colicinotype 7 (Smajs and Weinstock, 2001).
Plasmid ColJs (NCBI Entrez Genome): GenBank Accession Number: NC_002809.
Size: 5,210 bp (NCBI Entrez Genome). Gene Count: 3 Genes. 3 Proteins (NCBI Entrez
Genome).
Description: The 5.2-kb ColJs plasmid of a colicinogenic strain of Shigella sonnei
(colicin type 7) was isolated and sequenced. A 1.2-kb unique region of pColJs showed
significantly different G+C content (34%) compared to the rest of pColJs (53%) (Smajs
and Weinstock, 2001).
1.17 Diagnosis
Diagnosis of shigellosis is made clinically from the typical features of bacillary dysentery,
with blood and mucus in the stool, although some cases may initially present with mild to
moderate watery diarrhea. Microscopic examination of an iodine-stained faecal smear
shows numerous faecal leucocytes (>10 per high-power field). Confirmation is
made by stool culture, serological and biochemical tests (World Health Organisation,
1987).
1.17.1 Collection, transportation and culture of stool specimen
Specific diagnosis of shigella in stool specimens depends on the appropriate collection
and transportation to the laboratory. Fresh stool samples collected from patients before
initiation of therapy are preferred for microbiological tests because the chances of
recovering the organisms are higher. For microbiologic cultures, fresh stool is preferred
to rectal swabs in which the pathogens are less in number. Samples that cannot be
cultured immediately should be kept in buffered glycerol-saline transport medium.
Cary-Blair medium is the second option. Direct inoculation of culture plates at the
bedside is the most efficient means of isolating shigella from the dysentery patients.
Stool specimens for isolation of Shigella should be plated on both a moderately selective medium, such as MacConkey or deoxycholate citrate agar (DCA), and a highly selective medium, such as xylose-lysine deoxycholate (XLD), Hektoen enteric (HE) or Salmonella-Shigella (SS) agar. Because Shigella does not ferment lactose, its colonies on these plates do not change the colour of the pH indicator, which makes typical colonies easy to pick. Further identification can be made on triple sugar iron (TSI) agar or Kligler iron agar (KIA), on which Shigellae produce an alkaline slant and an acid butt (reflecting the inability to ferment lactose aerobically in the slope and the anaerobic fermentation of glucose in the butt) and produce neither hydrogen sulphide nor gas; Shigellae are also non-motile. After tentative identification, strains can be speciated by serological methods using grouping antisera. Rapid methods for the diagnosis of S. dysenteriae type 1 by fluorescent antibody staining have been established (Albert et al., 1992).
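The TSI/KIA reaction pattern described above amounts to a simple screening rule. It is encoded below as a Python function purely as an illustration; the TSIReading class, its field names and the K/A coding are hypothetical conveniences, and the rule is not a validated identification algorithm.

```python
from dataclasses import dataclass


@dataclass
class TSIReading:
    """Reactions read from a TSI or KIA tube (hypothetical field names)."""
    slant: str   # "K" = alkaline, "A" = acid
    butt: str    # "K" = alkaline, "A" = acid
    h2s: bool    # blackening of the medium
    gas: bool    # bubbles or splitting of the agar


def presumptive_shigella(reading: TSIReading, motile: bool) -> bool:
    """Screening rule from the text: alkaline slant over acid butt (K/A),
    no H2S, no gas, and non-motile. A match is presumptive only; species
    assignment still requires serology with grouping antisera."""
    return (
        reading.slant == "K"
        and reading.butt == "A"
        and not reading.h2s
        and not reading.gas
        and not motile
    )
```

Any deviation (an acid slant, H2S, gas or motility) makes the function return False, mirroring how such isolates would be set aside rather than sent for Shigella serotyping.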
1.17.2 PCR based methods for the identification of Shigella
Molecular typing of pathogens has long been a part of pathogen identification and control, and its use has recently accelerated with new technologies. Traditionally, serotyping has been extremely valuable and has often been able to identify important cellular components associated with virulence. While serotyping will continue to be an important tool, it often has limited discriminatory power, resolving pathogens into only a few types (Boyd et al., 1996). DNA typing, by contrast, is more rapid and less expensive and has an even greater capacity for genetic dissection of bacterial pathogens. It is limited only by genome size and by the technology; because most microbial genomes consist of millions of nucleotides, technology is invariably the limiting factor (Miettinen et al., 1999). The polymerase chain reaction (PCR) is a powerful technique for the highly specific amplification of a DNA segment defined by two flanking primers and has had a major impact on many aspects of biology (Mullis and Faloona, 1987). Most PCR methods established for the identification of Shigella target either the invasion-associated locus (ial) or the invasion plasmid antigen H (ipaH) locus, both of which are also present in enteroinvasive Escherichia coli (EIEC) (Ye et al., 1993; Dutta et al., 2001). The use of IS630-specific primers together with serotype-specific primers derived from the rfc genes in a multiplex PCR was reported to be useful for the detection of many serotypes of Shigella (Houng et al., 1997). In most of these studies, PCR was found to be more sensitive and specific than conventional culture methods and has the potential to be employed in routine diagnosis. In addition, most Shigella strains undergo spontaneous loss of their virulence genes; a direct stool PCR-based detection system is therefore preferred over DNA probe hybridization, in which the strains must be cultured several times (Sur et al., 2004).
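The defining property of PCR noted above, that only the stretch of template bounded by the two primer-binding sites is amplified, can be illustrated with a toy "in silico PCR". The sketch uses exact string matching only; the primer and template sequences in any example are invented and are not ial or ipaH primers, and real primer annealing tolerates mismatches that this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of every exact (possibly overlapping) occurrence of pattern."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 3000) -> list[int]:
    """Sizes of products bounded by exact matches of a primer pair on a linear template.

    The forward primer's sequence appears in the top strand as written; the
    reverse primer anneals to the top strand, so its reverse complement is what
    appears in the top-strand text. Only this one orientation is scanned.
    """
    template = template.upper()
    fwd, rev_site = fwd.upper(), reverse_complement(rev.upper())
    sizes = []
    for f in find_all(template, fwd):
        for r in find_all(template, rev_site):
            size = r + len(rev_site) - f  # from 5' end of fwd site to end of rev site
            if r >= f + len(fwd) and size <= max_len:
                sizes.append(size)
    return sorted(sizes)
```

The max_len cut-off stands in for the practical limit on product length; sites that are too far apart yield no product, just as they would in a real reaction with a short extension time.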
1.17.3 PCR based Molecular typing methods
Traditional subspecific typing methods include serotyping, phage typing, biotyping,
plasmid profiling, multilocus enzyme electrophoresis, conventional restriction
endonuclease analysis, ribotyping, and pulsed-field gel electrophoresis (PFGE). Their
strengths notwithstanding, all of these methods have one or more significant drawbacks,
including being slow or cumbersome; requiring highly specialized equipment, skills,
and/or reagents; relying on variable or unstable traits; and yielding uninterpretable
results for some strains (Eisenstein, 1990; Lupski, 1993; Selander et al., 1987).
Recently, PCR-based methods have become increasingly important in molecular typing. These approaches include amplified fragment length polymorphism (AFLP), repetitive element PCR (rep-PCR), randomly amplified polymorphic DNA (RAPD) and arbitrarily primed PCR (AP-PCR) (Welsh and McClelland, 1990; Williams et al., 1990). Pulsed-field gel electrophoresis (PFGE), although not PCR-based, remains widely used alongside them (Herrmann et al., 1992).
The power of PCR-based methods lies in the ease with which they can be applied to many bacterial pathogens and in their multilocus discrimination. These methods have proven valuable for the genetic dissection of pathogens for which other approaches have failed. However, a limitation of many PCR-based approaches is the biallelic (binary) nature of their data, frequently the presence or absence of a marker fragment. Finally, comparative gene sequencing is becoming feasible for strain characterization and can be performed at multiple loci (Williams et al., 1990). PCR-based fingerprinting is a simple, rapid and broadly applicable typing method that is potentially available to any laboratory with PCR capability. Fingerprints are generated using randomly amplified polymorphic DNA (RAPD) (Williams et al., 1990), arbitrarily primed PCR (Welsh et al., 1990), DNA amplification fingerprinting (Caetano-Anolles, 1993) or repetitive-element-based primers (rep-PCR) (Versalovic et al., 1994). In its best applications, multilocus sequence typing (MLST) can provide data for multiple alleles (haplotypes) spread across dispersed genomic locations (Maiden et al., 1998). Nucleotide data are well understood, fall into four defined states (A, C, G and T) and are easily analyzed using phylogenetic approaches. If sufficient nucleotide diversity is present, MLST can distinguish among both species and strains. While routine clinical MLST is not yet feasible, hybridization arrays (e.g., chip technology) could make single-nucleotide polymorphisms a mainstream approach to pathogen typing in the future (Vahey et al., 1999).
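Because many fingerprinting methods yield binary data (a band is present or absent), isolates are usually compared with a band-matching coefficient. The sketch below uses the Dice coefficient, one widely used choice that the text does not name; bands are assumed to have already been assigned to numbered size bins, and the isolate names and profiles are invented.

```python
from itertools import combinations


def dice_similarity(bands_a: set[int], bands_b: set[int]) -> float:
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B).

    Each profile is a set of band positions (hypothetical size-bin numbers);
    two empty profiles are treated as identical by convention.
    """
    if not bands_a and not bands_b:
        return 1.0
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))


def pairwise_similarities(profiles: dict[str, set[int]]) -> dict[tuple[str, str], float]:
    """Dice similarity for every unordered pair of isolates."""
    return {
        (a, b): dice_similarity(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    # Invented band profiles for three isolates.
    isolates = {"isolate1": {1, 2, 3, 5}, "isolate2": {1, 2, 4, 5}, "isolate3": {6, 7}}
    for pair, score in pairwise_similarities(isolates).items():
        print(pair, round(score, 2))
```

The resulting similarity matrix is the usual input to clustering (for example UPGMA) when fingerprints are drawn as dendrograms.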
1.17.3.1 Rep-PCR, ERIC-PCR and BOX-PCR
Rep-PCR genomic fingerprinting uses DNA primers complementary to naturally occurring, highly conserved, repetitive DNA sequences present in multiple copies in the genomes of most Gram-negative and several Gram-positive bacteria (Lupski and Weinstock, 1992). It is based on PCR-mediated amplification of DNA sequences located between specific interspersed repeated sequences in prokaryotic genomes (de Bruijn, 1992; Louws et al., 1996). Three families of repetitive sequences have been identified: the 35-40 bp repetitive extragenic palindromic (REP) sequence, the 124-127 bp enterobacterial repetitive intergenic consensus (ERIC) sequence and the 154 bp BOX element (Versalovic et al., 1994). These sequences appear to be located in distinct, intergenic positions around the genome. The repetitive elements may be present in both orientations, and oligonucleotide primers have been designed to prime DNA synthesis outward from the inverted repeats in REP and ERIC, and from the boxA subunit of BOX, in the polymerase chain reaction (PCR) (Versalovic et al., 1994). The use of these primers in PCR leads to the selective amplification of distinct genomic regions located between REP, ERIC or BOX elements. The corresponding protocols are referred to as REP-PCR, ERIC-PCR and BOX-PCR genomic fingerprinting, respectively, and collectively as rep-PCR genomic fingerprinting (Versalovic et al., 1991). The amplified fragments can be resolved in a gel matrix, yielding a profile referred to as a rep-PCR genomic fingerprint (Versalovic et al., 1994). These fingerprints resemble "bar code" patterns analogous to the UPC codes used in grocery stores (Lupski, 1993).
Characteristic prokaryotic repeats such as the enterobacterial repetitive intergenic
consensus (ERIC) sequences and the repetitive extragenic palindrome sequence motif
have been found in microbial species as diverse as Enterobacteriaceae and
cyanobacteria (Boom et al., 1990; Martin et al., 1992).
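The outward-priming logic of rep-PCR, and the way the resulting band sizes form a "bar code", can be sketched in a few lines of Python. This is an in-silico illustration under stated assumptions: primers must match exactly, and any primer or genome strings used with it are invented placeholders rather than real REP, ERIC or BOX primers.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of every exact (possibly overlapping) occurrence of pattern."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def rep_pcr_sizes(genome: str, primers: list[str], max_len: int = 5000) -> list[int]:
    """Predict product sizes when repeat-directed primers prime outward.

    Because repeats occur in both orientations, each primer is searched as written
    (extension to the right) and as its reverse complement (annealing to the other
    strand, extension to the left). A product forms between a rightward site and
    any downstream leftward site within max_len. Identical sizes are reported
    once, as co-migrating bands would appear on a gel.
    """
    genome = genome.upper()
    rightward, leftward = [], []
    for primer in primers:
        p = primer.upper()
        rightward += [(i, len(p)) for i in find_all(genome, p)]
        leftward += [(i, len(p)) for i in find_all(genome, reverse_complement(p))]
    sizes = set()
    for (f, f_len), (r, r_len) in product(rightward, leftward):
        size = r + r_len - f
        if r >= f + f_len and size <= max_len:
            sizes.add(size)
    return sorted(sizes)


def barcode(sizes: list[int], bins: list[tuple[int, int]]) -> list[int]:
    """Binary 'bar code': 1 if any band falls in the half-open size bin [lo, hi)."""
    return [int(any(lo <= s < hi for s in sizes)) for lo, hi in bins]
```

Converting band sizes into a fixed set of bins is what turns a gel lane into a comparable presence/absence vector; the bin boundaries here are arbitrary and would in practice be set by the gel's resolution.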
1.17.3.2 RFLP and RAPD
DNA fingerprinting techniques such as restriction fragment length polymorphism (RFLP) and randomly amplified polymorphic DNA (RAPD) have been described as powerful molecular typing methods for microorganisms (Swaminathan et al., 1993). However, RFLP requires large amounts of genomic DNA, defined nucleic acid probes and laborious hybridization procedures, and the performance of RAPD is sensitive to many factors, such as the selection of primers, the magnesium concentration in the PCR buffer and the thermocycler used (Lin et al., 1996). Amplified fragment length polymorphism (AFLP) analysis involves three major steps: (i) restriction endonuclease digestion of genomic DNA and ligation of specific adapters; (ii) amplification of the restriction fragments by PCR using primer pairs consisting of the common adapter sequence plus one to three arbitrary (selective) nucleotides; and (iii) analysis of the amplified fragments by gel electrophoresis. The combination of different restriction enzymes and the choice of selective nucleotides in the PCR primers make AFLP a useful new system for the molecular typing of microorganisms (Lin et al., 1996).
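The three AFLP steps can be mimicked on a known sequence. The sketch below assumes the EcoRI/MseI enzyme pair, a common AFLP choice that the text does not specify, and simplifies heavily: fragments are modelled as the stretch between two recognition sites, adapters and sticky-end chemistry are ignored, and only EcoRI-MseI fragments are scored, so the lengths it reports differ from real band sizes by the adapter contribution.

```python
import re

# Recognition sites of EcoRI and MseI (assumed enzyme pair). Both sites are
# palindromic, which keeps the orientation logic below simple.
SITES = {"EcoRI": "GAATTC", "MseI": "TTAA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def double_digest(seq: str) -> list[str]:
    """Step (i), simplified: EcoRI-MseI fragments oriented to start at the EcoRI
    site. A fragment runs from one recognition site to the next and includes both
    sites; sticky ends, adapters and the sequence ends are ignored."""
    seq = seq.upper()
    hits = sorted(
        (m.start(), name)
        for name, site in SITES.items()
        for m in re.finditer(f"(?={site})", seq)  # lookahead finds overlapping sites
    )
    fragments = []
    for (start, left), (end, right) in zip(hits, hits[1:]):
        if {left, right} != {"EcoRI", "MseI"}:
            continue  # only EcoRI-MseI fragments are scored in this sketch
        frag = seq[start:end + len(SITES[right])]
        fragments.append(frag if left == "EcoRI" else revcomp(frag))
    return fragments


def aflp_bands(seq: str, eco_sel: str = "", mse_sel: str = "") -> list[int]:
    """Steps (ii)-(iii), simplified: keep fragments whose bases next to each site
    match the selective nucleotides, and report their lengths as gel bands."""
    e, m = len(SITES["EcoRI"]), len(SITES["MseI"])
    bands = []
    for frag in double_digest(seq):
        eco_ok = frag[e:e + len(eco_sel)] == eco_sel
        # The MseI primer extends along the other strand, so its selective bases
        # pair with the reverse complement of the bases just inside the MseI site.
        inner = frag[len(frag) - m - len(mse_sel):len(frag) - m]
        mse_ok = revcomp(inner) == mse_sel
        if eco_ok and mse_ok:
            bands.append(len(frag))
    return sorted(bands)
```

Each added selective nucleotide keeps roughly a quarter of the remaining fragments, which is how AFLP reduces a genome-wide digest to a readable number of bands.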
1.17.4 Characteristics of the various molecular typing methods
Although a particular typing method may have high discriminatory power and good
reproducibility, the complexity of the method and interpretation of results as well as the
costs involved in setting up and using the method may be beyond the capabilities of the
laboratory. The choice of a molecular typing method, therefore, will depend upon the needs, skill level and resources of the laboratory. Factors to consider when evaluating the utility of a particular typing method include:
- Ease of interpretation
- Ease of use
- Cost
- Time to obtain a result
- Discrimination power
- Intralaboratory reproducibility
- Interlaboratory reproducibility
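Of the criteria listed, discrimination power is the one most often expressed as a number. One common measure, not named in the source, is Simpson's index of diversity in the form widely used to compare typing methods (often called the Hunter-Gaston discriminatory index). It gives the probability that two unrelated isolates drawn at random are assigned to different types. A minimal sketch, assuming each isolate has already been given a type label:

```python
from collections import Counter


def discriminatory_index(type_labels: list[str]) -> float:
    """Simpson's index of diversity, D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of isolates assigned to type j and N is the total."""
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))
```

A method that places every isolate in its own type scores 1.0, and one that lumps them all together scores 0.0; comparing D for two methods applied to the same isolate collection is what makes the "discrimination power" column of a comparison table quantitative.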
Table 1.4 summarizes the characteristics of the various molecular typing methods.
Table 1.4 Summary of the characteristics of the various molecular typing methods (Olive and Bean, 1999)

Methodology | Ease of use | Ease of interpretation | Discrimination power | Time to result (days) | Intralaboratory reproducibility | Interlaboratory reproducibility | Setup cost | Cost per test
PFGE | Moderate | Easy | High | 3 | Good | Good | Moderate | Moderate
PCR-RFLP | Easy | Easy | Moderate | 1 | Good | Good | Moderate | Low
Rep-PCR | Easy | Easy | High | 1 | Good | Moderate | Moderate | Low
RAPD | Easy | Easy | High | 1 | Moderate | Poor | Moderate | Low
CFLP | Moderate | Moderate | Moderate | 2 | Good | Poor | Moderate | High
AFLP | Moderate | Easy | High | 2 | Good | Good | High | Moderate
Sequencing | Difficult | Moderate | High | 2 | Good | Good | High | High
1.17.5 VNTR and MLVA
One of the most recent developments in molecular typing involves the analysis of
VNTR sequences (Frothingham and Meeker, 1998; Keim et al., 1999). Short
nucleotide sequences that are repeated multiple times often vary in copy number,
creating length polymorphisms that can be detected easily by PCR using flanking
primers. VNTRs appear to contain greater diversity and, hence, greater discriminatory
capacity than any other type of molecular typing system (van Belkum et al., 1998;
Richards and Sutherland, 1997). Analysis of variable-number tandem repeats (VNTRs) at multiple loci, termed multiple-locus VNTR analysis (MLVA), has proven to be a highly powerful and discriminatory method for studying the population structure of bacteria (Pourcel et al., 2003) and for characterizing isolates even from monomorphic bacterial populations (Farlow et al., 2002; Keim et al., 2000). VNTRs and other short-sequence
DNA tandem repeats in prokaryotic genomes appear to provide useful information on
both the functional and the evolutionary aspects of bacterial genetic diversity (Van
Belkum, 1999). Once these polymorphisms are located, flanking primers can then be
designed to amplify these variable length regions thus allowing differentiation of copy
numbers using the size of the resultant amplicon. This can be done using standard
agarose gel electrophoresis and if a higher resolution is required, fluorescent labelling
and fragment sizing via a DNA sequencer can be used. VNTR is therefore applicable
to a wide range of laboratories, including those which may have simple equipment
such as thermal cyclers and agarose gel electrophoresis but do not have access to
sophisticated equipment such as DNA sequencers. Furthermore, when VNTR analysis is applied to multiple loci as a typing scheme, as in multiple-locus VNTR analysis (MLVA), greater discriminatory power and more accurate determination of genetic relatedness are achieved (Adair et al., 2000; Keim et al., 2000; Klevytska et al., 2001).
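Converting an amplicon size into a repeat copy number is simple arithmetic: copies = (amplicon length - total flanking length) / repeat unit length. The sketch below applies it at several loci to build an MLVA profile and counts the loci at which two profiles differ. Locus names, flank lengths and unit lengths are hypothetical; in practice they come from the reference sequence of each locus.

```python
def repeat_copies(amplicon_bp: int, flank_bp: int, unit_bp: int) -> float:
    """Copy number implied by an amplicon: (amplicon - total flanking length) / unit length."""
    return (amplicon_bp - flank_bp) / unit_bp


def mlva_profile(sizes: dict[str, int], loci: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Turn amplicon sizes at several loci into an MLVA profile of whole copy numbers.

    `loci` maps a hypothetical locus name to (total flanking bp, repeat unit bp).
    Rounding to the nearest integer absorbs small sizing errors; partial repeat
    units would need separate handling.
    """
    return {name: round(repeat_copies(sizes[name], *loci[name])) for name in loci}


def differing_loci(p: dict[str, int], q: dict[str, int]) -> int:
    """Number of loci at which two MLVA profiles carry different copy numbers."""
    return sum(p[locus] != q[locus] for locus in p)
```

For example, with an assumed 150-bp flank and 7-bp unit, a 199-bp product corresponds to seven copies and a 206-bp product to eight. A one-unit difference of this kind is easy to resolve for long units on agarose, while short units are where fluorescent sizing on a sequencer becomes necessary.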
More recently, a number of studies have supported the notion that tandem repeats reminiscent of mini- and microsatellites are likely to be a highly significant source of very informative markers for the identification of pathogenic bacteria, even when these pathogens are recently emerged, highly monomorphic species (van Belkum et al., 1997; Adair et al., 2000). This probably reflects the important contribution of tandem repeats to the adaptation of the pathogen to its host. Tandem repeats appear to contribute to phenotypic variation in bacteria in at least two ways. First, tandem repeats located within the regulatory region of a gene can act as an on/off switch of gene expression at the transcriptional level (van Ham et al., 1993; Weise et al., 1989). Second, tandem repeats within coding regions whose repeat unit length is not a multiple of three can cause a reversible premature termination of translation when a mutation changes the number of repeats (reviewed in Bayliss et al., 2001; Wang et al., 2000).
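The second mechanism is a matter of reading-frame arithmetic: a change in copy number inserts or deletes (unit length times change) nucleotides, and the downstream frame is preserved only when that number is divisible by three. The sketch below checks this rule and demonstrates it on a toy open reading frame; the CAAT repeat and the flanking sequence are invented for illustration and do not come from any gene discussed here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def preserves_frame(unit_len: int, copy_change: int) -> bool:
    """True if gaining or losing copy_change units of length unit_len keeps the
    reading frame, i.e. the net indel (unit_len * copy_change nt) is a multiple of 3."""
    return (unit_len * copy_change) % 3 == 0


def first_stop(seq: str, frame: int = 0):
    """Index of the first stop codon read in the given frame, or None."""
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def toy_orf(copies: int) -> str:
    """Hypothetical ORF: start codon, `copies` units of the 4-nt repeat CAAT,
    then a short downstream coding stretch. Not a real gene sequence."""
    return "ATG" + "CAAT" * copies + "AAGCTGGCC"
```

With three copies of the 4-nt unit (12 nt) the toy ORF reads through without a stop codon; adding a fourth copy shifts the frame by one nucleotide and brings a TAA stop into frame immediately after the repeat. Losing that copy again restores the frame, which is why this kind of switch is reversible.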
VNTRs have been described for various organisms, including Salmonella enterica (Ramisse et al., 2004; Liu et al., 2003), Staphylococcus aureus (Sabat et al., 2003), Yersinia pestis (Adair et al., 2000), Mycobacterium tuberculosis (Frothingham et al., 1998), Francisella tularensis (Farlow et al., 2001), Legionella pneumophila (Pourcel et al., 2003), Brucella spp. (Bricker et al., 2003), Escherichia coli O157:H7 (Noller et al., 2000) and Borrelia spp. (Farlow et al., 2002). The increasing availability of whole-genome sequences provides an invaluable source of VNTRs and has opened the way to multiple-locus VNTR analysis (MLVA) for the typing of bacteria. MLVA schemes have so far been proposed for Bacillus anthracis (Le Flèche et al., 2001), Yersinia pestis (Pourcel et al., 2004), Francisella tularensis (Farlow et al., 2001), Mycobacterium tuberculosis (Le Flèche et al., 2001), Legionella pneumophila (Pourcel et al., 2003), Pseudomonas aeruginosa (Oteniente et al., 2003), Escherichia coli O157:H7 (Lindstedt et al., 2004), and Salmonella enterica subsp. enterica serovars Typhimurium (Lindstedt et al., 2003) and Typhi (Liu et al., 2003).
1.18 Repetitive DNA
Repetitive DNA, which occurs in large quantities in eukaryotic cells, has been
increasingly identified in prokaryotes. In eukaryotic genomes, this repetitive DNA is
infrequently associated with coding regions and consequently is located primarily in
extragenic regions (Cox et al., 1997). Repetitive DNA consists of simple
homopolymeric tracts of a single nucleotide type [poly(A), poly(G), poly(T), or
poly(C)] or of large or small numbers of several multimeric classes of repeats. These
multimeric repeats are built from identical units (homogeneous repeats), mixed units
(heterogeneous repeats), or degenerate repeat sequence motifs (Fig. 1 shows a
schematic overview) (van Belkum et al., 1998).
Figure 1. Schematic survey of SSRs.
(A) Examples of homogeneous simple sequence motifs consisting of repeat units varying from 1 (homopolymeric tract) to 6 nucleotides in length. (B) Example of a composite, heterogeneous repeat built from three 3-nucleotide units, two 5-nucleotide units and seven 2-nucleotide motifs. (C) Comparative analysis of four different repeats built from three 10-nucleotide units, showing degeneracy among units. Identity of nucleotide sequences B through D with the consensus given in sequence A is indicated by dashes (van Belkum et al., 1998).
1.18.1 Microsatellites and minisatellites
Minisatellites are usually defined as the tandem repetition of a short (6- to 100-bp) motif spanning 0.5 kb to several kilobases. The first examples were described in humans (Wyman and White, 1980).
Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated units of one to six nucleotides that are abundant in prokaryotic and eukaryotic genomes (Weber, 1990; Field and Wills, 1996). They are ubiquitously distributed in the genome, in both protein-coding and noncoding regions (Toth et al., 2000). The mutation mechanisms of micro- and minisatellites have been studied in some detail in eukaryotes, essentially human and yeast (Vergnaud and Denoeud, 2000). In brief, the data obtained so far suggest that microsatellites mutate by replication slippage processes; mutation rates depend upon the efficiency of mismatch repair mechanisms, and internal heterogeneity within the array strongly stabilizes the tandem repeat. In contrast, minisatellites mutate predominantly as the result of the repair of a double-strand break initiated within, or very close to, the tandem repeat. In eukaryotes at least, these events can be of replicative origin (Kokoska et al., 1998), or can be genetically controlled and specifically induced during meiosis at double-strand break hotspots. Minisatellite mutation rate in eukaryotes appears to be insensitive to mismatch repair efficiency, and internal heterogeneity is compatible with a high mutation rate (Vergnaud and Denoeud, 2000; Debrauwère et al., 1999).
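The definition of microsatellites given above (units of one to six nucleotides repeated in tandem) translates directly into a pattern search. The sketch below uses Python's re module with a backreference to find perfect repeats. The 6-bp minimum length echoes the "≥ 6 bp" threshold used in the studies cited later in this section but is an assumption here, and imperfect (degenerate) repeats such as those in Fig. 1C are not detected.

```python
import re


def is_primitive(unit: str) -> bool:
    """False for units that are themselves repeats of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str, max_unit: int = 6, min_len: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect tandem repeats with unit length
    1..max_unit whose total length is at least min_len (illustrative thresholds)."""
    seq = seq.upper()
    hits = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % unit_len)
        for m in pattern.finditer(seq):
            unit = m.group(1)
            length = m.end() - m.start()
            # Skip e.g. 'AA' x 4, which is already reported as a poly(A) tract.
            if length >= min_len and is_primitive(unit):
                hits.append((m.start(), unit, length // unit_len))
    return sorted(hits)
```

Each hit records the repeat unit and its copy number, the two quantities whose variation between strains the following sections discuss.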
1.18.2 SSR (Simple sequence repeats)
SSRs are encountered in many different branches of the prokaryotic kingdom. They are found in genes encoding products as diverse as microbial surface components recognizing adhesive matrix molecules and specific bacterial virulence factors such as lipopolysaccharide-modifying enzymes or adhesins. SSRs confer genetic and, consequently, phenotypic flexibility and function at various levels of gene expression regulation. Variations in the number of repeat units per locus, or changes in the nature of the individual repeat sequences, may result from recombination processes or from polymerase errors such as slipped-strand mispairing (SSM), either alone or in
combination with DNA repair deficiencies. These rather complex phenomena can occur with relative ease, with SSM approaching a frequency of 10^-4 per bacterial cell division and allowing high-frequency genetic switching. Bacteria use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. SSR-mediated variation has important implications for bacterial pathogenesis and evolutionary fitness, and molecular analysis of changes in SSRs allows epidemiological studies of the spread of pathogenic bacteria. The occurrence, evolution and function of SSRs, and the molecular methods used to analyze them, have been reviewed in the context of responsiveness to environmental factors, bacterial pathogenicity, epidemiology and the availability of full-genome sequences for increasing numbers of microorganisms, especially medically relevant ones (Van Belkum et al., 1998).
1.18.2.1 Molecular Analysis of SSRs
Repetitive DNA has characteristic physical features due to its specific nucleotide composition. The first eukaryotic repetitive DNA moieties were detected as an immediate consequence of their aberrant density: when subjected to density gradient centrifugation, repetitive DNA lagged behind the bulk of the DNA and appeared as satellite fractions, reflecting differences in thermodynamic stability and reassociation kinetics (Britten and Kohne, 1968). Whereas variability in repetitive DNA domains was initially detected by relatively cumbersome Southern hybridization techniques with DNA probes recognizing the repeat consensus motif, the emergence of PCR technology enabled a more straightforward amplification-based approach (Jeffreys et al., 1991). In this method, PCR primers bordering the SSR
region are constructed, and polymorphism in repeat unit number is documented by simple electrophoretic techniques once DNA amplification has been performed. Regions bordering the repeats are generally sufficiently well conserved to serve as targets for PCR-mediated amplification, so repeat degeneracy can also be analyzed by direct sequencing. Moreover, conservation of the border sequences is sometimes observed even among different species, allowing a broad-spectrum analysis of species and subspecific genetic polymorphisms (Shields et al., 1995).
1.18.2.2 Structural Features of SSRs
Bacterial SSR-type DNA can be divided into four main categories. First, dispersed repeat motifs that generally do not occur in tandem have been identified. Although these repeats occur throughout the genomes of a multitude of microorganisms, they are sometimes organized in tandem as well. A second class is formed by the homopolymeric tracts. Multimers of one of the four nucleotides are peculiar sequence elements that are frequently encountered in the genome of S. cerevisiae, for instance. These homogeneous stretches can be as long as 42 nucleotides. Third, there are short-motif SSRs. With repeat units of 2 to 6 bases, this class of repeats is the most liable to unit number variation at a given locus. In particular, when these short-motif repeats are located within genes and their unit length is not 3 or 6 nucleotides, they can drastically affect the coding potential of a given transcript. Fourth, repeats harboring more than 8 nucleotides per unit form a separate category. Repeats with intermediately sized unit lengths are only rarely encountered. Interestingly, the shorter unit repeats in particular are involved in regulatory processes that are affected by SSM. Among the longer repeats, a larger degree of sequence heterogeneity is observed, which is thought to indicate more frequent recombination. Analyses of the precise function of the repeat locus are often missing. It is often assumed that these longer repeats encode protein sequences spanning membranes or cell walls and therefore play a physical rather than a regulatory role. These longer repeats are candidate regions for determining phylogenetic relatedness between species or strains (Van Belkum et al., 1998).
1.18.2.3 Studies on SSR in Bacterial Species
A study characterizing mononucleotide repeats of 5 to 13 nt in 157 sequenced prokaryotic genomes has shown that (i) a large number of mononucleotide SSRs are present in all prokaryotic genomes investigated, (ii) shorter
repeats are much more abundant than longer repeats, and (iii) in the majority of the genomes, longer mononucleotide SSRs are excluded from coding regions. It was also observed that some genomes contain more mononucleotide SSRs than expected, while others contain significantly fewer. Bacterial genomes that contain far fewer mononucleotide SSRs than expected are generally larger and more GC-rich, while bacterial genomes that contain far more mononucleotide SSRs than expected are in general smaller and more AT-rich. Finally, genomes that contain a high fraction of horizontally transferred genes have a lower mononucleotide SSR density, and A and T are generally overrepresented in mononucleotide SSRs (Coenye and Vandamme, 2005).
A study of simple sequence repeats in Escherichia coli showed that SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). The nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs consisting of A or T. The study also showed that SSRs are polymorphic among E. coli strains, providing potential marker loci for rapid detection and characterization (Gur-Arie et al., 2000).
A study of the abundance, distribution and composition of simple sequence repeats in the genomes of ε-Proteobacteria showed that the number of SSRs decreased rapidly with increasing size of the repeat unit. Although local differences in SSR density could be observed, overall the SSRs were evenly distributed over the genomes. The results indicate that (i) A and T are tremendously overrepresented in mononucleotide SSRs ≥ 6 bp and (ii) there is a highly significant linear relationship between the G+C content of the genome and the G+C content of mononucleotide SSRs ≥ 6 bp. There are considerable differences in dinucleotide composition between the genome and the SSRs. While the frequency of AT/TA in dinucleotide SSRs ≥ 6 bp is much higher than expected in C. jejuni, it is lower in the H. pylori genomes and the genome of W. succinogenes. For H. hepaticus, the frequency of AT/TA is approximately the same in the genome and in dinucleotide SSRs ≥ 6 bp. The frequency of CG/GC in dinucleotide SSRs ≥ 6 bp is lower than expected from the genome composition for C. jejuni and H. hepaticus, but normal in the other genomes. There is an overrepresentation of CT/TC in dinucleotide SSRs ≥ 6 bp in both H. pylori genomes and in the W. succinogenes genome. Finally, there is a slight overrepresentation of A in mononucleotide SSRs ≥ 6 bp that occur in coding regions in all genomes (Coenye and Vandamme, 2004).
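Statements that a genome contains more or fewer mononucleotide SSRs than expected presuppose an expected count. One simple null model, assumed here and not necessarily the one used in the cited studies, treats nucleotides as independent draws with the genome's own base frequencies. Under that model a run of length ≥ k begins at a given position when the preceding base differs (probability 1 - p) and the next k bases all match (probability p^k), giving roughly N(1 - p)p^k runs in a sequence of length N.

```python
import re


def observed_runs(seq: str, base: str, k: int) -> int:
    """Number of maximal runs of `base` that are at least k nucleotides long."""
    return sum(1 for m in re.finditer(f"{base}+", seq) if len(m.group()) >= k)


def expected_runs(seq: str, base: str, k: int) -> float:
    """Approximate expected number of such runs under an independent-base model,
    N * (1 - p) * p**k, where p is the observed frequency of `base` (edge effects ignored)."""
    n = len(seq)
    p = seq.count(base) / n
    return n * (1 - p) * p ** k


if __name__ == "__main__":
    # How strongly the expectation for A-runs >= 6 nt depends on A content, per Mb.
    for p_a in (0.2, 0.3, 0.4):
        expected = 1_000_000 * (1 - p_a) * p_a ** 6
        print(f"p(A) = {p_a:.1f}: about {expected:.0f} expected A-runs >= 6 nt per Mb")
```

Because the expectation scales with p^k, modest shifts in base composition change it steeply: with k = 6, raising the A frequency from 0.2 to 0.4 increases the expected count nearly 50-fold. This is why observed SSR counts are compared with a composition-based expectation (an observed/expected ratio) rather than being compared directly between AT-rich and GC-rich genomes.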
1.18.2.4 SSR Function
Numerous lines of evidence have demonstrated that the genomic distribution of simple sequence repeats (SSRs) is nonrandom, presumably because of their effects on chromatin organization, regulation of gene activity, recombination, DNA replication, the cell cycle, the mismatch repair (MMR) system, etc. (Li et al., 2002). SSRs may provide the evolutionary advantage of fast adaptation to new environments, acting as evolutionary tuning knobs (Kashi et al., 1997; Trifonov, 2003). SSRs are relatively rare in prokaryotes, but most of those that do occur are found in pathogenic organisms, where variation in repeat number can cause phenotypic changes (van Belkum et al., 1998). Haemophilus influenzae (Hi), an obligate upper respiratory tract commensal/pathogen, uses phase variation (PV) to adapt to changes in the host environment. Switching occurs by slippage of SSR repeats within genes coding for virulence molecules (Hood et al., 1996). When SSRs lie within protein-coding regions, UTRs or introns, changes caused by replication slippage and other mutational mechanisms may alter protein function. Numerous lines of evidence indicate that changes in the lengths of triplet or amino acid repeats can affect protein function, and that frameshifts within coding regions caused by SSR expansion or contraction can (1) cause gain or loss of function, or gene silencing, and (2) give rise to novel proteins and influence bacterial pathogenesis and virulence. Variations in the repeat number of SSRs located in 5'-UTRs, 3'-UTRs and introns can have significant effects on gene expression (e.g., on mRNA splicing or translation) and lead to phenotypic changes with altered selective values. For instance, in Escherichia coli, hundreds of genes related to DNA repair, recombination and physiological adaptation to different stresses contain a high density of small SSRs, which can induce mutator phenotypes by affecting repair efficiency and/or DNA metabolism (Rocha et al., 2002). In humans, SSR variation in coding regions, UTRs and introns can cause neuronal diseases, cancers, spinocerebellar ataxia (SCA) and myotonic dystrophy (DM), among others. In some cases, microsatellite instability (MSI) even affects the effectiveness of medical treatment of human cancers (Kim et al., 2001). In bacteria, particularly pathogenic bacteria, infection processes require that the bacteria adapt to several host environments. Initial colonization, crossing of epithelial and endothelial barriers, survival in the circulation and translocation across, for instance, the blood-brain barrier are all
processes that require specific virulence traits (Roche and Moxon, 1995). SSRs within genes should evolve through the same mutational processes as other SSRs, including replication slippage, point mutation and recombination, but they should be subjected to stronger selection pressure than SSRs in other regions because of their functional significance in regulating gene expression and function. These mutational processes generate the substrates on which the MMR system acts: if SSR mutations within genes escape MMR correction, they can cause phenotypic changes. The link between the changing copy number of SSRs and phenotypes is provided by an accumulating number of experimental observations showing that gene expression and other functions depend on the copy number of the associated repeats. If SSR changes result in selectable phenotypic variation, selection can naturally start to act. It has been demonstrated that SSRs in protein-coding regions are under strong selection (Richard and Dujon, 1997; Alba et al., 1999).