Chapter 1

General Introduction

1.1 Importance of Shigellosis

Diarrheal diseases claim the lives of at least five million children per year in developing

countries (Rohde, 1984), and shigellosis or bacillary dysentery is responsible for

approximately 10% of these deaths (Stoll, 1982). Shigella is an important human

pathogen, responsible for the majority of cases of endemic bacillary dysentery

prevalent in developing nations (Kotloff, 1999). In the developing world, it is estimated

that 113 million episodes of shigellosis occur annually, resulting in more than 400,000

deaths (Kotloff, 1999). Shigellosis is predominantly a pediatric disease of the

developing world, where it is roughly 100 times more prevalent than in

industrialized countries (Bennish et al., 1990). Shigellosis is common among children

less than five years of age in developing countries and in persons who travel from

industrialized to less developed countries (Shlim, 1999). Industrialized countries also

report outbreaks of Shigella infections among high-risk populations such as children

attending day care (Mohle-Boetani, 1995; Pickering and Evans, 1981), persons with

human immunodeficiency virus/acquired immunodeficiency syndrome (Baer, 1999; van

Oosterhout, 1994), and inmates of custodial institutions (Mahoney, 1993). In developed

countries, outbreaks usually involve S. sonnei (Black et al., 1978). The infectious dose

is low: as few as 10-100 orally administered organisms can cause disease, and acute

complications can be severe (Bennish et al., 1990).
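As a rough consistency check on the burden figures cited in this section, the short sketch below computes the number of deaths implied by the 10% share and the case-fatality ratio implied by the episode and death estimates. The constant and function names are illustrative assumptions, and the inputs are point estimates from different sources and years, so the results are order-of-magnitude indications only.

```python
# Figures quoted in Section 1.1; the names below are illustrative only.
DIARRHEAL_DEATHS_PER_YEAR = 5_000_000  # children, developing countries (Rohde, 1984)
SHIGELLOSIS_SHARE_PERCENT = 10         # approximate share of those deaths (Stoll, 1982)
EPISODES_PER_YEAR = 113_000_000        # shigellosis episodes (Kotloff, 1999)
DEATHS_PER_YEAR = 400_000              # "more than 400,000", used as a lower bound


def implied_shigellosis_deaths(total_deaths: int, share_percent: float) -> float:
    """Deaths attributable to shigellosis, given its share of all diarrheal deaths."""
    return total_deaths * share_percent / 100


def case_fatality_percent(deaths: int, episodes: int) -> float:
    """Deaths per 100 episodes."""
    return 100.0 * deaths / episodes


if __name__ == "__main__":
    print(implied_shigellosis_deaths(DIARRHEAL_DEATHS_PER_YEAR, SHIGELLOSIS_SHARE_PERCENT))
    print(round(case_fatality_percent(DEATHS_PER_YEAR, EPISODES_PER_YEAR), 2))
```

The two death estimates (about 500,000 from the first pair of sources and more than 400,000 from the third) are of the same order, and the implied case-fatality ratio is well below 1% of episodes.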

1.2 Historical Perspective of Shigellosis

Bacillary dysentery was first differentiated from amoebic dysentery in 1887 and an

etiologic agent, Bacillus dysenteriae, was isolated and described by Shiga in 1898. The

subsequent painstaking process of epidemiological, physiological, and serological

characterization of related dysentery bacilli culminated with the recommendations of

the 1950 Congress of the International Association of Microbiologists Shigella

Commission that Shigella be adopted as the generic name and that species subgroups be

designated A (Shigella dysenteriae), B (S. flexneri), C (S. boydii), and D (S. sonnei)

(Gerber et al., 1960). However, more recent studies argue that Shigella

emerged from multiple independent origins within Escherichia coli centuries ago and thus

may not constitute a genus (Pupo et al., 2000). The next milestone was the

characterization of the basic virulence mechanism of shigellosis. By the late 1950s, it

had been shown that shigellae can infect the corneal epithelium of guinea pigs (this is

the basis of the Sereny test) and it was also known that virulent organisms can be

grown intracellularly in cultured mammalian cells (Gerber et al., 1960). Nonetheless, it

was the prevailing view as late as 1960 that shigellae cause disease by elaborating

endotoxin while adhering to the surface of the intestinal epithelium (Watkins, 1960). In

1964, however, it was conclusively demonstrated that S. flexneri causes disease by

penetrating the intestinal mucosa (LaBrec et al., 1964; Voino-Tasenetsky and Khavkin,

1964). During the late 1960s and early 1970s the pathogenic mechanism of shigellosis

was studied further, and the genetic basis of virulence was analyzed by constructing

intergeneric hybrids. A puzzling result of the latter work was the finding that essentially the

entire chromosome of S. flexneri could be transferred to E. coli without reconstituting the

virulence phenotype of the donor. This enigma was resolved by the seminal work of P.

J. Sansonetti and colleagues in the early 1980s showing that virulence in Shigella

species is dependent upon a family of large plasmids (Watkins, 1960; Sansonetti,

1982). Studies by Litwin et al. (1991) indicate a correlation between serotype and

plasmid patterns, which suggests that plasmid profiles can be useful in identifying

epidemic clones of S. flexneri as they are introduced into a population.

Genetic variability between serotypes complicates vaccine development, because

immunity to Shigella is serotype specific and vaccine protection therefore depends

on the serotypes included in the vaccine (Noriega et al., 1996). Studies on

shigellosis in Bangladesh have shown S. flexneri serotype 1 to be the second most

prevalent group after type 2. The prevalence of serotype 1c increased from 0 to 8.2%

from 1978 to 2000, whereas that of 1a decreased from 13.1 to 0.4%, which emphasizes

the need for continuous monitoring of S. flexneri serotype distributions (Talukder et al., 2003).

Study of antibiotic resistance patterns in these serotypes indicates the emergence of

resistance to antibiotics to be a major public health problem in developing countries.

Studies also indicate that multidrug resistance can be conferred by plasmids via plasmid

transfer (Hossain et al., 1998).

1.3 Microbiology of Shigella

Shigellae are Gram-negative, non-sporulating, facultatively anaerobic bacilli of the family

Enterobacteriaceae. They are the causative agents of shigellosis, or bacillary dysentery

(Sansonetti, 2001), an invasive disease of the human colonic epithelium marked by an

intense inflammatory reaction and subsequent mucosal destruction (Dramsi and

Cossart, 1998). Shigella spp. can be subdivided into four

serogroups: S. sonnei, S. boydii, S. flexneri and S. dysenteriae

(http://www.who.int/infectious-disease-report accessed on 20th April 2004). Each

serogroup contains multiple serotypes based on the structure of O-antigen component

of the lipopolysaccharide present in the outer membrane of the cell wall (Simmons and

Romanowska, 1987). The four species of Shigella are so closely related to Escherichia

coli that all of these bacteria could be considered members of a single species. They

share greater than 90% homology by DNA–DNA re-association analysis (Brenner et

al., 1969). Infection is transmitted via the fecal-oral route and is characterized by

excretion of stools containing white cells and blood (DuPont et al., 1989). Shigellae are

pathogenic primarily due to their ability to invade intestinal epithelial cells. The

virulence factors are a smooth lipopolysaccharide cell wall antigen, which is responsible

for the invasive features, and a toxin (Shiga toxin), which is both cytotoxic and

neurotoxic and causes watery diarrhea (Dipika et al., 2004).

1.4 Shigella Serogroups and Serotypes

Shigellosis is caused by Shigella spp., which can be subdivided into four serogroups: S.

sonnei, S. boydii, S. flexneri and S. dysenteriae (Brenner et al., 1969). Shigella strains

have been further divided into 38 serotypes based on ‘O’ antigen variation: 13 in S.

dysenteriae, 18 in S. boydii, 6 in S. flexneri and 1 in S. sonnei. Shigella strains have

been clustered into three groups. Cluster 1 contains the majority of S. dysenteriae and S.

boydii serotypes, cluster 2 has smaller groups of S. boydii serotypes and S. dysenteriae

type 2 and cluster 3 contains all the S. flexneri serotypes except serotype 6 and 6A (Lan

et al., 2001). However, they have some degree of antigenic relatedness attributable to a

common repeating tetrasaccharide unit, to which α-D-glucopyranosyl and O-acetyl

groups are added, providing the basis for their type group antigenic factors. The

variable antigenicity of lipopolysaccharide is mainly due to the chemical and structural

diversity of the O-polysaccharides. The addition of glucosyl and/or O-acetyl groups to the

common tetrasaccharide O repeat unit results in different group- and type-specific

antigens, which are encoded by the chromosomal rfb gene (Yao and Valvano, 1994).

Isolation of uncommon serotypes and subserotypes of Shigella spp., particularly of

Shigella flexneri, has become frequent, and it is not always possible

to type such isolates with the present classification system (Talukder et al., 2001).

Despite the antigenic variability based on the structure of ‘O’ antigen, the type group

specificity is retained due to type group antigenic factors displaying some degree of

antigenic relatedness. However, reports indicate a multitude of epitopes in Shigella

flexneri that are not covered by agglutination reactions with commercial antisera

(Edwards and Ewing, 1972).
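The serogroup scheme and serotype counts given in this section can be captured in a small lookup table, which also makes it easy to confirm that the per-species counts add up to the stated total of 38. This is a minimal sketch: the dictionary layout and function name are illustrative assumptions, the subgroup letters follow the 1950 designation described in Section 1.2, and the counts are those quoted in the text rather than a current typing scheme.

```python
# Subgroup letter and number of O-antigen serotypes per species, as stated in the text.
SEROGROUPS = {
    "S. dysenteriae": {"subgroup": "A", "serotypes": 13},
    "S. flexneri": {"subgroup": "B", "serotypes": 6},
    "S. boydii": {"subgroup": "C", "serotypes": 18},
    "S. sonnei": {"subgroup": "D", "serotypes": 1},
}

STATED_TOTAL = 38  # total number of serotypes quoted in the text


def total_serotypes(groups: dict) -> int:
    """Sum the serotype counts over all serogroups."""
    return sum(entry["serotypes"] for entry in groups.values())


if __name__ == "__main__":
    for species, entry in SEROGROUPS.items():
        print(f"Subgroup {entry['subgroup']}: {species}, {entry['serotypes']} serotype(s)")
    print("Counts match stated total:", total_serotypes(SEROGROUPS) == STATED_TOTAL)
```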

1.5 Clinical Features

The clinical manifestations of Shigella infection vary from short-lasting watery diarrhea

to acute inflammatory bowel disease characterized by fever, intestinal cramps and

bloody diarrhea with mucopurulent feces (Sansonetti, 2001). Neurologic symptoms

such as lethargy, confusion, severe headache, and convulsion are the most common

extraintestinal manifestations of shigellosis (Ashkenazi et al., 1990). In some cases,

there may not be any symptoms (asymptomatic), while in others it may produce mild to

moderate dysentery or even fulminating dysentery with fever, severe abdominal cramps

and rectal pain. Children may have high fever (104 °F) with convulsions and rectal

prolapse, and may later develop malnutrition. Shigella sonnei produces mild dysentery.

S. flexneri and S. dysenteriae type 1 typically produce severe dysentery, particularly the

latter (Dipika et al., 2004).

1.6 Epidemiology of Shigellosis

1.6.1 Reservoirs and modes of transmission

Humans are the only natural hosts for Shigella. The predominant mode of transmission

is by faecal-oral contact, and the low infectious inoculum (as few as 10 organisms)

renders Shigellae highly contagious (DuPont et al., 1989). In developing countries,

shigellosis is most common in children less than 5 years old (Black et al., 1978).

Persons symptomatic with diarrhoea are primarily responsible for transmission (Centers

for Disease Control, 1986). Less commonly, transmission is related to contaminated

food and water or fomites; however, the organism generally survives poorly in the

environment. In certain settings where disposal of human faeces is inadequate,

houseflies can serve as a mechanical vector for transmission (Levine et al., 1991).

Overcrowded conditions and water supplies that are inadequately protected from

sewage contamination contribute to the high incidence of infection. In developed

countries, common-source outbreaks, usually involving S. sonnei, occur sporadically,

and the source of such outbreaks is often uncooked food such as a salad that contains

carbohydrates or proteins (Black et al., 1961). Homosexual men are also at risk for

direct transmission of Shigella infections, and recurrent shigellosis complicating human

immunodeficiency virus infection can occur (Blaser et al., 1989). Direct fecal-oral

contamination can contribute to endemic shigellosis in institutional environments such

as mental hospitals, day care centers, nursing homes, prisons, and outdoor gatherings.

For example, a recent outbreak of S. sonnei among 12,700 attendees at an outdoor

conference was characterized by an attack rate of greater than 50% (Wharton et al.,

1990).

1.6.2 Distribution of serogroups and serotypes

The predominant serogroup of Shigella circulating in a community appears to be

related to the level of socioeconomic development (Kotloff et al., 1999). Three

predominant strains are responsible for the majority of shigellosis cases, viz., S. sonnei, S.

flexneri 2a and S. dysenteriae type 1. S. dysenteriae type 1, which produces severe

disease, may cause life-threatening complications, is usually multidrug resistant and

can cause large epidemics and even pandemics with high morbidity and mortality

(Brenner et al., 1969). S. flexneri is the main serogroup found in developing countries

(median 60% of isolates), with S. sonnei being the next most common (median 15%). S.

dysenteriae and S. boydii occur with equal frequency (median 6%). In contrast, data

from Spain, Israel and the United States consistently demonstrate that S. sonnei is the

most common serogroup found in industrialized countries (median 77%), followed by

S. flexneri (median 16%), S. boydii (median 2%) and finally S. dysenteriae (median

1%) (Kotloff et al., 1999). Industrialized countries also report outbreaks of Shigella

infections among high-risk populations such as children attending day care (Mohle-

Boetani et al., 1995; Pickering et al., 1984) and persons with human immunodeficiency

virus/acquired immunodeficiency syndrome (Baer et al., 1999; van Oosterhout et

al., 1994). According to current estimates, over two thirds of all episodes of shigellosis

and four fifths of all deaths from shigellosis occur in children under five years old.

Among children, the risk of death from shigellosis is greatest in infants and those who

are severely malnourished (Khan et al., 1985; Bennish et al., 1990). S. dysenteriae 1,

the agent of epidemic shigellosis, is responsible for extensive outbreaks in Central

Africa, Southeast Asia, and the Indian subcontinent. S. dysenteriae 1 is also isolated

from up to 30% of dysentery patients in endemic areas (Bennish et al., 1990).

The provisional S. flexneri serotype 1c, which was first identified in Bangladesh, was

later found in rural Egypt (Gendy et al., 1999). Similarly, serotype 4c was isolated in

Russia (Pryamukhina and Khomenko, 1988). In Taiwan, S. flexneri and S. sonnei

have been reported to be the major causative agents of shigellosis, whereas S.

dysenteriae and S. boydii are seen only in cases of imported disease (Pan, 1997).

Cases of shigellosis caused by S. sonnei have also been reported in the United Kingdom

and Ireland (Delappe et al., 2003). S. boydii strains are less frequently isolated, and in

developed countries they are considered to be imported (Rowe et al., 1974). However,

S. boydii has been found to be indigenous to some South European countries. In 1990,

the isolation rate of S. boydii serotype 2 in Bulgaria increased sharply, and the strains,

which originated from different geographic locations, were reported as sporadic (Prats et al.,

1985). S. sonnei is typically associated with mild, self-limiting infection; however, it has

become the most prevalent serogroup in the developed world. Shigellosis has been reported to be

the third leading bacterial gastrointestinal disease in the United States (Cimmons, 2000).
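The contrast in serogroup distribution between developing and industrialized settings can be made explicit by tabulating the median isolate shares quoted in this section and ranking them. The sketch below is illustrative only: the data structure and function names are assumptions, and the values are the medians attributed to Kotloff et al. (1999) in the text. Because they are medians across studies, they need not sum to 100% within a setting.

```python
# Median share of isolates (%) by serogroup, as quoted in the text.
MEDIAN_ISOLATE_SHARE = {
    "developing": {
        "S. flexneri": 60,
        "S. sonnei": 15,
        "S. dysenteriae": 6,
        "S. boydii": 6,
    },
    "industrialized": {
        "S. sonnei": 77,
        "S. flexneri": 16,
        "S. boydii": 2,
        "S. dysenteriae": 1,
    },
}


def ranked_serogroups(setting: str) -> list:
    """Serogroups ordered from most to least common; ties keep their listed order."""
    shares = MEDIAN_ISOLATE_SHARE[setting]
    return sorted(shares, key=shares.get, reverse=True)


def predominant_serogroup(setting: str) -> str:
    """The most commonly isolated serogroup in the given setting."""
    return ranked_serogroups(setting)[0]


if __name__ == "__main__":
    for setting in MEDIAN_ISOLATE_SHARE:
        print(setting, "->", ", ".join(ranked_serogroups(setting)))
```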

1.6.3 Shigella and HIV infection

The intersection of Shigella infections and the human immunodeficiency virus (HIV)

epidemic has had serious consequences. Both chronic diarrhea and dysentery are

common among persons infected with HIV (Colebunders et al., 1987; van Oosterhout

and van der Hoek, 1994). Although it is not known whether the risk of acquiring

shigellosis is enhanced by concomitant HIV infection (Angulo et al., 1995), it appears

that HIV-associated immunodeficiency leads to more severe clinical manifestations of

Shigella infection. Patients with HIV infection may develop persistent or recurrent

intestinal Shigella infections, even in the presence of adequate antimicrobial therapy.

They also face an increased risk of Shigella bacteraemia, which can be recurrent, severe

or even fatal (Dougherty et al., 1996; Batchelor et al., 1996).

1.6.4 Shigella epidemics and pandemics

During 1967-70, bacillary dysentery was first reported in Central American countries

(Mendizabal-Morris et al., 1969). Since then, spread of this infection has been reported

from many Asian countries such as Bangladesh (1972-78, 2003), Sri Lanka (1976),

Maldives (1982), Nepal (1984-85), Bhutan (1984-85) and Myanmar (1984-85) (Pal et

al., 1989; Naheed et al., 2004). In India, epidemics were mainly encountered in

southern India (Vellore, 1972-73 and 1997-2001) (Mathan et al., 1984; Jesudason et al.,

1997), eastern India (1984) (Pal et al., 1984; Datta et al., 1987) and the Andaman and

Nicobar islands (1986) (Sen et al., 1987; Bhattacharya et al., 1988). Recent outbreaks

(2002-03) of multidrug-resistant S. dysenteriae type 1 have been reported from Siliguri,

Diamond Harbour, Kolkata, Aizawl and Bangladesh (Sarkar et al., and

Bhattacharya 2003). When pandemic S. dysenteriae type 1 strains invade

vulnerable populations, the attack rates are high and dysentery often becomes a leading

cause of death (Ries et al., 1994). The pandemic that began in Central Africa in 1979

progressed to East Africa and has since become particularly problematic among refugee

populations (Centers for Disease Control and Prevention, 1994).

1.7 Drug Resistance

Over the last 50 years, Shigella has demonstrated extraordinary prowess in acquiring

plasmid-encoded resistance to the antimicrobial drugs that previously constituted first-

line therapy. Sulfonamides, tetracycline, ampicillin and

trimethoprim-sulfamethoxazole initially appeared as highly efficacious drugs, only to

become impotent in the face of emerging resistance (Sack et al., 1997). By the 1990s,

few reliable options remained to treat multiresistant Shigella infections, particularly in

developing countries where cost and practicality are paramount considerations

(Gangarosa et al., 1970; Ries et al., 1994). Treatment of shigellosis has been

confounded by widespread resistance to commonly used antibiotics such as

ampicillin, co-trimoxazole, tetracycline and nalidixic acid and, more recently, to

norfloxacin and ciprofloxacin. Resistance can be disseminated by clonal spread of

particular strains, as observed in S. dysenteriae type 1 (Dutta et al., 2003). Plasmid

fingerprinting of 25 strains of S. boydii serotype 2 from Bulgaria, however, showed high

genetic relatedness and the presence of antibiotic resistance genes (Bratoeva et al., 1992).

Conjugative plasmids encoding resistance to multiple antibiotics have been detected in

S. sonnei (Barg et al., 1995).

1.8 Shigella Pathogenicity

Shigella pathogenicity is a multistep process that consists of entry into

epithelial cells by induced phagocytosis, escape from the phagocytic vacuole,

multiplication and spread within the epithelial cell cytoplasm, passage into adjacent

epithelial cells by finger-like protrusions from the cell surface, and killing of host cells

(Vasselon et al., 1992). The three essential steps for Shigella virulence are invasion of

epithelial cells, intracellular multiplication, and the spread of the invading bacteria into

adjacent cells (Parsot, 1994).

1.8.1 Entry

Shigella directs its own uptake into the colonic mucosa through membrane ruffling and

macropinocytosis (Adam et al., 1995). Shigella strains are unusual among enteric

bacteria in their ability to gain access to the epithelial cell cytosol, where they replicate

and spread directly into adjacent cells (Sansonetti et al., 1982). Preliminary

experiments indicated that shigellae entered mammalian cells through endocytosis. No

leakage of macromolecules from recipient cells could be observed during entry (Hale

and Formal, 1980). Cytochalasin D, which inhibits microfilament functions, blocked

the entry process (Hale et al., 1979). Use of an antimyosin monoclonal antibody and of

7-nitrobenz-2-oxa-1,3-diazole phallacidin, a fluorescent dye that stains polymerized F-

actin, demonstrated accumulation of myosin and F-actin, two major components of the

cell cytoskeleton, underneath the cytoplasmic membrane at the site of bacterial entry

(Baudry et al., 1987). DNA sequence analysis indicated that the genes necessary for entry

of bacteria into epithelial cells are clustered within a 31-kb region of the virulence

plasmid (Philpott et al., 2000).

1.8.2 Intracellular multiplication

When the pathogen reaches the colon, it invades the epithelium. After crossing the

epithelial barrier through M cells, Shigella infects the underlying resident macrophages

and induces cell death. The infected macrophages release large amounts of

interleukin-1β, which leads to a strong inflammatory response (Zychlinsky et al., 1994).

Meanwhile, a bacterium released from macrophages enters enterocytes via the

basolateral surface by directing membrane ruffling and macropinocytosis. The bacterium

is surrounded by a phagocytic vacuole, but it disrupts the vacuolar membrane and

escapes into the cytoplasm, where it multiplies and moves by

inducing actin polymerization at one pole of the bacterium, allowing intracellular

spread within the cytoplasm and into adjacent epithelial cells (Makino et al., 1986). A

sequential electron-microscopic study of infected HeLa cells demonstrated that invasive

S. flexneri induced lysis of the phagocytic membrane shortly after penetration into cells.

By 30 min after centrifugation-induced penetration, all bacteria were lying free within

the cytoplasm of host cells (Sansonetti et al., 1986). Similar invasiveness was observed

with E. coli K-12 carrying pWR100. On the other hand, Salmonella typhimurium, whose

lysis of the phagocytic membrane is late and inefficient, grows poorly intracellularly. A

plasmid-mediated contact-hemolytic activity demonstrated in virulent shigellae

provides a likely mechanism for lysis of the phagosome (Sansonetti et al., 1986). By

contrast, correlation has been observed between rapid intracellular growth of shigellae

and the level of Shiga toxin or SLT production (Sansonetti et al., 1986). The iron

chelator aerobactin may also be critical for bacterial replication within cells in which

iron is immobilized by ferritin. However, independent studies have shown that mutants of S.

flexneri that no longer produce aerobactin demonstrate no significant alteration in their

capacity to multiply intracellularly (Lawlor et al., 1987; Nassif et al., 1963).

1.8.3 Early killing of host cells

Metabolic events that mediate early killing have been demonstrated to include a rapid

drop in the intracellular concentration of ATP, an increase in pyruvate concentration,

and arrest of lactate production (Sansonetti and Mounier, 1987). Plasmid genes are also

involved in early killing of host cells. In a study of the intracellular fate of both an

invasive strain and a noninvasive, plasmidless derivative of S. flexneri, plasmid

pWR100 appeared to mediate killing of host cells (the continuous macrophage cell line

J774) within 4 hr. For expression of this activity bacteria had to be intracellular, since

macrophages were protected by cytochalasin D. Although both strains produced

equivalent levels of SLT and inhibited protein synthesis of macrophages within 2 hr,

only invasive bacteria were able to kill host cells. Damage to macrophages correlated

with the ability of invasive bacteria to rapidly and efficiently lyse the membrane of the

phagocytic vacuole (Clerc et al., 1987).

1.8.4 Continuous reinfection of adjacent cells

In order to spread from one cell to another, the bacterium forms a finger-like

protrusion from the surface of the infected cell. Around the site of exit, a major

rearrangement of the cytoskeleton occurs, with the formation of many tiny villosities.

The protrusion elongates and penetrates the surface membrane of the adjacent cell. The

bacterium lyses these membranes and is released into the cytoplasm of the adjacent cell

(Prevost et al., 1992). A region (virG) of the S. flexneri virulence plasmid is considered

to be necessary for continuous reinfection of adjacent cells (Makino et al., 1986). virG

mutants can invade cells and multiply intracellularly but do not spread to adjacent cells.

Within epithelia, bacteria tend to localize within the cytoplasm and convert to a

spherical morphology before being eliminated (Anthony et al., 1988). After engulfment,

the bacteria are surrounded by a membrane-bound vacuole in the host cell, but they rapidly lyse

the vacuole and are released into the cytosol. They divide and grow in the cytosol and

become coated with filamentous actin, which ultimately forms an actin tail at one pole

of the bacterium. This propels the bacterium through the cytoplasm and the pathogen

reaches the plasma membrane of the cell where it forms a long protrusion into the

neighbouring cell that internalizes the microbe. This process allows Shigella to move

from cell to cell without coming in contact with the extracellular milieu (Sansonetti et

al., 1999).

1.8.5 Shiga and Shiga-like toxins

Since the beginning of the twentieth century it has been known that Shigella

dysenteriae type 1 produces a potent protein, Shiga toxin (Conradi, 1903; O'Brien and

Holmes, 1989). Its activity as a neurotoxin, cytotoxin, and enterotoxin has been well

described (Keusch, 1988), and cytotoxins with similar biological properties have been

identified from a variety of bacteria, including Escherichia coli and Vibrio species

(O'Brien et al., 1984). This toxin is composed of two subunits. The A subunit (32 kDa)

possesses the biological activities. It is combined with five molecules of the B subunit

(7.7 kDa), which are responsible for binding to cell-surface receptors (Donohue-Rolfe et

al., 1984; Seidah, 1986). This toxin binds to Galα1-4Galβ (galabiose) glycolipid

receptors (Lindberg et al., 1987), and inhibits mammalian protein synthesis by cleaving

the N-glycosidic bond at adenine 4324 in 28S rRNA. Therefore, the toxic mechanism is

identical to that of the plant toxin ricin (Endo et al., 1988; Jackson, 1990). Some strains

of S. flexneri and S. sonnei produce low levels of Shiga-like toxin, which is

neutralizable by anti-Shiga toxin sera (Keusch and Jacewicz, 1977; O’Brien et al.,

1977).

1.8.6 Genetic bases of Shigella pathogenicity

The genetic bases for several aspects of the pathogenic process and intracellular

lifestyle of Shigella, including the mechanisms of species specificity, tissue tropism,

and restriction of the immune response, are still poorly understood and probably

involve chromosomally encoded proteins. In common with other enteric bacteria,

Shigella survives the proteases and acids of the intestinal tract by uncertain means.

Highly tissue-specific disease results from a very low infectious dose (10 to 100

bacteria) and in the absence of flagellum-based motility (LaBrec et al., 1964). Plasmid

or bacteriophage mediated horizontal transfer of genes may lead to the emergence of

virulent Shigella strains from closely related avirulent precursors (Faruque et al.,

2002). Virulence is often multifactorial and coordinately regulated, and virulence genes

tend to be clustered in the genome (Hacker et al., 1997). The genetic determinants of

virulence are located mainly on the virulence plasmids found in shigellae, together with

some genes located on the chromosome. Most of the work on the molecular pathogenesis

of Shigella has been carried out in S. flexneri serotypes 2a and 5a (Maurelli et al.,

1998). The ability of Shigella spp. to invade epithelial cells and cause enteric disease is

dependent on the presence of a family of large, low-copy-number plasmids called pINV

(Makino et al., 1988). The bacterial factors are released upon contact with the host cell, a

property specific to the “type III secretion systems” found in a growing number of bacterial

pathogens. These systems comprise approximately 20 genes. IcsA (VirG), a 120

kDa outer membrane protein, hydrolyzes ATP and is localized to one pole of the

bacterium at the junction between the microbe and the actin tail. The surface-exposed VirG

α-domain recruits vinculin and N-WASP (neural Wiskott-Aldrich syndrome protein)

through binding to the glycine-rich repeats of VirG. Vinculin then interacts with actin

filaments and VASP (vasodilator-stimulated phosphoprotein), which contributes to

actin polymerization. IcsA is proteolytically cleaved by the bacterial protease SopA (IcsP),

which is required for the polarized distribution of IcsA on the bacterial surface and for proper

actin-based motility of Shigella in infected cells (Egile et al., 1997).

1.9 Distinguishing Shigella Virulence

Several assay systems are employed to distinguish the different steps in Shigella spp.

virulence. Included among these are the Sereny test, which measures infection and

destruction of mucosal surfaces resulting in keratoconjunctivitis in guinea pigs (Sereny,

1955; Oaks et al., 1985), and the HeLa cell invasion assay (Hale and Formal, 1981).

The use of these various assay systems and the application of classical genetic

techniques have demonstrated that Shigella spp. virulence is a multigenic phenomenon.

Genes encoded on the 220-kilobase (kb) invasion plasmid (Sansonetti et al., 1981) and

several unlinked chromosomal loci (Sansonetti et al., 1983) are essential for the

expression of a complete virulence phenotype. In addition, expression of Shigella spp.

virulence is regulated by growth temperature. Shigella strains, which are phenotypically

virulent when cultured at 37°C, become phenotypically avirulent when cultured at 30°C

(Maurelli et al., 1984).

1.10 Shigella Virulence Genes

Virulence genes may be encoded on plasmids or on the chromosome. Because virulence is

multifactorial, the virulence genes tend to be clustered in the genome (Hacker et

al., 1997). Genes required for entry of bacteria into epithelial cells and the induction of

apoptosis in infected macrophages are clustered in a 30-kb region (designated the entry

region) of the virulence plasmid (VP). This region encodes components of a type III secretion (TTS)

apparatus (the Mxi-Spa TTS apparatus), substrates of this secretion apparatus (the

translocators IpaB and IpaC and the effectors IpaD, IpgB1, IpgD and IcsB) and their

dedicated chaperones (IpgA, IpgC, IpgE and Spa15) and two transcriptional activators

(VirB and MxiE) (Hale et al., 1983; Sansonetti et al., 1983). The capacity of Shigella to

enter cells is governed by proteins encoded by a subset of genes within three

contiguous operons (ipa, mxi, and spa) in a 30-kb region of the 230-kb pWR100

virulence plasmid (Parsot, 1994). The Ipa proteins (invasins) are essential for the

invasion of epithelial cells, and their secretion is mediated by the proteins encoded at

the mxi and spa loci whose products constitute a type III secretion apparatus (TTSS) (or

secreton) (Blocker et al., 1999; Ménard et al., 1994).

The entry region is a pathogenicity island-like cluster (see below) that contains: a) the

mxi and spa genes encoding components of a type III secretion apparatus; b) the ipaA,

B, C and D and ipgD genes encoding proteins secreted by this machinery; c) the ipgC

and ipgE genes encoding cytoplasmic chaperones required for stability of IpaB and

IpaC, and IpgD, respectively; d) the virB gene encoding a protein required for

transcription of the mxi, spa and ipa genes; and e) additional genes of unknown

function. Outside of the entry region, other genes associated with virulence have been

identified. They include: a) the icsA (virG) gene encoding an outer membrane protein

that is directly responsible for the ability of the bacteria to move within the cytoplasm

of infected cells; b) the virF gene encoding a transcriptional activator that controls

expression of icsA and virB; and c) the sepA gene, which encodes a secreted serine

protease of the autotransporter family. In addition, the virulence plasmid contains two

copies of the shet2 gene encoding a putative enterotoxin, and genes encoding several

secreted proteins, which include virA, ipaH4.5, ipaH7.8, ipaH9.8 and six

uncharacterized genes designated osp (outer Shigella proteins): ospB, ospC1, ospD1,

ospE1, ospF and ospG. The proteins encoded on this plasmid are directly involved in

the entry into epithelial cells and invasive phenotypes observed in the pathogenesis of

Shigella strains (Alfredo, 2004).

1.11 Invasion Plasmid Antigen Proteins

A set of bacterial gene products, called invasion plasmid antigen (Ipa) proteins, is

secreted by the type III pathway of Shigella and triggers a eukaryotic membrane

ruffling process responsible for mediating entry (Ménard et al., 1996). Genetic and

biochemical analyses implicate four Ipa invasins (IpaA through IpaD) and a type III

secretion system consisting of up to twenty Mxi-Spa proteins (Hueck et al., 1998;

Ménard et al., 1996). The ipa and mxi-spa loci are located within closely linked operons

found on the 230-kb virulence-associated plasmid of Shigella (Parsot and Sansonetti,

1996). The major function of TTSSs is to transport proteins from the bacterial

cytoplasm into the host cell plasma membrane or cytoplasm upon contact with host

cells (Bleves and Cornelis, 2000; Cornelis and Denecker, 2001). In Shigella flexneri,

the mxi, the spa and the ipa operons are expressed at 37°C, but Ipa proteins remain in

the bacterial cytoplasm until the secretion machinery is activated by host cell contact or

by external, presumably surrogate, signals such as serum or a small amphipathic Congo

red (CR) dye molecule (Bahrani et al., 1997; Ménard et al., 1994; Parsot et al., 1995).

Physical contact between the bacterium and the host cell induces insertion of two Ipas

(IpaB and IpaC) into the host membrane to form a 25-Å pore that might be used to

translocate the other invasins into target cells (Blocker et al., 1999). The Ipas then

catalyze the formation of a localized actin-rich, macropinocytic-like ruffle on the host

cell surface, which internalizes the bacterium (Bourdet-Sicard et al., 1999; Tran Van

Nhieu et al., 1999). The current model of the TTS pathway proposes that, upon contact

of bacteria with host cells, translocators insert into the membrane of the host cell to

form a pore through which effectors transit to reach the cell cytoplasm (Hueck, 1998).

Other substrates of the TTS apparatus are encoded by the genes scattered throughout

the VP, such as virA, ospB, C, D, E, F and G and ipaH genes (Buchrieser et al., 2000).

Several of these putative effectors are encoded by the multigene families, with five

ipaH, four ospC, three ospD and two ospE genes carried by the VP. Genes encoding

components of the TTS apparatus and its substrates exhibit a similar low G+C content

(approx. 34 mol%), suggesting that the entire TTS system was acquired at once by lateral transfer.

In addition, the VP encodes at least five other proteins that are involved in virulence,

including IcsA, IcsP, VirK, MsbB2 and SepA. IcsA (VirG) is an outer-membrane protein

directly involved in promoting actin polymerization at one pole of intracellular bacteria

(Buchrieser et al., 2000; Goldberg and Theriot, 1995). IcsP (SopA) is an outer-

membrane protease involved in the release of a certain proportion of surface-exposed

IcsA (Egile et al., 1997; Shere et al., 1997).

1.11.1 VirF (Positive regulator of the plasmid virulence regulon)

The virF locus was first identified by a spontaneous deletion in SalI fragment F that

resulted in the simultaneous loss of four virulence-associated phenotypes: Pcr (Congo

red binding), Inv (invasion of tissue culture cells in vitro), Ser (Sereny test), and Igr

(inhibition of growth) (Sasakawa et al., 1986). The functional gene product of virF is a

30-kDa protein, but a 24-amino-acid signal peptide-like sequence may be cleaved

during passage through the inner membrane, yielding a 27-kDa protein that has been

detected in minicells (Sakai et al., 1986). The virF gene product plays a central role in

positive regulation of the plasmid virulence regulon. It directly activates transcription

of the virG gene (Sakai et al., 1988) and it indirectly activates ipaABCD (Sakai et al.,

1988; Watanabe, 1988) and invAKJHF (Watanabe, 1988).
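
The relationship between the 30-kDa precursor and the 27-kDa mature form of VirF can be checked with simple arithmetic. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the studies cited above; the function name is illustrative only.

```python
# Assumed rule-of-thumb average mass of one amino acid residue (not from the cited studies).
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate the mass of a polypeptide in kDa from its length in residues."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues * avg_residue_mass_da / 1000.0


# Mass removed when the 24-residue signal peptide-like sequence is cleaved.
signal_peptide_kda = estimate_mass_kda(24)

# Expected size of the mature protein, starting from the 30-kDa precursor.
mature_virf_kda = 30.0 - signal_peptide_kda

print(f"signal peptide: ~{signal_peptide_kda:.2f} kDa")
print(f"mature VirF:    ~{mature_virf_kda:.1f} kDa")
```

Under this assumption the cleaved peptide accounts for roughly 2.6 kDa, which is consistent with the reported shift from 30 kDa to 27 kDa.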

1.11.2 VirB (invE, ipaR)

A regulatory locus necessary for expression of the invasive phenotype is located within

SalI fragment B of the S. flexneri 2a virulence plasmid, and the cloned gene has been

designated virB (Adler et al., 1989). Analogous genes have been designated ipaR in

S. flexneri 5 (Buysse et al., 1990) and invE in S. sonnei (Watanabe et al., 1990). The

mobility of the protein product of virB in sodium dodecyl sulfate (SDS)-polyacrylamide

gels indicates a molecular mass of 33 kDa (Adler et al., 1989), whereas the ipaR

protein has been estimated to be 34 kDa (Buysse et al., 1990) and the invE protein has been

estimated at 35 kDa (Watanabe et al., 1990).

1.11.3 IpaABCD, ippI, and invGF

Immunoblots with serum from monkeys or humans infected with Shigella species

demonstrate a consistent serum immune response that recognizes five plasmid-encoded

proteins (Watanabe et al., 1990; Sasakawa et al., 1989; Kato et al., 1989). The largest

of these proteins is the product of the virG (icsA) locus, which is located outside the

invasion region. The other proteins are encoded by the ipa locus, which corresponds to

invasion region 2 in the S. flexneri 2a virulence plasmid (Sasakawa et al., 1988). Since

the latter proteins were originally designated a (78 kDa), b (62 kDa), c (43 kDa), and d

(38 kDa) in order of descending molecular mass (Hale et al., 1985), the corresponding

genes have been named ipaABCD (Buysse et al., 1987).

1.11.4 InvAKJH (mxiAB)

Tn3-lac fusion inserts within the S. sonnei invasion plasmid have defined four

transcribed genes, designated invAKJH, that are necessary for expression of the

invasive phenotype (Watanabe, 1988; Watanabe et al., 1990). Restriction maps suggest

that these genes correspond to invasion regions 3 and 5 of the S. flexneri 2a plasmid

(Sasakawa et al., 1988) designated mxiA (mxi, membrane expression of Ipa)

(Hromockyj and Maurelli, 1989). Rabbit antiserum raised against the BS260 fusion

protein recognizes a 76-kDa protein in immunoblots of an S. flexneri 5 whole-cell

lysate (la). Published restriction maps suggest that invA should map within invasion

region 5 and invKJ should map within invasion region 3 on the S. flexneri 2a plasmid.

Restriction analysis also indicates that an S. sonnei gene designated invH should map at

the junction of invasion region 3 and region 2 in the S. flexneri 2a virulence plasmid

(Watanabe, 1988 and 1990; Sasakawa et al., 1988). Since results of precise mapping

and sequencing of invasion regions 3, 4, and 5 have yet to be published, the molecular

mass of invKJH gene products is unknown. However, invA insertion mutants are

complemented by a cloned fragment from the S. sonnei plasmid which expresses a 38-

kDa protein (Watanabe and Nakamura, 1986).

1.11.5 VirG (icsA) (Plasmid gene associated with intercellular bacterial spread)

The virG (icsA) gene product was originally identified as the fifth invasion plasmid

antigen of S. flexneri 5 (Oaks et al., 1986) and extrinsic radioiodination of whole cells

has shown that this protein is exposed on the bacterial surface (Lett et al., 1989). The

virG (icsA) gene product has been reported to have an apparent molecular mass in SDS-

polyacrylamide gels of 140 kDa (Oaks et al., 1986; Pal et al., 1989), 130 kDa (Lett et al.,

1989) or 120 kDa (Bernardini et al., 1989). The ORF of the cloned virG gene of S.

flexneri 2a suggests a protein of approximately 117 kDa (Lett et al., 1989). However,

minicell analysis of the product(s) of the cloned virG (icsA) gene reveals at least nine

nonvector polypeptides of 130 kDa or less (Bernardini et al., 1989; Lett et al., 1989)

and it has been suggested that these polypeptides are the products of internal initiation

codons (Lett et al., 1989).

1.11.6 IpaH (Multicopy invasion plasmid antigen gene)

The ipaH gene is unique in that five complete or partial copies are present on the

invasion plasmids of the various S. flexneri serotypes and multiple copies are also

found on the invasion plasmids of other Shigella species and EIEC. The copies of ipaH

have been characterized by Southern hybridization of HindIII digests of S. flexneri 5

plasmid DNA with a λgt11::ipaH probe. These genes have been designated ipaH9.8,

ipaH7.8, ipaH4.5, ipaH2.5, and ipaH1.4 on the basis of the size of the hybridizing DNA

fragment (Hartman et al., 1990). The ipaH7.8 and ipaH4.5 copies have been mapped to SalI fragment

B between virG and ipaR (within 10 kb of the latter gene). Northern blot analysis

indicates that both ipaH7.8 and ipaH4.5 are transcribed in vitro whereas ipaH2.5 and

ipaH1.4 are unexpressed, truncated sequences (Venkatesan et al., 1991).

1.11.7 VirK

VirK is required for production of IcsA by an unknown mechanism (Nakata et al.,

1992). MsbB2 is an acyl transferase that, in conjunction with the product of the

chromosomal msbB1 gene, acts to produce full acyl-oxy-acylation of the myristate at

the 3' position of the lipid A glucosamine disaccharide (d'Hauteville et al., 2002).

Table 1.1 Chromosomal loci associated with the virulence of Shigella

Locus           Function
T-locus         Integration site for incorporation of lysogenic phage encoding type-specific somatic antigen
kcpA            Positive regulation of virG (icsA)
virR            Repression of plasmid invasion loci ipaABCD in response to temperature
stx             Synthesis of Shiga toxin
rfb             Synthesis of group-specific somatic antigen
rfd             Synthesis of somatic antigen basal core
sodB            Superoxide dismutase to inactivate superoxide radicals produced by respiratory burst in phagocytes
ompR-envZ       Induction of plasmid invasion loci
iucABCD-iutA    Synthesis of aerobactin and 76-kDa aerobactin receptor protein
rfa             Synthesis of somatic antigen basal core
102a            Decreased intercellular spread in infected tissue culture monolayers

(Hale, 1991)

Table 1.2 Plasmid genes associated with the virulence of Shigella

Gene                                          Function
Stb                                           Necessary for stable maintenance of the high molecular weight virulence plasmid
Rep                                           Necessary for replication of high molecular weight plasmid
virF                                          Positive regulation of virB and virG genes
virB (invE, ipaR)                             Positive regulation of ipaABCD and invAKJHFG
virG (icsA)                                   Associated with intra- and intercellular bacterial spread
icsB                                          Protrusions at surface of infected cells
invA (mxiB), invK (mxiA), invJ, invH, invF    Necessary for invasion
ipaB                                          Induces apoptosis of macrophages
ipaC, ipaD, ipaA                              Mediates endocytic uptake of bacteria
mxi, spa                                      Components of type III secretion system
ipgC                                          Molecular chaperone

(inv genes are found in S. sonnei)

Table 1.3 Gene products influencing expression of plasmid-linked virulence genes of Shigella flexneri

Gene product   Gene          Location   Description
CpxA/CpxR      cpxA, cpxR    Chr        Response regulator and sensor
DNA gyrase     gyrA, gyrB    Chr        Introduce negative supercoils, two component system DNA topoisomerase II
EnvZ/OmpR      envZ, ompR    Chr        Response regulator and sensor
H-NS           hns           Chr        Nucleoid-associated protein
IHF            ihfA, ihfB    Chr        Repressor of virB transcription, DNA binding protein, regulates virB and virF transcription
IspA           ispA          Chr        Possible role in cell division
Mia            mia           Chr        tRNA N6-isopentyl adenosine synthetase
MxiB           mxiB          VP         AraC-like protein
Rho            rho           Chr        Transcription termination factor, regulates transcription of virB gene
StpA           stpA          Chr        Analogue of H-NS, can repress virulence gene expression when overproduced
TopoI          topA          Chr        DNA topoisomerase I, relaxes negative supercoils
TopoIV         parC, parE    Chr        DNA topoisomerase IV, relaxes negative supercoils
TyrT           tyrT          Chr        Tyrosyl tRNA
VacB           vacB          Chr        Post-transcriptional regulation of ipa and icsA (virG) genes
VacC           vacC          Chr        tRNA guanine transglycosylase, post-transcriptional regulation of virF
VacJ           vacJ          Chr        Protein needed for intracellular spreading
VacM           vacM          Chr        Transcription of ipa genes
VirB           virB          VP         Vassal regulator, activates main virulence structural gene operons
VirF           virF          VP         AraC-like transcription regulator, activates transcription of virB and icsA (virG) genes
VirK           virK          VP         Required for post-transcriptional control of icsA (virG) gene

Chr: chromosomal, VP: virulence plasmid

1.12 Pathogenicity islands and “black holes”.

Pathogenicity islands are regions on the genomes of certain pathogenic bacteria, which

are absent in nonpathogenic strains of the same or closely related species and contain

contiguous blocks of virulence genes. Horizontal spread of virulence genes through

acquisition of pathogenicity islands is an important element in the evolution of newly

emerging pathogens. Another route towards pathogenicity is the deletion of genes

whose products are detrimental to virulence, leaving so-called "black holes" in the

genome. The 90% homology exhibited between Shigella and Escherichia coli is

suggestive of a high efficiency of gene transfer by conjugation or transduction (Brenner

et al., 1969). In general, pathogenicity islands (PAIs) are large and unstable genetic

elements, acquired by lateral gene transfer, with different G + C content, often

associated with tRNA genes, which contribute to the virulence of bacterial pathogens.

The concept of PAIs was developed on the basis of data on genome structure and

pathogenicity of enteric organisms, especially pathogenic E. coli. However, this concept

is now used broadly in other gram-negative and gram-positive pathogens (Alfredo,

2004).

1.13 IS-elements

Bacterial insertion sequences were initially identified during studies of model genetic

systems by their capacity to generate mutations as a result of their translocation. Interest

in antibiotic resistance and transmissible plasmids subsequently revealed an important

role for these mobile elements in dissemination

of resistance genes and in promotion of gene acquisition. In particular, it was observed

that several different elements were often clustered in “islands” within plasmid genomes

and served to promote plasmid integration and excision (Bukhari et al., 1977).

The most striking feature of the Shigella genomes is their highly dynamic nature due to

the presence of hundreds of IS-elements in each of the genomes. IS-elements are

capable of causing many kinds of DNA rearrangements (Schneider et al., 2000) and the

many rearrangements observed (deletions as well as translocations and inversions)

are likely the result of the copious numbers of IS-elements. The Sd197 genome shows

the most rearrangements and is considerably smaller than the MG1655 genome due to a

large number of deletions. The genome of this Shigella strain also possesses the greatest

number of IS-elements, mainly in the form of IS1N, which may be responsible for many

of these rearrangements (Fan Yang et al., 2005). Compared with MG1655, Shigella

strains not only have many more copies of IS-elements but also have additional IS-

species, such as IS1N, IS600 and IS629. Within the Shigella genomes, IS1 is

predominant in the Sf301, Sb227 and Ss046 chromosomes whereas IS1N is copiously

present in the Sd197 chromosome. Intact IS21 and IS630 are present only in Ss046,

while the newly identified ISSbo6 is found mainly in Sb227 chromosome. ISSbo6 is

similar to ISEc8 found adjacent to the locus of enterocyte effacement (LEE)

pathogenicity island in EHEC (Perna et al., 1998). Furthermore, most copies of the

ISSbo6 are located within SHI-1, SHI-2 and ipaH islands (see below) in the Sb227

genome. The virulence plasmids and chromosomes share most of the IS-species,

suggesting that inter- and intra-replicon translocation and replication have occurred,

leading to large numbers of IS-elements in the genomes (Fan Yang et al., 2005).

The virulence plasmids also display a dynamic nature with many IS-mediated deletions,

translocations and inversions (Shepherd et al., 2000).

1.14 Virulence plasmids

Large plasmids were first detected in S. flexneri 2a in the late 1970s (Kopecko, 1979)

and the essential role of plasmids in virulence was established shortly thereafter in both

S. sonnei (Sansonetti et al., 1981) and S. flexneri (Sansonetti et al., 1982). Subsequently

it was shown that virulence plasmids are also present in other serotypes of S. flexneri, S.

dysenteriae, S. boydii, and EIEC. Endonuclease digestion and Southern hybridization

indicate that the virulence plasmids of Shigella species and EIEC are essentially

homologous but restriction sites vary with the species and serotype (Hale et al., 1983;

Sansonetti et al., 1983). The virulence plasmids pWR100 in S. flexneri serotype 5,

pMYSH6000 in S. flexneri serotype 2a, and pSS120 in S. sonnei, together with those of

other Shigella bacteria, have been shown to carry determinants for invasiveness and the

ability to cause disease. These large plasmids are collectively termed pINV plasmids

(Hale, 1991) and are also present in EIEC strains. The cell invasion capacity of

Shigella-EIEC is determined by a cluster of 38 genes within a 32-kb segment of the

pINV plasmid, often referred to as the entry or invasion region, which includes genes

for invasins, molecular chaperones, motility, regulation, and a specialized type III

secretion apparatus (Parsot and Sansonetti, 1996). The plasmids of pINV group have

been classified into two relatively homogeneous sequence forms, pINV A and pINV B. pINV A

plasmids are found in S. flexneri F6 and F6A, S. boydii B1, B4, B9, B10, B14, B15, and S.

dysenteriae D3, D4, D6, D8, D9, D10, D13. pINV B plasmids are present in S. flexneri

F1A, F2A, F3A, F3C, F4A, FY, S. boydii B11, B12 and in S. sonnei. The clustering of

Shigella strains by plasmid types and forms is consistent with that based on chromosomal gene

sequences. Variations in plasmid sequences have been attributed to horizontal gene

transfer and IS elements (Lan et al., 2001).
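
The assignment of serotypes to the two pINV forms can be held in a small lookup structure. The sketch below simply transcribes the serotypes listed above (Lan et al., 2001); the dictionary layout and the function name are illustrative choices, not part of the cited work.

```python
# Serotype-to-plasmid-form assignments transcribed from the text (Lan et al., 2001).
PINV_FORMS = {
    "A": {
        "S. flexneri": ["F6", "F6A"],
        "S. boydii": ["B1", "B4", "B9", "B10", "B14", "B15"],
        "S. dysenteriae": ["D3", "D4", "D6", "D8", "D9", "D10", "D13"],
    },
    "B": {
        "S. flexneri": ["F1A", "F2A", "F3A", "F3C", "F4A", "FY"],
        "S. boydii": ["B11", "B12"],
        # The text assigns S. sonnei as a whole to pINV B; no serotypes are listed.
        "S. sonnei": [],
    },
}


def pinv_form(species, serotype=None):
    """Return 'A' or 'B' for a species/serotype pair, or None if the text does not list it."""
    for form, by_species in PINV_FORMS.items():
        serotypes = by_species.get(species)
        if serotypes is None:
            continue  # species not listed under this form
        if not serotypes or serotype in serotypes:
            return form
    return None


print(pinv_form("S. flexneri", "F2A"))
print(pinv_form("S. boydii", "B4"))
print(pinv_form("S. sonnei"))
```

Serotypes that are not named in the text return None rather than a guessed form.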

1.15. Shigella Species:

1.15.1. Shigella boydii , GenBank Taxonomy No.: 621

Description: This species is uncommon except in India, where it was first isolated. The

18 known serotypes are antigenically distinct, expressing a diverse range of toxins in

addition to a Shigella-specific toxin. Progression to clinical dysentery occurs in most

patients infected with this organism (NCBI Entrez Genome Project).

1.15.1.1 Variant(s): Shigella boydii BS512. GenBank Taxonomy No: 344609.

Parent: Shigella boydii.

Description: This strain (strain BS512; serotype 18) was originally isolated from a 12-

year-old boy in Arizona by Dr. Nancy Stockbine. It is a member of Group 1 as

determined by limited sequence analysis and can invade HeLa cells. Pathogenicity and

virulence have been verified during in vitro experimentation, and multiple plasmids are

present in this strain (NCBI Entrez Genome Project).

1.15.1.2 Shigella boydii Sb227 :

GenBank Taxonomy No.: 300268. Parent: Shigella boydii.

Description: This strain is an isolate from an epidemic that took place in China in the

1950s (NCBI Entrez Genome Project).

1.15.2 Shigella dysenteriae :

GenBank Taxonomy No.: 622

Description: Since the late 1960s, pandemic waves of Shiga (S. dysenteriae type 1)

dysentery have appeared in Central America, south and south-east Asia and sub-Saharan

Africa, often affecting populations in areas of political upheaval and natural disaster

(Kotloff et al., 1999). Synonyms: Shigella shigae, Eberthella dysenteriae, Bacillus

shigae, Bacillus dysenteriae, Bacillus dysentericus (NCBI Taxonomy).

Variant(s):

1.15.2.1 Shigella dysenteriae 1012 : GenBank Taxonomy No.: 358708. Parent:

Shigella dysenteriae .

Description: This strain is representative of the type 4 group of S. dysenteriae that is

becoming more prevalent in human infections. This shift is towards the type 2 and type

4 serotypes, which were not previously associated with outbreaks, and away from the

type 1 serotype, which was implicated in widespread epidemics in Asia, Central

America, and Africa. Pathogenicity has been confirmed in human challenge experiments

and strain 1012 has been shown to be one of the most virulent S. dysenteriae strains

identified by WRAIR/NMRC to date. This strain contains multiple plasmids thought to

be involved in virulence (NCBI Entrez Genome Project).

1.15.2.2 Shigella dysenteriae M131649 :

GenBank Taxonomy No.: 216598. Parent: Shigella dysenteriae .

Description: This strain was isolated from a patient in 1970 in Guatemala (NCBI

Entrez Genome Project).

1.15.2.3 Shigella dysenteriae Sd197 :

GenBank Taxonomy No.: 300267. Parent: Shigella dysenteriae.

Description: This strain is an isolate from an epidemic in China in the 1950s (NCBI

Entrez Genome Project).

1.15.3 Shigella flexneri : GenBank Taxonomy No: 623

Description: S. flexneri is endemic in most developing countries and causes more

mortality than any other Shigella species. The predominant serotypes of S. flexneri in

developing countries are serotypes 1b, 2a, 3a, 4a and 6, whilst in industrialized countries

most isolates are 2a (Jennison and Verna, 2004). Synonym: Shigella paradysenteriae

(NCBI Taxonomy).

1.15.3.1 Variant(s): Shigella flexneri 2a : GenBank Taxonomy No.: 42897. Parent:

Shigella flexneri .

Description: In developing countries, the predominant serotype of S. flexneri is 2a,

followed by 1b, 3a, 4a, and 6. In industrialized countries, most isolates are S. flexneri 2a

or other unspecified type 2 strains (Kotloff et al., 1999). Synonym: Shigella flexneri

serotype 2a (NCBI Taxonomy).

Shigella flexneri 2a str. 2457T: GenBank Taxonomy No.: 198215 . Parent: Shigella

flexneri 2a .

Description: Shigella flexneri 2a str. 2457T. This is a highly virulent strain that has

been widely used for genetic and clinical research. It is similar to pathogenic

Escherichia coli except for the more numerous insertion sequences and contains four

plasmids (pINV-2457T, pSf2, pSf4, and pSf-R27) that are similar to pWR100,

pWR501, pCP301, and R27, respectively (NCBI Entrez Genome Project).

Shigella flexneri 2a str. 301: GenBank Taxonomy No.: 198214. Parent: Shigella

flexneri 2a .

Description: This strain was isolated in 1984 from a patient in Beijing, China. It is

similar to pathogenic Escherichia coli except for the more numerous insertion

sequences and contains a virulence plasmid (pCP301) (NCBI Entrez Genome Project).

1.15.3.2 Shigella flexneri 5: GenBank Taxonomy No.: 373383 . Parent: Shigella

flexneri. Description: This organism, along with Shigella sonnei, is the major cause of

shigellosis in industrialized countries and is responsible for endemic infections (NCBI

Genome Project).

Shigella flexneri 5 str. 8401: GenBank Taxonomy No.: 373383. Parent: Shigella

flexneri 5 .

Description: This organism is a strain of serogroup 5 and will be used for comparative

analysis (NCBI Genome Project).

1.15.4 Shigella sonnei: GenBank Taxonomy No.: 624

Description: Synonym: Bacterium sonnei (NCBI Taxonomy). Shigella dysenteriae and

Shigella sonnei are the predominant species in the tropics, while S. sonnei is the

predominant species in industrialized countries (Alcoba-Florez et al., 2005).

Variant(s): Shigella sonnei 53G: GenBank Taxonomy No.: 216599. Parent: Shigella

sonnei.

Description: Isolated from a 5-year-old patient in Japan (NCBI Entrez Genome Project).

1.15.4.1 Shigella sonnei Ss046: GenBank Taxonomy No.: 300269. Parent: Shigella

sonnei .

Description: This strain is an isolate from an epidemic in China in the 1950s

(NCBI Entrez Genome Project).

1.16 Genome Summary:

1.16.1 Genome of Shigella boydii

Chromosome (NCBI Entrez Genome): GenBank Accession Number: NC_007613.

Size: 4,519,823 (NCBI Entrez Genome). Gene Count: 4466 Genes. 4136 Proteins

(NCBI Entrez Genome).

Plasmid pSB4_227 (NCBI Entrez Genome): GenBank Accession Number:

NC_007608. Size: 126,697 (NCBI Entrez Genome). Gene Count: 149 Genes. 148

Proteins (NCBI Entrez Genome).

1.16.1.1 Genome of Shigella boydii BS512

Description: The Shigella boydii BS512 whole genome shotgun (WGS) project has the

project accession NZ_AAKA00000000. This version of the project (01) has the

accession number NZ_AAKA01000000, and consists of sequences

NZ_AAKA01000001-NZ_AAKA01000079 (NCBI Entrez Genome).

Chromosome (NCBI Entrez Genome): GenBank Accession Number:

NZ_AAKA00000000. Size: 4,900,244 (NCBI Entrez Genome). Gene Count: 4680 Genes.

4680 Proteins (NCBI Entrez Genome).
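
The contig range quoted for the BS512 whole genome shotgun project can be expanded into individual accession numbers programmatically. The following is a minimal sketch that assumes each accession consists of a fixed prefix followed by a zero-padded number, as in the range given above; the function name is hypothetical.

```python
import re


def expand_accession_range(first, last):
    """List every accession from first to last (inclusive), preserving zero padding."""
    pattern = re.compile(r"^(.*?)(\d+)$")
    m_first, m_last = pattern.match(first), pattern.match(last)
    if not m_first or not m_last:
        raise ValueError("accessions must end in digits")
    prefix, start_digits = m_first.groups()
    last_prefix, end_digits = m_last.groups()
    if prefix != last_prefix or len(start_digits) != len(end_digits):
        raise ValueError("accessions do not belong to the same series")
    start, end = int(start_digits), int(end_digits)
    if start > end:
        raise ValueError("first accession must not come after last")
    width = len(start_digits)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


contigs = expand_accession_range("NZ_AAKA01000001", "NZ_AAKA01000079")
print(len(contigs), contigs[0], contigs[-1])
```

For the range cited above this yields 79 contig accessions, which could then be passed to whatever retrieval tool is in use.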

1.16.2 Genome of Shigella dysenteriae

1.16.2.1 Genome of Shigella dysenteriae Sd197




Plasmid pSD1_197 (NCBI Entrez Genome): GenBank Accession Number:



1.16.2.2 Genome of Shigella dysenteriae 1012

Chromosome (NCBI Entrez Genome): GenBank Accession Number:

NZ_AAMJ00000000. Size: 3,013,140 (NCBI Entrez Genome). Gene Count: 2782

Genes. 2782 Proteins (NCBI Entrez Genome).

1.16.3 Genome of Shigella flexneri

1.16.3.1 Genome of Shigella flexneri 2a

1.16.3.1.1 Genome of Shigella flexneri 2a str. 2457T

Description: The genome exhibits the backbone and island mosaic structure of E. coli

pathogens, albeit with much less horizontally transferred DNA and lacking 357 genes

present in E. coli. The strain is distinctive in its large complement of insertion

sequences, with several genomic rearrangements mediated by insertion sequences, 12

cryptic prophages, 372 pseudogenes, and 195 S. flexneri-specific genes (Wei et al.,

2003).




Description: The genome consists of a single circular chromosome of 4,599,354 bp

with a G+C content of 50.9%. The genome is slightly smaller than that of K-12

(4,639,221 bp), and its organization is roughly similar to that described for pathogenic

E. coli strain O157:H7 EDL933 and the uropathogen CFT073, with large regions of

collinear E. coli backbone punctuated by islands of sequence presumably acquired by

horizontal transfer. The number of islands is smaller than those in CFT073 and

O157:H7, and a larger proportion of the genome is backbone (82% versus 75% for

O157:H7 and CFT073) (Wei et al., 2003).

1.16.3.1.2 Genome of Shigella flexneri 2a str. 301

Description: The whole genome is composed of a 4,607,203 bp chromosome and a

221,618 bp virulence plasmid, designated pCP301 (Jin et al., 2002).

Chromosome (NCBI Entrez Genome): GenBank Accession Number: NC_004337 .



Description: While the plasmid shows minor divergence from that sequenced in

serotype 5a, striking characteristics of the chromosome have been revealed. The S.

flexneri chromosome has, astonishingly, 314 IS elements, more than 7-fold over those

possessed by its close relatives, the non-pathogenic K12 strain and enterohemorrhagic

O157:H7 strain of Escherichia coli (Jin et al., 2002).

1.16.3.1.3 Plasmid pCP301 (NCBI Entrez Genome): GenBank Accession Number:



Description: pCP301 is a mosaic of potential virulence-related genes, IS elements,

maintenance genes and functionally unknown ORFs. All the previously identified

virulence genes are present in pCP301. These include the primary invasion genes ipa

and mxi-spa (encoding the invasion plasmid antigens and the type III secretion system,

respectively), virG/IcsA (required for polymerizing host actin to provide propelling force

for intra- and inter-cellular spread) and virF (necessary for regulating virulence gene

expression). The replication origin (R100-like) ori and G site (single-strand initiation

site) in pCP301 are identical to those of pWR501 and pWR100. pCP301 also has

maintenance genes, repA, copA and copB, for replication; parA and parB for

partitioning; and ccdA and ccdB for post-segregation killing. The noticeable difference

between pCP301 and the plasmids from serotype 5a is the presence of more IS-related

DNA in pCP301, making its size close to pWR501 (221 851 bp), which is larger than

pWR100 because of a Tn501 (8360 bp) insertion (Venkatesan et al., 2001).

1.16.3.1.4 Plasmid p2457TS2 (NCBI Entrez Genome): GenBank Accession Number:

NC_002773. Size: 3,179 (NCBI Entrez Genome). Description: Submitted (03-MAR-

2001). Wang,H., Feng,E., Liao,X., Su,G. and Huang,C. Unpublished (NCBI Entrez

Genome).

1.16.3.2 Genome of Shigella flexneri 5

1.16.3.2.1 Plasmid pWR501 (NCBI Entrez Genome): GenBank Accession Number:

NC_002698 . Size: 221,851 (NCBI Entrez Genome). Gene Count: 293 Genes. 293


Description: The 210-kb Shigella flexneri 5a virulence plasmid (pWR501) is a mosaic

of potential pathogenesis-associated genes, IS elements, maintenance genes, and

unknown ORFs. Of the 286 Shigella-derived potential ORFs, 54 (19%) encode known

Shigella proteins. Thirty-seven of these are located within a 32-kb cluster of

uninterrupted ORFs, previously described, constituting the ipa-mxi-spa loci or

pathogenicity island (Tran van Nhieu and Sansonetti, 1999). The remaining 17 are

distributed throughout the plasmid and include five alleles of ipaH and one allele each

of icsA (virG), virA, icsP (sopA), virF, virK, msbB, sepA, ipgH, shet2, phoN-Sf, trcA, and

an apyrase gene. Most virulence-associated genes, including the ipa-mxi-spa operons,

the virG gene, and the ShET2 toxin gene, are flanked by one or more such mosaics of IS

element ORFs. In addition, many of the unknown ORFs are flanked by IS element

ORFs and have G+C content of less than 40%. Based on this genetic organization, the

recombination events that led to the acquisition of many or most genetic loci and the

assembly of the large virulence plasmid almost certainly involved IS-mediated events

(Malabi et al., 2001). The average G+C content of pWR501 is 47.6% but all virulence

associated genes on the plasmid have G+C composition of 30-35% (Lan et al., 2001). In

pWR501, the impCAB operon is missing; only the first 176 bp are present, beginning at

sequence coordinate 157595 (Philpott et al., 2000).
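
The contrast between the plasmid-wide G+C content and that of the virulence-associated genes can be explored with a simple sliding-window calculation. The sketch below is an illustration only: the window size, step and function names are assumptions, and the 40% default threshold is borrowed from the figure quoted above for the unknown ORFs.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step):
    """Yield (start, gc_percent) for successive windows along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


def low_gc_windows(seq, window=1000, step=500, threshold=40.0):
    """Return windows whose G+C content falls below threshold (in percent)."""
    return [(start, gc) for start, gc in windowed_gc(seq, window, step) if gc < threshold]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GC" * 10 + "AT" * 10
print(gc_content(toy))
print(low_gc_windows(toy, window=10, step=10))
```

Applied to a real plasmid sequence, windows flagged in this way would be candidates for horizontally acquired regions, although low G+C content alone does not establish such an origin.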

1.16.4 Genome of Shigella sonnei

1.16.4.1 Genome of Shigella sonnei Ss046

Chromosome (NCBI Entrez Genome): GenBank Accession Number: (NC_007384) .



1.16.4.2 Plasmid pSS (NCBI Entrez Genome): GenBank Accession Number:

(NC_007385) . Size: 214,396 (NCBI Entrez Genome).

Gene Count: 241 Genes. 248 Proteins (NCBI Entrez Genome).

Description: The complete sequence of pSS, which is the large virulence plasmid of

Shigella sonnei, was determined. The 214-kb plasmid is composed of segments of

virulence-associated genes, the O-antigen gene clusters, a range of replication and

maintenance genes, and large numbers of insertion sequence (IS) elements. The pSS

plasmid is a mixture of genes with different origins and functions. The sequence

suggests a remarkable history of IS-mediated recombination and acquisition of DNA

across a range of bacterial species (Jiang et al., 2005). Similar to the other three groups

of Shigella, the virulence plasmid of S. sonnei, designated as pSS, is sufficient for

entering, replicating, and disseminating within epithelial cells. However, pSS is unstable

and tends to be lost at a high frequency, unlike other large unicopy plasmids (Jiang et

al., 2005).

1.16.4.3 Colicins

Description: Colicins are plasmid-encoded toxic exoproteins that are produced by

colicinogenic strains of Escherichia coli and some related species of the family

Enterobacteriaceae. To date, at least 23 colicin types have been described in detail

(Smajs and Weinstock, 2001). Colicin Js was originally described as a bacteriocin of

Shigella sonnei colicinotype 7 (Smajs and Weinstock, 2001).

Plasmid ColJs (NCBI Entrez Genome): GenBank Accession Number: NC_002809 .

Size: 5,210 (NCBI Entrez Genome). Gene Count: 3 Genes. 3 Proteins (NCBI Entrez

Genome).

Description: The 5.2-kb ColJs plasmid of a colicinogenic strain of Shigella sonnei

(colicin type 7) was isolated and sequenced. A 1.2-kb unique region of pColJs showed

significantly different G+C content (34%) compared to the rest of pColJs (53%) (Smajs

and Weinstock, 2001).
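
As a rough consistency check, the overall G+C content of pColJs can be estimated as a length-weighted average of the two regions. The calculation below assumes that the unique region is exactly 1,200 bp and that the remaining 4,010 bp of the 5,210-bp plasmid has the quoted G+C content of 53%; both are simplifications of the rounded figures given above.

```latex
\[
\mathrm{GC}_{\text{pColJs}} \approx
\frac{1200 \times 34\% + 4010 \times 53\%}{5210}
= \frac{408 + 2125.3}{5210}
\approx 48.6\%
\]
```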

1.17 Diagnosis

Diagnosis of shigellosis is made clinically by the typical features of bacillary dysentery,

with blood and mucus in the stool, although some cases may present initially with mild to

moderate watery diarrhea. Microscopic examination of a faecal smear stained with iodine

shows numerous faecal leucocytes (> 10 per high-power field). Confirmation is

made by stool culture, serological and biochemical tests (World Health Organisation,

1987).

1.17.1 Collection, transportation and culture of stool specimen

Specific diagnosis of Shigella in stool specimens depends on appropriate collection

and transportation to the laboratory. Fresh stool samples collected from patients before

initiation of therapy are preferred for microbiological tests because the chances of

recovering the organisms are higher. For microbiologic cultures, fresh stool is preferred

to rectal swabs, which carry fewer organisms. Samples that cannot be

cultured immediately should be kept in buffered glycerol-saline transport medium;

Cary-Blair medium is the second option. Direct inoculation of culture plates at the

bedside is the most efficient means of isolating Shigella from dysentery patients.

Stool specimens for isolation of shigella should be plated on both moderately selective

medium such as MacConkey or deoxycholate citrate agar (DCA), and a highly selective

medium such as xylose-lysine deoxycholate (XLD), Hektoen enteric (HE) or

Salmonella-Shigella (SS) agar. Because Shigella isolates cannot ferment lactose, they do not

change the colour of the pH indicator on these plates, so the typical colonies are easy to

pick. Further identification can be made by using triple sugar

iron (TSI) agar or Kligler iron agar (KIA), on which Shigellae, which are non-motile, produce

an alkaline slant and acid butt, owing to their inability to ferment lactose aerobically in the slope

and their anaerobic fermentation of glucose in the butt, and fail to produce hydrogen

sulphide or other gas. After tentative identification, strains can be speciated by

serological methods, using grouping antisera. Rapid methods for the diagnosis of S.

dysenteriae type 1 by means of fluorescent antibody staining have been established

(Albert et al., 1992).

1.17.2 PCR-based methods for the identification of Shigella

Molecular typing of pathogens has long been a part of pathogen identification and

control and has recently been accelerating with new technologies. Traditionally,

serotyping has been extremely valuable and has often been able to identify important

cellular components associated with virulence. While serotyping will continue to be an

important tool, it often has limited discriminatory power, resolving pathogens into only

a few types (Boyd et al., 1996). DNA typing, by contrast, is more rapid and less expensive

and has an even greater capacity for genetic dissection of bacterial pathogens. It is

limited only by the genome size and the technology. Because most microbial genomes

consist of millions of nucleotides, technology is invariably limiting (Miettinen et al.,

1999). The polymerase chain reaction (PCR) is a powerful technique for highly specific

amplification of DNA defined by two flanking primers and has had a major impact on

many aspects of biology (Mullis and Faloona, 1987). Most of the PCR methods

established for the identification of Shigella target either the invasion-

associated locus (ial) gene or the invasion plasmid antigen H (ipaH) locus, both of which are also

present in enteroinvasive Escherichia coli (EIEC) (Ye et al., 1993; Dutta et

al., 2001). The use of IS630-specific primers along with serotype-specific primers derived

from the rfc genes in a multiplex PCR was reported to be useful for the detection of

many serotypes of Shigella (Houng et al., 1997). In most of these studies, PCR was

found to be more sensitive and specific than conventional culture methods

and has the potential to be employed in routine diagnosis. In addition, because virulence genes

are spontaneously lost in most Shigella strains, direct stool

PCR-based detection is preferred over the DNA probe hybridization technique, for

which the strains must be cultured several times (Sur et al., 2004).
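The idea that a PCR product is defined by two flanking primers can be illustrated with a small in silico sketch. It assumes exact primer matches on a single linear template and ignores mismatches, melting temperature and multiple binding sites. The primer and template sequences are hypothetical and are not ial or ipaH primers.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the amplicon bounded by the two primers, or None if either fails to bind.

    The forward primer must match the given strand; the reverse primer must match
    the opposite strand downstream of the forward primer.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse.upper())
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template and primers, for illustration only.
template = "TTTT" + "ACGTAC" + "GGGGCCCCAA" + "TTGACC" + "AAAA"
amplicon = in_silico_pcr(template, forward="ACGTAC", reverse="GGTCAA")
print(amplicon, len(amplicon))
```

A template lacking either primer site yields no product, which is the basis for presence/absence detection of a target locus.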

1.17.3 PCR-based molecular typing methods

Traditional subspecific typing methods include serotyping, phage typing, biotyping,

plasmid profiling, multilocus enzyme electrophoresis, conventional restriction

endonuclease analysis, ribotyping, and pulsed-field gel electrophoresis (PFGE). Their

strengths notwithstanding, all of these methods have one or more significant drawbacks,

including being slow or cumbersome; requiring highly specialized equipment, skills,

and/or reagents; relying on variable or unstable traits; and yielding uninterpretable

results for some strains (Eisenstein, 1990; Lupski, 1993; Selander et al., 1987).

Recently, PCR-based methods have become increasingly important to molecular typing

efforts. These approaches include AFLPs, repetitive element polymorphisms-PCR,

randomly amplified polymorphic DNA, arbitrarily primed PCR (Welsh and

McClelland; Williams, 1990) and Pulsed Field Gel Electrophoresis (PFGE) (Herrmann

et al., 1992).

The power of PCR-based methods is the ease with which they can be applied to many

bacterial pathogens and their multilocus discrimination. These methods have proven

valuable for genetic dissection of pathogens for which other approaches have failed.

However, a limitation of many PCR-based approaches is the biallelic (binary) nature of

their data: frequently, only the presence or absence of a marker fragment is scored. Finally,

comparative gene sequencing is becoming feasible for strain characterization and can be

performed at multiple loci (Williams et al., 1990). PCR-based fingerprinting is a simple,

rapid, and broadly applicable typing method that is potentially available to any

laboratory with PCR capability. Fingerprints are generated using RAPD (random

amplified polymorphic DNA) (Williams et al., 1990), arbitrarily primed PCR (Welsh et

al., 1990), or DNA amplification fingerprinting (Caetano-Anolles, 1993) or repetitive-

element-based primers (rep-PCR) (Versalovic et al., 1994). In its best applications,

multiple-locus sequence typing (MLST) can provide data for multiple alleles

(haplotypes) spread across dispersed genomic locations (Maiden et al., 1998).

Nucleotide data are well understood, standardized into four defined categories, and

easily analyzed using phylogenetic approaches. If sufficient nucleotide diversity is

present, MLST can distinguish among both species and strains. While routine clinical

MLST is still unfeasible, hybridization arrays (e.g., chip technology) could make single-

nucleotide polymorphisms a mainstream approach to pathogen typing in the future

(Vahey et al., 1999).
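Because PCR fingerprints are usually scored as presence or absence of bands, two isolates can be compared with a simple band-sharing coefficient. The sketch below uses the Dice coefficient, one common choice and not one prescribed by the sources cited above. The isolate names and band sizes are hypothetical, and bands are assumed to have already been assigned to size classes.

```python
from itertools import combinations


def dice_similarity(bands_a: set[int], bands_b: set[int]) -> float:
    """Dice coefficient: 2 x shared bands / total bands in both profiles."""
    total = len(bands_a) + len(bands_b)
    if total == 0:
        return 1.0  # two empty profiles are treated as identical
    return 2 * len(bands_a & bands_b) / total


# Hypothetical fingerprints: each set holds the size classes (bp) scored as present.
profiles = {
    "isolate_1": {300, 450, 800, 1200},
    "isolate_2": {300, 450, 900, 1200},
    "isolate_3": {250, 600},
}

for a, b in combinations(profiles, 2):
    print(a, b, round(dice_similarity(profiles[a], profiles[b]), 2))
```

A value of 1 means identical banding patterns and 0 means no shared bands; where to place the cutoff between "related" and "unrelated" is a separate decision that depends on the method's reproducibility.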

1.17.3.1 Rep-PCR, ERIC-PCR and BOX-PCR

Rep-PCR genomic fingerprinting makes use of the DNA primers complementary to

naturally occurring, highly conserved, repetitive DNA sequences, present in multiple

copies in the genomes of most Gram-negative and several Gram-positive bacteria

(Lupski and Weinstock, 1992). Rep-PCR genomic fingerprinting is based on PCR-

mediated amplification of DNA sequences located between specific interspersed

repeated sequences in prokaryotic genomes (de Bruijn, 1992; Louws et al., 1996). Three

families of repetitive sequences have been identified, including the 35-40 bp repetitive

extragenic palindromic (REP) sequence, the 124-127 bp enterobacterial repetitive

intergenic consensus (ERIC) sequence, and the 154 bp BOX element (Versalovic et al.,

1994). These sequences appear to be located in distinct, intergenic positions around the

genome. The repetitive elements may be present in both orientations, and

oligonucleotide primers have been designed to prime DNA synthesis outward from the

inverted repeats in REP and ERIC, and from the boxA subunit of BOX, in the

polymerase chain reaction (PCR) (Versalovic et al., 1994). The use of these primer(s)

and PCR leads to the selective amplification of distinct genomic regions located

between REP, ERIC or BOX elements. The corresponding protocols are referred to as

REP-PCR, ERIC-PCR and BOX-PCR genomic fingerprinting respectively, and rep-

PCR genomic fingerprinting collectively (Versalovic et al., 1991). The amplified

fragments can be resolved in a gel matrix, yielding a profile referred to as a rep-PCR

genomic fingerprint (Versalovic et al., 1994). These fingerprints resemble "bar code"

patterns analogous to UPC codes used in grocery stores (Lupski, 1993).

Characteristic prokaryotic repeats such as the enterobacterial repetitive intergenic

consensus (ERIC) sequences and the repetitive extragenic palindrome sequence motif

have been found in microbial species as diverse as Enterobacteriaceae and

cyanobacteria (Boom et al., 1990; Martin et al., 1992).

1.17.3.2 RFLP and RAPD

DNA fingerprinting techniques such as restriction fragment length polymorphism

(RFLP) and random amplified polymorphic DNA (RAPD) analysis have been

described as powerful molecular typing methods for microorganisms (Swaminathan et

al., 1993). RFLP requires large amounts of genomic DNA, defined nucleic acid probes

and laborious hybridization procedures. The performance of RAPD is also sensitive to

many factors such as selection of primers, magnesium concentration in the PCR buffers

and the thermocycler used for PCR (Lin et al., 1996). There are three major steps in the

amplified fragment length polymorphism (AFLP) procedure: (i) restriction endonuclease digestion of genomic DNA and the

ligation of specific adapters; (ii) amplification of the restriction fragments by PCR using

primer pairs containing common sequences of the adapter and one to three arbitrary

nucleotides; (iii) analysis of the amplified fragments using gel electrophoresis. The

combination of different restriction enzymes and the choice of selective nucleotides in

the primers for PCR make AFLP a useful new system for molecular typing of

microorganisms (Jhy-Jhu et al., 1996).
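The three AFLP steps can be mimicked in silico to show how selective nucleotides reduce the number of fragments. This is a deliberately simplified sketch: it uses a single enzyme (an EcoRI-like site, GAATTC, cut after the first G), considers one strand only, and applies selective nucleotides at one fragment end, whereas real AFLP typically uses two enzymes and selective bases on both primers. The toy genome and function names are assumptions.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[str]:
    """Step (i): cut seq at every occurrence of site, cut_offset bases into the site."""
    cut_points = []
    pos = seq.find(site)
    while pos != -1:
        cut_points.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cut_points + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:]) if i < j]


def select_fragments(fragments, left_end, right_end, selective):
    """Step (ii): keep fragments cut at both ends whose first internal bases
    match the selective nucleotides of the primer."""
    return [
        f for f in fragments
        if f.startswith(left_end + selective) and f.endswith(right_end)
    ]


# Hypothetical toy genome with three GAATTC sites.
genome = "CC" + "GAATTC" + "AGGTTT" + "GAATTC" + "TGGAAA" + "GAATTC" + "ACCC"
fragments = digest(genome, site="GAATTC", cut_offset=1)

# Step (iii): report fragment lengths, as would be read from a gel.
for selective in ("", "A", "T"):
    chosen = select_fragments(fragments, left_end="AATTC", right_end="G", selective=selective)
    print(repr(selective), sorted(len(f) for f in chosen))
```

Each added selective nucleotide keeps only a subset of the restriction fragments, which is how the banding pattern is brought down to a resolvable number of bands.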

1.17.4 Characteristics of the various molecular typing methods

Although a particular typing method may have high discriminatory power and good

reproducibility, the complexity of the method and interpretation of results as well as the

costs involved in setting up and using the method may be beyond the capabilities of the

laboratory. The choice of a molecular typing method, therefore, will depend upon the

needs, skill level, and resources of the laboratory. Some factors in evaluating the utility

of a particular typing method are:

- Ease of interpretation

- Ease of use

- Cost

- Time to obtain a result

- Discrimination power

- Intralaboratory reproducibility

- Interlaboratory reproducibility

Table 1.4 summarizes the characteristics of the various molecular typing methods.

Table 1.4 Summary of the characteristics of the various molecular typing methods (Michael Olive and Pamela Bean, 1999).

Methodology  Ease of use  Ease of interpretation  Discrimination power  Time to result (days)  Intralaboratory reproducibility  Interlaboratory reproducibility  Setup cost  Cost per test
PFGE         Moderate     Easy                    High                  3                      Good                             Good                             Moderate    Moderate
PCR-RFLP     Easy         Easy                    Moderate              1                      Good                             Good                             Moderate    Low
Rep-PCR      Easy         Easy                    High                  1                      Good                             Moderate                         Moderate    Low
RAPD         Easy         Easy                    High                  1                      Moderate                         Poor                             Moderate    Low
CFLP         Moderate     Moderate                Moderate              2                      Good                             Poor                             Moderate    High
AFLP         Moderate     Easy                    High                  2                      Good                             Good                             High        Moderate
Sequencing   Difficult    Moderate                High                  2                      Good                             Good                             High        High
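The trade-offs in Table 1.4 can be encoded as data so that a laboratory can shortlist methods against its own constraints. The sketch below copies the table values as extracted above; the selection rule (high discrimination, a time limit, a cost ceiling, and optional exclusion of poor interlaboratory reproducibility) is an illustrative assumption, not a recommendation from the cited source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypingMethod:
    name: str
    ease_of_use: str
    ease_of_interpretation: str
    discrimination: str
    days_to_result: int
    intralab_reproducibility: str
    interlab_reproducibility: str
    setup_cost: str
    cost_per_test: str


# Values transcribed from Table 1.4.
METHODS = [
    TypingMethod("PFGE", "Moderate", "Easy", "High", 3, "Good", "Good", "Moderate", "Moderate"),
    TypingMethod("PCR-RFLP", "Easy", "Easy", "Moderate", 1, "Good", "Good", "Moderate", "Low"),
    TypingMethod("Rep-PCR", "Easy", "Easy", "High", 1, "Good", "Moderate", "Moderate", "Low"),
    TypingMethod("RAPD", "Easy", "Easy", "High", 1, "Moderate", "Poor", "Moderate", "Low"),
    TypingMethod("CFLP", "Moderate", "Moderate", "Moderate", 2, "Good", "Poor", "Moderate", "High"),
    TypingMethod("AFLP", "Moderate", "Easy", "High", 2, "Good", "Good", "High", "Moderate"),
    TypingMethod("Sequencing", "Difficult", "Moderate", "High", 2, "Good", "Good", "High", "High"),
]

COST_RANK = {"Low": 0, "Moderate": 1, "High": 2}


def shortlist(methods, max_days, max_cost_per_test, exclude_poor_interlab=True):
    """Return names of highly discriminatory methods that fit the time and cost limits."""
    chosen = []
    for m in methods:
        if m.discrimination != "High":
            continue
        if m.days_to_result > max_days:
            continue
        if COST_RANK[m.cost_per_test] > COST_RANK[max_cost_per_test]:
            continue
        if exclude_poor_interlab and m.interlab_reproducibility == "Poor":
            continue
        chosen.append(m.name)
    return chosen


print(shortlist(METHODS, max_days=1, max_cost_per_test="Low"))
print(shortlist(METHODS, max_days=3, max_cost_per_test="Moderate"))
```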

1.17.5 VNTR and MLVA

One of the most recent developments in molecular typing involves the analysis of

VNTR sequences (Frothingham and Meeker, 1998; Keim et al., 1999). Short

nucleotide sequences that are repeated multiple times often vary in copy number,

creating length polymorphisms that can be detected easily by PCR using flanking

primers. VNTRs appear to contain greater diversity and, hence, greater discriminatory

capacity than any other type of molecular typing system (van Belkum et al., 1998;

Richards and Sutherland, 1997). Analysis of variable-number tandem repeats

(VNTR), called multiple-locus VNTR analysis (MLVA) when several loci are combined, has proven to be a highly

powerful and discriminant method to study the population structure of bacteria

(Pourcel et al., 2003) and to characterize isolates even from monomorphic bacterial

populations (Farlow et al., 2002; Keim et al., 2000). VNTRs and other short-sequence

DNA tandem repeats in prokaryotic genomes appear to provide useful information on

both the functional and the evolutionary aspects of bacterial genetic diversity (Van

Belkum, 1999). Once these polymorphisms are located, flanking primers can then be

designed to amplify these variable length regions thus allowing differentiation of copy

numbers using the size of the resultant amplicon. This can be done using standard

agarose gel electrophoresis and if a higher resolution is required, fluorescent labelling

and fragment sizing via a DNA sequencer can be used. VNTR is therefore applicable

to a wide range of laboratories, including those which may have simple equipment

such as thermal cyclers and agarose gel electrophoresis but do not have access to

sophisticated equipment such as DNA sequencers. Furthermore when VNTR is

applied to multiple loci as a typing scheme such as in Multiple Locus VNTR Analysis

(MLVA) greater discriminatory power and more accurate determination of genetic

relatedness is achieved (Adair et al., 2000; Keim et al., 2000; Klevytska et al., 2001).
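The conversion from amplicon size to repeat copy number described above is simple arithmetic: the combined length of the flanking sequence (including primers) is subtracted from the amplicon length and the remainder is divided by the repeat unit length. The sketch below assumes a hypothetical locus with a 7-bp unit and 120 bp of flanking sequence; rounding to the nearest integer allows for the limited sizing accuracy of agarose gels.

```python
def vntr_copy_number(amplicon_bp: float, flank_bp: int, unit_bp: int) -> int:
    """Estimate the repeat copy number at a VNTR locus from the amplicon size."""
    if unit_bp <= 0:
        raise ValueError("repeat unit length must be positive")
    repeat_region = amplicon_bp - flank_bp
    if repeat_region < 0:
        raise ValueError("amplicon is shorter than the flanking sequence")
    return round(repeat_region / unit_bp)


# Hypothetical locus: 7-bp repeat unit, 120 bp of flanking sequence in total.
for size in (169, 183, 150):
    print(size, "bp ->", vntr_copy_number(size, flank_bp=120, unit_bp=7), "copies")
```

With small repeat units a sizing error of a few base pairs can shift the estimate by one copy, which is one reason fragment sizing on a DNA sequencer is preferred when higher resolution is needed.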

More recently, a number of studies have supported the notion that tandem repeats

reminiscent of mini and microsatellites are likely to be a highly significant source of

very informative markers for the identification of pathogenic bacteria even when

these pathogens are recently emerged, highly monomorphic species (van Belkum et

al., 1997; Adair et al., 2000). This probably reflects the important contribution of

tandem repeats to the adaptation of the pathogen to its host. Tandem repeats appear to

contribute to phenotypic variation in bacteria in at least two ways. Tandem repeats

located within the regulatory region of a gene can constitute an on/off switch of gene

expression at the transcriptional level (van Ham et al., 1993; Weise et al., 1989).

Similarly, tandem repeats within coding regions whose repeat unit length is not a

multiple of three can induce a reversible premature end of translation when a mutation

changes the number of repeats (reviewed in Bayliss et al., 2001; Wang et al., 2000).
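The effect of a repeat unit whose length is not a multiple of three can be shown with a toy open reading frame. In the sketch below the gene, its tetranucleotide repeat tract and the downstream segment are all hypothetical; the code only counts codons up to the first in-frame stop and does not translate them.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(orf: str):
    """Count codons read from position 0 up to the first in-frame stop codon.

    Returns None if no in-frame stop codon is found.
    """
    for i in range(0, len(orf) - 2, 3):
        if orf[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def build_orf(n_repeats: int) -> str:
    """Hypothetical gene: start codon, a 4-nt repeat tract, then a downstream segment."""
    return "ATG" + "CAAT" * n_repeats + "GCTGACGTGCAGGCTTAA"


for n in (3, 4):
    shift = (4 * n) % 3  # 0 means the downstream segment is read in its original frame
    print(n, "repeats, frame shift", shift, "->", codons_before_stop(build_orf(n)), "codons")
```

In this toy example three repeats leave the downstream segment in frame (10 codons before the stop), whereas gaining one repeat shifts the frame and exposes an early stop codon (7 codons). Losing the extra repeat restores the original frame, which is what makes the switch reversible.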

Variable Number of Tandem Repeats (VNTR) has been described for various

organisms. These include Salmonella enterica (Ramisse et al., 2004; Liu et al., 2003),

Staphylococcus aureus (Sabat et al., 2003), Yersinia pestis (Adair et al., 2000),

Mycobacterium tuberculosis (Frothingham et al., 1998), Francisella tularensis

(Farlow et al., 2001), Legionella pneumophila (Pourcel et al., 2003), Brucella spp

(Bricker et al., 2003), Escherichia coli O157:H7 (Noller et al., 2000) and Borrelia spp

(Farlow et al., 2002). The increasing availability of whole-genome sequences is an

invaluable source of VNTRs, which has opened the way to multiple-locus VNTR

analysis (MLVA) for the typing of bacteria. MLVAs have been proposed so far for

Bacillus anthracis (Le Flèche et al., 2001), Yersinia pestis (Pourcel et al., 2004),

Francisella tularensis (Farlow et al., 2001), Mycobacterium tuberculosis (Le Flèche

et al., 2001), Legionella pneumophila (Pourcel et al., 2003), Pseudomonas

aeruginosa (Onteniente et al., 2003), Escherichia coli O157:H7 (Lindstedt et al.,

2004), and Salmonella enterica subsp. enterica serovars Typhimurium (Lindstedt et

al., 2003) and Typhi (Liu et al., 2003).
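An MLVA result is commonly written as a string of copy numbers, one per locus, and two isolates can then be compared locus by locus. The sketch below counts the loci at which two profiles differ (a Hamming distance). The strain names and profiles are hypothetical, and other distance measures that weight the size of the copy-number difference are equally possible.

```python
from itertools import combinations


def mlva_distance(profile_a: tuple, profile_b: tuple) -> int:
    """Number of loci at which two MLVA profiles differ (Hamming distance)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


# Hypothetical four-locus MLVA profiles (repeat copy number at each locus).
strains = {
    "strain_A": (7, 3, 12, 5),
    "strain_B": (7, 3, 11, 5),
    "strain_C": (4, 6, 12, 2),
}

for a, b in combinations(strains, 2):
    print(a, b, mlva_distance(strains[a], strains[b]))
```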

1.18 Repetitive DNA

Repetitive DNA, which occurs in large quantities in eukaryotic cells, has been

increasingly identified in prokaryotes. In eukaryotic genomes, this repetitive DNA is

infrequently associated with coding regions and consequently is located primarily in

extragenic regions (Cox et al., 1997). Repetitive DNA consists of simple

homopolymeric tracts of a single nucleotide type [poly(A), poly(G), poly(T), or

poly(C)] or of large or small numbers of several multimeric classes of repeats. These

multimeric repeats are built from identical units (homogeneous repeats), mixed units

(heterogeneous repeats), or degenerate repeat sequence motifs (Fig. 1 shows a

schematic overview) (van Belkum et al., 1998).

Figure 1. Schematic survey of SSRs.

(A) Examples of homogeneous simple sequence motifs consisting of repeat units

varying from 1 (homopolymeric tract) to 6 nucleotides in length. (B) Example of a

composite, heterogeneous repeat built from three 3-nucleotide units, two 5-nucleotide

units, and seven 2-nucleotide motifs. (C) Comparative analysis of four different

repeats built from three 10-nucleotide units showing degeneracy among units. Identity

of the nucleotide sequences B through D with the consensus given in sequence A is

indicated by dashes (van Belkum et al., 1998).

1.18.1 Microsatellites and minisatellites

Minisatellites are usually defined as the repetition in tandem of a short (6- to 100-bp)

motif spanning 0.5 kb to several kilobases. The first examples, described about

20 years ago, were of human origin (Wyman and White, 1980).

Microsatellites or simple sequence repeats (SSRs), tandemly repeated units of one to

six nucleotides, are abundant in prokaryotic and eukaryotic genomes (Weber, 1990;

Field and Wills, 1996). They are ubiquitously distributed in the genome, both in

protein coding and in noncoding regions (Toth et al., 2000). Mutation mechanisms of

micro- and minisatellites have been studied in some detail in eukaryotes, essentially

human and yeast (Vergnaud and Denoeud, 2000). In brief, the data obtained so far

suggest that microsatellites mutate by replication slippage processes; mutation rates

depend upon the efficiency of mismatch repair mechanisms and an internal

heterogeneity within the array strongly stabilizes the tandem repeat. In contrast,

minisatellites mutate predominantly as the result of the repair of a double strand break

initiated within, or very close to, the tandem repeat. In eukaryotes at least, these

events can be of replicative origin (Kokoska et al., 1998), or can be genetically

controlled, and specifically induced, during meiosis, at double strand breaks hot-spots.

Minisatellite mutation rate in eukaryotes appears to be insensitive to mismatch repair

efficiency, and internal heterogeneity is compatible with a high mutation rate

(Vergnaud and Denoeud, 2000; Debrauwère et al., 1999).
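The working definition of a microsatellite given above, a unit of one to six nucleotides repeated in tandem, translates directly into a pattern search. The sketch below is a minimal scan under stated assumptions: it requires at least three copies of the unit and a total tract length of at least 6 bp (both thresholds are arbitrary choices), it only finds perfect repeats, and the test sequence is hypothetical.

```python
import re

# A unit of 1 to 6 bases (shortest unit preferred) followed by at least two more copies.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def find_ssrs(seq: str, min_length: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for each perfect SSR of at least min_length bases."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        unit = match.group(1)
        tract = match.group(0)
        if len(tract) >= min_length:
            hits.append((match.start(), unit, len(tract) // len(unit)))
    return hits


# Hypothetical sequence with a homopolymeric tract, a tetranucleotide and a dinucleotide repeat.
seq = "GGC" + "A" * 8 + "CGG" + "CAAT" * 4 + "GGA" + "AT" * 3 + "C"
for start, unit, copies in find_ssrs(seq):
    print(start, unit, copies)
```

The phase of the reported unit depends on where the scan first meets the tract, so the same repeat may be reported with a rotated unit (for example TCAA instead of CAAT) if the upstream flank happens to end in a matching base.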

1.18.2 SSR (Simple sequence repeats)

SSRs are encountered in many different branches of the prokaryote kingdom. They

are found in genes encoding products as diverse as microbial surface components

recognizing adhesive matrix molecules and specific bacterial virulence factors such as

lipopolysaccharide-modifying enzymes or adhesins. SSRs enable genetic and

consequently phenotypic flexibility. SSRs function at various levels of gene

expression regulation. Variations in the number of repeat units per locus or changes in

the nature of the individual repeat sequences may result from recombination processes

or polymerase inadequacy such as slipped-strand mispairing (SSM), either alone or in

combination with DNA repair deficiencies. These rather complex phenomena can

occur with relative ease, with SSM approaching a frequency of 10^-4 per bacterial cell

division and allowing high-frequency genetic switching. Bacteria use this random

strategy to adapt their genetic repertoire in response to selective environmental

pressure. SSR-mediated variation has important implications for bacterial

pathogenesis and evolutionary fitness. Molecular analysis of changes in SSRs allows

epidemiological studies on the spread of pathogenic bacteria. The occurrence,

evolution and function of SSRs, and the molecular methods used to analyze them are

discussed in the context of responsiveness to environmental factors, bacterial

pathogenicity, epidemiology, and the availability of full-genome sequences for

increasing numbers of microorganisms, especially those that are medically relevant

(Van Belkum et al., 1998).

1.18.2.1 Molecular Analysis of SSRs

Repetitive DNA has characteristic physical features due to its specific nucleotide

composition. The detection of the first eukaryotic repetitive DNA moieties was the

immediate consequence of their aberrant density. When subjected to density gradient

centrifugation, repetitive DNA lagged behind the bulk of DNA and presented as

satellite fractions due to differences in thermodynamic stability and reassociation

kinetics (Britten and Kohne, 1968). Whereas the variability in repetitive

DNA domains was initially detected by relatively cumbersome Southern hybridization

techniques with DNA probes recognizing the repeat consensus motif, the emergence

of PCR technology enabled a more straightforward DNA amplification mediated

approach (Jeffreys et al., 1991). In this method, PCR primers bordering the SSR

region are constructed and polymorphism in repeat unit number is documented by

simple electrophoretic techniques once DNA amplification has been performed.

Regions bordering the repeats are generally sufficiently well-conserved targets for

PCR-mediated amplification. Consequently, repeat degeneracy can be analyzed by

direct sequencing. Moreover, border sequence conservation is sometimes even

observed among different species, allowing a broad-spectrum analysis of the nature of

the species and subspecific genetic polymorphisms (Shields et al., 1995).

1.18.2.2 Structural Features of SSRs

Bacterial SSR-type DNA can be divided into four main categories. First, dispersed

repeat motifs that generally do not occur in tandem have been identified. Although

these repeats occur throughout genomes of a multitude of microorganisms, they are

sometimes organized in tandem as well. A second class is formed by the

homopolymeric tracts. Multimers of one of the four nucleotides are peculiar sequence

elements that are frequently encountered in the genome of S. cerevisiae, for instance.

These homogeneous stretches can amount to as much as 42 nucleotides. Third, short-

motif SSRs are identified. With repeat units differing from 2 to 6 bases, it is this class

of repeats that is most liable to unit number variation at a given locus. Particularly,

when these short-motif repeats are located within genes and are not 3 or 6 nucleotides

long, they can drastically affect the coding potential of a given transcript. Fourth,

repeats harboring more than 8 nucleotides per unit form a separate category. Repeats

with intermediately sized unit lengths are only rarely encountered. It is interesting that

the shorter unit repeats, in particular, are involved in regulatory processes that are

affected by SSM. Among the longer repeats, a larger degree of sequence

heterogeneity is observed. This heterogeneity is thought to be indicative of more

frequent recombination. Analyses of the precise function of the repeat locus are often

missing. It is regularly assumed that these repeats encode protein sequences spanning

membranes or cell walls. Therefore, they play a physical more than a regulatory role.

These longer repeats are candidate regions for determining phylogenetic relatedness

between species or strains (Van Belkum et al., 1998).

1.18.2.3 Studies on SSR in Bacterial Species

A study characterizing mononucleotide repeats of 5 to 13 nt in

157 sequenced prokaryotic genomes has shown that (i) a large number of

mononucleotide SSRs are present in all prokaryotic genomes investigated, (ii) shorter

repeats are much more abundant than longer repeats, and (iii) in the majority of the

genomes, longer mononucleotide SSRs are excluded from coding regions. It was also

observed that some genomes contain more mononucleotide SSRs than expected,

while others contain significantly fewer. Bacterial genomes that contain far fewer

mononucleotide SSRs than expected are generally larger and more GC-rich, while

bacterial genomes that contain many more mononucleotide SSRs than expected are in

general smaller and more AT-rich. Finally, it was also noted that genomes that contain

a high fraction of horizontally transferred genes have a lower mononucleotide SSR

density and that A and T are generally overrepresented in mononucleotide SSRs

(Coenye and Vandamme, 2005).

A study of simple sequence repeats in Escherichia coli showed that SSRs were well

distributed throughout the genome. Mononucleotide SSRs were over-represented in

noncoding regions and under-represented in open reading frames (ORFs). Nucleotide

composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding

regions, differed from that of the genomic region in which they occurred, with 93% of

all mononucleotide SSRs proving to be of A or T. The study also showed that SSRs are

polymorphic among E. coli strains, providing potential marker loci for rapid detection

and characterization (Gur-Arie et al., 2000).

A study of the abundance, distribution and composition of simple sequence repeats in the

genomes of ε-Proteobacteria showed that the number of mononucleotide SSRs

decreased rapidly with increasing size of the repeat unit. Although local differences in

SSR density could be observed, overall, the SSRs were evenly distributed over the

genomes. The results indicate that (i) there is a tremendous overrepresentation of A and T

in mononucleotide SSRs ≥ 6 bp and (ii) there is a highly significant linear

relationship between the G+C content of the genome and the G+C content of

mononucleotide SSRs ≥ 6 bp. There are considerable differences in the dinucleotide

composition between the genome and SSRs. While the frequency of AT/TA in

dinucleotide SSRs ≥ 6 bp is much higher than expected in C. jejuni, it is lower in H.

pylori genomes and the genome of W. succinogenes. For H. hepaticus, the frequency

of AT/TA is approximately the same in the genome and in dinucleotide SSRs ≥ 6 bp.

The frequency of CG/GC in dinucleotide SSRs ≥ 6 bp is lower than expected based

on the genome composition for C. jejuni and H. hepaticus, but normal in the other

genomes. There is an overrepresentation of CT/TC in dinucleotide SSRs ≥ 6 bp in

both H. pylori genomes and in the W. succinogenes genome. There is a slight

overrepresentation of A in mononucleotide SSRs ≥ 6 bp that occur in coding regions

for all genomes (Coenye and Vandamme, 2004).
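The genome surveys summarized above rest on two basic operations: locating mononucleotide SSRs within a size window and tabulating their base composition. The sketch below shows both on a hypothetical toy sequence. The 5 to 13 nt window follows the first study described above; the function names are assumptions, and no expected-frequency model, which the cited studies also used, is included.

```python
import re


def mononucleotide_ssrs(seq: str, min_len: int = 5, max_len: int = 13):
    """Return (start, base, length) for each maximal homopolymer run within the size window."""
    runs = []
    for match in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        length = match.end() - match.start()
        if min_len <= length <= max_len:
            runs.append((match.start(), match.group(0)[0], length))
    return runs


def at_fraction(runs) -> float:
    """Fraction of the listed runs that consist of A or T."""
    if not runs:
        return 0.0
    return sum(1 for _, base, _ in runs if base in "AT") / len(runs)


# Hypothetical toy sequence with three qualifying runs (A x 6, T x 5, G x 7).
seq = "GC" + "A" * 6 + "CG" + "T" * 5 + "GC" + "G" * 7 + "CAAT"
runs = mononucleotide_ssrs(seq)
print(runs)
print(f"A/T share of mononucleotide SSRs: {at_fraction(runs):.2f}")
```

Applied separately to coding and noncoding coordinates of an annotated genome, the same counts would give the kind of over- and under-representation comparison reported in the studies above.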

1.18.2.4 SSR Function

Numerous lines of evidence have demonstrated that genomic distribution of simple

sequence repeats (SSRs) is nonrandom, presumably because of their effects on

chromatin organization, regulation of gene activity, recombination, DNA replication,

cell cycle, mismatch repair (MMR) system, etc. (Li et al., 2002). SSRs may provide an

evolutionary advantage of fast adaptation to new environments as evolutionary tuning

knobs (Kashi et al., 1997; Trifonov, 2003). The presence of SSRs in prokaryotes is

rare, but most that do occur are related to pathogenic organisms; their variation in

repeat numbers can also cause phenotypic changes (van Belkum et al., 1998).

Haemophilus influenzae (Hi), an obligate upper respiratory tract commensal/pathogen,

uses phase variation (PV) to adapt to host environment changes. Switching occurs by

slippage of SSR repeats within genes coding for virulence molecules (Hood et al.,

1996). When SSR repeats lie within protein coding regions, UTRs, and introns, any

changes by replication slippage and other mutational mechanisms may lead to changes

in protein function. There are numerous lines of evidence indicating that changes in

lengths of triplet or amino acid repeats could affect protein function, and frameshifts

within coding regions caused by SSR expansion or contraction could (1) cause gain of

function, loss of function, or gene silencing and (2) give rise to novel proteins and alter bacterial

pathogenesis and virulence. Variations in the repeat number of SSRs located in the 5'-

UTRs and 3'-UTRs and introns can cause significant effects on gene expression (e.g.,

mRNA splicing or translation) and lead to phenotypic changes with altered selective

values. For instance, in Escherichia coli, hundreds of genes related to DNA repair,

recombination, and physiological adaptation to different stresses contain high density

of small SSRs, which can induce mutation phenotypes by affecting repair efficiency

and/or DNA metabolism (Rocha et al., 2002). In humans, SSR variation in coding

regions, UTRs, and introns can cause neuronal diseases, cancers, spinocerebellar ataxia (SCA), and myotonic dystrophy (DM),

among others. In some cases, microsatellite instability (MSI) even affects the effectiveness of medical

treatment on human cancers (Kim et al., 2001). In bacteria, particularly pathogenic

bacteria, infection processes require that the bacteria adapt to several host

environments. Initial colonization, crossing epithelial and endothelial barriers, survival

in circulation, and translocation across, for instance, the blood-brain barrier, are all

processes that require specific virulence traits (Roche and Moxon, 1995). SSR

evolution in genes should share similar mutational processes, including replication

slippage, point mutation, and recombination, but SSRs within genes should be

subjected to stronger selection pressure than other regions because of their functional

significance in regulating gene expression and function. These mutational processes

provide mutation resources for the MMR system. If SSR mutations within genes

escape from MMR correction, these mutations can cause phenotypic changes. The

link between changing copy number of SSRs and phenotypes is provided by an

accumulating number of experimental observations showing a dependence of gene

expression and other functions on the copy number of the associated SSRs. If SSR changes

result in selectable phenotypic variation, selection can naturally start to act. It has been

demonstrated that SSRs in protein-coding regions are under strong selection (Richard

and Dujon, 1997; Alba et al., 1999).
