CHAPTER 2. REVIEW OF LITERATURE
2.1. Tuberculosis
Tuberculosis (TB), described as ‘the Captain of all these men of death’ because it has
continued to claim lives throughout the ages (Daniel, 2006), is a widespread infectious
disease. It is caused by various mycobacterial strains, most commonly Mycobacterium
tuberculosis (MTB), an obligate human pathogen. It can be traced in humans back to about
6000 B.C. (Wirth et al., 2008). In the past it was also called phthisis or phthisis pulmonalis
(Breathnach & Moynihan, 2004). On 24th March, 1882, the German scientist Robert Koch
identified MTB as the causative pathogen of TB. This discovery paved the way for vast
advances in TB research. For diagnosing TB, Clemens von Pirquet developed the tuberculin
skin test in 1907 (Daniel, 2006). Albert Calmette and Camille Guérin developed the BCG
(Bacille Calmette-Guérin) vaccine from an attenuated bovine tuberculosis strain in 1921
(Calmette, 1931). BCG is now the world’s most widely used vaccine. A revolution in TB
therapy began with the discoveries of para-aminosalicylic acid (PAS) and streptomycin in
two successive years, 1943 and 1944 respectively. The first oral anti-TB drug, isoniazid,
was developed in 1952, followed by rifampicin in 1963 (Girling et al., 1976; Daniel, 2006).
In the early 1970s, short-course chemotherapy regimens were developed and proved to be
highly effective in TB therapy (Dawson & Bateman, 2009; Zumla et al., 2009). In the early
1980s, the TB treatment regimen was worked out. In 1993, tuberculosis was declared a
“global health emergency” by the World Health Organization (WHO). The “Stop TB
Partnership” (http://www.stoptb.org) was established in 2001 with the aim of eliminating
tuberculosis as a public health problem. WHO outlined a global plan to reduce the global TB
burden by 2015, with a target of eliminating TB as a “public health problem” by 2050
(The Stop TB Strategy, WHO, http://www.who.int/).
2.2. Epidemiology
2.2.1. World
Tuberculosis is a major public health problem worldwide. Globally, an estimated 8.6 million
new TB cases and 1.3 million deaths from TB occurred in 2012, of which 0.32 million deaths
were among HIV-positive people. Of the total cases, an estimated 0.5 million occurred in
children and 2.9 million among women. There were an estimated 0.45 million multidrug-resistant
TB (MDR-TB) cases, with 0.17 million deaths from MDR-TB. The majority of cases
worldwide were in the South-East Asia (29%), Africa (27%) and Western Pacific (19%)
regions. India and China alone accounted for 26% and 12% of total cases, respectively
(WHO TB Report, 2013). The largest numbers of incident cases were in five countries:
India (2.0 - 2.4 million), China (0.9 - 1.1 million), South Africa (0.4 - 0.6 million),
Indonesia (0.4 - 0.5 million) and Pakistan (0.3 - 0.5 million). Figure 1 shows the estimated
number of TB cases in 2012.
Figure 1: Estimated number of TB cases in the year 2012
(WHO TB Report, 2013)
2.2.2. India
Annually, more new TB cases occur in India than in any other country. Of the estimated
8.6 million incident TB cases globally in 2012, 2.0 - 2.4 million occurred in India, i.e.
26% of the global total. TB caused 0.27 million deaths in India in 2012. India also
accounted for 31% of the estimated 2.9 million missed TB cases (WHO TB Report, 2013).
According to WHO, 2-3% of Indian TB patients are multidrug-resistant.
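The percentages and ranges quoted above can be cross-checked with simple arithmetic. The short Python sketch below is only an illustration of that check; the variable names are chosen here for convenience and all figures are the ones quoted in the text (WHO TB Report, 2013).

```python
# Figures quoted in the text (WHO TB Report, 2013); all values in millions.
global_incident_cases = 8.6
india_share = 0.26
india_reported_range = (2.0, 2.4)

# Incident cases in India implied by the 26% share of the global total.
india_cases = global_incident_cases * india_share
print(f"India, implied incident cases: {india_cases:.2f} million")

# The implied value should fall inside the reported range for India.
within_range = india_reported_range[0] <= india_cases <= india_reported_range[1]
print("Within reported range:", within_range)

# Missed cases attributed to India: 31% of the estimated 2.9 million.
missed_global = 2.9
india_missed = missed_global * 0.31
print(f"India, implied missed cases: {india_missed:.2f} million")
```

The implied figure of about 2.2 million incident cases lies within the reported range of 2.0 - 2.4 million, and the share of missed cases corresponds to roughly 0.9 million.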
2.3. Transmission of TB
Tuberculosis usually affects the lungs, but it sometimes affects other parts of the body as
well. It spreads through the air when a person with active pulmonary TB sneezes, coughs,
speaks or otherwise expels respiratory fluids. These fluids contain droplets ranging from
0.5 to 5.0 µm in diameter. A single sneeze can release up to 40000 droplets (Cole & Cook,
1998), and each droplet can transmit the disease, as the infectious dose of TB is very
low (fewer than 10 bacteria may cause infection) (Nicas et al., 2005). People with
prolonged, close or frequent contact with a TB-infected person are more prone to TB
infection, with an estimated infection rate of 22% (Ahmed & Hasnain, 2011). A person with
untreated, active TB may infect other healthy people. Many factors make people more
vulnerable to TB infection, such as the quantity of contagious droplets the infected person
expels, the efficiency of ventilation, the duration of exposure, the virulence of the MTB
strain and the immunity level of the healthy individual. A newly infected person typically
takes 3 to 4 weeks to become able to transmit the disease to other healthy individuals.
2.4. Pathogenesis of TB
About 90% of people infected with MTB have no symptoms of TB, a condition called latent
TB, and only about 10% of them go on to develop active TB disease (Mainous & Pomeroy,
2009). TB infection begins when MTB reaches the pulmonary alveoli, where the mycobacteria
invade and replicate within the endosomes of alveolar macrophages (Houben et al., 2006).
Bloodstream infection may also give rise to TB of the lungs. Through this hematogenous
transmission, TB can also spread to the brain, kidneys and bones (Herrmann & Lagrange,
2005). How TB affects other body parts is still unknown; however, it rarely affects the
thyroid, heart, pancreas and skeletal muscles (Agarwal et al., 2005). If the mycobacteria
enter the bloodstream from damaged tissue, they can reach different parts of the body and
cause infection appearing as tiny white tubercles (Crowley, 2010). This severe form of
infection, called miliary TB, is most common in children and in people with HIV (Harries,
2005), and has a high fatality rate of about 30% (Jacob et al., 2009).
2.5. Diagnosis and treatment of TB
A definitive diagnosis of TB is possible when MTB is identified in a clinical specimen such
as tissue biopsy, pus or sputum. However, because of the prolonged culture time of this
slow-growing pathogen, treatment is often started before culture results are available. TB
treatment uses antibiotics to kill the bacteria. The cell wall of mycobacteria is highly
atypical in both structure and chemical composition and obstructs drug entry, which makes
effective TB treatment difficult (Brennan & Nikaido, 1995). It is very difficult to
diagnose active TB based on signs and symptoms alone (Bento et al., 2011). However, patients
with a continuous cough lasting more than two weeks or with lung disease may be considered
for diagnosis of tuberculosis (Escalante, 2009). Multiple sputum cultures and a chest X-ray
are usually part of the initial evaluation (Escalante, 2009). In the developing world,
“tuberculin skin tests” and “interferon-γ release assays” are rarely used (Metcalfe et al.,
2011; Sester et al., 2011).
For rapid diagnosis of tuberculosis, adenosine deaminase and nucleic acid amplification
tests (NAATs) came into use in 2010 (Bento et al., 2011; Ling et al., 2008). Blood tests
that detect antibodies lack specificity and sensitivity and are therefore not recommended
(Steingart et al., 2011). For people at high risk of TB, the Mantoux tuberculin skin test
is frequently used for diagnosis (Escalante, 2009). For patients who test positive in the
Mantoux tuberculin test, interferon gamma release assays (IGRAs), which are more sensitive,
are recommended (Amicosante et al., 2010).
DOTS (Directly Observed Treatment, Short course) is normally used for drug-sensitive
tuberculosis. It involves four drugs, isoniazid, rifampicin, ethambutol and pyrazinamide,
for the first two months, followed by isoniazid and rifampicin for the next four months.
For MDR-TB (multidrug-resistant TB), second-line drugs such as fluoroquinolones, kanamycin
and amikacin are used. The treatments currently used for MDR-TB and XDR-TB (extensively
drug-resistant TB) are long, toxic and expensive, with limited efficacy (Zumla et al.,
2012). More than a dozen new anti-TB drugs are in clinical trials or in preclinical
development (Cole & Riccardi, 2011), and further research is under way to develop new
effective treatments for tuberculosis.
2.6. Mycobacterium tuberculosis (MTB) - the causative pathogen of TB
Mycobacterium tuberculosis is a slow-growing, aerobic, acid-fast, rod-shaped bacterium
that causes TB in humans. Other pathogens of the genus Mycobacterium cause various human
and animal diseases: Mycobacterium ulcerans causes Buruli ulcer, Mycobacterium leprae
causes leprosy, and Mycobacterium avium is frequently reported in association with HIV
infection (Horsburgh, 2001). MTB takes approximately 24 hours to divide and nearly 3 - 4
weeks to form colonies in vitro. It has a complex life cycle in humans, including a latent
or dormant phase with reduced metabolism within the host cell. The most characteristic
feature of the genus Mycobacterium is its exceptionally flexible cell envelope, which
contains a rich diversity of components such as glycolipids, mycolic acids and
polysaccharides. This distinct cell wall is responsible for the acid-fast staining of
mycobacteria (Uplekar, 2012).
The hierarchical classification of Mycobacterium tuberculosis (Cavalier-Smith, 2004) is as
follows:
Kingdom  : Bacteria
Phylum   : Actinobacteria
Class    : Schizomycetes
Subclass : Actinobacteridae
Order    : Actinomycetales
Suborder : Corynebacterineae
Family   : Mycobacteriaceae
Genus    : Mycobacterium
Species  : tuberculosis
MTB is a genetically diverse organism with varying phenotypes. Different MTB strains are
associated with different geographical areas. TB outbreaks are caused by hypervirulent
MTB strains. These strains carry deletion mutations in their cell wall-modifying enzymes
or in regulators responsible for responding to environmental stimuli. Owing to these
mutations, MTB acquires the ability to survive in granulomas for a long time, causing
persistent infection and making the organism extremely pathogenic (Casali, 2009).
2.7. Factors responsible for pathogenesis and virulence in MTB
Different factors are involved in the pathogenesis and virulence of Mycobacterium
tuberculosis. Knowledge of the virulence factors of MTB may give a better understanding of
host-pathogen interaction (Camacho et al., 1999) and support the development of novel
vaccines and drugs. Researchers around the world have carried out many studies to identify
these factors.
Disruption of the erp gene in MTB and M. bovis was shown to decrease the ability to
multiply within the host, revealing the important contribution of the erp gene to MTB
virulence (Berthet et al., 1998). Secreted antigen 85-A (FbpA), encoded by Rv3804c, was
reported as an important virulence factor of MTB, as this protein has mycolyl transferase
activity and helps in cell wall synthesis (Armitige et al., 2000). pcaA, an essential
mycobacterial gene required for cording and for synthesis of the mycolic acid cyclopropane
ring in the cell wall of both MTB and BCG, was reported by Glickman et al. (2000). pcaA
was also reported to act as a pro-inflammatory activator of macrophages during early
infection (Rao et al., 2005).
Dubnau et al. (2000) inactivated the hma (cmaA, mma4) gene to construct a mutant MTB
strain. They demonstrated that the mutant strain was unable to synthesize oxygenated
mycolic acids and observed altered envelope permeability and attenuation in mice. This
result revealed the importance of oxygenated mycolic acids for MTB virulence in mice.
Phospholipase C is known to have a significant function in the pathogenesis of numerous
bacteria. Raynaud et al. (2002) revealed the involvement of phospholipases C in MTB
virulence. Sirakova et al. (2003) demonstrated that Rv2946c (pks1) and Rv2947c (pks15),
required for the polyketide synthase involved in the biosynthesis of phthiocerol, act as
virulence factors of MTB. The largest open reading frame of MTB (pks12 / Rv2048c),
required for dimycocerosyl phthiocerol synthesis, was reported to be involved in MTB
pathogenesis (Sirakova et al., 2003).
MmpL8 (Rv3823c), an integral membrane transport protein, was reported to be necessary for
“sulfolipid-1 biosynthesis” and MTB virulence (Converse et al., 2003). Domenech et al.
revealed the involvement of MmpL family proteins in MTB virulence and drug resistance
(Domenech et al., 2004; Domenech et al., 2005).
Sander et al. (2004) found lipoprotein metabolism to be a major factor in MTB virulence
and pathogenesis. The eukaryotic-like and prokaryotic-like isoforms of the glyoxylate
cycle enzyme isocitrate lyase (ICL) were shown to be important for fatty acid catabolism
and MTB virulence (Muñoz-Elías and McKinney, 2005). The mymA operon (Rv3083 to Rv3089) of
MTB was reported as an important factor in the pathogenesis of MTB (Singh et al., 2005).
The membrane-bound metalloprotease encoded by Rv2869c was reported as an important enzyme
for regulating cell envelope composition and in vivo growth (Makinoshima and Glickman,
2005). OtsB2 (Rv3372), a trehalose 6-phosphate phosphatase, was shown to be an essential
protein in the OtsAB pathway required for trehalose biosynthesis in MTB (Murphy et al.,
2005). CFP-10 and ESAT-6, two proteins encoded by the ESX-1 locus, are required for full
virulence in MTB (Fortune et al., 2005). The high-affinity phosphate-binding proteins
encoded by the pstS1 and pstS2 genes of MTB were demonstrated to be essential for in vivo
virulence (Peirs et al., 2005). Deletion of the kasB (Rv2246) gene, which encodes the
enzyme 3-oxoacyl-ACP synthase, resulted in loss of acid-fastness and subclinical latent TB
in immunocompetent mice (Bhatt et al., 2007). Brzostek et al. (2007) demonstrated that
cholesterol oxidase (ChoD), a cholesterol-modifying enzyme, is an important factor for MTB
virulence. A study by Lun and Bishai (2007) revealed that the cell wall-associated
carboxylesterase encoded by the Rv2223c gene is essential for full virulence of MTB.
Gioffré et al. (2005) generated knock-out mutants in the mce1, mce2 and mce3 operons of
MTB, found that these mutants had a decreased ability to multiply within the host, and
concluded that the mce operons are virulence factors of MTB. Two other similar studies
have also shown the importance of mce operons, i.e. mce2 (Marjanovic et al., 2010) and
mce3 & mce4 (Senaratne et al., 2008), in MTB virulence.
During infection, bacterial proteins are transported through the MTB protein secretion
system ESX-1, a significant factor in MTB virulence. Raghavan et al. (2008) revealed that
EspR (Rv3849), a key ESX-1 regulator, is required for MTB virulence.
The ATP-binding cassette transporter LpqY-SugA-SugB-SugC found in MTB was reported as an
essential component for virulence (Kalscheuer et al., 2010). CtpV, a putative copper
exporter, was also shown to be a virulence factor of MTB (Ward et al., 2010). A novel heat
shock protein (Hsp22.5) encoded by Rv0990c was shown to be involved in MTB pathogenesis
(Abomoelak et al., 2011). The region of difference 2 (RD2) was shown to contribute to MTB
virulence (Kozak et al., 2010). The acg gene of MTB was shown to be a vital factor for
growth and virulence in vivo (Hu & Coates, 2011).
Besides the above-mentioned factors, the two-component system senX3-regX3 of MTB (Parish
et al., 2003), superoxide dismutase secreted via SecA2 (Braunstein et al., 2003), sigmaE
(an extracytoplasmic sigma factor) (Manganelli et al., 2004), the AraC family
transcriptional regulator Rv1931c (Frota et al., 2004), the catalase-peroxidase KatG (Li
et al., 1998; Ng et al., 2004), the extracytoplasmic-function sigma factor SigL of MTB
(Hahn et al., 2005), the sigma factor SigD (Calamita et al., 2005), the stress-responsive
chaperone alpha-crystallin 2 (Stewart et al., 2005), the PhoP protein of MTB (Pérez et
al., 2001; Martin et al., 2006), the PhoPR two-component system of MTB (Walters et al.,
2006), the nuoG (Rv3151) gene (Velmurugan et al., 2007), the transcriptional regulator of
hypoxia (mosR) of MTB (Abomoelak et al., 2009), the transcriptional regulator Rv0485,
known to modulate pe and ppe gene expression (Goldstone et al., 2009), Rv0198c, a putative
matrix metalloprotease (Muttucumaru et al., 2011), the ESX-1 genes espF and espG1 (Bottai
et al., 2011) and PE_PGRS30 (Iantomasi et al., 2012) were also reported as important
factors for the pathogenesis and virulence of Mycobacterium tuberculosis.
2.8. Genome sequencing of MTB
The complete genome sequences of different MTB strains (Cole et al., 1998; Fleischmann et
al., 2002) have provided valuable insights into its biology. The availability of genome and
proteome information for MTB, combined with high-throughput technologies, may open up new
avenues for the development of novel diagnostic techniques and better vaccines and drugs
against TB (Ahmed and Hasnain, 2004). With the declining cost of genome sequencing
technology (Ng & Kirkness, 2010) and advances in molecular biology and functional
genomics, whole genome sequences of different MTB strains have been released and are
available in the public domain. As of June 2013, complete genome sequences of several
clinical and laboratory strains of MTB are available at the “National Center for
Biotechnology Information (NCBI)” (Table 1).
Table 1: List of completely sequenced genomes of MTB complex
Organism                                             Genome size (Mb)  GC%   Genes  Proteins
Mycobacterium tuberculosis H37Rv                     4.41              65.6  4062   4003
Mycobacterium tuberculosis 7199-99                   4.42              65.6  4042   3994
Mycobacterium tuberculosis CAS/NITR204               4.39              65.6  4007   3959
Mycobacterium tuberculosis CCDC5079                  4.4               65.6  3695   3646
Mycobacterium tuberculosis CCDC5079                  4.41              65.6  4204   4156
Mycobacterium tuberculosis CCDC5180                  4.41              65.6  3638   3590
Mycobacterium tuberculosis CDC1551                   4.4               65.6  4293   4189
Mycobacterium tuberculosis CTRI-2                    4.4               65.6  4001   3944
Mycobacterium tuberculosis EAI5                      4.39              65.6  4026   3902
Mycobacterium tuberculosis EAI5 / NITR206            4.39              65.6  4067   4019
Mycobacterium tuberculosis F11                       4.42              65.6  3998   3941
Mycobacterium tuberculosis H37Ra                     4.42              65.6  4084   4034
Mycobacterium tuberculosis KZN 1435                  4.4               65.6  4107   4059
Mycobacterium tuberculosis KZN 4207                  4.39              65.6  4044   3996
Mycobacterium tuberculosis KZN 605                   4.4               65.6  4071   4001
Mycobacterium tuberculosis RGTB327                   4.38              65.6  3739   3691
Mycobacterium tuberculosis RGTB423                   4.41              65.6  3670   3622
Mycobacterium tuberculosis UT205                     4.42              64.9  3812   3794
Mycobacterium tuberculosis str. Beijing / NITR203    4.41              65.6  4158   4110
Mycobacterium tuberculosis str. Erdman = ATCC 35801  4.39              65.6  4301   4245
Mycobacterium tuberculosis str. Haarlem              4.41              65.6  4100   4036
Mycobacterium tuberculosis str. Haarlem / NITR202    4.4               65.6  3729   3680
2.9. Comparative genomics of MTB
Comparative genomics is a field of life science that deals with the comparison of genomic
features of different organisms (Touchman, 2010; Xia, 2013). Genomic features include the
nucleotide sequence, regulatory sequences, genes and their order (Xia, 2013). The main
goal of comparative genomics is to compare whole genomes, or large parts of genome
sequences obtained from genome sequencing projects, in order to understand the biological
similarities and differences between organisms along with their evolutionary relationships
(Touchman, 2010; Russel et al., 2011; Primrose, 2009). The most important principle of
this branch of genomics is that features common to two different organisms are encoded by
conserved DNA sequences (Hardison, 2003).
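Sequence conservation of this kind is usually summarised as percent identity between aligned sequences. The following Python sketch shows one simple way such a value could be computed; the function name and the toy sequences are hypothetical and are not taken from any of the cited studies, and the sequences are assumed to be already aligned (equal length, with "-" marking gaps).

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned, gap-free positions that carry the same base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # alignment gaps are not counted as comparable positions
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real MTB sequence): 9 of 10 bases match.
print(percent_identity("ATGGCCGATC", "ATGGCTGATC"))  # 90.0
```

Excluding gapped columns is only one of several conventions; other definitions divide by the full alignment length or by the length of the shorter sequence, so reported identities are comparable only when the same convention is used.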
Owing to advances in genomics and associated novel technologies, vast data sets are being
generated, which provide new opportunities for understanding and combating both genetic
and infectious diseases in humans (Cole, 2002). Comparative genomic analysis of different
mycobacterial strains is also helpful in identifying the genetic basis of varying
phenotypes, which may in turn give new insights for the development of novel drugs and
vaccines (Brosch et al., 2000). Comparative genomics is a powerful tool for revealing
microbial evolution and identifying genes that might encode novel drug targets (Cole,
2002). Such comparisons revealed that all members of the MTB complex share 99.9% identity
in their DNA sequence and have identical 16S rRNA (Brosch et al., 2002; Fleischmann et
al., 2002).
With the help of comparative genomics, two tandem duplications of 29 and 36 kb in the
chromosome of the Mycobacterium bovis BCG Pasteur strain were revealed (Brosch et al.,
2000). Whole-genome comparison among different strains of the MTB complex revealed the
role of mutation (insertion / deletion / substitution), gene duplication and selection in
MTB strain evolution. After genome sequencing was completed for MTB H37Rv (Cole et al.,
1998), MTB CDC1551 (Fleischmann et al., 2002) and Mycobacterium bovis AF2122/97, the
causative agent of bovine tuberculosis (Garnier, 2003), their whole genomes became
available in the public domain.
The CDC1551 strain, known to have caused a TB outbreak in the United States in the 1990s
(Valway et al., 1998), was observed to be comparatively less virulent than MTB H37Rv
(Manca et al., 2001). Genomic comparison of MTB H37Rv and MTB CDC1551 revealed 86 InDels
and 1075 single nucleotide polymorphisms (SNPs), of which 579 were observed to be
nonsynonymous, highlighting the association of genotypic changes with phenotypic
variation (Fleischmann et al., 2002).
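The distinction between synonymous and nonsynonymous SNPs can be illustrated with a small Python sketch. It assumes two in-frame, equal-length coding sequences and the standard genetic code; the function name and the toy sequences are hypothetical, and the sketch is not the procedure used by Fleischmann et al. (2002). As a simplification, when a codon carries more than one difference, every difference in that codon receives the codon-level outcome.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snps(ref: str, alt: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) SNP counts for two in-frame sequences."""
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(ref), 3):
        ref_codon, alt_codon = ref[i:i + 3], alt[i:i + 3]
        n_diff = sum(r != a for r, a in zip(ref_codon, alt_codon))
        if n_diff == 0:
            continue
        if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
            synonymous += n_diff  # encoded amino acid unchanged
        else:
            nonsynonymous += n_diff  # encoded amino acid (or stop) changed
    return synonymous, nonsynonymous


# Toy example (hypothetical): GCC -> GCT keeps Ala, TCG -> TTG changes Ser to Leu.
print(classify_snps("ATGGCCTCGGAA", "ATGGCTTTGGAA"))  # (1, 1)

# Share of nonsynonymous SNPs among the SNPs reported in the text.
print(f"{579 / 1075:.1%}")  # 53.9%
```

With the figures quoted above, a little over half of the SNPs between the two strains change the encoded amino acid.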
The Mycobacterium bovis genome sequence was found to be 99.95% identical to the genomes
of MTB CDC1551 and MTB H37Rv, but with a slightly smaller genome size. Comparison of 2504
coding sequences (CDS) among these three genomes revealed 1600 CDS of M. bovis identical
to those of MTB H37Rv and MTB CDC1551. There were 2400 SNPs identified between the two
MTB strains and M. bovis (Fleischmann et al., 2002). The genome of Mycobacterium leprae
(M. leprae), the causative agent of leprosy, has undergone extensive gene loss, leaving
only 1604 functional protein-coding genes in the bacillus (Cole et al., 2001). Out of 1439
genes common to MTB and M. leprae, a set of 219 genes was found to be unique to
mycobacteria through in silico comparative analysis (Marmiesse et al., 2004). Arnold et
al. (2006) revealed, through whole-genome comparison, the existence of short sequence
repeats in MTB that can be used in genotyping schemes. Comparative genomics also provides
an efficient route to identifying the genetic basis of variation in phenotype,
pathogenicity and host range among different mycobacterial species and strains. Current
advances in comparative and functional genomics have also improved our understanding of
genetic diversity within the MTB complex. Diaz et al. (2006) explored and identified
genetic variability among different MTB strains using DNA microarray technology.
Genome comparison of M. bovis BCG Pasteur 1173P2 (BCG Pasteur) with MTB H37Rv, MTB
CDC1551 and M. bovis AF2122/97 revealed large sequence polymorphisms (LSPs) that led to
the loss of 133 genes in BCG Pasteur (Behr et al., 1999; Brosch et al., 2007).
Most comparative genomics studies have been carried out on MTB H37Rv, MTB H37Ra, MTB
Erdman, CDC1551 and Mycobacterium bovis BCG (Uplekar, 2012). Comparative genomics has
revealed the genomic diversity among different MTB strains. Specifically, the
identification of particular genes that differ between virulent and avirulent or
attenuated MTB strains may give insights into the molecular mechanisms of pathogenicity
and point to new directions for the development of therapies against TB.
2.10. Comparative proteomics of MTB
Proteomics is the study of different features of proteins, particularly their location,
structures and functions (Anderson & Anderson, 1998; Blackstock & Weir, 1999). Proteins
are usually highly conserved, and amino acid substitutions are therefore very important
for functional selection. Identifying proteins and comparing similar proteins among
different strains of the same organism may reveal variation in the virulence mechanisms
that lead to different forms of disease caused by that organism (Uplekar, 2012).
Comparative proteomics deals with the comparison of proteomic features of different
organisms, which can reveal the roles and associations of different proteins in different
biological systems. With the completion and availability of the genome sequences of
different MTB strains, vast information about the proteomes of the corresponding strains
also became available. This allows comparison not only of their genomes but also of their
proteomes. Over the last few decades, comparative proteome analyses of different MTB
strains have been carried out by researchers worldwide.
In silico analysis of MTB proteomes identified two novel protein families, PE and PPE
(Tekaia et al., 1999). On proteomic comparison of two non-virulent M. bovis BCG strains
(Chicago and Copenhagen) with two virulent MTB strains (Erdman and H37Rv), Jungblut et al.
(1999) identified distinct proteins by mass spectrometry. Proteomic comparison of culture
supernatants from MTB H37Rv and M. bovis BCG identified 27 proteins specific to MTB
(Mattow et al., 2003). Miallau et al. (2013) identified “RelBE-like toxin-antitoxin
complexes” associated with lethality of MTB.
2.11. MTB databases
Owing to advances in bioinformatics and life science research, several databases on MTB
have been developed and made available in the public domain in the last few years.
Mycobacterial Genome Divergence Database (MGDD), available at
http://mirna.jnu.ac.in/mgdd/, is an online database for accessing different types of
genomic variation (SNPs, indels, tandem repeats and divergent regions) among six strains
of the MTB complex: MTB H37Rv, MTB H37Ra, MTB CDC1551, MTB F11, Mycobacterium bovis
AF2122/97 and Mycobacterium bovis BCG (Vishnoi et al., 2008).
The TB drug resistance mutation database, available at http://www.tbdreamdb.com/,
provides a comprehensive list of the genetic polymorphisms associated with first- and
second-line drug resistance in clinical MTB isolates from all over the world.
Mycobacterium Database (MyBASE), available at http://mybase.psych.ac.cn/, provides
integrated information on Mycobacterium tuberculosis (MTB) and Mycobacterium leprae (M.
leprae). This information focuses mainly on genome polymorphisms and predicted operons,
along with annotated information on essential and virulence genes and their role in
virulence and pathogenesis (Zhu et al., 2009).
The “Tuberculosis Database (TBDB)”, available at “http://www.tbdb.org/”, is an online
database providing information on various aspects of TB, such as genome sequences,
assemblies and expression data obtained from pre- and post-publication data, together
with curated literature for various MTB strains and more than 20 strains related to MTB.
The expression data mainly include more than three thousand MTB microarrays, 95 real-time
PCR datasets, 2.7 thousand microarrays for mouse and human TB-related research, and 260
microarrays for Streptomyces coelicolor (Reddy et al., 2009; Galagan et al., 2010).
The TubercuList database, available at http://tuberculist.epfl.ch/, is a knowledge base of
MTB that brings together extensive information on MTB genome details, protein
information, mutant and operon annotation, bibliography, drug and transcriptome data etc.
(Lew et al., 2011).
MTBreg, a database of conditionally regulated proteins in MTB available at
“http://www.doe-mbi.ucla.edu/Services/MTBreg/”, integrates information on proteins up-
and down-regulated in MTB when the organism is grown under conditions mimicking
infection.
The Mycobacterium tuberculosis Structural Database (MtbSD), available at
http://bmi.icmr.org.in/mtbsd/MtbSD.php, hosts information on 857 MTB protein structures,
comprising description, domains, reaction catalyzed, structural homologues, active site
etc. for each protein (Hassan et al., 2011).
The Mycobacterium tuberculosis Proteome Comparison Database (MTB-PCDB), available at
http://www.bicjbtdrc-mgims.in/MTB-PCDB/, hosts 40252 protein sequence comparison records
obtained through inter-strain proteome comparison of five strains of MTB (H37Rv, H37Ra,
CDC 1551, F11 and KZN 1435) (Jena et al., 2011). The MycoProtease-DB database, available
at http://www.bicjbtdrc-mgims.in/MycoProtease-DB/, holds information on 1324 proteases
from 8 strains of the Mycobacterium tuberculosis (MTB) complex and 4 nontuberculous
mycobacteria (NTM) strains whose complete genome sequences are available (Jena et al.,
2012).
The Mycobacterium tuberculosis genome variation resource (tbvar), available at
http://genome.igib.res.in/tbvar/index.html, comprises more than 29000 single nucleotide
variations obtained from more than 450 isolates of the MTB complex (Joshi et al., 2014).
2.12. MTB H37Rv and MTB H37Ra
MTB H37Rv is the virulent counterpart of its avirulent sister strain H37Ra. In 1935,
William Steenken derived both strains from their parent strain H37 (Steenken & Gardner,
1946). MTB H37Ra has various characteristics that distinguish it from MTB H37Rv. These
include a “raised colony morphology” (Steenken, 1935), lack of neutral red dye binding
(Dubos & Middlebrook, 1948), lack of cord formation (Middlebrook et al., 1947), reduced
survival inside macrophages (Mackaness et al., 1954) or under anaerobic conditions (Heplar
et al., 1954), and decreased virulence in mice (Larson & Wicht, 1964) and guinea pigs
(Alsaadi & Smith, 1973). Despite several genetic and biochemical studies over the past
seven decades, the molecular mechanism for the decrease of virulence in MTB H37Ra is still
under study (Zheng et al., 2008).
2.12.1. Genome biology of MTB H37Rv
The Mycobacterium tuberculosis strain H37Rv was originally obtained from the human lung
isolate H37 in 1934 and has since been used widely in biomedical research worldwide. In
1905, Edward R. Baldwin isolated H37 from a nineteen-year-old male patient with pulmonary
tuberculosis (Steenken & Gardner, 1946). MTB H37Rv retains its full virulence in animal
models and is susceptible to anti-tubercular drugs. The whole genome of this pathogenic
strain was sequenced in 1998 (Cole et al., 1998). The genome consists of 4411532 base
pairs (Figure 2.1) with a guanine + cytosine (G+C) content of 65.6 %. It contains more
than 4000 protein-coding genes, giving a gene density of about one gene per kilobase.
Genes are evenly distributed on the forward and reverse strands. Nearly one half of the
coding sequences are attributed to domain shuffling and gene duplication (Tekaia et al.,
1999).
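The two summary statistics quoted for the genome, G+C content and gene density, follow from simple definitions. The Python sketch below illustrates them under stated assumptions: the function names are hypothetical, the short sequence is a toy example rather than MTB sequence, and the round figure of 4000 genes stands in for the "more than 4000" protein-coding genes mentioned in the text.

```python
def gc_content(sequence: str) -> float:
    """G+C content of a nucleotide sequence, as a percentage."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def genes_per_kb(n_genes: int, genome_length_bp: int) -> float:
    """Gene density expressed as genes per kilobase of genome."""
    return n_genes / (genome_length_bp / 1000)


# Toy sequence (hypothetical): 8 of 10 bases are G or C.
print(gc_content("GGCCGATCGC"))  # 80.0

# Genome length from the text; 4000 is a rounded gene count used as an assumption.
H37RV_LENGTH_BP = 4_411_532
APPROX_PROTEIN_CODING_GENES = 4000
print(round(genes_per_kb(APPROX_PROTEIN_CODING_GENES, H37RV_LENGTH_BP), 2))  # 0.91
```

A value of about 0.9 genes per kilobase is consistent with the "one gene per kilobase" description of this compact genome.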
2.12.2. Genome biology of MTB H37Ra
MTB H37Ra is an avirulent strain derived from H37. Its whole genome was sequenced by the
Chinese National Human Genome Center at Shanghai. The genome is 4419977 base pairs long
(Figure 2.2) with a G+C content of 65.6 %. Of its 4084 genes, 4034 are protein-coding, 45
encode tRNA, 3 encode rRNA and 2 encode other RNAs
(http://www.ncbi.nlm.nih.gov/genome/genomes/166?details=on&project_id=58853).
Figure 2.1 Circular map of MTB H37Rv chromosome (Zhu et al., 2009; Stothard & Wishart 2005)
Figure 2.2 Circular map of MTB H37Ra chromosome (Zhu et al., 2009; Stothard & Wishart 2005)
2.12.3. Genomic and proteomic comparison of MTB H37Rv and MTB H37Ra
A genomic approach to identifying the variations between MTB H37Ra and MTB H37Rv at the
genetic level was first carried out by Brosch et al. (2000). Their study revealed two
polymorphisms between these strains: a fragment of 480 kilobases in MTB H37Rv was replaced
by two segments of 260 and 220 kilobases in MTB H37Ra, and a DraI segment of 7900 bases
was present in MTB H37Ra but absent in MTB H37Rv. The reported 7900-base polymorphism
arose because the RvD2 region, which is deleted from MTB H37Rv, is present in MTB H37Ra.
Three IS6110 deletions from the MTB H37Rv genome (RvD3 to RvD5) were also identified. The
authors also described the occurrence and mechanisms of the genomic differences between
MTB H37Rv and MTB H37Ra, but the role of this variation in the attenuation of MTB H37Ra
remained unclear (Brosch et al., 2000).
Genomic comparison between MTB H37Rv and MTB H37Ra also revealed that the genome
of MTB H37Rv is very similar to that of MTB H37Ra and is 8,445 base pairs smaller
(Zheng et al., 2008). Between H37Ra and H37Rv, only 198 “single nucleotide variations
(SNVs)” were identified (Zheng et al., 2008). Of these, 119 were identical between
MTB CDC1551 and MTB H37Ra and three were due to MTB H37Rv strain variation, leaving
only 76 MTB H37Ra-specific SNVs, which affect only 32 genes (Zheng et al., 2008).
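The idea of counting SNVs between two strains can be illustrated on a small scale. The sketch below assumes two sequences that are already aligned, of equal length and free of insertions or deletions; it is not the procedure used by Zheng et al. (2008), and the function name and toy sequences are hypothetical. The last line simply checks the genome length difference using the two lengths quoted in the text.

```python
from typing import List, Tuple


def find_snvs(ref: str, query: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, ref base, query base) for each mismatch.

    Assumes the two sequences are already aligned, have equal length,
    and contain no gap characters (no insertions or deletions).
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, r, q)
        for pos, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1)
        if r != q
    ]


# Hypothetical toy sequences, NOT taken from either genome.
rv_like = "ATGGCCGTTACG"
ra_like = "ATGGCTGTTACA"

snvs = find_snvs(rv_like, ra_like)
print(len(snvs), snvs)

# Genome lengths quoted in the text (H37Ra minus H37Rv), in base pairs.
print(4_419_977 - 4_411_532)
```

On whole genomes the alignment step itself is the hard part and is left to dedicated tools; the sketch only covers the comparison once aligned positions are known.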
An in silico analysis of the PE/PPE family of MTB H37Ra and MTB H37Rv revealed
genetic variations between the two strains in the form of numerous SNVs along with some
deletions and insertions. These variations also alter the proteins'
physico-chemical properties, protein-protein interaction domains and phosphorylation sites,
which can be correlated with differences in virulence and pathogenesis (Kohli et al., 2012).
A link between the avirulence of MTB H37Ra and a single amino acid substitution in
the PhoP protein was observed by Gonzalo-Asensio et al. (2008). Their study focused
on the phoP gene, which was found to have a significant role in MTB virulence. This gene is
completely conserved in all members of the MTB complex, including MTB H37Rv, except MTB
H37Ra. A point mutation in the phoP gene of MTB H37Ra yields a mutant protein with a
single amino acid change, i.e. replacement of the polar residue Ser219 by the nonpolar
residue Leu (Gonzalo-Asensio et al., 2008).
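The polar-to-nonpolar character of the Ser219 to Leu change can be expressed as a small lookup. The sketch below is illustrative only: the grouping of residues into polar and nonpolar sets is a simplified textbook classification (an assumption, since other schemes exist), and the function names and the one-letter notation "S219L" are introduced here for convenience.

```python
import re

# Simplified textbook grouping of amino acids by side-chain polarity
# (an assumption for illustration; other classification schemes exist).
NONPOLAR = set("GAVLIMFWP")
POLAR = set("STCYNQDEKRH")


def polarity(residue: str) -> str:
    """Return 'polar' or 'nonpolar' for a one-letter amino acid code."""
    if residue in NONPOLAR:
        return "nonpolar"
    if residue in POLAR:
        return "polar"
    raise ValueError(f"unknown residue: {residue}")


def describe_substitution(mutation: str) -> str:
    """Describe a substitution written in one-letter form, e.g. 'S219L'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation: {mutation}")
    wild, position, mutant = match.groups()
    return (
        f"position {position}: "
        f"{polarity(wild)} {wild} -> {polarity(mutant)} {mutant}"
    )


# Ser219 -> Leu in PhoP, as reported in the text.
print(describe_substitution("S219L"))
```

A change of polarity class is only a first indication that a substitution may matter; it says nothing by itself about the structural or functional consequence.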
Målen et al. (2011) compared the membrane proteins of MTB H37Rv with those of its
avirulent sister strain MTB H37Ra and identified more than seventeen hundred proteins.
The majority of these proteins had comparable abundance in both strains. Twenty-nine
“membrane-associated proteins” showed a fivefold or greater difference in relative
abundance between the two strains: nineteen membrane proteins and lipoproteins were more
abundant in MTB H37Rv, and ten other proteins were more abundant in MTB
H37Ra (Målen et al., 2011).
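The fivefold criterion described above amounts to a ratio filter over paired abundance values. The following is a minimal sketch under stated assumptions: the protein names and abundance values are hypothetical, the function name is introduced here, and the normalization applied by Målen et al. (2011) before comparison is not reproduced.

```python
from typing import Dict, List, Tuple


def differential_proteins(
    abundance_a: Dict[str, float],
    abundance_b: Dict[str, float],
    min_fold: float = 5.0,
    label_a: str = "H37Rv",
    label_b: str = "H37Ra",
) -> List[Tuple[str, float, str]]:
    """Return (protein, fold change, strain with higher abundance).

    Only proteins measured in both strains with positive abundance
    are compared; the fold change is always reported as >= 1.
    """
    hits = []
    for protein in sorted(abundance_a.keys() & abundance_b.keys()):
        a, b = abundance_a[protein], abundance_b[protein]
        if a <= 0 or b <= 0:
            continue  # ratio is undefined without positive values in both
        fold = max(a, b) / min(a, b)
        if fold >= min_fold:
            hits.append((protein, fold, label_a if a > b else label_b))
    return hits


# Hypothetical abundance values, NOT data from the cited study.
h37rv = {"protein_1": 50.0, "protein_2": 10.0, "protein_3": 2.0}
h37ra = {"protein_1": 8.0, "protein_2": 12.0, "protein_3": 14.0}

print(differential_proteins(h37rv, h37ra))
```

In this toy input, protein_1 and protein_3 pass the threshold while protein_2, with comparable abundance in both strains, does not.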
2.13. Bioinformatics tools for genome and proteome analysis
2.13.1. Genome and proteome comparison tools
A number of bioinformatics tools and techniques are available in the public domain
for genome and proteome comparison. GenomeVISTA, available at
http://genome.lbl.gov/cgi-bin/, is an automatic server that can be used to find candidate
orthologous regions for a draft or finished DNA sequence from one species based on the
genome of a second species. It also provides a detailed comparative analysis of these
regions (Couronne et al., 2003; Bray et al., 2003). The Lagan Toolkit, available at
http://lagan.stanford.edu/, provides a set of alignment programs for comparative genomics.
LAGAN is used for rapid global alignment of two homologous genomic sequences, whereas
Multi-LAGAN is used for multiple global alignment of genomic sequences (Brudno et al., 2003).
PipMaker is a web-based application that identifies conserved segments between two long
genomic sequences through sequence comparison. It provides an efficient technique for
aligning genomic sequences and returns a comprehensive result in the form of a plot known
as the percent identity plot (pip) (Elnitski et al., 2003). MUMmer, available at
http://mummer.sourceforge.net/, is a system for rapidly aligning whole genomes, whether in
complete or draft form. MUMmer 3.0 in particular can align large genomes of eukaryotic
organisms at varying evolutionary distances (Kurtz et al., 2004).
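The quantity shown in a percent identity plot can be illustrated with a short sketch. It assumes two sequences that are already aligned without gaps and reports percent identity over consecutive fixed-size windows; PipMaker computes its own alignments, and this sketch does not reproduce that step. The function name, window size and toy sequences are hypothetical.

```python
from typing import List, Tuple


def percent_identity_windows(
    seq_a: str, seq_b: str, window: int
) -> List[Tuple[int, float]]:
    """Return (1-based window start, percent identity) per window.

    Assumes a gap-free alignment: both sequences have equal length
    and position i in one corresponds to position i in the other.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0:
        raise ValueError("window must be positive")
    results = []
    for start in range(0, len(seq_a), window):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y)
        results.append((start + 1, 100.0 * matches / len(a)))
    return results


# Hypothetical toy sequences, NOT real genomic data.
first = "ATGCATGCATGC"
second = "ATGCATGAATTT"

for start, identity in percent_identity_windows(first, second, window=4):
    print(start, identity)
```

Plotting window start against percent identity gives a crude picture of where two sequences are conserved and where they diverge.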
GenomeBlast is an online tool, available at http://bioinfo-srv1.awh.unomaha.edu/,
for the comparative study of small genomes. Besides identifying unique and homologous genes
among multiple genomes, it also illustrates their distribution on the genomes graphically
(Lu et al., 2006).
Artemis Comparison Tool (ACT) is a free tool that allows pairwise comparison of
complete, annotated genome sequences. It can also be used to identify and analyze
regions of similarity and variation between genomes through whole-sequence
comparison (Carver et al., 2008). ABWGAT (Anchor-Based Whole Genome
Analysis Tool), available at http://abwgc.jnu.ac.in/_sarba/cgi-bin/abwgc_retrival.cgi, is a
web-based tool for identifying sequence variations such as SNVs, indels, inversions and
repeat expansions at the genomic level (Das et al., 2009).
PROCOM is a web-based tool available at http://procom.wustl.edu/, used for
comparing multiple eukaryotic proteomes. Currently it hosts proteomes of 32 eukaryotic
organisms for comparison (Li et al., 2005). PROMPT (Protein Mapping and Comparison
Tool) is a comprehensive bioinformatics software environment available at
http://www.geneinfo.eu/prompt/index.php, which can be used for retrieving, analyzing,
mapping and comparing protein sets. Its main features are easy mapping of various types of
sequence identifiers, automatic data retrieval and integration, and a user-friendly
graphical interface (Schmidt & Frishman, 2006).
2.13.2. Mutation analysis tools
Comparative proteome analysis among different pathogenic organisms may reveal
proteins with different types of variation in their amino acid sequences. These
variations may play an important role in the evolution of an organism and in the
divergence of its strains. A single amino acid mutation in a protein sequence may
alter the protein's structure and function, which may account for the virulence and drug
resistance properties of pathogenic organisms. Several mutation analysis systems are
available for analyzing the effect of amino acid variation on the structure and function of
proteins, such as PolyPhen (Adzhubei et al., 2010), SIFT (Ng et al., 2003; Kumar et al.,
2009), PROVEAN (Choi et al., 2012) and Project HOPE (Venselaar et al., 2010).
Computational tools such as SIFT, PolyPhen and PROVEAN predict whether non-synonymous
SNPs are deleterious, whereas Project HOPE automatically analyzes the consequences of a
point mutation for the three-dimensional structure of a protein.