
RESEARCH • SPIROCHAETE BACTERIUM

Anjaneyulu K et al. In silico Comparative Genomics of Treponema, Species, 2012, 1(1), 5-14, www.discovery.org.in http://www.discovery.org.in/s.htm © 2012 discovery publication. All rights reserved


Anjaneyulu K1, Ashok P. Patil2, Desai PV3*

1. Ph.D. Scholar, Dept. of Bio Sciences, South Gujarat University, Post Box No 49, Surat - 395007, Gujarat, INDIA
2. Ph.D. Scholar, Dept. of Bio Sciences, South Gujarat University, Post Box No 49, Surat - 395007, Gujarat, INDIA
3. Professor, Dept. of Bio Sciences, South Gujarat University, Post Box No 49, Surat - 395007, Gujarat, INDIA

*Corresponding author: Ph.D. Scholar, Dept. of Bio Sciences, South Gujarat University, Post Box No 49, Surat - 395007, Gujarat, INDIA, E-Mail: [email protected]

Received 04 August; accepted 18 September; published online 01 October; printed 16 October 2012

ABSTRACT
T. pallidum is one of the few human bacterial pathogens that have not been cultivated in vitro. It remains an enigmatic pathogen, since few of its virulence factors have been identified and the pathogenesis of the disease is poorly understood. Several experimental approaches for definitively identifying virulence determinants, such as evolutionary or mutation analysis and complementation, are still in their infancy. Whole genome sequencing of the available Treponema subspecies, followed by comparative analysis of the genome sequences, seems to be a promising approach in this context. Divergence within the species is mainly caused by variation in gene and protein sequences but also by differences in the set of genes present in a particular species. Proteins that are specific to a particular species may be responsible for its adapted phenotype, e.g. its ability to act as a pathogen or its resistance to a certain drug. Identifying species-specific proteins is thus a relevant aim, and here we make a small contribution towards its achievement. In the present work, we have compared the genomes, genes and proteins of five different Treponema subspecies by extracting numerous protein sequence properties using state-of-the-art Support Vector Machines. The genomes of the Treponema pallidum subspecies were sequenced to study the properties (Comparative Genome Sequencing, CGS) of about 5016 protein-coding genes. The sequences were compared with a multiple sequence alignment tool. The results obtained were filtered to retain sequences with 100 percent and 99 percent similarity. Functional sites of these sequences were predicted with the help of a Prosite scan. When compared to the heterogeneity in the T. pallidum chromosome. To our surprise, we find that proteins of different species are significantly correlated and can be distinguished based on the sequence properties and functional sites encoded within their genomes. This discrimination does not rely on any homology criteria but is based only on the biophysical characteristics encoded in the sequence. We have also constructed a phylogenetic tree based on the results of the comparisons, and compared it to the well-documented. The observed gene and protein comparison is the first assessment of the degree of variation between the five T. pallidum subspecies and hence paves the way for phylogenetic studies of these enigmatic organisms. Moreover, the divergence in genomes, genes and proteins more often belonged to the group of genes with predicted virulence and unknown functions, suggesting their involvement in differences between infections such as yaws and syphilis.

Keywords: T. pallidum sub-species; Comparative Genomics, Gene Wiz Browser, Prosite, Cladogram.

Abbreviations: GW – GeneWiz, Bp – Base Pairs, AA – Amino Acids, CGS – Comparative Genome Sequencing.

1. INTRODUCTION
Both comparative gene analysis and comparative analysis of the proteins encoded in the complete genomes of an organism reveal novel and unique species-specific information, even though the techniques are still in their infancy and have yet to reach their full potential. Comparative genomics is a powerful tool that helps to understand the underlying mechanisms of evolution, pathogenesis and adaptive strategies in emerging non-culturable pathogens such as spirochetes. T. pallidum spp. is one of the few unusual human pathogens that have not been cultured continuously in vitro. It is a Gram-negative spirochaete bacterium whose subspecies cause treponemal diseases such as syphilis, bejel, pinta and yaws. The treponemes have a cytoplasmic and an outer membrane. Five subspecies of Treponema pallidum, namely Treponema pallidum subsp. pallidum DAL1, Treponema pallidum subsp. pallidum SS14, Treponema pallidum subsp. pallidum str. Chicago, Treponema pallidum subsp. pallidum str. Nichols and Treponema pallidum subsp. pallidum str. CDC2, were compared on the basis of some genomic properties and various types of functional proteins. Comparing closely related species reveals species-specific differences and evolutionary selection pressures on genes (Lukashin and Borodovsky, 1998). At the same time, a comparative sequence analysis provides the means for a better annotation. In addition to its spirochetal morphology and the absence of lipopolysaccharide, the outer membrane of the organism contains relatively few intramembranous proteins, which places several limitations on research.

With the objective of gaining more insight into the functional elements of the genomes of the Treponema species, the present study reveals the comparative relationships among the predicted ORFs and protein sequences, with their functional sites, encoded in the complete genomes of Treponema pallidum spp. Conclusions from such comparative analysis augment the understanding of the host-parasite interactions that enable pathogens to carve out unique ecological niches in nature. In the present study, we summarize the findings for each genome along with a computational comparative analysis of the five different subspecies of T. pallidum, which can provide further insight into species and strain uniqueness and, importantly, can stimulate new studies leading to new approaches to disease prevention and treatment (Lowe and Eddy, 1997).

2. Statement of the Problem
The present comparative analysis among the Treponema subspecies that are pathogenic to humans aims to locate the similarities and differences among the genes encoded in their genomes. The comparative study also aims to

RESEARCH • SPIROCHAETE BACTERIUM Species, Volume 1, Number 1, October 2012

Species
In silico Comparative Genomics of Treponema

Homology: The relationship among sequences due to descent from a common ancestral sequence. An important organizing principle for genomic studies because structural and functional similarities tend to change together along the structure of homology relationships.

ISSN 2319–5746 EISSN 2319–5754


reveal the genomic properties of the species, identify the identical sequences of proteins and analyze their functional sites.

2.1 Scope of the Study
The aim of the study is to locate the common and differentiating functional components encoded within the genomes of the five species of Treponema. A review of the research shows both similarities and differences: Treponema pallidum subspecies are morphologically and serologically indistinguishable. The mode of transmission is not unique in nature. The course of each disease is significantly variable. The outer membrane of T. pallidum has too few surface proteins for an antibody to be effective. Thus, owing to this poor antigenicity, diagnosis and treatment through antibodies (vaccines) are difficult. Molecular analysis at the sequence level addresses aspects such as the mode of pathogenicity and the clinical significance of these species. Comparative genome, gene and protein analysis of the five subspecies, and the similarities and differences it finds, may be useful for future research.

2.2 Limitations of the Study
• The study undertaken was limited to three years.
• The genomes of the Treponema species were those available during the timeline of this research study.
• We did the citation analysis based on the secondary information available in the databases.
• In this study we did not include citation analysis based on in vitro findings.

3. MATERIALS AND METHODS
3.1. Materials
3.1.1. Genome Sequences
Genomes of the Treponema pallidum subspecies, namely Treponema pallidum subsp. pallidum DAL1 (species A), Treponema pallidum subsp. pallidum SS14 (species B), Treponema pallidum subsp. pallidum str. Chicago (species C), Treponema pallidum subsp. pallidum str. Nichols (species D) and Treponema pallidum subsp. pallidum str. CDC2 (species E), were selected for analysis and abbreviated as above. Genome sequences of all the Treponema subspecies and genome statistics were collected from the Genome sequence database maintained at the National Center for Biotechnology Information (National Institutes of Health, Bethesda, Md.). This resource organizes information on genomes including sequences, maps, chromosomes, assemblies and annotations (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome).

3.2. Research Methodology
3.2.1. Sequence analysis: detection and interpretation of varying levels of genome sequence similarity
3.2.1.1. ClustalW Multiple Sequence Alignment Program
ClustalW2 is a general purpose multiple sequence alignment program for DNA or proteins. It attempts to calculate the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen (http://www.ebi.ac.uk/Tools/msa/clustalw2/#).
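To make the idea of reading identities off an alignment concrete, the following is a minimal sketch of how pairs of already-aligned sequences could be filtered at a 99 or 100 percent threshold. It is not the procedure used in this study: it measures percent identity over ungapped columns (an assumption, since the text speaks of similarity without defining it), and the names `percent_identity` and `filter_pairs` are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two rows of one alignment (equal length).

    Assumption: columns where either row has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_pairs(aligned, threshold=99.0):
    """Return (name1, name2, identity) for pairs at or above the threshold.

    `aligned` maps a sequence name to its aligned sequence string.
    """
    names = sorted(aligned)
    kept = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            identity = percent_identity(aligned[first], aligned[second])
            if identity >= threshold:
                kept.append((first, second, identity))
    return kept


# Toy aligned rows, invented for illustration only.
example = {"A": "MKTA", "B": "MKTA", "C": "MKTG"}
pairs_at_99 = filter_pairs(example, threshold=99.0)
```

Calling `filter_pairs` once with a threshold of 100 and once with 99 would separate identical pairs from near-identical ones.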

3.2.1.2. Tree View Software
Phylogenetic trees were constructed using the CLUSTALW programs (Sigrist et al., 2005) with the neighbor-joining and least squares (Fitch-Margoliash) methods, accompanied by bootstrap analysis (De Castro et al., 2005). Tree View is a program for displaying and printing phylogenies. The program reads most NEXUS tree files (such as those produced by PAUP and COMPONENT) and PHYLIP-style tree files (including those produced by fastDNAml and CLUSTALW).
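The neighbor-joining method mentioned above builds a tree by repeatedly joining the pair of taxa that minimizes a corrected distance, usually written Q. The sketch below shows only the first joining step on a small distance matrix; it is an illustration of the general algorithm, not of the CLUSTALW implementation, and the function name `nj_first_join` and the example matrix are assumptions made for this example.

```python
def nj_first_join(dist):
    """Perform the first neighbor-joining step on a symmetric distance matrix.

    Returns the pair of indices to join, the branch lengths from each member
    of the pair to the new internal node, and the minimal Q value.
    Requires at least three taxa.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]

    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)

    q, i, j = best
    # Branch length from taxon i to the new node; the remainder goes to j.
    limb_i = 0.5 * dist[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    limb_j = dist[i][j] - limb_i
    return (i, j), (limb_i, limb_j), q


# Illustrative distances between five hypothetical taxa a..e.
example_matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, limbs, q_value = nj_first_join(example_matrix)
```

A full tree is obtained by replacing the joined pair with the new node, recomputing the distances to it, and repeating until three nodes remain.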

3.2.1.3. GeneWiz browser 0.94 server
GeneWiz browser 0.94 server is an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome and calculating DNA properties such as curvature or stacking energy along the chromosome (Tamura et al., 2007). The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. It allows the user to select the display of various genomic features such as color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from (http://www.cbs.dtu.dk/services/gwBrowser).
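As a rough illustration of what calculating a DNA property along a chromosome involves, the sketch below computes a value per window of the sequence. GC content is used here only because it needs no parameter tables; it is an assumption chosen for simplicity, not one of the properties named above, and the function name `sliding_gc` and the window settings are hypothetical. This is not GeneWiz code.

```python
def sliding_gc(sequence, window, step):
    """Return (window_start, gc_fraction) for each full window of the sequence."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    sequence = sequence.upper()
    profile = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        gc_count = sum(1 for base in chunk if base in "GC")
        profile.append((start, gc_count / window))
    return profile


# Toy sequence, invented for illustration only.
profile = sliding_gc("GGCCAATT", window=4, step=4)
```

A property such as stacking energy could be profiled in the same windowed way once a table of per-dinucleotide values is supplied.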

3.2.1.4. Microbial Genome Annotation ToolsGLIMMER is a system for finding genes in microbial DNA,especially the genomes of bacteria and archaea. GLIMMER(Gene Locator and Interpolated Markov ModelER) usesinterpolated Markov models to identify coding regions(elcher et al., 1999), (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi?).

3.2.2. Conservation and diversity of functionalclasses of proteins between the subspecies ofTreponemaRecent advances in high-throughput structural determinationtechniques and structural genomics initiatives haveproduced an increase in volume of structural data forproteins prior to knowledge of their functions. With theseadvances several tools are developed rapidly to predictfunctions for proteins based on their sequence similarity.

3.2.2.1. PrositePROSITE is a database of protein, currently containspatterns and profiles specific for more than a thousandprotein families or domains. It is based on the observationthat large number of different proteins can be grouped onthe basis of similarities in their sequences, into a limitednumber of families. Proteins or protein domains belonging toa particular family generally share functional attributes andare derived from a common ancestor. The ProRule sectionof PROSITE is constituted of manually created rules thatcan automatically generate annotation inthe UniProtKB/Swiss-Prot format based on PROSITE motifs.These rules, most of the times rules are based on PROSITEprofiles as they are more specific than patterns, butoccasionally rules make use of patterns. In these cases, therules will not work independently, but will be called byanother rule, which will be triggered by a profile. In additionto these rules corresponding to a unique PROSITE motif,there are also rules triggered by a specific combination ofPROSITE motifs called metamotifs. Metamotifs allow thedefinition of arrangements of domains separated by spacersof variable size, as well as the anchoring to the N- and/or C-termini and the exclusion of a PROSITE motif (Sigrist et al.,2010). ProRule is used to create UniProtKB/Swiss-Prot lineswith basic and complex annotation derived from the

SCIENTOMETRIC
Interpreting the functional content of a given genomic sequence is one of the central challenges of biology today. Perhaps the most promising approach to this problem is based on the comparative method of classic biology in the modern guise of sequence comparison. For instance, protein-coding regions tend to be conserved between species. Hence, a simple method for distinguishing a functional exon from the chance absence of stop codons is to investigate its homologue from closely related species.

Citation analysis: It is the examination of the frequency, patterns, and graphs of citations in articles and books. It uses citations in scholarly works to establish links to other works or other researchers. Citation analysis is one of the most widely used methods of bibliometrics.

Forward genetics: It involves studying genes one at a time. Only a small minority of genes are uniquely associated with an easily definable phenotype, a characteristic that is critical for determining gene function by forward genetics.


reveal the genomic properties of the species, identify the identical sequences of proteins and analyze their functional sites.

2.1 Scope of the Study
The aim of the study is to identify the common and differentiating functional components encoded within the genomes of the five species of Treponema. The literature shows that Treponema pallidum subspecies are morphologically and serologically indistinguishable, that the mode of transmission is not unique to any one of them, and that the course of each disease varies significantly. The outer membrane of T. pallidum has too few surface proteins for an antibody to be effective. Because of this poor antigenicity, diagnosis and treatment through antibodies (vaccines) are difficult. Molecular analysis at the sequence level may therefore help to explain the mode of pathogenicity and the clinical significance of these species. The similarities and differences found by comparative genome, gene and protein analysis of the five subspecies may be useful for future research.

2.2 Limitations of the Study
• The study was limited to three years.
• The genomes analyzed were those of the Treponema species available during the timeline of this research study.
• The citation analysis was based on the secondary information available in the databases.
• The study did not include citation analysis based on in vitro findings.

3. MATERIALS AND METHODS
3.1. Materials
3.1.1. Genome Sequences
Genomes of the Treponema pallidum subspecies, namely Treponema pallidum subsp. pallidum DAL1 (species A), Treponema pallidum subsp. pallidum SS14 (species B), Treponema pallidum subsp. pallidum str. Chicago (species C), Treponema pallidum subsp. pallidum str. Nichols (species D) and Treponema pallidum subsp. pallidum str. CDC2 (species E), were selected for analysis and abbreviated as above. Genome sequences of all the Treponema subspecies and genome statistics were collected from the Genome sequence database maintained at the National Center for Biotechnology Information (National Institutes of Health, Bethesda, Md.). This resource organizes information on genomes including sequences, maps, chromosomes, assemblies and annotations (http://www.ncbi.nlm.nih.gov/sites/entrez?Db=genome).

3.2. Research Methodology
3.2.1. Sequence analysis: detection and interpretation of varying levels of genome sequence similarity
3.2.1.1. ClustalW Multiple Sequence Alignment Program
ClustalW2 is a general-purpose multiple sequence alignment program for DNA or proteins. It attempts to calculate the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen (http://www.ebi.ac.uk/Tools/msa/clustalw2/#).
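To make the notions of identity and difference in an alignment concrete, the following minimal Python sketch computes pairwise percent identity over the gap-free columns of sequences that have already been aligned. It is an illustration only and is not part of ClustalW2 or of the analysis reported here; the function names `percent_identity` and `identity_table` and the toy sequences are assumptions introduced for the example.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_table(alignment: dict) -> dict:
    """Percent identity for every unordered pair of aligned sequences."""
    return {
        (p, q): percent_identity(alignment[p], alignment[q])
        for p, q in combinations(alignment, 2)
    }


# Hypothetical toy alignment; these are not Treponema sequences.
toy = {"A": "ATG-CCGTA", "B": "ATGACCGTA", "C": "ATG-CAGTT"}
for pair, value in identity_table(toy).items():
    print(pair, round(value, 1))
```

Other conventions, such as counting gap columns in the denominator, give different percentages, so the convention used should be stated alongside any reported value.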

3.2.1.2. Tree View Software
Phylogenetic trees were constructed using the CLUSTALW programs (Sigrist et al., 2005) with the neighbor-joining and least squares (Fitch-Margoliash) methods, accompanied by bootstrap analysis (De Castro et al., 2005). Tree View is a program for displaying and printing phylogenies. The program reads most NEXUS tree files (such as those produced by PAUP and COMPONENT) and PHYLIP-style tree files (including those produced by fastDNAml and CLUSTALW).
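PHYLIP-style tree files store the tree as a Newick string. As a hedged illustration of what a tree viewer has to read, the sketch below extracts the tip labels from such a string. It is a simplified parser that does not handle quoted labels or comments; the function name `newick_leaves` and the toy tree, including its topology, support value and branch lengths, are hypothetical and are not output from this study.

```python
def newick_leaves(newick: str) -> list:
    """Return tip labels of a simple Newick string in left-to-right order."""
    leaves = []
    token = ""
    previous = ""  # last structural character seen: one of ( ) , ;
    for ch in newick.strip():
        if ch in "(),;":
            name = token.split(":")[0].strip()
            # A label that directly follows ")" belongs to an internal node
            # (for example a bootstrap value), so it is not a tip.
            if name and previous != ")":
                leaves.append(name)
            token = ""
            previous = ch
        else:
            token += ch
    return leaves


# Hypothetical toy tree; not a result of the present analysis.
toy_tree = "((DAL1:0.01,SS14:0.01)95:0.02,CDC2:0.05);"
print(newick_leaves(toy_tree))
```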

3.2.1.3. GeneWiz browser 0.94 server
GeneWiz browser 0.94 server is an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome and calculating DNA properties such as curvature or stacking energy along the chromosome (Tamura et al., 2007). The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides without changing the size of the plot. Its ability to zoom disproportionally provides optimal readability and increased functionality compared to other browsers. It allows the user to select the display of various genomic features such as color setting and data ranges. Custom numerical data can be added to the plot, allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online (http://www.cbs.dtu.dk/services/gwBrowser).

3.2.1.4. Microbial Genome Annotation Tools
GLIMMER is a system for finding genes in microbial DNA, especially the genomes of bacteria and archaea. GLIMMER (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models to identify coding regions (Delcher et al., 1999) (http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi?).
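The idea of scoring a stretch of DNA with a Markov model can be sketched in a few lines of Python. The example below is a deliberate simplification: it uses a single fixed-order Markov chain with add-one smoothing, whereas GLIMMER interpolates between models of several orders. All names (`train_markov`, `log_likelihood`, `coding_score`) and the toy training sequence are assumptions made for illustration, not GLIMMER's interface or data from this study.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences, order=2):
    """Count context -> next-base transitions, starting every count at 1."""
    counts = defaultdict(lambda: {base: 1 for base in ALPHABET})
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, nxt = seq[i - order:i], seq[i]
            if nxt in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][nxt] += 1
    return counts


def log_likelihood(seq, counts, order=2):
    """Sum of log transition probabilities; unseen contexts are uniform."""
    seq = seq.upper()
    uniform = {base: 1 for base in ALPHABET}
    total = 0.0
    for i in range(order, len(seq)):
        context, nxt = seq[i - order:i], seq[i]
        if nxt not in ALPHABET or any(c not in ALPHABET for c in context):
            continue  # skip ambiguous bases such as N
        row = counts.get(context, uniform)
        total += math.log(row[nxt] / sum(row.values()))
    return total


def coding_score(seq, coding_model, background_model, order=2):
    """Log-odds of a coding model against a background model."""
    return (log_likelihood(seq, coding_model, order)
            - log_likelihood(seq, background_model, order))


# Hypothetical toy training data; a real model needs many known genes.
coding = train_markov(["ATGATGATGATG"])
background = train_markov([])  # empty model: every transition is uniform
print(coding_score("ATGATG", coding, background))
```

A positive score means the candidate region looks more like the training genes than like the background; in practice the score would be computed per reading frame over each open reading frame.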

3.2.2. Conservation and diversity of functional classes of proteins between the subspecies of Treponema
Recent advances in high-throughput structural determination techniques and structural genomics initiatives have produced an increase in the volume of structural data for proteins prior to knowledge of their functions. With these advances, several tools have been developed rapidly to predict functions for proteins based on their sequence similarity.

3.2.2.1. Prosite
PROSITE is a database of protein families and domains that currently contains patterns and profiles specific for more than a thousand protein families or domains. It is based on the observation that a large number of different proteins can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor. The ProRule section of PROSITE consists of manually created rules that can automatically generate annotation in the UniProtKB/Swiss-Prot format based on PROSITE motifs. Most of the time these rules are based on PROSITE profiles, as they are more specific than patterns, but occasionally rules make use of patterns. In these cases, the rules will not work independently, but will be called by another rule, which will be triggered by a profile. In addition to these rules corresponding to a unique PROSITE motif, there are also rules triggered by a specific combination of PROSITE motifs called metamotifs. Metamotifs allow the definition of arrangements of domains separated by spacers of variable size, as well as the anchoring to the N- and/or C-termini and the exclusion of a PROSITE motif (Sigrist et al., 2010). ProRule is used to create UniProtKB/Swiss-Prot lines with basic and complex annotation derived from the


presence of the domain and of biologically critical amino acids: domain name and boundaries, EC number, function, keywords, associated PROSITE patterns, PTMs, active sites, disulfide bonds, etc. ProRule notably contains the position of structurally and/or functionally critical amino acid(s), as well as the condition(s) they must fulfil to play their biological role(s). Part of these supplementary data are used by ScanProsite, which not only provides the protein sequence matched by a profile, but also information about the relevance of biologically meaningful residues, like active sites, binding sites, post-translational modification sites or disulfide bonds, to help function determination.
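PROSITE patterns use a compact syntax that maps closely onto regular expressions. The sketch below, given for illustration only, converts a subset of that syntax (`x`, `[..]`, `{..}`, repetition counts, and the `<` and `>` terminal anchors) and scans a sequence for all matches, including overlapping ones. It does not reproduce ScanProsite or ProRule; the function names `prosite_to_regex` and `scan` and the toy protein sequence are assumptions, and the pattern `N-{P}-[ST]-{P}` is the widely cited N-glycosylation site pattern, used here only as an example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored to the C-terminus
            suffix, element = "$", element[:-1]
        # Separate an optional repetition such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(pattern: str, sequence: str) -> list:
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical toy protein sequence, not taken from any Treponema genome.
print(scan("N-{P}-[ST]-{P}", "MKNATLNPSANGSA"))
```

A pattern match of this kind only reports where a motif occurs; as described above, profile-triggered rules add the conditions that functionally critical residues must satisfy.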

4. RESULTS AND DISCUSSION
4.1. Comparative Genome Analysis
Genome sequences of the five subspecies of Treponema pallidum were obtained from NCBI, and a completely automated computational analysis was used to compare the basic properties of the genes of these species. The size of the genome was found to be 1.4 Mb for all species under analysis. Table 1 shows the result of the comparative analysis of the subspecies: the GC content is almost the same in all of them, about 52.8%. The number of genes in species A, C and E was in the range of 1118 to 1122, whereas species B and D differed to some extent. The number of proteins was almost the same in species A and E, and likewise in species B and D. Species C had fewer proteins, with only 981.
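For readers who wish to recompute a basic property such as GC content from a downloaded sequence, a minimal Python sketch follows. It is not the tool used to produce Table 1; the function names `read_fasta` and `gc_percent` and the toy FASTA record are assumptions for illustration, and ambiguous bases such as N are simply excluded from the denominator.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            name = (line[1:].split() or ["unnamed"])[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def gc_percent(seq: str) -> float:
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / unambiguous if unambiguous else 0.0


# Hypothetical toy record; a real genome would be read from a file.
toy_fasta = ">toy_sequence example\nGGCCAATT\nGCGCNN\n"
for record, sequence in read_fasta(toy_fasta).items():
    print(record, len(sequence), round(gc_percent(sequence), 1))
```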

4.2. Tree View Software Analysis
In addition to the species discrimination, it was interesting to explore whether using sequence features to discriminate between bacterial subspecies by machine learning would provide an accurate phylogenetic relationship between the subspecies, as documented in Fig. 1.

4.3. GeneWiz Browser Results
4.3.1. Treponema pallidum subsp. pallidum DAL-1
The Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae; Treponema - Treponema pallidum - Treponema pallidum subsp. pallidum - Treponema pallidum subsp. pallidum DAL1.

Treponema pallidum subsp. pallidum DAL1: This organism is the causative agent of endemic and venereal syphilis. This sexually transmitted disease was first discovered in Europe at the end of the fifteenth century; however, the causative agent was not identified until 1905. At one time syphilis was the third most commonly reported communicable disease in the USA. Syphilis is characterized by multiple clinical stages and long periods of latent, asymptomatic infection. Although effective therapies have been available since the introduction of penicillin, syphilis remains a global health problem. Treponema pallidum subsp. pallidum str. Dallas 1 is used here for comparative analysis. Fig. 2 shows the genome map of Treponema pallidum DAL1.

Lane 1 = feature lane (annotations), Lane 2 = nucleotides

TABLE 1 GENOME PROPERTIES

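Figure 1. Phylogenetic tree depicting the relationships between T. pallidum subspecies (tip labels in the figure: pallidum DAL-1, pallidum SS14, pallidum str. Chicago, pallidum str. Nichols, pertenue str. CDC2)

Figure 2. Genome map of Treponema pallidum DAL1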


Lane 3 = intrinsic curvature
Lane 4 = stacking energy
Lane 5 = positional preferences
Lanes 6 and 7 = global direct repeats and global inverted repeats
Lane 8 = GC skew
Lane 9 = percent AT
Lanes 10, 11, 12 and 13 = A, T, G and C content, respectively
Lanes 14, 15, 16 and 17 = AAAA, TTTT, GGGG and CCCC repeats, respectively
Lane 18 = AT skew
Lanes 19 and 20 = direct repeats and simple repeats
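Several of these lanes are simple compositional statistics computed along the chromosome. As a hedged sketch, the Python below computes GC skew, AT skew and percent AT in fixed windows using the common definitions (G - C)/(G + C) and (A - T)/(A + T). The exact windowing and smoothing used by the GeneWiz browser may differ; the function name `window_stats`, the window and step sizes, and the toy sequence are assumptions for illustration.

```python
def window_stats(seq: str, window: int, step: int) -> list:
    """Compute GC skew, AT skew and percent AT for each full window."""
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        a, t, g, c = (chunk.count(base) for base in "ATGC")
        rows.append({
            "start": start + 1,  # 1-based position of the window start
            "gc_skew": (g - c) / (g + c) if g + c else 0.0,
            "at_skew": (a - t) / (a + t) if a + t else 0.0,
            "percent_at": 100.0 * (a + t) / len(chunk),
        })
    return rows


# Hypothetical toy sequence; real atlases use windows of many kilobases.
for row in window_stats("GGGCAAATGCGCATAT", window=8, step=4):
    print(row)
```

In circular bacterial chromosomes, a change of sign in the GC skew along the sequence is commonly used as a rough indicator of the replication origin and terminus, which is one reason the lane is included in genome atlases.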

Genes are color-coded according to the following categories:
Wine Red = genes involved in central metabolism and respiration without orthologues in H. pylori
Cyan = methyl-accepting chemotaxis proteins (MCPs)
Dark Blue = Type IV secretion system
Sky Blue = genes involved in acid acclimation
Green = putative secreted virulence factors
Pale Green = glycosyltransferase gene cluster specific to H. bizzozeronii
Pale Grey = all other CDSs
Abbreviations: ACC, acetophenone carboxylase; comB, Type IV secretion system; NAP, periplasmic nitrate reductase; AHD, allophanate hydrolase; GT, glycosyltransferase; NRS, nitrite reductase system; SNO, S and N oxidases; FDH, formate reductase system; PL, polysaccharide lyase.

4.3.2. Treponema pallidum subsp. pallidum SS14
The Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae; Treponema - Treponema pallidum - Treponema pallidum subsp. pallidum - Treponema pallidum subsp. pallidum SS14.
Treponema pallidum subsp. pallidum SS14: This organism is the causative agent of endemic and venereal syphilis. This sexually transmitted disease was first discovered in Europe at the end of the fifteenth century; however, the causative agent was not identified until 1905. At one time syphilis was the third most commonly reported communicable disease in the USA. Syphilis is characterized by multiple clinical stages and long periods of latent, asymptomatic infection. Although effective therapies have been available since the introduction of penicillin, syphilis remains a global health problem. Treponema pallidum subsp. pallidum SS14 was isolated in 1977 from a patient with secondary syphilis. This strain is less susceptible than the Nichols strain to a number of antibiotics and is used here for comparative analysis. Fig. 3 shows the genome map of Treponema pallidum SS14.

4.3.3. Treponema pallidum subsp. pallidum str. Chicago
The Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae; Treponema - Treponema pallidum - Treponema pallidum subsp. pallidum - Treponema pallidum subsp. pallidum str. Chicago.
Treponema pallidum subsp. pallidum str. Chicago: The availability of more Treponema pallidum genomes will greatly help comparative studies among isolates and facilitate the improvement of typing methods and the identification of potential targets to be used as protective antigens. Fig. 4 shows the genome map of Treponema pallidum str. Chicago.

4.3.4. Treponema pallidum subsp. pallidum str. Nichols
The Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae; Treponema - Treponema pallidum - Treponema pallidum subsp. pallidum - Treponema pallidum subsp. pallidum str. Nichols.

Figure 3. Genome map of Treponema pallidum SS14

RESEARCH • SPIROCHAETE BACTERIUM

Anjaneyulu K et al.In silico Comparative Genomics of Treponema,Species, 2012, 1(1), 5-14, www.discovery.org.inhttp://www.discovery.org.in/s.htm © 2012 discovery publication. All rights reserved

8

Lane 3 = intrinsic curvature
Lane 4 = stacking energy
Lane 5 = positional preferences
Lanes 6 and 7 = global direct repeats and global inverted repeats
Lane 8 = GC skew
Lane 9 = percent AT
Lanes 10, 11, 12 and 13 = A, T, G and C content respectively
Lanes 14, 15, 16 and 17 = AAAA, TTTT, GGGG and CCCC repeats respectively
Lane 18 = AT skew
Lanes 19 and 20 = direct repeats and simple repeats

Genes in lines are color-coded according to the following category:
Wine Red = genes involved in central metabolism and respiration without orthologues in H. pylori
Cyan = methyl-accepting chemotaxis proteins (MCPs)
Dark Blue = Type IV secretion system
Sky Blue = genes involved in acid acclimation
Green = putative secreted virulence factors
Pale Green = glycosyltransferase gene cluster specific to H. bizzozeronii
Pale Grey = all other CDSs

ACC, acetophenone carboxylase; comB, Type IV secretion system; NAP, periplasmic nitrate reductase; AHD, allophanate hydrolase; GT, glycosyltransferase; NRS, nitrite reductase system; SNO, S and N oxidases; FDH, formate reductase system; PL, polysaccharide lyase.
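Several of the lanes listed in the legend (GC skew, AT skew and percent AT) are simple windowed base-composition statistics. The following is a minimal sketch of how such values could be computed, assuming the common definitions (G - C)/(G + C) and (A - T)/(A + T). The function name `window_stats` and the window and step sizes are illustrative assumptions; the exact settings used by GeneWiz are not given in this paper.

```python
from collections import Counter


def window_stats(seq: str, window: int = 1000, step: int = 1000):
    """Yield (start, gc_skew, at_skew, percent_at) for each full window.

    Hypothetical helper: window and step sizes are assumptions, not the
    values used by the GeneWiz browser.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        counts = Counter(seq[start:start + window])
        a, t, g, c = (counts[base] for base in "ATGC")
        # Skew is undefined when the denominator is zero; report 0.0 then.
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        at_skew = (a - t) / (a + t) if a + t else 0.0
        percent_at = 100.0 * (a + t) / window
        yield start, gc_skew, at_skew, percent_at


if __name__ == "__main__":
    # Toy sequence, not genome data.
    for row in window_stats("GGGCAATT", window=8, step=8):
        print(row)
```

On a real genome the sequence would be read from a FASTA file and the per-window values plotted along the chromosome, which is what the corresponding lanes of a genome atlas display.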


Treponema pallidum subsp. pallidum is the causative agent of endemic and venereal syphilis. This sexually transmitted disease was first discovered in Europe at the end of the fifteenth century; however, the causative agent was not identified until 1905. At one time syphilis was the third most commonly reported communicable disease in the USA. Syphilis is characterized by multiple clinical stages and long periods of latent, asymptomatic infection. Although effective therapies have been available since the introduction of penicillin, syphilis remains a global health problem. The Nichols strain was originally isolated in 1912 from a neurosyphilitic patient and is virulent. Fig. 5 shows the genome map of Treponema pallidum str. Nichols.

4.3.5. Treponema pallidum subsp. pertenue str. CDC2
Lineage: Bacteria - Spirochaetes - Spirochaetales - Spirochaetaceae - Treponema - Treponema pallidum - Treponema pallidum subsp. pertenue - Treponema pallidum subsp. pertenue str. CDC2.

Treponema pallidum subsp. pertenue causes a chronic and disfiguring illness called yaws. The disease starts as a skin infection causing persistent ulcers and progresses to form tumor-like masses. It tends to infect children and is common in rural areas of Africa, Southeast Asia and equatorial South America. Strain CDC2 was isolated in Akorabo, Ghana in 1980 and will be used for comparative analysis. Fig. 6 shows the genome map of Treponema pallidum str. CDC2.

The genome characterizations obtained from the GeneWiz browser for each genome are summarized in Table 2. This table compares the DNA characteristics of the genomes. All DNA properties of the five subspecies were found to be identical except the direct repeats and inverted repeats, which distinguished them from each other. Direct repeats were similar in species B and E, whereas the other three species differed in number. Inverted repeats were identical in species A and E, while B and C showed a minute difference in number.
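To make the distinction between direct and inverted repeats concrete, the sketch below counts both for a fixed word length k. It is a deliberately simplified illustration and not the algorithm used by GeneWiz: here a direct repeat is a k-mer that occurs more than once on the same strand, and an inverted repeat is a k-mer whose reverse complement also occurs in the sequence. The function names and the choice of k are assumptions made for this example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_repeats(seq: str, k: int) -> tuple[int, int]:
    """Return (direct, inverted) repeat counts for words of length k.

    Simplified, assumed definitions:
    - direct: distinct k-mers occurring at two or more positions
    - inverted: distinct k-mer / reverse-complement pairs that are both
      present (palindromic k-mers are not counted)
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    direct = sum(1 for hits in positions.values() if len(hits) > 1)

    inverted = 0
    for kmer in positions:
        rc = reverse_complement(kmer)
        # kmer < rc counts each pair once and skips palindromes.
        if kmer < rc and rc in positions:
            inverted += 1
    return direct, inverted


if __name__ == "__main__":
    # Toy sequence, not genome data.
    print(count_repeats("AAACCCAAATTT", 3))
```

Two genomes with identical base composition can still differ in these counts, which is consistent with repeats being the only properties in Table 2 that separate the five genomes.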

4.4. Comparative analysis of the Proteins present in Treponema Species
Table 2 shows the comparative genome analysis and properties.

4.4.1. Protein Sequence Alignment analysis
4.4.1.1. Protein sequence with 100% Similarity
The protein sequences were obtained from the NCBI genome browser for each Treponema pallidum subspecies. About 5061 sequences were compared with each other. Multiple sequence alignment was executed using ClustalW software. Table 3 gives the total number of protein sequences of the five species with an aligned score of 100%, grouped by related type of protein. The analysis yielded 92 sequences from these five species that showed 100% similarity when matched with each other. Species D had the highest number, with 43 sequences matched with the other four species. Species C and D had the maximum number of 100% alignment scores, 13 sequences, while species D and E had 10 such sequences. The pairings between species A and E and between B and D were found to have 10 sequences with completely similar protein sequences. Species B and E showed the lowest number, 6 sequences with an aligned score of 100.
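The filtering step described above, keeping only sequence pairs whose alignment score reaches a threshold, can be sketched as follows. This is an illustration under stated assumptions and not the scoring used by ClustalW: the score here is plain percent identity over aligned columns, the input is assumed to be already aligned (equal length, with "-" for gaps), and the sequence names and residues are invented toy data.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a residue
    opposite a gap counts as a mismatch (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def pairs_at_or_above(aligned: dict, threshold: float) -> list:
    """Return (name1, name2, score) for every pair scoring >= threshold."""
    hits = []
    for (n1, s1), (n2, s2) in combinations(sorted(aligned.items()), 2):
        score = percent_identity(s1, s2)
        if score >= threshold:
            hits.append((n1, n2, score))
    return hits


if __name__ == "__main__":
    # Hypothetical aligned proteins from species A, B and C.
    toy = {"A_p1": "MKTAY", "B_p1": "MKTAY", "C_p1": "MKT-Y"}
    print(pairs_at_or_above(toy, 100.0))
```

Tallying the returned pairs by species label would give a pairwise count table of the kind reported in Table 3; running the same filter with a lower threshold corresponds to the 99% analysis.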

4.4.1.2. Protein sequence with 99% Similarity
The protein sequences were obtained from the NCBI genome browser for each Treponema pallidum subspecies. About 5061 sequences were compared with each other. Multiple sequence alignment was executed using ClustalW software. Table 4 gives the total number of protein sequences of the five species with an aligned score of 99%, grouped by related type of protein. The analysis yielded 13 sequences from these five species that showed 99% similarity when matched with each other. Species D and E had the most 99% similar sequences, about 5 sequences. Species C and E had 4 sequences with 99% similarity.

Figure 4
Genome map of Treponema pallidum str. Chicago

4.4.2. Comparative analysis of Protein Based on functional categories
4.4.2.1. Distribution of Proteins (100% Similarity) based on Functional categories
Using ClustalW software, 92 protein sequences were filtered as having 100% similarity. Among these 92 sequences, 26 different types of protein were categorized. Table 5 details the presence of each type of protein in each subspecies among the 92 sequences having 100% similarity. The analysis reveals that, among all similar proteins, ribosomal proteins L15 and L30 and replication initiator factor proteins were the most common to all five subspecies of Treponema pallidum. Aspartyl/glutamyl-tRNA amidotransferase subunit C and hypothetical proteins were the other two types of protein commonly found in all five subspecies. Special types of putative proteins were found in all five species with different functional proteins. Aspartyl/glutamyl-tRNA amidotransferase subunit A proteins and lipoproteins were observed in four species, all except species E. Methionine aminopeptidase was found in species B, C and D but not in A and E. Phosphoenolpyruvate carboxykinase was found in species C, D and E but not in species A and B. Table 5 shows the functional categories within the 100% similar protein sequences in Treponema pallidum subspecies ('#' indicates the presence of hypothetical proteins with another type of protein according to the databases).
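The presence and absence statements above can be held in a small lookup structure and split into proteins shared by all five species and proteins missing from some. The sketch below encodes only the four cases spelled out in the text; it is not a transcription of Table 5, and the function name `core_and_accessory` is an illustrative assumption.

```python
SPECIES = "ABCDE"

# Presence sets taken from the statements in section 4.4.2.1 only.
presence = {
    "ribosomal protein L15": set("ABCDE"),
    "Asp/Glu-tRNA amidotransferase subunit A": set("ABCD"),
    "methionine aminopeptidase": set("BCD"),
    "phosphoenolpyruvate carboxykinase": set("CDE"),
}


def core_and_accessory(presence: dict, species: str):
    """Split protein types into those found in every species (core)
    and those missing somewhere, with the species they are missing from."""
    all_species = set(species)
    core = sorted(name for name, found in presence.items()
                  if found == all_species)
    accessory = {name: sorted(all_species - found)
                 for name, found in presence.items()
                 if found != all_species}
    return core, accessory


if __name__ == "__main__":
    core, accessory = core_and_accessory(presence, SPECIES)
    print("shared by all five:", core)
    for name, missing in accessory.items():
        print(f"{name}: absent from {', '.join(missing)}")
```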

4.4.2.2. Distribution of proteins (99% similar) based on Functional categories
Using ClustalW software, 13 protein sequences were filtered as having 99% similarity. Among these 13 sequences, 10 different types of protein were categorized. Table 6 details the presence of each type of protein in each subspecies among the 13 sequences having 99% similarity. Species C and D showed maximum similarity in apolipoprotein N-acyltransferase protein and alginate O-acetylation protein (algI). Species A and B have spermidine/putrescine ABC superfamily ATP binding cassette transporter, ABC protein, and species C and E have 30S ribosomal protein S9. Table 6 shows the comparative analysis based on proteins present in Treponema pallidum subspecies with 99% similar sequences ('#' indicates the presence of hypothetical proteins with another type of protein according to the databases).

4.5. Protein Functional Site Analysis
It is apparent, when studying protein sequence families, that some regions have been better conserved than others during evolution. These conserved regions are generally important for the three-dimensional structure and function of a protein. By analyzing the constant and variable properties of such groups of similar sequences, it is possible to derive a signature for a protein family or domain, which distinguishes its members from all other unrelated proteins. A useful analogy is the use of fingerprints for identification. Like a fingerprint, a protein signature can be used to assign a new protein to a specific family of proteins and thus to formulate hypotheses about its function.
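A protein signature of the kind described above is often written as a PROSITE pattern, which can be translated into a regular expression and searched for in a sequence. The sketch below handles only the common pattern syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repetition, < and > for sequence ends). The example pattern is the well-known P-loop ATP/GTP-binding motif, chosen for illustration and not taken from the hits reported in this paper; the test sequence is invented, and the function names are assumptions.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden set
        element = element.replace("(", "{").replace(")", "}")   # repetition
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # termini
        parts.append(element)
    return "".join(parts)


def find_sites(sequence: str, pattern: str) -> list:
    """Return (start, end, match) with 1-based inclusive coordinates.

    Matches are non-overlapping, as produced by re.finditer.
    """
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group())
            for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    p_loop = "[AG]-x(4)-G-K-[ST]"   # P-loop motif, illustrative only
    toy_protein = "MKGPPGSGKST"     # invented sequence
    print(prosite_to_regex(p_loop))
    print(find_sites(toy_protein, p_loop))
```

A production scan would normally rely on the PROSITE database and its own scanning tool, which also covers profile-based signatures that a regular expression cannot express.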

4.5.1. Comparative Functional Sites in Proteins (100% Similar)
The 92 proteins were scanned with PROSITE to predict the functional sites and their locations. Of these, 28 functional hits were obtained. The most frequent were NHL repeat proteins, recombinase A protein, and spermidine/putrescine ABC superfamily ATP binding cassette transporter, ABC protein, respectively. Table 7 shows the predicted functional sites in the total of 92 proteins.

Figure 5
Genome map of Treponema pallidum str. Nichols

4.5.2. Comparative Functional Sites in Proteins(99% Similar)

Figure 6Genome map of Treponema palladium str. CDC2

Table 2 Comparative Genome Analysis

Table 3 Total number of protein Sequences withaligned score of 100% of fiveTreponema pallidumsubspecies

Table 4 Total number of protein Sequences withaligned score of 99% of five Treponema pallidumsubspecies

The 13 proteins were scanned with PROSITE to predict functional sites and their locations. Of these, 7 gave functional hits. The most frequently found were ATP-binding cassette, ABC transporter-type domain and ATP-dependent nuclease, subunit A. Table 8 shows the predicted functional sites in the total of 13 proteins.

5. CONCLUSION
The comparative analysis of five subspecies of Treponema pallidum was performed to study their similarities and differences by comparing their genomic properties and various types of proteins. With the bioinformatics tools and software used for the analysis, we are able to identify the differentiating characters among the subspecies of Treponema. The genome sequences of the five subspecies of T. pallidum were retrieved from NCBI, namely Treponema pallidum DAL1 (species A), Treponema pallidum SS14 (species B), Treponema pallidum str. Chicago (species C), Treponema pallidum str. Nichols (species D) and Treponema pallidum str. CDC2 (species E). The genomes of species A, B, C, D and E were analysed by comparing the sequences obtained from NCBI (about 5166 sequences). The genomic properties and the various types of proteins, with their functional sites, were studied and compared among the five species to gather information on their similarities and differences. Multiple sequence alignment of the 5166 × 5166 sequences was performed using ClustalW. The aligned sequences were then filtered for alignment scores of 100% and 99%, based on the similar proteins present in the five subspecies. This yielded 92 sequences with 100% identical protein sequences and 13 sequences with 99% identical protein sequences. The functional sites were obtained using the ScanProsite tool: 28 functional hits among the 100% identical and 7 among the 99% identical protein sequences were found to have similar functions. The details are given in the tables for better interpretation of the results and discussion.

According to Table 1, species B and D have similar numbers of genes and proteins, whereas species A, C and E show similar genomic properties. Table 2 shows that all DNA properties are similar in all five species. Species B and E show a similar gradation in direct repeats, whereas species A and E, and species B and C, have identical inverted repeats. Multiple sequence alignment with ClustalW, screening the sequences of all five subspecies, showed that species C and D have 13 identical sequences of functional proteins with a 100% alignment score and that species D and E have 5 identical sequences with 99% similarity. The details are given in Tables 3 to 6. Tables 7 and 8 give information about the functional sites of 28 proteins (100% identical) and 7 proteins (99% identical). The most frequently found proteins are NHL repeat proteins, recombinase A protein and spermidine/putrescine ABC superfamily ATP-binding cassette transporter, ABC protein, among the 100% identical sequences, and ATP-binding cassette, ABC transporter-type domain and ATP-dependent nuclease, subunit A, among the 99% identical sequences.

Comparative genomic analysis revealed that species B and D, and species A and E, are closely related to each other in genomic composition, while species C, D and E are similar in functional protein content. By means of local sequence similarity searches, protein profile searches, and analysis of the functional categories of the 100% and 99% similar proteins, we have conducted a detailed comparative analysis of the T. pallidum genomes. From the level of conservation between functional classes and evolutionary measures, it was possible to characterize, in functional terms, the nature of the divergence between the five spirochetes and the common and distinct aspects of their physiological strategies.
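The filtering step described in this section, keeping sequence pairs with alignment scores of 100% and 99%, can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: it assumes the sequences are already aligned to equal length with `-` as the gap character, it assumes the score is the percentage of identical residues over columns where neither sequence has a gap, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over aligned columns without gaps.

    Assumes `a` and `b` are rows of the same alignment (equal length,
    '-' marks a gap).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


def bin_pairs(aligned):
    """Group sequence pairs into the 100% and 99% bins; other pairs are dropped."""
    bins = {"100%": [], "99%": []}
    for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
        score = percent_identity(seq1, seq2)
        if score == 100.0:
            bins["100%"].append((name1, name2))
        elif score >= 99.0:
            bins["99%"].append((name1, name2))
    return bins


# Invented toy alignment: 100 residues per sequence, names are placeholders.
base = "ACDEFGHIKL" * 10
toy_alignment = {
    "protein_speciesC": base,
    "protein_speciesD": base,                 # identical to species C copy
    "protein_speciesE": base[:-1] + "M",      # one substitution in 100 residues
    "unrelated": "W" * 100,
}
print(bin_pairs(toy_alignment))
```

Comparing every pair is quadratic in the number of sequences, which matches the 5166 × 5166 comparison mentioned in the text; for a set of that size the score would in practice be taken from the alignment program's own output rather than recomputed in Python.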

('#' indicates the presence of hypothetical proteins along with other types of protein, according to the databases.)

Table 6 Comparative analysis based on proteins present in Treponema pallidum subspecies with 99% sequence similarity

Table 5 Functional categories within the 100% similar protein sequences in Treponema pallidum subspecies

Table 7 Predicted functional sites in the total of 92 proteins

Table 8 Predicted functional sites in the total of 13 proteins

Protein functional profile searches resulted in the identification of diverged and common components that might mediate interactions between the spirochetes and host cells or the extracellular matrix. It appears possible to tentatively understand the divergent mechanisms underlying their invertebrate pathogenesis and virulence and their adaptation to specific niches.

Comparative analysis of five subspecies of the spirochaete Treponema pallidum was performed.

1. The parameters studied include the similarities and differences in genomic properties and protein function, based on genome sequences obtained from NCBI.

2. This study enables further analysis of the species, both to understand their growth and development and to investigate their pathogenicity, in order to overcome, treat and prevent the diseases caused by these organisms.

Finally, our strategy has demonstrated the discriminatory power of computational tools and techniques, with Support Vector Machine classification as the sequence-based comparative analysis, to discriminate proteins and their functions within the species of pathogenic microorganisms with high reliability and accuracy.

SUMMARY OF RESEARCH
The aim of the study is to identify the common and differentiating functional components encoded within the genomes of the five species of Treponema. The research review indicates the similarities and differences at the morphological level, as Treponema pallidum subspecies are morphologically and serologically indistinguishable. This study has demonstrated the discriminatory power of computational tools and techniques, with Support Vector Machine classification as the sequence-based comparative analysis, to discriminate proteins and their functions within the species of pathogenic microorganisms with high reliability and accuracy.

FUTURE ISSUES
Sequence variants could be readily used for molecular typing and identification of these Treponema pallidum strains and, with the accumulation of additional data from other Treponema pallidum genomes, for epidemiologic applications and clinical discrimination between reinfection and reactivation of disease. Moreover, the ability to now sequence numerous T. pallidum strains, especially those showing different degrees of virulence, will allow phenotype to be correlated with sequence. This is a significant development for an organism of important public health impact, but for which standard bacterial genetic methods are untenable. We hope that this work can be extended by exploring further sequence properties as well as more diverse organisms, to elucidate the underlying host association and evolutionary mechanisms.

DISCLOSURE STATEMENT
There is no financial support for this research work from any funding agency.

ACKNOWLEDGMENTS
We thank our guide for his timely help, outstanding ideas and encouragement, which enabled us to finish this research work successfully.

REFERENCES
1. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL. Improved microbial gene identification with GLIMMER. Nucleic Acids Res 1999, 27, 4636-4641

2. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997, 25, 955-964

3. Lukashin AV, Borodovsky M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 1998, 26, 1107-1115

4. De Castro E, Sigrist CJA, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res 2006, 34(Web Server issue), W362-5

5. Sigrist CJA, De Castro E, Langendijk-Genevaux PS, Le Saux V, Bairoch A, Hulo N. ProRule: a new database containing functional and structural information on PROSITE profiles. Bioinformatics 2005, 21(21), 4060-6

6. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 2007, 24, 1596-1599

7. Sigrist CJA, Cerutti L, de Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res 2010, 38(Database issue), 161-6

RELATED RESOURCE
1. Ardhani Dwi Lestari, Tini Palupi, Bertha Oktarina, Mochammad Yuwono, Gunawan Indrayanto. J. Liquid Chromatography & Related Technol., 2004, 25(27), 2603-2612

2. Rabert Hartman, Ahmed Abrahim, Andrew Claused, Bing Mao, Louis S. Crocker, Zhihong Ge. J. Liquid Chromatography and Related Technol., 2003, 25(26), 2551-2566

Lukashin et al. (1998): In this study, researchers present the analysis of false positive and false negative predictions, with the caution that these categories are not precisely defined if the public database annotation is used as a control.
