bioinformatics structural prediction of fmta gene from ten various species of bacteria and...

Upload: caguioa-mark-anthony-g

Post on 10-Oct-2015


DESCRIPTION

molecular biology

TRANSCRIPT

  • Structural Prediction through sequence analysis using bioinformatics tools:

    Prediction of the most distant species of fmtA protein from a variety of 10 species

    using multiple sequence analysis

    Nemerisza I. ARAN, Relyando DE FIESTA, Dane Nathalie T. MIRANDA and Justine

    Rose F. SANTOS

    Chemical Engineering Department, Technological Institute of the Philippines, Manila

    1001, Philippines

    Abstract

    fmtA protein sequences from ten different species were obtained from NCBI GenBank.
    The set includes sequences from gram-positive and gram-negative bacteria with
    different functions and characteristics. Using ProtParam, the researchers determined
    the molecular weight and chemical formula of each protein. B. pseudomallei, with a
    molecular weight of 82492.3 Da, is the heaviest, and S. aureus, with a molecular
    weight of 46067.4 Da, is the lightest. All the proteins gave partially hydrophobic
    profiles in ProtScale and were hydrophilic overall according to ProtParam; G.
    lozoyensis had the highest number of major peaks and S. epidermidis the lowest.
    MotifScan reported the motifs matched in each sequence: only two bacteria gave no
    match and the rest gave possible matches. Two domains were identified, β-lactamase
    and TonB, and several species shared the same domain. PeptideCutter listed the
    proteases and chemicals predicted to cleave each protein.

    Secondary structure analysis using PSIPRED determined how many α-helices, β-sheets
    and random coils each fmtA protein has. S. sanguinis has the most α-helices, B.
    pseudomallei and H. alvei are tied for the most β-sheets, and Y. regensburgei has
    the most random coils.

  • It was found that B. pseudomallei is farthest from the reference ancestral fmtA protein

    with a distance of 19.57, and was used in determining the tertiary and quaternary

    structure.

    Keywords: fmtA protein, protein structure, multiple sequence alignment, bacteria,

    bioinformatics

    Introduction

    Protein sequences emerging from genome sequencing projects are of greatest value to

    medicine and biology if their structure and function can be identified. With the growing

    number of annotated sequences, association of a new sequence with a protein of known

    structure can be a significant step towards the identification of its biological role. Simple

    sequence search methods such as FASTA (Pearson and Lipman, 1988) or NCBI readily

    identify close homologs of protein sequences.

    Multiple sequence alignment improves the detection of distantly related homologous
    proteins. ClustalW and T-Coffee are the most common programs for multiple sequence
    alignment. The sequences are first grouped according to their similarity into a
    tree (hierarchical cluster analysis). Starting with the most similar pairs, all the
    sequences are then aligned stepwise to each other using dynamic programming. The
    aligned sequences are output together with the cluster analysis, but these procedures
    normally do not include any statistical analysis of the significance of the alignment.
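    The dynamic programming step can be illustrated with a minimal sketch of global
    pairwise alignment scoring (Needleman-Wunsch). This is not the ClustalW or T-Coffee
    implementation: the scoring values (match +1, mismatch -1, gap -2) and the helper
    names are assumptions chosen for illustration, and real tools use substitution
    matrices and affine gap penalties.

```python
from itertools import combinations


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative assumptions, not ClustalW defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


def most_similar_pair(seqs):
    """Return the names of the highest-scoring pair in a {name: sequence} dict."""
    return max(
        combinations(sorted(seqs), 2),
        key=lambda pair: needleman_wunsch_score(seqs[pair[0]], seqs[pair[1]]),
    )
```

    A progressive aligner would start from the pair returned by most_similar_pair and
    add the remaining sequences in the order given by the guide tree.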

    The fmtA gene was identified as a methicillin resistance factor in Staphylococcus
    aureus. Inactivation of fmtA increases the sensitivity of methicillin-resistant S.
    aureus (MRSA) strains to Triton X-100 and β-lactams and decreases the level of highly
    cross-linked peptidoglycan (PG) (Komatsuzawa et al). Ten fmtA sequences from
    different species were submitted to protein structure prediction tools to compare
    the primary and secondary structure of fmtA from each species. The most distant
    species was then used to predict the tertiary and quaternary structure.

    Methods

    Control dataset

    A dataset of fmtA protein from different bacterial species obtained from NCBI GenBank

    (http://www.ncbi.nlm.nih.gov/) is listed here: EEV75348.1, EFH09490.1, EHL01514.1,

    WP_009496590.1, YP_001066210, WP_004191820.1, WP_002496510.1, EHM51293.1,

    EHL97443.1, EHM40511.1. The protein sequences of these 10 accession numbers were

    used throughout this paper.

    Physico-chemical properties. For the computation of physico-chemical properties,

    ProtParam (http://web.expasy.org/protparam/) was used. The computed parameters

    include the molecular weight, theoretical pI, amino acid composition, atomic composition,

    estimated half-life and grand average of hydropathicity (GRAVY). The extinction coefficient,

    absorbance and aliphatic index were calculated using the following formulas:

    E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine)

    Absorb(Prot) = E(Prot) / Molecular_weight

    Aliphatic index = X(Ala) + a * X(Val) + b * [ X(Ile) + X(Leu) ]
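    The three formulas can be sketched in a few lines of Python. The constants are the
    ones documented for ProtParam as far as we know them (extinction at 280 nm of 1490
    for Tyr, 5500 for Trp and 125 for cystine; a = 2.9 and b = 3.9; X as mole percent);
    the function names, and the default of pairing all cysteines into cystines, are
    assumptions made for this illustration.

```python
# Molar extinction coefficients at 280 nm (M^-1 cm^-1), as documented for ProtParam.
EXT_TYR, EXT_TRP, EXT_CYSTINE = 1490, 5500, 125


def extinction_coefficient(seq, cystines=None):
    """E(Prot) = Numb(Tyr)*Ext(Tyr) + Numb(Trp)*Ext(Trp) + Numb(Cystine)*Ext(Cystine).

    If `cystines` is None, assume every pair of Cys residues forms one cystine.
    """
    n_cystine = seq.count("C") // 2 if cystines is None else cystines
    return (seq.count("Y") * EXT_TYR
            + seq.count("W") * EXT_TRP
            + n_cystine * EXT_CYSTINE)


def absorbance(ext_coefficient, molecular_weight):
    """Absorb(Prot) = E(Prot) / Molecular_weight."""
    return ext_coefficient / molecular_weight


def aliphatic_index(seq, a=2.9, b=3.9):
    """Aliphatic index = X(Ala) + a*X(Val) + b*[X(Ile) + X(Leu)], X in mole percent."""
    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))
```

    As a check against the values reported later for S. aureus, absorbance(56160,
    46067.4) rounds to 1.219.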

    Identifying Protein Domains. InterProScan (www.ebi.ac.uk/InterProScan/) combines
    different protein signature recognition methods and compares a query sequence
    against InterPro, a domain database that includes most of the major domain
    collections available online.

    Motifs determination. MotifScan (http://myhits.isb-sib.ch/cgi-bin/motif_scan) finds
    all known motifs that occur in a sequence. The results were filtered and displayed
    using PROSITE profiles.

  • Transmembrane Segment Prediction.

    TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) is used to predict transmembrane
    segments in proteins. It also indicates which portions of a protein are probably
    inside and outside the cell.

    Locating Coiled-coil Region. The coiled-coil regions of each protein were located
    with COILS (http://ch.embnet.org/software/COILS_form.html). COILS is a program that
    compares a sequence to a database of known parallel two-stranded coiled-coils and
    derives a similarity score.

    Hydrophobicity prediction. ProtScale (http://web.expasy.org/protscale/) was used to
    compute and plot the profile produced by an amino acid scale on each protein. The
    result is a two-dimensional plot of the hydrophobicity score along the sequence,
    whose major and minor peaks indicate the hydrophobic sites. In analyzing the plot,
    the hydrophobic and hydrophilic peaks were counted at scores of +1 and -1,
    respectively.

    Detecting PROSITE signature matches. ScanProsite
    (http://prosite.expasy.org/scanprosite/) was used to detect the functional sites
    that add to the functional diversity of the proteome. ScanProsite is a web-based tool
    that reports which PROSITE patterns occur in a given sequence and where. It can also
    check whether other proteins contain the same pattern.

    Predicting cleavage sites. PeptideCutter (http://web.expasy.org/peptide_cutter/) was

    used in predicting potential cleavage sites cleaved by proteases or chemicals in a given

    protein sequence. This tool helps determine whether the chosen protein can be
    cleaved by the enzymes available in the database. Enzymes can be selected all at once
    or one at a time, as the user prefers.

  • Prediction of the secondary structure

    PSIPRED (www.bioinf.cs.ucl.ac.uk/psipred/) is a popular method for predicting the
    secondary structure of a protein. It reports the predicted α-helices and β-sheets
    in the sequence.

    Prediction of the tertiary structure

    The dihedral angles about the Cα-C bond (ψ, psi) and the N-Cα bond (φ, phi) of each
    amino acid residue, and the empirical distribution of these data points in a protein
    structure, were determined using Rampage
    (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php).
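    For readers unfamiliar with how a backbone dihedral is obtained from coordinates,
    the following is a minimal sketch of the standard four-point calculation. It is not
    Rampage's code; the function name and the tuple-based coordinates are assumptions.
    For φ the four atoms are C(i-1), N(i), Cα(i), C(i); for ψ they are N(i), Cα(i),
    C(i), N(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points (x, y, z)."""
    def sub(u, v):
        return tuple(a - b for a, b in zip(u, v))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * c for c in b1))
    w = sub(b2, tuple(dot(b2, b1) * c for c in b1))
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

    A Ramachandran plot is then a scatter of the (φ, ψ) pair computed for every residue.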

    Prediction of the quaternary structure

    SWISS-MODEL (http://swissmodel.expasy.org/) is a structural bioinformatics web server

    dedicated to homology modeling of protein 3D structures. Homology modeling is

    currently the most accurate method to generate reliable three-dimensional protein

    structure models and is routinely used in many practical applications.

  • Results and Discussion

    Primary sequence analysis

    Species                              Amino acids  MW (Da)   GRAVY      Instability index  Aliphatic index  Extinction coefficient  Absorbance
    Sporosarcina newyorkensis            625          72531.6   -0.1       43.79 (unstable)   99.28            113110                  1.559
    Burkholderia pseudomallei 1106a      753          82434.2   not given  31.03 (stable)     75.34            153795                  1.864
    Streptococcus sanguinis              592          67379.2   -0.235     26.40 (stable)     88.26            130640                  1.939
    Staphylococcus aureus A8115          397          46067.4   -0.561     28.01 (stable)     85.08            56160                   1.219
    Roseomonas cervicalis ATCC 49957     731          79740.9   -0.312     36.20 (stable)     76.51            122620                  1.538
    Glarea lozoyensis 74030              513          56389.9   -0.127     29.45 (stable)     92.76            55350                   0.982
    Staphylococcus epidermidis           400          46620.6   -0.584     28.36 (stable)     81.45            57190                   1.227
    Yokenella regensburgei ATCC 43003    733          81182.9   -0.521     36.68 (stable)     60.95            120560                  1.485
    Acetobacteraceae bacterium AT-5844   720          79079.1   -0.327     29.54 (stable)     75.67            106120                  1.342
    Hafnia alvei ATCC 51873              728          80466.0   -0.531     31.68 (stable)     63.69            106120                  1.319

    Fig.1. Physico-chemical properties of the fmtA protein in

    different bacteria.

  • Identifying Protein Domains

    Fig.2a. Resulting domain for gram-positive bacteria

    Fig.2b. Resulting domain for gram-negative bacteria

  • Matches                                E-value    Position    Status

    [Staphylococcus aureus A8115]
    Beta-lactamase                         5.0E-62    [88-383]    T
    Beta-lactamase                         5.0E-62    [88-383]    T
    PENICILLIN-BINDING PROTEIN             5.9E-33    [15-295]    T
    transmembrane_regions                  -1.0       [9-27]      ?
    signal-peptide                         -1.0       [1-31]      ?

    [Roseomonas cervicalis ATCC 49957]
    G3DSA:2.40.170.20                      0.0        [195-731]   T
    TonB_dep_Rec                           5.3E-28    [487-730]   T
    TAT                                    0.0        [1-40]      T
    G3DSA:2.170.130.10                     4.1E-38    [38-192]    T
    Plug                                   4.4E-22    [82-181]    T
    PTHR32552                              0.0        [7-731]     T
    PTHR32552:SF0                          0.0        [7-731]     T
    SSF56935                               0.0        [55-731]    T

    [Glarea lozoyensis 74030]
    Beta-lactamase                         1.1E-48    [3-354]     T
    G3DSA:3.40.710.10                      3.0E-60    [3-358]     T
    PBP_transp_fold                        3.0E-61    [5-369]     T
    PTHR22935                              3.5E-40    [1-337]     T

    [Sporosarcina newyorkensis]
    Beta-lactamase                         1.2E-47    [68-379]    T
    G3DSA:3.40.710.10                      9.1E-61    [56-393]    T
    PBP_transp_fold                        1.2E-59    [51-403]    T
    PTHR22935                              4.0E-41    [68-378]    T

    [Burkholderia pseudomallei 1106a]
    Beta-lactamase                         1.2E-47    [68-379]    T
    G3DSA:3.40.710.10                      9.1E-61    [56-393]    T
    PBP_transp_fold                        1.2E-59    [51-403]    T
    PTHR22935                              4.0E-41    [68-378]    T

    [Streptococcus sanguinis]
    Beta-lactamase                         4.0E-51    [56-355]    T
    G3DSA:3.40.710.10                      2.1E-66    [57-355]    T
    PBP_transp_fold                        7.4E-66    [31-371]    T
    PTHR22935                              3.8E-41    [36-352]    T

    [Staphylococcus epidermidis]
    Beta-lactamase                         3.9E-49    [87-368]    T
    no description                         1.0E-54    [70-368]    T
    beta-lactamase/transpeptidase-like     4.5E-68    [62-387]    T
    PENICILLIN-BINDING PROTEIN             3.9E-29    [73-386]    T
    signal-peptide                         -1.0       [1-26]      ?
    transmembrane_regions                  -1.0       [9-29]      ?

    [Yokenella regensburgei ATCC 43003]
    G3DSA:2.40.170.20                      0.0        [197-733]   T
    TonB_dep_Rec                           1.7E-27    [489-732]   T
    TonB-siderophor                        0.0        [79-733]    T
    TONB_DEPENDENT_REC_1                   0.0        [1-48]      T
    TONB_DEPENDENT_REC_2                   0.0        [716-733]   T
    G3DSA:2.170.130.10                     3.1E-40    [32-193]    T
    Plug                                   1.3E-22    [78-182]    T

    [Acetobacteraceae bacterium AT-5844]
    G3DSA:2.40.170.20                      0.0        [188-720]   T
    TonB_dep_Rec                           1.9E-32    [494-719]   T
    TonB-siderophor                        9.9E-130   [76-718]    T
    G3DSA:2.170.130.10                     8.8E-38    [53-182]    T
    Plug                                   2.1E-21    [75-174]    T
    PTHR32552                              0.0        [39-720]    T
    PTHR32552:SF0                          0.0        [39-720]    T
    SSF56935                               0.0        [29-720]    T

    [Hafnia alvei ATCC 51873]
    G3DSA:2.40.170.20                      0.0        [196-728]   T
    TonB_dep_Rec                           2.0E-25    [489-727]   T
    TonB-siderophor                        0.0        [80-728]    T
    TONB_DEPENDENT_REC_2                   0.0        [711-728]   T
    G3DSA:2.170.130.10                     1.0E-40    [33-192]    T
    Plug                                   5.1E-24    [79-181]    T
    PTHR32552                              0.0        [52-728]    T
    PTHR32552:SF0                          0.0        [52-728]    T
    SSF56935                               0.0        [52-728]    T

    Table 1. Summary of matches and E-value from domains obtained using Interproscan

  • The functions of unknown proteins can be inferred by matching their signatures with
    those of known proteins using InterProScan. The fmtA proteins of S. aureus, S.
    newyorkensis, S. sanguinis and S. epidermidis share the Beta-lactamase-related and
    Beta-lactamase/transpeptidase-like domains. Beta-lactamase catalyses the opening
    and hydrolysis of the beta-lactam ring of beta-lactam antibiotics such as penicillins
    and cephalosporins. Most of these antibiotics work by preventing biosynthesis of the
    bacterial cell wall. A Beta-lactamase domain in the Staphylococcal and Streptococcal
    proteins is consistent with the ability of these organisms to resist many important
    antibiotics. G. lozoyensis also shares this domain and has Peptidase S12,
    Pab87-related, C-terminal as another domain. The common characteristic of these five
    (5) species is that they are all gram-positive.

    The fmtA proteins of R. cervicalis, B. pseudomallei, Y. regensburgei, A. bacterium
    and H. alvei share the TonB-dependent receptor, beta-barrel, siderophore receptor and
    plug domains. Y. regensburgei and H. alvei additionally have the TonB-dependent
    receptor, conserved site, and H. alvei also has the TonB box, conserved site.
    TonB interacts with the outer membrane receptor proteins. These proteins carry out
    high-affinity binding and energy-dependent uptake of specific substrates into the
    periplasmic space, the space between the inner and outer cell membranes. The
    bacteria that have TonB domains, R. cervicalis, B. pseudomallei, Y. regensburgei,
    A. bacterium and H. alvei, are all gram-negative.

  • Motif information (ID)                              Status        Position  Raw-score  N-score  E-value

    Fmta [Staphylococcus aureus A8115]
    Big-1 (bacterial Ig-like domain 1) domain (BIG1)    Weak match    1-9       33         4.128    1.6e+03

    Ferric malleobactin receptor fmta [Roseomonas cervicalis ATCC 49957]
    NHL repeat profile (NHL)                            Weak match    306-318   27         4.009    2.1e+03
    Twin arginine translocation (TAT)                   Weak match    1-40      858        7.882    0.28

    Putative protein fmta [Glarea lozoyensis 74030]
    No match

    Fmta family protein [Sporosarcina newyorkensis]
    LDL-receptor class B repeat profile (LDLRB)         Weak match    617-625   153        5.329    99

    Ferric malleobactin transporter [Burkholderia pseudomallei 1106a]
    Alanine-rich region (ALA_RICH)                      Strong match  27-83     48         9.353    0.0094

    Fmta family protein [Streptococcus sanguinis]
    No match

    Fmta family protein [Staphylococcus epidermidis]
    Lysine-rich region (LYS_RICH)                       Weak match    52-83     40         6.918    2.6
    Bipartite nuclear localization signal profile (NLS_BP)  Weak match  52-66   3          3.000    2.1e+04

    Putative ferric malleobactin receptor fmta [Yokenella regensburgei ATCC 43003]
    Threonine-rich region                               Weak match    67-136    38         6.929    2.5

    Putative ferric malleobactin receptor fmta [Acetobacteraceae bacterium AT-5844]
    No match

    Putative ferric malleobactin receptor fmta [Hafnia alvei ATCC 51873]
    MVP (vault) repeat (MVP)                            Weak match    61-114    446        5.366    91

    Table 2. Motifs matched for each bacterium, with the position, raw-score, N-score and E-value of each match

  • Motifs determination.

    Among the resulting motifs, only B. pseudomallei gave a strong match, to ALA_RICH,
    with the highest N-score of 9.353. Both S. sanguinis and A. bacterium gave no match
    to any motif available in the MotifScan database. All remaining bacteria gave weak
    matches, meaning that the assigned motifs are uncertain and may not truly be present
    in the given protein sequence.

    The E-value estimates the number of false positives expected at a given score, so a
    larger E-value means a less reliable match. Among the weak matches, NLS_BP, with an
    E-value of 2.1 x 10^4, is the most likely to be a false-positive match, to S.
    epidermidis. LDLRB, with an E-value of 99, is also more likely to be a chance match
    than a true match to S. newyorkensis.
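    To make the comparison of E-values concrete, the short sketch below ranks the
    matches of Table 2 from the smallest E-value (least likely to arise by chance) to
    the largest. The values are copied from the table; the tuple layout and the function
    name are assumptions made for this illustration.

```python
# (species, motif, E-value) triples copied from Table 2.
MATCHES = [
    ("S. aureus", "BIG1", 1.6e3),
    ("R. cervicalis", "NHL", 2.1e3),
    ("R. cervicalis", "TAT", 0.28),
    ("S. newyorkensis", "LDLRB", 99.0),
    ("B. pseudomallei", "ALA_RICH", 0.0094),
    ("S. epidermidis", "LYS_RICH", 2.6),
    ("S. epidermidis", "NLS_BP", 2.1e4),
    ("Y. regensburgei", "Threonine-rich region", 2.5),
    ("H. alvei", "MVP", 91.0),
]


def rank_by_evalue(matches):
    """Sort matches so that the smallest E-value comes first."""
    return sorted(matches, key=lambda match: match[2])


if __name__ == "__main__":
    for species, motif, evalue in rank_by_evalue(MATCHES):
        print(f"{motif:<24}{species:<18}{evalue:g}")
```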

    S. epidermidis is part of the normal human bacterial flora and is associated with
    infections. This coccus has low pathogenic potential in people with a strong immune
    system. The weak match to NLS_BP, which describes a specific sequence within a
    protein that is responsible for its translocation into the cell nucleus, suggests
    that the S. epidermidis sequence may contain only a few residues resembling such a
    signal.

    S. newyorkensis is an endospore-forming bacterium capable of transferring some of its
    DNA to a host. Endospores can survive for long periods and are commonly found in soil
    and water. The function of LDLRB is to regulate and maintain the internal stability
    of cholesterol in mammalian cells. The two therefore have little in common: S.
    newyorkensis forms endospores that can lie dormant for a very long time, whereas
    LDLRB is involved in continually keeping cholesterol levels at equilibrium.

  • Transmembrane Segment Prediction

    Table 3. Predicted location of residues: inside the cell, in a transmembrane helix, or outside the cell

    Species                              Inside                    Transmembrane helix                         Outside
    Staphylococcus aureus A8115          1-6                       7-26                                        27-397
    Sporosarcina newyorkensis            1-25, 524-531, 590-601    26-48, 501-523, 532-554, 569-589, 602-624   49-500, 555-568, 625
    Streptococcus sanguinis              1-4, 481-492, 553-563     5-24, 458-480, 493-515, 530-552, 564-586    25-457, 516-529, 587-592
    Staphylococcus epidermidis           1-6                       7-29                                        30-400

    Single region spanning the whole sequence (no transmembrane helix predicted):
    Roseomonas cervicalis ATCC 49957     1-731
    Glarea lozoyensis 74030              1-513
    Burkholderia pseudomallei 1106a      1-753
    Yokenella regensburgei ATCC 43003    1-733
    Acetobacteraceae bacterium AT-5844   1-720
    Hafnia alvei ATCC 51873              1-728

    To determine where each residue lies, the TMHMM plots should be used rather than
    the ranges listed above: the plot shows the per-residue probabilities, whereas the
    table gives only the predicted location (inside the cell, in a transmembrane helix,
    or outside the cell).

    According to the plot for each species, S. aureus, B. pseudomallei, A. bacterium,
    H. alvei, Y. regensburgei and S. epidermidis are inside the cell. G. lozoyensis
    showed no TM helix. R. cervicalis is outside the cell. S. newyorkensis showed inside
    positions at 25-50, 495-520, 525-550, 555-580 and 600-625. Lastly, S. sanguinis
    showed inside positions at 1-25, 450-480, 485-510, 520-560 and 570-592.

  • Determining the coiled-coil region

    Fig 3. Number of coiled-coil regions for different species.

    Coiled coils are built by two or more alpha-helices that wind around each other to
    form a supercoil. In essence, coiled coils are built of sequence elements of three and
    four residues whose hydrophobicity pattern and residue composition are compatible
    with the structure of amphipathic alpha-helices. S. newyorkensis has the most
    coiled-coil regions (8) and A. bacterium has the fewest (2).

    Hydrophobicity prediction

    Based on the GRAVY values calculated with ProtParam, the proteins were considered
    hydrophilic because of their negative scores. To determine how hydrophobic and
    hydrophilic each protein is along its sequence, the ProtScale plot was used as the
    basis for analysis. Peaks above zero indicate hydrophobic residues and peaks below
    zero indicate hydrophilic residues.
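    The relation between GRAVY and a ProtScale-style profile can be sketched as follows.
    GRAVY is the mean Kyte-Doolittle hydropathy over the whole sequence, while the
    profile is the same mean taken over a sliding window. The window size of 9 and the
    function names are assumptions for illustration; the settings actually used for the
    plots are not stated in the text.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy of all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of each full window sliding along the sequence."""
    return [
        gravy(seq[start:start + window])
        for start in range(len(seq) - window + 1)
    ]
```

    Window scores above zero correspond to hydrophobic stretches and scores below zero
    to hydrophilic ones, which is how the peaks in the plot are read.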

    [Fig. 3 chart: number of coiled-coil regions (axis from 0 to 8), one bar per species.]

  • Fig 4. Number of major peaks in the ProtScale plot, counted using (±)1 as the base
    point.

    According to the figure, G. lozoyensis, with 13 major peaks, has the most hydrophobic
    peaks and S. epidermidis, with 4, has the fewest. The hydrophilic peaks outnumber the
    hydrophobic peaks. H. alvei, with 20 peaks, is the most hydrophilic of all the
    species. With 12 peaks each, S. aureus, S. epidermidis and S. sanguinis are the least
    hydrophilic.

    Determining the post-translational modification

    Some PTMs are shared among the bacteria and some are unique to one species. Fig. 5
    shows the PTMs for each bacterium, arranged according to the similarity of their
    modifications. Only two bacteria have active sites, namely Y. regensburgei ATCC 43003
    and H. alvei ATCC 51873. These are TonB-dependent receptor proteins signatures 1 and
    2 (TonB-DRPS 1 & 2) for Y. regensburgei ATCC 43003 and TonB-dependent receptor
    proteins signature 2 (TonB-DRPS 2) for H. alvei ATCC 51873. Without TonB, the
    receptors still bind their substrates but cannot carry out active transport. Active
    transport is important in moving ions through membranes against their
    electrochemical gradient. A summary table is provided below showing how many patterns
    from the PROSITE database occur in the sequence of each bacterium.

    Fig. 5. Post-translational modifications (PTMs) for each bacterium. The name of each
    bacterium is placed outside the circle. PTMs shared by all 10 bacteria are located
    at the center. PTMs positioned between two colors correspond to two bacteria, while
    PTMs found inside a segment of the circle are unique to that bacterium.

    [Fig. 5 chart labels: Glarea lozoyensis 74030; CK2 and PKC phosphorylation; N-glycosylation; N-myristoylation; leucine zipper pattern; TonB-DRPS 1 & 2; TonB-DRPS 2.]

  • Detecting PROSITE signature matches

    Table 4. Number of hits by pattern in the sequence of each bacterium

    Protein kinases are responsible for phosphorylation; they are enzymes that catalyze
    the transfer of a phosphate group from ATP to a substrate. They are used in
    transmitting signals and controlling processes in cells, and they modify the
    activities of proteins. Phosphorylation regulates protein function by forming or
    disrupting protein-protein interaction surfaces and by inducing conformational
    changes. Because phosphorylation is reversible, it allows the cell to respond to
    stimuli, which makes it a well-suited tool for signal transduction. The functions of
    the individual types of phosphorylation found in the given sequences are as follows.
    Phosphorylation of tyrosine residues controls the enzymatic activity of the protein
    and generates binding sites for downstream signaling proteins. Casein kinase II (CK2)
    is mainly concerned with regulating cellular processes; its activity is independent
    of cyclic nucleotides and calcium. Protein kinase C (PKC) phosphorylates serine or
    threonine residues close to a C-terminal basic residue. Additional basic residues
    near the target amino acid enhance the Vmax and Km of the phosphorylation reaction.

    fmtA Sequence Hits by pattern

    Staphylococcus aureus A8115 16 out of 397 amino acids

    Roseomonas cervicalis ATCC 49957 47 out of 731 amino acids

    Glarea lozoyensis 74030 31 out of 513 amino acids

    Sporosarcina newyorkensis 24 out of 625 amino acids

    Burkholderia pseudomallei 1106a 35 out of 753 amino acids

    Streptococcus sanguinis 23 out of 592 amino acids

    Staphylococcus epidermidis 22 out of 400 amino acids

    Yokenella regensburgei ATCC 43003 48 out of 733 amino acids

    Acetobacteraceae bacterium AT-5844 42 out of 720 amino acids

    Hafnia alvei ATCC 51873 56 out of 728 amino acids
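    How a pattern hit is found can be illustrated by translating a PROSITE pattern into
    a regular expression and scanning a sequence with it. The sketch below is not the
    ScanProsite implementation and handles only the basic syntax (x, [..], {..} and
    repeat counts); the function names are assumptions. The four patterns are the
    PROSITE definitions, as far as we know them, for the PKC, CK2, N-glycosylation and
    N-myristoylation sites discussed in this section.

```python
import re

PATTERNS = {
    "PKC_PHOSPHO_SITE": "[ST]-x-[RK]",
    "CK2_PHOSPHO_SITE": "[ST]-x(2)-[DE]",
    "ASN_GLYCOSYLATION": "N-{P}-[ST]-{P}",
    "MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
}


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Split each element into its core and an optional (n) or (n,m) repeat.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                 # any residue
            core = "."
        elif core.startswith("{"):      # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(seq, pattern):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
```

    Counting len(scan(seq, p)) over all patterns p gives a per-sequence tally comparable
    in spirit to the hits listed in the table, although ScanProsite applies further
    rules that this sketch omits.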

  • N-myristoylation serves as a conformational localization switch: conformational
    changes of the protein strongly affect the availability of the myristoyl handle for
    membrane attachment. The myristoyl group, which attaches to the N-terminal amino acid
    of the polypeptide, also increases the hydrophobicity of the protein and its affinity
    for membranes. N-myristoyltransferase (NMT) is the enzyme that catalyzes the
    modification.

    N-glycosylation adds to the functional diversity of the proteome. It helps determine
    whether folded proteins are transferred to the Golgi. N-linked glycans contribute
    important properties during protein folding, conformation, distribution, stability
    and activity.

    Amidation improves the activity of peptides and lengthens their shelf life. Together
    with N-terminal acetylation, it reduces the overall charge of a peptide, which
    decreases its solubility. It also makes peptides more resistant to enzymatic
    degradation and more stable, because they more closely mimic the native protein.
    Amidation therefore boosts the biological activity of peptides.

    S. sanguinis has a unique pattern, the leucine zipper pattern. Leucine zippers are
    found in proteins that bind DNA within the promoters of genes and regulate gene
    expression.

  • Predicting cleavage sites

    Fig. 6. Number of possible cleavage sites obtained using PeptideCutter.

    In terms of the number of cleavage sites, Proteinase K cleaved the most residues in
    the sequence of every bacterium. This enzyme preferentially cleaves at aliphatic or
    aromatic amino acids such as tyrosine, phenylalanine, tryptophan and histidine (Keil,
    1992), and almost all of the chosen sequences contain many aromatic amino acids.
    Proteinase K also plays a major role in the destruction of proteins in cell lysates
    (tissue, cell culture cells) and in the release of nucleic acids. Next on the list is
    Thermolysin, a proteinase that cleaves at sites with bulky and aromatic residues:
    isoleucine, leucine, valine, alanine, methionine and phenylalanine (Keil, 1992).

    Trypsin, in the middle, preferentially cleaves at Arg and Lys in position P1, with
    higher rates for Arg, especially at high pH (Keil, 1992). Hydroxylamine, with
    single-digit numbers of sites, cleaves at Asn-Gly bonds (Bornstein & Balian).
    Lastly, Enterokinase, with no cleavage sites at all, is a serine protease that
    recognizes the amino acid sequence -Asp-Asp-Asp-Asp-Lys-|-X with high specificity
    (Roche). Enterokinase activates its natural substrate trypsinogen and releases
    trypsin by cleavage at the C-terminal end of this sequence.

    Like enterokinase, caspases 1-10 gave no predicted cleavage sites in fmtA from any of
    the bacteria, with the exception of caspase 8, which is predicted to cleave fmtA from
    Sporosarcina newyorkensis and Streptococcus sanguinis. Enterokinase and Granzyme B
    recognize cleavage sequences that fmtA lacks, which is why these enzymes gave no
    possible cleavage sites. Factor Xa gave a cleavage site only for Roseomonas
    cervicalis. The fmtA proteins from Roseomonas cervicalis, Burkholderia pseudomallei,
    Yokenella regensburgei and Acetobacteraceae bacterium are predicted to be cleaved by
    Tobacco etch virus protease. Thrombin gave a cleavage site only for Roseomonas
    cervicalis.
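    The kind of rule behind such predictions can be illustrated with a minimal sketch.
    It applies a simplified trypsin rule (cut after Lys or Arg in P1 unless the next
    residue is Pro) and checks for the enterokinase recognition sequence DDDDK. The
    proline exception and the function names are assumptions for illustration;
    PeptideCutter's own models are more detailed.

```python
def trypsin_sites(seq):
    """1-based positions of the P1 residues after which trypsin is predicted to cut.

    Simplified rule: cut after K or R, except when the next residue is P.
    """
    return [
        i + 1
        for i, residue in enumerate(seq[:-1])
        if residue in "KR" and seq[i + 1] != "P"
    ]


def has_enterokinase_site(seq):
    """True if the sequence contains Asp-Asp-Asp-Asp-Lys followed by a residue."""
    index = seq.find("DDDDK")
    while index != -1:
        if index + 5 < len(seq):
            return True
        index = seq.find("DDDDK", index + 1)
    return False


def digest(seq, sites):
    """Split a sequence into peptides at the given 1-based cleavage positions."""
    cuts = [0] + list(sites) + [len(seq)]
    return [seq[start:end] for start, end in zip(cuts, cuts[1:])]
```

    For example, digest(seq, trypsin_sites(seq)) returns the peptides expected from a
    complete tryptic digest under this simplified rule.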

    Prediction of secondary structure

    Table 5. Summary of results obtained using PSIPRED: the number of α-helices, β-sheets
    and random coils in the structure of each fmtA protein in different bacteria

    Species                               α-helix  β-sheet  Random coils
    Staphylococcus aureus A8115           11       9        21
    Roseomonas cervicalis ATCC 49957      4        36       41
    Glarea lozoyensis 74030               11       19       30
    Sporosarcina newyorkensis             16       17       33
    Burkholderia pseudomallei 1106a       4        38       42
    Streptococcus sanguinis               15       17       32
    Staphylococcus epidermidis            11       9        21
    Yokenella regensburgei ATCC 43003     3        39       43
    Acetobacteraceae bacterium AT-5844    3        36       40
    Hafnia alvei ATCC 51873               3        38       42

    Secondary protein structure is the specific geometric shape caused by intramolecular and intermolecular hydrogen bonding of amide groups. It is composed of α-helices, β-sheets and sometimes random coils.

    Based on the PSIPRED results for the 10 sequences, β-sheets outnumber α-helices in eight of them; only S. aureus and S. epidermidis have more α-helices (11) than β-sheets (9). In the α-helix structure, the "backbone" of the peptide forms the inner part of the coil while the side chains extend outward from the coil. β-sheets are more numerous because not all amino acids favor the formation of the α-helix, owing to steric constraints of the R-groups. Amino acids such as A, D, E, I, L and M favor the formation of α-helices, whereas G and P favor disruption of the helix. This is particularly true for P, since it is a pyrrolidine-based imino acid (HN=) whose structure significantly restricts movement about the peptide bond in which it is present, thereby interfering with extension of the helix. Whereas an α-helix is composed of a single linear array of helically disposed amino acids, β-sheets are composed of 2 or more different regions of stretches of at least 5-10 amino acids.
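    Counts such as those in Table 4 can be derived from a per-residue prediction string by counting contiguous runs of each state. The sketch below assumes the common three-state encoding (H for helix, E for strand, C for coil); the function name and the example string are hypothetical and do not come from the PSIPRED output used in this study.

```python
from itertools import groupby


def count_segments(ss: str) -> dict[str, int]:
    """Count contiguous runs of each state in a secondary-structure string."""
    counts = {"H": 0, "E": 0, "C": 0}
    # groupby yields one group per run of identical consecutive characters.
    for state, _run in groupby(ss):
        if state in counts:
            counts[state] += 1
    return counts


# Hypothetical prediction string, for illustration only.
example = "CCHHHHHHCCEEEEECCCHHHHCCEEEC"
print(count_segments(example))
```

    When every helix and strand is flanked by coil, the number of coil segments equals the number of helices plus strands, or exceeds it by one, which is consistent with the pattern of values in Table 4.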

    Fig. 7. Multiple sequence alignment of fmtA. The colors of the letters indicate how good or bad the sequence identity is.

    Multiple sequence alignment is used to detect related proteins and to study the relationships between sequences. It has been clearly shown that using multiple sequence alignments improves the detection of distantly related homologous proteins.

    Fig. 7 shows the multiple sequence alignment of the entered sequences. Each sequence is listed with its name and its alignment. The color of each letter tells how good or bad the alignment is. Notice that the first and last parts of the protein have good sequence identity. These regions appear to be conserved through evolution.
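    Sequence identity between two rows of an alignment can be expressed as a percentage. The following is a minimal sketch, assuming that columns containing a gap in either sequence are excluded from the comparison; alignment tools differ in this convention, so the values need not match those of the program used here. The two aligned rows are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned rows, for illustration only.
row1 = "MKT-AYLLG"
row2 = "MKSQAYL-G"
print(round(percent_identity(row1, row2), 1))
```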

    Fig. 8. Phylogeny of fmtA protein from different bacteria, divided into three clusters. Distances were calculated by means of % identity.

    The proteins in the alignment can be grouped according to different sequence features. This allows the proteins to be further sorted into subgroups that are most closely related to each other. If a protein is not well conserved, this indicates that it is more evolutionarily distant.
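    One simple way to rank sequences by how distant they are is to convert % identity into a distance and average it over all other sequences. The sketch below assumes distance = 100 - % identity and uses three hypothetical aligned sequences; it does not build a tree and is not the clustering procedure of the alignment program itself.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def mean_distances(alignment: dict[str, str]) -> dict[str, float]:
    """Average distance (100 - % identity) from each sequence to all others."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    totals = {name: 0.0 for name in alignment}
    for p, q in combinations(alignment, 2):
        d = 100.0 - identity(alignment[p], alignment[q])
        totals[p] += d
        totals[q] += d
    n_others = len(alignment) - 1
    return {name: total / n_others for name, total in totals.items()}


# Hypothetical aligned sequences, for illustration only.
toy = {
    "seqA": "MKTAYLLG",
    "seqB": "MKTAYLIG",
    "seqC": "MRSQFLIG",
}
distances = mean_distances(toy)
print(max(distances, key=distances.get))  # name of the most distant sequence
```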

    Sequences that are more closely related sit closer together in the branches of the tree. Based on Fig. 8, the ancestor (base point of zero) of the protein is closely related to S. aureus and S. epidermidis. In contrast, B. pseudomallei is the most distant from the ancestral protein. B. pseudomallei is a Gram-negative bacterial pathogen that normally survives as a saprophyte in soil and water, but is also capable of infecting most mammals and causing serious infections resulting in the multifaceted disease melioidosis. Very little is known about iron acquisition mechanisms in B. pseudomallei. The bacterium produces a hydroxamate-type siderophore, malleobactin, that can remove iron from lactoferrin and transferrin, allowing this bacterium to grow under iron-limiting conditions.

    Prediction of the tertiary structure of B. pseudomallei

    The partial double-bond character of the peptide bond makes it planar: it limits rotation around the C-N bond, so that the two alpha-carbons and the C, O, N and H atoms between them lie in one plane. As a result, the third angle, omega (ω), is constant at 180°. These three angles (φ, ψ and ω) are the most significant local structure parameters in protein folding. Allowed and favored regions of dihedral angles in B. pseudomallei 1106a can be seen in Fig. 9. Table 5 shows the name, position and coordinates of residues that lie in non-core regions (outliers).
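    Each of the backbone angles discussed above is a dihedral angle defined by four consecutive atoms. The following is a minimal sketch of how such an angle can be computed from 3D coordinates; the helper names and the example coordinates are hypothetical and are not taken from the B. pseudomallei model.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees about the p1-p2 bond, in the range [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: four atoms in one plane in a trans arrangement,
# the geometry described for omega in a planar peptide bond.
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(abs(trans))
```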

  • Fig. 9. Ramachandran plot of B. pseudomallei. Residues in outlier regions are numbered

    accordingly.

    Table 5. Name, position and coordinates of residues that lie in non-core regions (outliers)

    Name of residue       Position   Coordinates (φ, ψ)     Name of residue    Position   Coordinates (φ, ψ)
    Threonine (Thr)          96      -175.32, -11.24        Glycine (Gly)        454      -177.83, -58.14
    Valine (Val)            157      133.40, 157.46         Proline (Pro)        455      56.81, 105.40
    Proline (Pro)           165      -115.66, 152.39        Arginine (Arg)       482      161.14, -43.91
    Tryptophan (Trp)        173      46.95, 151.16          Lysine (Lys)         545      -65.87, -150.16
    Alanine (Ala)           201      158.76, -166.87        Glycine (Gly)        549      31.09, 30.59
    Aspartic acid (Asp)     236      136.98, -175.22        Proline (Pro)        568      -44.89, 163.52
    Proline (Pro)           261      4.87, 40.87            Proline (Pro)        569      -43.24, -73.29
    Histidine (His)         328      154.55, 128.87         Serine (Ser)         620      -7.65, -78.12
    Asparagine (Asn)        345      -173.79, -140.22       Valine (Val)         651      -55.41, 180.00
    Threonine (Thr)         407      -65.02, -85.34         Proline (Pro)        652      6.62, 54.3
    Alanine (Ala)           432      -173.28, -149.78       Arginine (Arg)       711      -159.69, -128.76
    Proline (Pro)           448      -62.94, -139.77        Alanine (Ala)        735      -172.95, -69.74

    Glycine has only a hydrogen atom as its side chain, which allows it a wider range of angles, while proline is limited in the Ramachandran plot because its cyclic side chain restricts phi to a range of -35° to -85°. Clear illustrations of glycine, pre-proline and proline residues lying in favored and allowed regions are provided below.

    Fig. 10. Ramachandran plots for glycine, pre-proline and proline residues. Dark colors indicate the favored regions while lighter colors designate allowed regions.

    The Ramachandran plot illustrates the φ, ψ angles of the residues in B. pseudomallei 1106a. The percentages of residues in favorable, allowed and outlier regions are 88.9%, 7.5% and 3.6%, respectively, which is somewhat far from the expected values of ~98% (favorable) and ~2% (allowed). Out of 573 residues, 24 lie in outlier regions. The proportion of outliers indicates how well the structure fits the expected main-chain distribution of torsional angles.

    Prediction of quaternary structure

    Fig. 11. Results obtained using SWISS-MODEL, giving the QMEAN Z-score, C-beta interaction energy and all-atom interaction energy.

    Fig. 12a. 3-dimensional structure of B. pseudomallei 1106a.

    Fig. 12b. 3-dimensional structure of B. pseudomallei 1106a based on temperature. High values are colored in warmer (red) colors and lower values in colder (blue) colors.

    Fig. 13. Assessment of the quality of the homology model.

  • QMEAN (Qualitative Model Energy Analysis) calculates global and local quality

    estimates on the basis of single models. The data shown allows us to inspect the

    differences between the models and helps us understand the expected accuracy of the

    model.

    (a) Represents the QMEAN scores of the reference structures from the PDB and indicates how many standard deviations the model score differs from the expected value. The model has a Z-score of -5.42, which is far from the mean. (b) A projection of the first plot for the given protein size; it also shows the number of reference models used in the calculation. (c) Shows that a low-quality model has strongly negative Z-scores for QMEAN. Good structures are said to be in the light red to blue region. The data show that the model is of very low quality because of its strongly negative values.
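    A Z-score of this kind expresses how many standard deviations a model score lies from the mean of a reference set. The sketch below shows the arithmetic only; the function name and all numbers are hypothetical, and the reference set and scoring terms actually used by QMEAN are not reproduced here.

```python
from statistics import mean, stdev


def z_score(model_score: float, reference_scores: list[float]) -> float:
    """Number of standard deviations between a score and the reference mean."""
    if len(reference_scores) < 2:
        raise ValueError("need at least two reference scores")
    sigma = stdev(reference_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference scores have zero spread")
    return (model_score - mean(reference_scores)) / sigma


# Hypothetical values, for illustration only.
reference = [0.70, 0.75, 0.80, 0.85, 0.90]
print(round(z_score(0.40, reference), 2))  # strongly negative: far below the mean
```

    A strongly negative value, as reported for this model, means the score lies far below what is typical for the reference structures.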

    Figure 14. ANOLEA evaluates the packing quality of the model. The Y-axis shows the energy for each amino acid of the protein chain. QMEAN estimates the global quality for all models.

    The Atomic Non-Local Environment Assessment (ANOLEA) uses an atomic empirical mean force potential to perform energy calculations on a protein chain. Negative energy values, shown in green, signify a favorable energy environment, while positive energy values, shown in red, signify an unfavorable energy environment for a given amino acid. The plot immediately reveals regions where atoms come too close to each other, and some regions have very high energy. In protein structures, amino acid residues have preferred locations, and the red regions show that the energy of the model is much higher there because residues are making bad contacts. We can conclude that the green regions are favorable, whereas the red regions indicate where the model is incorrect.

    Conclusion

    The species used all contained the fmtA protein; each bacterium is of a different kind and has different functions. Some of the proteins are fmtA family proteins and some are ferric malleobactin receptors and transporters. Various databases and tools were used to compare each species of bacteria with the others, starting with the molecular weight, GRAVY and number of amino acids. The matching domains and motifs were all different, as were the predicted locations of the transmembrane helices, each of which has different posterior probabilities.

    In the multiple alignment, the average distance was calculated using % identity. The results showed B. pseudomallei to be the most distant species among all species used.

  • References:

    [1] Bornstein P., Balian G. Cleavage at Asn-Gly bonds with hydroxylamine. Methods in Enzymology (1977) 47: 132-144.

    [2] Roche. Enterokinase product description. http://www.roche-applied-science.com/proddata/gpip/3_1_3_7_10_1.html

    [3] Keil, B. Specificity of proteolysis. Springer-Verlag Berlin-Heidelberg-NewYork, pp. 335. (1992)

    [4] Komatsuzawa H, et al. 1997. Cloning and characterization of the fmt gene which affects the methicillin resistance level and autolysis in the presence of Triton X-100 in methicillin-resistant Staphylococcus aureus. Antimicrob. Agents Chemother. 41:2355-2361.

    [5] Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22, 195-201.

    [6] Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Research 31: 3381-3385. Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modelling. Electrophoresis 18: 2714-2723.

    [7] Pearson, W.R. and Lipman, D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444. Elofsson, A. and Sonnhammer, E.L.L. A comparison of sequence and structure protein domain families as a basis for structural genomics. Department of Biochemistry, Stockholm University, November 11, 1998.

    [8] Crystal structure of a D-aminopeptidase from Ochrobactrum anthropi, a new member of the 'penicillin-recognizing enzyme' family.

    [9] Bompard-Gilles C, Remaut H, Villeret V, Prange T, Fanuel L, Delmarcelle M, Joris B, Frere J, Van Beeumen J. Structure 8 971-80 2000. PMID: 10986464

    [10] EstB from Burkholderia gladioli: a novel esterase with a beta-lactamase fold reveals steric factors to discriminate between esterolytic and beta-lactam cleaving activity. Wagner UG, Petersen EI, Schwab H, Kratky C. Protein Sci. 11 467-78 2002. PMID: 11847270

    [11] Understanding the acylation mechanisms of active-site serine penicillin-recognizing proteins: a molecular dynamics simulation study. Oliva M, Dideberg O, Field MJ. Proteins 53 88-100 2003. PMID: 12945052

    [12] Beta-lactamase of Bacillus licheniformis 749/C. Refinement at 2 A resolution and analysis of hydration. Knox JR, Moews PC. J. Mol. Biol. 220 435-55 1991. PMID: 1856867

    [13] The active-site-serine penicillin-recognizing enzymes as members of the Streptomyces R61 DD-peptidase family. Joris B, Ghuysen JM, Dive G, Renard A, Dideberg O, Charlier P, Frere JM, Kelly JA, Boyington JC, Moews PC. Biochem. J. 250 313-24 1988. PMID: 3128280

    [14] The phototrophic bacterium Rhodopseudomonas capsulata sp108 encodes an indigenous class A beta-lactamase. Campbell JI, Scahill S, Gibson T, Ambler RP. Biochem. J. 260 803-12 1989. PMID: 2788410

    [15] X-ray structure of Streptococcus pneumoniae PBP2x, a primary penicillin target enzyme. Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O. Nat. Struct. Biol. 3 284-9 1996. PMID: 8605631

    [16] Crystal structure of the outer membrane active transporter FepA from Escherichia coli. Buchanan SK, Smith BS, Venkatramani L, Xia D, Esser L, Palnitkar M, Chakraborty R, van der Helm D, Deisenhofer J. Nat. Struct. Biol. 6 56-63 1999. PMID: 9886293

    [17] Transmembrane signaling across the ligand-gated FhuA receptor: crystal structures of free and ferrichrome-bound states reveal allosteric changes.

    [18] Locher KP, Rees B, Koebnik R, Mitschler A, Moulinier L, Rosenbusch JP, Moras D. Cell 95 771-8 1998. PMID: 9865695

    [19] Structural basis of gating by the outer membrane transporter FecA. Ferguson AD, Chakraborty R, Smith BS, Esser L, van der Helm D, Deisenhofer J. Science 295 1715-9 2002. PMID: 11872840

    [20] Substrate-induced transmembrane signaling in the cobalamin transporter BtuB. Chimento DP, Mohanty AK, Kadner RJ, Wiener MC. Nat. Struct. Biol. 10 394-401 2003. PMID: 12652322

    [21] Three paradoxes of ferric enterobactin uptake. Klebba PE. Front. Biosci. 8 s1422-36 2003. PMID: 12957833

    [22] The Escherichia coli outer membrane cobalamin transporter BtuB: structural analysis of calcium and substrate binding, and identification of orthologous transporters by sequence/structure conservation. Chimento DP, Kadner RJ, Wiener MC.

    [23] Swiss Institute of Bioinformatics. Available: <http://prosite.expasy.org/cgi-bin/prosite/ScanView.cgi?scanfile=623843423385.scan.gz>. Accessed 16 May 2013.

    [24] Thermo Fisher Scientific Inc. (2013). Available: <http://www.piercenet.com/browse.cfm?fldID=7CE3FCF5-0DA0-4378-A513-2E35E5E3B49B>. Accessed 12 May 2013.

    [25] Hubbard SR, Till JH. (2000). Available: <http://www.ncbi.nlm.nih.gov/pubmed/10966463>. Accessed 12 May 2013.

    [26] Manning G, Whyte DB, et al. (2002). "The protein kinase complement of the human genome". Science 298 (5600): 1912-1934. doi:10.1126/science.1075762. PMID 12471243.

    [27] Francis SH, Corbin JD (August 1999). "Cyclic nucleotide-dependent protein kinases: intracellular receptors for cAMP and cGMP action". Crit Rev Clin Lab Sci 36 (4): 275-328. doi:10.1080/10408369991239213. ISSN 1040-8363. PMID 10486703.

    [28] American Association for Cancer Research (cAMP-responsive Genes and Tumor Progression). Available: <https://en.wikipedia.org/wiki/Cyclic_adenosine_monophosphate>. Accessed 12 May 2013.

    [29] LifeTein. Free Modifications: N-Terminal Acetylation and C-Terminal Amidation. Available: <http://www.lifetein.com/Peptide-Synthesis-Amidation-Acetylation.html>. Accessed 16 May 2013.

    [30] Landschulz WH, Johnson PF, McKnight SL (1988-06-24). "The leucine zipper: a hypothetical structure common to a new class of DNA-binding proteins". Science 240 (4860): 1759-1764. doi:10.1126/science.3289117. PMID 3289117.

    [31] Berger-Bächi B, Strässle A, Gustafson J E, Kayser F H. Mapping and characterization of multiple chromosomal factors involved in methicillin resistance in Staphylococcus aureus. Antimicrob Agents Chemother. 1992;36:1367-1373.

    [32] Pearson, W.R. and Lipman, D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444.

    [33] Springer 2013 http://www.springerreference.com/docs/html/chapterdbid/34498.html Jeon H., Meng W., Takagi J., Eck M.J., Springer T.A., Blacklow S.C. Implications for familial hypercholesterolemia from the structure of the LDL receptor YWTD-EGF domain pair. Source Nat. Struct. Biol. 8:499-504 (2001). PubMed ID 11373616. DOI 10.1038/88556

    [34] S.C. Lovell, I.W. Davis, W.B. Arendall III, P.I.W. de Bakker, J.M. Word, M.G. Prisant, J.S. Richardson and D.C. Richardson (2002) Structure validation by Calpha geometry: phi, psi and Cbeta deviation. Proteins: Structure, Function & Genetics. 50: 437-450. Available: <http://mordred.bioc.cam.ac.uk/~rapper/rampage2.php>. Accessed 16 May 2013

    [35] Karadaghi, S.A. (2012). Available: . Accessed 12 May 2013

    [36] Kleywegt, G.J., Jones, A.T. (1996). Phi/Psi-chology: Ramachandran revisited. Available: <http://www.greeley.org/~hod/papers/ByAuthor/Jones/s4_1996_1395.pdf>. Accessed 12 May 2013

    [37] Swiss Institute of Bioinformatics. Swiss-Model Workspace. Available: <http://swissmodel.expasy.org/workspace/[email protected]&key=3e0d577ab36ad95cd472ca012a0defe0&func=workspace_modelling&prjid=P000008>. Accessed 16 May 2013