btp report on homology modeling of riboflavin synthase

Upload: tanmaya-kumar-sahoo

Post on 06-Apr-2018


  • 8/3/2019 BTP Report on Homology Modeling of Riboflavin Synthase


    Homology Modelling of Riboflavin Synthase

    Alpha Chain of M. tuberculosis (Rv1412)

    By

    Tanmaya Kumar Sahoo

    Roll No. 08BT1016

    Department of Biotechnology

    Under the Supervision of

    Prof. A. K. Das

    Department of Biotechnology

    IIT Kharagpur -721302


    Acknowledgements

    I would like to express my humble indebtedness and deep sense of sincere

    gratitude to my project supervisor Prof. A. K. Das, Department of

    Biotechnology for his supervision, constructive guidance, constructive

    criticism, nurturing encouragement and support at every stage of my research

    work.

    I would also like to thank Dr. Baisakhee Saha for helping and giving me her

    support throughout this project. Her immense help and encouragement greatly

    helped me to materialize this project.

    Tanmaya Kumar Sahoo
    08BT1016
    Department of Biotechnology
    IIT Kharagpur
    November 2011


    Certificate

    This is to certify that the project entitled "Homology Modelling of Riboflavin Synthase Alpha Chain from M. tuberculosis (Rv1412) and Determination of its Interaction with Beta Chain (Rv1416)", being submitted by Tanmaya Kumar Sahoo, is a bona fide work done by him in the Department of Biotechnology, Indian Institute of Technology, Kharagpur, under my supervision and guidance as his B. Tech project.

    Prof. A. K. Das
    Department of Biotechnology
    IIT Kharagpur


    Contents

    1. Introduction
       1.1 Introduction to Riboflavin Synthase
       1.2 Introduction to Homology Modelling
    2. Objective of the project
    3. Materials and Methods
       3.1 Software/Tools used in Homology Modelling
       3.2 Steps of Homology Modelling
           3.2.1 Searching for structures related to Rv1412
           3.2.2 Selecting a template structure
           3.2.3 Aligning Rv1412 with the template selected
           3.2.4 Model Building
           3.2.5 Model Evaluation
    4. Results Obtained
       4.1 DOPE score vs. alignment positions for template and model
       4.2 Ramachandran plot
       4.3 Secondary structure analysis
    5. Conclusion
    6. Future Works
    7. References


    Abstract

    Riboflavin synthase is an enzyme that catalyses the final reaction of riboflavin biosynthesis. In M. tuberculosis this enzyme is encoded by the gene rv1412. In this study, a homology model of Rv1412 was generated using the template 1kzl:A. The Ramachandran plot of the model shows that more than 90% of the residues lie in the most favoured (core) regions. The homology model of Rv1412 shows two beta barrels, an N-terminal barrel and a C-terminal barrel. It also has three alpha helices and three 3-10 helices. Comparing the sequence alignment between target and template with the DOPE profiles shows that the regions that are relatively more conserved (residues 40-50 and residues 95-105) have a smaller difference in DOPE score, while the other regions show a larger difference.

    1. Introduction

    It has been hypothesized that enzymes involved in the riboflavin biosynthesis pathway, including riboflavin synthase, can be used to develop antibacterial drugs to treat infections caused by Gram-negative bacteria and yeasts. This hypothesis is based on the inability of Gram-negative bacteria, such as E. coli and S. typhimurium, to take up riboflavin from the external environment. [1] As Gram-negative bacteria need to produce their own riboflavin, inhibiting riboflavin synthase or other enzymes involved in the pathway may be a useful route to developing antibacterial drugs. This is one of the major reasons that drive us towards an extensive study of riboflavin synthase in M. tuberculosis.

    1.1 Riboflavin Synthase

    Riboflavin synthase is an enzyme that catalyses the final reaction of riboflavin biosynthesis:

    2 6,7-dimethyl-8-ribityllumazine → riboflavin + 5-amino-6-ribitylamino-2,4(1H,3H)-pyrimidinedione

    Riboflavin synthase is a homotrimer of 23 kDa subunits. Each monomer contains two beta-barrels and one α-helix at the C-terminus. The monomer folds with pseudo two-fold symmetry, as predicted from the sequence similarity between the N-terminal barrel and the C-terminal barrel. [4]

    Two 6,7-dimethyl-8-ribityllumazine molecules are hydrogen bonded to each monomer, as the two domains are topologically similar. [2] The active site is located at the interface of the substrates between monomer pairs, and modelled structures of the active-site dimer have been created. [3] Only one of the active sites of the enzyme catalyses riboflavin formation at a time, as the other two sites face outward and are exposed to solvent. [4]


    1.2 Homology Modelling

    Homology modelling, also known as comparative modelling of proteins, refers to constructing an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). Homology modelling relies on the identification of one or more known protein structures likely to resemble the structure of the query sequence, and on the production of an alignment that maps residues in the query sequence to residues in the template sequence. It has been shown that protein structures are more conserved than protein sequences amongst homologues, but sequences falling below 20% sequence identity can have very different structures. [5]

    Evolutionarily related proteins have similar sequences, and naturally occurring homologous proteins have similar protein structure. It has been shown that three-dimensional protein structure is evolutionarily more conserved than would be expected from sequence conservation alone. [6]

    The sequence alignment and template structure are then used to produce a structural model of the target. Because protein structures are more conserved than DNA sequences, detectable levels of sequence similarity usually imply significant structural similarity. [7]

    The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by the presence of alignment gaps that indicate a structural region present in the target but not in the template, and by structure gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure.

    Model quality declines with decreasing sequence identity; a typical model has ~1-2 Å root mean square deviation (RMSD) between the matched Cα atoms at 70% sequence identity but only 2-4 Å agreement at 25% sequence identity. However, the errors are significantly higher in the loop regions, where the amino acid sequences of the target and template proteins may be completely different.

    Regions of the model that were constructed without a template, usually by loop modelling, are generally much less accurate than the rest of the model. Errors in side-chain packing and position also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity. [8]
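    The root mean square deviation quoted above can be made concrete with a short sketch. The function below is illustrative only and is not part of the project scripts: it assumes the two coordinate lists hold matched C-alpha atoms in the same order and that the structures have already been superposed, so no fitting step is included.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root mean square deviation between matched C-alpha atoms.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, in the same
    units (angstroms for PDB coordinates). The structures are assumed to be
    superposed already; this function does not fit one onto the other.

    RMSD = sqrt( (1/N) * sum_i |a_i - b_i|^2 )
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

    In practice a structural superposition is performed first, since the RMSD of unsuperposed coordinates mostly reflects the arbitrary placement of the two structures in space.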

    Taken together, these various atomic-position errors are significant and impede the use of homology models for purposes that require atomic-resolution data, such as drug design and protein-protein interaction predictions; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s).

    Nevertheless, homology models can be useful in reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the folding, to participate in binding some small molecule, or to foster association with another protein or nucleic acid.


    The chief inaccuracies in homology modelling, which worsen with lower sequence identity, derive from errors in the initial sequence alignment and from improper template selection. [9] Like other methods of structure prediction, current practice in homology modelling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.

    The method of homology modelling is based on the observation that protein tertiary structure is better conserved than amino acid sequence. [7] Thus, even proteins that have diverged appreciably in sequence but still share detectable similarity will also share common structural properties, particularly the overall fold. Because it is difficult and time-consuming to obtain experimental structures from methods such as X-ray crystallography and protein NMR for every protein of interest, homology modelling can provide useful structural models for generating hypotheses about a protein's function and directing further experimental work.

    2. Objective of the Project

    Construction of a homology model of the "Rv1412" protein from its amino acid sequence and generation of a three-dimensional structure by the process of homology modelling.

    3. Materials and Methods

    3.1 Software/Tools used in Homology Modelling

    Modeller 9.10
    Platform used: Windows
    Version used: 9.10
    Downloaded from: http://salilab.org/modeller

    Python 2.6
    Platform used: Windows
    Version used: 2.6
    Downloaded from: http://python.org/getit/releases/2.6/

    Procheck Analysis
    For Procheck plots, the PDB file was uploaded to http://www.ebi.ac.uk/pdbsum/

    PyMOL
    Version used: 1.4.1
    http://www.pymol.org/

    PDBsum
    To get all types of secondary structure present, the PDB file was uploaded to http://www.ebi.ac.uk/pdbsum/

    3.2 Steps of Homology Modelling

    3.2.1 Searching for structures related to Rv1412

    First, the target Rv1412 sequence was put into the PIR format, which is readable by MODELLER (file "Rv1412.ali").

    A search for potentially related sequences of known structure was performed by running the build_profile.py script of the MODELLER package.

    The script reads a text file containing non-redundant PDB sequences (file "pdb_95.pir"). Like the previously created target file, this file was in PIR format. Sequences with fewer than 30 or more than 4000 residues were discarded, and non-standard residues were removed.

    The output of the build_profile.py script was written to the file "build_profile.prf". Two PDB sequences (1i8d:A and 1kzl:A) showed very significant similarity to the query sequence, with e-values equal to 0.
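    As an illustration of the first step, the sketch below writes a sequence in the PIR layout that MODELLER reads: a ">P1;code" line, a ten-field colon-separated description line, and the sequence terminated by "*". The helper names are hypothetical and the sequence must be supplied by the caller; the actual Rv1412 sequence is not reproduced here. The second helper mirrors the 30-4000 residue length filter described above.

```python
def to_pir(code, sequence, width=75):
    """Return a MODELLER-style PIR record for a target sequence.

    code:     identifier used on the '>P1;' line (e.g. 'Rv1412').
    sequence: one-letter amino acid string; whitespace is ignored.
    width:    number of characters per sequence line.
    """
    seq = "".join(sequence.split()).upper()
    if not seq.isalpha():
        raise ValueError("sequence must contain only letters")
    # Description line: type, code, then empty structure fields and
    # placeholder resolution / R-factor values, as used for a bare sequence.
    description = "sequence:%s:::::::0.00: 0.00" % code
    body = seq + "*"  # '*' terminates the sequence in PIR format
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([">P1;%s" % code, description] + lines) + "\n"


def passes_length_filter(sequence, min_len=30, max_len=4000):
    """True if the sequence length lies within the stated search limits."""
    return min_len <= len(sequence) <= max_len
```

    A record produced this way can be saved as "Rv1412.ali" and passed to the search script.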

    3.2.2 Selecting a template structure

    To select the more appropriate template for the query sequence from the two similar structures, compare.py was used. The comparison showed that 1kzl:A has the higher overall sequence identity to the query sequence (40%). 1kzl:A was therefore selected as the template for comparative modelling.
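    The sequence identity used to rank the candidate templates can be illustrated with a minimal sketch. This is not the compare.py script: it simply counts identical residues in a pairwise alignment given as two equal-length strings with "-" for gaps. Tools differ in the denominator they use; dividing by the number of aligned (gap-free) columns is an assumption made here for simplicity.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free alignment columns.

    aligned_a, aligned_b: two rows of a pairwise alignment, equal length.
    Columns where either row has a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

    With such a function, the candidate whose alignment to the target gives the higher value would be preferred, subject to other criteria such as the resolution of the template structure.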

    3.2.3 Aligning Rv1412 with the template selected

    The Rv1412 sequence in the file Rv1412.ali was aligned with the 1kzl:A structure in the PDB file 1kzl.pdb using align2d.py.

    This differs from standard sequence-sequence alignment methods because it takes into account structural information from the template. It does so through a variable gap penalty function that tends to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and between two positions that are close in space.

    As a result, alignment errors are reduced by approximately one third relative to those that occur with standard sequence alignment techniques. This improvement becomes more important as the similarity between the sequences decreases and the number of gaps increases.

    3.2.4 Model Building

    Once the target-template alignment was constructed, MODELLER was used to calculate a 3D model of the target completely automatically, using its automodel class.

    The script "model-single.py" generated five similar models of Rv1412 based on the 1kzl:A template structure and the alignment in the file "Rv1412-1kzl.ali".

    The models generated by MODELLER were energy optimized, and each model was assigned a DOPE (Discrete Optimized Protein Energy) score. The lower the DOPE score, the more favourable the model.


    The model with the lowest DOPE score or the lowest value of the MODELLER objective function was selected for further evaluation with PROCHECK.
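    The model-building and selection steps can be sketched as follows. This is a hedged outline in the style of the MODELLER tutorial, not the project's own "model-single.py": the function names and arguments are assumptions, and MODELLER is imported only inside build_models so that the selection helper can be used without it. The selection helper relies on the list of per-model dictionaries (keys "name", "failure", "DOPE score") that automodel exposes as its outputs attribute when DOPE assessment is requested.

```python
def build_models(alnfile, template_code, target_code, n_models=5):
    """Build n_models comparative models with MODELLER's automodel class.

    alnfile:       target-template alignment file in PIR format.
    template_code: alignment code of the template structure.
    target_code:   alignment code of the target sequence.
    Requires a working MODELLER installation; not executed on import.
    """
    from modeller import environ
    from modeller.automodel import automodel, assess

    env = environ()
    a = automodel(env, alnfile=alnfile, knowns=template_code,
                  sequence=target_code, assess_methods=(assess.DOPE,))
    a.starting_model = 1
    a.ending_model = n_models
    a.make()
    return a.outputs


def best_by_dope(outputs):
    """Return the successfully built model with the lowest DOPE score."""
    ok_models = [m for m in outputs if m.get("failure") is None]
    if not ok_models:
        raise ValueError("no successfully built models")
    return min(ok_models, key=lambda m: m["DOPE score"])
```

    The file named in the returned dictionary would then be the one submitted for stereochemical checks.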

    3.2.5 Model Evaluation

    Using three scripts, evaluate_model.py, evaluate_template.py and plot_profiles.py, a graph of DOPE score vs. alignment position was plotted for both the model and the template.

    The PDB file of the model was uploaded to PDBsum, and all the necessary PROCHECK plots were obtained for the modelled structure.

    The model was also evaluated for the different types of secondary structural elements.

    4. Results and Discussion

    4.1 DOPE score vs. alignment position for template and model

    DOPE, or Discrete Optimized Protein Energy, is a statistical potential used to assess homology models in protein structure prediction. It is implemented in the popular homology modelling program MODELLER and is used to assess the energy of the protein model that MODELLER generates through many iterations.

    Fig.1 DOPE score plot for model and template, generated by MODELLER, showing the relative difference in energy between them at different alignment positions.

    The plot in Fig.1 maps the optimized energy value of the template and of the model over the whole sequence. The energy difference between the template and the model can be attributed to the presence of loop regions and other variable regions. As seen from the sequence alignment between target and template (Fig.2), the regions that are relatively more conserved show a smaller difference in DOPE score (residues 40-50 and residues 95-105), and the other regions show a larger difference.
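    The comparison of the two DOPE profiles can be sketched in a few lines. The helpers below are illustrative and are not the plot_profiles.py script: they assume each profile is a list of per-residue scores, place the scores on alignment positions by inserting None at gaps, and then average the absolute model-template difference over a chosen window of alignment positions.

```python
def to_alignment_positions(aligned_seq, per_residue_scores, gap="-"):
    """Spread per-residue scores over alignment columns (None at gaps)."""
    n_residues = sum(1 for c in aligned_seq if c != gap)
    if n_residues != len(per_residue_scores):
        raise ValueError("one score is required per non-gap residue")
    scores = iter(per_residue_scores)
    return [None if c == gap else next(scores) for c in aligned_seq]


def mean_abs_difference(profile_a, profile_b, start, end):
    """Mean |a - b| over 1-based alignment positions start..end inclusive.

    Columns where either profile has a gap (None) are skipped.
    Returns None if no column in the window has both values.
    """
    window = zip(profile_a[start - 1:end], profile_b[start - 1:end])
    diffs = [abs(a - b) for a, b in window
             if a is not None and b is not None]
    if not diffs:
        return None
    return sum(diffs) / len(diffs)
```

    Applied to the model and template profiles, a smaller value for one window than for another would correspond to the behaviour described above for the conserved regions.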


    Fig.2 Sequence alignment between template and target, showing the positions where the sequence is conserved.

    4.2 Ramachandran plot

    Fig.3 Ramachandran plot, showing the PROCHECK scan result for the energy-minimized homology model.

    Table.1 Ramachandran Plot Statistics


    The Ramachandran plot (Fig.3) shows the phi-psi torsion angles for all residues in the structure (except those at the chain termini). Glycine residues are separately identified by triangles, as these are not restricted to the regions of the plot appropriate to the other side-chain types.

    The darkest areas correspond to the "core" regions, representing the most favourable combinations of phi-psi values. According to the Ramachandran plot statistics (Table.1), residues in the most favoured regions constitute more than 90% of the structure, and residues in disallowed regions constitute approximately 1%. The number of residues in the most favoured regions is 157, and the number of residues in disallowed regions is 2.
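    The phi and psi angles plotted in a Ramachandran diagram are dihedral angles defined by four consecutive backbone atoms: phi by C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The following sketch shows the standard dihedral calculation from four points; it is a generic illustration, not PROCHECK's implementation, and the coordinates are expected to be supplied by the caller.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Dihedral angle in degrees, in (-180, 180], for four (x, y, z) points.

    For phi use C(i-1), N(i), CA(i), C(i); for psi use N(i), CA(i), C(i),
    N(i+1). A cis arrangement gives 0 and a trans arrangement gives 180.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(_cross(n1, n2), b2) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

    Each residue then contributes one (phi, psi) point, and the quality statistics count how many points fall in each region of the plot.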

    4.3 Secondary structure analysis

    Fig.4 Topology of the model, generated using PROCHECK, showing the arrangement of helices and beta strands in the model.


    Fig.5 Secondary structure of the model, showing six helices and two beta sheets, each consisting of 6 beta strands.

    The secondary structure of the model is summarized below.

    Secondary structure summary

    Strand       Alpha helix   3-10 helix   Others       Total residues
    82 (40.8%)   23 (11.4%)    13 (6.5%)    83 (41.3%)   201

    Beta Sheets

    Sheet   No. of strands   Type           Barrel   Topology
    A       6                Antiparallel   Y        -1 3 1 1 3
    B       6                Antiparallel   Y        -1 3 1 1 3

    Helices (type H = alpha helix, G = 3-10 helix)

    Start    End      Type   No. of residues
    Ala65    Arg70    H      6
    Leu73    Glu75    G      3
    Ala127   Tyr132   G      6
    Pro165   Leu170   H      6
    Thr172   Ser175   G      4
    Val188   Arg198   H      11


    Beta Strands

    Start    End      Sheet   No. of residues
    Glu8     Leu18    A       11
    Ala21    Arg27    A       7
    Ser40    Val43    A       4
    Val46    Asp52    A       7
    Gln58    Met64    A       7
    Arg81    Arg86    A       6
    Ala105   Cys113   B       9
    Trp118   Glu124   B       7
    Ser138   Val141   B       4
    Ile144   Leu151   B       8
    Trp158   Leu163   B       6
    Arg181   Val186   B       6
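    The secondary structure summary can be cross-checked against the helix and strand tables with a little arithmetic. The sketch below takes the start and end residues listed in the tables above, derives the length of each element from the residue numbers, and recomputes the counts and percentages for a 201-residue model. The function and variable names are illustrative.

```python
import re

# (start, end, type) from the helix table; H = alpha helix, G = 3-10 helix.
HELICES = [("Ala65", "Arg70", "H"), ("Leu73", "Glu75", "G"),
           ("Ala127", "Tyr132", "G"), ("Pro165", "Leu170", "H"),
           ("Thr172", "Ser175", "G"), ("Val188", "Arg198", "H")]

# (start, end, sheet) from the beta strand table.
STRANDS = [("Glu8", "Leu18", "A"), ("Ala21", "Arg27", "A"),
           ("Ser40", "Val43", "A"), ("Val46", "Asp52", "A"),
           ("Gln58", "Met64", "A"), ("Arg81", "Arg86", "A"),
           ("Ala105", "Cys113", "B"), ("Trp118", "Glu124", "B"),
           ("Ser138", "Val141", "B"), ("Ile144", "Leu151", "B"),
           ("Trp158", "Leu163", "B"), ("Arg181", "Val186", "B")]

TOTAL_RESIDUES = 201


def span_length(start, end):
    """Number of residues from e.g. 'Ala65' to 'Arg70', inclusive."""
    first = int(re.search(r"\d+", start).group())
    last = int(re.search(r"\d+", end).group())
    return last - first + 1


def summarize(helices, strands, total):
    """Return {class: (count, percent)} for the secondary structure classes."""
    alpha = sum(span_length(s, e) for s, e, t in helices if t == "H")
    three_ten = sum(span_length(s, e) for s, e, t in helices if t == "G")
    strand = sum(span_length(s, e) for s, e, _ in strands)
    counts = {"strand": strand, "alpha helix": alpha,
              "3-10 helix": three_ten,
              "others": total - strand - alpha - three_ten}
    return {k: (v, round(100.0 * v / total, 1)) for k, v in counts.items()}
```

    Running summarize(HELICES, STRANDS, TOTAL_RESIDUES) reproduces the figures in the summary table, which confirms that the per-element tables and the summary are mutually consistent.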

    It was observed that the active-site residues of the template that bind 6-carboxyethyl-7-oxo-8-ribityllumazine at the N-terminal barrel (His102, Thr50, Val103 and Cys48) are conserved in the structure of the model. The active-site residues of the template that bind 6-carboxyethyl-7-oxo-8-ribityllumazine at the C-terminal barrel (Tyr148, Thr165 and Ser146) are also conserved.

    5. Conclusion

    Since more than 90% of the residues lie in the most favoured regions of the Ramachandran plot, the backbone geometry of the model is stereochemically sound, which suggests that the model is accurate to a large extent.

    It is inferred that the most conserved regions in the sequence show little deviation in DOPE score and are thus correspondingly conserved in the structure.

    6. Future Work

    Protein-ligand docking with the homology model of Rv1412.

    Protein-protein docking with the homology model of Rv1412.

    7. References

    [1] Fischer M, Bacher A (June 2008). "Biosynthesis of vitamin B2: Structure and mechanism of riboflavin synthase". Arch. Biochem. Biophys. 474 (2): 252-65.

    [2] Gerhardt S, Schott AK, Kairies N, Cushman M, Illarionov B, Eisenreich W, Bacher A, Huber R, Steinbacher S, Fischer M (October 2002). "Studies on the reaction mechanism of riboflavin synthase: X-ray crystal structure of a complex with 6-carboxyethyl-7-oxo-8-ribityllumazine". Structure 10 (10): 1371-81.

    [3] Fischer M, Schott AK, Kemter K, Feicht R, Richter G, Illarionov B, Eisenreich W, Gerhardt S, Cushman M, Steinbacher S, Huber R, Bacher A (December 2003). "Riboflavin synthase of Schizosaccharomyces pombe. Protein dynamics revealed by 19F NMR protein perturbation experiments". BMC Biochem. 4: 18.

    [4] Liao DI, Wawrzak Z, Calabrese JC, Viitanen PV, Jordan DB (May 2001). "Crystal structure of riboflavin synthase". Structure 9 (5): 399-408.

    [5] Chothia C, Lesk AM (1986). "The relation between the divergence of sequence and structure in proteins". The EMBO Journal 5 (4): 823-826.

    [6] Kaczanowski S, Zielenkiewicz P (2010). "Why similar protein sequences encode similar three-dimensional structures?". Theoretical Chemistry Accounts 125: 543-50.

    [7] Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000). "Comparative protein structure modelling of genes and genomes". Annu. Rev. Biophys. Biomol. Struct. 29: 291-325.

    [8] Chung SY, Subbiah S (1996). "A structural explanation for the twilight zone of protein sequence homology". Structure 4: 1123-27.

    [9] Venclovas C, Margelevičius M (2005). "Comparative modelling in CASP6 using consensus approach to template selection, sequence-structure alignment, and structure assessment". Proteins 61 (S7): 99-105.