“Exploring the Structural Propensities on a Denatured Small Ubiquitin Like Modifier Protein by ab-initio Quantum Chemical Calculation”

Dissertation submitted to Dr. D. Y. Patil University, Navi Mumbai for partial fulfillment towards the degree of B-Tech. (Biotechnology)

By

Abhilash Kannan

Under the Guidance of

Dr. Ganapathy Subramanian, Scientist E

MCC/BRC, Molecular Biology Unit, National Centre for Cell Science, Pune.

& Prof. R. V. Hosur,

Department of Chemical Sciences, Tata Institute of Fundamental Research,

Mumbai.

2009

Department of Biotechnology and Bioinformatics,

Padmashree Dr. D. Y. Patil University,

Belapur CBD, Navi Mumbai-400614.

NATIONAL CHEMICAL LABORATORY

PASHAN ROAD, PUNE 411008 (MAHARASHTRA)

Tel: (020) – 25902570 Fax: (020) - 25902601 / 25902660 Website www.ncl-india.org

CERTIFICATE

This is to certify that Mr. Abhilash Kannan, final year student of B-Tech. (Biotechnology) from Dr. D.Y. Patil Institute of Biotechnology and Bioinformatics, Navi Mumbai, has successfully completed a project entitled “Exploring the Structural Propensities on a denatured Small Ubiquitin Like Modifier protein by ab-initio Quantum Chemical Calculations” for the partial fulfilment of the B-Tech degree at the Central NMR Facility, National Chemical Laboratory, Pune, under my guidance.

Dr. S. Ganapathy Scientist-Emeritus

Solid State NMR Group Central NMR Facility, NCL

DECLARATION

I hereby declare that the dissertation entitled “Exploring the Structural Propensities on a denatured Small Ubiquitin Like Modifier protein by ab-initio Quantum Chemical Calculations” embodies my original work carried out under the supervision of Dr. S. Ganapathy, Scientist Emeritus at the Central NMR Facility, National Chemical Laboratory, Pune, Maharashtra, for the partial fulfilment of the Bachelor of Technology degree during the period December 2008 to June 2009. The work presented in this report is original and has not formed the basis for the award of any degree, diploma, associateship, fellowship or similar title to any candidate of any university.

Abhilash Krishna Kannan

Pune 25 June 2009

Acknowledgement

“In the name of Krishna the most beneficent and merciful”.

I am extremely glad to make use of this opportunity to express my deep sense of gratitude to Dr. S. Ganapathy, Scientist Emeritus, Central NMR Facility, National Chemical Laboratory, Pune, and Prof. R. V. Hosur, Department of Chemical Sciences, Tata Institute of Fundamental Research, Mumbai, to whom I am obliged for his guidance in each and every step of my project. I am very thankful to him for giving me exposure to the research field. His excellent guidance, constant support, inspiring words and friendly manner enabled me to explore the depth of Solid State NMR.

I take this opportunity to thank Dr. S. Sivaram, Director, NCL, for allowing me to do my project at NCL. I express my heartfelt gratitude to Dr. P.R. Rajamohanan, Central NMR Division, NCL, for providing me an opportunity to use the NMR facility.

I am grateful to my Dean and Professor Dr. D.A. Bhiwgade, School of Biotechnology, D.Y. Patil Institute of Biotechnology and Bioinformatics, Navi Mumbai, for his valuable comments, suggestions and inspirational advice.

My deep and sincere thanks must go to my NMR group colleagues Dr. Gurpreet Singh, Dr. Dinesh Gupta, Ms. Mamta Barfa and Mr. Vikas Gupta for their timely assistance, precious advice and guidance during my project work. Their support was invaluable in completing my work and report on time.

It is indeed a pleasure to express my heartfelt gratitude to Dr. Saorav Pal, Scientist G, Head of the Physical Chemistry Division and Center for Materials Characterization, National Chemical Laboratory, for allowing me to use the NCL cluster computer, and to Dr. Sudip Roy for fruitful guidance.

I wish to express my sincere gratitude to Dr. Arpita Gupte, Dr. Madhavi Revankar and Mr. Shine D, Padmashree Dr. D.Y. Patil Institute of Biotechnology and Bioinformatics, for their timely assistance, precious advice and guidance through mail during my work at NCL.

I thank my fellow NMR labmates Aany, Annu, Hari, Hilda, Jima, Yamuna and Renny for their companionship during my stay at NCL.

I also express my deep gratitude to my friends Aditi, Pratik, Bhoomi, Nidhita, Jitesh, Meenakshi and Rahul for their encouragement and support.

Last but not least, I am grateful to my family for their unflinching support and guidance during the course of my study. I am lucky and proud to have them.

25 June 2009 Abhilash Kannan

Dedicated to my

Beloved Parents ……

Objective

This work focuses on the denaturation of the Small Ubiquitin-like Modifier (SUMO) protein. The SUMO protein from Drosophila melanogaster has been shown to have some unique denaturation characteristics. Extensive multidimensional NMR studies have already established that secondary structure propensities are partially retained in the denatured state of this 88-residue protein and that the protein does not go fully into a random coil state. Recognizing that the NMR chemical shift is sensitive to secondary structure, the work reported in this project represents a novel theoretical approach: NMR chemical shifts are calculated by advanced quantum chemical methods with the aim of determining the secondary shifts of the urea-denatured SUMO protein. As shown in this study, the ab initio calculations on the SUMO protein fragment (32-52) generated by MD studies indeed reveal that the helical and sheet propensities are not lost in the process of denaturation. The work reported in this project is also the first such effort to apply advanced quantum chemical calculations to any denatured protein system and is intended as a general approach to protein structure elucidation.

Contents

Preface
List of Abbreviations
List of twenty standard amino acids
List of Figures

Section 1 - Literature

1. Protein evolution, Structure, and Function
    1.1. Evolution of proteins
        1.1.1. Evolutionary Time Scales
        1.1.2. Protein Homologies
    1.2. Sequence, Structure and Function
    1.3. Specific Examples of Protein Structure and Function
        i. Renin-Angiotensin-Aldosterone System
        ii. Oxytocin and Vasopressin
        iii. Insulin and Glucagon
        iv. Haemoglobin
        v. Collagen
    1.4. NMR in Structural Biology

2. SUMO Proteins
    2.1. Introduction
    2.2. SUMO and SUMO Paralogues
        2.2.1. Discovery of SUMO Protein Modification
    2.3. Structure
    2.4. SUMO Binding Proteins
    2.5. SUMO Substrate selection
    2.6. SUMO Conjugation
    2.7. Enzymes Mediating Sumo cycle
        2.7.1. SUMO Activating Enzyme (E1)
        2.7.2. SUMO Conjugating Enzyme (E2)
        2.7.3. SUMO Ligases
    2.8. Cellular Functions carried out by SUMO
        2.8.1. Nucleo-cytoplasmic Transport
        2.8.2. Transcriptional regulation
        2.8.3. Regulation of intracellular localization
        2.8.4. Interplay between Ubiquitin and SUMO signalling
        2.8.5. Role of SUMO in Mitochondrial Fission
        2.8.6. SUMO and Cell-cycle
    Summary

3. Nuclear Magnetic Resonance Spectroscopy
    3.1. Introduction
        3.1.1. Nuclei with spin
        3.1.2. Spin
        3.1.3. Properties of Spin+
        3.1.4. Transitions
        3.1.5. Energy level Diagrams
        3.1.6. Population distribution
        3.1.7. CW NMR Experiment
        3.1.8. CW-Spectrometer
        3.1.9. Relaxation Process
    3.2. Chemical Shift
    3.3. Spin-Spin coupling
    3.4. Protein NMR
        3.4.1. Sample Preparation
        3.4.2. Isotopic Labelling
        3.4.3. Data collection
    3.5. One-Dimensional NMR of Proteins
    3.6. Two-Dimensional Fourier Transform NMR
        3.6.1. Fourier Transforms
        3.6.2. Heteronuclear Single Quantum Correlations
    3.7. Homonuclear Magnetic Resonance
        3.7.1. COSY
        3.7.2. TOCSY
        3.7.3. NOESY
    3.8. Spectrum Descriptions
        3.8.1. HNCO
        3.8.2. HN(CA)CO
        3.8.3. HN(CO)CA
        3.8.4. HNCA
        3.8.5. HN(CO)CA
        3.8.6. CBCA(CO)NH
        3.8.7. CBCANH
        3.8.8. CC(CO)NH
        3.8.9. H(CCO)NH
        3.8.10. HBHA(CO)NH
    3.9. Two Dimensional FT NMR in protein structure determination
        3.9.1. AMX/Three Spin-Spin systems
        3.9.2. Five-Spin systems
        3.9.3. Structure of Heptapeptide (YGRGDSP)
    Summary

Section 2 – Experiment

Section 3 – Results and Discussions

Future Prospects

References

Preface

One of the most intriguing problems in biology today is to understand the mechanisms by which a newly synthesized linear polypeptide chain attains its functional three-dimensional native structure and undergoes denaturation when subjected to unfavourable conditions. There is still a debate on whether a protein can be denatured completely or not. It has been observed that when proteins are denatured, most of them get trapped in local minima of the energy landscape without reaching the top of the energy hill, which represents the fully unfolded state.

Recent experimental evidence suggests that even when a protein is denatured, some secondary structural preferences persist among its residues. Nuclear magnetic resonance studies of such denatured proteins show characteristic patterns similar to those found in the fully folded native protein. The work presented in this report calculates the NMR properties of the denatured SUMO protein by ab initio methods with a level of accuracy that allows comparison with the experimental results. The results clearly indicate that some structural propensities remain even after denaturation.

List of Abbreviations

A
ACE – angiotensin converting enzyme
ADH – antidiuretic hormone
AAPF – succinyl-Ala-Ala-Pro-p-nitroanilide

B
BLM – Bloom Syndrome gene
BMRB – Biological Magnetic Resonance Data Bank

C
CW – Continuous Wave
CS – Chemical Shift
COSY – Correlation Spectroscopy

D
DFT – Density Functional Theory
2,3-DPG – 2,3-Diphosphoglycerate

F
FT – Fourier Transform

G
GROMACS – Groningen Machine for Chemical Simulations
GPCR – G-protein coupled receptors

H
HF – Hartree-Fock
HSQC – Heteronuclear Single Quantum Correlation
HbA – Haemoglobin A
HPRO
HLYS

I
IKK

L
LPS

M
MD
MAS

N
NMR – Nuclear Magnetic Resonance
NLS – Nuclear Localization Signal
NBs – Nuclear bodies
NOESY – Nuclear Overhauser Effect Spectroscopy
NOE – Nuclear Overhauser Effect

P
PDB – Protein Data Bank
PML – Promyelocytic Leukaemia protein
PPAR-γ – Peroxisome proliferator activated Receptor-γ
PPM – Parts per million

R
R
RF
Ran BP

S
SIMs – SUMO Interacting Motifs
SAE – SUMO Activating Enzyme
SUMO – Small Ubiquitin like Modifier
SBDs – SUMO Binding Domains
SNPs – Single Nucleotide Polymorphisms

T
TMS – Tetramethylsilane
TOCSY – Total Correlation Spectroscopy

U
UBls – Ubiquitin related protein modifiers

V
VMD – Visual Molecular Dynamics

Amino Acids

The twenty standard amino acids

Amino acid      Code   Symbol   Properties
Alanine         Ala    A        hydrophobic
Arginine        Arg    R        free amino group makes it basic and hydrophilic
Asparagine      Asn    N        carbohydrate can be covalently linked ("N-linked") to its -NH2
Aspartic acid   Asp    D        free carboxyl group makes it acidic and hydrophilic
Cysteine        Cys    C        oxidation of their sulfhydryl (-SH) groups links 2 Cys (S-S)
Glutamic acid   Glu    E        free carboxyl group makes it acidic and hydrophilic
Glutamine       Gln    Q        moderately hydrophilic
Glycine         Gly    G        so small it is amphiphilic (can exist in any surroundings)
Histidine       His    H        basic and hydrophilic
Isoleucine      Ile    I        hydrophobic
Leucine         Leu    L        hydrophobic
Lysine          Lys    K        strongly basic and hydrophilic
Methionine      Met    M        hydrophobic
Phenylalanine   Phe    F        very hydrophobic
Proline         Pro    P        causes kinks in the chain
Serine          Ser    S        carbohydrate can be covalently linked ("O-linked") to its -OH
Threonine       Thr    T        carbohydrate can be covalently linked ("O-linked") to its -OH
Tryptophan      Trp    W        scarce in most plant proteins
Tyrosine        Tyr    Y        a phosphate or sulphate group can be covalently attached to its -OH
Valine          Val    V        hydrophobic

List of Figures

Figure no   Description

1    General evolutionary tree with respect to proteins.
2    Relationship between Sequence, Structure and Function.
3    Ubiquitylation process.
4    Schematic representation of Sumoylation Process.
5    Discovery of SUMO protein modification.
6    Structure of Dsmt3 from Drosophila melanogaster.
7    Structure comparison of Ubiquitin and Human SUMO.
8    Effects of Sumoylation.
9    SUMO-interacting motif.
10   SUMO Conjugation/Deconjugation cycle.
11   Activation of SUMO by E1 enzyme.
12   Transfer of SUMO to the conjugating enzyme.
13   Three dimensional structure of Ubc9.
14   Interaction between RanBP2 and Ubc9.
15   Interaction between RanBP2 and Ubc9 in Harmonic view.
16   SUMO complexed with RanBP2 (a ligating enzyme).
17   Effects of SUMO mutants on Cell-cycle.
18   RanGAP1 (a target of Sumoylation at the nuclear pore).
19   Role of RanBP2 (E3 Enzyme) in Nucleo-Cytoplasmic Transport.
20   Transcriptional repression via Sumoylation.
21   Active repression of inflammatory response genes.
22   PPAR-γ targeted to the NCoR complexes.
23   Removal of co-repressor complexes by Lipopolysaccharide.
24   Sumoylation of PPAR-γ.
25   Localization of BLM gene in PML nuclear bodies mediated by SUMO.
26   Antagonistic effects of Ubiquitin and SUMO.
27   Sequential action of SUMO and Ubiquitin.
28   Role of SUMO in mitochondrial fission.
29   Low energy configuration of protein in the magnetic field.
30   High energy state of protein in the magnetic field.
31   Transition of a proton from lower energy state to high energy state.
32   Energy difference between the two spin states.
33   Behaviour of Nuclei under constant frequency with varying magnetic field.
34   Behaviour of Nuclei under constant magnetic field with varying frequency.
35   A typical CW-spectrometer.
36   Net magnetization vector.
37   Rotation of proton about the Z-axis.
38   Overall process of relaxation.
39   T1 relaxation process.
40   Phenomenon of Chemical Shielding.
41   Phenomenon of Deshielding.
42   Opposing field exerted by Methanol.
43   Two nuclei three bonds away from each other.
44   Intensity of magnetic field at the nucleus.
45   Spectrum of ethanol.
46   NMR sample in thin walled glass tube.
47   1H NMR spectrum of lysozyme in 750 MHz spectrometer.
48   One dimensional 13C spectrum of a protein.
49   Resonances for alanine.
50   CH3 resonances on the binding of substrate to the ligand.
51   Mechanism of Fourier Transform NMR.
52   Contour map of two dimensional spectrum.
53   Principle of 1H-15N HSQC.
54   Characteristic 1H-15N HSQC spectrum of a protein.
55   Comparison of a COSY and TOCSY spectra.
56   Two dimensional spectrum of Isoleucine.
57   Principle of HNCO.
58   HNCO spectrum of a protein.
59   Principle of HN(CA)CO.
60   HN(CA)CO spectrum of protein.
61   Overlay of HNCO and HN(CA)CO spectrum.
62   Principle of HN(CO)CA.
63   HN(CO)CA spectrum of a protein.
64   Principle of HNCA.
65   HNCA spectrum of a protein.
66   Overlay of HNCA and HN(CO)CA.
67   Principle of HN(CO)CA.
68   HN(CO)CA spectrum of protein sample.
69   Principle of CBCA(CO)NH.
70   CBCA(CO)NH spectrum of a protein.
71   Principle of CBCANH.
72   CBCANH spectrum of a protein.
73   Principle of CC(CO)NH.
74   CC(CO)NH spectrum of a protein.
75   Principle of H(CCO)NH.
76   H(CCO)NH spectrum of a protein.
77   Principle of HBHA(CO)NH.
78   HBHA(CO)NH spectrum of a protein.
79   Proton chemical shifts of 20 amino acids.
80   Two dimensional COSY spectrum of heptapeptide.
81   NOESY spectrum of Heptapeptide.
82   Amide portion of the NOESY spectrum for heptapeptides.
83   Structure of heptapeptide displaying type 2 β turn.
84   Amide portion of NOESY spectrum for Mutant Heptapeptide.
85   Folded SUMO displayed in Rasmol.
86   SUMO protein with Hydrogen added to the backbone and sidechain.
87   SUMO protein after removal of hydrogen.
88   Pymol window showing one of the fragments after MD.
89   Protein visualization in swiss-pdb viewer.
90   SPDBV window showing the cut protein.
91   Superimposition of 25 backbone structure by MolMol.
92   Structure of protein before MD.
93   Structure of protein after MD in use.
94   Setting up NMR calculation.
95   Structure of Dsmt3 having PDB id 2k1f.
96   Sequences of SUMO with their structural preferences.
97   A typical *.cco file.
98   A typical init.cya file.
99   Batch file for structure calculation in cyana.
100  Five different topologies generated by Cyana 3.0.
101  Accuracy of MD simulations.
102  Topology file.
103  A typical em.mdp file.
104  A typical *.mdp file.
105  Force field interaction.
106  Adding urea to SUMO protein in a box.
107  Mean backbone structure of the trajectories.
108  Calculation of chemical shifts for specific residues in protein.
109  Modified protein used to calculate NMR properties.
110  Setting up GAUSSIAN calculation.
111  Selecting NMR option from the calculation setup box.
112  Setting up DFT calculation.
113  Selecting 6-311G as the basis set.
114  Results of NMR calculations from the results tab.
115  Output of NMR spectra.
116  Visualization of the trajectories.

Section 1 - Literature

Chapter 1 –

Protein evolution, structure and its function

Abstract

Protein sequence comparison is our most powerful tool for characterizing protein sequences because of the enormous amount of information that is preserved throughout the evolutionary process. For many protein sequences, an evolutionary history can be traced back 1–2 billion years. Proteins that share a common ancestor are called homologous. Sequence comparison is most informative when it detects homologous proteins. Homologous proteins always share a common three-dimensional folding structure and they often share common active sites or binding domains. Frequently homologous proteins share common functions, but sometimes they do not. The ability to characterize the biological properties of a protein based on sequence data alone stems almost exclusively from properties conserved through evolutionary time.

A healthy mind and body require the coordinated action of billions of tiny molecular workers called proteins. Our genes contain the DNA scripts for manufacturing these proteins. Some proteins build our cells and other proteins work to allow us to think, smell, eat and breathe. Proteins are indispensable molecules in our bodies, and each has a unique three-dimensional shape that is well suited for its particular job. And if the shape of even one protein happens to go awry, there can be major consequences for human health. Mis-shapen proteins, especially those that make up part of the cell surface or cell membranes, are the culprits behind many diseases, including cystic fibrosis, Alzheimer's disease and countless others.

1.1 Evolution of proteins

Advances in the understanding of the biochemical processes of life have provided a wealth of evidence in support of evolution. Biochemical homologies provide some of the strongest evidence for evolution - partly because of the level of detail they provide, and partly because the nature of some of the homologies makes any explanation other than evolution seem even more farfetched than with the larger-scale homologies. There are a variety of different avenues of biochemical evidence for evolution, but most of them are either examination of genetics or of proteins.

The information conserved during the evolution of a protein molecule can be used to reliably infer homology, and thus a shared protein fold and possibly a shared active site or function. Many protein sequences can be used to reliably infer events that happened more than a billion years ago. Remarkably, some protein sequences change so slowly that they could be used to “date” events that took place more than 5 billion years ago, had the proteins existed.

1.1.1 Evolutionary time scales

When we search for homologous proteins, we are trying to identify proteins that shared a common ancestor in the past. The goal of protein sequence comparison is to take a protein sequence, for example from a human chromosome, and search a protein database to find homologous sequences, often from very divergent organisms. Thus, if the similarity search produces significant matches with a protein found in yeast, then an ancestral protein must have existed in an organism at least 1 billion years ago, and the descendants of that organism preserved the sequence in modern-day humans and yeast. Likewise, if a yeast protein is homologous to one found in E. coli, that sequence must have existed 2 billion years ago in the primordial organism that gave rise to bacteria and fungi.

For organisms that diverged within the past 600 million years, inferences about divergence times for modern organisms are taken from geological data; more ancient divergence times are inferred from extrapolations of evolutionary “clocks.” Evolutionary clocks are based both on slowly changing protein sequences and on ribosomal RNA sequences; such divergence time estimates require a rate of change that is constant on average. The oldest fossils are of prokaryotes in rocks about 2.5 billion years old; this geological age is consistent with that inferred from evolutionary divergence rates.
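The clock reasoning above can be illustrated with a minimal sketch. It assumes the simplest constant-rate model, in which both lineages accumulate substitutions independently after the split, so the divergence time is the observed distance divided by twice the rate. The function name and both numerical values are hypothetical illustrations and are not taken from this work.

```python
def divergence_time_years(substitutions_per_site, rate_per_site_per_year):
    """Estimate the time since two lineages split under a constant-rate clock.

    The factor of 2 accounts for changes accumulating along both lineages.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return substitutions_per_site / (2.0 * rate_per_site_per_year)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
observed_distance = 0.4   # substitutions per site between two homologous proteins
assumed_rate = 2e-10      # substitutions per site per year, per lineage

estimate = divergence_time_years(observed_distance, assumed_rate)
print(f"Estimated divergence time: {estimate:.2e} years")
```

With these illustrative inputs the estimate is on the order of one billion years. The result is only as reliable as the assumption that the rate of change is constant on average.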

1.1.2 Protein Homologies

Proteins are coded by genes, so in a sense, protein homologies are reflective of genetic homologies. However, they are considered independently here because a lot of work has been done on examining proteins. A protein is a string of amino acids. Proteins range in size from around 50 amino acids to thousands. Proteins are among the most important chemicals in life: in addition to making up a good chunk of the structure of many organisms, proteins are involved in regulating or controlling many of the functions of a living organism. The characteristics of a protein are determined by the sequence of amino acids of which it is constructed. There is a homology between all living things regarding amino acids because the same twenty amino acids are found in most kinds of living things. These twenty are a small subset of the amino acids that occur naturally (~250) and there is no known reason why these particular twenty amino acids need to be used over some other subset of amino acids. If different life forms originated independently, there is no reason to think that the same twenty amino acids would be found in most of life. However, it does make sense if all life evolved from a common ancestor that happened to use these twenty amino acids.

Homologies can be found not only among the constituents of proteins but also among proteins themselves. One important point to note about proteins is that, in many cases, small changes in some of the amino acids that make up a protein do not appear to have much, if any, effect on the functioning of the protein. Thus we can have a set of proteins that do essentially the same thing but are not identical.

For some proteins it is estimated that changes in a significant number of the amino acids from which it is constructed will not affect its function. Haemoglobins are such an example because there are actually several types, all of which serve the function of binding oxygen in the blood and yet differ in their amino acid sequences. Haemoglobins are found in a wide variety of life forms and they are all very similar in structure. Given that there are so many different sequences of amino acids that could make a functional haemoglobin molecule; we can ask why the various haemoglobins among vastly different creatures are so similar. Evolution provides a meaningful answer.

1.2 Sequence, Structure and Function

There are close connections among sequences, structures, and functions. Sequences determine structures, and structures determine functions. In a broader sense, this implies that similar sequences often have similar structures, and that similar structures often function similarly, although conversely it is not always so. From the point of view of evolution, genes mutate as biological systems develop. Genes that belong to close families are certainly similar and the proteins expressed from them should carry similar structures as well. The conserved parts of the structures are very likely to correspond to some biological functions shared by the similar genes and hence the proteins.

These relationships among sequences, structures, and functions are fundamental questions investigated in modern molecular genetics. They are important properties that are often employed in structural and functional studies.

Genes encode proteins by providing a sequence of nucleotides that is translated into a sequence of amino acids. The sequence of amino acids is known as the primary structure of the protein. However, in order to function correctly, this amino acid chain must fold up into a complex three-dimensional shape. Protein folding involves the formation of local structural motifs such as helices and sheets (secondary structures) and the coalescence of these individual structures into an overall three-dimensional configuration (tertiary structure). The greatest achievement of the human genome is its ability to encode the precise three-dimensional shapes of thousands of proteins using linear sequences. It is a trick we still do not fully understand.
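The step from nucleotide sequence to primary structure can be sketched in a few lines. The example below builds the standard genetic code table and translates a short coding sequence codon by codon. The DNA fragment is a hypothetical illustration, not a sequence used in this work, and the sketch deliberately says nothing about folding, which is the part that remains hard.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order for
# each codon position. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Translation starts at the first base and stops at the first stop codon.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical coding fragment, written codon by codon for readability.
dna = "ATG TGC TAC ATC CAG AAC TGC CCG CTG GGC TAA".replace(" ", "")
print(translate(dna))
```

The output is the linear primary structure only. The secondary and tertiary structure that the chain adopts cannot be read off from such a table.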

Protein structure is essential for correct function because it allows molecular recognition. For example, enzymes are proteins that catalyse biochemical reactions. The function of an enzyme relies on the structure of its active site, a cavity in the protein with a shape and size that enable it to fit the intended substrate very snugly. It also has the correct chemical properties to bind the substrate efficiently. The active site also contains certain amino acids that are involved in the chemical reaction catalysed by the enzyme.

Not all proteins are enzymes, but all in some way rely on molecular recognition in order to perform their functions. Transport proteins such as haemoglobin must recognise the molecules they carry (in this case oxygen), receptors on the cell surface must recognise particular signalling molecules, transcription factors must recognise particular DNA sequence and antibodies must recognise specific antigens. The functional integrity of the cell depends critically on protein-protein interactions, particularly on the formation of multi-protein complexes.

Mutations that cause human diseases often disrupt protein structure and therefore abolish normal function. This occurs if one amino acid is replaced with another that has completely different chemical properties or if the sequence of amino acids in a protein is truncated or radically changed. Such changes alter the way the protein folds and prevent the recognition of interacting molecules. Polymorphisms in the coding sequence of a gene can also affect protein structure but do so in more subtle ways, e.g. by replacing one amino acid with another that has similar chemical properties. This is how single-nucleotide polymorphisms (SNPs) influence drug response patterns. For example, they may cause subtle alterations to the structure of the receptors with which drugs interact, or subtle changes to the activity of enzymes responsible for drug metabolism.

1.3 Specific Examples of Protein Structure and Function

i. The Renin-Angiotensin-Aldosterone System

The renin-angiotensin-aldosterone system is used by the body to regulate blood pressure. In response to lowered blood pressure, the kidney releases the protease renin, which cleaves the inactive, 14 amino acid peptide angiotensinogen to another inactive peptide, the decapeptide angiotensin I. A second enzyme, angiotensin converting enzyme (ACE), converts this decapeptide to its active form, the octapeptide angiotensin II. Angiotensin II is a potent vasoconstrictor that is about 40 times more potent than norepinephrine at raising vascular pressure. In addition, angiotensin II stimulates the release of aldosterone, a steroid hormone that causes the kidney to reabsorb sodium and water, thus raising blood pressure by an osmotic effect. Angiotensin II is ultimately inactivated by a third peptidase called angiotensinase.
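The two cleavage steps described above can be followed on the sequence itself. The sketch below assumes the commonly reported N-terminal 14 residues of human angiotensinogen, a sequence that is not given in this work. The function names are hypothetical, and the enzymes are modelled as nothing more than string slicing.

```python
# Assumed sequence: N-terminal 14 residues of human angiotensinogen in
# one-letter code, as commonly reported. It is not taken from this work.
ANGIOTENSINOGEN_N_TERMINUS = "DRVYIHPFHLVIHN"


def renin_cleave(substrate):
    """Model renin: release the N-terminal decapeptide (angiotensin I)."""
    return substrate[:10]


def ace_cleave(angiotensin_i):
    """Model ACE: remove the C-terminal dipeptide, leaving angiotensin II."""
    return angiotensin_i[:-2]


angiotensin_i = renin_cleave(ANGIOTENSINOGEN_N_TERMINUS)
angiotensin_ii = ace_cleave(angiotensin_i)

print(len(ANGIOTENSINOGEN_N_TERMINUS), ANGIOTENSINOGEN_N_TERMINUS)
print(len(angiotensin_i), angiotensin_i)
print(len(angiotensin_ii), angiotensin_ii)
```

The peptide lengths go from 14 to 10 to 8 residues, matching the decapeptide and octapeptide named in the text.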

The renin-angiotensin-aldosterone system is of great importance in the development of a common disease known as essential hypertension. When the renin-angiotensin-aldosterone system is overactive, the basal blood pressure is elevated, putting increased stress on the cardiovascular system. A group of compounds known as ACE inhibitors has been developed and is used quite effectively to treat hypertension. Since they prevent the conversion of angiotensin I to angiotensin II, they prevent the elevation of blood pressure seen in essential hypertension.

ii. Oxytocin and Vasopressin

Oxytocin and vasopressin are two peptide hormones with very similar structure, but with very different biological activities. Their primary sequences are shown below. Interestingly, their structures differ by only two amino acid residues (Ile 3 in oxytocin is replaced by Phe in vasopressin, and the hydrophobic Leu 8 is replaced by a hydrophilic Arg). Oxytocin is a potent stimulator of uterine smooth muscle, and also stimulates lactation. However, vasopressin, also known as antidiuretic hormone (ADH), has no effect on uterine smooth muscle, but causes reabsorption of water by the kidney, thus increasing blood pressure.
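The similarity of the two hormones can be checked with a position-by-position comparison. The sketch below assumes the commonly reported nonapeptide sequences of oxytocin and arginine vasopressin in one-letter code. The sequences and the helper function are illustrative and are not taken from this work.

```python
# Assumed sequences (one-letter code), as commonly reported.
OXYTOCIN = "CYIQNCPLG"
VASOPRESSIN = "CYFQNCPRG"


def differing_positions(first, second):
    """Return (position, residue_in_first, residue_in_second) for each mismatch.

    Positions are numbered from 1, as is usual for peptide sequences.
    """
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(first, second), start=1)
        if a != b
    ]


differences = differing_positions(OXYTOCIN, VASOPRESSIN)
identity = 100.0 * (len(OXYTOCIN) - len(differences)) / len(OXYTOCIN)

for position, oxy_residue, vaso_residue in differences:
    print(f"position {position}: {oxy_residue} -> {vaso_residue}")
print(f"sequence identity: {identity:.1f}%")
```

The comparison reports substitutions at positions 3 and 8, which is a small change in sequence for so large a change in biological activity.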

iii. Insulin and Glucagon

Insulin is an extremely important peptide hormone that is produced by the beta cells of the Islet of Langerhans in the pancreas. It has 51 amino acids, three disulfide cross links, and is comprised of two separate chains, termed A and B. Insulin has a number of important effects on cells in the body including:

1. Stimulation of glycolysis (glucose breakdown).
2. Stimulation of glycogen formation (a storage form for glucose).
3. Enhancement of the rate of fatty acid biosynthesis.
4. Stimulation of the entry of glucose into cells.
5. Overall reduction of blood glucose levels.

Insulin is not synthesized in active form, but is first made as a single inactive peptide chain called preproinsulin (see the figure below). Preproinsulin has no crosslinks and, in addition to the A and B chains, has two additional portions called the signal sequence and the connecting (C) peptide. The signal sequence informs the cell that insulin is being made, and that the finished preproinsulin should be deposited outside the cell.

The C-peptide is necessary to allow preproinsulin to fold in the correct conformation to ultimately produce active insulin. Preproinsulin is processed by a two-step procedure: in the first step, the signal sequence is cleaved by a peptidase, and two of the three crosslinks are formed to give a new but still inactive peptide called proinsulin. A second peptidase then cleaves the C-peptide, and an internal disulfide forms to produce insulin.

Glucagon is a peptide hormone that is formed in the alpha cells of the Islets of Langerhans in the pancreas. It is a single chain peptide consisting of 29 amino acid residues, and has effects which oppose insulin, including:

• Down regulation of glycolysis.
• Enhancement of the rate of glycogenolysis (glycogen breakdown).
• Reduction in the rate of fatty acid synthesis.
• Enhancement of blood glucose levels.

iv. Haemoglobin

Haemoglobin A (HbA) is a tetrameric protein which consists of two alpha chains and two beta chains, and comprises about 98% of adult human haemoglobin. There is a heme group and an oxygen binding site on each subunit; therefore, each molecule of HbA can carry 4 molecules of oxygen. There are other forms of human haemoglobin, the most common being HbA2, which has two alpha chains and two delta chains, and accounts for about 2% of adult haemoglobin.

Haemoglobin is an example of an allosteric protein, i.e. its function can be altered by the binding of some external substance (called the effector) at a site on the molecule other than the active site (the allosteric site). When an allosteric effector binds to a protein, it induces a conformational change which turns the function of the protein either on (positive allosterism) or off (negative allosterism). In the case of haemoglobin, the allosteric effector is 2,3-diphosphoglycerate (2,3-DPG), which causes haemoglobin to have 1/26th of its normal affinity for oxygen. This is important, since 2,3-DPG in the tissues triggers the release of oxygen at the correct location.

Haemoglobin also exhibits cooperativity, a phenomenon wherein the binding of one molecule to a protein with more than one active site influences the ease of binding of subsequent molecules. Cooperativity can be positive (the second molecule binds more easily) or negative (the second molecule binds less easily). The binding of oxygen to the four sites of haemoglobin is an example of positive cooperativity. As shown in the figure below, haemoglobin can also exist in a glycosylated form known as HbA1C. HbA1C is formed when the amino terminus of HbA reacts with glucose, first reversibly forming an aldimine or Schiff base, and then undergoing an irreversible Amadori rearrangement to afford the ketoamine form HbA1C. In normal patients, HbA1C accounts for about 3-5% of HbA, but in diabetics who have elevated blood glucose for extended periods, this number can reach 6 to 15%. Physicians can measure HbA1C, and use it as a reliable way to monitor how well diabetic patients are complying with their insulin therapy.
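
Positive cooperativity is often summarised with the Hill equation, which is not derived in this text. The sketch below assumes typical textbook values for adult haemoglobin (a half-saturation pressure P50 of about 26 mmHg and a Hill coefficient n of about 2.8); both numbers and the function name `hill_saturation` are assumptions introduced for illustration only.

```python
def hill_saturation(p_o2: float, p50: float = 26.0, n: float = 2.8) -> float:
    """Fractional oxygen saturation Y = pO2^n / (P50^n + pO2^n).

    p_o2 : oxygen partial pressure (mmHg)
    p50  : pressure at half saturation (mmHg); assumed textbook value
    n    : Hill coefficient; n > 1 models positive cooperativity,
           n = 1 gives a non-cooperative (hyperbolic) binding curve
    """
    if p_o2 < 0:
        raise ValueError("partial pressure cannot be negative")
    return p_o2**n / (p50**n + p_o2**n)


if __name__ == "__main__":
    # Compare a cooperative and a non-cooperative binder over a pressure range.
    for p in (10, 26, 40, 100):
        cooperative = hill_saturation(p)
        hyperbolic = hill_saturation(p, n=1.0)
        print(f"pO2 = {p:>3} mmHg  cooperative = {cooperative:.2f}  n=1 = {hyperbolic:.2f}")
```

In this simple picture, an effector such as 2,3-DPG that lowers the oxygen affinity would be modelled by a larger `p50`.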

vi. Collagen

Collagen is a connective tissue protein that is found in skin, bone, tendons, cartilage, the cornea, etc. It is quite insoluble in water, and is composed of two types of chain termed alpha-1 and alpha-2. In the amino acid sequence of collagen, about every 3rd amino acid is a GLY residue, and there are many prolines which are hydroxylated to form hydroxyproline (HPRO). LYS residues are also hydroxylated in collagen to form HLYS. These additional sidechain OH groups allow for extra strength due to H-bonding, and the GLY residues allow the protein to coil more tightly, since they fit on the inside of the helix. In a collagen fiber, three of these helices are coiled together to form a rope-like structure called a superhelical coil. It is this structure that gives collagen its great strength. Collagen structure can be disrupted in diseases such as scurvy, which results from a lack of ascorbic acid, a cofactor in the hydroxylation of proline. In addition, collagen structure is disrupted in rheumatoid arthritis.

1.4. NMR in Structural Biology

A mechanistic understanding of biological function requires the determination of the three dimensional molecular structures of biological macromolecules. Nuclear magnetic resonance (NMR) spectroscopy is a unique technique for measuring structural and dynamic properties of complex biomolecules under near-physiological conditions. High-resolution solution NMR has become a standard method for structure determination of soluble proteins, and high resolution spectra are a result of fast isotropic molecular tumbling. In many cases, however, the most natural condition to study molecular structure and dynamics of macromolecules is in a membrane-integrated form. A leading technique for investigating biological systems is X-ray crystallography. However, well-ordered three-dimensional single crystals are the major requirement for attaining high resolution structures of biomolecules of any size, and growing a crystal that diffracts beyond 3 Å can take a long time. Moreover, many proteins aggregate and hence diffraction quality single crystals are difficult to grow. In this regard, NMR spectroscopy has offered unique opportunities in determining the three dimensional architecture of proteins without any need to grow single crystals.

Table: Advantages and disadvantages (strengths and weaknesses) of NMR and X-ray diffraction.

Advantages of NMR

• Several types of information can be obtained from many types of experiments.
• Angles, distances, coupling constants, chemical shifts, rate constants, etc. are obtained. These are genuine molecular parameters which can be examined further with computers and molecular modelling procedures.
• With a sufficiently strong magnetic field (on which the resolution depends), all of the atoms can be handled individually.
• With suitable computing facilities the whole 3D structure can be calculated.
• Different data sets can be collected from different types of experiments, which helps to resolve the uncertainties of any one type of measurement.
• The motion of segments (domains) can be examined.
• Chemical kinetics can be observed.
• (Activation) thermodynamic and kinetic data can be determined from a well-prepared dynamic NMR experiment.
• The influence of the dielectric constant, the polarity and other properties of the solvent or of added substances can be investigated.

Disadvantages of NMR

• A system contains many atoms and yields a large amount of extracted data.
• The wealth of parameters is good for accurate determination of the structure, but limits the method at higher molecular masses.
• The resolving power of NMR is lower than that of some other types of experiment (e.g. X-ray crystallography), since the information obtained from the same material is much more complex.
• The highest molecular mass examined successfully is only a 64 kDa protein complex.
• In many cases a given data set from a given type of experiment is consistent with two or more possible conformations.
• Only the probability that a protein segment is in a given conformation can be determined.
• The cost of the experiment increases with field strength and with the complexity of the determination.

Advantages of X-ray

• The solvent effect can be examined, since the same protein may crystallize in different crystal forms from different solvents.
• The protein can be forced into another crystal form by changing its solvent.
• The whole 3D structure can be obtained by systematic analysis of a well-crystallized material.

Disadvantages of X-ray

• A crystal is required, so only proteins which can be crystallized can be examined.
• Solutions, and the behaviour of molecules in solution, cannot be examined.
• Powders and gases are very difficult to examine.
• Motions cannot be studied.
• Only one parameter set is obtained, so only one conformation can be observed.
• Small parts of the molecule cannot be examined separately.
• Secondary structures, and especially domain movements, cannot be determined directly (a major disadvantage compared with NMR).
• The hydrogen atoms in the molecule cannot be observed, since hydrogen has only one electron.

The beauty of NMR is that the structure determination can be done in the liquid phase by dissolving the protein in a suitable solvent (usually water). By recording NMR spectra of the protein in the liquid state and analyzing the multi-dimensional spectra using advanced computational tools, the secondary structure details can be determined. This is the hallmark of modern protein NMR spectroscopy, for which Prof. Kurt Wüthrich (ETH, Switzerland) was awarded the Nobel Prize in 2002.

Membrane proteins are estimated to constitute a third of mammalian genomes, and some of them, the G-protein coupled receptors (GPCR), represent primary receptor targets for drug discovery. They play a crucial role in the cell, being involved in many vital cellular processes, acting as channels, pumps, receptors and enzymes. For membrane proteins, which are essentially solids or semi-solids, liquid state NMR spectroscopy is of no avail. In solids, fast molecular tumbling is absent and the NMR interactions (chemical shielding, dipolar and quadrupolar interactions) retain their spatial dependence. This gives rise to broad resonances. Ingenious line narrowing techniques have been developed in solid state NMR, of which Magic Angle Spinning (MAS) is by far the most widely employed. Solid state NMR spectroscopy has offered new structure elucidation tools which are far different from those used in liquid state NMR. In the structure elucidation of membrane proteins, modern solid state NMR has emerged as an effective method.
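
Magic Angle Spinning relies on the fact that, to first order, these anisotropic interactions scale with the factor (3cos²θ - 1)/2, where θ is the angle between the spinning axis and the static magnetic field. This relation is standard NMR theory rather than something stated in this text; the sketch below simply solves for the angle at which the factor vanishes, and its names are illustrative.

```python
import math


def anisotropy_scaling(theta_rad: float) -> float:
    """Second Legendre polynomial P2(cos theta) = (3 cos^2(theta) - 1) / 2."""
    return 0.5 * (3.0 * math.cos(theta_rad) ** 2 - 1.0)


# The factor vanishes when cos^2(theta) = 1/3.
MAGIC_ANGLE_RAD = math.acos(1.0 / math.sqrt(3.0))
MAGIC_ANGLE_DEG = math.degrees(MAGIC_ANGLE_RAD)

if __name__ == "__main__":
    print(f"magic angle = {MAGIC_ANGLE_DEG:.2f} degrees")
    print(f"scaling factor at the magic angle = {anisotropy_scaling(MAGIC_ANGLE_RAD):.1e}")
    print(f"scaling factor along the field (0 degrees) = {anisotropy_scaling(0.0):.1f}")
```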

SUMO proteins post-translationally modify other proteins. They are about 88-90 residues long. Liquid state NMR is also the method of choice for studying SUMO proteins. Since they are highly dynamic in nature, they are difficult to crystallize, which makes conventional liquid state NMR the natural way to study them. For the determination of secondary structure in SUMO systems, new liquid state NMR techniques have been developed and are constantly being refined.

In this project, the secondary structural propensities of an 88-residue SUMO protein from Drosophila melanogaster, a system studied by liquid state NMR, are examined by means of ab-initio calculations.

Chapter 2

SUMO Proteins

Abstract

SUMO (small Ubiquitin-related modifier) family proteins are not only structurally but also mechanistically related to Ubiquitin. They are covalently attached to and detached from other proteins in the cell to modify their function. Their roles have been implicated in the control of various cellular processes including gene transcription, cell cycle, DNA repair and apoptosis. SUMO molecular machineries are implicated in the formation of signalling networks that underlie the specificity in these biological processes. A high resolution structure of SUMO is possible by NMR. Since this protein is dynamic in solution, its properties are better studied by NMR than by X-ray crystallography.

2.1. Introduction

Post-translational protein modifications are versatile devices that cells use to control the function of proteins by regulating their activity, subcellular localization and stability, as well as their interaction with other proteins. The reversibility of protein modifications enables the participation of proteins regulated by them in multiple rounds of functional circuits. Protein modifications are also important to rapidly regulate and orchestrate protein functions in response to changes in a cell’s state or its environment, without altering their synthesis or turnover rates. The modification that has attracted the greatest attention is protein Ubiquitylation, which occurs in a three step process leading to the attachment of the small protein Ubiquitin to lysine residues of the substrate protein. Proteins that resemble Ubiquitin in these properties are collectively called Ubiquitin-related protein modifiers (Ubl’s). They are post-translationally attached to substrate proteins by enzymatic reactions that are similar to Ubiquitin conjugation. Owing to its involvement in a variety of important processes of eukaryotic cell biology, SUMO is the most intriguing Ubl. Despite the similarities in their structure and the enzymatic reactions underlying their conjugation, SUMO and Ubiquitin have distinct, non-overlapping functions.

Ubiquitylation

Ubiquitylation is the process of attachment of the protein Ubiquitin (Ub) to lysine residues of a substrate protein. Ub is activated in an ATP-dependent manner by Ub-activating enzymes (E1), and is transferred to an Ub-conjugating enzyme (E2) via a thioester bond. An Ub-protein ligase (E3) specifically attaches Ub to the ε-amino group of a lysine residue in the target protein (Fig. 3).

2.2 SUMO and SUMO paralogues

Just a few years ago, only Ub seemed to be an important protein modifier. Today we know that several proteins have similarities to Ubiquitin both in sequence and in three-dimensional structure. These Ubl’s fulfil cellular functions other than targeting proteins for degradation. The best characterized Ubl protein so far is SUMO. It is present in all eukaryotic cells and is highly conserved from yeast to humans. Whereas invertebrates have only a single SUMO gene, three members of the SUMO family have been described in vertebrates:

i. SUMO1
ii. SUMO2
iii. SUMO3

Sumoylation

All the SUMO proteins are covalently attached to their target proteins through an isopeptide bond, by a mechanism similar to Ubiquitylation that is known as Sumoylation. This process also involves E1, E2 and E3 enzymes. In mammals only one E1 and one E2 enzyme are known so far. Interestingly, the E2 enzyme UBC9 is not only able to form a reactive thioester with SUMO, but also binds SUMO non-covalently. Several E3 ligases for SUMO proteins are known so far (PIAS family proteins, Pc2, RanBP2 and TOPORS). E3 ligases for Sumoylation enhance the conjugation process rather than being strictly required for it, as they generally are in Ubiquitylation reactions.

2.2.1 Discovery of SUMO protein modification

The Ubiquitin-related protein SUMO-1 was discovered in studies on nuclear import in mammalian cells as a covalent modification of RanGAP1. This discovery may have been facilitated by the unique property of RanGAP1 of being nearly quantitatively and constitutively modified with SUMO. This modification targets the otherwise cytosolic RanGAP1 to the nuclear pore complex, where it participates in nuclear import by activating the GTPase activity of the cytosol/nucleus shuttling factor Ran. Sumoylation of RanGAP1 leads to its interaction with the Ran binding protein (RanBP2) at the cytoplasmic filaments of the nuclear pore complex. As discussed in a later section, RanBP2 itself is modified by Sumoylation and, moreover, has recently been shown to act as a SUMO ligase. SUMO-1 is also known by other names, such as GMP1, PIC1, sentrin, SMT3 and UBL1.

2.3 Structure

SUMO proteins are small; most are around 100 amino acids in length and 12 kDa in mass. The exact length and mass vary between SUMO family members and depend on which organism the protein comes from. For example, human SUMO1, also shown in the figures, is 101 residues long and has a mass of 11.6 kDa. Its homologues in rat and mouse are also 101 residues long, while the presumed relative in C. elegans has only 91 amino acids. The structure of human SUMO-1 has been determined by NMR and compared to the crystal structure of Ubiquitin. Although the sequence identity between SUMO and Ubiquitin is relatively low (~18% identity), the overall three-dimensional structures are very similar.

The protein studied in this work was obtained from Drosophila melanogaster. This protein is 88 residues long and has four beta sheets and two alpha helices. It has methionine at its N-terminal end and glycine at its C-terminal end. This protein is homologous to many other proteins of the family.
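
The quoted lengths and masses can be sanity-checked with a rough estimate. The sketch below assumes a rule-of-thumb average residue mass of about 110 Da, which is a general approximation and not a value taken from this work; exact masses depend on the actual amino acid composition.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein (Da).
# This is an assumption of the sketch, not a figure from the text.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Estimate a protein's mass in kDa from its residue count."""
    if n_residues < 0:
        raise ValueError("residue count cannot be negative")
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    for name, length in (("human SUMO1", 101), ("Drosophila SUMO", 88)):
        print(f"{name}: {length} residues, roughly {approx_mass_kda(length):.1f} kDa")
```

For the 101-residue human SUMO1 the estimate is close to the 11.6 kDa quoted above, which is as much agreement as such a crude rule can be expected to give.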

Differences between SUMO and Ubiquitin

The surface charge distributions of the two proteins are quite different, indicating that they interact specifically with distinct enzymes and substrates. Another prominent feature of SUMO-1 is a protruding long and flexible N-terminal domain, which is absent in Ubiquitin. In yeast, a lysine residue within this N-terminal domain has been implicated in the formation of poly-SUMO chains.

Surprisingly, however, the entire extension including this lysine can be deleted without severe consequences for the yeast indicating that, in contrast to Ubiquitin, chain formation is not important for SUMO function in S. cerevisiae. A feature that is shared between the mature forms of SUMO and Ubiquitin, and also some other Ubl’s, is a di-glycine motif at the C-terminus. This motif was shown to be critical for SUMO conjugation in S. cerevisiae.

Effects of Sumoylation

Sumoylation can have three general consequences for a modified protein.

2.4 SUMO binding proteins

While our understanding of the mechanisms of SUMO protein conjugation is quite advanced, much remains to be understood about how these modifications are translated into different biological responses. Consistent with the role of Ub binding proteins, it was proposed that many if not most functions can be mediated via SUMO binding partners. Indeed, several studies have shown that the functional properties of SUMO isoforms in vivo to a great extent reflect their ability to mediate distinct protein-protein interactions, thus forming multimeric signalling complexes. This is based on the ability of SUMO paralogues to engage in non-covalent binding to other proteins containing specific motifs that recognize SUMOs, called SUMO-interacting motifs (SIMs), also known as SUMO-binding domains (SBDs). SIMs are present in a great variety of proteins. Biophysical studies of the SUMO-interacting motif (SIM) revealed that a small hydrophobic region is an essential determinant of SUMO recognition.

2.5. SUMO substrate selection

The analysis of an increasing number of SUMO targets has confirmed that the majority of SUMO-accepting lysine residues (K) lie within the consensus sequence KXE. This sequence constitutes a transferable motif sufficient to transform test proteins into suitable substrates for Sumoylation in vitro. In one study it was shown that the Sumoylation consensus sequence mediates direct interaction with the SUMO-conjugating enzyme Ubc9. It is assumed that this interaction is sufficient to target some substrates such as RanGAP1, whereas SUMO ligases are required in addition for the modification of others.
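
A scan for the KXE consensus can be sketched with a regular expression. The example is illustrative only: the test sequence is hypothetical, the function name `find_kxe_sites` is introduced here, and a sequence match says nothing about whether a lysine is actually modified in the cell.

```python
import re

# K, any residue, then E. The lookahead makes matches zero-width so that
# overlapping candidate sites (e.g. in "KKEE") are all reported.
KXE_PATTERN = re.compile(r"(?=(K[A-Z]E))")


def find_kxe_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the lysine, matched tripeptide) for each KXE hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in KXE_PATTERN.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    toy_sequence = "MAKTEVLKQEGG"  # hypothetical sequence, for illustration only
    for position, motif in find_kxe_sites(toy_sequence):
        print(f"candidate acceptor lysine at position {position} ({motif})")
```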

2.6. SUMO conjugation

SUMO conjugation to its target is analogous to that of Ubiquitin.

i. A C-terminal peptide is cleaved from SUMO by a protease (the SENP proteases in humans, or Ulp1 in yeast) to reveal a di-glycine motif.

ii. SUMO then becomes bound, in an ATP-dependent step, to an E1 enzyme (SUMO Activating Enzyme (SAE)), which is a heterodimer.

iii. It is then passed to an E2, which is a conjugating enzyme (Ubc9). Finally, one of a small number of E3 ligating proteins attaches it to the target protein.
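
The first step above, exposure of the di-glycine motif, can be sketched as a simple string operation. The rule used here (cut after the last GG in the precursor) is a simplifying assumption, the precursor sequence is a hypothetical toy example, and `mature_form` is an illustrative name.

```python
def mature_form(precursor: str) -> str:
    """Trim the C-terminal extension so the sequence ends in the di-glycine motif.

    Simplifying assumption: the processing site is taken to be directly
    after the last occurrence of 'GG' in the precursor.
    """
    sequence = precursor.upper()
    index = sequence.rfind("GG")
    if index == -1:
        raise ValueError("no di-glycine motif found in the precursor")
    return sequence[: index + 2]


if __name__ == "__main__":
    toy_precursor = "MSDQEAKTGGHSTV"  # hypothetical precursor, for illustration only
    processed = mature_form(toy_precursor)
    removed = toy_precursor[len(processed):]
    print(f"mature form: {processed} (removed C-terminal peptide: {removed})")
```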

SUMO conjugation/deconjugation cycle

2.7. Enzymes mediating the SUMO cycle

Sumoylation appears to be a highly selective process, both with respect to the choice of substrates and to the timing of their modification. How substrate specificity and timing are achieved is beginning to emerge. The enzymes required for reversible SUMO conjugation (Sumoylation) were first characterized in the yeast S. cerevisiae. Some of these enzymes, such as the SUMO-activating enzyme (E1) and the SUMO-conjugating enzyme (E2), have sequence similarities to their counterparts in the Ubiquitin system, and are conserved from yeast to humans. Despite the similarity to enzymes of the Ubiquitin system or to those involved in conjugation of other Ubl’s such as Rub1/NEDD8, the enzymes mediating the SUMO cycle appear to be specific for this modifier.

A balance between the SUMO conjugating and deconjugating activities is critical for a variety of processes. This is consistent with the general observation that in cell populations, and probably within a single cell, only a small fraction of a given substrate is detected in its sumoylated form at a given time. In other words, the Sumoylation/desumoylation cycle is highly dynamic, and is for many substrates synchronized with the cell cycle. How this synchronicity is achieved is largely unknown. Several enzymes of the SUMO system, including E1, E2, some E3s and deconjugating enzymes (Ulp2), as well as most SUMO conjugates, are found enriched within the cell nucleus, whereas the deconjugating enzymes Ulp1 in S. cerevisiae and SENP2 in mammals, as well as the SUMO ligase RanBP2, are associated with the nuclear pore. The latter distribution is consistent with functions of Sumoylation in cytosol/nucleus transit.

2.7.1 SUMO-activating enzyme (E1)

As with Ubiquitin, the C-terminus of mature SUMO, generated by the activity of a processing protease, needs to be activated for post-translational conjugation, i.e., for isopeptide bond formation with substrate lysine residues. SUMO-activating enzyme (E1) is a heterodimer of Aos1 (SAE1, Sua1) and Uba2 (SAE2), proteins with sequence similarities to the N- and C-terminal parts, respectively, of Ubiquitin-activating enzymes. ATP-dependent activation occurs via a non-covalently bound SUMO adenylate intermediate, followed by the formation of a thioester between SUMO and an active site cysteine in Uba2.

Both subunits of E1 are essential for viability, consistent with an essential function of SUMO modification. The levels of human Aos1 were found to be regulated during the cell cycle, reaching a peak in S phase, whereas Uba2 levels remained unchanged. These data suggested that regulation of SUMO-activating enzyme might be mediated via Aos1. This assumption is consistent with the presence of two genes, SAE1a and SAE1b, coding for Aos1-type subunits with ~81% sequence identity to each other in the Arabidopsis genome, with only one gene (SAE2) present coding for Uba2. SUMO-activating enzyme is found predominantly in the cell nuclei of species ranging from yeast to mammals. The Drosophila Uba2 is expressed at all times of the life cycle but is most abundant during embryogenesis, suggesting a requirement for higher sumoylation rates in proliferating cells.

2.7.2 SUMO-conjugating enzyme (E2)

In a transesterification reaction, activated SUMO is transferred from the Uba2 subunit of SUMO-activating enzyme to a single SUMO-conjugating enzyme (E2) known as Ubc9. As a result, a SUMO-Ubc9 thioester intermediate is formed.

Like the SUMO-activating enzyme, Ubc9 is a predominantly nuclear protein. Studies on Ubc9 in mammals have shown that at least a fraction of it is associated with cytoplasmic and nucleoplasmic filaments of the NPC.

Ubc9 proteins are well conserved, with ~56% identity between the mammalian and S. cerevisiae orthologues. Structural analysis of Ubc9 established that it has an overall fold that is quite similar to the core domain of Ubiquitin-conjugating enzymes. The N-terminal end of Ubc9 provides the binding site for SUMO. Increasing concentrations of SUMO were shown to result in a displacement of E1 from E1–Ubc9 complexes in vitro, with the concomitant formation of a noncovalent Ubc9–SUMO complex. The physiological relevance of this noncovalent binding site for SUMO on Ubc9 is unclear. It may increase the affinity of a SUMO thioester-bound Ubc9 for the distal end of a growing poly-SUMO chain. Ubc9 also shows binding sites for RanBP2 (a SUMO ligase). The lysine residue within the RanBP2 motif forms a shallow hydrophobic groove. Within this groove Asp127 of Ubc9 appears to engage in hydrogen bonding with the Lys residue, which may assist in the catalysis.

2.7.3 SUMO Ligases

Three different general types of SUMO E3 ligases have been described. The first E3 group comprises the PIAS family of proteins. In yeast only two E3 proteins have been identified (Siz1 and Siz2); these have sequence similarity to mammalian PIAS proteins, of which at least five members have SUMO E3 activity. These proteins share a common RING finger-like structure and bind directly to the Ubc9 E2 enzyme and to some SUMO protein targets. This RING finger motif has also been identified in some of the Ubiquitin E3 ligases. A second type of SUMO E3 protein found in mammalian systems is RanBP2, which is part of the nuclear pore complex. RanBP2 differs from the PIAS proteins in that it does not have a RING finger domain or homology to Ubiquitin E3 proteins. It interacts with Ubc9, although not with the sumoylation target protein. The final E3 protein type (Pc2) belongs to the Polycomb protein family and stimulates sumoylation of the C-terminal binding protein.

2.8. Cellular functions controlled by SUMO Proteins

Genetic studies in different model organisms underline the crucial function of the SUMO system for normal cell functions. Targeted disruption of the gene for the SUMO E2 enzyme Ubc9 in mice results in early embryonic lethality and is associated with severe defects in nuclear morphology. In the budding yeast S. cerevisiae, temperature-sensitive mutants in the genes for the SUMO system exhibit arrest at the G2/M boundary of the cell division cycle. The importance of deconjugating SUMO from target proteins is illustrated by genetic data in the yeast S. cerevisiae and in the plant Arabidopsis, showing that SUMO deconjugation is needed for viability and for the control of flowering time, respectively.

2.8.1. Nucleo-cytoplasmic Transport

The first identified target for Sumoylation, the GTPase activating protein RanGAP1, also provided the first link between nucleocytoplasmic transport and Sumoylation. RanGAP1 is a key component of the RanGTPase cycle, which serves as the driving force for directional movement through the nuclear pore. Vertebrate RanGAP1 is highly enriched at the nuclear pore, where it forms a stable complex with RanBP2. This association depends on RanGAP1 Sumoylation.

RanBP2 was shown to be the E3 ligase that mediates the Sumoylation of RanGAP1. Despite the existence of at least two cytoplasmic SUMO isopeptidases, RanGAP1 is efficiently and constitutively modified in vivo. The presence of a SUMO E3 ligase at the nuclear pore complex provides the basis for a speculative model in which modification of certain targets is coupled to their shuttling into and out of the nucleus. Yet, it is not clear if Sumoylation precedes import or if import precedes Sumoylation. Finally, Sumoylation of mammalian RanGAP1 may even affect transport of proteins that are not themselves sumoylated, but contain a SUMO binding domain.

2.8.2. Transcriptional regulation

Sumoylation of transcription factors, cofactors or proteins involved in chromatin remodelling has been shown to modulate transcriptional activity and to regulate signalling pathways such as the Wnt pathway, which is linked to various human cancers including colon cancer, hepatocellular carcinoma, leukaemia and melanoma. After stimulation with the Wnt ligand, β-catenin enters the nucleus and recruits a chromatin remodelling complex to activate transcription. Among other proteins, reptin was identified as a part of the β-catenin chromatin remodelling complex. Sumoylation of reptin causes repression of the expression of the metastasis suppressor gene KAI1 and results in promotion of tumour metastasis. Abrogation of SUMO modification of reptin promotes the expression of KAI1 and thus inhibits the invasive activity of cancer cells.

The Peroxisome Proliferator-Activated Receptor-γ (PPAR-γ) has essential roles in adipogenesis and glucose homeostasis. It has the ability to repress the transcriptional activation of inflammatory response genes in macrophages. This PPAR-γ-dependent repression is initiated by ligand-induced Sumoylation of its ligand binding domain. This modification targets PPAR-γ to the NCoR complexes associated with the promoter.

The recruitment of the Ubiquitylation/19S proteasome machinery, that normally mediates the signal-dependent removal of co-repressor complexes, is prevented. As a result, NCoR complexes are not cleared from the promoter and target genes are maintained in a repressed state.

Depending on the context, PPAR-γ can be an activator of transcription as well as a promoter-specific repressor of certain inflammatory response genes.

2.8.3. Regulation of intranuclear localization

PML nuclear bodies (NBs) are vesicle-like nuclear structures that have been implicated in processes such as transcriptional regulation, genome stability, response to viral infection, apoptosis and tumor suppression. Many proteins found in the PML NBs are sumoylated. Furthermore, components of the sumoylation machinery are also localized in PML NBs. However, it is PML (promyelocytic leukemia protein) which is the major essential component of PML NBs. It contains a SUMO interacting motif and is itself sumoylated.

Example of localization

The Bloom syndrome gene (BLM) encodes a RecQ DNA helicase that, when absent from the cell, results in genomic instability and cancer predisposition. BLM normally resides in its unmodified form in the PML nuclear bodies (PML-NBs). It leaves the PML-NBs to survey the nucleoplasm for specific preferred substrates, such as Holliday junctions, D-loops or other recombinational intermediates for which BLM has a high affinity (DNA substrate surveillance). If BLM does not encounter one of its preferred DNA substrates, it is sumoylated and thereby re-directed to the PML-NBs. When BLM arrives at the PML-NBs, it is rapidly de-sumoylated by a SUMO protease within the PML-NBs to begin another round of DNA substrate surveillance. If, on the other hand, BLM encounters and binds one of its preferred DNA substrates (DNA substrate binding), a structural change occurs in the N-terminal region that prevents sumoylation from occurring. For example, the UBC9 binding site could become inaccessible for binding.

2.8.4. Interplay between Ubiquitin and SUMO Signalling In many cases, there is an active interplay between Ubiquitin and SUMO in the regulation of individual proteins and/or cellular pathways. For example, several targets can be modified either by Ubiquitin or by SUMO. Emerging data show that Ubiquitin and SUMO cross-talk plays an important role in stress and DNA repair response both by regulating cell cycle arrest (NEMO) and by controlling translation DNA synthesis (PCNA) NF-kB signalling pathway

It has become apparent that modification systems often communicate and jointly affect the properties of common proteins, in some cases by being to the same site. For example, two key regulators in the NF- kB pathway are alternatively modified by SUMO and Ubiquitin.

Antagonistic effects of Ubiquitin and SUMO

Sumoylation protects the NF-kB inhibitor IkBa from degradation by blocking its ubiquitylation at the same lysine residue.

Regulation of IkBa stability is achieved by attachment of Ubiquitin (gray) or SUMO (pink). Phosphate moieties are represented in red, and lysine and serine residues relevant to the modifications are indicated. Signalling from cell-surface receptors (shown in yellow) leads to phosphorylation and subsequent Ubiquitylation of IkBa. Proteasome-mediated degradation then releases active NF-kB, which translocates into the nucleus and activates transcription. By contrast, sumoylation stabilizes IkBa, thereby preventing the release of NF-kB.

Sequential actions of SUMO and Ubiquitin

Sequential sumoylation/ubiquitylation triggers nucleocytoplasmic shuttling and activation of the IkB kinase regulator NEMO in response to DNA damage.

The IKK regulatory subunit NEMO associates reversibly with the catalytic subunits IKKa and IKKb. Sumoylation of NEMO causes its retention in the nucleus, where ATM-dependent signalling by DNA damage (shown in yellow) induces phosphorylation and subsequent ubiquitylation of NEMO. The ubiquitylated form can then leave the nucleus and associate with IKKa/b to yield an active kinase. Activation of NF-kB is achieved through signal-induced phosphorylation of IkB at specific N-terminal serine residues by the IkB kinase (IKK) complex. This phosphorylation triggers IkB degradation via the Ubiquitin proteasome pathway, resulting in NF-kB translocation into the nucleus.

2.8.5. Role of SUMO in mitochondrial fission

The dynamin-related protein DRP1 (the mammalian orthologue of S. cerevisiae Dnm1), which is involved in mitochondrial fission, was recently reported to bind to Ubc9 and SUMO in a two-hybrid assay, and to be sumoylated in mammalian cells. DRP1-SUMO appeared to be preferentially localized at the site of mitochondrial fission. Overexpression of SUMO-1, moreover, resulted in a stabilization of DRP1 and in an induction of mitochondrial fragmentation. Taken together, these data suggest the involvement of SUMO modification in the control of mitochondrial fission. The same study reported that a large number of SUMO conjugates are found in mitochondrial protein preparations, indicating that sumoylation is a common mitochondrial modification.

2.8.6. SUMO and cell cycle

Mutations in the SUMO conjugation system in S. cerevisiae have revealed the importance of this protein modification for normal execution of the cell cycle. Mutants deficient in SUMO conjugation accumulate at G2/M in the cell cycle with duplicated DNA content, short spindles, unseparated sister chromatids and undivided nuclei. These cells are unable to degrade Pds1 and mitotic cyclins.

Summary

Small Ubiquitin-like modifier (SUMO) proteins are vital proteins in living organisms that perform important functions such as cell-cycle regulation, mitochondrial fission, nucleo-cytoplasmic transport, signalling and transcriptional regulation. These proteins collectively function as modifier proteins in the body, much like Ubiquitin. Studying the denaturation characteristics of SUMO will therefore help in understanding its molecular dynamics and the conditions under which it can exist and function properly. Because the protein is generally considered to be dynamic, it is difficult to study by X-ray crystallography. This 88-residue protein has been studied by solution-state Nuclear Magnetic Resonance (NMR) spectroscopy and its structural propensities have been identified. NMR has thus proved to be a valuable resource for studying biomolecules, including larger proteins.

Chapter 3

Nuclear Magnetic Resonance Spectroscopy

Abstract

Over the past fifty years Nuclear Magnetic Resonance Spectroscopy, commonly referred to as NMR, has become the preeminent technique for determining the structure of organic biomolecules. Of all the spectroscopic methods, it is the only one for which a complete analysis and interpretation of the entire spectrum is normally expected. Although larger amounts of sample are needed than for mass spectroscopy, NMR is non-destructive, and with modern instruments good data may be obtained from samples weighing less than a milligram. This technique is commonly used in structural biology to elucidate molecular structures and conformations by studying 1H and 13C nuclei. NMR is sensitive to many other nuclei, however, and is not restricted to these uses. The field of NMR continues to grow at a prodigious rate and applications of NMR can be found in virtually every field of structural biology. NMR has even led to the development of Magnetic Resonance Imaging (MRI), an important medical imaging technique.

3.1 Introduction

Nuclear magnetic resonance is a phenomenon which occurs when the nuclei of certain atoms are immersed in a static magnetic field and exposed to a second oscillating magnetic field. Some nuclei experience this phenomenon, and others do not, dependent upon whether they possess a property called spin.

The proton possesses a property called spin which:

1. Can be thought of as a small magnetic field, and
2. Will cause the nucleus to produce an NMR signal.

Not all nuclei possess the property called spin.

3.1.1 Nuclei with Spin

The shell model for the nucleus tells us that nucleons, just like electrons, fill orbitals. When the number of protons or neutrons equals 2, 8, 20, 28, 50, 82, and 126, orbitals are filled. Because nucleons have spin, just like electrons do, their spin can pair up when the orbitals are being filled and cancel out. Almost every element in the periodic table has an isotope with a non zero nuclear spin. NMR can only be performed on isotopes whose natural abundance is high enough to be detected. Some of the nuclei routinely used in NMR are listed below.

Nuclei   Unpaired Protons   Unpaired Neutrons   Net Spin   γ (MHz/T)
1H       1                  0                   1/2        42.58
2H       1                  1                   1          6.54
31P      1                  0                   1/2        17.25
23Na     1                  2                   3/2        11.27
14N      1                  1                   1          3.08
13C      0                  1                   1/2        10.71
19F      1                  0                   1/2        40.08

3.1.2 Spin

Spin is a fundamental property of nature like electrical charge or mass. Spin comes in multiples of 1/2 and can be positive or negative. Protons, electrons, and neutrons possess spin. Individual unpaired electrons, protons, and neutrons each possess a spin of 1/2.

In the deuterium atom ( 2H ), with one unpaired electron, one unpaired proton, and one unpaired neutron, the total electronic spin = 1/2 and the total nuclear spin = 1. Two or more particles with spins having opposite signs can pair up to eliminate the observable manifestations of spin. An example is helium. In nuclear magnetic resonance, it is unpaired nuclear spins that are of importance.

When the proton is placed in an external magnetic field, the spin vector of the particle aligns itself with the external field, just like a magnet would. There is a low energy configuration or state where the poles are aligned N-S-N-S and a high energy state N-N-S-S.

3.1.3 Properties of Spin

When placed in a magnetic field of strength B, a particle with a net spin can absorb a photon of frequency $\nu$. The frequency $\nu$ depends on the gyromagnetic ratio, $\gamma$, of the particle.

$\nu = \gamma B$

For hydrogen, $\gamma$ = 42.58 MHz / T.
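The relation between resonance frequency, gyromagnetic ratio and field strength can be illustrated with a short calculation. The following is a minimal sketch: the gyromagnetic ratios are those listed in the table of nuclei in Section 3.1.1, while the function names and the example field of 11.74 T are assumptions introduced only for illustration.

```python
# Gyromagnetic ratios in MHz/T, copied from the table of nuclei in Section 3.1.1.
GAMMA_MHZ_PER_T = {
    "1H": 42.58,
    "2H": 6.54,
    "31P": 17.25,
    "23Na": 11.27,
    "14N": 3.08,
    "13C": 10.71,
    "19F": 40.08,
}


def larmor_frequency_mhz(nucleus: str, field_tesla: float) -> float:
    """Return the resonance frequency nu = gamma * B in MHz."""
    return GAMMA_MHZ_PER_T[nucleus] * field_tesla


def field_for_frequency_tesla(nucleus: str, frequency_mhz: float) -> float:
    """Invert nu = gamma * B to get the field needed for a given frequency."""
    return frequency_mhz / GAMMA_MHZ_PER_T[nucleus]


if __name__ == "__main__":
    b0 = 11.74  # tesla; an assumed example field, not a value from the text
    for nucleus in ("1H", "13C"):
        print(nucleus, round(larmor_frequency_mhz(nucleus, b0), 2), "MHz")
```

At the same field, 13C resonates at roughly a quarter of the 1H frequency, which follows directly from the ratio of the two gyromagnetic ratios.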

3.1.4 Transitions

This particle can undergo a transition between the two energy states by the absorption of a photon. A particle in the lower energy state absorbs a photon and ends up in the upper energy state. The energy of this photon must exactly match the energy difference between the two states. The energy, E, of a photon is related to its frequency, $\nu$, by Planck's constant ($h = 6.626 \times 10^{-34}$ J s).

$E = h \nu$

In NMR and MRI, the quantity $\nu$ is called the resonance frequency, or the Larmor frequency.

3.1.5 Energy Level Diagrams

The energy of the two spin states can be represented by an energy level diagram. Since $\nu = \gamma B$ and $E = h \nu$, the energy of the photon needed to cause a transition between the two spin states is

$E = h \gamma B$

When the energy of the photon matches the energy difference between the two spin states, absorption of energy occurs.
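To give a feel for the size of the energy gap, the expression for the photon energy (Planck's constant times the gyromagnetic ratio times the field) can be evaluated numerically. This is a minimal sketch using the value of h and the 1H gyromagnetic ratio quoted in the text; the function name is an assumption made for illustration.

```python
PLANCK_J_S = 6.626e-34        # Planck's constant h, as quoted in the text
GAMMA_1H_HZ_PER_T = 42.58e6   # gyromagnetic ratio of 1H in Hz/T (42.58 MHz/T)


def transition_energy_joule(field_tesla: float,
                            gamma_hz_per_t: float = GAMMA_1H_HZ_PER_T) -> float:
    """Energy gap between the two spin states: E = h * gamma * B."""
    return PLANCK_J_S * gamma_hz_per_t * field_tesla


if __name__ == "__main__":
    for field in (1.0, 1.5, 11.74):  # example fields in tesla (assumed values)
        print(field, "T ->", transition_energy_joule(field), "J")
```

The energy gap scales linearly with the field, so doubling the field doubles the energy of the photon required for the transition.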

In the NMR experiment, the frequency of the photon is in the radio frequency (RF) range. In NMR spectroscopy, $\nu$ is between 60 and 800 MHz for hydrogen nuclei. In clinical MRI, $\nu$ is typically between 15 and 80 MHz for hydrogen imaging.

3.1.6 Population distribution

When a group of spins is placed in a magnetic field, each spin aligns in one of the two possible orientations.

At room temperature, the number of spins in the lower energy level, $N^+$, slightly outnumbers the number in the upper level, $N^-$. Boltzmann statistics tells us that

$N^-/N^+ = e^{-E/kT}$

E is the energy difference between the spin states; k is Boltzmann's constant, $1.3805 \times 10^{-23}$ J/Kelvin; and T is the temperature in Kelvin. As the temperature decreases, so does the ratio $N^-/N^+$. As the temperature increases, the ratio approaches one.

The signal in NMR spectroscopy results from the difference between the energy absorbed by the spins which make a transition from the lower energy state to the higher energy state, and the energy emitted by the spins which simultaneously make a transition from the higher energy state to the lower energy state. The signal is thus proportional to the population difference between the states. NMR is a rather sensitive spectroscopy since it is capable of detecting these very small population differences. It is the resonance, or exchange of energy at a specific frequency between the spins and the spectrometer, which gives NMR its sensitivity.
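How small the population difference actually is can be seen by evaluating the Boltzmann ratio numerically. The sketch below uses the constants quoted in the text and takes the energy gap as Planck's constant times the resonance frequency; the 500 MHz frequency and 298 K temperature are assumed example values, and the function names are introduced here only for illustration.

```python
import math

PLANCK_J_S = 6.626e-34          # Planck's constant, as quoted in the text
BOLTZMANN_J_PER_K = 1.3805e-23  # Boltzmann's constant, as quoted in the text


def population_ratio(frequency_hz: float, temperature_k: float) -> float:
    """Return N-/N+ = exp(-E / kT), with E = h * nu."""
    energy = PLANCK_J_S * frequency_hz
    return math.exp(-energy / (BOLTZMANN_J_PER_K * temperature_k))


def excess_per_million(frequency_hz: float, temperature_k: float) -> float:
    """Excess of lower-state spins per million spins in total."""
    ratio = population_ratio(frequency_hz, temperature_k)
    return 1e6 * (1.0 - ratio) / (1.0 + ratio)


if __name__ == "__main__":
    nu = 500e6   # Hz; assumed 1H resonance frequency
    temp = 298   # K; assumed room temperature
    print("N-/N+ =", population_ratio(nu, temp))
    print("excess per million spins =", excess_per_million(nu, temp))
```

Under these assumptions the excess in the lower state is only a few tens of spins per million, which is why signal averaging and higher fields matter so much in practice.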

3.1.6 CW NMR Experiment

The simplest NMR experiment is the continuous wave (CW) experiment. There are two ways of performing this experiment. In the first, a constant frequency, which is continuously on, probes the energy levels while the magnetic field is varied. The energy of this frequency is represented by the blue line in the energy level diagram.

The CW experiment can also be performed with a constant magnetic field and a frequency which is varied. The magnitude of the constant magnetic field is represented by the position of the vertical blue line in the energy level diagram.

3.1.7 CW-spectrometer

A solution of the sample in a uniform 5 mm glass tube is oriented between the poles of a powerful magnet, and is spun to average any magnetic field variations, as well as tube imperfections. Radio frequency radiation of appropriate energy is broadcast into the sample from an antenna coil (colored red). A receiver coil surrounds the sample tube, and emission of absorbed RF energy is monitored by dedicated electronic devices and a computer. An NMR spectrum is acquired by varying or sweeping the magnetic field over a small range while observing the RF signal from the sample. An equally effective technique is to vary the frequency of the RF radiation while holding the external field constant.

3.1.8 Relaxation processes

Relaxation processes are vital because they allow nuclei in the higher energy state to return to the lower state. Emission of radiation is insignificant because the probability of re-emission of photons varies with the cube of the frequency; at radio frequencies, re-emission is negligible. Ideally, the NMR spectroscopist would like relaxation rates to be fast, but not too fast. If the relaxation rate is fast, then saturation is reduced. If the relaxation rate is too fast, line-broadening in the resultant NMR spectrum is observed.

At equilibrium, the net magnetization vector lies along the direction of the applied magnetic field Bo and is called the equilibrium magnetization Mo. In this configuration, the Z component of magnetization MZ equals Mo. MZ is referred to as the longitudinal magnetization. There is no transverse (MX or MY) magnetization here.

3.1.8.1 Precession

If the net magnetization is placed in the XY plane, it will rotate about the Z axis at a frequency equal to the frequency of the photon which would cause a transition between the two energy levels of the spin. This frequency is called the Larmor frequency.

There are two major relaxation processes:

• Spin-lattice (longitudinal) relaxation
• Spin-spin (transverse) relaxation

3.1.8.2 Spin-lattice relaxation

Nuclei in an NMR experiment are in a sample. The sample in which the nuclei are held is called the lattice. Nuclei in the lattice are in vibrational and rotational motion, which creates a complex magnetic field. The magnetic field caused by motion of nuclei within the lattice is called the lattice field. This lattice field has many components. Some of these components will be equal in frequency and phase to the Larmor frequency of the nuclei of interest. These components of the lattice field can interact with nuclei in the higher energy state, and cause them to lose energy (returning to the lower state). The energy that a nucleus loses increases the amount of vibration and rotation within the lattice (resulting in a tiny rise in the temperature of the sample).

Longitudinal relaxation is due to energy exchange between the spins and the surrounding lattice (spin-lattice relaxation), re-establishing thermal equilibrium. As spins go from a high energy state back to a low energy state, RF energy is released back into the surrounding lattice. The recovery of longitudinal magnetization follows an exponential curve. The recovery rate is characterized by the tissue-specific time constant T1. After time T1, longitudinal magnetization has returned to 63 % of its final value. With 1.5 T field strength, T1 values are about 200 to 3000 ms. T1 values are longer at higher field strengths.
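The 63 % figure quoted for spin-lattice relaxation follows from the exponential form of the recovery curve. The sketch below assumes the common case in which the longitudinal magnetization starts from zero (for example after saturation), so that the recovery is Mo multiplied by (1 - exp(-t/T1)); this starting condition and the function name are assumptions, since the text only states that the recovery is exponential.

```python
import math


def longitudinal_recovery(t: float, t1: float, m0: float = 1.0) -> float:
    """Mz(t) = M0 * (1 - exp(-t / T1)), assuming Mz = 0 at t = 0."""
    return m0 * (1.0 - math.exp(-t / t1))


if __name__ == "__main__":
    t1 = 1.0  # seconds; an assumed value within the 200 to 3000 ms range quoted
    for multiple in (1, 2, 3, 5):
        fraction = longitudinal_recovery(multiple * t1, t1)
        print(f"t = {multiple} x T1: Mz/M0 = {fraction:.3f}")
```

After one time constant the magnetization has recovered to about 63 % of Mo, and it takes roughly five time constants to recover almost completely, which is one reason why T1 influences how quickly scans can be repeated.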

3.1.8.3 Transverse Relaxation

Transverse relaxation results from spins getting out of phase. As spins move together, their magnetic fields interact (spin-spin interaction), slightly modifying their precession rate. These interactions are temporary and random. Thus, spin-spin relaxation causes a cumulative loss in phase resulting in transverse magnetization decay. Transverse magnetization decay is described by an exponential curve, characterized by the time constant T2. After time T2, transverse magnetization has lost 63 % of its original value. T2 is tissue-specific and is always shorter than T1. Transverse relaxation is faster than longitudinal relaxation. T2 values are unrelated to field strength.

3.2 Chemical Shift

When an atom is placed in a magnetic field, its electrons circulate about the direction of the applied magnetic field. This circulation causes a small magnetic field at the nucleus which opposes the externally applied field.

The magnetic field at the nucleus (the effective field) is therefore generally less than the applied field by a fraction $\sigma$.

$B = B_o (1 - \sigma)$

In some cases, such as the benzene molecule, the circulation of the electrons in the aromatic orbitals creates a magnetic field at the hydrogen nuclei which enhances the Bo field. This phenomenon is called deshielding. In the following example, the Bo field is applied perpendicular to the plane of the molecule. The ring current is travelling clockwise if looked downwards at the plane.

The electron density around each nucleus in a molecule varies according to the types of nuclei and bonds in the molecule. The opposing field and therefore the effective field at each nucleus will vary. This is called the chemical shift phenomenon.

Consider the methanol molecule. The resonance frequency of two types of nuclei in this example differs. This difference will depend on the strength of the magnetic field, Bo, used to perform the NMR spectroscopy.

The greater the value of Bo, the greater the frequency difference. This relationship could make it difficult to compare NMR spectra taken on spectrometers operating at different field strengths. The term chemical shift was developed to avoid this problem.

The chemical shift of a nucleus is the difference between the resonance frequency of the nucleus and a standard, relative to the standard. This quantity is reported in ppm and given the symbol delta, $\delta$.

$\delta = (\nu - \nu_{REF}) \times 10^6 / \nu_{REF}$
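The reason the ppm scale removes the dependence on field strength can be shown with a short calculation of the chemical shift defined above. This is a minimal sketch: the two spectrometer frequencies and the frequency offsets are assumed example values, and the function name is introduced only for illustration.

```python
def chemical_shift_ppm(nu_hz: float, nu_ref_hz: float) -> float:
    """Chemical shift: (nu - nu_ref) * 1e6 / nu_ref, in ppm."""
    return (nu_hz - nu_ref_hz) * 1e6 / nu_ref_hz


if __name__ == "__main__":
    # The same nucleus observed on two assumed spectrometers. The offset from
    # the reference in Hz grows with the field, but the shift in ppm does not.
    for nu_ref, offset_hz in ((60e6, 120.0), (500e6, 1000.0)):
        shift = chemical_shift_ppm(nu_ref + offset_hz, nu_ref)
        print(f"{nu_ref / 1e6:.0f} MHz: offset {offset_hz} Hz -> {shift} ppm")
```

Both cases give the same shift in ppm even though the offsets in Hz differ, so spectra recorded at different field strengths can be compared directly.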

In NMR spectroscopy, this standard is often tetramethylsilane, Si (CH3)4, abbreviated TMS. The chemical shift is a very precise metric of the chemical environment around a nucleus. For example, the hydrogen chemical shift of a CH2 hydrogen next to a Cl will be different than that of a CH3 next to the same Cl. It is therefore difficult to give a detailed list of chemical shifts in a limited space.

3.3 Spin-spin coupling

Nuclei experiencing the same chemical environment or chemical shift are called equivalent. Those nuclei experiencing different environments or having different chemical shifts are non-equivalent. Nuclei which are close to one another exert an influence on each other's effective magnetic field. This effect shows up in the NMR spectrum when the nuclei are non-equivalent. If the distance between non-equivalent nuclei is less than or equal to three bond lengths, this effect is observable. This effect is called spin-spin coupling or J coupling. Consider the following example. There are two nuclei, A and B, three bonds away from one another in a molecule.

The spin of each nucleus can be either aligned with the external field such that the fields are N-S-N-S, called spin up, or opposed to the external field such that the fields are N-N-S-S, called spin down. The magnetic field at nucleus A will be either greater than Bo or less than Bo by a constant amount due to the influence of nucleus B.

Here ethanol is considered as an example:

The 1H NMR spectrum of ethanol (below) shows the methyl peak has been split into three peaks (a triplet) and the methylene peak has been split into four peaks (a quartet).

This phenomenon is known as spin-spin splitting and can be explained by the n+1 rule. Each type of proton senses the number of equivalent protons (n) on the carbon atom next to the one to which it is bonded, and its resonance peak is split into (n+1) components. This occurs because there is a small interaction (coupling) between the two groups of protons. The spacings between the peaks of the methyl triplet are equal to the spacings between the peaks of the methylene quartet. This spacing is measured in Hertz and is called the coupling constant, J. Applying the n+1 rule, the methylene protons are situated next to a carbon bearing three methyl protons; they therefore have three equivalent neighbours (n=3) and are split into n+1=4 peaks (a quartet). The methyl protons are situated next to a carbon bearing only two methylene hydrogens; they therefore have two equivalent neighbours (n=2) and are split into n+1=3 peaks (a triplet).
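The n+1 rule for ethanol can be checked with a few lines of code. The sketch below counts the peaks as n+1 and, as an addition not stated in the text, also lists the relative peak intensities using binomial coefficients (Pascal's triangle), which is the standard first-order result for coupling to equivalent spin-1/2 neighbours. The function name is an assumption made for illustration.

```python
from math import comb


def multiplet(n_neighbours: int):
    """Return (number of peaks, relative intensities) for n equivalent neighbours."""
    peaks = n_neighbours + 1
    # Binomial coefficients give the first-order intensity pattern.
    intensities = [comb(n_neighbours, k) for k in range(peaks)]
    return peaks, intensities


if __name__ == "__main__":
    # Ethanol: the methyl protons see 2 methylene neighbours,
    # the methylene protons see 3 methyl neighbours.
    for label, n in (("CH3", 2), ("CH2", 3)):
        print(label, multiplet(n))
```

For ethanol this gives a triplet for the methyl group and a quartet for the methylene group, in agreement with the spectrum described above.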

3.4 Protein Nuclear Magnetic Resonance Spectroscopy

Protein nuclear magnetic resonance spectroscopy (usually abbreviated protein NMR) is a field of structural biology in which NMR spectroscopy is used to obtain information about the structure and dynamics of proteins. The field was pioneered by, among others, Kurt Wüthrich, who shared the Nobel Prize in Chemistry in 2002. Protein NMR techniques are continually being used and improved in both academia and the biotech industry. Structure determination by NMR spectroscopy usually consists of several phases, each using a separate set of highly specialized techniques:

I. Sample preparation.
II. Resonance assignments.
III. Generation of restraints.
IV. Calculation of structures.
V. Validation of structures.

3.4.1 Sample preparation

Protein nuclear magnetic resonance is performed on aqueous samples of highly purified protein. Usually the sample volume is between 300 and 600 microlitres, with a protein concentration in the range 0.1 – 3 millimolar. The protein can either be obtained from a natural source or be produced in an expression system using recombinant DNA techniques. Recombinantly expressed proteins are usually easier to produce in sufficient quantity, and recombinant expression also makes isotopic labelling possible.
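The sample volume and concentration quoted above translate directly into the amount of protein that must be purified. The following is a rough sketch of that arithmetic; the molar mass of 10,000 Da is a hypothetical round number chosen for illustration and is not taken from the text, as are the function name and the example volume and concentration.

```python
def protein_mass_mg(volume_ul: float, concentration_mm: float,
                    molar_mass_da: float) -> float:
    """Mass of protein (mg) needed for a sample of given volume and concentration."""
    volume_l = volume_ul * 1e-6          # microlitres -> litres
    conc_mol_per_l = concentration_mm * 1e-3  # millimolar -> mol/L
    moles = volume_l * conc_mol_per_l
    return moles * molar_mass_da * 1e3   # grams -> milligrams


if __name__ == "__main__":
    # Assumed example: 500 microlitres at 1 mM of a hypothetical 10 kDa protein.
    print(protein_mass_mg(500, 1.0, 10_000), "mg")
```

Under these assumptions a single sample requires a few milligrams of pure protein, which illustrates why recombinant expression is usually preferred.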

The most abundant isotopes of carbon and oxygen, carbon-12 and oxygen-16, have no net nuclear spin, which is the physical property that nuclear magnetic resonance spectroscopy exploits. The most abundant isotope of nitrogen, nitrogen-14, has a net nuclear spin of 1; however, it also has a large quadrupolar moment, a property of the atomic nucleus which prevents high-resolution information from being obtained from this isotope. Thus nuclear magnetic resonance of proteins from natural sources is restricted to experiments based solely on protons.

However, the less common isotopes carbon-13 and nitrogen-15 have a net nuclear spin of 1/2, a simpler case that makes them suitable for nuclear magnetic resonance. Labelling the proteins with these isotopes therefore opens up possibilities for more advanced experiments which also detect or use these nuclei.

3.4.2 Isotopic labelling

Isotopic labelling is done by growing the expression host in a growth medium enriched with the desired isotopes. Since isotopically-enriched compounds remain expensive, organisms are used which are capable of growing on a defined minimal medium containing only one carbon-13 source, usually glucose but occasionally glycerol or methanol, and one nitrogen-15 source such as ammonium chloride or ammonium sulphate. These organisms include Escherichia coli, which is the most frequently used bacterium, and Pichia pastoris, which is the most frequently used yeast. The purified protein is usually dissolved in a buffer solution and adjusted to the desired solvent conditions. The NMR sample is prepared in a thin-walled glass tube.

3.4.3 Data collection

Protein NMR utilizes multidimensional nuclear magnetic resonance experiments to obtain information about the protein. Ideally, each distinct nucleus in the molecule experiences a distinct chemical environment and thus has a distinct chemical shift by which it can be recognized. However, in large molecules such as proteins the number of resonances can typically be several thousand and a one-dimensional spectrum inevitably has incidental overlaps. Therefore multidimensional experiments are performed which correlate the frequencies of distinct nuclei. The additional dimensions decrease the chance of overlap and have larger information content since they correlate signals from nuclei within a specific part of the molecule. Magnetization is transferred into the sample using pulses of electromagnetic (radiofrequency) energy and between nuclei using delays; the process is described with so-called pulse sequences. Pulse sequences allow the experimenter to investigate and select specific types of connections between nuclei.

The array of nuclear magnetic resonance experiments used on proteins falls into two main categories:

i. One where magnetization is transferred through the chemical bonds, and
ii. One where the transfer is through space, irrespective of the bonding structure.

The first category is used to assign the different chemical shifts to a specific nucleus, and the second is primarily used to generate the distance restraints used in the structure calculation, and in the assignment with unlabelled protein.

Depending on the concentration of the sample, on the magnetic field of the spectrometer, and on the type of experiment, a single multidimensional nuclear magnetic resonance experiment on a protein sample may take hours or even several days to obtain a suitable signal-to-noise ratio through signal averaging, and to allow for sufficient evolution of magnetization transfer through the various dimensions of the experiment. Other things being equal, higher-dimensional experiments will take longer than lower-dimensional experiments.

3.5 One dimensional NMR of proteins

Proteins contain many different kinds of protons, so we expect their one-dimensional spectra to be quite complicated. This expectation is borne out in the figure below, which shows the spectrum of lysozyme, a relatively small protein of 129 amino acids. Even with the high field magnet of a 750 MHz instrument, the resonances obtained are quite complicated and many of them overlap, making the assignment of most resonances from the one-dimensional (1D) spectrum impossible.

However, many of the resonances fall in characteristic regions of the spectrum. Side chain protons on aliphatic carbons usually occur at less than 4 ppm. Protons on the α carbon are between 4 and 5 ppm. Aromatic protons have resonances between 6.8 and 7.8 ppm. The resonances for amide protons occur between 8 and 9 ppm. Amino and imino protons are found between 6.6 and 8.2 ppm and at high ppm.
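The general regions listed above can be turned into a simple lookup that suggests which kinds of proton could give rise to a resonance at a given chemical shift. This is only an illustrative sketch: the ranges are the approximate ones quoted in the text, the bounds are treated as inclusive for simplicity, and the labels and function name are assumptions. Because the regions overlap, more than one candidate may be returned.

```python
# (label, lower bound in ppm, upper bound in ppm), from the ranges quoted in the text.
SHIFT_RANGES = [
    ("aliphatic side chain", float("-inf"), 4.0),
    ("alpha proton", 4.0, 5.0),
    ("amino/imino", 6.6, 8.2),
    ("aromatic", 6.8, 7.8),
    ("amide", 8.0, 9.0),
]


def candidate_proton_types(shift_ppm: float) -> list:
    """Return every proton type whose typical range contains the given shift."""
    return [label for label, low, high in SHIFT_RANGES if low <= shift_ppm <= high]


if __name__ == "__main__":
    for shift in (1.2, 4.5, 7.0, 8.5):
        print(shift, "ppm ->", candidate_proton_types(shift))
```

The overlap between the aromatic, amide and amino/imino regions is one reason why a one-dimensional spectrum alone is rarely enough to assign the resonances of a protein.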

Amino acids have characteristic resonances resulting from the proton on α carbon and the protons on the side chains. The figure below shows the CH and CH3 resonances for alanine, which occur at low ppm.

One-dimensional 1H NMR can still answer many questions about proteins and their substrates. One example looks at the peptidyl prolyl cis-trans isomerase activity of cyclophilin. This 163 amino acid protein binds the important immunosuppressive drug cyclosporin A. In addition it catalyzes the isomerisation of the peptidyl prolyl cis configuration to the trans, which can be a rate-determining step in protein folding. One-dimensional NMR can be used to demonstrate that cyclosporin A blocks the cis-trans activity of cyclophilin. This is demonstrated in the following figure:

The CH3 resonances for the alanine in Succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (AAPF), a model substrate for cyclophilin, are shown in the above figure. The major resonances at 1.48 ppm are due to AAPF molecules with P in the trans configuration, and the minor resonances at about 1.47 ppm are due to AAPF molecules with P in the less stable cis configuration. There is an isomerisation between the cis and trans configurations for each proline, but these two species are in slow exchange compared with the NMR time scale of about two seconds needed to run the experiment. Thus distinct resonances are seen for each of the species. If the isomerisation were in fast exchange, we would see only one weighted-average peak. Broadening of the resonances occurs when two species are in intermediate exchange, at about the time scale of the NMR experiment. When cyclophilin is added, only the minor resonances are broadened. Since AAPF is present in a 300-fold excess, the resonances due to the cyclophilin itself are unimportant. The effect is most pronounced for the minor resonance, which is seen as a shoulder. This means that cyclophilin speeds up the rate of isomerisation between cis and trans proline, although the equilibrium is not changed. The effect of blocking the cyclophilin with cyclosporin A is shown in figure 44.c, where the minor resonance has returned.

3.6 Two-Dimensional Fourier Transform NMR

With its short pulses of energy that contain all the frequencies of interest, FT NMR is rapid, and therefore excellent for acquiring repeated scans that can be averaged to reduce noise. The use of pulses also introduces the possibility of simplifying the spectrum by looking only at the interactions between the nuclei and spreading these interactions over two or more dimensions.

3.6.1 Fourier Transforms

The Fourier transform (FT) is a mathematical technique for converting time domain data to frequency domain data, and vice versa.
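The conversion from time domain to frequency domain can be illustrated with a synthetic signal. The sketch below builds a decaying complex oscillation of the kind recorded after a pulse and applies a naive discrete Fourier transform written with the standard library only. The signal parameters (20 Hz offset, 0.2 s decay constant, 128 points) are assumed example values, and the function names are introduced for illustration; real spectrometer software uses fast Fourier transform routines instead.

```python
import cmath
import math


def synthetic_fid(frequency_hz, t2_s, n_points, dwell_s):
    """Decaying complex oscillation: exp(2*pi*i*f*t) * exp(-t / T2)."""
    return [
        cmath.exp(2j * math.pi * frequency_hz * k * dwell_s)
        * math.exp(-k * dwell_s / t2_s)
        for k in range(n_points)
    ]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform (time domain -> frequency domain)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * m * k / n) for k, x in enumerate(signal))
        for m in range(n)
    ]


def peak_frequency_hz(signal, dwell_s):
    """Frequency of the most intense point in the magnitude spectrum."""
    spectrum = dft(signal)
    n = len(signal)
    peak_bin = max(range(n), key=lambda m: abs(spectrum[m]))
    return peak_bin / (n * dwell_s)


if __name__ == "__main__":
    dwell = 1 / 128  # seconds between sampled points (assumed)
    fid = synthetic_fid(20.0, 0.2, 128, dwell)
    print("peak at", peak_frequency_hz(fid, dwell), "Hz")
```

The oscillation frequency of the time domain signal reappears as the position of the peak in the frequency domain, while the decay constant determines the width of that peak.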

Two important interactions are used to assign resonances and investigate the conformation of a protein.

a) Spin-spin interactions through the bonds between the nearby nuclei – this tells us about the bonding structure in a protein. They can be used to assign resonances.

b) The Nuclear Overhauser Effect (NOE), which changes the signal intensity of one nucleus when another nearby is irradiated – this can be used to assign resonances to their order in the sequence of a protein, and to measure the distance between nuclei.

In two-dimensional (2D) FT NMR, data are plotted using two frequency axes, which show a crosspeak at the frequencies corresponding to each pair of nuclei that interact. Usually a 2D spectrum is recorded either for the spin-spin interaction or for the nuclear Overhauser effect. The magnitude of the effect is represented by the intensity of the crosspeak, usually shown by the number of contours at the point of interaction. The contours are like those on a contour map of a mountain: the more contours, the higher the peak corresponding to the interaction. The interaction may be between any two types of NMR-active nuclei: 1H-1H, 1H-13C, 13C-13C, and so on. Since a nucleus will interact with itself, the 1D spectrum appears on the diagonal, as contour lines of peaks, when the same 1D spectrum is plotted along both axes.

3.6.2. Heteronuclear single quantum correlation (HSQC)

Typically the first experiment to be measured with an isotope-labeled protein is a 2D heteronuclear single quantum correlation (HSQC) spectrum, where "heteronuclear" refers to nuclei other than 1H. In theory the heteronuclear single quantum correlation has one peak for each H bound to a heteronucleus. Thus in the 15N-HSQC one signal is expected for each amino acid residue, with the exception of proline, which has no amide hydrogen due to the cyclic nature of its backbone. Tryptophan and certain other residues with N-containing sidechains also give rise to additional signals.

3.6.2.1. 1H-15N-HSQC (2D)

Magnetization is transferred from hydrogen to attached 15N nuclei via the J-coupling. The chemical shift is evolved on the nitrogen and the magnetization is then transferred back to the hydrogen for detection.

This is the most standard experiment and shows all H-N correlations. Mainly these are the backbone amide groups, but Trp side-chain Nε-Hε groups and Asn/Gln side-chain Nδ-Hδ2/Nε-Hε2 groups are also visible. The Arg Nε-Hε peaks are in principle also visible, but because the Nε chemical shift is outside the region usually recorded, the peaks are folded/aliased (this essentially means that they appear as negative peaks and the Nε chemical shift has to be specially calculated). If working at low pH the Arg Nη-Hη and Lys Nζ-Hζ groups can also be visible, but are also folded/aliased.

The spectrum is rather like a fingerprint and is usually the first heteronuclear experiment performed on proteins. Analysis of the 15N-HSQC allows researchers to evaluate whether the expected number of peaks is present and thus to identify possible problems due to multiple conformations or sample heterogeneity.
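The check on the expected number of peaks can be sketched as a simple count over the amino acid sequence. The example below follows only the rule stated in the text (one backbone signal per residue except proline) and separately counts the residues whose side chains are described as giving additional signals (Trp, Asn and Gln). The sequence shown is hypothetical, the function names are assumptions, and effects such as unobserved terminal residues or peak overlap are deliberately ignored.

```python
def expected_backbone_peaks(sequence: str) -> int:
    """Residues expected to give a backbone amide peak: all except proline (P)."""
    return sum(1 for residue in sequence.upper() if residue != "P")


def residues_with_sidechain_nh(sequence: str) -> int:
    """Count Trp (W), Asn (N) and Gln (Q), whose side-chain N-H groups are visible."""
    return sum(1 for residue in sequence.upper() if residue in "WNQ")


if __name__ == "__main__":
    hypothetical_sequence = "MKPWNQAP"  # illustrative only, not a real protein
    print("backbone peaks expected:", expected_backbone_peaks(hypothetical_sequence))
    print("residues with side-chain N-H:", residues_with_sidechain_nh(hypothetical_sequence))
```

Comparing such a count with the number of peaks actually observed gives a quick indication of whether the sample is behaving as a single, well-defined species.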

The relatively quick heteronuclear single quantum correlation experiment helps determine the feasibility of doing subsequent longer, more expensive, and more elaborate experiments. (From it you can assess whether other experiments are likely to work and for instance, whether it is worth carbon labelling the protein before spending the time and money on it). It is not possible to assign peaks to specific atoms from the heteronuclear single quantum correlation alone.

3.5 Homonuclear nuclear magnetic resonance

With unlabelled protein the usual procedure is to record a set of two-dimensional homonuclear nuclear magnetic resonance experiments. These include several types of correlation spectroscopy, namely conventional correlation spectroscopy (COSY) and total correlation spectroscopy (TOCSY), together with nuclear Overhauser effect spectroscopy (NOESY). A two-dimensional nuclear magnetic resonance experiment produces a two-dimensional spectrum. The units of both axes are chemical shifts.

3.5.1 COSY

The conventional correlation spectroscopy experiment is only able to transfer magnetization between protons on adjacent atoms. Thus in conventional correlation spectroscopy, an alpha proton transfers magnetization to the beta protons, the beta protons transfer to the alpha and gamma protons, if any are present, the gamma protons then transfer to the beta and delta protons, and the process continues.

3.5.2 TOCSY

In the total correlation spectroscopy experiment the protons are able to relay the magnetization, so it is transferred among all the protons that are connected by adjacent atoms. Thus in total correlation spectroscopy, the alpha proton and all the other protons are able to transfer magnetization to the beta, gamma, delta and epsilon protons if they are connected by a continuous chain of protons. These continuous chains of protons are the side chains of the individual amino acids.


Thus these two experiments are used to build so-called spin systems, that is, to build a list of the chemical shifts of the peptide proton, the alpha proton and all the side-chain protons of each residue. Which chemical shift corresponds to which nucleus in the spin system is determined by the conventional correlation spectroscopy connectivities and the fact that different types of protons have characteristic chemical shifts.

3.5.2.1 Examples of COSY

COSY of Isoleucine

The above figure is an example of a 2D 1H NMR spectrum. Only the 3.6 ppm resonance has a single crosspeak, so it must be due to the CαH. This crosspeak must arise from the through-bond spin-spin interaction with the CβH, which is thereby identified as the resonance at 1.9 ppm. The CβH also interacts with the CγH3 and the CγH2 protons. The two CγH2 protons are not equivalent, and they interact with the CδH3. This identifies the 1.4 and 1.2 ppm resonances as the CγH2 protons and the 0.8 ppm resonance as the CδH3. The two CγH2 protons also have a crosspeak due to their mutual interaction. The remaining 0.9 ppm resonance is the CγH3, which of course interacts with the CβH. Thus a 2D COSY spectrum is an excellent way to identify NMR resonances.


3.5.3 NOESY

To connect the different spin systems in a sequential order, the nuclear Overhauser effect spectroscopy experiment has to be used. Because this experiment transfers magnetization through space, it will show crosspeaks for all protons that are close in space regardless of whether they are in the same spin system or not. The neighboring residues are inherently close in space, so the assignments can be made by the peaks in the NOESY with other spin systems.

One important problem in using homonuclear nuclear magnetic resonance is overlap between peaks. This occurs when different protons have the same or very similar chemical shifts. The problem becomes greater as the protein becomes larger, so homonuclear nuclear magnetic resonance is usually restricted to small proteins or peptides.


3.6 Spectrum Descriptions

This section lists the solution NMR experiments most commonly used in protein NMR assignment and structure calculation. For each experiment there is an illustration showing which atoms are observed (pink) and through which atoms magnetisation flows (light blue).

3.6.1 HNCO (3D)

Magnetisation is passed from 1H to 15N and then selectively to the carbonyl 13C via the 15NH-13CO J-coupling. Magnetisation is then passed back via 15N to 1H for detection. The chemical shift is evolved on all three nuclei, resulting in a three-dimensional spectrum.

This is the most sensitive triple-resonance experiment. In addition to the backbone CO-N-HN correlations, Asn and Gln side-chain correlations are also visible. It is mainly used to obtain CO chemical shifts, which can be used in a program like TALOS to help predict secondary structure. The HNCO can also be useful for backbone assignment in conjunction with the HN(CA)CO if the CBCANH and CBCA(CO)NH spectra are of poor quality.


3.6.2 HN(CA)CO (3D)

Magnetisation is transferred from 1H to 15N and then via the N-Cα J-coupling to the 13Cα. From there it is transferred to the 13CO via the 13Cα-13CO J-coupling. For detection the magnetisation is transferred back the same way: from 13CO to 13Cα, 15N and finally 1H. The chemical shift is only evolved on 1H, 15N and 13CO and not on the 13Cα. The result is a three-dimensional spectrum. Because the amide nitrogen is coupled both to the Cα of its own residue and to that of the preceding residue, both these transfers occur and magnetisation reaches both 13CO nuclei. Thus for each NH group, two carbonyl groups are observed in the spectrum. Because the coupling between Ni and Cαi is stronger than that between Ni and Cαi-1, the Hi-Ni-COi peak generally ends up being more intense than the Hi-Ni-COi-1 peak.

This experiment can be useful for backbone assignment when used in conjunction with the HNCA, HN(CO)CA and HNCO if the CBCANH and CBCA(CO)NH spectra are of poor quality.


An overlay of the HNCO and HN(CA)CO spectra makes it very easy to distinguish between COi and COi-1 for each NH group.

3.6.3 HN(CO)CA (3D)

The magnetisation is passed from 1H to 15N and then to 13CO. From here it is transferred to 13Cα and the chemical shift is evolved. The magnetisation is then transferred back via 13CO to 15N and 1H for detection. The chemical shift is only evolved for the 1HN, the 15NH and the 13Cα, but not for the 13CO. This results in a spectrum which is like the HNCA, but which is selective for the Cα of the preceding residue.

This experiment can be useful for backbone assignment when used in conjunction with the HNCA, HNCO and HN(CA)CO if the CBCANH and CBCA(CO)NH spectra are of poor quality.

3.6.4 HNCA (3D)

Here the magnetisation is passed from 1H to 15N and then via the N-Cα J-coupling to the 13Cα and then back again to 15N and 1H for detection. The chemical shift is evolved for 1HN as well as the 15NH and 13Cα, resulting in a three-dimensional spectrum. Since the amide nitrogen is coupled both to the Cα of its own residue and to that of the preceding residue, both these transfers occur and peaks for both Cαs are visible in the spectrum. However, the coupling to the directly bonded Cα is stronger and thus these peaks appear with greater intensity in the spectra.

This experiment can be useful for backbone assignment when used in conjunction with the HN(CO)CA, HNCO and HN(CA)CO if the CBCANH and CBCA(CO)NH spectra are of poor quality.

By overlaying the HN(CO)CA spectrum with the HNCA, it becomes even easier to identify and distinguish between all Cαi and Cαi-1 peaks.
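The comparison of the two spectra can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name, the 0.2 ppm matching tolerance and the example shifts are hypothetical and are not taken from the experiments described here.

```python
def split_ca_peaks(hnca_ca_ppm, hncoca_ca_ppm, tol_ppm=0.2):
    """Label the HNCA C-alpha peaks of one NH group as own (i) or preceding (i-1).

    hnca_ca_ppm   : C-alpha shifts seen for this NH group in the HNCA (usually two)
    hncoca_ca_ppm : the single C-alpha shift seen for this NH group in the HN(CO)CA
    """
    # The HNCA peak closest to the HN(CO)CA peak belongs to the preceding residue.
    idx = min(range(len(hnca_ca_ppm)),
              key=lambda k: abs(hnca_ca_ppm[k] - hncoca_ca_ppm))
    if abs(hnca_ca_ppm[idx] - hncoca_ca_ppm) > tol_ppm:
        raise ValueError("no HNCA peak matches the HN(CO)CA peak within tolerance")
    own = [s for k, s in enumerate(hnca_ca_ppm) if k != idx]
    return {"ca_i": own, "ca_i_minus_1": hnca_ca_ppm[idx]}


# Assumed example shifts (ppm) for a single NH group.
result = split_ca_peaks([58.3, 55.1], 55.0)
print(result)
```

Repeating this for every NH group gives pairs of Cαi and Cαi-1 shifts that can then be matched between residues to walk along the backbone.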


3.6.5 HN(CO)CA (3D)

The magnetisation is passed from 1H to 15N and then to 13CO. From here it is transferred to 13Cα and the chemical shift is evolved. The magnetisation is then transferred back via 13CO to 15N and 1H for detection. The chemical shift is only evolved for the 1HN, the 15NH and the 13Cα, but not for the 13CO. This results in a spectrum which is like the HNCA, but which is selective for the Cα of the preceding residue.

This experiment can be useful for backbone assignment when used in conjunction with the HNCA, HNCO and HN(CA)CO if the CBCANH and CBCA(CO)NH spectra are of poor quality.


3.6.6 CBCA(CO)NH (3D)

Magnetisation is transferred from 1Hα and 1Hβ to 13Cα and 13Cβ, respectively, and then from 13Cβ to 13Cα. From here it is transferred first to 13CO, then to 15NH and then to 1HN for detection. The chemical shift is evolved simultaneously on 13Cα and 13Cβ, so these appear in one dimension. The chemical shifts evolved in the other two dimensions are 15NH and 1HN. The chemical shift is not evolved on 13CO.

Along with the CBCANH and HSQC this forms the standard set of experiments needed for backbone assignment. For large proteins the signal-to-noise may not be great and assignment using the HNCA, HN(CO)CA, HNCO and HN(CA)CO may form a better strategy. When using deuterated protein, the spectrum has to be recorded as an 'out-and-back' experiment and the signal-to-noise suffers even further.


3.6.7 CBCANH (3D)

Magnetisation is transferred from 1Hα and 1Hβ to 13Cα and 13Cβ, respectively, and then from 13Cβ to 13Cα. From here it is transferred first to 15NH and then to 1HN for detection. Transfer from Cαi-1 can occur both to 15Ni-1 and 15Ni, or viewed the other way, magnetisation is transferred to 15Ni from both 13Cαi and 13Cαi-1. Thus for each NH group there are two Cα and two Cβ peaks visible. The chemical shift is evolved simultaneously on 13Cα and 13Cβ, so these appear in one dimension. The chemical shifts evolved in the other two dimensions are 15NH and 1HN.

Along with the CBCA(CO)NH and HSQC this forms the standard set of experiments needed for backbone assignment. For large proteins the signal-to-noise may not be great and assignment using the HNCA, HN(CO)CA, HNCO and HN(CA)CO may form a better strategy. When using deuterated protein, the spectrum has to be recorded as an 'out-and-back' experiment and the signal-to-noise suffers even further.


3.6.8 CC(CO)NH (3D)

Magnetisation is transferred from the side-chain hydrogen nuclei to their attached 13C nuclei. Then isotropic 13C mixing is used to transfer magnetisation between the carbon nuclei. From here, magnetisation is transferred to the carbonyl carbon, on to the amide nitrogen and finally the amide hydrogen for detection. The chemical shift is evolved simultaneously on all side-chain carbon nuclei, as well as on the amide nitrogen and hydrogen nuclei, resulting in a three-dimensional spectrum.

This is a useful spectrum for obtaining carbon side-chain assignments, but isn't necessarily a must.


3.6.9 H(CCO)NH (3D)

Minimum labelling: 15N, 13C

Magnetisation is transferred from the side-chain hydrogen nuclei to their attached 13C nuclei. Then isotropic 13C mixing is used to transfer magnetisation between the carbon nuclei. From here, magnetisation is transferred to the carbonyl carbon, on to the amide nitrogen and finally the amide hydrogen for detection. The chemical shift is evolved simultaneously on all side-chain hydrogen nuclei, as well as on the amide nitrogen and hydrogen nuclei, resulting in a three-dimensional spectrum with one nitrogen and two hydrogen dimensions.

This is a useful spectrum for obtaining hydrogen side-chain assignments, but isn't necessarily a must.


3.6.10 HBHA(CO)NH (3D)

Minimum labelling: 15N, 13C

This experiment is similar to the CBCA(CO)NH: magnetisation is transferred from 1Hα and 1Hβ to 13Cα and 13Cβ, respectively, and then from 13Cβ to 13Cα. From here it is transferred first to 13CO, then to 15NH and then to 1HN for detection. The chemical shift is not evolved on any of the carbon atoms. Instead, it is evolved on the 1Hα and 1Hβ, the 15NH and the 1HN. This results in a three-dimensional spectrum with one nitrogen and two hydrogen dimensions.

This is a useful spectrum for obtaining Hα and Hβ assignments, but isn't necessarily a must.


3.7 Two-dimensional FT NMR in determining the structure of a protein

Biologically active Arg-Gly-Asp (RGD) oligopeptides provide a simple example of how 2D 1H NMR can be used to investigate structure. The heptapeptide Tyr-Gly-Arg-Gly-Asp-Ser-Pro (YGRGDSP) binds tightly to membrane-spanning receptors in cells. In contrast, the conservative replacement of aspartic acid (D) by glutamic acid (E) gives a heptapeptide that does not bind to these receptors. The first task is to assign the resonances in the 1D 1H NMR spectrum of each heptapeptide. Since only seven amino acids are involved, the 1D spectrum is relatively simple, with well-separated resonances. The spectrum is generally measured in H2O at acidic pH (pH 4), so that labile protons have a low exchange rate with the solvent. Thus there are resonances due to the exchangeable protons as well as the nonexchangeable protons of the heptapeptide. Resonances for the labile amide NH protons are essential for NMR analysis of peptides and proteins. The approximate positions of the resonances for the protons in each amino acid are well known.

Many of these patterns of chemical shifts are unique to one of the 20 amino acids. For example, valine has a β-proton at 2.1 ppm and two diastereotopic methyl resonances at around 0.8 ppm. This pattern is clearly visible in a TOCSY spectrum. However, a protein will typically have a number of valines in its sequence, so at this point it can only be said that the pattern represents one of these residues. Other spin systems are shared by a group of amino acids.

3.7.1 AMX / three-spin systems

A number of amino acids have the spin system CHα-CH2-R, where R is a “dead end” for J coupling: a quaternary carbon or a heteroatom (such as oxygen or sulphur). These are called AMX or three-spin systems, and include Asp, Asn, Cys, Phe, Tyr, Trp, and His. All have, in addition to the backbone HN and Hα, two Hβ resonances in the vicinity of 3 ppm (2.6-3.4 ppm). If this pattern is observed in a TOCSY spectrum along an HN line, it can be concluded that it belongs to this “AMX” group of amino acids. Serine is technically an AMX spin system (CHα-CH2OH), but the β protons are shifted downfield by the oxygen to nearly 4 ppm, just upfield of the Hα resonance. This makes Ser a recognizable “unique” spin system rather than part of the AMX group.

3.7.2 Five-spin systems

These have the spin system CHα-CH2-CH2-R, where again R is a “dead end”: Glu, Gln, and Met. The two β-proton resonances appear around 2 ppm and the γ-protons (which may or may not be degenerate) are farther downfield (2.3-2.6 ppm). This pattern can usually be distinguished from the AMX pattern.

3.7.3 Structure of YGRGDSP peptide


The COSY spectrum of YGRGDSP in the figure above shows crosspeaks between protons on adjacent atoms: amide NH/CαH, CαH/CβH, CβH/CγH, CγH/CδH, and CδH with one of the side-chain NH resonances. Patterns such as these can be used to assign the resonances to amino acid spin systems. While the COSY spectrum can be used to assign resonances to many kinds of amino acids, it cannot be used to distinguish among different amino acids with the same spin system. However, the NOESY spectrum has additional crosspeaks between the CαH of amino acid i and the amide NH of amino acid i+1, which can be used for sequential assignment and thus to distinguish between different amino acids with the same pattern. A portion of the NOESY spectrum of YGRGDSP is shown in the figure, which shows these additional crosspeaks for the RGD peptide.

The NOESY peak for the amide NH of G interacting with the CαH of Y1 identifies the G as the first Glycine, G2. The G2 resonance for CαH is identified by its NOESY interaction with the amide NH of R3. Similarly the amide NH of G4 interacts with the CαH of R3, and the CαH of G4 interacts with the amide NH of D5. For this simple peptide, the NOESY peak between the CαH for D5 and the amide NH for S6 provides no additional information. The P7 has no amide NH, and thus no interaction with the CαH of S6. The CδH resonances of P are often used in sequential assignment. Sequential assignment with NOESY crosspeaks is essential for the identification of each amino acid in the NMR studies of more complex proteins.


A heptapeptide is so short that it is largely random in structure in aqueous solution, and the crosspeaks observed in the COSY and NOESY spectra are the time average over all of the conformations assumed. With the resonances assigned, one looks for unexpected NOESY crosspeaks between protons that are normally far apart but show up because a conformational preference of this sequence brings them closer together. Such a crosspeak is visible between the amide protons of aspartic acid and the glycine in the fourth position. This crosspeak is characteristic of a type 2 β turn.


However, the mutant form YGRGESP, in which D of the wild-type peptide is replaced by E, has three amide NH/amide NH crosspeaks: between R and G4, E and G4, and E and S.

The R-G4 and E-G4 crosspeaks are indicative of a type 1 or type 3 β turn. A type 1 β turn would have the amide linkage between G4 and R rotated 180 degrees from the corresponding amide in the type 2 turn. A type 3 β turn is part of a 3₁₀ helix. The E-S crosspeak also suggests a 3₁₀ helical structure. Thus the two heptapeptides (wild type and mutant) have quite different conformational preferences in solution.

The basic approach of 2D FT NMR spectroscopy used here to determine the structure of the RGD oligopeptides (spin-system assignment from a COSY, sequential assignment from a NOESY, and structural information from a NOESY) can be applied to longer peptides and proteins. Like protein crystallography, this method can give a complete description of the three-dimensional structure to atomic resolution. As a practical matter, it becomes very tedious to assign the spin systems as the number of amino acids increases. However, the construction of a NOESY-COSY connectivity diagram aids considerably in the sequential assignment. NOESY crosspeaks can be used to connect the CαH of residue i with the NH of residue i+1. By alternating between these COSY and NOESY connectivities, it is possible to step through the sequential assignment along the backbone.


Once the spin system and sequential assignments have been made, the three-dimensional structure of the protein can be investigated by using the NOESY crosspeaks between protons that are far apart in the sequence but close enough in space for the through-space interaction (up to 0.5 nm). The distance information from a 1H-1H NOESY is a key to solving the structure. Spin-spin coupling constants are sensitive to torsion angles and provide additional constraints. Secondary structures are easily identified early in the structure determination by the backbone crosspeaks. For instance,

1. Type 2 β turn: NH/NH crosspeaks between residues 3 and 4.

2. Type 1 β turn: NH/NH crosspeaks between residues 2 and 3, and 3 and 4.

3. α helix and 3₁₀ helix: strong sequential NH/NH crosspeaks between residues i and i+1 (corresponding to a distance of about 0.28 nm), and CαH/NH crosspeaks between residues i and i+3 (corresponding to a distance of about 0.32 nm).

4. β strands: strong sequential CαH/NH crosspeaks between residues i and i+1 (corresponding to a distance of about 0.22 nm).

Furthermore, secondary structures define specific torsion angles that result in specific spin-spin coupling constants. For instance, the α helix is expected to have sequential J values between NH and CαH of about 4 Hz due to the repeating ϕ angle of about −60 degrees. The β strand is expected to have sequential J values between NH and CαH of about 9 Hz due to the repeating ϕ angle of about −120 degrees. Secondary structure also affects the chemical shift of CαH. The proton resonance is shifted to lower ppm for the α helix and to higher ppm for the β strand.
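The dependence of the NH-CαH coupling constant on the torsion angle ϕ is commonly described by a Karplus relation, J = A cos²(ϕ − 60°) + B cos(ϕ − 60°) + C. The short sketch below evaluates it for the two secondary structures mentioned above. The coefficients are those of one published parameterisation (Vuister and Bax, 1993) and the signed angles of −60 and −120 degrees are idealised values; both are assumptions of this illustration and were not used elsewhere in this work.

```python
import math

# Karplus coefficients (Hz) for 3J(HN,Halpha); an assumed parameterisation
# (Vuister and Bax, 1993), not values taken from this dissertation.
A, B, C = 6.51, -1.76, 1.60


def j_hn_ha(phi_deg):
    """3J(HN,Halpha) in Hz for a backbone torsion angle phi given in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


for label, phi in (("alpha helix", -60.0), ("beta strand", -120.0)):
    print(f"{label}: phi = {phi:.0f} deg, J = {j_hn_ha(phi):.1f} Hz")
```

With these coefficients the helix value comes out near 4 Hz and the strand value slightly below 10 Hz, in line with the approximate values quoted above.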


Summary

Nuclear magnetic resonance spectroscopy is the use of the NMR phenomenon to study physical, chemical, and biological properties of matter. As a consequence, NMR spectroscopy finds applications in several areas of science. NMR spectroscopy is routinely used by chemists to study chemical structure using simple one-dimensional techniques. Two-dimensional techniques are used to determine the structure of more complicated molecules. These techniques complement X-ray crystallography for the determination of protein structure. Time-domain NMR spectroscopic techniques are used to probe molecular dynamics in solution. Solid-state NMR spectroscopy is used to determine the molecular structure of solids.

Section 2

- Experiment.


Experiment: 13C chemical shift calculations for the 8 M urea-denatured state of SUMO protein from Drosophila melanogaster by Density Functional Theory (DFT) methods.

Abstract

SUMO, an important post-translational modifier of a variety of substrate proteins, regulates different cellular functions. Here the NMR chemical shifts of the 8 M urea-denatured state of SUMO from Drosophila melanogaster (Dsmt3) are reported in order to find out whether any structural preferences are present.

Outcome Expected:

The NMR resonance assignment of the 8 M urea-denatured state of SUMO from Drosophila melanogaster has already been carried out and submitted to the Biological Magnetic Resonance Bank (BMRB). The DFT calculations to be performed on the same protein should give similar results, which can in turn be correlated with the original NMR experiment. The level of agreement can be used to assign structural preferences to this protein even in its denatured state.

[Figure: panels labelled "Normal phenomenon" and "Expected outcome"]


Tools used:

1. Protein visualization software:

• RasMol
• PyMOL
• Swiss-Pdb Viewer
• MOLMOL

2. Molecular dynamics related software:

• CYANA 2.1
• GROMACS

3. Protein modification software:

• Swiss-Pdb Viewer
• ArgusLab
• Ramachandran plot explorer

4. Software for calculating NMR properties:

• Gaussian 03

5. Software for making the NMR input file and visualizing the output:

• GaussView

Other software which could also be used:

• Visual Molecular Dynamics (VMD) for molecular dynamics
• Vega ZZ for molecular dynamics

RasMol

RasMol is a computer program written for molecular graphics visualization, intended and used primarily for the depiction and exploration of biological macromolecule structures such as those found in the Protein Data Bank. It was originally developed by Roger Sayle in the early 1990s. Historically, it was an important tool for molecular biologists, since the highly optimized program could run on (then) modestly powerful personal computers.


Before RasMol, visualization software ran on graphics workstations that, due to their expense, were less accessible to scholars. RasMol has become an important educational tool and continues to be an important tool for research in structural biology. RasMol has a complex version history. Starting with the series of 2.7 versions, RasMol is licensed under a dual license (GPL or the custom license RASLIC). Thus RasMol is, along with BALLView, Molekel, Jmol and PyMOL, among the few open source molecular visualization programs available.

RasMol includes a scripting language (for selecting certain protein chains, changing colors, etc.). Jmol has incorporated the RasMol scripting language into its commands. Protein Data Bank (PDB) files can be downloaded for visualization from the Research Collaboratory for Structural Bioinformatics (RCSB). These have been deposited by researchers who have characterized the structure of molecules, usually by X-ray crystallography or NMR spectroscopy.

Besides these, basic information about the protein can be obtained, such as:

• Number of hydrogen bonds
• Number of alpha helices
• Number of beta sheets
• Number of turns
• Number of atoms in the protein
• Labels for the residues of the protein

Proteins can also be visualized in different styles ranging from wireframe to cartoons. Proteins can also be colored according to group, structure, chain and temperature using basic RasMol commands, giving the user all the basic details about the protein structure.


PyMOL

PyMOL is an open-source, user-sponsored, molecular visualization system created by Warren Lyford DeLano and commercialized by DeLano Scientific LLC, which is a private software company dedicated to creating useful tools that become universally accessible to scientific and educational communities. It is well suited to producing high quality 3D images of small molecules and biological macromolecules such as proteins. According to the author, almost a quarter of all published images of 3D protein structures in the scientific literature were made using PyMOL.

PyMOL is one of the few open source visualization tools available for use in structural biology. The Py portion of the software's name refers to the fact that it extends, and is extensible by, the Python programming language.

Functions performed in PyMOL:

1. Adding and removing hydrogens to and from the protein structure. Removal of the hydrogens from a protein is sometimes necessary before performing molecular dynamics on it, as they add to the polarity and introduce some unwanted charges.

2. Visualizing the output from molecular dynamics. This can be used in particular to play back the motions of the protein during the molecular dynamics run in the form of trajectories.
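Within PyMOL itself, hydrogens are added with the command `h_add` and removed with `remove hydro`. What the removal step does to a PDB file can also be sketched in plain Python, as below. The sketch is illustrative only: the helper names are hypothetical, the test atoms are dummies with zero coordinates, and the fallback on atom names is a simple heuristic.

```python
def is_hydrogen(pdb_line):
    """True for ATOM/HETATM records that describe a hydrogen (or deuterium) atom."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return False
    element = pdb_line[76:78].strip().upper()  # element symbol, columns 77-78
    if element:
        return element in ("H", "D")
    # Fallback for files without an element column: atom name, columns 13-16.
    name = pdb_line[12:16].strip().lstrip("0123456789")
    return name.startswith("H")


def strip_hydrogens(pdb_text):
    """Return the PDB text with all hydrogen atom records removed."""
    kept = [line for line in pdb_text.splitlines() if not is_hydrogen(line)]
    return "\n".join(kept) + "\n"


def atom_line(serial, name, res, resseq, element):
    """Build a column-correct ATOM record for the demonstration (dummy coordinates)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


demo = "\n".join([
    atom_line(1, " N", "GLY", 1, "N"),
    atom_line(2, " CA", "GLY", 1, "C"),
    atom_line(3, " H", "GLY", 1, "H"),
    atom_line(4, "1HA", "GLY", 1, "H"),
])
heavy_only = strip_hydrogens(demo)
print(heavy_only)
```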


Swiss-Pdb Viewer

Swiss-Pdb Viewer (also known as DeepView) is an application that provides a user-friendly interface for analyzing several proteins at the same time. The proteins can be superimposed in order to deduce structural alignments and compare their active sites or any other relevant parts. Amino acid mutations, H-bonds, angles and distances between atoms are easy to obtain thanks to the intuitive graphic and menu interface. Swiss-Pdb Viewer has been developed since 1994 by Nicolas Guex. It is tightly linked to SWISS-MODEL, an automated homology modelling server developed within the Swiss Institute of Bioinformatics (SIB) at the Structural Bioinformatics Group at the Biozentrum in Basel.

Working with these two programs greatly reduces the amount of work necessary to generate models, as it is possible to thread a protein primary sequence onto a 3D template and get an immediate feedback of how well the threaded protein will be accepted by the reference structure before submitting a request to build missing loops and refine sidechain packing. Swiss-Pdb Viewer can also read electron density maps, and provides various tools to build into the density. In addition, various modelling tools are integrated and command files for popular energy minimization packages can be generated. Finally, as a special bonus, POV-Ray scenes can be generated from the current view in order to make stunning ray-traced quality images.


Functions involved in the modification of the protein structure:

Swiss-Pdb Viewer was used in the experiment to cut the protein into smaller pieces for ease of calculating the NMR properties.
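The idea of cutting a chain into smaller pieces can be sketched as follows. The window of one flanking residue on each side is a hypothetical choice made only for illustration, as is the function name; the fragment size actually used in the experiment is not implied by this sketch. The heptapeptide YGRGDSP discussed earlier serves as the example sequence.

```python
def residue_windows(sequence, flank=1):
    """Return (residue_number, fragment) pairs, one fragment centred on each residue.

    Each fragment contains the residue itself plus up to `flank` neighbours
    on either side; fragments at the chain ends are shorter.
    """
    windows = []
    for i in range(len(sequence)):
        start = max(0, i - flank)
        end = min(len(sequence), i + flank + 1)
        windows.append((i + 1, sequence[start:end]))
    return windows


fragments = residue_windows("YGRGDSP")
print(fragments)
```

Keeping the neighbouring residues in each fragment preserves the local environment of the central residue while keeping each piece small enough for a quantum chemical calculation.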

MolMol

MOLMOL is a molecular graphics program for display, analysis, and manipulation of three-dimensional structures of biological macromolecules, with special emphasis on nuclear magnetic resonance (NMR) solution structures of proteins and nucleic acids. MOLMOL has a graphical user interface with menus, dialog boxes, and on-line help. The display possibilities include conventional presentation, as well as novel schematic drawings, with the option of combining different presentations in one view of a molecule. Covalent molecular structures can be modified by addition or removal of individual atoms and bonds, and three-dimensional structures can be manipulated by interactive rotation about individual bonds. Special efforts were made to allow for appropriate display and analysis of the sets of typically 20-40 conformers that are conventionally used to represent the result of an NMR structure determination, using functions for superimposing sets of conformers, calculation of root mean square distance (RMSD) values, identification of hydrogen bonds, checking and displaying violations of NMR constraints, and identification and listing of short distances between pairs of hydrogen atoms.


Features:

1. For structure manipulation.

• create new molecule
• add residue at start or end
• change residue (mutation)
• remove residue
• add/remove atoms
• add pseudo atom
• flip ring/methyl atoms for better superposition
• add bond
• generate bonds between atoms with close distance
• remove bonds
• add/remove angles
• add/remove angle constraints
• calculate mean structure
• definition of distances (constraints, H-bonds)
• setting of dihedral angles (construction of helices)
• interactive rotation about single bonds
• calculation of superpositions and principal axes

2. Calculations

a. global RMSDs with average, standard deviation, minimum and maximum
b. global displacements
c. local RMSDs
d. local displacements
e. RMSD calculations for groups of structures
f. calculate best matching structure parts
g. reduce number of structures (calculate clusters of similar structures)
h. solvent accessible surface of residues
i. electrostatic potential
j. ring current and bond polarization shifts
k. missing atom coordinates (protons, pseudo atoms)
l. angular order parameters
m. angles between helix axes
n. relative lengths of principal axes
o. angles of bonds relative to principal axes
p. find/generate bonds between close atoms
q. find/generate H-bonds
r. find Van der Waals violations
s. find short distances between atoms, generate peak list
t. check distance constraints
u. check angle constraints
v. interactive measurement of bond lengths, bond angles and dihedrals


MOLMOL was used in the experiment for calculating the mean structure from a number of the best trajectories generated by molecular dynamics. In the process it was also used to superimpose the calculated trajectories from molecular dynamics, thus helping to find out which part of the protein was more dynamic.
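The two quantities involved, a mean structure and the spread of each atom about it, can be sketched as below. The sketch assumes that the conformers have already been superimposed (the fitting step itself is omitted) and uses made-up coordinates for two atoms; it is not a description of how MOLMOL implements these calculations.

```python
import math


def mean_structure(conformers):
    """Average the coordinates atom by atom over already superimposed conformers."""
    n = len(conformers)
    n_atoms = len(conformers[0])
    return [tuple(sum(conf[a][k] for conf in conformers) / n for k in range(3))
            for a in range(n_atoms)]


def per_atom_fluctuation(conformers):
    """Root mean square fluctuation of each atom about the mean structure."""
    mean = mean_structure(conformers)
    n = len(conformers)
    return [math.sqrt(sum(math.dist(conf[a], mean[a]) ** 2 for conf in conformers) / n)
            for a in range(len(mean))]


# Two made-up conformers of a two-atom system; the second atom is the mobile one.
conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
mean = mean_structure(conformers)
fluctuation = per_atom_fluctuation(conformers)
print(mean, fluctuation)
```

Atoms or residues with a large fluctuation correspond to the more dynamic parts of the chain.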

Cyana 2.1

Automated nuclear magnetic resonance (NMR) structure calculation can be done with the program CYANA. Given a sufficiently complete list of assigned chemical shifts and one or several lists of cross-peak positions and volumes from two-, three-, or four-dimensional nuclear Overhauser effect spectroscopy (NOESY) spectra, the assignment of the NOESY cross-peaks and the three-dimensional structure of the protein in solution can be calculated automatically with CYANA. It can also be used to generate random structures for a protein with simple constraints in vacuum.

CYANA was used in the experiment to produce the structures using basic NMR constraints like the coupling values.

GROningen MAchine for Chemical Simulations (GROMACS).

GROMACS is a molecular dynamics simulation package originally developed at the University of Groningen. GROMACS is an engine to perform molecular dynamics simulations and energy minimization. These are two of the many techniques that belong to the realm of computational chemistry and molecular modelling. Computational chemistry denotes the use of computational techniques in chemistry, ranging from quantum mechanics of molecules to dynamics of large complex molecular aggregates. Molecular modelling indicates the general process of describing complex chemical systems in terms of a realistic atomic model, with the aim of understanding and predicting macroscopic properties based on detailed knowledge on an atomic scale. Often molecular modelling is used to design new materials, for which the accurate prediction of physical properties of realistic systems is required.

GROMACS was used to perform molecular dynamics of SUMO in the presence of 8 M urea (solvent).


Arguslab

ArgusLab is a molecular modelling program. It consists of a user interface that supports OpenGL graphics display of molecular structures and runs quantum mechanical calculations. It is basically protein visualization software that can also modify a protein structure by cutting it. The dihedral angles of the protein can also be changed with this software.

Ramachandran plot explorer.

It can be used to build random coil structures for proteins. The main purpose of using this software is to change the dihedral angles (phi, psi, omega) in the protein backbone.
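For reference, a backbone dihedral angle is defined by four consecutive atoms: phi by C(i-1), N(i), Cα(i), C(i); psi by N(i), Cα(i), C(i), N(i+1); and omega by Cα(i), C(i), N(i+1), Cα(i+1). The following minimal sketch computes such an angle from Cartesian coordinates. The function names and the test points are illustrative only and are not taken from the software described above.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
quarter = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(trans, cis, quarter)
```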

Gaussian 03

Gaussian 03 is the latest in the Gaussian series of electronic structure programs. Gaussian 03 is used by chemists, chemical engineers, biochemists, physicists and others for research in established and emerging areas of biochemical interest. Gaussian 03 is a connected system of programs for performing semi empirical, Ab initio, and density functional molecular orbital (MO) calculations. Starting from the basic laws of quantum mechanics, Gaussian predicts the energies, molecular structures, and vibrational frequencies of molecular systems, along with numerous molecular properties derived from these basic computation types. It can be used to study molecules and reactions under a wide range of conditions, including both stable species and compounds which are difficult or impossible to observe experimentally such as short-lived intermediates and transition structures.

Features:

Gaussian 03 is capable of predicting many properties of molecules and reactions, in the gas phase and solution, including

• Molecular energies and structures
• Energies and structures of transition states
• Vibrational frequencies
• IR and Raman spectra, including pre-resonance Raman
• Thermochemical properties
• Bond and reaction energies
• Reaction pathways
• Molecular orbitals
• Atomic charges
• Multipole moments
• NMR shielding and magnetic susceptibilities
• Spin-spin coupling constants
• Optical rotations
• Electron affinities and ionization potentials
• Electrostatic potentials and electron densities


Gaussian was used to calculate the chemical shielding and NMR chemical shifts for each of the amino acids in the 8 M urea-denatured protein in the experiment.

Gaussview

Gaussview is a full-featured graphical user interface for Gaussian 03. Gaussview 3.0 makes using Gaussian 03 simple and straightforward:

• Sketch in molecules using its advanced 3D Structure Builder, or load in molecules from standard files.

• Set up and submit Gaussian 03 jobs right from the interface, and monitor their progress as they run.

• Examine calculation results graphically via state-of-the-art visualization features: display molecular orbitals and other surfaces, view spectra, animate normal modes, geometry optimizations and reaction paths.

Gaussview supports all Gaussian 03 features, and it includes graphical facilities for generating keywords and options, molecule specifications and other input sections for even the most advanced calculation types. Gaussview makes it simple to set up ONIOM layers, unit cells for Periodic Boundary Conditions jobs, molecule specifications for transition structure optimizations.

Prominent features:

Building Molecules

GaussView includes an advanced Molecule Builder. You can use it to rapidly sketch in molecules and examine them in three dimensions. You can build molecules by atom, ring, group, amino acid and nucleoside, and you can also open PDB and other standard molecule files (hydrogen atoms can be added automatically with excellent accuracy and reliability).


Setting Up Gaussian 03 Calculations

GaussView’s Gaussian Calculation Setup window allows you to set up Gaussian 03 jobs in a simple and straightforward manner. All of the features of Gaussian 03 are supported by the interface, enabling you to prepare input for any job type. The Method panel of the Gaussian Calculation Setup window allows you to select the theoretical method, the basis set, and the charge and spin multiplicity. Other panels allow you to specify the type of calculation (Job Type), the title section (Title), and the job resource locations and settings (Link 0). Each panel presents context-sensitive options appropriate to the selected calculation type.

Visualizing Gaussian Results

Gaussview can graphically display a variety of Gaussian calculation results, including the following:

• Molecular orbitals
• Atomic charges
• Surfaces from the electron density, electrostatic potential, NMR shielding density, and other properties. Surfaces may be displayed in solid, translucent and wire mesh modes.
• Surfaces can be colored by a separate property.
• Animation of the steps in geometry optimizations, potential energy surface scans, intrinsic reaction coordinate (IRC) paths, and molecular dynamics trajectories.

Gaussview was used in the experiment to make the input files and display the output files for the NMR calculation in Gaussian 03.

Visual Molecular Dynamics

VMD (Visual Molecular Dynamics) is designed for the visualization and analysis of biological systems such as proteins, nucleic acids, lipid bilayer assemblies, etc. It may be used to view more general molecules, as VMD can read standard Protein Data Bank (PDB) files and display the contained structure. VMD provides a wide variety of methods for rendering and coloring a molecule: simple points and lines, CPK spheres and cylinders, licorice bonds, backbone tubes and ribbons, cartoon drawings, and others. VMD can be used to animate and analyze the trajectory of a molecular dynamics (MD) simulation. In particular, VMD can act as a graphical front end for an external MD program by displaying and animating a molecule undergoing simulation on a remote computer.

Vega ZZ

Vega ZZ is a complete molecular modelling package with a good graphical display. Molecular dynamics can also be performed with good accuracy using this software.

The MD simulations using VMD and Vega ZZ were not satisfactory and did not yield results. Because the system was very large, comprising tens of thousands of water and urea molecules together with the protein, the simulations eventually crashed.


Approach

Step 1: Obtained the Dsmt3 structure file in PDB format from the Protein Data Bank

Step 2: Generated the topologies from the main structure with the basic NMR constraints

Step 3: Subjected the best topologies to molecular dynamics simulations in 8 M urea

Step 4: The mean structures were calculated from the best trajectories.

Step 5: NMR chemical shift calculations were performed on the best trajectories.

Step 6: The results were correlated with the original NMR experiment and the secondary structural propensities were calculated on each of the residues.

Step 1: Obtaining Dsmt3 Structure file.

The Dsmt3 structure file with the PDB ID 2K1F was taken from the Protein Data Bank. The protein structure was determined by solution NMR. It is an ensemble of 20 similar structures, which show some dispersion at the start and the end of the sequence.

Once the structure file was downloaded, the mean of those twenty structures was taken. It was this mean structure that was subjected to denaturation in 8 M urea through Molecular Dynamics.
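Taking the mean of an NMR ensemble amounts to averaging the coordinates of each atom over all models. The sketch below illustrates this under two stated assumptions: the models are already superimposed on a common frame, and every model lists the same atoms in the same order. The function name and the toy coordinates are hypothetical, not taken from the 2K1F file.

```python
def mean_structure(models):
    """Average per-atom coordinates over a list of models.

    Each model is a list of (x, y, z) tuples; all models must list
    the same atoms in the same order and share a common frame.
    """
    if not models:
        raise ValueError("no models given")
    n_atoms = len(models[0])
    if any(len(m) != n_atoms for m in models):
        raise ValueError("models differ in atom count")
    n_models = len(models)
    mean = []
    for i in range(n_atoms):
        # Average x, y and z of atom i over all models.
        mean.append(tuple(
            sum(m[i][k] for m in models) / n_models for k in range(3)
        ))
    return mean

# Toy two-model, two-atom ensemble (hypothetical coordinates).
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (3.0, 2.0, 0.0)],
]
average = mean_structure(toy)
```

A plain coordinate average can distort bond lengths where the models diverge, which is one reason an averaged structure is usually energy-minimized before further use.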

Sequence and Structure


Step 2: Generating Topologies

Topologies of the mean structure were generated using CYANA 2.1 (a Linux-based program). Because the protein is large and generating the topologies would take considerable computational time, the protein was cut into five fragments. The fragments were chosen on the basis of their propensity to form secondary structures.

Fragment      Residue numbers
Fragment 1    1-12
Fragment 2    11-32
Fragment 3    31-53
Fragment 4    52-72
Fragment 5    71-88

Method used in CYANA

First, 1000 random conformers (topologies) had to be made for each of the five fragments. For this, the sequence of each of the five fragments of the SUMO protein was given to the program, which was instructed to generate 1000 random topologies. Dynamics and annealing conditions were applied to obtain the energies of these 1000 random topologies. The dynamics of the protein was carried out in vacuum with basic NMR constraints applied. The program was instructed to select the 20 best energy-minimized structures. These 20 topologies for each fragment can be viewed using software such as PyMOL, MolMol and VMD.

Files used in CYANA

1. *.CCO File

This file contains the coupling values for all the Hα in the amino acids. The coupling values for the random coil state should be between 6 and 8 Hz. Coupling values below 6 Hz give an alpha-helical conformation, and coupling values above 8 Hz give a beta sheet.

2. Init.cya File

This file specifies the residue numbers at the start and at the end of the fragment, builds the library from the information supplied by the user, and reads the sequence of the protein fragment. Since it contains several sets of commands, it can be regarded as a batch file.

3. Execution Batch file

The first command in the batch file will read the sequence of the third fragment. The second command reads the coupling values and the error values assigned to the protein fragment.
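The coupling ranges described for the *.CCO file can be written as a small classification rule. This is only an illustrative sketch: the function name is hypothetical, and treating the boundary values 6 and 8 as random coil is an assumption, since the text does not say how the boundaries are handled.

```python
def classify_coupling(j_value):
    """Map an H-alpha coupling value to the conformation it suggests.

    Thresholds follow the text: below 6 helix, 6 to 8 random coil,
    above 8 beta sheet. Boundary handling is an assumption.
    """
    if j_value < 6.0:
        return "alpha-helix"
    if j_value > 8.0:
        return "beta-sheet"
    return "random-coil"

# Hypothetical coupling values for three residues.
labels = [classify_coupling(j) for j in (4.5, 7.0, 9.2)]
```

For a denatured fragment, restraint values in the 6 to 8 range are the ones that keep the generated conformers in the random coil region.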


Finally, the 20 best topologies were calculated. Five topologies were chosen at random from these 20 and used for the NMR calculations.

Step 3: Molecular Dynamic Simulations of SUMO fragments.

Although normally represented as static structures, proteins are in fact dynamic. Most experimental properties, for example, measure a time average or an ensemble average over the range of possible configurations the molecule can adopt. One way to investigate the range of accessible configurations is to simulate the motions or dynamics of a molecule numerically. This can be done by computing a trajectory, a series of molecular configurations as a function of time, by the simultaneous integration of Newton's equations of motion.
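The idea of computing a trajectory by integrating Newton's equations of motion can be shown on a one-dimensional toy system. The sketch below uses the velocity Verlet scheme, chosen here purely for illustration; it is not claimed to be the integrator used by the MD packages in this work, and the harmonic force, mass and time step are hypothetical.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle in 1D; return (positions, final velocity)."""
    positions = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        positions.append(x)
    return positions, v

# Toy system: harmonic spring with k = 1 and m = 1, started at x = 1, v = 0.
trajectory, v_end = velocity_verlet(
    x=1.0, v=0.0, force=lambda x: -x, mass=1.0, dt=0.01, n_steps=1000
)
total_energy = 0.5 * v_end ** 2 + 0.5 * trajectory[-1] ** 2
```

The list of positions is the trajectory in the sense used above: a series of configurations as a function of time. A near-constant total energy is a simple check that the time step is small enough.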

Purpose of MD for SUMO

Proteins in solution are considered to be dynamic, and it is difficult to study their motions, behaviour and structural flexibility in solution. X-ray crystallography requires strictly periodic (crystalline) samples, which are very difficult to obtain for non-crystalline structures. Molecular dynamics simulations can predict the states of a protein in solution and save these states in the form of a trajectory. MD can follow the movement of large proteins in solution, which is not possible with X-ray crystallography, and it can approximate the conditions under which a protein exists. Structures obtained after MD simulation can be regarded as energy-minimized and geometrically optimized, which allows them to be used in further work such as NMR calculations, docking and protein-ligand interaction studies.


Molecular Dynamics was specifically performed on SUMO to simulate denaturing conditions, with urea used as the solvent. It can thus also be regarded as a virtual process of denaturing the protein. This allowed the SUMO fragments to be used in the calculation of the NMR chemical shifts, thereby allowing us to study the denaturation properties of the protein.

Accuracy of Molecular Dynamic Simulations

Calculations in MD are done with very good accuracy at an atomic scale of about 10^-8 m. It is much more accurate than continuum models and Monte Carlo simulations.

File formats used:

1. PDB:

This is the main signature file of the protein. It contains all the details about the atoms present in the protein structure, including their X, Y and Z coordinates.

2. Topology File:

This file contains, along with the list of atoms, all the force field parameters, such as the charge and mass assigned to each atom in the protein.

3. Trajectory file:

This is an output file that contains the trajectory data of the simulation in binary format. It also contains all the coordinates, velocities, forces and energies.


4. *em.mdp file

This file specifies the parameters for running energy minimization. It allows you to specify the integrator (steepest descent or conjugate gradients), the number of iterations, the frequency with which the neighbor list is updated, the constraints, etc.


5. *.mdp file

This file allows the user to set up specific parameters for the Molecular Dynamics calculations that Gromacs performs. These parameters include:

• Time intervals for recording the trajectories
• Number of iterations to be performed
• The interactions to be included in the calculations

The procedure falls into four parts, corresponding to the actual steps in an MD simulation.

1. Conversion of the pdb structure file to a Gromacs structure file, with the simultaneous generation of a descriptive topology file.

(The structure must first be converted to the Gromos file type (*.gro). The original data in the PDB file is often incomplete; carbon-bound hydrogens are generally omitted.)

2. Energy minimization of the structure to release strain.

3. Running full simulations.

4. Analyzing results.


Calculations in Gromacs:

Running denaturing simulations of proteins in Gromacs generally takes four to five steps. In this experiment it took four steps to complete the simulation process.

1. Conversion of the PDB file to the Gromacs format and solvating the protein in 8 M urea

Commands used:

pdb2gmx is the general command for converting the PDB file of the protein, which is used as input, into output in *.gro format. The force field used for MD is the Gromacs force field (gmx). This force field is used especially when the protein is to be studied for its NMR properties.

The force field (gmx) includes bond stretching, angle bending and bond rotation energies, as well as nonbonded interactions such as van der Waals and electrostatic interactions.

editconf puts the protein into a box of specified dimensions. Here the dimensions were chosen so that the edges of the box were at a distance of 1 nm from the protein. The box was cubic in shape. Putting the protein into a box simulates putting the protein sample into a test tube.

This command generates the solvent, in the form of 8 M urea, in the box that contains the SUMO fragment. This simulates adding the denaturant to a test tube with the protein as the solute.
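As a rough worked example of what 8 M urea means for a simulation box, the number of urea molecules follows from the concentration, Avogadro's number and the box volume. The sketch below is an estimate only: the 5 nm box edge is a hypothetical value (the box size used in this work is not stated), and the volume occupied by the protein is ignored.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_for_molarity(molar, edge_nm):
    """Molecules needed for a given molarity in a cubic box of edge edge_nm."""
    volume_litres = (edge_nm ** 3) * 1e-24   # 1 nm^3 = 1e-24 litre
    return round(molar * AVOGADRO * volume_litres)

# Hypothetical 5 nm cubic box filled to 8 M urea.
n_urea = molecules_for_molarity(8.0, 5.0)
```

This works out to about 4.8 urea molecules per cubic nanometre, so the molecule count grows with the cube of the box edge, which is why larger boxes become expensive quickly.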


2. Energy minimization and MD of protein keeping solvent (8M urea) static.

This step allows the surface of protein to interact with urea, thereby causing the Denaturation of the protein surface.

Commands used:

This command inputs all the parameters required to carry out energy minimization of the protein moving in the static solvent. ‘Minim.mdp’ contains all the conditions required to carry out the energy minimization.

This command applies dynamics to the protein while keeping the denaturing solvent static. The output is given in the form of an energy-minimized structure.

3. Energy minimization and MD keeping the protein static and allowing the solvent to move around it.

This step allows the denaturant to seep into the protein structure. This denatures the interior of the protein, which is generally not in contact with the denaturant.

Commands used:

This step is also known as ‘position restrained MD’ (pr.mdp). This command inputs all the parameters required to carry out position restrained MD, such as the number of steps to be repeated and the algorithms used.

After energy minimization, this command was used to perform dynamics on the urea solvent while keeping the protein static (moving the solvent in and around the protein).

4. Performing full dynamics of the system.

In this step both the solute and solvent were given motion. This simulates action of shaking the test tube with protein and urea. This step completes the process of Molecular dynamics.

Commands used:

These two commands take the output generated by the previous steps as input and set the conditions for performing full MD. First, energy minimization is carried out with the whole system free to move. Second, full dynamics is performed on both the solute and the solvent.

5. Finalizing the calculations.

Commands used:

The trajectories calculated from MD were filtered (only the best of the whole lot were selected). ‘trjconv’ was used to make a movie out of these trajectories and to write out several final protein structures (dsmt3-final.pdb). For this work 500000 trajectories were initially computed, and the 50000 best structures were filtered out of them and visualized using PyMOL. The best 25 of these 50000 were then selected, and a mean structure was calculated on the basis of the root mean square deviation display, using MolMol.
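The root mean square deviation used to compare structures can be sketched as follows. This is a simplified illustration, not MolMol's procedure: it assumes the two structures are already superimposed and list the same atoms in the same order, and the function name and coordinates are hypothetical.

```python
import math

def rmsd(structure_a, structure_b):
    """Root mean square deviation between two equal-length coordinate lists."""
    if len(structure_a) != len(structure_b) or not structure_a:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum(
        (p[k] - q[k]) ** 2
        for p, q in zip(structure_a, structure_b)
        for k in range(3)
    )
    return math.sqrt(squared / len(structure_a))

# Hypothetical two-atom structures displaced by 5 units in the xy plane.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
deviation = rmsd(reference, frame)
```

Ranking frames by their RMSD to a reference is one simple way to pick out a small set of representative structures from a long trajectory.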


Step 4: NMR chemical shift calculations

In this work, NMR properties were calculated for only 22 of the 88 residues of Dsmt3. These calculations were run with the Gaussian 03 package at the Grendel supercomputing facility of Aarhus University, Denmark.

Grendel Supercomputer overview

• System: Grendel, an AMD Opteron cluster. 173 nodes were installed in June 2006, 95 more nodes were added in August 2007 and 335 more nodes in January 2009.
• Vendor: SUN and DELL
• Provider: Atea (formerly Top Nordic)
• Total number of nodes/cores in Grendel: 579/3248
• Peak performance: ca. 28 TFlops
• 168 SUN Fire x2100 servers, 5 SUN Fire x4100 servers, 95 Dell SC1435 servers, 335 SUN Fire x2200 servers, 1 SUN Fire x4600 server
• Each x2100: 1 × 2.2 GHz Dual Core Opteron with 2 GB memory and an 80 GB SATA disk
• Each x4100: 2 × 2.4 GHz Dual Core Opteron with 8 GB memory and 2 × 80 GB SAS disks
• Each SC1435: 2 × 2.6 GHz Dual Core Opteron with 8 GB memory and a 250 GB SATA disk
• 310 x2200: 2 × 2.3 GHz Quad Core Opteron with 16 GB memory and a 500 GB SATA disk
• 25 x2200: 2 × 2.3 GHz Quad Core Opteron with 32 GB memory and a 2 TB SATA disk
• 17 of the 335 SUN Fire x2200 nodes are placed at Denmark’s Statistik
• Gigabit cluster infrastructure

Building the Input Files for NMR calculations.

The calculations focused on the 22 residues of the third fragment. NMR chemical shift calculations on the whole 22-residue fragment at once would take an enormously long time even with good computational power. From the literature it was found that the NMR interaction for an atom does not extend beyond a distance of 3 Å. The 22-residue fragment could therefore be cut further for the NMR calculation, in such a way that the chemical shifts of each residue could still be calculated with good accuracy. Swiss PDB Viewer was used to cut the protein within a specified radius.

Hydrogen bond distance: 1.6 - 2.0 Å
Covalent bond distance: 0.96 - 1.0 Å
Van der Waals radii: always less than 2 Å
Electrostatic and ionic interactions: always less than 1.5 Å

For instance, to find the chemical shifts for Tyr (the 12th residue of the 22-residue fragment), the tyrosine residue is taken as the centre and a circle of radius 3 Å is drawn around it. All residues lying within that radius are selected, and the rest of the protein can be cut away completely.
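The radius-based cut can be expressed as a distance filter over atom coordinates. The sketch below is not Swiss PDB Viewer's algorithm; it assumes that a residue is kept if any of its atoms lies within the cutoff of any atom of the central residue, and the function name and toy coordinates are hypothetical.

```python
import math

def residues_within(atoms, centre_residue, cutoff=3.0):
    """Return residue numbers with at least one atom within cutoff
    (in angstrom) of any atom of the centre residue.

    atoms is a list of (residue_number, (x, y, z)) entries.
    """
    centre_atoms = [xyz for res, xyz in atoms if res == centre_residue]
    selected = set()
    for res, xyz in atoms:
        if any(math.dist(xyz, c) <= cutoff for c in centre_atoms):
            selected.add(res)
    return sorted(selected)

# Toy chain along the x axis, one atom per residue (hypothetical).
toy_atoms = [
    (11, (0.0, 0.0, 0.0)),
    (12, (1.5, 0.0, 0.0)),
    (13, (3.0, 0.0, 0.0)),
    (14, (9.0, 0.0, 0.0)),
]
kept = residues_within(toy_atoms, centre_residue=12)
```

With real coordinates, the kept residues form the small cluster that is then passed on as a separate input for the shielding calculation of the central residue.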


Once the specific part of the protein has been cut out, the input for submitting the calculations is prepared in GaussView.


Level of NMR calculations in Gaussian

For the NMR calculations on proteins in Gaussian, the method used was density functional theory (DFT). These calculations take into account all the interactions within the system as well as the effect of the environment on the system (such as the interaction of water with proteins), whereas other methods such as Hartree-Fock do not take the solvent environment of the protein into account. With Hartree-Fock, this makes it difficult to calculate the chemical shift of paramagnetic atoms.

Basis set used:

The basis set nomenclature describes how the basis functions are constructed from the Gaussian primitives (the “contraction scheme”.)


Once the parameters and the type of calculation are specified, the input file is saved in the Gaussian job file (gjf) format and can be submitted to Gaussian.

A Typical Gaussian NMR input file.


Checkpoint File:

It is a file that is created once the input file is made. It can be used to restart the calculation if the calculation was aborted for technical reasons. This avoids repeating the same calculation and thereby saves time.

Command line:

It includes all the parameters and conditions that were given to make the input file.

Z-matrix

Each line of a Z-matrix gives the internal coordinates for one of the atoms within the molecule. The most-used Z-matrix format uses the following syntax:

Element-label, atom 1, bond-length, atom 2, bond-angle, atom 3, dihedral-angle [, format-code]

Although these examples use commas to separate items within a line, any valid separator may be used. Element-label is a character string consisting of either the chemical symbol for the atom or its atomic number. If the elemental symbol is used, it may be optionally followed by other alphanumeric characters to create an identifying label for that atom. A common practice is to follow the element name with a secondary identifying integer: C1, C2, etc.

Atom1, atom2, atom3 are the labels for previously-specified atoms and are used to define the current atoms' position. Alternatively, the other atoms' line numbers within the molecule specification section may be used for the values of variables, where the charge and spin multiplicity line is line 0. The position of the current atom is then specified by giving the length of the bond joining it to atom1, the angle formed by this bond and the bond joining atom1 and atom2, and the dihedral (torsion) angle formed by the plane containing atom1, atom2 and atom3 with the plane containing the current atom, atom1 and atom2. Note that bond angles must be in the range 0º < angle < 180º. Dihedral angles may take on any value. The optional format-code parameter specifies the format of the Z-matrix input.

An example for Hydrogen Peroxide can be considered. A Z-matrix for this structure would be:

H
O 1 0.9
O 2 1.4 1 105.0
H 3 0.9 2 105.0 1 120.0

The first line of the Z-matrix simply specifies hydrogen. The next line lists an oxygen atom and specifies the internuclear distance between it and the hydrogen as 0.9 Angstroms. The third line defines another oxygen with an O-O distance of 1.4 Angstroms (i.e., from atom 2, the other oxygen) and having an O-O-H angle (with atoms 2 and 1) of 105 degrees.


The fourth and final line is the only one for which all three internal coordinates need be given. It defines the other hydrogen as bonded to the second oxygen with an H-O distance of 0.9 Angstroms, an H-O-O angle of 105 degrees and an H-O-O-H dihedral angle of 120 degrees.
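To show how internal coordinates determine atom positions, the following sketch converts the hydrogen peroxide Z-matrix described above into Cartesian coordinates. It is a simplified illustration, not Gaussian's own parser: it handles only whitespace-separated rows with numeric atom references, places the first atom at the origin and the second on the z axis, and all function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    norm = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    return (a[0] / norm, a[1] / norm, a[2] / norm)

def place(a, b, c, r, theta_deg, phi_deg):
    """Position of a new atom at distance r from a, making the angle
    theta with b (at a) and the dihedral phi with c (about the b-a axis)."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    bc = _unit(_sub(a, b))               # axis pointing from b to a
    n = _unit(_cross(_sub(b, c), bc))    # normal of the c-b-a plane
    m = _cross(n, bc)                    # completes the local frame
    d = (-r * math.cos(theta),
         r * math.sin(theta) * math.cos(phi),
         r * math.sin(theta) * math.sin(phi))
    return tuple(a[k] + d[0] * bc[k] + d[1] * m[k] + d[2] * n[k]
                 for k in range(3))

def zmatrix_to_cartesian(rows):
    coords = []
    for row in rows:
        t = row.split()
        if len(t) == 1:                  # first atom: origin
            coords.append((0.0, 0.0, 0.0))
        elif len(t) == 3:                # second atom: on the z axis
            coords.append((0.0, 0.0, float(t[2])))
        else:
            a = coords[int(t[1]) - 1]
            b = coords[int(t[3]) - 1]
            if len(t) >= 7:
                c, phi = coords[int(t[5]) - 1], float(t[6])
            else:                        # third atom: dummy reference point
                c, phi = (b[0] + 1.0, b[1], b[2]), 0.0
            coords.append(place(a, b, c, float(t[2]), float(t[4]), phi))
    return coords

H2O2_ZMATRIX = ["H", "O 1 0.9", "O 2 1.4 1 105.0", "H 3 0.9 2 105.0 1 120.0"]
h2o2_xyz = zmatrix_to_cartesian(H2O2_ZMATRIX)
```

Measuring the distances between the resulting points recovers the bond lengths given in the Z-matrix, which is a convenient check on any such conversion.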

Viewing the spectra through Gaussview

The results of the Gaussian calculations can be visualized as spectra in GaussView. Tetramethylsilane (TMS) was used as the reference, as it is a highly shielded compound and is generally used for calculating the chemical shifts in a biological sample. The results are generally given in the *.log format, which is readable by GaussView.

Section 3

- Results and Discussions


Results of Molecular Dynamics

The results of the MD simulations, in the form of energies, for the 2nd, 3rd and 4th fragments are shown here. The MD simulations of the other fragments will be carried out in the near future.

For the 2nd fragment

For the 3rd fragment

For the 4th fragment


In each of the above results, the average potential energy was computed for the final trajectory (considered to be the best protein structure, to be used for the NMR calculations), together with its deviation from the first trajectory and the total drift. The most important value is the total drift in the energy, that is, the difference in energy between the first and the last (best) trajectories. On average, 4982 structures were generated.

The Molecular Dynamics run can thus be regarded as successful, as the total drift in energy between the structure at the start of the Molecular Dynamics and the structure at the end was quite large. For instance, for the second fragment the energy of the 4982nd structure was 678.573 kcal lower than that of the 1st structure in the whole trajectory.

When performing Molecular Dynamics of a protein in a solvent, such a downward trend in the energy should always be achieved.


Results of NMR: the NMR chemical shifts for the third fragment of Denatured SUMO (residue 32 to 52) are as follows:

Residue   Atom   Absolute shielding (ppm)   Theoretical chemical shift (ppm)   Experimental chemical shift (ppm)

His 32    Cα     124.2242                   58.9                               60.4
His 32    CO     9.3642                     173.76                             175
Thr 33    Cα     122.4942                   60.63                              60.4
Thr 33    CO     15.5942                    167.53                             174.8
Pro 34    Cα     116.2242                   66.9                               55.4
Pro 34    CO     11.32                      171.8                              176.6
Leu 35    Cα     124.94                     58.18                              55.4
Leu 35    CO     4.42                       178.7                              176.1
Arg 36    Cα     127.42                     55.7                               56.2
Arg 36    CO     6.12                       177                                176.1
Lys 37    Cα     128.1842                   54.94                              55.8
Lys 37    CO     7.82                       175.3                              176.2
Leu 38    Cα     119.81                     63.31                              55.8
Leu 38    CO     11.35                      171.77                             175.7
Met 39    Cα     121.33                     61.79                              55.7
Met 39    CO     4.94                       178.18                             175.8
Asn 40    Cα     127.95                     55.17                              53.2
Asn 40    CO     5.75                       177.37                             175.8
Ala 41    Cα     131.81                     51.31                              52.6
Ala 41    CO     9.16                       173.96                             177.2
Tyr 42    Cα     123.80                     59.32                              58.4
Tyr 42    CO     6.96                       176.16                             175.9
Cys 43    Cα     126.92                     56.2                               56.7
Cys 43    CO     6.62                       176.5                              175.6
Asp 44    Cα     123.72                     59.4                               54.4
Asp 44    CO     5.8042                     177.32                             176.2
Arg 45    Cα     125.76                     57.36                              56.3
Arg 45    CO     7.45                       175.67                             176.2
Ala 46    Cα     131.51                     51.61                              52.8
Ala 46    CO     3.42                       179.7                              178.3
Gly 47    Cα     135.29                     47.83                              45.6
Gly 47    CO     9.85                       173.27                             174.3
Leu 48    Cα     124.80                     58.32                              55.1
Leu 48    CO     7.76                       175.36                             177.7
Ser 49    Cα     123.09                     60.03                              63.9
Ser 49    CO     8.26                       174.86                             174.8
Met 50    Cα     120.52                     62.6                               56.7
Met 50    CO     9.02                       174.1                              175.7

For all 22 residues in the 3rd fragment, the isotropic shielding values were obtained from the Gaussian calculations. The chemical shifts were calculated from them with tetramethylsilane (TMS) as the reference. The shielding value of TMS in the same basis set was found to be 183.1242 ppm.

Thus, chemical shift = isotropic shielding of TMS − isotropic shielding of the atom in the residue.
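The conversion from absolute shielding to chemical shift is a single subtraction, and it can be checked against the tabulated values. In the sketch below the TMS shielding of 183.1242 ppm and the two His 32 shieldings are taken from the text; the function and constant names are illustrative.

```python
TMS_SHIELDING_PPM = 183.1242  # TMS isotropic shielding reported in the text

def chemical_shift(shielding_ppm, reference_ppm=TMS_SHIELDING_PPM):
    """Chemical shift (ppm) = reference shielding - nucleus shielding."""
    return reference_ppm - shielding_ppm

# His 32 values from the results table.
his32_ca = chemical_shift(124.2242)   # tabulated theoretical shift: 58.9
his32_co = chemical_shift(9.3642)     # tabulated theoretical shift: 173.76
```

A nucleus that is less shielded than TMS therefore has a positive chemical shift, which is why the carbonyl carbons, with small absolute shieldings, appear far downfield.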

Criteria for checking the accuracy of the Basis Set

In the graph above, the accuracy does not improve beyond the B3LYP/6-311G(2d,2p) level. It was therefore considered to be the best basis set that could be used for the NMR calculations.

The results obtained were compared with the experimental data, to check for the accuracy of the theoretical calculations.

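Residue   Atom   Absolute shielding (ppm)   Theoretical chemical shift (ppm)   Experimental chemical shift (ppm)

Gln 51    Cα     129.52                     53.6                               56
Gln 51    CO     6.82                       176.3                              175.9
Val 52    Cα     116.5                      66.6                               62.6
Val 52    CO     9.42                       173.7                              175.8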


Correlation plot:

Based on the correlation plot, the theoretical calculations correlated well with the experimental results. The line also passed through the origin, suggesting that the TMS shielding values were assigned correctly as well. The linear regression coefficient (R) was also found to be quite convincing.
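The correlation coefficient R between theoretical and experimental shifts can be computed as a Pearson coefficient. The sketch below uses only the first six rows of the results table (His 32, Thr 33 and Pro 34, Cα and CO) as sample data, so its value is not the R reported for the full data set; the function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Theoretical and experimental shifts (ppm) for His 32, Thr 33 and Pro 34.
theoretical = [58.9, 173.76, 60.63, 167.53, 66.9, 171.8]
experimental = [60.4, 175.0, 60.4, 174.8, 55.4, 176.6]
r_value = pearson_r(theoretical, experimental)
```

Computing R separately for the Cα and the CO shifts is a stricter test, because the two groups of values lie far apart on the ppm scale.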


Assigning Structural Propensities

Assigning the structural propensities to the 22-residue fragment depended on the chemical shifts of each of the residues. The structural elements in the 8 M urea-denatured state were compared in terms of the cumulative (Cα and CO) secondary shifts, calculated in a standard manner as:

δC(cum) = δCα/25 + δCO/10

where

δCα = theoretically calculated Cα chemical shift − random coil Cα shift
δCO = theoretically calculated CO chemical shift − random coil CO shift

The divisors are the secondary shift dispersions of 25 ppm for Cα and 10 ppm for CO. This normalization of the individual secondary shifts is based on the total span of the respective chemical shifts in the experimental values.

Cumulative Shifts

Residue   Cumulative secondary shift

His 32    -0.114
Thr 33    -0.05
Pro 34    -0.264
Leu 35    0.3
Arg 36    0.08
Lys 37    -0.17
Leu 38    -0.127
Met 39    0.44
Asn 40    0.0958
Ala 41    -0.39
Tyr 42    0.3074
Cys 43    0.166
Asp 44    0.2725
Arg 45    0.03
Ala 46    0.187
Gly 47    0.08
Leu 48    -0.0212
Ser 49    0.11
Met 50    0.07
Gln 51    -0.026
Val 52    -0.06
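The cumulative secondary shift combines the two normalized secondary shifts of a residue into one number. The sketch below implements the formula with the 25 ppm and 10 ppm divisors from the text; the random coil values passed in the example are hypothetical placeholders, since the random coil table used in this work is not reproduced here.

```python
CA_DISPERSION_PPM = 25.0   # Calpha secondary shift dispersion (from the text)
CO_DISPERSION_PPM = 10.0   # CO secondary shift dispersion (from the text)

def cumulative_secondary_shift(ca_calc, co_calc, ca_coil, co_coil):
    """Cumulative (Calpha + CO) secondary shift of one residue."""
    delta_ca = ca_calc - ca_coil      # Calpha secondary shift
    delta_co = co_calc - co_coil      # CO secondary shift
    return delta_ca / CA_DISPERSION_PPM + delta_co / CO_DISPERSION_PPM

# Hypothetical residue: calculated shifts 58.0 / 176.0 ppm,
# assumed random coil shifts 55.5 / 175.0 ppm.
example = cumulative_secondary_shift(58.0, 176.0, 55.5, 175.0)
```

Dividing by the dispersion puts the Cα and CO contributions on a comparable scale before they are added.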


Final outcome: A stretch of three to four residues showing a downward trend (negative cumulative shifts) was assigned β-sheet propensity, and a stretch showing an upward trend was assigned α-helical propensity. This criterion was confirmed from the literature.

Thus residues 32 to 34 show β-sheet propensity, as they are three residues in sequence that show a downward trend in the graph. In contrast, residues 41 to 47 show a continuous upward trend in the graph, which is indicative of α-helical propensity.
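The assignment rule can be sketched as a search for runs of consecutive residues whose cumulative shifts share a sign. The version below is one strict reading of the rule, assuming a minimum run length of three and judging each residue only by the sign of its tabulated value. Under that reading the helical stretch comes out as residues 42 to 47 rather than 41 to 47, because the tabulated value for Ala 41 is negative; the trend read from the graph may weigh neighbouring points differently. Function and variable names are illustrative.

```python
# Cumulative secondary shifts from the table (residue number, value).
CUMULATIVE_SHIFTS = [
    (32, -0.114), (33, -0.05), (34, -0.264), (35, 0.3), (36, 0.08),
    (37, -0.17), (38, -0.127), (39, 0.44), (40, 0.0958), (41, -0.39),
    (42, 0.3074), (43, 0.166), (44, 0.2725), (45, 0.03), (46, 0.187),
    (47, 0.08), (48, -0.0212), (49, 0.11), (50, 0.07), (51, -0.026),
    (52, -0.06),
]

def assign_propensities(shifts, min_run=3):
    """Return (first_residue, last_residue, label) for every run of at
    least min_run consecutive residues with the same sign."""
    runs = []
    start = 0
    for i in range(1, len(shifts) + 1):
        run_ended = (
            i == len(shifts)
            or (shifts[i][1] > 0) != (shifts[start][1] > 0)
        )
        if run_ended:
            if i - start >= min_run:
                label = "alpha-helix" if shifts[start][1] > 0 else "beta-sheet"
                runs.append((shifts[start][0], shifts[i - 1][0], label))
            start = i
    return runs

propensities = assign_propensities(CUMULATIVE_SHIFTS)
```

Raising min_run to four would drop the 32 to 34 stretch, so the chosen run length directly controls how many segments are reported.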


Future prospects

The strategy developed for determining 13C chemical shifts by ab initio calculations in the protein SUMO can in fact be used on any other protein with no constraints on its size. The only limiting factor is the total computation time needed to solve the secondary structure of the whole protein. In the case of SUMO, 22 of the 88 residues have been subjected to the chemical shift calculation to find their structural propensities. The calculations on the remaining residues of SUMO will be carried out in the near future. It is anticipated that completing the calculations for the whole protein will require four more fragments (I: 1-20; II: 21-30; IV: 53-75 and V: 76-88) to be taken up in a systematic way, taking on average three to four residues at a time. The overall computational effort, when run on the Grendel Supercomputer, amounts to a month and a half. It will then be possible to map the whole protein in the denatured state for its hidden structural propensities. The same approach will also be used for the SUMO protein in its native state, so that the overall results of the ab initio calculations can be correlated with the experimental NMR data. It is hoped that in the end this work can serve as a model for studying the denaturation kinetics and the folding pathways of SUMO and other larger proteins (more than 70 residues). Ab initio quantum chemical calculations of NMR chemical shifts (in silico NMR) can then be used along with multidimensional NMR data as an additional constraint for the refinement of protein structures. In all, the work reported in this project leads to the development of a valuable computational approach for gathering more insight into protein folding pathways.

References


1. Hay RT (Apr 2005). "SUMO: a history of modification". Mol. Cell 18 (1): 1–12.

2. Matunis MJ, Coutavas E, Blobel G (Dec 1996). "A novel ubiquitin-like modification modulates the partitioning of the Ran-GTPase-activating protein RanGAP1 between the cytosol and the nuclear pore complex". J Cell Biol. 135 (6 Pt 1): 1457–70.

3. Mahajan R, Delphin C, Guan T, Gerace L, Melchior F (Jan 1997). "A small ubiquitin-related polypeptide involved in targeting RanGAP1 to nuclear pore complex protein RanBP2". Cell 88 (1): 97–107.

4. Cheng TS, Chang LK, Howng SL, Lu PJ, Lee CI, Hong YR (Feb 2006). "SUMO-1 modification of centrosomal protein hNinein promotes hNinein nuclear localization". Life Sci. 78 (10): 1114–20.

5. Gill G (Oct 2005). "Something about SUMO inhibits transcription". Curr Opin Genet Dev. 15 (5): 536–41.

6. Zhang XD, Goeres J, Zhang H, Yen TJ, Porter AC, Matunis MJ (Mar 2008). "SUMO-2/3 modification and binding regulate the association of CENP-E with kinetochores and progression through mitosis". Mol Cell 29 (6): 729–41.

7. Azuma Y, Arnaoutov A, Dasso M (Nov 2003). "SUMO-2/3 regulates topoisomerase II in mitosis". J Cell Biol. 163 (3): 477–87.

8. Saitoh H, Hinchey J (Mar 2000). "Functional heterogeneity of small ubiquitin-related protein modifiers SUMO-1 versus SUMO-2/3". J Biol Chem. 275 (9): 6252–8.

9. Matic I, van Hagen M, Schimmel J, et al. (Jan 2008). "In vivo identification of human small ubiquitin-like modifier polymerization sites by high accuracy mass spectrometry and an in vitro to in vivo strategy". Mol Cell Proteomics. 7 (1): 132–44.

10. Matic I, Macek B, Hilger M, Walther TC, Mann M (Sep 2008). "Phosphorylation of SUMO-1 occurs in vivo and is conserved through evolution". J Proteome Res. 7 (9): 4050–7.

11. Gramatikoff K. et al. In Frontiers of Biotechnology and Pharmaceuticals, Science Press (2004) 4: pp.181-210.

12. Cheng CH, Lo YH, Liang SS, et al. (Aug 2006). "SUMO modifications control assembly of synaptonemal complex and polycomplex in meiosis of Saccharomyces cerevisiae". Genes Dev. 20 (15): 2067–81.

13. Pichler A, Knipscheer P, Saitoh H, Sixma TK, Melchior F (Oct 2004). "The RanBP2 SUMO E3 ligase is neither HECT- nor RING-type". Nat Struct Mol Biol. 11 (10): 984–91.

14. Mukhopadhyay D, Dasso M (Jun 2007). "Modification in reverse: the SUMO proteases". Trends Biochem. Sci. 32 (6): 286–95.

15. Ulrich HD (Oct 2005). "Mutual interactions between the SUMO and ubiquitin systems: a plea of no contest". Trends Cell Biol. 15 (10): 525-532.

16. Gill G (Oct 2005). "Something about SUMO inhibits transcription". Curr Opin Genet. Dev. 15 (5): 536-541.

17. Li M, Guo D, Isales CM, et al. (Jul 2005). "SUMO wrestling with type 1 diabetes". J. Mol. Med. 83 (7): 504-513.

18. Verger A, Perdomo J, Crossley M (Feb 2003). "Modification with SUMO. A role in transcriptional regulation". EMBO Rep. 4 (2): 137-142.

19. G.E Martin, A.S. Zekter (1988). Two-Dimensional NMR Methods for Establishing Molecular Connectivity. New York: VCH Publishers. p. 59.

20. J.W. Akitt, B.E. Mann (2000). NMR and Chemistry. Cheltenham, UK: Stanley Thornes. pp. 273, 287.

21. J.P. Hornak. "The Basics of NMR". Retrieved on 2009-02-23.

22. J. Keeler (2005). Understanding NMR Spectroscopy. John Wiley & Sons. ISBN 0470017864.

23. K. Wuthrich (1986). NMR of Proteins and Nucleic Acids. New York (NY), USA: Wiley-Interscience.

24. J.M Tyszka, S.E Fraser, R.E Jacobs (2005). "Magnetic resonance microscopy: recent advances and applications". Current Opinion in Biotechnology 16 (1): 93–99.

25. J.C. Edwards. "Principles of NMR". Process NMR Associates. http://www.process-nmr.com/pdfs/NMR%20Overview.pdf. Retrieved on 2009-02-23.

26. Wüthrich K (December 1990). "Protein structure determination in solution by NMR spectroscopy". J. Biol. Chem. 265 (36): 22059–62.

27. Bax A, Ikura M (May 1991). "An efficient 3D NMR technique for correlating the proton and 15N backbone amide resonances with the alpha-carbon of the preceding residue in uniformly 15N/13C enriched proteins". J. Biomol. NMR 1 (1): 99–104.

28. Güntert P (2004). "Automated NMR structure calculation with CYANA". Methods Mol. Biol. 278: 353–78.

29. Rieping W, Habeck M, Bardiaux B, Bernard A, Malliavin TE, Nilges M (February 2007). "ARIA2: automated NOE assignment and data integration in NMR structure calculation". Bioinformatics 23 (3): 381–2.

30. De Alba E, Tjandra N (2004). "Residual dipolar couplings in protein structure determination". Methods Mol. Biol. 278: 89–106.

31. Schwieters CD, Kuszewski JJ, Tjandra N, Clore GM (January 2003). "The Xplor-NIH NMR molecular structure determination package". J. Magn. Reson. 160 (1): 65–73.

32. Pervushin K, Riek R, Wider G, Wüthrich K (November 1997). "Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution". Proc. Natl. Acad. Sci. U.S.A. 94 (23): 12366–71.

33. Markus MA, Dayie KT, Matsudaira P, Wagner G (October 1994). "Effect of deuteration on the amide proton relaxation rates in proteins. Heteronuclear NMR experiments on villin 14T". J Magn Reson B 105 (2): 192–5. doi:10.1006/jmrb.1994.1122. PMID 7952934.

34. Fiaux J, Bertelsen EB, Horwich AL, Wüthrich K (July 2002). "NMR analysis of a 900K GroEL–GroES complex". Nature 418 (6894): 207–11.

35. Liu G, Shen Y, Atreya HS, et al. (July 2005). "NMR data collection and analysis protocol for high-throughput protein structure determination". Proc. Natl. Acad. Sci. U.S.A. 102 (30): 10487–92.
