Protein production, characterization and structure determination in structural genomics Esmeralda Alida Woestenenk Department of Biotechnology Royal Institute of Technology Stockholm 2004


Protein production, characterization and structure determination in structural genomics

Esmeralda Alida Woestenenk

Department of Biotechnology Royal Institute of Technology

Stockholm 2004


ISBN 91-7283-838-8

Printed by: Universitetsservice US AB Stockholm 2004


“Happy is he who gets to know the reason for things”

-- Virgil (70-19 BC)


Contents

Contents
Abstract
Main References
Acknowledgements
Abbreviations
1 Background to this thesis
2 Introduction
  2.1 General introduction
  2.2 Structural biology
  2.3 Structural genomics
  2.4 DNA cloning techniques
  2.5 Protein production and purification
  2.6 Introduction to NMR spectroscopy
3 Fusion tags: facilitation of purification and enhancement of protein solubility
  3.1 Fusion tags for solubility and purification
  3.2 Effect of His tags on protein solubility
  3.3 Fast purification and biophysical characterization of proteins
4 Structure determination of ribosomal protein L18
  4.1 New methods in structure determination
  4.2 The ribosome
  4.3 Ribosomal protein L18
  4.4 Structure determination by NMR
  4.5 Solution structure of L18 from Thermus thermophilus
5 References


Esmeralda A. Woestenenk (2004): Protein production, characterization and structure determination in structural genomics. Department of Biotechnology, Royal Institute of Technology (KTH), Stockholm, Sweden.

Abstract

This thesis covers the process from expression of a heterologous gene in Escherichia coli to structure determination of a protein by nuclear magnetic resonance (NMR) spectroscopy.

The first part concerns structural genomics-related parallel screening studies on the effect of fusion tags (in particular the His tag) on protein solubility and the use of fusion tags in fast, parallel purification protocols intended for initial biophysical characterization of human proteins produced in E. coli. It was found that for most proteins the His tag has a negative influence on protein solubility. This influence appears to be more pronounced for our C-terminal His tag than for the N-terminal His tags used in this study. Moreover, high ratios of soluble per total protein do not always guarantee a high yield of soluble protein after purification, as different vector – target protein combinations result in large differences in host cell growth rates. Protein purification protocols for different fusion tags were developed that make it possible to express, purify and study structural properties of low concentration samples of 15N-labeled proteins in one or two days.

The second part of this thesis describes the assignment and solution structure determination of ribosomal protein L18 of Thermus thermophilus. The protein is a mixed α/β structure with two α-helices on one side of a four-stranded β-sheet. Comparison to RNA-bound L18 showed that the protein to a large extent adopts identical structures in free and bound states, with exception of the loop regions and the flexible N-terminus.

Keywords: protein production, protein solubility, fusion tags, nuclear magnetic resonance, structure determination, ribosomal protein

© Esmeralda A. Woestenenk, 2004


Main References

This thesis is based on the following papers, referred to in the text by their Roman numerals.

I. Woestenenk, E.A., Hammarström, M., van den Berg, S., Härd, T. and Berglund, H. (2004) His tag effect on solubility of human proteins produced in Escherichia coli: a comparison between four expression vectors. J. Struct. Funct. Genomics 5, 217-229 (A)

II. Woestenenk, E.A., Hammarström, M., van den Berg, S., Härd, T. and Berglund, H. (2003) Screening methods to determine biophysical properties of proteins in structural genomics. Anal. Biochem. 318, 71-79 (B)

III. Woestenenk, E.A., Allard, P., Gongadze, G.M., Moskalenko, S.E., Shcherbakov, D.V., Rak, A.V., Garber, M.B., Härd, T. and Berglund, H. (2000) Assignment and secondary structure identification of the ribosomal protein L18 from Thermus thermophilus. J. Biomol. NMR 17, 273-274 (A)

IV. Woestenenk, E.A., Gongadze, G.M., Shcherbakov, D.V., Rak, A.V., Garber, M.B., Härd, T. and Berglund, H. (2002) The solution structure of ribosomal protein L18 from Thermus thermophilus reveals a conserved RNA-binding fold. Biochem. J. 363, 553-561 (C)

All papers are reproduced with permission from the copyright holders: © Kluwer Academic Publishers (A), © Elsevier Science (B), and © the Biochemical Society (C).


Acknowledgements

While writing my thesis, I thought about this section but never actually thought I would really get to the stage of writing the acknowledgements. Then again, though it is hard imagining what things are like before you do them, you usually go through them and suddenly they’re done and it was okay. This might have been true for writing a thesis, doing a parachute jump or moving abroad, but learning NMR proved to be a little harder to tackle for me. And since I am sitting here writing my thesis anyway, I have people to thank:

First of all my supervisors Helena and Torleif. Helena for being so easy to talk to with all kinds of issues and problems, scientific or non-scientific, and for providing endless help and guidance. If you want to have more time to work on your own things you will have to become less nice! Torleif for letting me work in your group and learn (some) NMR, for knowing so much, giving constructive criticism and teaching me the importance of accuracy and reproducibility.

Susanne for tolerating my company in all three (!) rooms we shared, and other (ex) roomies Peter, Henrik, Anders Ö, Alexandra and Vildan. I also want to thank Peter for help with computer and NMR issues, Henrik for taking us out ice-skating, Vildan for nice conversations and Anders Ö for always being in a good mood. Elisabet, I enjoyed sharing a room with you in Tällberg and I’m glad I had you to talk to at times when things didn’t feel easy. Thanks to Inger for being so nice and knowing everything about difficult forms and other administrative matters. I want to thank all other current and past members of the NMR group: Martin, Niklas, Christofer, Jakob, Per-Åke, Anders H, Magnus H (also for help with software!), Magnus W, Anja, and Lotta for a good atmosphere in and outside the lab.

The people at CSB for creating a nice and functional research environment.

Special thanks to Jurate for being a good friend, Peter H for being such an easy-going and fun person and Caroline for having a great sense of humor. Further thanks to Kristina Bergholm for helping me with formal papers and a place to live when I just got to Sweden. At AlbaNova I’d like to thank Cecilia Heyman from the library for cheerfully and swiftly sending me all the articles I requested and ‘plan 3’ for help with sequencing.


All co-authors that I haven’t already mentioned for interesting collaboration.

My diploma work supervisors Leon Mur, Matthieu Joosten, Martin Scanlon and Judy Halliday for teaching me how to do proper research.

Malin, Nathalie and Milla for being nice to live with and easygoing. Milla, I’m sorry you had to move back to Finland!

Madeleine and Jill: thank you for all your help, for listening and for giving me tools to improve my way of thinking.

The girls (Linda, Anna, Eleonor, Lynn and Nina): you have been a great help and you are all wonderful.

I also want to thank Bosse and Lisbeth, who made me feel part of the family.

Clasien: for all the fun things we have done together. It is always nice to meet you, wherever in the world!

Michelle, Suus, Ellen and Willemijn (and partners): I am glad that you have remained such good friends all this time (although we had better never go on holiday together again).

Nancy: I wouldn’t want to miss our daily conversation or my reliable source of what’s really going on in US politics!

Dad and Mum: thank you for all your support and understanding through the years.

Oek and Marcel: it is always so cosy at your home! Oek, there is no sweeter big sister!

Finally, I want to thank my wonderful John for being who you are. I love you!


Abbreviations

2D two-dimensional
3D three-dimensional
AC affinity chromatography
ANS 8-anilino-1-naphthalene-sulfonic acid
CD circular dichroism
COSY correlation spectroscopy
CSI chemical shift index
DCW dry cell weight
DNA deoxyribonucleic acid
EM electron microscopy
ESI-MS electrospray ionization mass spectrometry
FCS fluorescence correlation spectroscopy
FID free induction decay
GST glutathione-S-transferase
HIC hydrophobic interaction chromatography
HPLC high performance liquid chromatography
HSQC heteronuclear single quantum coherence
IEX ion exchange chromatography
IgG immunoglobulin G
IPTG isopropyl-β-D-thiogalactopyranoside
mRNA messenger ribonucleic acid
MD molecular dynamics
MRI magnetic resonance imaging
NMR nuclear magnetic resonance
NOE nuclear Overhauser effect
NOESY nuclear Overhauser effect spectroscopy
PAGE polyacrylamide gel electrophoresis
PCR polymerase chain reaction
rRNA ribosomal ribonucleic acid
rf radio frequency
SA simulated annealing
SD Shine-Dalgarno
SEC size exclusion chromatography
TOCSY total correlation spectroscopy
tRNA transfer ribonucleic acid


1 Background to this thesis

The living cell is a complex network of molecules that through regulated interactions create something that is more than the sum of its parts. Of these molecules, some have a structural function (cellulose, lipid membranes, some proteins), while others are carriers of the memory of the cell (DNA), and yet others are messengers and workhorses carrying out the essential biochemical processes in the cell (RNA and proteins). Even though many new functions of RNA have been identified in recent years, proteins are the molecules that dominate cellular biochemistry. They are also the most interesting from a structural point of view, as they are built up from twenty different amino acids whose side chains have various chemical properties and which form intricate three-dimensional structures, whereas DNA and RNA have only four building blocks that form predominantly double helix structures. Protein structures can be studied by methods like NMR spectroscopy, X-ray crystallography, neutron scattering, electron microscopy and X-ray scattering; of these, NMR spectroscopy and X-ray crystallography provide the highest resolution structural models. Structural studies of proteins require copious amounts of pure material, which is most often produced in the bacterium E. coli.

This thesis comprises the entire process from protein production to screening for secondary and tertiary structure content and structure determination of a protein by NMR spectroscopy. Aiming to provide tools for structural genomics research, we have studied the effect of fusion tags and expression vectors on protein production levels and solubility, designed a method for rapid purification of proteins and subsequent screening for structural content by CD and NMR spectroscopy, while we have determined the structure of a ribosomal protein using conventional NMR methods within traditional structural biology. In a way, this thesis reflects the transition – both mentally and experimentally – from traditional structural biology to structural genomics. Protein science has never before yielded statistics about protein production levels, effect of fusion tags on solubility and suitability of expression systems as it has in structural genomics. While statistics is far from being everything, in particular in protein science, it provides valuable and useful knowledge when setting up expression trials for unknown proteins.


2 Introduction

2.1 General introduction

Proteins are products of genes and consist of amino acid chains folded into globular structures. For a long time, the material of which genes are made, nucleic acids or more specifically DNA, was considered to be of too low complexity to encode versatile molecules such as proteins. By the 1960s, scientists had identified numerous proteins and come to the conclusion that no two of them looked alike. They varied widely in composition, size and structure, while DNA was composed of only four nucleotides, commonly described by single letters: adenine (A), guanine (G), cytosine (C) and thymine (T), with uracil (U) replacing thymine in RNA. How can a simple code like this encode proteins composed of twenty amino acids? For this, nature uses three-nucleotide ‘words’ or codons, each of which encodes one amino acid. Since there are 64 possible codons for twenty amino acids, some codons are redundant (figure 1.1). For instance, there are six codons each for the small, polar amino acid serine and for the large, positively charged amino acid arginine, but only two for the large aromatic and hydrophobic amino acid phenylalanine. This codon degeneracy reduces the effect of mutations, but it can also be a way to write the code for the same proteins differently in different organisms. As GC base pairs have a higher melting temperature than AT base pairs, organisms prefer different ratios of GC/AT depending on, for instance, their environment. It also appears that amino acids with many codons are relatively cheap for the cell to synthesize 1.

Figure 1.1 Triplet codons and their corresponding amino acids. The first vertical axis (1) gives the first base of the codon, the horizontal axis (2) gives the second base, etc. Ochre, amber and opal codons are termination codons and indicate the end of the translated region.
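The codon degeneracy described above can be checked with a short Python sketch. It builds the standard genetic code from its usual compact layout (bases in the order T, C, A, G), counts the codons per amino acid, and shows two synonymous DNA sequences with different GC content. The function names and the two example sequences are illustrative assumptions, not material from the papers.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the compact layout with bases ordered T, C, A, G:
# one letter per codon, '*' marks the three termination codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_degeneracy():
    """Return the number of codons that encode each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE.values())


def translate(dna):
    """Translate a DNA coding sequence codon by codon (no start/stop logic)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def gc_fraction(dna):
    """Return the fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in dna) / len(dna)


if __name__ == "__main__":
    counts = codon_degeneracy()
    print("Ser:", counts["S"], "Arg:", counts["R"], "Phe:", counts["F"])
    # Two hypothetical sequences that encode the same peptide (Ser-Arg-Phe)
    # with AT-rich and GC-rich synonymous codons, respectively.
    for seq in ("TCTAGATTT", "AGCCGGTTC"):
        print(seq, translate(seq), round(gc_fraction(seq), 2))
```

Both example sequences translate to the same three residues, which illustrates how an organism can shift its GC/AT ratio without changing the encoded protein.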

The conversion of genetic information into a protein is called expression. First, a copy of the DNA is made into messenger RNA, or mRNA, in a process called transcription. This is done by the enzyme RNA polymerase, which, in complex with other proteins, recognizes a DNA sequence in front of the gene. The mRNA contains a short sequence that is recognized by the ribosome, a large protein-RNA complex that subsequently translates the mRNA into a protein. After translation, proteins can be modified in various ways and transported to the place in or outside the cell where they perform their function, or they simply fold autonomously and take part in one of the many cellular processes that are carried out by proteins. Their versatility is intrinsic to their amino acid sequence. Amino acids are linked like beads on a string in a polypeptide chain, and in most proteins the chain folds into a compact globular structure. All proteins have different sequences of amino acids, also called their primary structure (figure 1.2).
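As a rough illustration of the two steps of expression, the following Python sketch transcribes a DNA coding strand into mRNA and translates it from the first AUG to the first stop codon. It is a deliberate simplification: it assumes the input is the coding (sense) strand, and it takes the first AUG as the start instead of modelling ribosome binding. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code for RNA codons, bases ordered U, C, A, G; '*' = stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
RNA_CODONS = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """Return the mRNA matching the DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG codon up to the first stop codon.

    Simplification: a real ribosome selects the start codon with the help of
    an upstream recognition sequence, which is not modelled here.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = RNA_CODONS[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Hypothetical gene fragment: short leader, ATG start, three codons, TAA stop.
    dna = "GGAGGTATGTTTTCTCGTTAAGC"
    mrna = transcribe(dna)
    print(mrna)
    print(translate(mrna))
```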

Figure 1.2 Structural levels in proteins. (A) The primary structure of a protein is its amino acid sequence; amino acids are represented by rounded rectangles in different colors for different amino acid types. (B) The secondary structure of the sequence in (A). (C) Tertiary structure is the position of secondary structure elements relative to one another (the helix in (B) is colored white while the remaining structure is in grey).

Sequences of amino acids tend to form ordered elements, i.e. secondary structure (figure 1.2): they can form either spiral (helical) or sheet-like structures, while other sections interconnect these secondary structure elements and have no specific structure (loop or random coil regions). Since some amino acids occur frequently in one type of secondary structure but not in another, it is fairly straightforward to predict the location of secondary structure elements in proteins, and many software packages exist that do this. Patterns of certain amino acids are also helpful determinants of secondary structure. The difficult part is the organization of the secondary structure elements relative to one another. This level of organization is called tertiary structure or three-dimensional structure (figure 1.2). A typical manner in which a set of secondary structure elements is organized into a tertiary structure is called a fold. The three-dimensional structure is hard to predict unless significant sequence homology reveals an evolutionary relationship, and most likely also a structural resemblance, to proteins with known structures. Knowledge of the three-dimensional structures of proteins often sheds light on their function and is essential for the design of drugs that are efficient, specific and safe. Apart from medical applications, investigating protein structures is interesting as long as we have not covered the entire protein structure universe. We believe that the number of folds, i.e. ‘fold space’, is limited, but the fact that we do not know its size is a driving force in itself. When all folds are presumed to be known, it will be interesting to explore protein fold families, discover the variations in these, and attempt to design new ones that nature does not use.
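The idea that residue preferences hint at secondary structure can be sketched with a toy sliding-window classifier in Python. This is not one of the prediction programs referred to above: the two residue classes are a coarse qualitative simplification assumed for illustration, and the function name, window size and test sequences are hypothetical.

```python
# Qualitative residue classes, assumed here purely for illustration.
HELIX_FORMERS = set("AELM")  # residues often found in alpha-helices
SHEET_FORMERS = set("VIY")   # residues often found in beta-strands


def predict_secondary_structure(sequence, window=5):
    """Label each residue H (helix), E (strand) or C (coil).

    Each residue is classified by a majority vote over a window centred on
    it; if neither class covers more than half of the window, it is coil.
    """
    half = window // 2
    labels = []
    for i in range(len(sequence)):
        win = sequence[max(0, i - half): i + half + 1]
        helix = sum(res in HELIX_FORMERS for res in win)
        sheet = sum(res in SHEET_FORMERS for res in win)
        if max(helix, sheet) * 2 <= len(win):
            labels.append("C")  # no class holds a majority in this window
        elif helix >= sheet:
            labels.append("H")
        else:
            labels.append("E")
    return "".join(labels)


if __name__ == "__main__":
    for seq in ("AELLEMLAALEE", "VIVYVIYVVI", "GPGSGPGS"):
        print(seq, predict_secondary_structure(seq))
```

Real predictors use statistically derived propensities or homology information; the sketch only shows why local composition carries a usable signal, while saying nothing about how the elements pack into a tertiary structure.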

2.2 Structural biology

Structural biology is the study of three-dimensional structures of molecules. A molecule cannot perform its function unless it can adopt its native structure. Life is thus based on the ability of molecules to adopt their structure correctly. Of all types of molecules, proteins are the most elusive in doing just this. In his Nobel Lecture of 1972 Christian Anfinsen described how he came to the conclusion that all information a protein needs to fold into its native conformation lies within the amino acid sequence of this protein, a rule which is often called Anfinsen’s dogma 2. Around the same time, Cyrus Levinthal came to the conclusion that a protein could not possibly sample all possible conformations before finding its native folded state, as this would take longer than the present age of the universe 3. This is the Levinthal paradox. The two findings combined suggested that protein folding must be a directed process. While extensive research has now led to a better understanding of protein folding, we are still not able to predict a protein structure from its sequence alone. Thus, we still depend on experimental methods to obtain detailed models of protein structures.
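The scale of the Levinthal paradox can be made concrete with a back-of-the-envelope calculation. The Python sketch below uses commonly quoted illustrative numbers that are assumptions, not values from the cited work: a chain of 100 residues, three conformations per residue, and 10^13 conformations sampled per second. The age of the universe is taken as roughly 1.38 x 10^10 years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate value, in years


def exhaustive_search_years(n_residues, states_per_residue=3,
                            samples_per_second=1e13):
    """Years needed to visit every conformation once, under the assumptions
    that residues are independent and sampling proceeds at a constant rate."""
    conformations = states_per_residue ** n_residues
    return conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_search_years(100)
    print(f"Exhaustive search: about {years:.1e} years")
    print(f"Ratio to age of universe: about {years / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the random search exceeds the age of the universe by many orders of magnitude, which is why folding is thought to follow directed pathways.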


Nuclear magnetic resonance (NMR) spectroscopy and X-ray crystallography are the two experimental methods available at present that are capable of producing high-resolution protein structures. A third technique, electron microscopy (EM), should theoretically be able to produce images at atomic scale, but as protein specimens scatter relatively weakly and are sensitive to electron beam damage, the resolution limit for EM is presently about 8 Å 4. However, electron crystallography has for a small number of exceptional proteins yielded near-atomic resolution structures 5; 6. Both NMR spectroscopy and X-ray crystallography need relatively large amounts of material, which in protein science corresponds to milligrams per milliliter of pure protein. For X-ray crystallography, proteins need to form well-diffracting crystals, which are still somewhat of an art to obtain, while NMR spectroscopy uses liquid samples. However, from the point of view of data processing and interpretation, determining a protein structure by X-ray crystallography is usually substantially less demanding and time-consuming than determining a solution NMR structure. Automated data analysis has been fully implemented in X-ray crystallography, while for NMR such software is mostly in the development stage.

The need for pure, high concentration protein samples makes structural biology an interdisciplinary field that covers molecular biology, microbial physiology, protein biochemistry and biophysics. One of the main bottlenecks in obtaining a well-behaved protein sample is expression and purification. Contrary to some other biological macromolecules, proteins are not straightforward to handle. The varying composition of proteins causes differences in charge, stability, solution requirements, need for cofactors, etc. These differences are functionally very important and interesting from an evolutionary point of view, but tend to cause problems in protein production and purification.

2.3 Structural genomics

2.3.1 The genome sequencing revolution

The terms genomics, structural and functional genomics, and proteomics describe new branches in molecular biology, protein biochemistry and structural biology that are typical for the new era of genome sequences. As with genome sequencing (termed genomics), which focuses on determining complete maps of the genomic DNA of entire organisms, the other ‘omics’ intend to provide the scientific world with the most complete map of the protein universe possible. This development represents a change of attitude in DNA and protein sciences, caused by the growing availability of DNA sequence information. Before sequences of entire genomes became available, databases such as GenBank (footnote 1) and SwissProt (footnote 2) contained the DNA and protein sequences of genes and proteins that had been annotated individually, through years of work. The first ‘organisms’ that had their small genomes analyzed to the last base were bacterial viruses or phages. The first genome sequenced was that of bacteriophage MS2, which was completed in 1976 7. The next year, Frederick Sanger and co-workers published the slightly larger 5,386-nucleotide genome of bacteriophage φX174, which is often considered the first completely sequenced genome 8. Sanger received his second Nobel Prize in chemistry in 1980 for the development of the DNA sequencing method that was used for deciphering the bacteriophage genome. He also developed so-called shotgun sequencing, which allowed for much faster sequencing, and used it to complete the 48 kb genome of bacteriophage λ five years later 9. The first bacterial genome to be sequenced was that of Haemophilus influenzae in 1995 10. The first eukaryote to have its genome sequenced was baker’s yeast, Saccharomyces cerevisiae 11, followed by the worm Caenorhabditis elegans in 1998 12 and the fruit fly Drosophila melanogaster 13. The human genome was published in 2001, amid enormous publicity, both by the International Human Genome Sequencing Consortium and by the company Celera, led by Craig Venter, and was shown to contain some 35,000 genes 14; 15. Before researchers started to grasp the sheer complexity of genomes, this number was somewhat of a disappointment, considering that earlier estimates were around 100,000 genes 16. After this landmark, Venter and co-workers took DNA sequencing a step further with their environmental genome shotgun sequencing of the Sargasso Sea 17.

2.3.2 Structural genomics

With the annotation of genes in the 206 genomes that had been sequenced as of July 2004 (footnote 3), a vast number of new, unknown genes has become available for study. A number of these genes can be classified into existing gene families with a certain structure or function, based on sequence homology to members of these families. Sequence homology tools, however, have only a limited reach when it comes to detecting homology, and it has long been known that the structure of the final gene product, the protein, is much more conserved in evolution than the sequence of its gene. This means that, presently, we have to look at proteins and their structures directly to detect more remote homologies and structural similarities. At the same time, more protein structures will lead to an improvement of homology-based structure prediction, and one of the objectives of structural genomics is to cover the entire universe of protein structures. A large number of consortia, focusing on solving novel protein structures, have been set up around the world. These consortia involve a large number of affiliated research groups, and information about target proteins, methods and project status is often shared. While different structural genomics projects target different organisms, they often select target proteins that have no homology to any known protein structures and in this way contribute to the expansion of the known protein universe. Among target organisms are pathogens (e.g. Haemophilus influenzae, Mycobacterium tuberculosis, Campylobacter jejuni, the group of Herpesviridae) and thermophiles like Thermotoga maritima 18 and Methanococcus jannaschii 19, while human proteins are targeted by many academic and commercial groups, especially potential new drug targets such as GPCRs (footnote 4) or kinases 20, and proteins that are related to diseases such as cancer 21 or are part of metabolic routes 22.

1 The NIH genetic sequence database: http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html
2 Databank of protein sequences: http://www.expasy.org/sprot/sprot-top.html
3 Genomes OnLine Database: http://www.genomesonline.org/

Since the emergence of structural genomics, the focus has shifted somewhat from ‘one technology for many proteins’ to the application of various techniques to a limited number of target proteins, using the skills in parallel processing, high throughput and automation that were developed in the early years of structural genomics (figure 2.1).

In ‘traditional’ structural genomics, a large test set of proteins is run through a limited number of expression and purification methods. Various independent approaches have shown that a relatively low percentage of eukaryotic proteins can be purified in soluble form from E. coli and are suitable for structural studies 23; 24; 25. Even if they can be produced, they are not always properly folded. One of the ideas of this approach is that some of the proteins that can be studied will represent new folds, thus contributing to the major objective of structural genomics. Furthermore, proteins that remained behind can be subjected to a second production phase, using for instance a different expression host. The number of expression hosts and of expression and purification conditions used by structural genomics groups has increased tremendously. This development has led to increased focus on fewer proteins, or on a specific group of proteins, with the knowledge of new screening techniques. This is especially popular in the pharmaceutical industry, as a limited group of proteins forms the majority of probable drug targets.

4 Membrane Protein Network: http://www.mepnet.org/

As the numbers of proteins and/or the expression and purification conditions often are high in structural genomics, the process from expression to structure determination needs to be accelerated. This has led to increased use of parallel cloning systems (Paragraph 2.4), the development of new detection methods for soluble protein 26; 27, the development of robots for cloning, purification, and crystallization, the production of higher throughput HPLC systems, purification kits, ready-made gels for DNA and protein detection, as well as the development of (semi)automatic assignment programs for NMR (see Chapter 4). An important step in protein production in structural biology is to determine at an early stage whether a protein is structured under the conditions used. Especially in NMR, screening proteins for structural content has proven to be fairly straightforward. Decent 15N-HSQC spectra have been obtained after subjecting cell lysate containing the recombinant protein to buffer exchange only 28, while addition of rifampicin to cell cultures has been used to reduce background signal and selectively label the recombinant protein 29. The work described in Paper II is in line with these studies and is discussed in Chapter 3.

Figure 2.1 Schematic illustration of the applications of recently developed parallel screening techniques in structural genomics. Left panel: application of a standard method to a large set of proteins, giving relatively low yields. Right panel: application of various methods to one target protein.


2.4 DNA cloning techniques

2.4.1 Traditional cloning

Without recombinant DNA technology and PCR, the Protein Data Bank (PDB) would not contain the 24,213 protein structures it contains today5. The first protein structures determined were myoglobin and hemoglobin, proteins readily available from animal sources 30; 31. However, cloning and heterologous expression (expression of a gene foreign to the organism it is expressed in) are currently the standard ways of producing protein for structural biology efforts.

Recombinant DNA technology may be said to have begun with the construction of the first recombinant organism by Cohen and coworkers in 1973 32. The authors realized the potential of their work: ‘It may be possible to introduce in E. coli, genes specifying metabolic or synthetic functions such as photosynthesis or antibiotic production indigenous to other biological classes’. This new technology called for general guidelines, which were issued for the first time by the National Institutes of Health in 1976 and are updated regularly.

At the end of the 1960s, the first bacterial restriction endonucleases were identified and isolated 33; 34. These enzymes were crucial for the development of recombinant DNA techniques, as they cleave DNA at specific recognition sites. Just as essential is ligase, an enzyme from T4 bacteriophage, which repairs nicks in double-stranded DNA and joins blunt or compatible cohesive DNA ends 35. Both ligase and restriction endonucleases are still generally used in molecular cloning. A traditional recombinant DNA cloning procedure can be as follows: source DNA containing the target DNA fragment is digested with restriction endonucleases; a circular cloning vector is linearized using restriction endonucleases; vector DNA and target DNA are joined by ligation; the recombinant DNA construct is introduced into the host cell; and host cells carrying the construct are selected by means of an antibiotic resistance gene present on the vector DNA. This procedure is still the standard cloning technique in most laboratories.
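The digestion and ligation steps of the procedure above can be illustrated with a minimal Python sketch. It assumes a single enzyme, EcoRI (recognition site GAATTC, cut after the G), and treats DNA as a linear top-strand string only; the function names and the example sequences are hypothetical and serve only as illustration.

```python
# Minimal sketch of restriction digestion and ligation on top-strand strings.
# Assumption: EcoRI, which recognizes GAATTC and cuts after the first base.
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # G^AATTC


def digest(sequence: str, site: str = ECORI_SITE,
           cut_offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut a linear sequence at every occurrence of the recognition site."""
    sequence = sequence.upper()
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


def ligate(vector_left: str, insert: str, vector_right: str) -> str:
    """Join a linearized vector and an insert with compatible ends."""
    return vector_left + insert + vector_right


# Hypothetical source DNA with the target fragment between two EcoRI sites.
source_fragments = digest("AAGAATTCTTTTGAATTCCC")
target = source_fragments[1]

# Hypothetical vector, shown as a linear string with one EcoRI site.
vector_left, vector_right = digest("CCGAATTCGG")

construct = ligate(vector_left, target, vector_right)
```

In this toy model the ligated construct again contains two intact EcoRI sites flanking the insert, which mirrors why a fragment cloned this way can be excised again with the same enzyme.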

5 PDB statistics as of 10 August 2004; http://www.rcsb.org


2.4.2 Alternative cloning techniques: GatewayTM Technology

In recent years, life science companies have developed new cloning techniques that allow for fast and parallel cloning of large numbers of genes. The three techniques that are currently available are GatewayTM Technology, TOPO TA Cloning® combined with the EchoTM cloning system (all from Invitrogen) and In-FusionTM (BD Biosciences). The GatewayTM, EchoTM and In-FusionTM technologies are based on site-specific recombination, while TOPO TA Cloning is based on the single adenine overhang created on PCR products by Taq polymerase. All cloning methods have their advantages and disadvantages; the new cloning methods were introduced to meet the demand of the scientific community for fast cloning methods especially suitable for cloning cDNA into expression vectors, while traditional cloning remains the most versatile method available to date.

As the GatewayTM Technology has been used in this work (papers I and II), its background and principles will be described here. Instead of DNA digestion and ligation it employs the site-specific recombination that is the basis of the integration of phage λ into the Escherichia coli genome. Phage λ is a bacterial virus that attaches to the cell surface and injects its DNA into the host cell. Following infection, the DNA can be integrated in the host genome by site-specific recombination as part of the lysogenic (quiescent) cycle. In this state, the phage is called a prophage, because it merely consists of a DNA sequence that has the capability of producing a phage. Stimulants like UV radiation or mutagenic chemicals can end this cycle and prompt the start of the lytic cycle, which is initiated by excision of the phage DNA from the host genome. The phage DNA is transcribed and takes over the cellular machinery of the host, resulting in the production of hundreds of phage particles and lysis of the host cell.

Phage λ recombination is characterized by extreme directionality of the recombination reaction, which is the result of the existence of two different pathways for integrative and excisive recombination 36. Integration occurs by circularization of the λ DNA followed by recombination of the ‘phage attachment’ or attP site with the corresponding bacterial site, the attB site, producing two hybrid sites, attL and attR, flanking the inserted DNA 37. The insertion event is mediated by the phage protein Int (integrase) and the bacterial helper proteins IHF (integration host factor) and Fis (factor for inversion stimulation). Excision needs an additional protein, excisionase or Xis. GatewayTM technology uses modified versions of the att recombination sites based on the studies of Bushman et al. 38. In order to clone a gene by GatewayTM technology, attB sites are added to the termini of the gene using PCR. The so-called BP reaction, in which Int, IHF, the attB PCR product and a donor vector containing attP sites are present, inserts the gene into the donor vector and creates an ‘entry clone’ (figure 2.2). Donor vectors contain a cassette with the ccdB gene for negative selection, which encodes a protein that interferes with DNA gyrase and is lethal for most E. coli strains 39. Recombination replaces the ccdB gene with the target gene, so that only clones containing the replaced fragment survive. The orientation of the inserted fragment is always correct due to att site mutations that cause att sites on the 5’ and 3’ ends to react only with other 5’ and 3’ att sites, respectively 40. The entry clone is the mother clone and is used in the LR reaction (in which attL and attR sites recombine) that creates expression clones. The entry clone is recombined with a destination vector containing the ccdB gene, using the enzymes Int, IHF and Xis. The advantage of the system is that a gene can easily be cloned into various destination vectors, which may contain different fusion tags or promoters, thus allowing for faster screening of optimal expression conditions for the gene of interest.

Figure 2.2 Schematic representation of the GatewayTM cloning reactions. (A) The BP reaction, producing the entry clone from an att site flanked PCR product and a donor vector. (B) The LR reaction, in which recombination of entry clone and destination vector yields an expression clone. A large number of expression clones can be generated in parallel, allowing for straightforward screening of, for instance, different fusion tags. All reactions yield a by-product containing the lethal ccdB gene. Figure adapted from the GatewayTM Technology manual.

2.5 Protein production and purification

2.5.1 Choice of host

When producing proteins for biochemical and biophysical studies, a host should be chosen that produces the protein in sufficient quantities within a reasonable culturing time and volume, in its native folded state and preferably active – although this is not always required. In practice, other factors such as cost of production and the knowledge and equipment available in the laboratory are equally important. The most widely used recombinant expression host is the bacterium E. coli, commonly referred to as the workhorse of molecular biology. The advantages of E. coli as an expression host are numerous. Its genetics, biochemistry and physiology are extremely well studied, which has resulted in a wide choice of promoter and regulatory sequences and the availability of specialized strains 41; 42; 43. It also has a short generation time (ca. 20 minutes) and can grow on minimal media, allowing for cost-effective isotope and selenomethionine labeling of proteins. It even grows in 2H2O, which is useful for certain NMR studies. Signal sequences at the N-terminus of the protein can direct it to the periplasm, where disulfide bonds can be formed, or to the outer membrane, where membrane proteins can be embedded. However, as bacteria are simpler in design than eukaryotic cells, they cannot perform posttranslational modifications more advanced than disulfide bond formation. Other types of modification, including phosphorylation, glycosylation, acetylation, fatty acid acylation, and amidation, are not native to bacteria and thus require different expression hosts or in vitro methods, if possible. The only eukaryotic host that approaches E. coli in its low requirements and ease of cultivation is yeast. Both Saccharomyces cerevisiae and Pichia pastoris are yeast species that are commonly used for heterologous protein expression. P. pastoris in particular is known for its ability to grow to extremely high cell densities and to produce grams of protein per liter 44. An advantage for NMR spectroscopy is that isotope labeling of recombinant proteins in yeast is possible without the need for the expensive media required for protein labeling in higher eukaryotes 45. Yeast cells efficiently secrete proteins into the culture medium, which is not possible in E. coli, and they perform many posttranslational modifications of higher eukaryotes, although glycosylation differs between yeast and higher eukaryotes 44. In recent years, insect cell expression systems, often based on baculovirus infection, have gained much popularity. Insect cells perform most eukaryotic posttranslational modifications, although glycosylation is not always complete or identical to mammalian glycosylation 46; 47; 48. While insect cell systems are widely used, most proteins intended for therapeutic use are produced in Chinese hamster ovary (CHO) cells or in E. coli 49. Mammalian cells require more time, skill and special growth conditions and do not always produce recombinant protein stably 50. Besides the expression systems mentioned here, several other systems are available, such as transgenic plants, cell-free expression systems, and milk from transgenic animals 49; 51; 52. All expression systems have their advantages and disadvantages, and factors such as yield of active protein, production time and relative effort, availability of material and skill, and importance of the target protein often determine whether or not an expression system is used for a specific protein.

2.5.2 Transcriptional and translational regulation in E. coli

In order for a gene to be transcribed, RNA polymerase binds to the promoter, a recognition element upstream of the gene. For recombinant gene expression, strong promoters leading to high protein levels are usually preferred. At the same time, the cells benefit from little or no basal transcription of the recombinant gene, as its product can be toxic to the cell or impair growth through the energy it requires of its host cell. It is a further requirement that promoters can be induced efficiently and at low cost. Common E. coli promoters are summarized in a review by Makrides 41. To ensure tight regulation of expression, the promoter is often under the control of a regulatory gene product such as the lac repressor, whose gene can be present on the plasmid or in the E. coli genome. In the case of the T7-lac promoter-operator, a widely used promoter for recombinant overexpression, transcription is impaired as long as no inducer is added to the culture. A common E. coli expression strain, BL21(DE3), contains a lysogen of bacteriophage DE3 (a derivative of phage λ) in its genome. The DE3 lysogen contains the gene for T7 RNA polymerase under control of the lacUV5 promoter, which is inducible with isopropyl-β-D-thiogalactopyranoside (IPTG) 53. Addition of IPTG to the culture initiates T7 RNA polymerase production, which is followed by transcription of the target gene on the plasmid.

Termination of transcription is regulated by a transcriptional terminator sequence downstream of the gene. Often, this region contains some sequence symmetry resulting in a hairpin or stem-loop structure 41. Transcriptional termination is of importance for several reasons. Transcription over a promoter may cause promoter occlusion, i.e. deactivation of the promoter 54. Furthermore, transcription from a strong promoter may destabilize plasmids if no transcriptional terminator is present, as transcriptional readthrough could lead to overproduction of Rop, a protein that negatively regulates plasmid copy number 55. Its gene is located in the origin of replication and can be transcribed if no terminator is present.

Translation of mRNA into protein is a complex process carried out by the ribosome, tRNAs and various cofactors. In most cases, the translation initiation codon of E. coli mRNAs is AUG, encoding methionine. The ribosome, which will be further discussed in Chapter 4, binds to the Shine-Dalgarno (SD) site on the mRNA before translation is initiated. The SD site lies upstream of the translation initiation codon AUG and contains a sequence complementary to the 3´ end of 16S rRNA 56; 57; 58. The distance between the SD site and the initiation codon, the nucleotides in the vicinity of the initiation codon, and mRNA secondary structure all affect the translation efficiency. Sequences that can function as translational enhancers have been identified both upstream and downstream of the start codon, while an AAA codon (encoding lysine) following the initiation codon results in a high translation efficiency 41; 59; 60; 61. Optimizing sequences around the SD site may result in higher protein production levels, as was recently shown by Zhelyabovskaya and coworkers 62. Apart from sequences around the translation start, codon usage throughout the heterologous gene can affect translation rates significantly. In E. coli, codon usage is a form of expression regulation: genes that are expressed to high levels contain codons that occur frequently (major codons), while minor or rare codons can be found in genes that are expressed to low levels 63. Only low translation rates will be achieved if the foreign gene contains codons for which few tRNAs are made in E. coli. A similar problem arises when a foreign gene has an amino acid composition that does not agree well with that of a typical E. coli protein. In order to increase the translation efficiency of such genes, commercially available expression strains exist that contain plasmids with the genes for rare codon tRNAs.
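Whether a heterologous gene is likely to suffer from rare codons can be checked before cloning. The following Python sketch splits an open reading frame into codons and reports those that belong to a set of codons assumed to be rare in E. coli. The set used here (AGG, AGA, CGA, CUA, AUA, CCC, written in DNA letters) is an illustrative assumption and not a definitive list, and the function names and example sequence are hypothetical.

```python
# Illustrative assumption: a small set of codons commonly regarded as rare in E. coli.
RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}


def split_codons(orf: str) -> list[str]:
    """Split an in-frame coding sequence (DNA or RNA letters) into codons."""
    orf = orf.upper().replace("U", "T")
    if len(orf) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def rare_codon_report(orf: str, rare: set[str] = RARE_CODONS) -> dict:
    """Return the 1-based positions and the fraction of rare codons in the ORF."""
    codons = split_codons(orf)
    positions = [i + 1 for i, codon in enumerate(codons) if codon in rare]
    return {
        "n_codons": len(codons),
        "rare_positions": positions,
        "rare_fraction": len(positions) / len(codons),
    }


# Hypothetical six-codon ORF: ATG AAA AGG CTA GGT TAA
report = rare_codon_report("ATGAAAAGGCTAGGTTAA")
```

A high rare codon fraction, or a cluster of rare codons close to the start codon, would be an argument for trying an expression strain that supplies the corresponding tRNAs.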


Apart from transcription and translation rates, mRNA and protein stability are determining factors for the amount of recombinant protein that is produced. Both mRNA and protein degradation are important regulatory mechanisms in all organisms. Stem-loop structures at both ends of mRNA molecules have been shown to increase resistance against mRNA degradation, while the nature of the N-terminal amino acid in proteins affects protein half-life 41; 64.

2.5.3 Fusion tags

A common problem with protein overproduction in the E. coli cytoplasm is the accumulation of insoluble protein in inclusion bodies. While inclusion bodies can be regarded as an initial purification step, not all proteins are easily refolded. Soluble recombinant protein is thus preferred. Traditionally, recombinant proteins were separated from E. coli proteins based on their size, charge and surface characteristics using different chromatography techniques and ammonium sulfate precipitation. While these techniques are still commonly applied, it has become increasingly popular to genetically attach peptide or protein tags to the protein of interest, which can function as purification tags, solubilizing tags, or a combination of both. In this thesis, the term fusion tag is used for the sequence that is attached to the protein of interest, while fusion protein describes the two fused proteins forming one chimeric protein.

The concept of combining the sequences of two genes into one, yielding a single fusion protein product with two functional domains, is not self-evident. The usefulness of gene fusion originally emerged from studies of gene expression and protein relocation using β-galactosidase as a reporter gene 65; 66; 67. In 1983, Uhlén and coworkers described the first use of a fusion protein, in this case staphylococcal protein A, for purification purposes 68. Soon after this, the arginine tag was developed, intended purely for protein purification with cation exchange chromatography 69. The large variety of fusion tags and proteins currently available is described in a number of reviews 41; 70; 71. For structural biology and structural genomics, short purification tags, solubilizing fusion tags or a combination of these are frequently used. Different fusion tags and their characteristics are further discussed in Chapter 3.


2.5.4 Protein purification for NMR spectroscopy

In recent years, the focus on method development for structural genomics has resulted in fast screening protocols for the behavior and the structural state of proteins. It was found, for instance, that with little or no purification the tertiary structure of a protein could be screened using a simple NMR experiment, the 15N-HSQC 28; 72; 73. This finding represented a change of attitude in the NMR spectroscopy field, in which it had been common to produce protein in large quantities and to high purity. The ability to screen at an early stage allows for optimization of experimental conditions for a specific protein without destroying an expensive sample intended for structural studies, while it also aids in deciding whether other production or purification strategies should be attempted. Chapter 3 treats fast purification and screening methods in greater detail.

High-resolution NMR spectroscopy for protein structure determination benefits greatly from pure, concentrated, stable protein samples at a pH below seven. Depending on the expression level of the protein, one to several liters of bacterial culture is generally grown in rich medium or in minimal medium (for labeling with 15N or 13C). After collection of the cells by centrifugation, the cells are lysed and an initial purification step is performed. These days, most protein purification protocols include one or more chromatography steps. Different types of chromatography target different properties of proteins. Examples of different chromatography techniques are:

• Affinity chromatography (AC): relies on the biological function of the protein of interest; this is generally used in fusion tag-based purification with commercially available affinity matrices.

• Ion exchange chromatography (IEX): based on the charge of the protein of interest.

• Hydrophobic interaction chromatography (HIC): makes use of the hydrophobic properties of the protein of interest.

• Gel filtration or size exclusion chromatography (SEC): makes use of the size differences between proteins.

The first purification step – the ‘capturing’ step – is most frequently AC, as this is very specific for the fusion tag that is used. With AC, around 90-95% purity is often obtained. IEX is also used as an initial purification step. IEX, AC and HIC are also good methods for intermediate purification. SEC is generally used as a polishing method. The number of chromatography steps is usually two or three, mostly depending on the purity of the protein after the first step, and has to be considered carefully. Too few purification steps can result in an impure protein sample, while too many steps cause a significant loss of protein. Following purification, the buffer is exchanged for one suitable for NMR studies, i.e. with a salt concentration below 200 mM (lower concentrations are optimal), a pH below seven, and no other molecules whose NMR spectrum overlaps with the protein spectrum, such as glycerol or Tris. To the sample, 10% 2H2O is added, which provides the lock signal used to stabilize the magnetic field around the sample.
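The trade-off between purity and loss of protein can be made concrete with a small calculation: if each chromatography step recovers only a fraction of the protein loaded, the overall recovery is the product of the step recoveries. The Python sketch below assumes purely hypothetical step recoveries; the function name and the numbers are illustrative and not taken from this work.

```python
def cumulative_recovery(step_recoveries: list[float]) -> float:
    """Overall fraction of protein recovered after a series of purification steps."""
    total = 1.0
    for recovery in step_recoveries:
        if not 0.0 <= recovery <= 1.0:
            raise ValueError("each step recovery must lie between 0 and 1")
        total *= recovery
    return total


# Hypothetical recoveries for a capture step (AC), an intermediate step (IEX)
# and a polishing step (SEC).
two_steps = cumulative_recovery([0.7, 0.7])
three_steps = cumulative_recovery([0.7, 0.7, 0.7])
```

With these assumed values, adding the third step lowers the overall recovery from about one half to about one third, which illustrates why the number of steps has to be weighed against the purity that is required.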

2.6 Introduction to NMR spectroscopy

NMR spectroscopy concerns the interaction of nuclei with electromagnetic radiation. It is performed in the radio frequency (rf) range of electromagnetic radiation, which is radiation of very low energy (figure 2.3). This has two direct implications: NMR is a relatively insensitive spectroscopy technique that needs concentrated protein solutions, and the radiation neither damages the sample nor changes the properties of the molecules in solution. This latter property is particularly important in magnetic resonance imaging (MRI), an NMR technique used in medicine that is not detrimental to the patient, contrary to, for instance, X-ray techniques.

The atomic property that NMR affects is nuclear spin. Spin is as intrinsic to nuclei as other properties such as mass and charge. The spin of a nucleus is defined by its spin quantum number, I, which can have different values. If I of a nucleus equals zero, this nucleus is inactive in NMR. Nuclei with I = ½ are the most frequently used nuclei in NMR, whereas nuclei with higher spin quantum numbers are NMR-active but not as useful. Examples of nuclei with I = ½ are 1H, 15N, 13C, and 31P. While 1H is the most frequently occurring hydrogen isotope in nature, the natural abundance of 15N and 13C is very low. Most organic molecules, including proteins, consist mostly of carbon, nitrogen, hydrogen and oxygen. Although hydrogen-selective experiments are numerous, the ability to include the nitrogen and carbon atoms of proteins in NMR experiments by isotope labeling has increased the size limit of proteins that can be studied by NMR. The use of isotope labeling in structure determination is further discussed in Chapter 4.

Figure 2.3 The spectrum of electromagnetic radiation from low frequency (indicated in Hz below the types of radiation), long wavelength and low energy to high frequency, short wavelength and high energy radiation. A grey shade indicates the frequency range in which NMR spectroscopy works. In this area, wavelengths are approximately 1-100 m. Visible light is indicated by a gradient-colored box.

In NMR spectroscopy, protein samples are placed in a strong magnetic field that makes the nuclear spins align in a parallel or an antiparallel fashion to the field. The parallel state has a slightly lower energy level than the antiparallel state, resulting in a minor overpopulation of the parallel state, even though the energy difference between the two states is very small (figure 2.4). This population difference depends on the magnetic field strength and the temperature and is dictated by the Boltzmann distribution. In a magnetic field of 14.1 Tesla (corresponding to a basic 1H resonance frequency of 600 MHz), only about five out of a hundred thousand nuclei are in excess in the parallel state at room temperature. The signal that is measured in the NMR experiment derives from this small population of nuclei, as the remaining signals cancel out. The sample is located in a probe where rf pulses can be emitted and received. A pulse with a suitable duration and strength disturbs the equilibrium of the population of nuclei, which subsequently return to equilibrium. The magnetization of the sample during that time induces an electric current in a detection coil, resulting in an oscillating signal that decays over time. This signal is called the free induction decay (FID). A mathematical tool, the Fourier transform, is then used to transform the time-dependent signal into a frequency-dependent signal. The outcome is a spectrum in which the peaks are separated by resonance frequency.
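The size of the population excess can be estimated directly from the Boltzmann distribution. For a spin ½ nucleus the energy difference between the two states is ΔE = hν, where ν is the resonance frequency, and the fractional excess of the lower state is (N_parallel − N_antiparallel)/(N_parallel + N_antiparallel) = tanh(ΔE/2kT). The Python sketch below evaluates this expression; the function name is hypothetical and 298 K is assumed for room temperature.

```python
import math

PLANCK = 6.62607015e-34     # Planck constant in J s
BOLTZMANN = 1.380649e-23    # Boltzmann constant in J/K


def excess_fraction(frequency_hz: float, temperature_k: float) -> float:
    """Fractional excess of spins in the lower-energy state for a spin-1/2 nucleus."""
    delta_e = PLANCK * frequency_hz  # energy gap between the two spin states
    return math.tanh(delta_e / (2.0 * BOLTZMANN * temperature_k))


# 1H at 600 MHz (14.1 Tesla), assuming room temperature is 298 K.
excess_600 = excess_fraction(600e6, 298.0)
# For comparison: 1H at 400 MHz at the same temperature.
excess_400 = excess_fraction(400e6, 298.0)
```

Under these assumptions the excess is of the order of a few nuclei per hundred thousand, and it scales almost linearly with the resonance frequency, which is one reason why higher magnetic fields give more sensitive spectra.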

Figure 2.4 Spin energy levels of a spin ½ nucleus in a magnetic field (B0). There are two values for the magnetic quantum number m, ½ and -½, corresponding to the parallel and the antiparallel states, respectively (assuming an upward, vertical magnetic field). Their energy levels are separated by an amount of energy that is dependent on magnetic field strength.


The difference in resonance frequency – called the chemical shift – between nuclei is one of the cornerstones of NMR spectroscopy. The chemical shift of a nucleus is measured relative to a reference compound and is dependent on differences in chemical environment and on three-dimensional structure. For instance, in a random coil structure, hydrogen bound to a nitrogen atom has a specific, narrow chemical shift range that differs from that of hydrogen bound to carbon. In a folded protein structure, however, hydrogens bound to nitrogen atoms will have a wider chemical shift range (i.e. increased dispersion) due to differences in chemical environment and in secondary and tertiary structure, but they can still be distinguished from carbon-bound hydrogens (figure 2.5). As a first step in protein structure determination, all resonances are ‘assigned’, i.e. the identity of the amino acid and of the atom within this amino acid giving rise to a resonance is elucidated. This is essential in the structure determination methods used today, which are further discussed in Chapter 4.

Figure 2.5 Approximate chemical shift ranges of various hydrogens in a protein, drawn as lines in a 1D 1H spectrum of ribosomal protein L18 of Thermus thermophilus obtained at 600 MHz (14.1 Tesla) at 30ºC. Adapted from reference 132. Chemical shift often has the unit ppm. This is a magnetic field-independent unit, as opposed to Hz: at 9.4 Tesla 1 ppm equals 400 Hz, while at 14.1 Tesla it is 600 Hz.
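The relation between chemical shift in ppm and frequency difference in Hz follows from the definition of ppm as parts per million of the basic resonance frequency of the spectrometer. A minimal Python sketch, with hypothetical function names, is given below; the 1H frequencies of 400 and 600 MHz correspond to the two field strengths mentioned in the caption of figure 2.5.

```python
def ppm_to_hz(shift_ppm: float, spectrometer_mhz: float) -> float:
    """Convert a chemical shift difference in ppm to Hz.

    One ppm is one millionth of the basic resonance frequency, so a
    spectrometer frequency given in MHz multiplies directly into Hz.
    """
    return shift_ppm * spectrometer_mhz


def hz_to_ppm(shift_hz: float, spectrometer_mhz: float) -> float:
    """Convert a frequency difference in Hz to a chemical shift difference in ppm."""
    return shift_hz / spectrometer_mhz


# The same 0.05 ppm separation between two peaks on two spectrometers.
separation_400 = ppm_to_hz(0.05, 400.0)
separation_600 = ppm_to_hz(0.05, 600.0)
```

Two resonances that differ by a fixed number of ppm are thus separated by more Hz at higher field, which contributes to the better resolution of spectra recorded on high-field instruments.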

The size of proteins that can be studied by NMR is limited to approximately 30 kDa. Large proteins or protein complexes rotate more slowly in solution than small proteins, resulting in broadened lines and poor resolution. Another problem with larger molecules is crowding of the spectra and overlapping signals, which cause problems in the assignment procedure. The size limit has been expanded by various approaches such as uniform or selective isotope labeling, an increase of the dimensionality of NMR spectra to three or four dimensions, and the TROSY technique developed by Wüthrich and coworkers 74. Using solid state NMR it is also possible to study larger structures such as fibers, but this technique often uses specific labeling and has currently not reached the resolution of solution NMR 75; 76. Nevertheless, considerable progress has been made with solid state NMR in recent years, of which the publication of the structure of a microcrystalline SH3 domain is a good example 77.


3 Fusion tags: facilitation of purification and enhancement of protein solubility

3.1 Fusion tags for solubility and purification

As was mentioned in Chapter 2, the initial use of fusion tags was confined to purification, detection and localization of proteins. It was also mentioned that while they are still widely used for these purposes, some fusion tags have been found to enhance solubility of the protein to which they are fused. A few solubility enhancing fusion tags can also be used for protein purification using affinity chromatography (AC). Of all fusion tags, the His tag is the most widely used. It consists of a stretch of six or more histidine residues, and binds strongly to the Ni2+ or Co2+ ions of immobilized metal ion affinity chromatography (IMAC) resin. Because the binding is strong and independent of peptide conformation, purification using a His tag is relatively insensitive to buffer variations, and proteins can be purified even under strongly denaturing conditions (6 M guanidine hydrochloride or 8 M urea). Another advantage is that, due to its small size, the His tag does not always need to be proteolytically removed. Larger fusion tags often do need to be removed, especially for NMR spectroscopy, as their resonances appear in the NMR spectrum alongside those of the target protein and crowd it undesirably. Many affinity purification systems (excluding the IMAC system) make use of protein-ligand interactions that require both the fusion protein and the ligand on the affinity matrix to adopt their native structures. Examples of such systems are the glutathione-S-transferase (GST) fusion binding to glutathione sepharose, and the fusions Z and Gb1, both binding to IgG sepharose. While these systems are more sensitive to buffer conditions, they are highly specific and protein purified with these systems is usually quite pure. Fusions with good solubility enhancement qualities that do not work (well) as purification tags are often used in tandem with smaller tags such as His tags.

This chapter, containing results from Papers I and II, will discuss both the use of fusion tags in affinity purification methods and the effect of some fusion tags on protein solubility. A general problem in protein production is that there are no
universal solutions. As a result, fusion tags gain good or bad reputations from their success rate with one or a few proteins. The popularity of structural genomics and the consequent hunt for soluble protein has resulted in more systematic studies of the success rate of different fusion tags, but as the results in Paper I show, drawing general conclusions for a specific fusion tag can be complicated by the large variety in expression vectors that are used. In Paper I, we have attempted to overcome this problem by using vectors with different features and sequences surrounding the fusion tag. The results of this study are discussed in Section 3.2. Paper II describes the use of three different fusions for parallel purification of proteins intended for biophysical characterization. These results are discussed in Section 3.3.

3.2 Effect of His tags on protein solubility

3.2.1 Background and experimental setup

As stated more generally above, systematic studies of the influence of His tags on protein solubility regardless of vector and target protein are scarce. In a study by Busso and coworkers, twenty-four proteins were expressed in vitro with both N- and C-terminal His tags 78. Their results indicate that His tag placement affects solubility for some target proteins. These effects varied per target protein, but a significantly better overall performance was observed for N-terminal His tags. Further examples in the literature concerning His-tagged proteins and the influence of the His tag on production and solubility mostly involve studies of one or a few proteins. In these cases, the His tag had no or a small negative effect on protein expression and solubility 79; 80; 81; 82. In the study described in Paper I, we expressed twenty human proteins with four controls using four different expression vectors for production of His-tagged protein. A schematic representation and the sequence of the expression cassettes of the vectors are shown in figure 3.1. Three of these vectors, i.e. pDEST17, pTH19, and pTH26, contain an N-His tag sequence, while one vector, pTH24, contains a C-His tag sequence. Vector pTH26 has a solubilizing fusion tag, the 434 repressor DNA binding domain, on the N-terminal side of the His tag. Throughout this section, the vectors will be described with their name and the position of their His tags: pDEST17N-His, pTH19N-His, pTH24C-His, pTH24nat (native; by cloning a gene with a stop codon into this vector, the protein will not contain a His tag) and
pTH26434N-His. The main differences between the vectors are the presence or absence of a lac repressor gene and lac operator sequence, and the expressed sequence surrounding the hexahistidine sequence.

3.2.2 His tags affect protein solubility

Figure 3.2 shows that while expression without a His tag generally results in the highest ratio of soluble protein, expression in pTH26434N-His gives the next best soluble/total protein ratios. As none of the other constructs were as soluble, the results suggest that His tags affect protein solubility negatively, which may be

Figure 3.1 Schematic representation and sequence alignment of the expression vectors used in this study. In the schematic drawings of the vectors a number of features are indicated: the T7 or T7lac promoter as a black line; the tag sequence in dark grey (DNA and protein sequences are given in the alignment below), the Gateway cassette in white, the T7 terminator as a black line, the ampicillin resistance gene in black, the pUC and pBR322 origins of replication in grey, and the lacI gene in light grey. The expression cassettes in the sequence alignment below contain the DNA and translated protein sequence from promoter to stop codon. In the alignment, important elements are indicated by boxed text, while the translated amino acid sequence is given above the DNA sequence. The Gateway cassette is exchanged for a target gene by recombination to yield the expression clone.

counteracted by the solubilizing 434 repressor DBD. These results are in line with a study on the solubilizing effect of maltose binding protein by Kapust and Waugh 83. Other studies have generally compared solubilizing fusions to His tags, and in these cases more protein was present in the soluble fraction with solubilizing fusions than with His tags 25; 84; 85. However, figure 3.2a shows that the fraction of soluble protein may vary considerably depending on the His tag construct used. Two of these studies used the pDEST17N-His vector 25; 85, which did not perform well in the study described in Paper I. The low solubility of pDEST17N-His-derived His-tagged proteins may in part be caused by the sequence N-terminal to the His tag sequence (figure 3.1; Hammarström et al., in preparation). It is possible that MSYY-H6 (pDEST17N-His) makes the protein less soluble than MGSS-H6 (pTH19N-His). However, it also cannot be ruled out that these sequences are of importance for translational efficiency and/or protein half-life.

3.2.3 Cell growth rate determinant for protein yields

In order to determine whether high expression levels, as judged from SDS gels, were the effect of high cell densities or high protein production levels per cell, total protein production per dry cell weight (DCW) and per culture volume was determined (figure 3.3a). The diagrams in this figure show that the highest average protein levels per culture volume are obtained with the pTH24 vector (both with pTH24C-His and pTH24nat). Per DCW, however, there are only minor differences between the average expression levels, indicating that cell growth is the determinant factor for recombinant

Figure 3.2 Average solubility and relative performance of vectors in producing soluble protein. The column diagram shows the average soluble per total protein for all target proteins in each expression construct. Proteins insoluble in all constructs (45%, see pie diagram) are included in the column diagram. The pie diagram shows the percentages of target proteins for which certain vectors yielded the highest soluble/total protein ratios. ‘Var’: for three (=13%) of the target proteins more than one vector was best.

protein yield in these cultures. Average growth levels of the cells carrying the four different vectors were obtained by measuring the growth curves for each vector – target combination and averaging these curves (excluding pTH24nat). The results show that pTH24C-His and pTH19N-His had the highest and second highest average growth rates (figure 3.3b). As these two vectors are very similar apart from the position and the sequence surrounding the His tags, the large difference in growth rate was puzzling. Moreover, cells containing pTH24C-His constructs displayed very tight regulation of expression before addition of IPTG, while cells containing pTH19N-His showed variation in growth rate that was dependent on the target gene (not shown), suggesting that a deleterious mutation in the lac operator or lac repressor sequences could have partially destroyed gene expression regulation in pTH19N-His. Sequence analysis of the lac repressor gene of pTH19N-His revealed a deletion of one base (G1056) in the C-terminal tetramerization helix of the lac repressor, resulting in a frameshift that alters this region. Interestingly, the lac operator in these vectors consists of one palindrome, which binds to a lac repressor dimer. However, it has been shown that a lac repressor tetramer binds more rapidly to a single palindromic operator than a dimer mutant 86, which could explain the diminished effectiveness of the pTH19N-His lac repressor mutant.

Figure 3.3 Average total expression levels and average culture growth per vector. (A) Average total expression levels and relative performance of vectors. The column diagram shows the average total expression levels for each vector per culture volume and per DCW, as estimated from SDS gels. The pie diagram shows the percentage of target proteins for which certain vectors gave highest expression levels. In the area indicated by ‘var’, a number of different vectors performed best. (B) Average growth curves for expression cultures of all target proteins per vector. Cell densities for pTH24nat were not measured. At t = 2 hours, expression was induced by adding IPTG. The color differences of the curves represent the presence (black) or absence (grey) of a lac operator and repressor in the vector sequence.

Large variations in cell growth were observed for different vector – target combinations (not shown). In some cases, target proteins had the same effect on cell growth regardless of the vector that was used for expression. It is possible that these proteins somehow interfere with the biochemistry of the host cell. We found no correlation between up or down regulation of growth and production level, solubility or tertiary structure content of the target proteins.

Upon purification of the His-tagged proteins under physiological and denaturing conditions, highest yields were obtained with pTH24C-His and pTH19N-His for

Figure 3.4 Average yields of soluble and total protein per vector. The color-coding of the column diagrams is identical to that in figure 3.3. Columns represent protein yields measured by Bradford analysis (left axis) while the scatter diagram displays protein yields as measured on SDS gel (right axis). Upper panel: average yield of purified soluble protein for each vector and relative performance of vectors. Lower panel: average yield of total protein for each vector and relative performance of vectors. Proteins that could not be purified were included in calculation of the average. The pie diagrams show the percentage of proteins for which different vectors were best, as determined from Bradford data or SDS gel.

physiological conditions (purifying only soluble protein) and pTH24C-His for denaturing conditions. The purification results per vector averaged for all proteins are summarized in figure 3.4. As the pie diagrams show, half of the target proteins could be purified under physiological conditions. Of these proteins, most were under 20 kDa. Only two of the larger proteins were produced in soluble form by the bacteria. This size barrier is a common problem in structural genomics, but it may be circumvented by expressing domains instead of full-length proteins 25; 87. The relatively good yields of pTH24C-His constructs in soluble protein purification appear to contrast with the generally low soluble fractions of these constructs, but cell growth rates (and thus the absolute amounts of produced protein) are high for this vector, which is the determining factor for protein yield.

3.2.4 Conclusions

This study shows that high soluble/total protein ratios do not necessarily result in high protein yields after purification. High protein production levels combined with high host cell growth levels with the vector encoding C-His tag proteins made it the preferred vector for both soluble and total protein purification, while good yields were also obtained with one of the vectors encoding N-His tag proteins. Furthermore, we find that the His tags have a negative effect on protein solubility. Our data suggest that our N-His tags have a smaller negative effect on solubility than our C-His tag. The solubilizing 434 repressor DBD fusion tag was able to reverse the negative effect of the His tag.
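The first conclusion, that a high soluble/total ratio does not guarantee a high yield, follows from treating the soluble yield per culture volume as the product of cell mass, production per cell mass and soluble fraction. The sketch below illustrates this under stated assumptions: the class, its field names and all numbers are hypothetical and chosen only to show the effect; they are not data from Paper I.

```python
from dataclasses import dataclass


@dataclass
class Construct:
    name: str
    dcw_g_per_l: float        # dry cell weight reached per litre of culture
    protein_mg_per_g: float   # recombinant protein produced per gram DCW
    soluble_fraction: float   # soluble / total protein ratio (0..1)

    def total_mg_per_l(self) -> float:
        """Total recombinant protein per litre of culture."""
        return self.dcw_g_per_l * self.protein_mg_per_g

    def soluble_mg_per_l(self) -> float:
        """Soluble recombinant protein per litre of culture."""
        return self.total_mg_per_l() * self.soluble_fraction


# Hypothetical values, for illustration only.
constructs = [
    Construct("fast-growing, less soluble", 2.0, 50.0, 0.30),
    Construct("slow-growing, more soluble", 0.8, 50.0, 0.60),
]

for c in constructs:
    print(f"{c.name}: {c.soluble_mg_per_l():.0f} mg soluble protein per litre")

best = max(constructs, key=Construct.soluble_mg_per_l)
print("highest soluble yield:", best.name)
```

With equal production per DCW, the construct with the lower soluble fraction still gives more soluble protein per litre because the culture grows to a higher density.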

3.3 Fast purification and biophysical characterization of proteins

3.3.1 Techniques for determination of protein structure content

There are many biochemical and biophysical techniques that are sensitive to and give information about the secondary and/or tertiary structure of proteins. Paper II makes use of a combination of circular dichroism (CD) spectroscopy, NMR spectroscopy, and 8-anilino-1-naphthalenesulfonic acid (ANS) binding and fluorescence, all of which are discussed below. There are, however, many more useful techniques, each with
their own strengths and requirements (see footnote 6). Analytical gel filtration, for instance, gives information about the degree of self-association and the shape of the protein molecule: compact globular proteins enter the pores of the gel and therefore move more slowly through it than elongated or aggregated proteins, which pass only through the larger spaces. Analytical ultracentrifugation is also useful in determining if any self-aggregation takes place or whether the molecule has an unusual shape. Recent progress in fluorescence correlation spectroscopy (FCS) has made it possible to study protein shape, binding and conformational change on the single molecule level 89; 90. For disulfide-bonded proteins, the formation of intermolecular disulfide bonds can be studied by gel filtration or native polyacrylamide gel electrophoresis (PAGE). New developments in mass spectrometry, such as soft ESI-MS, allow for detection of weak protein-ligand complexes 91. Dynamic light scattering gives information about the diffusion rate (and thus the size) of molecules in solution, while small angle X-ray scattering can measure the dimensions of macromolecules and can thus be used to study their tertiary and quaternary structure 92; 93. Internal mobility in protein structure can be studied by hydrogen exchange, which can be monitored by NMR spectroscopy or neutron diffraction. Fluorescence quenching of aromatic groups in the protein can be indicative of their degree of rotational freedom (and thus their position in the protein) and of the structural state of the protein 92. Infrared and Raman spectroscopy are sensitive to backbone conformation and give information about secondary structure.

3.3.1.1 CD spectroscopy

Almost all molecules synthesized by living cells are chiral or optically active. The interaction of chiral molecules with polarized light forms the basis of CD spectroscopy. The optical activity of small molecules is determined by covalent structure, but for proteins it is the molecular conformation that is decisive.

Light is composed of oscillating electric and magnetic waves, which are perpendicular to each other (figure 3.5). Unpolarized light, such as sunlight, has electric components in all directions perpendicular to the direction of propagation. In linearly polarized light, the electric component is confined to a plane (the amplitude thus varies with oscillation), while in circularly polarized light the amplitude remains constant while the direction changes, either in a left-handed or a right-handed mode (figure 3.5). Linearly polarized light can be viewed as the sum of opposite (right and left) circularly polarized light of equal amplitude and phase. In CD spectroscopy, right and left-handed circularly polarized light is sent through the sample. Amino acids in different secondary structure elements absorb right and left-handed circularly polarized light beams differently (this is called dichroism), which causes a difference in amplitude between the left and right-handed circularly polarized light. Because of this, the projection of the resulting amplitudes of the right and left-handed circularly polarized light will yield an ellipse instead of a line.

Footnote 6: Unless other references are indicated, most of the information in this section can be found in reference 88: Creighton, T. E. (1992). Proteins: structures and molecular properties. 2nd edit, W.H. Freeman & Company, New York, NY.

Ellipticity is used as the general unit in CD spectroscopy. It is the angle whose tangent equals the ratio of the minor to the major axis of the ellipse. CD spectroscopy is usually performed using near (250-300 nm) and far-UV light (below 230 nm), covering the regions where the aromatic amino acid side chains (near-UV) and the peptide backbone (around 190 nm) absorb. Due to differences in backbone conformation, α-helical, β-sheet and random coil conformations give rise to different CD spectra. A protein is often a mixture of different structural elements and consequently so is its CD spectrum. Thus, CD spectra can be used to estimate the relative amounts of secondary structure types in proteins by deconvolution of the curve or fitting of the curve to CD spectra of proteins with known secondary structure.
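The idea of fitting a CD curve as a mixture of reference spectra can be sketched as an ordinary least-squares problem. The example below is only an illustration: the basis curves are synthetic Gaussian combinations invented for this sketch (they are not measured reference spectra), the function names are hypothetical, and the fit is unconstrained, whereas practical deconvolution programs typically use measured reference sets and constrain the fractions.

```python
import math


def gaussian(x, center, width):
    return math.exp(-((x - center) / width) ** 2)


wavelengths = list(range(190, 251, 2))  # far-UV range in nm

# Synthetic stand-ins for helix, sheet and coil reference spectra (assumption).
basis = {
    "helix": [80 * gaussian(w, 192, 6) - 35 * gaussian(w, 208, 6)
              - 35 * gaussian(w, 222, 7) for w in wavelengths],
    "sheet": [30 * gaussian(w, 196, 6) - 20 * gaussian(w, 217, 9)
              for w in wavelengths],
    "coil": [-40 * gaussian(w, 198, 7) + 5 * gaussian(w, 220, 10)
             for w in wavelengths],
}


def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_fractions(spectrum, basis):
    """Least-squares weights of the basis spectra (normal equations), normalized to sum 1."""
    names = list(basis)
    ata = [[sum(p * q for p, q in zip(basis[i], basis[j])) for j in names]
           for i in names]
    atb = [sum(p * s for p, s in zip(basis[i], spectrum)) for i in names]
    weights = solve_linear(ata, atb)
    total = sum(weights)
    return {name: w / total for name, w in zip(names, weights)}


# Noise-free synthetic "measurement": 50% helix, 20% sheet, 30% coil.
true = {"helix": 0.5, "sheet": 0.2, "coil": 0.3}
spectrum = [sum(true[k] * basis[k][i] for k in basis)
            for i in range(len(wavelengths))]
fractions = fit_fractions(spectrum, basis)
print({k: round(v, 2) for k, v in fractions.items()})
```

On noise-free synthetic input the fit recovers the mixing fractions; with real data the quality of the estimate depends on the reference set and the wavelength range covered.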

3.3.1.2 NMR spectroscopy: the 15N-HSQC

As was mentioned in Chapter 2, the 15N-HSQC experiment is a sensitive probe for protein tertiary structure and behavior in solution. A well-dispersed 15N-HSQC is a requirement for structure determination using NMR (see Chapter 4), as many three-dimensional (3D) experiments have a 15N-HSQC sequence implemented in their pulse sequences. A 15N-HSQC spectrum is a two-dimensional (2D) heteronuclear spectrum

Figure 3.5 Characteristics of light. The electric component of an electromagnetic wave is shown in the upper picture, with the direction of the magnetic component indicated with an arrow. Below are schematic illustrations of unpolarized, linearly and circularly polarized light.

with a hydrogen and a nitrogen dimension. It shows all amide groups in the protein, the most important of which are the backbone amides, one for each amino acid. Apart from the backbone amides, the side chain amide groups of Asn and Gln can be observed as two peaks with one 15N shift and different 1H chemical shifts. The NH group in the indole ring of Trp and some Arg side chain NH groups can also be observed. The peaks of amide groups in the 15N-HSQC spectrum are exquisitely informative about the surroundings of the amide groups:

• The dispersion of the peaks is indicative of the degree of secondary and especially tertiary structure of the protein (figure 3.6). Random coil backbone HN shifts lie between 7.7 and 8.5 ppm 94, while those of a structured protein can range from around 6 to 12 ppm, depending on secondary structure type, the presence of nearby aromatic rings and other factors like hydrogen bonds. Side chain amides of Gln and Asn are often well dispersed in a structured protein, while they form two sharp peaks in a random coil protein.

• The number of backbone peaks should equal the number of amino acids (except Pro and the N-terminal residue) in a well-behaved monomeric protein of a size suitable for NMR. Absence of peaks could indicate exchange, disorder (see next point), or proteolytic degradation. Too many peaks can be an indication of the presence of multiple stable conformations.

• Conformational and/or chemical exchange can be manifested as differences in the height and sharpness of peaks. Aggregation can cause peaks to disappear because of the long rotational correlation times of large aggregates, while multimerization results in reduced sharpness of peaks.
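The peak-count and dispersion checks listed above can be sketched as simple helper functions. This is a minimal illustration under stated assumptions: the function names, the example sequence and the shift list are hypothetical, Arg side chain peaks are left out because only some of them are observed, and the 7.7 to 8.5 ppm window is the random coil range quoted in the text.

```python
def expected_backbone_peaks(sequence: str) -> int:
    """Backbone amide peaks expected in a 15N-HSQC: every residue except
    prolines (no amide proton) and the N-terminal residue."""
    return sum(1 for residue in sequence.upper()[1:] if residue != "P")


def expected_side_chain_peaks(sequence: str) -> int:
    """Upper bound on the side chain peaks named in the text: two per Asn/Gln
    amide (one 15N shift, two 1H shifts) and one per Trp indole NH."""
    seq = sequence.upper()
    return 2 * (seq.count("N") + seq.count("Q")) + seq.count("W")


def fraction_outside_random_coil(hn_shifts_ppm, low=7.7, high=8.5):
    """Fraction of backbone HN shifts lying outside the random coil window."""
    outside = [s for s in hn_shifts_ppm if s < low or s > high]
    return len(outside) / len(hn_shifts_ppm)


# Hypothetical 12-residue sequence and shift list, for illustration only.
seq = "MKPNQWLAPGSE"
print(expected_backbone_peaks(seq), expected_side_chain_peaks(seq))
print(fraction_outside_random_coil([6.9, 7.8, 8.1, 8.4, 9.3, 10.2]))
```

Comparing the observed number of peaks with these counts, and looking at how many HN shifts fall outside the random coil window, gives a quick first impression before any assignment is attempted.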

Figure 3.6 Comparison of 15N-HSQC spectra of a structured (human TCPG) and an unstructured protein (human IPKA). The horizontal axis represents the 1H dimension, the vertical axis 15N.

3.3.1.3 ANS binding and fluorescence

ANS is a hydrophobic molecule that binds to molten globular proteins, upon which its fluorescence is both enhanced and shifted in wavelength 95. It does not bind to completely structured proteins – except to those with accessible hydrophobic pockets – or to hydrophobic sequences in random coils. The molten globule state is characterized by the presence of all native secondary structure and most tertiary structure, though in a less compact and stable form. While CD spectroscopy detects secondary structure and the HSQC is indicative of tertiary structure, there is a grey zone in which it is problematic to define the structural state of the protein. In these cases, studying the binding and fluorescence of ANS in the presence of the protein is helpful, although not entirely conclusive.

3.3.2 Fast screening for protein structure content

In structural genomics, proteins are produced and purified in parallel using one or several standard protocols. As was touched upon in Chapter 2, it can be helpful to screen a protein for structural content before production of large quantities of protein is attempted. When producing human proteins in E. coli there are many steps that can prevent a protein from adopting its native conformation; these range from expression to the need for cofactors, misfolding, lack of binding partners and proper posttranslational modification. Paper II describes the development of a fast protein purification protocol based on affinity purification, followed by biophysical characterization, going from expression to knowledge about the secondary and tertiary structure of a protein in one to two days.

For expression, a culture volume of 50 ml in 15N-labeled minimal medium was chosen, as it is an easily handled volume that also yields a sufficient amount of protein: up to ca 25 nanomoles, which corresponds to a 500 µl sample at 50 µM. For biophysical characterization, concentrations lower than tens of micromolar are possible but would require longer experiment times and introduce more uncertainty due to lower signal-to-noise levels. A purification protocol was developed as illustrated schematically in figure 3.7. An advantage of the method is that the purification strategy is the same for the different fusion proteins, apart from affinity matrix-specific buffer use. Eluted proteins were subsequently concentrated if necessary and subjected to biophysical characterization by NMR and CD spectroscopy and ANS fluorescence. We were able to produce, purify and characterize various proteins using
this method, three of which are disease-related human proteins of which the structures were not known at the time of this study (data not shown).
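The sample quantities mentioned for the 50 ml cultures can be checked with a short unit calculation: a 50 µM sample of 500 µl contains 25 nmol of protein. The sketch below uses hypothetical function names; the 19.5 kDa molecular weight is the size given for TCTP in this chapter and serves only as an example.

```python
def amount_nmol(concentration_um: float, volume_ul: float) -> float:
    """Amount of substance in nanomoles: µM x µl gives pmol, so divide by 1000."""
    return concentration_um * volume_ul / 1000.0


def mass_ug(amount_in_nmol: float, molecular_weight_kda: float) -> float:
    """Mass in micrograms: 1 kDa corresponds to 1 µg per nmol."""
    return amount_in_nmol * molecular_weight_kda


n = amount_nmol(50.0, 500.0)
print(n)                   # amount in nmol for a 50 µM, 500 µl sample
print(mass_ug(n, 19.5))    # mass in µg for a 19.5 kDa protein
```

For a protein of about 20 kDa this corresponds to roughly half a milligram of purified protein per sample.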

We chose a system that included removal of fusion tags by proteolytic digestion, since fusion tags can complicate structural studies. In NMR, small fusions such as the Gb1 and Z domains used in Paper II can dominate an NMR spectrum recorded on the entire fusion protein (unpublished results). This is likely due to their small size (and thus a short rotational correlation time) and stable structure, both of which are manifested as relatively sharp peaks in the 15N-HSQC spectrum. Even the shorter tags, for example the His tag, have been found to complicate both NMR spectra (by signal broadening) and protein crystallization, although the presence of fusion tags is not always a problem for the formation of well-diffracting crystals 96.

Paper II also showed that a combination of ANS fluorescence assays, NMR and CD spectroscopy results in a more thorough knowledge about the structural state of a

Figure 3.7 Schematic representation of the purification scheme. The total time including measurement of the NMR (and/or CD) spectra amounts to 3 to 6 hours, depending on the protease that is used and whether a concentration step is included. The SDS-polyacrylamide gel shows the purification process of target protein TCTP. Lane S shows the soluble fraction of the sonicated cells, with the target protein fused to GB1 as a 28 kDa product (apparent size 35 kDa); lane W represents the flow-through after binding of the target to the IgG column; lane C is the cleaved TCTP protein (19.5 kDa – apparent size 27 kDa); lane N is the concentrated sample that was subsequently used for characterization by NMR and CD spectroscopy; lane E contains the eluate from the IgG column.

protein than any of the techniques by themselves. This is illustrated in figure 3.8. For either a completely structured or a non-aggregating random coil protein, the information in the 15N-HSQC is satisfactory. However, in a grey zone where proteins can be in a molten globular state, aggregated, or in conformational exchange, the 15N-

Figure 3.8 Examples of CD and NMR spectra and ANS fluorescence curves of: the 19.5 kDa human protein TCTP (translationally controlled tumor protein; top), the 8.0 kDa human protein IPKA (protein kinase inhibitor alpha; center) and the 8.5 kDa human protein HBP1 (heat shock factor binding protein; bottom). On the y-axis of the ANS fluorescence curves is Intensity, with arbitrary units. The grey curve represents the fluorescence of ANS in a sample containing only buffer, while the black curve is ANS fluorescence when the protein in question is present. The NMR spectrum of TCTP is recorded at 600 MHz with 8 scans. The NMR spectrum of IPKA is recorded at 600 MHz with 8 scans; that of HBP1 is recorded at 500 MHz with 32 scans.

HSQC can be puzzling, and here the combination with CD spectroscopy and ANS fluorescence can detect promising proteins. CD spectroscopy will detect the occurrence of secondary structure, while ANS binds to hydrophobic pockets and can probably insert into the poorly packed hydrophobic core of a molten globule. In Paper II, heat shock factor binding protein HBP1 was classified as a molten globule judging from a poor NMR spectrum, the presence of helical structure as seen by CD spectroscopy, and binding of ANS with the characteristic shift towards a shorter emission wavelength (figure 3.8). However, another publication showed that HBP1 probably consists of a single helix that self-associates into a coiled-coil trimer and sometimes a hexamer 97. That paper, a more traditional and thorough biophysical characterization study, described the occurrence of slow motions that reduced the sensitivity of the 15N-HSQC experiment. Removal of the unstructured C-terminus and use of a TROSY experiment improved the data in that publication significantly; neither was attempted in our study. Our aim was merely fast characterization of the structural state of proteins as they were produced for structural genomics applications, and with the construct used in our study this protein was problematic. Our conclusion in this case has to be that despite the wide and successful use of ANS in studying intermediate folding states of proteins, it is not suitable for all proteins. The slow motion that was observed in HBP1 might indicate subunit exchange, as the authors indicated, in which case it would not be unlikely that ANS could associate with the hydrophobic interior of the trimer. It should also be noted that even though typical molten globules exist, a folding intermediate of a protein is usually an ensemble of states rather than a single well-defined state, which complicates detection and definition.

In future biophysical characterization studies, caution should be exercised in interpretation of data such as those obtained for HBP1. While many proteins may readily be classified according to their structural content, some proteins will remain hard to pin down unless extensive studies are performed. In these cases, a potential molten globular protein could be subjected to proteases, as molten globules have been found to display increased sensitivity to proteases 98. However, the unstructured C-terminal region of HBP1 was found to be sensitive to protease digestion 97, which is probably common for proteins with flexible termini. Furthermore, melting curves can give insight into the degree of cooperativity of unfolding. Molten globules are
established to have no cooperative melting trajectory 99 and could in this way be distinguished from well-folded, multimeric and dynamic structures such as HBP1.

3.3.3 Conclusions

We have developed and tested a simple and efficient protein purification method for biophysical screening of proteins and protein fragments by NMR and optical methods, such as CD spectroscopy. Using the present purification scheme it is possible to take several target proteins, produced as fusion proteins, from cell pellet to NMR spectrum and obtain a judgment on the suitability for further structural or biophysical studies in less than one day. The method is independent of individual protein properties as long as the target protein can be produced in soluble form with a fusion partner. Identical procedures for cell culturing, lysis, affinity chromatography, protease cleavage, and NMR sample preparation then only initially require optimization for different fusion partner and protease combinations. The purification method can be automated, scaled up or down, and extended to a traditional purification scheme. We have tested the method on several small human proteins produced in E. coli, and find that the method allows for detection of structured as well as unfolded proteins, while it also aids in characterizing proteins that contain some degree of structure.


4 Structure determination of ribosomal protein L18

4.1 New methods in structure determination

The section below describes the structure determination of a protein by NMR in the traditional way. Despite many improvements in pulse sequences, isotope labeling, magnetic field strength and software, protein structure determination by NMR is still a laborious process. Traditional structure determination methods are based on the assignment of the resonances of nitrogen, carbon and hydrogen atoms (depending on the isotope labeling) in the protein. These assignments can subsequently be used to identify NOEs, which form the most important distance restraints in NMR structure determination (see paragraph 4). The total acquisition time of the spectra needed for structure determination is generally around four to six weeks, while both assignment and identification of NOEs take months to complete, depending on the size of the protein, the quality of the spectra, skill, etc.

Especially in structural genomics, a high output of structures is desirable. There is thus a great demand for new, faster structure determination methods in NMR. Presently, developments in this area appear to take two different courses. One direction is the development of automated assignment methods and structure determination using NOEs. A number of programs for automated backbone and side chain assignment have been developed and they work fairly well 100; 101; 102; 103; 104. Automated assignment of NOEs, however, is more difficult as there is more overlap to be dealt with. These methods are often iterations of assignment and structure calculation, adding more assignments each round 105; 106; 107; 108; 109. Acquisition time can be reduced by improved hardware such as a cryoprobe, while clever cloning or labeling schemes can reduce overlap 110; 111.

A new way of structure determination is the use of residual dipolar couplings. The dipolar interaction that forms the basis of the NOE contains both distance- and angle-dependent terms. In solution NMR, the angle component is averaged out because of the tumbling of the molecules in solution. If the proteins can be ordered in solution to some extent, the angle component is no longer averaged out completely, which results in resonance splittings whose frequency differences yield the residual dipolar couplings. Aligning proteins can be done using liquid crystal media such as bicelles 112, phages 113; 114, cellulose microcrystals 115, and acrylamide gel 116; 117. Residual dipolar couplings contain long-range information about the orientation of atoms, which is due to the fact that they relate to an external frame of reference 118. While NOEs give distance information up to 5 Å, residual dipolar couplings can be used to determine the relative orientation of secondary structure elements, which is highly useful for elongated proteins and nucleic acids. Residual dipolar couplings have been used in refining structures determined using NOEs, but they can also give structural information by themselves. Annila and coworkers classified a structure by residual dipolar coupling data using a fold library 119, while new folds can be identified using a method called molecular fragment replacement 120. Backbone structures can be directly calculated with residual dipolar couplings as main restraints 121. Even though these structures are only good enough for fold classification, clear advantages are the short acquisition time and less need for double labeling.
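The averaging argument above can be written out explicitly. The following is a sketch in standard notation; the symbols $D$, $r$ and $\theta$ do not appear in the original text and are introduced here only for illustration. For two nuclei at a fixed distance $r$, with $\theta$ the angle between the internuclear vector and the magnetic field, the dipolar coupling scales as shown below, and the isotropic average of the angular term vanishes:

```latex
D \;\propto\; \frac{1}{r^{3}} \left\langle \frac{3\cos^{2}\theta - 1}{2} \right\rangle ,
\qquad
\left\langle \cos^{2}\theta \right\rangle_{\mathrm{iso}}
  = \frac{1}{2}\int_{0}^{\pi} \cos^{2}\theta \, \sin\theta \, d\theta
  = \frac{1}{3}
\quad\Longrightarrow\quad
D_{\mathrm{iso}} = 0 .
```

In a weakly aligning medium the average of $\cos^{2}\theta$ deviates slightly from $1/3$, so a small residual coupling remains that depends on the orientation of the internuclear vector.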

It is possible that these two courses in NMR structure determination will converge and yield a faster way of determining NMR structures than the methods we currently use. It is also conceivable that the two methods will serve different purposes. As one of the aims of structural genomics is to search the fold space, low resolution structures might be sufficient in providing us with new folds. The need for high resolution structures will always exist, and NMR spectroscopy definitely has a niche in this area, as many proteins do not crystallize and thus cannot be studied by X-ray crystallography.

4.2 The ribosome

The ribosome is an intricate molecular machine that translates mRNA strands into proteins. With a molecular weight of around 2.6 MDa, the bacterial ribosome is among the largest components in the prokaryotic cell. It is a ribonucleoprotein complex composed of three RNA molecules, 5S rRNA, 16S rRNA and 23S rRNA (where the ‘S’ designation refers to the sedimentation rate, which is related to molecular size), and over 50 proteins. Ribosomal RNA is often used for constructing evolutionary trees, as its importance for survival should have left it largely intact throughout evolution and the separation of species. However, enough differences between prokaryotic and eukaryotic ribosomes exist for several antibiotics (for example tetracycline and chloramphenicol) to block the function of prokaryotic ribosomes only.

Translation of mRNA into protein begins by binding of the ribosome to the mRNA molecule through complementarity between the SD site and a sequence in the 16S rRNA of the small subunit. Three initiation factor proteins move in to help the initiation of translation. A tRNA molecule carrying the first amino acid of the protein chain arrives at the ribosome and settles in the so-called P (peptidyl tRNA) site, interacting with the AUG codon for methionine. A tRNA carrying the second amino acid can then bind at the A (aminoacyl tRNA) site, and a peptide bond is formed between the first and the second amino acids. The empty tRNA from the P site moves into the E (exit) site and is released, while the dipeptide tRNA from the A site now moves into the P site, and the translation process continues until a stop codon is met.

It has taken X-ray crystallographers many years to determine a high-resolution structure of the ribosome. In 2000, when determination of the solution structure of TthL18 was in progress, the X-ray structure of the large subunit of the Haloarcula marismortui ribosome was determined to 2.4 Å 122, which was soon followed by two T. thermophilus small subunit structures 123; 124, a bacterial large subunit structure 125 and the entire ribosome of T. thermophilus at somewhat lower resolution 126. Until then, it had only been possible to study the individual proteins and parts of the rRNA molecules by X-ray crystallography and NMR spectroscopy, while low-resolution images of the ribosome were available from EM and X-ray crystallography studies. Many ribosomal proteins were suitable for NMR, as they are relatively small and have flexible tails that complicate crystal formation. The structure determination of L18 should be viewed in the light of the absence of high-resolution ribosome structures. In addition, many were inclined to think of ribosomal proteins as the active parts of the ribosome, while the role for rRNA could be in recognition and structural support for the proteins. As it turned out when the first structure was published, the situation was exactly the opposite: many of the ribosomal proteins functioned as recruiters of and cement between rRNA helices. No protein was observed in the vicinity of the active site portion of the large subunit, and a mechanism for peptidyl transfer was proposed that attributed catalytic properties to the rRNA 127. While the occurrence of enzymatic activity of RNA has long been established, another mechanism has been proposed more recently by Yonath and coworkers, in which they suggest the energy for peptidyl transfer is obtained from the structural framework the ribosome offers for the alignment of tRNA molecules 128; 129.

4.3 Ribosomal protein L18

L18 from Thermus thermophilus (TthL18) is a 12.5 kDa large subunit protein that is one of the primary 5S rRNA binding proteins and is also associated with 23S rRNA. Binding of L18 to 5S rRNA induces a conformational change in the RNA molecule, stimulates subsequent binding of L5 to 5S rRNA and induces formation of the 5S/23S rRNA – protein complex 130; 131. The subdomain of the ribosome formed by the 5S rRNA – protein complex is generally named the central protuberance, and the location of this subdomain and L18 in the large ribosomal subunit is shown in figure 4.1.

The basic N-terminus of L18 is not involved in 5S rRNA binding, but is necessary for formation of the complex with 23S rRNA. The 2.4 Å structure of the archaeal large subunit 122 shows that the basic N-terminus of H. marismortui L18 (HmaL18) becomes partially structured upon interaction with both its substrates: a helix and a helical turn are formed in the C-terminal half of the basic N-terminus. The first ten amino acids are completely unstructured and stick like a rod between 5S and 23S rRNA. The rest of the structure is largely similar to the solution structure of uncomplexed TthL18; details are further discussed in Paragraph 4 of this chapter.

Figure 4.1 The position of L18 in the 50S subunit. HmaL18 is shown in green, 5S rRNA in purple, and the two proteins that contact L18, L5 and L21e (e: does not occur in prokaryotes), are shown in red and blue, respectively. The rest of the 50S subunit is shown in black (proteins) and grey (23S rRNA).

4.4 Structure determination by NMR 7

4.4.1 Resonance assignment

NMR experiments can either involve transfer of magnetization through covalent bonds (scalar couplings), or transfer of magnetization through space (the nuclear Overhauser effect or NOE), or a combination of both in certain complex experiments. Scalar or J couplings are mediated by the electrons that form chemical bonds between atoms, while the NOE arises from dipolar cross-relaxation between nuclei close in space. A NOE between two interacting nuclei is dependent on the distance between these nuclei and as a consequence it yields direct information on the relative position of these nuclei in space. Hence, NOEs are the most important structural constraints used in structure determination by NMR. In order for the NOEs to be useful, it is necessary to assign (i.e. identify) all resonances corresponding to hydrogen atoms in the backbone and amino acid side chains in a protein.

As was discussed in Chapter 2, isotope labeling is frequently used in order to increase resolution and selectivity. In such cases, assignment of labeled carbon and nitrogen atoms is also performed, and special experiments have been designed for the sequential assignment of doubly labeled (13C/15N) proteins. For small proteins labeling with 15N usually suffices and is often used even when unlabeled protein would yield spectra suitable for assignment. Without labeled carbon atoms a combination of TOCSY, COSY and NOESY spectra is generally used for the assignment procedure. TOCSY is the acronym for TOtal Correlation SpectroscopY and is based on scalar coupling. A TOCSY experiment uses a so-called isotropic mixing sequence to transfer magnetization between all nuclei in a scalar coupled network. This mixing sequence consists of a series of pulses designed to remove all chemical shift differences and to create a strong coupling environment. These networks are often called spin systems and in protein NMR one amino acid corresponds to one spin system (figure 4.2). Despite chemical shift differences between side-chain hydrogens, the Cβ and Cγ hydrogens in particular can often not be assigned in a TOCSY spectrum, as their chemical shift ranges overlap. In order to facilitate assignment of a TOCSY spectrum, a COSY (COrrelation SpectroscopY) spectrum is usually used in parallel. COSY was the first two-dimensional NMR experiment to be developed and consists of a simple pulse scheme of two pulses separated by an incremented delay. Crosspeaks are the result of magnetization transfer between scalar coupled spins, and the strength of COSY is that transfer is restricted to three bonds. This distance is close enough to observe couplings between neighboring side chain hydrogens (e.g. Hα-Cα-Cβ-Hβ) but not beyond direct neighbors; this makes COSY perfectly complementary to TOCSY (figure 4.2).

7 In this section, information is mostly derived from two different textbooks: 132. Cavanagh, J., Fairbrother, W. J., Palmer, A. G., III & Skelton, N. J. (1996). Protein NMR spectroscopy: principles and practice, Academic Press, San Diego, CA. 133. Wüthrich, K. (1986). NMR of proteins and nucleic acids, John Wiley & Sons, New York, NY. Other sources are indicated in the text.

Figure 4.2 Schematic drawing of a serine residue with its characteristic COSY and TOCSY spectra.
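The complementarity of COSY and TOCSY described above can be illustrated with a small graph model. This is only a sketch: the coupling table for serine is a simplified assumption (the hydroxyl proton is left out because it usually exchanges with solvent), and the function names are hypothetical. COSY crosspeaks correspond to directly coupled proton pairs, while TOCSY crosspeaks correspond to all protons reachable through the coupled network, i.e. the whole spin system.

```python
from collections import deque

# Simplified, assumed proton coupling network of a serine residue.
# Each entry lists the protons within three bonds of the key proton.
SERINE_COUPLINGS = {
    "HN": {"HA"},
    "HA": {"HN", "HB2", "HB3"},
    "HB2": {"HA", "HB3"},   # HB2-HB3 is a geminal (two-bond) coupling
    "HB3": {"HA", "HB2"},
}


def cosy_crosspeaks(couplings):
    """Pairs of directly coupled protons (a single transfer step)."""
    return {frozenset((a, b)) for a, nbrs in couplings.items() for b in nbrs}


def tocsy_crosspeaks(couplings, start):
    """All protons reachable from `start` through the coupled network."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nbr in couplings[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen - {start}
```

In this model the amide proton shows a COSY crosspeak only to Hα, whereas in the TOCSY it correlates with Hα and both Hβ protons.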

With a 2D homonuclear (1H-1H) TOCSY spectrum or a 3D heteronuclear (15N-1H) TOCSY-HSQC spectrum, in which the 2D 1H-1H TOCSY spectrum is separated by a third nitrogen dimension (figure 4.3), the nature of the amino acids in the protein can be identified. However, in order to place these amino acids at the correct locations in the protein sequence, a NOESY (NOE SpectroscopY) spectrum is required. In NOESY, magnetization is transferred through dipolar couplings during the mixing time, a fixed delay that should be optimized in order to maximize NOE crosspeaks while avoiding crosspeaks that result from indirect transfer (spin diffusion). Both 2D 1H-1H NOESY and 3D 15N-1H NOESY-HSQC spectra are commonly used for sequential assignment.

Figure 4.3 Schematic illustration of a 3D spectrum, in this case a 1H-15N-TOCSY-HSQC. The 3D spectrum consists of planes of 2D spectra that, when projected in two directions, form a 2D TOCSY and a 15N-HSQC.

Triple resonance experiments used for sequential assignment of double-labeled proteins facilitate assignment by their increased selectivity and separation. Another advantage of resonance assignment using sequential triple resonance experiments compared to NOESY-based assignment is that there are no long-range NOEs complicating the assignment of sequential NOEs. The most common forms of these experiments yield 3D spectra with a proton, nitrogen and carbon dimension.

Assignment of N, HN, and Cα nuclei can be done by a combination of HNCA and HN(CO)CA experiments (figure 4.4). The experiment names give information on the path of the magnetization throughout the experiment. In the HNCA, magnetization is transferred in the following ways: $^{1}\mathrm{H}^{\mathrm{N}}_{i}$-$^{15}\mathrm{N}_{i}$-$^{13}\mathrm{C}^{\alpha}_{i}$ (forward) and $^{1}\mathrm{H}^{\mathrm{N}}_{i}$-$^{15}\mathrm{N}_{i}$-$^{13}\mathrm{C}^{\alpha}_{i-1}$ (backward). In the HN(CO)CA, magnetization is transferred through the carbonyl carbon, but it is not frequency-labeled there, which results in a spectrum containing the same information as a HNCA spectrum but only backward since magnetization starting on $^{1}\mathrm{H}^{\mathrm{N}}_{i}$ can only travel through its neighboring carbonyl carbon to $^{13}\mathrm{C}^{\alpha}_{i-1}$. Similar experiments have been developed for assignment of CO nuclei (HNCO, HN(CA)CO) and for side chain assignment (for instance HNCACB, CBCA(CO)NH). In order to obtain carbon shifts for all side chains a 3D 1H-13C HCCH-TOCSY with a complementary HCCH-COSY is often used.

Figure 4.4 Schematic illustration of sequential assignment using triple resonance experiments. HNCA and HN(CO)CA experiments are analyzed in parallel in order to obtain sequential backbone assignments. The magnetization transfer in both experiments is shown to the left (redrawn from reference 132). To the right is a schematic overlay of HNCA and HN(CO)CA spectral strips, which are numbered by amino acid residue. The strips are 2D 1H-13C planes separated by a 15N dimension.
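The way HNCA and HN(CO)CA strips are linked into a sequential chain can be sketched in a few lines. This is an illustration under assumptions, not the procedure used in the thesis: the strip labels, the shift values and the matching tolerance `tol` are invented. For each strip, the Cα peak that also appears in the HN(CO)CA is taken as the shift of the preceding residue; a strip whose own Cα shift matches that value is a candidate predecessor.

```python
def link_strips(strips, tol=0.2):
    """Find candidate predecessors for each strip by matching C-alpha shifts.

    strips: dict mapping a strip label to (ca_own, ca_prev) in ppm, where
        ca_prev is the peak seen in both HNCA and HN(CO)CA and ca_own is
        the peak seen only in the HNCA.
    tol: assumed matching tolerance in ppm.
    Returns a dict mapping each label to a list of candidate predecessors.
    """
    links = {}
    for name, (_, ca_prev) in strips.items():
        links[name] = [
            other
            for other, (ca_own, _) in strips.items()
            if other != name and abs(ca_own - ca_prev) <= tol
        ]
    return links


# Invented example: three unassigned strips
strips = {"a": (58.1, 54.3), "b": (54.3, 62.0), "c": (62.0, 45.2)}
links = link_strips(strips)
```

Here strip "b" precedes "a" and strip "c" precedes "b", so the sequential order is c, b, a. With real data several candidates per strip are common, which is why Cβ and CO shifts are usually matched as well.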

4.4.2 Secondary structure determination

Once the assignments have been completed the aim is to identify and assign NOE crosspeaks, which give information about the tertiary structure of the protein. However, different secondary structure elements have characteristic NOE patterns and information about the location of secondary structure elements in the protein thus facilitates the search process for these crosspeaks at least to some extent. Vice versa, NOE patterns aid in identification of secondary structure elements.

The chemical shifts of Cα, Cβ and CO nuclei in a protein are strongly correlated to secondary structure 134; 135. It was found that these shifts could be used quantitatively to determine secondary structure conformation in a protein chain 136. Both Cα and CO nuclei shift downfield when located in helices and upfield in β-sheet structure. The Cβ nuclei experience a shift in the opposite direction. Wishart and Sykes developed a Chemical Shift Index (CSI) with reference values from twenty different proteins 136. The CSI has the values 1, 0, and –1 and is assigned to all nuclei based on their shifts relative to amino acid specific reference shifts. Consecutive sequences of either 1 or –1 indicate presence of a secondary structure element and whether it is α or β structure. The CSI for TthL18 as published in paper III is shown in figure 4.5.
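The CSI procedure described above can be sketched as follows. The dead band `tolerance` and the minimum run length `min_run` are assumed parameters, and the shifts in the example are invented; the published method uses nucleus-specific and residue-specific reference values that are not reproduced here.

```python
def chemical_shift_index(observed, reference, tolerance=0.7):
    """Return +1, 0 or -1 per residue from observed and reference shifts (ppm).

    tolerance: assumed dead band around the reference value, in ppm.
    """
    csi = []
    for obs, ref in zip(observed, reference):
        diff = obs - ref
        csi.append(1 if diff > tolerance else -1 if diff < -tolerance else 0)
    return csi


def find_runs(csi, value, min_run=3):
    """Return inclusive (start, end) index pairs of runs of `value`."""
    runs = []
    start = None
    for i, v in enumerate(csi + [None]):  # sentinel closes a trailing run
        if v == value and start is None:
            start = i
        elif v != value and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


# Invented C-alpha shifts against a flat reference of 56.0 ppm
observed = [58.0, 58.2, 58.1, 56.0, 54.0, 54.1, 53.9]
csi = chemical_shift_index(observed, [56.0] * len(observed))
```

For Cα shifts, a run of +1 values (downfield shifts) points to a helix and a run of -1 values to a β-strand; for Cβ the signs are reversed, as stated in the text.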

Scalar coupling constants between the backbone HN and Hα, 3JHNHα, contain information about the backbone dihedral φ angle and can thus be used to determine the sequential location and nature of secondary structure elements. A common experiment used to measure these coupling constants is the 3D 15N-1H HNHA. The crosspeak intensity is related to the intensity of the diagonal peak, giving coupling constants in units of Hertz (Hz). Coupling constants typical for α-helical structure are lower than 5 Hz, while coupling constants indicative of β-sheets are higher than 8 Hz (figure 4.5).
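As a rough sketch of how a coupling constant is obtained from the peak intensities, the code below assumes the relation commonly quoted for the quantitative-J HNHA experiment, I_cross / I_diag = -tan²(2πJζ). Both this relation and the default transfer delay `zeta` are assumptions that are not taken from this thesis; the delay has to match the pulse sequence actually used, and relaxation corrections are ignored. The classification limits are the approximate values given in the text.

```python
import math


def j_from_hnha(i_cross, i_diag, zeta=0.01305):
    """Estimate 3J(HN,HA) in Hz from HNHA cross and diagonal peak intensities.

    Assumes I_cross / I_diag = -tan^2(2 * pi * J * zeta).
    zeta: assumed transfer delay in seconds (depends on the pulse sequence).
    """
    ratio = -i_cross / i_diag
    if ratio < 0:
        raise ValueError("cross and diagonal peaks must have opposite sign")
    return math.atan(math.sqrt(ratio)) / (2 * math.pi * zeta)


def classify_coupling(j_hz):
    """Apply the approximate limits quoted in the text (< 5 Hz, > 8 Hz)."""
    if j_hz < 5.0:
        return "helix-like"
    if j_hz > 8.0:
        return "strand-like"
    return "undetermined"
```

Couplings between 5 and 8 Hz are left unclassified, since they can arise from conformational averaging as well as from intermediate φ angles.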


As was mentioned in the beginning of this section, NOE patterns are characteristic for secondary structure and are a third tool in identifying structural elements. For instance, as a single turn of an α-helix contains 3.6 residues, Hα of the first residue is close in space to HN of the third residue, while these atoms are far apart in β-sheets. Short-range NOEs are thus indicative of secondary structure elements.

Figure 4.5 Identification of secondary structure features in L18. Top: amino acid sequence of TthL18. Black dots mark residues whose amide protons exchange slowly (resonances that are still present in the 15N-HSQC 10 minutes after dissolving lyophilised L18 in 2H2O; after 10 hours residues 36-40, 48, 68-73, 76, 77, 100, 102-105, and 109 could still be observed clearly). Secondary structure elements (arrows for β-strands, cylinders for α-helices) as identified by the data are indicated below the sequence. First panel: 13C chemical shift index of L18. Filled bars represent data derived from Cα secondary shifts, open bars represent data derived from Cβ secondary shifts. Residue numbers are indicated between the panels. Second panel: 3JHNHA coupling constants calculated from integrals of non-overlapping peaks in a 3D 15N,1H-HNHA spectrum. Approximate limits for 3JHNHA coupling constants typical for α-helices (< 5 Hz) and β-strands (> 8 Hz) are indicated with dashed lines in the diagram. Third panel: {1H}-15N heteronuclear NOEs defined as Isaturated/Iunsaturated. Errors were estimated using the average noise level in the spectra. *: the value for Leu3 is not displayed in the graph; it was determined to be -3.89 +/- 0.38.


Finally, measurement of internal dynamics in the protein gives information about the degree of order in various parts of the protein and the likelihood of finding long-range structurally important NOEs in these parts. All proteins are dynamic to a certain degree and dynamic processes exist on various timescales, such as side chain rotation (fast) and domain motion (slow). Motion in the protein causes spin relaxation, which affects the NMR spectrum. This can result in lower quality data, but it is also useful and informative. The heteronuclear {1H}-15N steady state NOE experiment is often used to obtain information about the degree of order throughout the protein based on relaxation of backbone amide groups (figure 4.5).
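The heteronuclear NOE values in figure 4.5 are defined as the ratio of peak intensities with and without proton saturation, with errors estimated from the noise level. A minimal sketch of that calculation is given below; it assumes the same noise level in both spectra and uses standard first-order error propagation for a ratio, which may differ in detail from the procedure used for the figure.

```python
import math


def heteronuclear_noe(i_sat, i_unsat, noise):
    """Return (noe, error) with noe = I_saturated / I_unsaturated.

    noise: estimated noise level, assumed identical in both spectra.
    Intensities must be non-zero.
    """
    noe = i_sat / i_unsat
    err = abs(noe) * math.sqrt((noise / i_sat) ** 2 + (noise / i_unsat) ** 2)
    return noe, err


noe, err = heteronuclear_noe(80.0, 100.0, 2.0)
```

Values close to the rigid limit indicate an ordered backbone, while low or negative values, such as the one reported for Leu3 in figure 4.5, indicate fast internal motion.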

4.4.3 Restraints for structure calculation

Distance restraints based on NOEs constitute the most important information for structure determination by NMR. It is not unusual to use 1500-2000 NOEs for structure calculation for a high-resolution structure of a small to medium-sized protein such as L18. The integral of a NOE crosspeak is a measure of the distance between the nuclei that give rise to the crosspeak. A number of classes are generated for NOEs of different strength, for instance a class for very weak interactions (restraining the distance between the two nuclei to 1.8-6.0 Å), weak interactions (1.8-4.3 Å), medium interactions (1.8-3.5 Å) and strong interactions (1.8-2.5 Å).

Angles in the protein backbone can also be used as structural restraints, as they are specific for different secondary structure elements. The φ angle describes the rotation around the bond between the backbone N atom and the Cα, while the ψ angle describes the rotation around the Cα-C’ bond (figure 4.6). Already in the 1960s, shortly after publication of the first protein structures, the Indian physicist G.N. Ramachandran determined that only certain φ and ψ angles were allowed in proteins 137. A so-called Ramachandran plot displays the allowed regions for combinations of φ and ψ angles. In other areas, steric collisions would occur and these ‘disallowed’ angles generally do not occur in proteins, except in glycine residues. Glycine is the simplest amino acid and contains only a hydrogen as a side chain, resulting in many more possible backbone conformations 138. This unusual property makes glycine structurally important and often conserved in proteins 139.

Figure 4.6 Dihedral angles. Top figure: segment of a polypeptide chain with the locations of the dihedral angles. ω is the peptide bond angle, which is always planar (180º). The rotamers for the χ1 angles are shown in the bottom figure.
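The NOE distance classes listed at the start of this section can be expressed as a small lookup. The distance bounds are those quoted in the text; the intensity cut-offs in `THRESHOLDS` are hypothetical and would in practice be calibrated for each spectrum, for example against crosspeaks between protons at known distances.

```python
# Distance bounds in Å, taken from the classes listed in the text.
NOE_CLASSES = {
    "strong": (1.8, 2.5),
    "medium": (1.8, 3.5),
    "weak": (1.8, 4.3),
    "very weak": (1.8, 6.0),
}

# Hypothetical cut-offs on the crosspeak integral relative to a reference peak.
THRESHOLDS = [(0.5, "strong"), (0.2, "medium"), (0.05, "weak")]


def noe_restraint(integral, reference_integral):
    """Return (class label, (lower, upper) bounds in Å) for a crosspeak."""
    relative = integral / reference_integral
    for cutoff, label in THRESHOLDS:
        if relative >= cutoff:
            return label, NOE_CLASSES[label]
    return "very weak", NOE_CLASSES["very weak"]
```

All classes share the same lower bound, so a weak crosspeak only loosens the upper limit; it never forces two nuclei apart.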

The HNHA experiment used for identification of secondary structure segments can also be used to obtain φ angular restraints. Angles are fixed to helical (-60º ± 30º) or sheet (-120º ± 30º) structure based on 3JHNHα couplings. There are also programs that calculate φ and ψ angles based on backbone carbon, nitrogen and proton chemical shifts, one of which is the software TALOS 140. The χ1 torsion angle around the Cα-Cβ bond in the amino acid side chain (figure 4.6) is subject to conformational restrictions due to steric hindrance between the side chain atoms at the γ position and the protein main chain. Three different conformations are possible: gauche gauche (+60º), trans gauche (-60º), and gauche trans (180º; figure 4.6). Using a combination of different experiments, such as TOCSY and NOESY experiments with short mixing times, χ1 angles can be determined experimentally.
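The mapping from coupling constants to φ restraints can be sketched as below. The restraint values are those given in the text; using the 5 Hz and 8 Hz limits from section 4.4.2 as decision thresholds is an assumption made for this illustration, and the function names are hypothetical.

```python
def phi_restraint(j_hz):
    """Map a 3J(HN,HA) coupling in Hz to a (target, tolerance) phi restraint.

    Returns None when the coupling lies between the assumed limits.
    """
    if j_hz < 5.0:
        return (-60.0, 30.0)    # helical
    if j_hz > 8.0:
        return (-120.0, 30.0)   # sheet
    return None


def phi_violation(phi, restraint):
    """Degrees by which phi lies outside the restraint (0.0 if satisfied)."""
    target, tol = restraint
    # Wrap the difference into [-180, 180) so that angles are compared
    # on the circle rather than on the number line.
    diff = (phi - target + 180.0) % 360.0 - 180.0
    return max(0.0, abs(diff) - tol)
```

A violation function of this kind is what a structure calculation program evaluates for every angular restraint when the models are checked.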

Hydrogen bonds are important restraints in structure calculation and their presence is characterized by the short distance – around 2.2 Å – between the hydrogen-bond donor (e.g. an amide proton) and the hydrogen-bond acceptor (e.g. a carbonyl oxygen). A simple way to identify hydrogen bonds involving backbone amide groups is to dissolve 15N-labeled protein in 2H2O. Amide protons are prone to exchange with solvent protons (or deuterons), especially if they are not hydrogen bonded. A 15N-HSQC experiment run shortly after dissolving the protein in 2H2O will show only resonances of those amide groups involved in hydrogen bonds.


4.4.4 Structure calculation using CNS

Protein structure calculation from distances between hydrogen atoms is a highly complex optimization process in which the model must be kept from settling into local minima during convergence. The basic task for a structure calculation program is to convert distances between atoms into spatial coordinates for all the atoms in such a way that they obey all laws in protein conformational chemistry. A number of programs exist nowadays that are generally used for protein structure calculation from NMR or X-ray crystallography data. The best known of these are X-PLOR 141 and CNS 142 for both types of data, and DYANA for only NMR data 143.

There are a few different structure calculation methods, combinations of which are applied in the above-mentioned programs. The amino acid chain needs to be fitted to the distance restraints, which can be done using either distance geometry or restrained molecular dynamics (MD) simulated annealing (SA). These techniques can work either in Cartesian space or in torsion angle space. The main difference between the two is that torsion angle space works with internal coordinates rather than Cartesian coordinates. This reduces the need for the strong functions that are needed in Cartesian space MD to preserve the covalent structure (bonds and angles) 143. It also reduces the risk for chirality errors that were discovered in structures calculated with Cartesian space MD 144. In CNS, a choice between Cartesian and torsion angle space can be made, while the structure calculation method is SA. As Paper IV uses CNS for structure calculation, the principle of SA is briefly discussed here. In SA, the allowed conformational space is searched at high temperature. The degrees of freedom are either the Cartesian coordinates of atoms (in Cartesian space) or the dihedral angles (in torsion angle space). A simplified force field is applied that disregards non-bonded electrostatic interactions and does not consider solvent. The temperature is subsequently lowered in a large number of steps, during which the model can settle in the conformation with the lowest potential energy. As a final step, the energy of the model is minimized in a number of cycles.
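The principle of annealing against distance restraints can be shown with a toy example. Note that this sketch replaces molecular dynamics with a Metropolis Monte Carlo search in Cartesian space and uses only flat-bottomed distance restraints as its energy; it is not the CNS protocol, there is no force field, and all parameters (temperatures, step size, number of steps) are arbitrary assumptions.

```python
import math
import random


def restraint_energy(coords, restraints):
    """Sum of squared violations of flat-bottomed distance restraints."""
    energy = 0.0
    for i, j, lower, upper in restraints:
        d = math.dist(coords[i], coords[j])
        if d < lower:
            energy += (lower - d) ** 2
        elif d > upper:
            energy += (d - upper) ** 2
    return energy


def anneal(coords, restraints, t_start=10.0, t_end=0.01, steps=5000, seed=0):
    """Return (best coordinates, best energy) found while cooling."""
    rng = random.Random(seed)
    current = [list(p) for p in coords]
    e_current = restraint_energy(current, restraints)
    best, e_best = [p[:] for p in current], e_current
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(len(current))
        old = current[i][:]
        current[i] = [x + rng.gauss(0.0, 0.5) for x in old]
        e_new = restraint_energy(current, restraints)
        # Metropolis criterion: always accept downhill, sometimes uphill
        if e_new <= e_current or rng.random() < math.exp((e_current - e_new) / temp):
            e_current = e_new
            if e_new < e_best:
                best, e_best = [p[:] for p in current], e_new
        else:
            current[i] = old  # reject the move
    return best, e_best
```

At high temperature uphill moves are accepted frequently, which lets the model escape local minima; as the temperature drops, the search gradually turns into a plain minimization.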

In practice, many structural models are calculated independently and sorted from low to high potential energy. Structure calculation is an iterative process during which violations are checked and corrected, while new restraints might be added between different runs. An ensemble of typically 20 models out of 40-50 calculated structures usually represents the final NMR structure. As there usually are few distance restraints for loop and other flexible regions, these regions remain undefined in the structural model, which is visible as large variations in conformation (figure 4.7). The quality of a structure is assessed by the energy of the structure, the number of distance restraints, the precision of the structure (i.e. the r.m.s.d. of all models to the average structure), and the number of violations in distances and angles.
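The precision measure mentioned above, the r.m.s.d. of all models to the average structure, can be sketched as follows. The code assumes that the models have already been superimposed and contain the same atoms in the same order; the superposition step itself is not shown, and the function names are hypothetical.

```python
import math


def mean_structure(models):
    """Average coordinates over models; models[m][a] is an (x, y, z) tuple."""
    n_models = len(models)
    n_atoms = len(models[0])
    return [
        tuple(sum(model[a][k] for model in models) / n_models for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(model, reference):
    """Root mean square deviation between two coordinate sets."""
    sq = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(sq / len(model))


def ensemble_precision(models):
    """Mean r.m.s.d. of each (pre-superimposed) model to the average structure."""
    mean = mean_structure(models)
    return sum(rmsd(m, mean) for m in models) / len(models)
```

Restricting the atom list to a single residue and its neighbors gives a local backbone r.m.s.d. of the kind plotted in figure 4.7.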

Figure 4.7 Number of distance restraints and backbone r.m.s.d. of the ensemble. To the left, the column diagram shows the number of NOEs per residue (left axis). From bottom to top are displayed: intraresidue (light grey), sequential (white), medium range (grey) and long range (black) NOEs. The scatter diagram shows the average local backbone r.m.s.d. (right axis). The secondary structure elements are shown schematically above the diagram. To the right, the final ensemble of 27 structures of TthL18, superimposed on ordered regions. Residues 1-21 are not depicted (all structure images are made using MOLMOL 145).

4.5 Solution structure of L18 from Thermus thermophilus

4.5.1 Description of the structure and comparison to other structures

The assignment of TthL18 was presented in Paper III. Out of 109 expected resonances in the 15N-HSQC, 101 were observed and could be assigned. Missing peaks represented residues in the N-terminus or in loop regions. The N-terminus (residues 1-22) is completely disordered in solution, as was shown by measurement of {1H}-15N heteronuclear NOEs. Based on CSI, 3JHNHα couplings and amide proton exchange data, the secondary structure elements could be identified (figure 4.5).

Paper IV describes the solution structure of TthL18. It was found to be a mixed α/β protein with three consecutive antiparallel β-strands (residues 23 to 51) followed by a helical turn, a long loop, an α-helix (residues 63 to 79), a fourth small β-strand parallel to the first β-strand, another loop, the second helix (residues 96 to 107) and an extended, well-ordered C-terminus which almost forms a β-strand parallel to the fourth β-strand (figure 4.8a). Interestingly, the amide hydrogen of Phe111, the C-terminal residue, is hydrogen-bonded, but no acceptor other than its own aromatic ring can be seen in the vicinity of the NH group. The third β-strand was assumed to be shorter in Paper III (Leu47-Ala50 vs. Val45-Ala50 in the structure). The structure shows that Leu47 forms a bulge at the beginning of β3, with a φ angle of around -95° and a ψ angle of -63°. This explains why the CSI indicates that this residue and the neighboring Thr46 are not typical β-sheet residues, although the surrounding amino acids are.

Recently, a medium resolution structure of Bacillus stearothermophilus L18 (BstL18) was published 146. Comparison of this structure to the structure of TthL18 shows that the two proteins clearly possess the same fold (figure 4.8a). As in TthL18, the N-terminus of BstL18 is completely unstructured in solution. The globular domain has the same fold as other L18 proteins, although the helices are oriented differently in BstL18. The difference in orientation is reflected in an r.m.s.d. value of 2.87 Å when all residues in the α-helices and β-sheets of the two structures are superimposed, compared to 1.67 Å when only β-sheet atoms are superimposed 146. This is notable given the high sequence similarity between the globular domains of TthL18 and BstL18 (72%). The largest sequence difference occurs in loop 3, which is longer in BstL18, but it is unclear whether this loop affects the placement of the helices. The structure of this region is poorly defined in both structures.

As most of the positively charged residues were clustered in the N-terminal region, the β-sheet surface and the loops, it was anticipated that the main interactions with rRNA would be with these regions. Comparison of our TthL18 solution structure with the 2.4 Å resolution crystal structure of HmaL18 determined within the large subunit of the ribosome confirmed this (figure 4.8b). Superposition of the Cα atoms of the secondary structure elements in solution TthL18 with those in HmaL18 gave an r.m.s.d. of only 1.16 Å, showing that the two structures are very similar. This suggests that L18 in its uncomplexed form is a scaffold prepared for RNA recognition, which agrees with its primary RNA binding function and its important role in inducing structural changes in 5S rRNA that subsequently allow for association with 23S rRNA and L5. Even though a 5.5 Å X-ray structure of the entire T. thermophilus ribosome is available 126, HmaL18 was fitted into the density map of this structure, which, apart from the low resolution, is the reason why uncomplexed TthL18 was not compared to complexed TthL18.


4.5.2 Other homologues of L18

Structural homology was detected between L18 and small subunit ribosomal protein S11. Although sequence alignment showed only 22% identity between TthS11 and TthL18, their structures are highly similar, as is the conformation of a short region of the RNA molecules (16S rRNA for S11 and 5S rRNA for L18) that interacts with the β-sheet. Sequence-specific interactions with RNA, however, are not conserved in the two proteins, suggesting that neither of the proteins would recognize the substrate of the other. A typical example of structural integrity but divergence of specific protein-RNA interactions is the bulge at the beginning of the third β-strand. This structure is present in complexed and uncomplexed L18 as well as in S11. Its presence causes two residues in the β-strand to form contacts with the bases of RNA in both the HmaL18 structure and the TthS11 structure, but the nature of the residues as well as that of the interacting bases is completely different in L18 and S11. The similarity of S11 and L18 indicates that the two proteins may be paralogous, i.e. the result of a gene duplication event. As they are located only around 3 kb from each other in the E. coli chromosome, this is not unlikely.
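A sequence identity such as the 22% quoted above is derived from a pairwise alignment. The sketch below shows one way to compute it, assuming two already aligned sequences of equal length with `-` marking gaps, and counting only columns in which both sequences have a residue. The function name and the choice of denominator are assumptions; other conventions (for example dividing by the length of the shorter sequence) give somewhat different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Made-up aligned fragments, not taken from L18 or S11.
    print(percent_identity("MKV-LA", "MRVGLA"))
```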

The homologue of L18 in eukaryotes is L5, a multidomain protein of which one domain is homologous to prokaryotic L18. Sequence alignment as shown in Paper IV suggests that the L18-like domain in L5 has a structure similar to those of prokaryotic and archaeal L18 proteins. Xenopus laevis L5, however, was found to be largely unstructured in solution. There is a mutual induced-fit effect when the protein binds to 5S rRNA and, like its prokaryotic homologue, it appears to bind a large region of 5S rRNA 147. Aromatic ring interactions with RNA, which are not found in the interaction of L18 with 5S rRNA, have been found to be important for the interaction of L5 with 5S rRNA 148. Furthermore, Paper IV suggested that the region in eukaryotic L5 corresponding to the flexible N-terminus in prokaryotic L18 has gained a new function in eukaryotes, as it contains a nucleolar-localization signal for transport of the protein-5S rRNA complex to the nucleolus 149. Another signal sequence, a nuclear export signal, has been mapped to a region corresponding to the first α-helix in prokaryotic L18 150. As nuclear export was found to be most efficient when L5 was not bound to 5S rRNA, the unstructured state of L5 in solution might be of importance for nuclear export 150.

Figure 4.8 Structure of L18 and comparison to other structures. (A) Ribbon representation of L18 with secondary structure elements. Left: similar orientation as the ensemble in figure 4.7. Center: similar orientation as in (B). Right: overlay with BstL18. Here, TthL18 is colored paler than BstL18 with black coil regions. (B) Left: superposition of HmaL18 (green) and TthL18 (red) with Hma 5S rRNA (grey). Right: the large figure shows the surface electrostatic potential of TthL18 with Hma 5S rRNA in the same orientation as in the left figure. The small figure shows a different angle. (C) Left: superposition of HmaL18 (green) and TthS11 (blue) with similar-structured parts of their substrates: 5S rRNA (nucleotides 1-12 and 116-122; dark green) and 16S rRNA (nucleotides 676-690 and 701-713; dark blue). Right: bulge region of TthL18 (residues 44-49; red), HmaL18 (55-60; green) and TthS11 (37-42; blue) with the basepairs close in space to this region. In dark green is shown C6-G117 of Hma 5S rRNA, while in dark blue is shown C707-G683 of Tth 16S rRNA. Protein-RNA distances closer than 4.0 Å are indicated by dashed lines.

4.5.3 Thermostability of TthL18

Thermus thermophilus has an optimal growth temperature of around 70°C, which groups the organism among the thermophilic bacteria. The largest group of organisms is mesophilic and prefers to grow around 30°C (some of these organisms live inside mammals and have optimal growth temperatures of 37°C), while the extreme temperatures for life are -20°C on one end in polar ice 151 and 113°C on the other end in black smokers at 3.5 km depth on the bottom of the ocean 152. While cold-adapted proteins need to possess larger flexibility compared to their mesophilic counterparts, heat-adapted proteins need an increased rigidity. A number of factors have been suggested to provide the increased stability of thermostable proteins: increased number of hydrogen bonds and salt bridges, better packing of the hydrophobic core, increase in the number of charged residues, decreased length of surface loops and stabilization of secondary structure 153; 154; 155; 156; 157; 158. Other factors, such as high ratios of (Glu + Lys)/(Gln + His) and characteristic codon usage, have also been postulated to be of significance for thermostability 159; 160.

Figure 4.9 (E+K)/(Q+H) ratios for nine L18 variants from different species. All organisms are prokaryotes except the archaeon M. jannaschii. The proteins are sorted according to increasing ratio, which groups them into mesophiles (meso) and thermophiles (thermo), excepting B. subtilis L18 (indicated as an outlier by an asterisk).

As expected, TthL18 is a thermostable protein. Temperatures up to 95°C do not induce melting, and the protein is completely intact after being subjected to this temperature, as shown by CD spectroscopy (unpublished data). As high-resolution structures of mesophilic L18 are not available, the number of hydrogen bonds and salt bridges, packing of the hydrophobic core and stability of secondary structure cannot be compared. As for loop length, loop 3 is the only loop whose length differs among prokaryotic L18 proteins. It is shorter in T. thermophilus and D. radiodurans than in Bacillus species, E. coli and H. influenzae. Sequence comparisons suggest that these differences are the result of evolutionary kinship rather than of thermal adaptation. When comparing the occurrence of all amino acids in a number of mesophilic (E. coli, H. influenzae, Mycobacterium leprae, Vibrio cholerae and Bacillus subtilis) and thermophilic L18 sequences (M. jannaschii, B. stearothermophilus, Thermotoga maritima and Thermus thermophilus), pronounced grouping of mesophiles and thermophiles can be observed for a number of amino acids. Mesophile L18 contains more alanine and valine, while thermophile L18 has more leucine and slightly more phenylalanine and isoleucine. Increased residue volume as well as increased hydrophobicity of side chains have been linked to increased thermostability 161, which could explain the bias towards bulkier hydrophobic side chains in thermophilic L18. Moreover, the lysine content of thermophilic L18 is clearly increased compared to mesophilic L18. The (Glu + Lys)/(Gln + His) ratio lies around 4.0 for thermophilic L18 and around 1.5 for mesophilic L18 (figure 4.9). Interestingly, B. subtilis L18 has a ratio of 4.0 and groups with the thermophiles rather than the mesophiles (the optimal growth temperature of B. subtilis is 30°C), while V. cholerae L18 has a relatively high ratio of 2.6. Although there is general agreement with values reported in the literature, the RNA-binding properties of L18 may in some species cause the ratios to be biased towards higher values. Nevertheless, TthL18 behaves like a typical moderately thermostable protein with increased occurrence of bulky hydrophobic side chains and lysine residues, and a (Glu + Lys)/(Gln + His) ratio of 3.6.
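The (Glu + Lys)/(Gln + His) ratio used above is a simple count over the one-letter amino acid sequence. A minimal sketch follows; the function name is hypothetical and the example sequence is made up for illustration, not an L18 sequence.

```python
def ek_qh_ratio(sequence):
    """Return the (Glu + Lys)/(Gln + His) ratio of a one-letter sequence."""
    seq = sequence.upper()
    numerator = seq.count("E") + seq.count("K")
    denominator = seq.count("Q") + seq.count("H")
    if denominator == 0:
        # The ratio is undefined without any Gln or His residues.
        raise ValueError("sequence contains no Gln or His")
    return numerator / denominator


if __name__ == "__main__":
    # Made-up fragment: 2 Glu + 3 Lys over 1 Gln + 1 His.
    print(ek_qh_ratio("MKKEELAKQH"))
```

For short proteins the denominator is small, so a single Gln or His substitution can shift the ratio considerably, which is worth keeping in mind when comparing individual L18 sequences.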

4.5.4 Conclusions

L18 of Thermus thermophilus was found to be a mixed α/β protein with two helices that lie packed against a β-sheet. It contains a basic N-terminus of 20 amino acids that is unstructured and flexible in solution. The N-terminus is crucial for interaction with 23S rRNA and undergoes the largest structural change upon binding RNA. The globular part of the protein is very similar in its free and RNA-bound states, indicating that it is a scaffold for recognition of the parts of 5S rRNA that it binds. S11 was found to be a structural homologue of L18, recognizing partly similar structures of 16S and 5S rRNA, respectively. Furthermore, TthL18 appears to be a typical moderately thermostable protein when judged by amino acid composition.

5 References

1. Akashi, H. & Gojobori, T. (2002). Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci U S A 99, 3695-700.

2. Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science 181, 223-230.

3. Levinthal, C. (1968). Are there pathways for protein folding? J. Chim. Phys. 65, 44-45.

4. Subramaniam, S. & Milne, J. L. S. (2004). Three-dimensional electron microscopy at molecular resolution. Annu. Rev. Biophys. Biomol. Struct. 33, 141-155.

5. Kimura, Y., Vassylyev, D. G., Miyazawa, A., Kidera, A., Matsushima, M., Mitsuoka, K., Murata, K., Hirai, T. & Fujiyoshi, Y. (1997). Surface of bacteriorhodopsin revealed by high-resolution electron crystallography. Nature 389, 206-11.

6. Murata, K., Mitsuoka, K., Hirai, T., Walz, T., Agre, P., Heymann, J. B., Engel, A. & Fujiyoshi, Y. (2000). Structural determinants of water permeation through aquaporin-1. Nature 407, 599-605.

7. Fiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Min Jou, W., Molemans, F., Raeymaekers, A., Van den Berghe, A., Volckaert, G. & Ysebaert, M. (1976). Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature 260, 500-507.

8. Sanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A. R., Fiddes, C. A., Hutchison, C. A., Slocombe, P. M. & Smith, M. (1977). Nucleotide sequence of bacteriophage φX174 DNA. Nature 265, 687-695.

9. Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F. & Petersen, G. B. (1982). Nucleotide sequence of bacteriophage lambda DNA. Journal of Molecular Biology 162, 729-773.

10. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M. & et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496-512.

11. Mewes, H. W., Albermann, K., Bahr, M., Frishman, D., Gleissner, A., Hani, J., Heumann, K., Kleine, K., Maierl, A., Oliver, S. G., Pfeiffer, F. & Zollner, A. (1997). Overview of the yeast genome. Nature 387, 7-65.

12. (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. The C. elegans Sequencing Consortium. Science 282, 2012-8.

13. Adams, M. D., Celniker, S. E., Holt, R. A., Evans, C. A., Gocayne, J. D., Amanatides, P. G., Scherer, S. E., Li, P. W., Hoskins, R. A., Galle, R. F., George, R. A., Lewis, S. E., Richards, S., Ashburner, M., Henderson, S. N., Sutton, G. G., Wortman, J. R., Yandell, M. D., Zhang, Q., Chen, L. X., Brandon, R. C., Rogers, Y. H., Blazej, R. G., Champe, M., Pfeiffer, B. D., Wan, K. H., Doyle, C., Baxter, E. G., Helt, G., Nelson, C. R., Gabor, G. L., Abril, J. F., Agbayani, A., An, H. J., Andrews-Pfannkoch, C., Baldwin, D., Ballew, R. M., Basu, A., Baxendale, J., Bayraktaroglu, L., Beasley, E. M., Beeson, K. Y., Benos, P. V., Berman, B. P., Bhandari, D., Bolshakov, S., Borkova, D., Botchan, M. R., Bouck, J., Brokstein, P., Brottier, P., Burtis, K. C., Busam, D. A., Butler, H., Cadieu, E., Center, A., Chandra, I., Cherry, J. M., Cawley, S., Dahlke, C., Davenport, L. B., Davies, P., de Pablos, B., Delcher, A., Deng, Z., Mays, A. D., Dew, I., Dietz, S. M., Dodson, K., Doup, L. E., Downes, M., Dugan-Rocha, S., Dunkov, B. C., Dunn, P., Durbin, K. J., Evangelista, C. C., Ferraz, C., Ferriera, S., Fleischmann, W., Fosler, C., Gabrielian, A. E., Garg, N. S., Gelbart, W. M., Glasser, K., Glodek, A., Gong, F., Gorrell, J. H., Gu, Z., Guan, P., Harris, M., Harris, N. L., Harvey, D., Heiman, T. J., Hernandez, J. R., Houck, J., Hostin, D., Houston, K. A., Howland, T. J., Wei, M. H., Ibegwam, C., et al. (2000). The genome sequence of Drosophila melanogaster. Science 287, 2185-95.

14. Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., Gocayne, J. D., Amanatides, P., Ballew, R. M., Huson, D. H., Wortman, J. R., Zhang, Q., Kodira, C. D., Zheng, X. H., Chen, L., Skupski, M., Subramanian, G., Thomas, P. D., Zhang, J., Gabor Miklos, G. L., Nelson, C., Broder, S., Clark, A. G., Nadeau, J., McKusick, V. A., Zinder, N., Levine, A. J., Roberts, R. J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A. E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T. J., Higgins, M. E., Ji, R. R., Ke, Z., Ketchum, K. A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G. V., Milshina, N., Moore, H. M., Naik, A. K., Narayan, V. A., Neelam, B., Nusskern, D., Rusch, D. B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., et al. (2001). The sequence of the human genome. Science 291, 1304-51.

15. Consortium, I. H. G. S. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860-921.

16. Burley, S. K. (2000). An overview of structural genomics. Nat Struct Biol 7 Suppl, 932-4.

17. Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A., Wu, D., Paulsen, I., Nelson, K. E., Nelson, W., Fouts, D. E., Levy, S., Knap, A. H., Lomas, M. W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y. H. & Smith, H. O. (2004). Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66-74.


18. Peti, W., Etezady-Esfarjani, T., Herrmann, T., Klock, H. E., Lesley, S. A. & Wuthrich, K. (2004). NMR for structural proteomics of Thermotoga maritima: Screening and structure determination. Journal of Structural and Functional Genomics 5, 205-215.

19. Zarembinski, T. I., Hung, L. W., Mueller-Dieckmann, H. J., Kim, K. K., Yokota, H., Kim, R. & Kim, S. H. (1998). Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. Proc Natl Acad Sci U S A 95, 15189-93.

20. Chambers, S. P., Austen, D. A., Fulghum, J. R. & Kim, W. M. (2004). High-throughput screening for soluble recombinant expressed kinases in Escherichia coli and insect cells. Protein Expr Purif 36, 40-7.

21. Manjasetty, B. A., Quedenau, C., Sievert, V., Bussow, K., Niesen, F., Delbruck, H. & Heinemann, U. (2004). X-ray structure of human gankyrin, the product of a gene linked to hepatocellular carcinoma. Proteins 55, 214-7.

22. Erlandsen, H., Abola, E. E. & Stevens, R. C. (2000). Combining structural genomics and enzymology: completing the picture in metabolic pathways and enzyme active sites. Curr Opin Struct Biol 10, 719-30.

23. Yee, A., Chang, X., Pineda-Lucena, A., Wu, B., Semesi, A., Le, B., Ramelot, T., Lee, G. M., Bhattacharyya, S., Gutierrez, P., Denisov, A., Lee, C. H., Cort, J. R., Kozlov, G., Liao, J., Finak, G., Chen, L., Wishart, D., Lee, W., McIntosh, L. P., Gehring, K., Kennedy, M. A., Edwards, A. M. & Arrowsmith, C. H. (2002). An NMR approach to structural proteomics. Proc Natl Acad Sci U S A 99, 1825-30.

24. Folkers, G. E., Van Buuren, B. N. & Kaptein, R. (2004). Expression screening, protein purification and NMR analysis of human protein domains for structural genomics. J Struct Funct Genomics 5, 119-31.

25. Braun, P., Hu, Y., Shen, B., Halleck, A., Koundinya, M., Harlow, E. & LaBaer, J. (2002). Proteome-scale purification of human proteins from bacteria. Proc Natl Acad Sci U S A 99, 2654-9.

26. Knaust, R. K. & Nordlund, P. (2001). Screening for soluble expression of recombinant proteins in a 96-well format. Anal Biochem 297, 79-85.

27. Zhang, A., Gonzalez, S. M., Cantor, E. J. & Chong, S. (2001). Construction of a mini-intein fusion system to allow both direct monitoring of soluble protein expression and rapid purification of target proteins. Gene 275, 241-52.

28. Gronenborn, A. & Clore, G. (1996). Rapid screening for structural integrity of expressed proteins by heteronuclear NMR spectroscopy. Protein Science 5, 174-177.

29. Almeida, F. C., Amorim, G. C., Moreau, V. H., Sousa, V. O., Creazola, A. T., Americo, T. A., Pais, A. P., Leite, A., Netto, L. E., Giordano, R. J. & Valente, A. P. (2001). Selectively labeling the heterologous protein in Escherichia coli for NMR studies: a strategy to speed up NMR spectroscopy. J Magn Reson 148, 142-6.

30. Perutz, M. F. (1963). X-ray analysis of hemoglobin. Science 140, 863-869.

31. Kendrew, J. C. (1963). Myoglobin and the structure of proteins. Science 139, 1259-1266.


32. Cohen, S. N., Chang, A. C., Boyer, H. W. & Helling, R. B. (1973). Construction of biologically functional bacterial plasmids in vitro. Proc Natl Acad Sci U S A 70, 3240-4.

33. Meselson, M. & Yuan, R. (1968). DNA restriction enzyme from E. coli. Nature 217, 1110-4.

34. Roulland-Dussoix, D. & Boyer, H. W. (1969). The Escherichia coli B restriction endonuclease. Biochim Biophys Acta 195, 219-29.

35. Fareed, G. C. & Richardson, C. C. (1967). Enzymatic breakage and joining of deoxyribonucleic acid. II. The structural gene for polynucleotide ligase in bacteriophage T4. Proc Natl Acad Sci U S A 58, 665-72.

36. Landy, A. (1989). Dynamic, structural, and regulatory aspects of λ site-specific recombination. Annu. Rev. Biochem. 58, 913-949.

37. Ptashne, M. (1986). A genetic switch: gene control and phage λ, Cell Press & Blackwell Scientific Publications, Cambridge Massachusetts.

38. Bushman, W., Thompson, J. F., Vargas, L. & Landy, A. (1985). Control of directionality in lambda site specific recombination. Science 230, 906-11.

39. Bernard, P. & Couturier, M. (1992). Cell killing by the F plasmid CcdB protein involves poisoning of DNA-topoisomerase II complexes. J Mol Biol 226, 735-45.

40. Anonymous. (2002). Gateway™ technology: A universal technology to clone DNA sequences for functional analysis and expression in multiple systems. version C edit, Invitrogen Corporation.

41. Makrides, S. C. (1996). Strategies for achieving high-level expression of genes in Escherichia coli. Microbiological reviews 60, 512-538.

42. Miroux, B. & Walker, J. E. (1996). Over-production of proteins in Escherichia coli: mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels. J Mol Biol 260, 289-98.

43. Derman, A. I., Prinz, W. A., Belin, D. & Beckwith, J. (1993). Mutations that allow disulfide bond formation in the cytoplasm of Escherichia coli. Science 262, 1744-7.

44. Higgins, D. R. & Cregg, J. M. (1998). Introduction to Pichia pastoris. Methods Mol Biol 103, 1-15.

45. van den Burg, H. A., de Wit, P. J. & Vervoort, J. (2001). Efficient 13C/15N double labeling of the avirulence protein AVR4 in a methanol-utilizing strain (Mut+) of Pichia pastoris. J Biomol NMR 20, 251-61.

46. Sperker, B., Murdter, T. E., Backman, J. T., Fritz, P. & Kroemer, H. K. (2002). Expression of active human β-glucuronidase in Sf9 cells infected with recombinant baculovirus. Life Sci 71, 1547-57.

47. James, D. C., Goldman, M. H., Hoare, M., Jenkins, N., Oliver, R. W., Green, B. N. & Freedman, R. B. (1996). Posttranslational processing of recombinant human interferon-γ in animal expression systems. Protein Sci 5, 331-40.

48. Geisse, S., Gram, H., Kleuser, B. & Kocher, H. P. (1996). Eukaryotic expression systems: a comparison. Protein Expr Purif 8, 271-82.

49. Andersen, D. C. & Krummen, L. (2002). Recombinant protein expression for therapeutic applications. Curr Opin Biotechnol 13, 117-23.


50. Barnes, L. M., Bentley, C. M. & Dickson, A. J. (2003). Stability of protein production from recombinant mammalian cells. Biotechnol Bioeng 81, 631-9.

51. Houdebine, L. M. (2002). Antibody manufacture in transgenic animals and comparisons with other systems. Curr Opin Biotechnol 13, 625-9.

52. Yokoyama, S. (2003). Protein expression systems for structural genomics and proteomics. Curr Opin Chem Biol 7, 39-43.

53. Studier, F. W., Rosenberg, A. H., Dunn, J. J. & Dubendorff, J. W. (1990). Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol 185, 60-89.

54. Adhya, S. & Gottesman, M. (1982). Promoter occlusion: transcription through a promoter may inhibit its activity. Cell 29, 939-44.

55. Stueber, D. & Bujard, H. (1982). Transcription from efficient promoters can interfere with plasmid replication and diminish expression of plasmid specified genes. Embo J 1, 1399-404.

56. Shine, J. & Dalgarno, L. (1975). Determinant of cistron specificity in bacterial ribosomes. Nature 254, 34-8.

57. Shine, J. & Dalgarno, L. (1974). The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc Natl Acad Sci U S A 71, 1342-6.

58. Steitz, J. A. & Jakes, K. (1975). How ribosomes select initiator regions in mRNA: base pair formation between the 3' terminus of 16S rRNA and the mRNA during initiation of protein synthesis in Escherichia coli. Proc Natl Acad Sci U S A 72, 4734-8.

59. Stenstrom, C. M., Jin, H., Major, L. L., Tate, W. P. & Isaksson, L. A. (2001). Codon bias at the 3'-side of the initiation codon is correlated with translation initiation efficiency in Escherichia coli. Gene 263, 273-84.

60. O'Connor, M., Asai, T., Squires, C. L. & Dahlberg, A. E. (1999). Enhancement of translation by the downstream box does not involve base pairing of mRNA with the penultimate stem sequence of 16S rRNA. Proc Natl Acad Sci U S A 96, 8973-8.

61. Looman, A. C., Bodlaender, J., Comstock, L. J., Eaton, D., Jhurani, P., de Boer, H. A. & van Knippenberg, P. H. (1987). Influence of the codon following the AUG initiation codon on the expression of a modified lacZ gene in Escherichia coli. Embo J 6, 2489-92.

62. Zhelyabovskaya, O. B., Berlin, Y. A. & Birikh, K. R. (2004). Artificial genetic selection for an efficient translation initiation site for expression of human RACK1 gene in Escherichia coli. Nucleic Acids Res 32, e52.

63. Kane, J. F. (1995). Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Curr Opin Biotechnol 6, 494-500.

64. Bachmair, A., Finley, D. & Varshavsky, A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179-86.

65. Silhavy, T. J., Shuman, H. A., Beckwith, J. & Schwartz, M. (1977). Use of gene fusions to study outer membrane protein localization in Escherichia coli. Proc Natl Acad Sci U S A 74, 5411-5.

66. Casadaban, M. J., Chou, J. & Cohen, S. N. (1980). In vitro gene fusions that join an enzymatically active beta-galactosidase segment to amino-terminal fragments of exogenous proteins: Escherichia coli plasmid vectors for the detection and cloning of translational initiation signals. J Bacteriol 143, 971-80.

67. Zabeau, M. & Stanley, K. K. (1982). Enhanced expression of cro-beta-galactosidase fusion proteins under the control of the PR promoter of bacteriophage lambda. Embo J 1, 1217-24.

68. Uhlen, M., Nilsson, B., Guss, B., Lindberg, M., Gatenbeck, S. & Philipson, L. (1983). Gene fusion vectors based on the gene for staphylococcal protein A. Gene 23, 369-78.

69. Smith, J. C., Derbyshire, R. B., Cook, E., Dunthorne, L., Viney, J., Brewer, S. J., Sassenfeld, H. M. & Bell, L. D. (1984). Chemical synthesis and cloning of a poly(arginine)-coding gene fragment designed to aid polypeptide purification. Gene 32, 321-7.

70. Nilsson, J., Stahl, S., Lundeberg, J., Uhlen, M. & Nygren, P. A. (1997). Affinity fusion strategies for detection, purification, and immobilization of recombinant proteins. Protein Expr Purif 11, 1-16.

71. Terpe, K. (2003). Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol 60, 523-33.

72. Serber, Z. & Dotsch, V. (2001). In-cell NMR spectroscopy. Biochemistry 40, 14317-23.

73. de Lamotte, F., Boze, H., Blanchard, C., Klein, C., Moulin, G., Gautier, M. F. & Delsuc, M. A. (2001). NMR monitoring of accumulation and folding of 15N-labeled protein overexpressed in Pichia pastoris. Protein Expr Purif 22, 318-24.

74. Riek, R., Pervushin, K. & Wuthrich, K. (2000). TROSY and CRINEPT: NMR with large molecular and supramolecular structures in solution. Trends Biochem Sci 25, 462-8.

75. Yao, J., Nakazawa, Y. & Asakura, T. (2004). Structures of Bombyx mori and Samia cynthia ricini silk fibroins studied with solid-state NMR. Biomacromolecules 5, 680-8.

76. Petkova, A. T., Buntkowsky, G., Dyda, F., Leapman, R. D., Yau, W. M. & Tycko, R. (2004). Solid state NMR reveals a pH-dependent antiparallel beta-sheet registry in fibrils formed by a beta-amyloid peptide. J Mol Biol 335, 247-60.

77. Castellani, F., van Rossum, B. J., Diehl, A., Rehbein, K. & Oschkinat, H. (2003). Determination of solid-state NMR structures of proteins by means of three-dimensional 15N-13C-13C dipolar correlation spectroscopy and chemical shift analysis. Biochemistry 42, 11476-83.

78. Busso, D., Kim, R. & Kim, S. H. (2003). Expression of soluble recombinant proteins in a cell-free system using a 96-well format. J Biochem Biophys Methods 55, 233-40.

79. Kohli, B. M. & Ostermeier, C. (2003). A Rubredoxin based system for screening of protein expression conditions and on-line monitoring of the purification process. Protein Expr Purif 28, 362-7.

80. Mohanty, A. K. & Wiener, M. C. (2004). Membrane protein expression and production: effects of polyhistidine tag length and position. Protein Expr Purif 33, 311-25.

81. Pryor, K. D. & Leiting, B. (1997). High-level expression of soluble protein in Escherichia coli using a His6-tag and maltose-binding-protein double-affinity fusion system. Protein Expr Purif 10, 309-19.

82. Routzahn, K. M. & Waugh, D. S. (2002). Differential effects of supplementary affinity tags on the solubility of MBP fusion proteins. J Struct Funct Genomics 2, 83-92.

83. Kapust, R. B. & Waugh, D. S. (1999). Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci 8, 1668-74.

84. Shih, Y. P., Kung, W. M., Chen, J. C., Yeh, C. H., Wang, A. H. & Wang, T. F. (2002). High-throughput screening of soluble recombinant proteins. Protein Sci 11, 1714-9.

85. Hammarstrom, M., Hellgren, N., van Den Berg, S., Berglund, H. & Hard, T. (2002). Rapid screening for improved solubility of small human proteins produced as fusion proteins in Escherichia coli. Protein Sci 11, 313-21.

86. Hsieh, M. & Brenowitz, M. (1997). Comparison of the DNA association kinetics of the Lac repressor tetramer, its dimeric mutant LacIadi, and the native dimeric Gal repressor. J Biol Chem 272, 22092-6.

87. Prestegard, J. H., Valafar, H., Glushka, J. & Tian, F. (2001). Nuclear magnetic resonance in the era of structural genomics. Biochemistry 40, 8677-85.

88. Creighton, T. E. (1992). Proteins: structures and molecular properties. 2nd edit, W.H. Freeman & Company, New York, NY.

89. Van Craenenbroeck, E. & Engelborghs, Y. (2000). Fluorescence correlation spectroscopy: molecular recognition at the single molecule level. J Mol Recognit 13, 93-100.

90. Wallace, M. I., Molloy, J. E. & Trentham, D. R. (2003). Combined single-molecule force and fluorescence measurements for biology. J Biol 2, 4.

91. Beck, J. L., Colgrave, M. L., Ralph, S. F. & Sheil, M. M. (2001). Electrospray ionization mass spectrometry of oligonucleotide complexes with drugs, metals, and proteins. Mass Spectrom Rev 20, 61-87.

92. Van Holde, K. E., Johnson, W. C. & Ho, P. S. (1998). Principles of physical biochemistry, Prentice-Hall, Upper Saddle River, NJ.

93. Svergun, D. I. & Koch, M. H. (2002). Advances in structure analysis using small-angle scattering in solution. Curr Opin Struct Biol 12, 654-60.

94. Schwarzinger, S., Kroon, G. J., Foss, T. R., Chung, J., Wright, P. E. & Dyson, H. J. (2001). Sequence-dependent correction of random coil NMR chemical shifts. J Am Chem Soc 123, 2970-8.

95. Semisotnov, G. V., Rodionova, N. A., Razgulyaev, O. I., Uversky, V. N., Gripas, A. F. & Gilmanshin, R. I. (1991). Study of the "molten globule" intermediate state in protein folding by a hydrophobic fluorescent probe. Biopolymers 31, 119-28.

96. Mueller, U., Bussow, K., Diehl, A., Bartl, F. J., Niesen, F. H., Nyarsik, L. & Heinemann, U. (2003). Rapid purification and crystal structure analysis of a small protein carrying two terminal affinity tags. J Struct Funct Genomics 4, 217-25.

97. Tai, L. J., McFall, S. M., Huang, K., Demeler, B., Fox, S. G., Brubaker, K., Radhakrishnan, I. & Morimoto, R. I. (2002). Structure-function analysis of the heat shock factor-binding protein reveals a protein composed solely of a highly conserved and dynamic coiled-coil trimerization domain. J Biol Chem 277, 735-45.

98. Fontana, A., Polverino de Laureto, P., De Filippis, V., Scaramella, E. & Zambonin, M. (1997). Probing the partly folded states of proteins by limited proteolysis. Fold Des 2, R17-26.

99. Kuznetsova, I. M., Turoverov, K. K. & Uversky, V. N. (2004). Use of the phase diagram method to analyze the protein unfolding-refolding reactions: fishing out the 'invisible' intermediates. J Proteome Res 3, 485-94.

100. Bartels, C., Guntert, P., Billeter, M. & Wuthrich, K. (1997). GARANT - a general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J Comput Chem 18, 139-49.

101. Hyberts, S. G. & Wagner, G. (2003). IBIS--a tool for automated sequential assignment of protein spectra from triple resonance experiments. J Biomol NMR 26, 335-44.

102. Zimmerman, D. E., Kulikowski, C. A., Huang, Y., Feng, W., Tashiro, M., Shimotakahara, S., Chien, C., Powers, R. & Montelione, G. T. (1997). Automated analysis of protein NMR assignments using methods from artificial intelligence. J Mol Biol 269, 592-610.

103. Malmodin, D., Papavoine, C. H. & Billeter, M. (2003). Fully automated sequence-specific resonance assignments of heteronuclear protein spectra. J Biomol NMR 27, 69-79.

104. Helgstrand, M., Kraulis, P., Allard, P. & Hard, T. (2000). Ansig for Windows: an interactive computer program for semiautomatic assignment of protein NMR spectra. J Biomol NMR 18, 329-36.

105. Nilges, M., Macias, M. J., O'Donoghue, S. I. & Oschkinat, H. (1997). Automated NOESY interpretation with ambiguous distance restraints: the refined NMR solution structure of the pleckstrin homology domain from beta-spectrin. J Mol Biol 269, 408-22.

106. Xu, Y., Wu, J., Gorenstein, D. & Braun, W. (1999). Automated 2D NOESY assignment and structure calculation of Crambin(S22/I25) with the self-correcting distance geometry based NOAH/DIAMOD programs. J Magn Reson 136, 76-85.

107. Herrmann, T., Guntert, P. & Wuthrich, K. (2002). Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J Mol Biol 319, 209-27.

108. Herrmann, T., Guntert, P. & Wuthrich, K. (2002). Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J Biomol NMR 24, 171-89.

109. Linge, J. P., Habeck, M., Rieping, W. & Nilges, M. (2003). ARIA: automated NOE assignment and NMR structure calculation. Bioinformatics 19, 315-6.

110. Gardner, K. H., Rosen, M. K. & Kay, L. E. (1997). Global folds of highly deuterated, methyl-protonated proteins by multidimensional NMR. Biochemistry 36, 1389-401.

111. Otomo, T., Ito, N., Kyogoku, Y. & Yamazaki, T. (1999). NMR observation of selected segments in a larger protein: central-segment isotope labeling through intein-mediated ligation. Biochemistry 38, 16040-4.

112. Tjandra, N. & Bax, A. (1997). Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278, 1111-4.

113. Clore, G. M., Starich, M. R. & Gronenborn, A. M. (1998). Measurement of residual dipolar couplings of macromolecules aligned in the nematic phase of a colloidal suspension of rod-shaped viruses. J Am Chem Soc 120, 10571-10572.

114. Hansen, M. R., Mueller, L. & Pardi, A. (1998). Tunable alignment of macromolecules by filamentous phage yields dipolar coupling interactions. Nat Struct Biol 5, 1065-74.

115. Fleming, K., Gray, D., Prasannan, S. & Matthews, S. (2000). Cellulose crystallites: a new and robust liquid crystalline medium for the measurement of residual dipolar couplings. J Am Chem Soc 122, 5224-5225.

116. Sass, H. J., Musco, G., Stahl, S. J., Wingfield, P. T. & Grzesiek, S. (2000). Solution NMR of proteins within polyacrylamide gels: diffusional properties and residual alignment by mechanical stress or embedding of oriented purple membranes. J Biomol NMR 18, 303-9.

117. Tycko, R., Blanco, F. J. & Ishii, Y. (2000). Alignment of biopolymers in strained gels: a new way to create detectable dipole-dipole couplings in high-resolution biomolecular NMR. J Am Chem Soc 122, 9340-9341.

118. Zhou, H., Vermeulen, A., Jucker, F. M. & Pardi, A. (1999). Incorporating residual dipolar couplings into the NMR solution structure determination of nucleic acids. Biopolymers 52, 168-80.

119. Annila, A., Aitio, H., Thulin, E. & Drakenberg, T. (1999). Recognition of protein folds via dipolar couplings. J Biomol NMR 14, 223-230.

120. Delaglio, F., Kontaxis, G. & Bax, A. (2000). Protein structure determination using molecular fragment replacement and NMR dipolar couplings. J Am Chem Soc 122, 2142-2143.

121. Fowler, C. A., Tian, F., Al-Hashimi, H. M. & Prestegard, J. H. (2000). Rapid determination of protein folds using residual dipolar couplings. J Mol Biol 304, 447-60.

122. Ban, N., Nissen, P., Hansen, J., Moore, P. B. & Steitz, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 289, 905-20.

123. Wimberly, B. T., Brodersen, D. E., Clemons, W. M., Jr., Morgan-Warren, R. J., Carter, A. P., Vonrhein, C., Hartsch, T. & Ramakrishnan, V. (2000). Structure of the 30S ribosomal subunit. Nature 407, 327-39.

124. Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F. & Yonath, A. (2000). Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution. Cell 102, 615-23.

125. Harms, J., Schluenzen, F., Zarivach, R., Bashan, A., Gat, S., Agmon, I., Bartels, H., Franceschi, F. & Yonath, A. (2001). High resolution structure of the large ribosomal subunit from a mesophilic eubacterium. Cell 107, 679-88.

126. Yusupov, M. M., Yusupova, G. Z., Baucom, A., Lieberman, K., Earnest, T. N., Cate, J. H. & Noller, H. F. (2001). Crystal structure of the ribosome at 5.5 Å resolution. Science 292, 883-96.

127. Nissen, P., Hansen, J., Ban, N., Moore, P. B. & Steitz, T. A. (2000). The structural basis of ribosome activity in peptide bond synthesis. Science 289, 920-30.

128. Bashan, A., Agmon, I., Zarivach, R., Schluenzen, F., Harms, J., Berisio, R., Bartels, H., Franceschi, F., Auerbach, T., Hansen, H. A., Kossoy, E., Kessler, M. & Yonath, A. (2003). Structural basis of the ribosomal machinery for peptide bond formation, translocation, and nascent chain progression. Mol Cell 11, 91-102.

129. Yonath, A. (2003). Structural insight into functional aspects of ribosomal RNA targeting. Chembiochem 4, 1008-17.

130. Ciesiolka, J., Lorenz, S. & Erdmann, V. A. (1992). Structural analysis of three prokaryotic 5S rRNA species and selected 5S rRNA--ribosomal-protein complexes by means of Pb(II)-induced hydrolysis. Eur J Biochem 204, 575-81.

131. Newberry, V. & Garrett, R. A. (1980). The role of the basic N-terminal region of protein L18 in 5S RNA-23S RNA complex formation. Nucleic Acids Res 8, 4131-42.

132. Cavanagh, J., Fairbrother, W. J., Palmer, A. G., III & Skelton, N. J. (1996). Protein NMR spectroscopy: principles and practice, Academic Press, San Diego, CA.

133. Wüthrich, K. (1986). NMR of proteins and nucleic acids, John Wiley & Sons, New York, NY.

134. Wishart, D. S., Sykes, B. D. & Richards, F. M. (1991). Relationship between nuclear magnetic resonance chemical shift and protein secondary structure. J Mol Biol 222, 311-33.

135. Spera, S. & Bax, A. (1991). Empirical correlation between protein backbone conformation and Cα and Cβ 13C nuclear magnetic resonance chemical shifts. J Am Chem Soc 113, 5490-5492.

136. Wishart, D. S. & Sykes, B. D. (1994). The 13C chemical-shift index: a simple method for the identification of protein secondary structure using 13C chemical-shift data. J Biomol NMR 4, 171-80.

137. Ramachandran, G. N., Ramakrishnan, C. & Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. J Mol Biol 7, 95-9.

138. Richardson, J. S. (1981). The anatomy and taxonomy of protein structure. Adv Protein Chem 34, 167-339.

139. Branden, C. & Tooze, J. (1999). Introduction to protein structure. 2nd edit, Garland Publishing, New York, NY.

140. Cornilescu, G., Delaglio, F. & Bax, A. (1999). Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol NMR 13, 289-302.

141. Brunger, A. T. (1992). X-PLOR version 3.1. A system for X-ray crystallography and NMR. Yale University Press, New Haven, CT.

142. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J. S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr 54 (Pt 5), 905-21.

143. Guntert, P., Mumenthaler, C. & Wuthrich, K. (1997). Torsion angle dynamics for NMR structure calculation with the new program DYANA. J Mol Biol 273, 283-98.

144. Schultze, P. & Feigon, J. (1997). Chirality errors in nucleic acid structures. Nature 387, 668.

145. Koradi, R., Billeter, M. & Wuthrich, K. (1996). MOLMOL: a program for display and analysis of macromolecular structures. J Mol Graph 14, 51-5, 29-32.

146. Turner, C. F. & Moore, P. B. (2004). The solution structure of ribosomal protein L18 from Bacillus stearothermophilus. J Mol Biol 335, 679-84.

147. DiNitto, J. P. & Huber, P. W. (2003). Mutual induced fit binding of Xenopus ribosomal protein L5 to 5S rRNA. J Mol Biol 330, 979-92.

148. DiNitto, J. P. & Huber, P. W. (2001). A role for aromatic amino acids in the binding of Xenopus ribosomal protein L5 to 5S rRNA. Biochemistry 40, 12645-53.

149. Murdoch, K. J. & Allison, L. A. (1996). A role for ribosomal protein L5 in the nuclear import of 5S rRNA in Xenopus oocytes. Exp Cell Res 227, 332-43.

150. Rosorius, O., Fries, B., Stauber, R. H., Hirschmann, N., Bevec, D. & Hauber, J. (2000). Human ribosomal protein L5 contains defined nuclear localization and export signals. J Biol Chem 275, 12061-8.

151. Deming, J. W. (2002). Psychrophiles and polar regions. Curr Opin Microbiol 5, 301-9.

152. Blochl, E., Rachel, R., Burggraf, S., Hafenbradl, D., Jannasch, H. W. & Stetter, K. O. (1997). Pyrolobus fumarii, gen. and sp. nov., represents a novel group of archaea, extending the upper temperature limit for life to 113 degrees C. Extremophiles 1, 14-21.

153. Szilagyi, A. & Zavodszky, P. (2000). Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure Fold Des 8, 493-504.

154. Vogt, G., Woell, S. & Argos, P. (1997). Protein thermal stability, hydrogen bonds, and ion pairs. J Mol Biol 269, 631-43.

155. Vogt, G. & Argos, P. (1997). Protein thermal stability: hydrogen bonds or internal packing? Fold Des 2, S40-6.

156. Vetriani, C., Maeder, D. L., Tolliday, N., Yip, K. S., Stillman, T. J., Britton, K. L., Rice, D. W., Klump, H. H. & Robb, F. T. (1998). Protein thermostability above 100 degrees C: a key role for ionic interactions. Proc Natl Acad Sci U S A 95, 12300-5.

157. Cambillau, C. & Claverie, J. M. (2000). Structural and genomic correlates of hyperthermostability. J Biol Chem 275, 32383-6.

158. Thompson, M. J. & Eisenberg, D. (1999). Transproteomic evidence of a loop-deletion mechanism for enhancing protein thermostability. J Mol Biol 290, 595-604.

159. Singer, G. A. & Hickey, D. A. (2003). Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content. Gene 317, 39-47.

160. Farias, S. T. & Bonato, M. C. (2003). Preferred amino acids and thermostability. Genet Mol Res 2, 383-93.

161. Haney, P. J., Badger, J. H., Buldak, G. L., Reich, C. I., Woese, C. R. & Olsen, G. J. (1999). Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc Natl Acad Sci U S A 96, 3578-83.