Research Focus

Exposing relationships using directed evolution

Oliver J. Miller and Paul A. Dalby

The Advanced Centre for Biochemical Engineering, Department of Biochemical Engineering, University College London, Torrington Place, London WC1E 7JE, UK

Functionally related protein structures that have undergone significant mutagenesis and re-arrangement over a large evolutionary time-scale might no longer share enough sequence or structural similarity to be revealed by even the most advanced database searches. Recently, Christ and Winter used directed evolution to obtain functional variants of the RNA-hairpin-binding protein Rop. Using the functional sequences obtained, a structural database search revealed previously unknown similarity to the tRNA-binding region of valyl-tRNA synthetase.

It is well established that proteins of both similar and unrelated function can have the same overall structural topology but statistically insignificant sequence homology. For example, human hemoglobin and lupine leghemoglobin have very similar tertiary structures but only 15.6% homology at the amino acid sequence level [1]. The extent to which sequences can be altered and yet achieve the same protein fold has been investigated with the directed evolution of a functional Src homology 3 (SH3) domain, using phage-displayed libraries containing a simplified alphabet of just five amino acids [2]. Sequence simplification was achieved at 40 of the 45 randomized non-peptide-binding residues, highlighting that, potentially, a protein could evolve to have a dramatically different sequence while retaining its structure and function. Consequently, we can expect an abundance of distantly related proteins with similar functions that are difficult to identify by comparison of their sequences alone.

A fundamental aim of protein science is to develop a method to predict ab initio the folded structures of proteins from sequence data alone and, subsequently, to infer their function. The structure of many proteins can be identified by sequence homology with known protein structures, although this is not possible when a protein sequence has little or no significant homology to those in structure databases [3]. The recent report by Christ and Winter [4] demonstrates that directed evolution might bridge the gap to homology modeling for a subset of sequences that are related structurally and functionally but no longer have significant sequence homology.

Protein engineering by directed evolution

Over the past decade, directed evolution has become established as the leading method both for obtaining proteins with novel binding affinities and for altering the properties of enzymes [5,6]. It has been used to obtain enzyme activity [7] and to improve many properties of proteins, including binding affinities [8–10], enzyme activity [11,12], stability [13,14], substrate specificity [15,16], enantioselectivity [17] and protein expression [18]. The key to its success has been that it does not require comprehensive knowledge of protein structure and function. Successive rounds of random mutation and carefully designed selection or screening protocols identify improved proteins in a manner that mimics natural evolution processes.

The screening or selection for new protein variants from a library of random mutants neatly avoids the requirement that currently hampers rational protein design (i.e. understanding the complex relationship between protein structure and function). Mutations that alter or improve protein function are frequently obtained; these would have been difficult to predict by sequence analysis or protein modeling. Interestingly, these unexpected mutations, alongside those rationalized more readily, might play a significant role in understanding better both structure–function relationships and, as Christ and Winter have demonstrated, the evolutionary relationships between proteins [4]. Furthermore, directed evolution often reveals divergence to more than one consensus sequence that results in the same overall protein structure and function [19,10], thus highlighting the potential difficulty in identifying the evolutionary link between two distantly related sequences.

The ability of directed evolution to identify these changes has led to its increased use as a tool for identifying protein residues or structural elements with functional importance. For example, it has been used to identify residues that affect enzyme regulation [20], to obtain functional consensus sequences compatible with certain structural elements in proteins [19,10] and to identify peptide sequence motifs that interact with target proteins [21–23]. After detecting consensus sequence motifs that bind to a chosen target molecule, computational search tools can then be used to identify potential interaction partners. Using this method, protein interaction networks have been identified for SH3 domains that were then refined using two-hybrid screening [24].
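The motif-based search for interaction partners described above can be illustrated with a minimal sketch. It assumes a consensus motif written with `x` for "any residue" (the proline-rich `PxxP` pattern often quoted for SH3 ligands is used here as an assumed example, not taken from the cited studies), and the protein names and sequences are hypothetical. The function names are illustrative only and do not correspond to any published tool.

```python
import re

def motif_to_regex(motif):
    """Translate a consensus motif into a regular expression.

    'x' stands for any residue. The pattern is wrapped in a lookahead
    so that overlapping occurrences are all reported.
    """
    return re.compile("(?=" + motif.replace("x", ".") + ")")

def find_candidates(proteins, motif):
    """Return {protein name: [start positions]} for sequences containing the motif."""
    pattern = motif_to_regex(motif)
    hits = {}
    for name, sequence in proteins.items():
        starts = [match.start() for match in pattern.finditer(sequence)]
        if starts:
            hits[name] = starts
    return hits

# Hypothetical sequences, for illustration only.
proteins = {
    "protA": "MSAPPLPPRNRT",
    "protB": "MKTAYIAKQR",
    "protC": "GGPAAPGG",
}
hits = find_candidates(proteins, "PxxP")
print(hits)
```

In practice, candidates found this way would still need experimental confirmation, as in the two-hybrid refinement mentioned above.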

Directed evolution in bioinformatics

Christ and Winter have extended the use of directed evolution to reveal an evolutionary relationship between two proteins that would have been difficult to identify using alternative current methods [4]. The consensus sequences obtained by directed evolution of the dimeric RNA-binding protein Rop, mapped to the RNA-binding helix structure, have been used to identify a distantly related enzyme, valyl-tRNA-synthetase (ValRS), with previously unknown structural and functional similarity to Rop. The two proteins have no significant sequence homology and only a search with alternative functional Rop sequences revealed the potential link to ValRS.

Corresponding author: Paul A. Dalby ([email protected]).

Update TRENDS in Biotechnology Vol.22 No.5 May 2004

www.sciencedirect.com

In their approach, Christ and Winter randomized five residues of Rop corresponding to the putative RNA-binding site within the N-terminal helix. A genetic complementation approach was then used to select active Rop variants. The basis of this system is a derivative of the naturally occurring ColE1 plasmid with the rop gene deleted. This deletion boosts the plasmid copy number and, consequently, increases the metabolic burden on the cell, which results in reduced growth rate. The increased copy number also raises the expression level of the plasmid-borne reporter gene LacZ. Clones from the library that expressed active variants of Rop in trans had their growth rates restored and were, therefore, enriched by growth selection in liquid media. Subsequent blue–white screening of colonies growing on X-Gal (5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside) confirmed clones that expressed active variants of Rop; their lower levels of reporter gene expression colored them white. After three such rounds of selection and screening, the sequences of 28 active Rop variants were compiled and used to search a Protein-Data-Bank-derived database with the SPASM program [25]. All combinations of the obtained sequences were used in the search pattern, excluding positions at which mutations occurred only once. The search pattern included only the mutated residues and enabled a maximum of 1-Å root-mean-square deviation from their spatial arrangement in Rop. Initially, the inclusion of residue 25 returned only Rop as a match but its exclusion enabled six other proteins to be identified, of which ValRS was the only RNA-binding protein. This refinement of the search pattern seems to indicate that, in general, several versions of a search pattern might be required for efficient identification of ‘hits’ with SPASM.
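The construction of the search pattern can be sketched in a few lines. This is not SPASM and does not call it. It is a simplified illustration that assumes the selected variants are given as short strings of the residues at the randomized positions, that residues observed only once at a position are dropped, and that spatial agreement is scored by comparing pairwise distances rather than by the superposition SPASM performs. All sequences, coordinates and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import dist, sqrt

def allowed_residues(variants, min_count=2):
    """For each randomized position, keep residues seen at least min_count times."""
    allowed = []
    for column in zip(*variants):  # one tuple of residues per position
        counts = Counter(column)
        allowed.append(sorted(aa for aa, n in counts.items() if n >= min_count))
    return allowed

def search_patterns(allowed):
    """Enumerate every combination of allowed residues across the positions."""
    return ["".join(combo) for combo in product(*allowed)]

def distance_rmsd(coords_a, coords_b):
    """Root-mean-square difference between matching pairwise distances.

    A superposition-free stand-in for a coordinate RMSD (an assumption
    made to keep the sketch short).
    """
    pairs = list(combinations(range(len(coords_a)), 2))
    squared = [
        (dist(coords_a[i], coords_a[j]) - dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    ]
    return sqrt(sum(squared) / len(squared))

# Hypothetical residues at three randomized positions in five selected variants.
variants = ["NRQ", "NKQ", "NRE", "DKQ", "NRQ"]
patterns = search_patterns(allowed_residues(variants))

# Hypothetical coordinates (in Å) of the motif and of two candidate sites.
motif = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
shifted = [(10.0, 10.0, 10.0), (13.8, 10.0, 10.0), (13.8, 13.8, 10.0)]
stretched = [(0.0, 0.0, 0.0), (5.8, 0.0, 0.0), (5.8, 3.8, 0.0)]

cutoff = 1.0  # Å, the threshold quoted in the text
accepted = [name for name, site in [("shifted", shifted), ("stretched", stretched)]
            if distance_rmsd(motif, site) <= cutoff]
print(patterns, accepted)
```

Dropping a position from `variants` before building the patterns loosely mirrors the exclusion of residue 25 described above, which widened the set of matches.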

Having obtained a match to ValRS, the authors built a model of wild-type Rop bound to RNA, based on the ValRS structure and the synthetic Tar–Tar* RNA hairpin, for which a nuclear magnetic resonance (NMR) structure is available. The binding affinity of Tar–Tar* for Rop is similar to that of ColE1, the natural target of Rop, making it a reasonable RNA structure to use in the model. The model obtained was consistent with previous NMR and biochemical data for Rop. Comparison of the Rop–RNA model with the known structure of Rop in the absence of RNA enabled Christ and Winter to rationalize the RNA binding in terms of a ‘ribose trap’, in which a hydrogen bond between Arg-13 and Asn-10 of Rop is broken to form new contacts with the ribose of RNA [4].

Concluding remarks

Overall, these results demonstrate that using directed evolution and structure searches is a powerful new approach for identifying potential new evolutionary links between distantly related protein sequences. Furthermore, the identification of structural and functional similarity to a protein for which a liganded structure is available has enabled Christ and Winter to infer the mode of binding for their protein to a similar ligand. The similarities suggest a possible common evolutionary origin for Rop and ValRS, bearing in mind that most other tRNA synthetases (e.g. ArgRS) have different binding modes to RNA.

The technique used in the SPASM program identifies only proteins containing the search motif and does not require matches outside this region. Looking beyond the RNA contact sites, the authors found that both ValRS and Rop contain a four-helix bundle. However, Rop is an antiparallel bundle between a homodimer, whereas ValRS is a monomeric bundle. Also, Rop binds two RNA molecules in a symmetrical manner, whereas ValRS binds only one tRNA molecule. Consequently, it is difficult to distinguish the evolutionary link between Rop and ValRS as being either divergent or convergent evolution. Despite this, many researchers should, surely, be revisiting the results of their directed evolution experiments to see whether they can reveal any further evolutionary links to functionally similar proteins.

This work has broad implications for the study of protein evolution. Prediction of evolutionary relationships is currently limited to cases in which sequence or structural similarities are readily identified. Distant protein relatives that have mutated beyond recognition at the sequence level could now be identified by the method described by Christ and Winter [4] and used to improve models of protein evolution. Extensive application might also reveal many more, previously unseen, relationships between protein families.

It will be interesting to see whether this work will have an impact on sequence or structural homology searches. In the future, it might be possible to use a similar approach in silico, whereby localized random mutations are introduced into a structural model and the variants are prioritized by their predicted binding properties. Derived consensus sequences and structures could then be used to search for potential distantly related proteins.
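As a rough illustration of how such an in silico round might be organized, the sketch below randomizes a few chosen positions, ranks the variants with a scoring function and derives a consensus from the best ones. Everything here is an assumption made for illustration: the parent sequence and positions are arbitrary, and `toy_binding_score` is a placeholder that merely counts basic or amide residues, standing in for a real binding prediction from a structural model.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(sequence, positions, rng):
    """Return a copy of the sequence with each listed position randomized."""
    residues = list(sequence)
    for pos in positions:
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)

def toy_binding_score(sequence, positions):
    """Placeholder score: number of R, K, N or Q residues at the mutated sites."""
    return sum(sequence[pos] in "RKNQ" for pos in positions)

def in_silico_round(parent, positions, library_size, keep, rng):
    """Build a random library and keep the best-scoring variants."""
    library = [mutate(parent, positions, rng) for _ in range(library_size)]
    library.sort(key=lambda s: toy_binding_score(s, positions), reverse=True)
    return library[:keep]

def consensus(variants, positions):
    """Most frequent residue at each mutated position among the variants."""
    return "".join(
        Counter(v[pos] for v in variants).most_common(1)[0][0] for pos in positions
    )

rng = random.Random(0)                 # fixed seed so the run is repeatable
parent = "ACDEFGHIKLMNPQRSTV"          # arbitrary hypothetical sequence
positions = [2, 5, 9, 12, 15]          # hypothetical randomized sites
selected = in_silico_round(parent, positions, library_size=500, keep=20, rng=rng)
motif = consensus(selected, positions)
print(motif)
```

The resulting consensus could then serve as the input to a motif or structure search of the kind described in the preceding sections.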

Acknowledgements

We thank the UK Biotechnology and Biological Sciences Research Council for funding O.J.M.

References

1 Berg, J.M. et al. (2002) Exploring evolution. In Biochemistry, (5th edn), pp. 179–180, Freeman, New York

2 Riddle, D.S. et al. (1997) Functional rapidly folding proteins from simplified amino acid sequences. Nat. Struct. Biol. 4, 805–809

3 Baker, D. and Sali, A. (2001) Protein structure prediction and structural genomics. Science 294, 93–96

4 Christ, D. and Winter, G. (2003) Identification of functional similarities between proteins using directed evolution. Proc. Natl. Acad. Sci. U. S. A. 100, 13184–13189

5 Hoess, R.H. (2001) Protein design and phage display. Chem. Rev. 101, 3205–3218

6 Dalby, P.A. (2003) Optimising enzyme function by directed evolution. Curr. Opin. Struct. Biol. 13, 500–505

7 Goud, G.N. et al. (2001) Specific glycosidase activity isolated from a random phage display antibody library. Biotechnol. Prog. 17, 197–202

8 Lowman, H.B. et al. (1991) Selecting high-affinity binding proteins by monovalent phage display. Biochemistry 30, 10832–10838

9 Smith, G.P. et al. (1998) Small binding proteins selected from a combinatorial repertoire of knottins displayed on phage. J. Mol. Biol. 277, 317–332

10 Dalby, P.A. et al. (2000) Evolution of binding affinity in a WW domain probed by phage display. Protein Sci. 9, 2366–2376

11 Stemmer, W.P.C. (1994) Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391

12 Chen, K. and Arnold, F.H. (1993) Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide. Proc. Natl. Acad. Sci. U. S. A. 90, 5618–5622

13 Flores, H. and Ellington, A.D. (2002) Increasing the thermal stability of an oligomeric protein, β-glucuronidase. J. Mol. Biol. 315, 325–337

14 Pedersen, J.S. et al. (2002) Directed evolution of barnase stability using proteolytic selection. J. Mol. Biol. 323, 115–123

15 Matsumura, I. and Ellington, A.D. (2001) In vitro evolution of β-glucuronidase into a β-galactosidase proceeds through non-specific intermediates. J. Mol. Biol. 305, 331–339

16 Raillard, S. et al. (2001) Novel enzyme activities and functional plasticity revealed by recombining highly homologous enzymes. Chem. Biol. 8, 891–898

17 Zha, D.X. et al. (2001) Complete reversal of enantioselectivity of an enzyme-catalyzed reaction by directed evolution. Chem. Comm., 2664–2665

18 Lin, Z. et al. (1999) Functional expression of horseradish peroxidase in E. coli by directed evolution. Biotechnol. Prog. 15, 467–471

19 Zhou, H.X. et al. (1996) In vitro evolution of thermodynamically stable turns. Nat. Struct. Biol. 3, 446–451

20 Salamone, P.R. et al. (2002) Directed molecular evolution of ADP-glucose pyrophosphorylase. Proc. Natl. Acad. Sci. U. S. A. 99, 1070–1075

21 O’Neil, K.T. et al. (1992) Identification of novel peptide antagonists for GPIIb/IIIa from a conformationally constrained phage peptide library. Proteins 14, 509–515

22 Li, R.H. et al. (2003) Use of phage display to probe the evolution of binding specificity and affinity in integrins. Protein Eng. 16, 65–72

23 Kasanov, J. et al. (2004) Characterizing class I WW domains defines key specificity determinants and generates mutant domains with novel specificities. Chem. Biol. 8, 231–241

24 Tong, A.H.Y. et al. (2002) A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 295, 321–324

25 Kleywegt, G.J. (1999) Recognition of spatial motifs in protein structures. J. Mol. Biol. 285, 1887–1897

0167-7799/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tibtech.2004.03.003

Nuclear remodeling after SCNT: a contractor’s nightmare

Peter Sutovsky1,2 and Randall S. Prather1

1 Department of Animal Science, University of Missouri-Columbia, S141 ASRC, 920 East Campus Drive, Columbia, MO 65211, USA

2 Department of Obstetrics & Gynecology, University of Missouri-Columbia, S141 ASRC, 920 East Campus Drive, Columbia, MO 65211, USA

As the success rate of somatic cell nuclear transfer (SCNT) remains low, researchers are turning to the very early stages of pre-implantation development to try to improve the developmental potential of reconstructed mammalian embryos. Two recent papers highlight the role of regulated proteolysis in nuclear remodeling after SCNT. First, Gao et al. describe a rapid, programmed replacement of the somatic-type linker histone H1 inside donor-cell nuclei with an oocyte-derived homolog after SCNT, which is subsequently reversed at the time of maternal embryonic transition. Second, Zhou et al. report the first successful cloning of a rat by using selective blockers of the ubiquitin-dependent degradation of cell-cycle regulator cyclin B. Therefore, a fast, programmed proteolysis might be of central importance for nuclear remodeling after SCNT, particularly in the ubiquitin-proteasome pathway.

Even though the number of species cloned by somatic-cell nuclear transfer (SCNT) grows steadily, the overall success rate remains low (<5%). Deviant patterns of nuclear remodeling and improper replication of the nuclear DNA methylation patterns (gene imprinting) during pre-implantation embryo development have been blamed for this poor developmental capacity of the clones. As a rule, both the maternal and the paternal DNA undergo gradual demethylation that is completed by the blastocyst stage. However, some researchers are now turning to very early stages of pre-implantation development to explain this problem. Notably, two recent papers indicate the importance of early events in nuclear remodeling of the donor cell after SCNT.

Histone replacement

The first paper by Gao et al. [1] demonstrates that the somatic-cell-type histone H1 in the donor-cell nucleus is rapidly replaced by oocyte-derived H1 within 60 min after nuclear transfer (NT) or intracytoplasmic sperm injection (ICSI) in mouse. The exchange of nuclear histone H1 is then reversed at the two- to four-cell stage, when the oocyte-derived molecules are replaced by the embryo-derived H1. This is likely to be a consequence of the onset of transcription and translation of the gene encoding embryonic H1, as well as the limited half-life of oocyte-derived H1. Overall, this shows that remodeling of the donor-cell nucleus during SCNT is similar to the remodeling of the sperm nucleus after natural fertilization, and that the mammalian ooplasm is programmed and well equipped for this function. The ooplasmic factors contributing to nuclear remodeling are being sought in hope

Corresponding author: Peter Sutovsky ([email protected]).
