BioEssays Vol. 2, No. 5, p. 213
WHAT THE PAPERS SAY

Protein Structure and Evolution: Similar Amino Acid Sequences Sometimes Produce Strikingly Different Three-Dimensional Structures

Arthur M. Lesk

It is well known that amino acid sequences of proteins determine their conformations; this mechanism translates the one-dimensional genetic message into the three-dimensional world of structure. This principle has created an expectation that small changes in amino acid sequences should produce only limited changes in protein structures, and that progressively greater divergence in sequence should produce correspondingly greater divergence in structure. This fairly widespread belief rests on a combination of (1) evidence (that is, many cases in which it is known to be correct); (2) analogy to the adaptive radiation observed in organic evolution; and (3) a priori considerations based on beliefs about the mechanisms of molecular evolution.

Thus, it is essential for evolution that some small changes in amino acid sequence cause only small, localized changes in protein structure: if on the one hand proteins were rigidly stable, or if on the other hand they were so critically constructed that any mutation simply blew a molecule apart, then point mutations could not generate functional or even structural diversity. Indeed, genetic changes during evolution have caused amino acid sequences to diverge into families of homologous proteins. Similar sequences often do generate similar structures; the globins, the cytochromes c and the serine proteases are the classic examples. A recent but especially significant example is the observation that a series of immunoglobulins produced during the course of the reaction to a challenge by an antigen achieves progressive tuning of affinity and specificity by a small number of amino acid changes that must maintain the basic structure of the antigen-binding site.[6]

So far, so good.

However, observations that are perhaps more surprising are the following.

(1) Very different sequences can, if they have arisen under selective constraints on function, still generate very similar structures. Vertebrate, insect and plant globins, for example, have amino acid sequence homologies below 20% but retain the same basic fold and pattern of intramolecular interactions.[7,8]

(2) Very similar sequences can in some cases generate very different structures. It is this fact that makes it difficult to predict the structural perturbation caused by even a small change in sequence. It has been a recurrent theme in discussions of sequence-structure relationships that the conformation of a residue in a native protein structure might be determined primarily by the identity of the residues neighbouring it in the sequence. This idea underlies most attempts at prediction of secondary structure from amino acid sequence, for example. And with the rise in interest in protein engineering - the deliberate modification of amino acid sequences - one wants to be able to predict the effects of amino acid substitutions. Such predictions would be very much simpler if the structural perturbation arising from an amino acid substitution were always limited to the residues near the changed one. But what are the facts?

Two recent investigations, based on protein structures determined by X-ray crystallography, show that the conformation of a segment of polypeptide chain can depend on more than the local sequence.

Dijkstra, Weijer and Wierenga have compared the structures of two closely related enzymes: bovine and porcine pancreatic phospholipases A2.[9] The amino acid sequences are highly homologous: 85% of the 124 residues are identical. The structures also are very similar: the molecules have the same basic folding pattern; indeed, 107 of the 124 residues can be overlaid. But 12 of the remaining residues are in a segment of similar sequence and dissimilar conformation.

The sequences involved:

                              *
bovine   leu-asp-ser-cys-lys-val-leu-val-asp-asn-pro-tyr
porcine  leu-asp-ser-cys-lys-phe-leu-val-asp-asn-pro-tyr

differ in only one position: val/phe at position *. But the conformations are quite different: in the bovine enzyme these residues form an α-helix followed by a nearly extended chain. In the porcine enzyme the helix is unwound and followed by a bend. The sidechain of the val in the bovine case is on the surface of the molecule; the sidechain of the phe is buried in the porcine structure. Apparently the val sidechain, smaller than phe, is not large enough to fill the pocket that the phe occupies. (A somewhat similar case of a residue popping out of a pocket that is too large for it occurs in insulin.)[10] The phospholipases show an extreme example of the conformational changes in loops which have been observed in crystal structures of other pairs of closely related proteins, and even in the same protein packed in different surroundings in different crystal forms. Note that the structure of the region is not distorted by effects occurring at its ends, because the regions flanking it have the same conformation in both structures.

The conformation of this region depends on the interaction of its residues with other portions of the molecule.
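The comparison of the two segments can be checked mechanically. The following is a minimal sketch in Python, assuming only the two 12-residue sequences quoted above; the function names `differences` and `percent_identity` are illustrative and do not come from the work of Dijkstra, Weijer and Wierenga.

```python
# The two 12-residue segments quoted in the text (three-letter codes).
bovine = "leu-asp-ser-cys-lys-val-leu-val-asp-asn-pro-tyr".split("-")
porcine = "leu-asp-ser-cys-lys-phe-leu-val-asp-asn-pro-tyr".split("-")


def differences(a, b):
    """Return (position, residue_a, residue_b) wherever a and b differ.

    Positions are counted from 1 within the segment, not within the protein.
    """
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(a, b):
    """Percentage of aligned positions carrying the same residue."""
    return 100.0 * (len(a) - len(differences(a, b))) / len(a)


diffs = differences(bovine, porcine)
print(diffs)                                        # [(6, 'val', 'phe')]
print(round(percent_identity(bovine, porcine), 1))  # 91.7
```

The segment is therefore more than 90% identical in sequence, yet it is exactly the part of the molecule where the two structures disagree. Sequence identity computed over a window says nothing about whether that window is structurally conserved.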

If the phospholipase structures challenge the idea that local sequence determines local conformation, investigations of similar sequences in unrelated structures demolish it - as a reliable generalization - entirely. Kabsch and Sander have searched the corpus of solved protein structures for identical but nonhomologous oligopeptides.[11] They found 25 cases of identical pentapeptide sequences in pairs of unrelated proteins. Identical hexapeptides exist in pairs of unrelated proteins, but in no case are the structures of both proteins known.
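A search of this kind can be sketched in a few lines. The Python below is an illustration only, not Kabsch and Sander's actual procedure: it indexes every peptide of length k and reports those found in two proteins not declared to be related. The function name `shared_kmers`, the `related` argument and the toy sequences are all hypothetical, and deciding which proteins count as homologous is left outside the sketch.

```python
from collections import defaultdict
from itertools import combinations


def shared_kmers(sequences, k=5, related=frozenset()):
    """Map each k-residue peptide to the pairs of unrelated proteins sharing it.

    sequences: dict of protein name -> one-letter amino acid string.
    related:   set of frozenset({name_a, name_b}) pairs to skip as homologous
               (hypothetical input; homology is decided elsewhere).
    """
    owners = defaultdict(set)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    hits = {}
    for peptide, names in owners.items():
        pairs = [pair for pair in combinations(sorted(names), 2)
                 if frozenset(pair) not in related]
        if pairs:
            hits[peptide] = pairs
    return hits


# Made-up toy sequences, not real proteins.
toy = {
    "protA": "MKAALVKDE",
    "protB": "GGAALVKTT",
    "protC": "MKAALVKDE",  # a copy of protA, declared homologous below
}
homologs = {frozenset({"protA", "protC"})}
print(shared_kmers(toy, k=5, related=homologs))
# {'AALVK': [('protA', 'protB'), ('protB', 'protC')]}
```

Finding the shared peptides is the easy part. The substance of the study lies in the next step, comparing the conformations that each shared peptide adopts in the two solved structures, which requires coordinates and is not attempted here.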

Of the 25 pentapeptides, 11 have qualitatively similar conformations in both proteins. For example, the sequence ala-ala-leu-val-lys is in an α-helix in both leghaemoglobin and cytochrome b562. But the other fourteen examples show qualitatively different conformations. For example, the sequence val-asn-thr-phe-val is α-helical in erythrocruorin but in a strand of β-sheet in ribonuclease-S.

The distribution of structural changes is illuminating. The eleven similar structures consist of five turns, two β-strands and four α-helices. The fourteen dissimilar structures include eight examples of pentapeptides which form part of a helix in one structure and part or all of a β-strand in another. In all but two of the other cases, neither of the two dissimilar structures has a secondary structure (helix or sheet).
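The bookkeeping in this paragraph is easy to lose track of, so a short tally may help. This is a sketch in Python using only the counts stated above; the variable names are illustrative. The text does not specify the two remaining dissimilar cases, and the tally makes no claim about them.

```python
# Counts as stated in the text for the 25 identical pentapeptides.
total = 25
similar = {"turn": 5, "beta-strand": 2, "alpha-helix": 4}

n_similar = sum(similar.values())            # pairs with similar conformation
n_dissimilar = total - n_similar             # pairs with different conformation
helix_vs_strand = 8                          # helix in one protein, strand in the other
other_dissimilar = n_dissimilar - helix_vs_strand
no_regular_structure = other_dissimilar - 2  # "all but two of the other cases"

print(n_similar, n_dissimilar, other_dissimilar, no_regular_structure)
# 11 14 6 4
```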

The implication is intriguing. In eight cases helices and sheets are, by changes in structural context, interconvertible. (Also worth mention here is a segment of the serine protease structure that is helical in mammalian enzymes such as chymotrypsin but forms two strands of β-sheet in bacterial homologues.[12] In this case the sequences are not identical.) By contrast, turns seem not to be as easily interconvertible with elements of secondary structure (helices and strands). Some time ago Rose suggested that attempts to predict structure from sequence could be more fruitful if emphasis were shifted from trying to distinguish between helices and sheets, to locating the turns.[13] This study by Kabsch and Sander supports Rose's view.

Conclusions

Several general implications emerge from the observations discussed here.

First, despite considerable genuine progress in understanding protein architecture, it remains difficult to 'cash in' our understanding for reliable predictions. Two subjects of current research, which approach the status of heavy industries in the field of computational molecular biology, are the following.

(1) The prediction of the secondary structure of a protein by statistical inference from the identities of a residue and its neighbors in the sequence.

Such predictions do not achieve more than about 60% success.[14-16] The dependence of conformation on tertiary structural interactions precludes substantial improvements in the quality of this result.

(2) The prediction of the structural perturbation caused by a mutation as a small and localized deformation.

The model of a localized structural response to small changes in amino acid sequence seems to be necessary for even an attempt to predict the effects of mutations (although, given the current state of the art, insufficient for real success). We have seen that the model is not generally valid. Moreover, it must be remembered that those closely related proteins in which small sequence changes produce only small changes in structure have arisen under natural selection, unlike the artificial modifications of protein engineers.
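The figure of about 60% quoted under item (1) needs a definition of 'success' before it can be compared across methods. The text does not give one. The sketch below, in Python, assumes the common convention of scoring each residue as helix (H), strand (E) or other (C) and counting the fraction of residues assigned the correct state. The function name and both state strings are made up for illustration.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed one.

    Both arguments are strings over the states H (helix), E (strand), C (other).
    This scoring convention is an assumption, not taken from the text.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up 20-residue example, not a real protein.
observed = "CCHHHHHHCCEEEEECCCCC"
predicted = "CHHHHHCCCCEEHHHCCCEE"
print(per_residue_accuracy(predicted, observed))  # 0.6
```

In this toy case the prediction scores 60% even though it misplaces the end of the helix and reads most of the strand as helix. A per-residue score of this size can thus coexist with serious errors in the segments that matter.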

Nevertheless, one might reasonably ask whether it is possible to distinguish those cases in which the effects of mutation are localized from others in which they are likely to be more far-reaching and complex. Thus, if we were given the structure of porcine phospholipase, and asked to predict the effects of mutating phe 63 to val, we might respond that phe 63 is buried in the porcine structure; changing its size and shape might have nonlocalized effects; therefore no simple model or reliable prediction of the structural change is possible. This is fine, although many cases are known in which such changes, even in buried residues, do cause only minor deformations. But what if we were given the bovine phospholipase structure, and asked to predict the effect of mutating val 63 - which is on the molecular surface - to phe?

It is conceivable that what is to us a headache may be to Nature a useful mechanism of evolution. Consider the generation of diversity - the raw material on which selection can operate - from point mutations. Some mutations make very little difference to a structure; some are lethal. Others do cause localized structural perturbations, and thereby allow selection to explore the immediate vicinity of a structure, so to speak. Small changes in sequence that make far-reaching changes in structure represent bolder experiments. It must be admitted, however, that no example of the development of a new function by such a mechanism has yet been identified.

Moral: The apple does not fall far from the tree, but it may change its shape when it hits the ground.

Work supported in part by U.S. National Science Foundation research grant PCM 83-20171.

References

2 LESK, A. M. (1984). The analysis of protein structures: new insights from a growing data base. BioEssays 1, 105-110.
3 LESK, A. M. (1984). Protein ex Machina [The architecture of proteins]. KOS I(b), 113-131.
4 DICKERSON, R. E. & GEIS, I. (1969). The Structure and Action of Proteins. Benjamin/Cummings, Menlo Park, CA.
5 DICKERSON, R. E. & GEIS, I. (1983). Hemoglobin: Structure, Function, Evolution, and Pathology. Benjamin/Cummings, Menlo Park, CA.
6 GRIFFITHS, G. M., BEREK, C., KAARTINEN, M. & MILSTEIN, C. (1984). Somatic mutation and the maturation of the immune response to 2-phenyl oxazolone. Nature 312, 271-275.
7 LESK, A. M. & CHOTHIA, C. (1980). How different amino acid sequences determine similar protein structures. I. The structure and evolutionary dynamics of the globins. J. Mol. Biol. 136, 225-270.
8 LESK, A. M. Themes and contrasts in protein structures. Trends in Biochem. Sci. 9, V-VII.
9 DIJKSTRA, B. W., WEIJER, W. J. & WIERENGA, R. K. (1983). Polypeptide chains with similar amino acid sequences but a distinctly different conformation. FEBS Letters 164, 25-27.
10 CHOTHIA, C., LESK, A. M., DODSON, G. G. & HODGKIN, D. C. (1983). Transmission of conformational change in insulin. Nature 302, 500-505.
11 KABSCH, W. & SANDER, C. (1984). On the use of sequence homologies to predict protein structure: identical pentapeptides can have completely different conformations. Proc. Natl Acad. Sci. USA 81, 1075-1078.
12 BRAYER, G. D., DELBAERE, L. T. J. & JAMES, M. N. G. (1978). Molecular structure of crystalline Streptomyces griseus protease A at 2.8 Å resolution. II. Molecular conformation, comparison with α-chymotrypsin and active-site geometry. J. Mol. Biol. 124, 261-283.
13 ROSE, G. D. (1978). Prediction of chain turns in globular proteins on a hydrophobic basis. Nature 272, 586-593.
14 KABSCH, W. & SANDER, C. (1983a). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637.
15 KABSCH, W. & SANDER, C. (1983b). How good are predictions of protein secondary structure? FEBS Letters 155, 179-182.
16 ROSE, G. D., GIERASCH, L. & SMITH, J. A. (1985). Turns in peptides and proteins. Adv. Prot. Chem. 37 (in the press).

ARTHUR M. LESK is at the MRC ...