structural comparisons among the short-chain helical cytokineshelix d. (d) exon structures of the...

15
Structural comparisons among the short-chain helical cytokines Denise A Rozwarski 1 , Angela M Gronenborn 2 , G Marius Clore 2 , J Fernando Bazan 3 , Andrew Bohm 4 , Alexander Wlodawer 5 , Marcos Hatada 6 and P Andrew Karplus 1 * 1 Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY 14853, USA, 2 Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA, 3 Department of Biochemistry and Biophysics, University of California, San Francisco, CA 94143-0448, USA, 4 Calvin Lab, University of California, Berkeley, CA 94720, USA, 5 Macromolecular Structure Laboratory, NCI-Frederick Cancer Research and Development Center, ABL-Basic Research Program, Frederick, MD 21702-1201, USA and 6 Ariad Pharmaceuticals, 26 Landsdowne Street, Cambridge, MA 02139, USA Background: Cytokines and growth factors are soluble proteins that regulate the development and activities of many cell types. One group of these proteins have struc- tures based on a four-helix bundle, though this similarity is not apparent from amino acid sequence comparisons. An understanding of how diverse sequences can adopt the same fold would be useful for recognizing and align- ing distant homologs and for applying structural infor- mation gained from one protein to other sequences. Results: We have approached this problem by compar- ing the five known structures which adopt a granulocyte- macrophage colony-stimulating factor (GM-CSF)-like, or short-chain fold: interleukin (IL)-4, GM-CSF, IL-2, IL-5, and macrophage colony-stimulating factor. The compar- ison reveals a common structural framework of five seg- ments including 31 inner-core and 30 largely exposed residues. Buried polar interactions found in each pro- tein illustrate how complementary substitutions maintain protein stability and may help specify unique core pack- ing. A profile based on the known structures is not suf- ficient to guarantee accurate amino acid sequence align- ments with other family members. Comparisons of the conserved short-chain framework with growth hormone define the optimal structural alignment. Conclusions: Our results are useful for extrapolating functional results among the short-chain cytokines and growth hormone, and provide a foundation for similar characterization of other subfamilies. These results also show that the placement of polar residues at different buried positions in each protein complicates sequence comparisons, and they document a challenging test case for methods aimed at recognizing and aligning distant homologs. Structure 15 March 1994, 2:159-173 Key words: cytokine, helical bundle, homology modeling, protein evolution, protein folds Introduction Cytokines are a group of proteinaceous intercellular messengers involved in the development, activation, and regulation of cells of the circulatory system [1]. Structural studies of these molecules are progressing rapidly and have already revealed that the cytokines can be grouped into a number of different structural fami- lies. These include the interleukin (IL)-1-like [3-trefoil family [2], the IL-8-like family [3,4], and the growth hormone-like helical bundle family [5,6]. As shown in Fig. 1, members of the helical bundle family contain he- lices arranged in an up-up-down-down topology, which does not exist in any other known protein structures. Tertiary structures have been determined for nine members of the helical bundle family of cytokines, and they show that the family can be further divided into three apparent groups which we will refer to as the short-chain (five structures known), the long-chain (two structures known) and the interferon-like (two structures known) subfamilies. Although the helical bundle topology is the same within these three subfam- ilies, the folds are sufficiently different that successful homology model building applications would require that the folds be treated as distinct. As demonstrated in Fig. 1, key features of the long-chain subfamily are a bundle of four well-aligned helices 20 to 30 residues in length and a helix A to helix B crossover passing in front of helix D. The crossover connections include short helices, but these are not well-aligned with or in- tegrated with the bundle core. For the short-chain sub- family members, the helices of the bundle are aligned less well and are only 10 to 20 residues long. Also, the crossover from helix A to helix B passes behind helix D and is part of a short two-stranded antiparal- lel 3-sheet which contributes to the bundle core. The change in the crossover connection suggests there is some difference in the folding pathway of these two subfamilies. The interferons make up the third subfam- ily, and have features intermediate between the other two. The helix packing angles in these molecules are closer to those found in short-chain cytokines, but their *Corresponding author. ( Current Biology Ltd ISSN 0969-2126 159

Upload: others

Post on 03-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Structural comparisons amongthe short-chain helical cytokines

Denise A Rozwarski1 , Angela M Gronenborn2, G Marius Clore2,J Fernando Bazan 3, Andrew Bohm4, Alexander Wlodawer 5,

Marcos Hatada6 and P Andrew Karplus1*1Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY 14853, USA, 2Laboratory of Chemical Physics,National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD 20892, USA, 3Department of Biochemistry and

Biophysics, University of California, San Francisco, CA 94143-0448, USA, 4Calvin Lab, University of California, Berkeley, CA 94720,USA, 5Macromolecular Structure Laboratory, NCI-Frederick Cancer Research and Development Center, ABL-Basic ResearchProgram, Frederick, MD 21702-1201, USA and 6Ariad Pharmaceuticals, 26 Landsdowne Street, Cambridge, MA 02139, USA

Background: Cytokines and growth factors are solubleproteins that regulate the development and activities ofmany cell types. One group of these proteins have struc-tures based on a four-helix bundle, though this similarityis not apparent from amino acid sequence comparisons.An understanding of how diverse sequences can adoptthe same fold would be useful for recognizing and align-ing distant homologs and for applying structural infor-mation gained from one protein to other sequences.Results: We have approached this problem by compar-ing the five known structures which adopt a granulocyte-macrophage colony-stimulating factor (GM-CSF)-like, orshort-chain fold: interleukin (IL)-4, GM-CSF, IL-2, IL-5,and macrophage colony-stimulating factor. The compar-ison reveals a common structural framework of five seg-ments including 31 inner-core and 30 largely exposedresidues. Buried polar interactions found in each pro-

tein illustrate how complementary substitutions maintainprotein stability and may help specify unique core pack-ing. A profile based on the known structures is not suf-ficient to guarantee accurate amino acid sequence align-ments with other family members. Comparisons of theconserved short-chain framework with growth hormonedefine the optimal structural alignment.Conclusions: Our results are useful for extrapolatingfunctional results among the short-chain cytokines andgrowth hormone, and provide a foundation for similarcharacterization of other subfamilies. These results alsoshow that the placement of polar residues at differentburied positions in each protein complicates sequencecomparisons, and they document a challenging test casefor methods aimed at recognizing and aligning distanthomologs.

Structure 15 March 1994, 2:159-173

Key words: cytokine, helical bundle, homology modeling, protein evolution, protein folds

IntroductionCytokines are a group of proteinaceous intercellularmessengers involved in the development, activation,and regulation of cells of the circulatory system [1].Structural studies of these molecules are progressingrapidly and have already revealed that the cytokines canbe grouped into a number of different structural fami-lies. These include the interleukin (IL)-1-like [3-trefoilfamily [2], the IL-8-like family [3,4], and the growthhormone-like helical bundle family [5,6]. As shown inFig. 1, members of the helical bundle family contain he-lices arranged in an up-up-down-down topology, whichdoes not exist in any other known protein structures.

Tertiary structures have been determined for ninemembers of the helical bundle family of cytokines,and they show that the family can be further dividedinto three apparent groups which we will refer to asthe short-chain (five structures known), the long-chain(two structures known) and the interferon-like (twostructures known) subfamilies. Although the helical

bundle topology is the same within these three subfam-ilies, the folds are sufficiently different that successfulhomology model building applications would requirethat the folds be treated as distinct. As demonstratedin Fig. 1, key features of the long-chain subfamily area bundle of four well-aligned helices 20 to 30 residuesin length and a helix A to helix B crossover passingin front of helix D. The crossover connections includeshort helices, but these are not well-aligned with or in-tegrated with the bundle core. For the short-chain sub-family members, the helices of the bundle are alignedless well and are only 10 to 20 residues long. Also,the crossover from helix A to helix B passes behindhelix D and is part of a short two-stranded antiparal-lel 3-sheet which contributes to the bundle core. Thechange in the crossover connection suggests there issome difference in the folding pathway of these twosubfamilies. The interferons make up the third subfam-ily, and have features intermediate between the othertwo. The helix packing angles in these molecules arecloser to those found in short-chain cytokines, but their

*Corresponding author.

( Current Biology Ltd ISSN 0969-2126 159

Page 2: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

160 Structure 1994, Vol 2 No 3

Fig. 1. Ribbon diagrams showing the structural features common to the four at-helix bundle cytokine family. The four helices are desig-nated A, B, C and D. The up-up-down-down topology refers to the fact that helices A and B point up while helices C and D point down.This creates a situation in which each helix is antiparallel to both of its neighboring helices. This topology is also called a double crossoverantiparallel helical bundle since there are two long segments after helices A and C which cross from one end of the bundle to the other.(a) The fold typical of the short-chain subfamily, whose members are listed in Table 1. Key elements include a two-stranded 3-sheet anda crossover connection that passes behind helix D. Generally, these proteins have fewer than 150 residues. (b) The fold typical of thelong-chain subfamily, whose structurally known members include growth hormone (GH) [19] and granulocyte colony-stimulating factor(G-CSF) [42]. Key elements include no [-strands and a crossover connection that passes in front of helix D. Also, a short helix after ciAthat is involved in receptor binding and is common to the two known structures is shown. Generally, these proteins are longer than 160residues. (c) The fold typical of the interferon-like subfamily, whose structurally known members include interferon-P (IFN-[) [43] andinterferon-y (IFN-y) [44,45]. Key elements include a helix in the C-D crossover and an A-B crossover connection that passes in front ofhelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance: [46-55]). Themajor helices (striped) and P-strands (shaded) are indicated and labeled as in (a)-(c).

helix A to helix B crossover passes in front of helixD, as it does in the long-chain cytokines. One uniquefeature of the interferon fold is that the helix C to helixD crossover forms a fifth a-helix which contributes tothe bundle core. It should be noted that macrophagecolony-stimulating factor (M-CSF) is naturally dimeric,and that IL-5 and interferon-y, in the short-chain andinterferon-like subfamilies respectively, form unusualinterdigitating homodimers so that in the prototype do-

main shown in Fig. 1 the helix C to helix D crossoverand the D-helix come from the second chain.

A striking feature of this family is that there is very littleamino acid sequence similarity between family mem-bers, and even between pairs within the same subfamilythe sequences are not recognizably similar. Despite thislow level of sequence similarity, a strong case can bemade that the members of this family are all related by

Page 3: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Short helical cytokine fold Rozwarski et al. 161

divergent evolution: they are functionally similar, all be-ing extracellular signaling molecules and, for the mostpart, binding to homologous receptors [5,7]; they arestructurally similar, all having a helical bundle of uniquetopology; and they are genetically similar [7], most hav-ing separate exons encoding the following structuralsegments: helix A; the A-B crossover; helices B and Cand part of the C-D crossover; and the rest of the C-Dcrossover and helix D (Fig. d). The striking similarityin exon structure is consistent with the hypothesis thatall of the members of this family have diverged froman ancestral protein in which this exon structure wasestablished; differences in individual family memberswould then be due to insertions or deletions occurringin the individual exons or, in the case of interferon-1,intron loss.

The short-chain subfamily is thought to includeat least nine members (Table 1). The structuresof five of them have been determined; these areIL-4 [8-11], granulocyte-macrophage colony-stimulat-ing factor (GM-CSF) [12,13], IL-2 ([14,15] and Mar-cos Hatada, unpublished data), M-CSF [16] and re-cently IL-5 [17]. Their complete amino acid sequencesand secondary structures are shown in Fig. 2. Under-standing how these molecules bind to their receptorsand facilitate signal transmission has been a primarymotivation for the structural studies, and valuable in-sights have been obtained [18]. Especially important inthis regard is the structure of human growth hormonein complex with its receptor which has provided aparadigm for cytokine-receptor interactions in the heli-cal bundle family [19,20]. Comparisons of these struc-tures are important to assess how structural and func-tional information from these family members can be

extrapolated to other, unknown, structures. In addition,because of the extensive differences in the amino acidsequences, this family is a fertile subject for the studyof protein folding and evolution as well as a testingground for sequence alignment and homology modelbuilding methods. We present here a comparison ofthe members of the short-chain subfamily whose struc-tures are known, to assess which features of the foldare conserved and to build a foundation on which com-parisons between all helical bundle cytokines can bemade.

ResultsFamily consensus frameworkThe short chain helical cytokines contain six commonelements of secondary structure: four ca-helices and twostrands of a 13-sheet (Figs 1 and 2). Since 13-strand 2is contiguous with a-helix D, it was here treated as asingle segment. The conserved juncture between thesetwo structural elements provided an important tetherpoint for the overlays by allowing every a-helix D tobe aligned unambiguously. Starting with these five seg-ments, a consensus framework for the fold was de-termined by carrying out all pairwise comparisons ofthe five cytokine structures and retaining a-carbon po-sitions common to all of the pairs. The results of thepairwise comparisons are summarized in Table 2 andFig. 3.

The pairwise comparisons yield from as few as 68equivalent residues in the GM-CSF/M-CSF pair to asmany as 81 equivalent residues in the nIL-4/IL-2,GM-CSF/IL-5 and IL-2/IL-5 pairs. (nIL-4 refers to the

Table 1. Members of the short-chain helical cytokine subfamily whose structures are known.

Name (abbreviation) Chain length Structure determinations PDB codeb Reference

Interleukin-4 (IL-4) 129 NMR (8.3, 1.0 A) 1bbn [81NMR (14.1,1.0A) litl 191X-ray (2.35 A, 23 %) [10]X-ray (2.25 A, 22 %) 1rcb [111]

Granulocyte-macrophage colony- 127 X-ray (2.4 A, 20 %) lgmf [121stimulating factor (GM-CSF) X-ray (2.8 A, 25 %) 1rgm [131

Interleukin-2 (IL-2) 133 X-ray (2.5 A, 20 %) 3ink [141NMR [151X-ray (2.0A, 19 %) MH, unpublished data

Macrophage colony-stimulating 158 x 2 X-ray (2.5 A, 20 /) lhmc [161factor (M-CSF)C

Interleukin-5 (IL-5) 115 x 2 X-ray (2.4 A, 21 %) [171

alndicates the method by which the structures were solved. For structures determined by X-ray crystallography (X-ray), the resolution of the analysis(in A) and R-factor (%) is given. For the structures determined by multidimensional NMR spectroscopy (NMR), the number of constraints per residueand the estimated precision of the main chain atoms (A) is given. bRefers to the access code to obtain the coordinates from the Brookhaven proteindatabank [57]. CM-CSF here refers to M-CSFc, which is a shorter version of M-CSF and M-CSFy.

Page 4: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

162 Structure 1994, Vol 2 No 3

Fig. 2. Sequences and secondary structures of the five cytokines compared in this study. For each protein, the first line gives the aminoacid sequence in the one-letter code (upper-case) and the second line gives the secondary structures according to the program DSSP[391; e = 3-sheet, h = ac-helix, g = 310 helix, b = 1-bridge, t = hydrogen-bonded turn and s = bend. For IL-4, the structure assignmentfor the X-ray diffraction-derived model (xlL-4) is given first and the NMR-derived model (nlL-4) is on the following line. Regions which arepart of the consensus framework (Fig. 3) are underlined.

NMR-derived structure.) The equivalent a-carbon po-sitions common to all of the combinations constitutea common framework for the fold, which consists of11 residues from a-helix A, 6 residues from 13-strand 1,13 residues from ca-helix B, 13 residues from a-helix'C and 18 residues from -strand 2/c-helix D for a to-tal of 61 residues. These five conserved segments willbe referred to as cA, 131, aB, aC, and 132/aD, respec-tively. The consensus framework as defined includes41-48% of the residues of each cytokine. As can beseen in the superimposed structures (Fig. 4) and hasbeen discussed in many of the original publications,the remainder of the residues are accounted for bythe varying lengths of the six main secondary structuralelements plus variations in the connecting loops andtermini which can include additional short secondarystructural elements.

Table 2 lists the root mean square (rms) deviations ofthe final overlays for the a-carbons in each of the struc-tural elements based on all pairwise equivalent residuesand the subset included in the consensus framework.The overall deviations in the pairwise combinationsrange from 1.7 A in the IL-4/IL-2 and IL-4/GM-CSF pairsto 2.9A in the IL-4/M-CSF and IL-2/IL-5 pairs. Com-parisons of the individual segments of the frameworkshow that the aA and aD elements have the lowest aver-age rms deviations (1.6A and 1.8A), while the 131 andaB elements have the highest average rms deviations(2.8A and 2.93A). The very high deviations of aB in allIL-5 comparisons are due to a rotation of this helix inIL-5. These results reflect that the A, C and D helicesare most consistently packed relative to one another.However, this trend is not completely uniform. For in-stance, aA and acB are the most similar segments in theIL-4/M-CSF pair.

For both the pairwise combinations and the com-mon framework, the overall deviations show that IL-4,GM-CSF, and IL-2 are more similar to each other than

to M-CSF or IL-5. It is interesting that the various levelsof structural similarity are not reflected in the level ofamino acid sequence identity. In fact, within the com-mon core, the xIL-4/M-CSF pair has both a greater se-quence identity (20 %) and higher structural deviation(2.8A), while the GM-CSF/IL-2 pair has a much lowersequence identity (11 %) and a lower structural devia-tion (2.1 A). This observation suggests that divergencein sequence among this family of proteins is extensiveenough that, for the most part, residue identities as op-posed to similarities are likely to be coincidental ratherthan the result of absolute conservation of a residuefrom a common ancestor. This is supported by theobservation that many identical residues adopt differentside chain conformations in the various structures (datanot shown). Unfortunately, this phenomenon makessequence alignment and homology building rather dif-ficult.

Common inner coreThe 61 residues that make up the family framework in-clude most of the residues that contribute to the buriedinner core of the fold. To identify these residues andto further assess how similar the frameworks of theindividual structures are, we calculated the accessiblesurface area as a function of residue number (Fig. 5).There is general agreement between the plots, as theca-helices and P-strands tend to show the expected pe-riodicity of three to four residues and two residues,respectively. These trends are particularly visible in theaverage profile (Fig. 5f).Among the five structures, 31 residues within the frame-work average <20 % surface accessibility. As antici-pated, these residues correlate well with those whichcan be seen to contribute to the packing of the innercore of each protein. The major exceptions in the in-dividual profiles are located in caB, where some non-core residues have lower than expected accessibilitiesbecause they are covered by the acC to caD connect-

10 20 30 40 50 60 70 80 90 100 110 120 130 140 150IL-4 I I I I I I I I I HKCDITLOEIIKTLNSLTEQKTLCTELTVTDIFAASKDTTEKETFCRAATVLRQFYSHHEKDTRCLGATAQQFHRHKQLIRFLKRLDRNWGL ALNSCPVKEAOSTLENFLERLKTIMREKYKCSttsthhhhhhhhhhhhhh ttttseee ggg ss hhhhhhhhhhhhhhhhhhhtt ttts sshhhhhhhhhhhhhhhhhhhhhhhhh eeehhhhhhhhhhhhhhhhhhhh

hhhhhhhhhhhhhhh gggssee sggg ss hhhhhhhhhhhhhhhhhhtttstts sshhhhhhhhhhhhhhhhhhhhhhhhh s eehhhhhhhhhhhhhhht s

GMCSF I I I I I I I I I IAPARSPSPsTQPWEHNAIQARRLLNLSRDTAAEMNETVEISEMFLQEPTL TRLELY LRGSLTKLKGPLTMMAIYKQHCPPTPETSCATIITFSFKENLKDFLLFDEPVQE

ttt hhhhhhhhhhhhhhhh hhhht eeeeess tts shhhhhhhhhht gggggghhhhhhhhhhhhhhs ss eeeeehhhhhhhhhhhhhhs ss

IL-2 I I I I I I I . I I I I I IAPTSSSTKKTQLOLEHL I DLI TLNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIV GSETTFMCEYADETATIVEFLNRWITFCQSIISTLT

s hhhhhhhhhhhhhhhhhhhhhhhht tt shhhhtt b bs sgggghhhhhthhhhhhhhtts gggsss hhhhhhhhhhhhhhhh s ttss b ss b hhhhhhhhhhhhhhhhttt

M CSF I I I I I I I I I I I I IEEVSEYCSHMIGSGHLOSLQR IDSQMETSCOITF FVDQEQLADPVCYLKKAFLLVQDIEDTMRFRDNTPNAIAIVQLQESLRLKSCFTKDYEEHDAACVRTFYETPLLFKVKNFNT NLLDKDANIFSKNCNNSFAECSSQG

hhhhhs hhhhhhhhhhhhts eeeeee ttt hhhhhhhhhhhhhhhhhht ttshhhhhhhhhhhhhhhhhtts gggttsseeeeee hhhhhhhhhhhhhhhhhhhhh ttggg hhhhht s

IL-5 I I I I I I I I I I IIPTEIPTSALKELALLTHRTL LIANETLRPVPVHKNHQLCTEEI FGITLSQTVQCGTVERLFKNLSLIKK YIDGQ KKKCGEERRRVNFLDYLFLGV EWIIES

hhhhhhhhhhhhhhshhhhht tt eeee sss gggshhhhhhhhhhhhtts ssthhhhhhhhhhhhhhhhhhhhttsseeeehhhhhhhhhhhhhhhhht

Page 5: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Short helical cytokine fold Rozwarski et al. 163

Table 2. Structural deviations and sequence identities for pairwise overlays.

xA j1 cB OC 0f2/D Overall

equiv 18 (11)rmsd 1.7 (1.4)ident 6 % (9 %)

14 (13)

1.9 (1.8)14 (13)

2.1 (2.1)29 % (31 %)

19 (13)

1.7 (1.5)19 (13)

1.9 (1.7)

11 %/o (15 %)

19 (13)2.6 (2.5)19 (13)

2.4 (2.1)11 % (15

%)

15 (13)2.5 (2.1)15 (13)

2.3 (2.1)

27 % (31 %)

14 (13)2.7 (2.5)

0% (0/%)

18 (18)1.1 (1.1)

18 (18)1.2 (1.2)

33 % (33 %)

22 (18)1.3 (1.3)21 (18)

1.2 (1.2)27 % (28 %

)

24 (18)3.0 (2.5)21 (18)

3.0 (3.0)21 %/ (28 %)

22 (18)1.8 (1.7)19 (18)

1.7 (1.7)14 % (17 %)

19 (18)1.2 (0.9)

21 % (22 %)

69 (61)1.7 (1.7)69 (61)1.7 (1.7)

26 %0/ (30 %)

78 (61)1.7 (1.6)81 (61)

1.8 (1.6)17 %/ (20 %)

80 (61)2.9 (2.8)78 (61)

2.7 (2.6)15 % (20 %)

74 (61)2.8 (2.8)72 (61)

2.8 (2.7)

19 % (21 %)

78 (61)2.4 (2.1)

9 % (11 %)

equiv 11 (11)rmsd 2.3 (2.3)ident 9 % (9 % )

equivrmsdident

16 (11)1.6 (1.3)

19 % (27 %)

equiv 15 (11)rmsd 2.2 (2.6)ident 20 % (27 %)

equivrmsdident

17 (11)1.3 (1.2)

18 % (180/)

equiv 12 (11)rmsd 2.1 (2.0)ident 17 % (18 %)

Average equiv 15 (11)rmsd 1.6 (1.6)ident 16 % (20 %)

9 (6)3.3 (2.8)

12 0/ (17 %)

14 (13)2.9 (2.9)

13% (13/%)

16 (13)2.3 (2.1)

14 % (18 %)

21 (18)1.9 (1.8)

19 % (21 %)

76 (61)2.4 (2.3)

15% (18%)

For each pairwise comparison, statistics are given for all equivalent residues in the structural element and, in parentheses, for the subset of residues foundin the family consensus framework. The statistics reported are as follows: equiv = the number of equivalent residues in the segment; rmsd = the rootmean square deviations of the equivalent Ca atoms; ident = the percent amino acid sequence identity based on the equivalent residues. A comparisonof the xlL-4 and nlL-4 models assigned 103 equivalent residues with an overall rmsd of 1.1 A and a core rmsd of 0.9 A.

ing segment which passes around the outside of cB. residues 1, 3 and 4 of CxB, while for IL-2 it passes higherThis chain crossing takes place in different positions mainly covering residues 4 and 7 of aB. In IL-4 it passesfor the five structures and causes major differences in still higher to cover residues 7 and 11 of aB (Figs 4the individual accessibility patterns of ocB. In GM-CSF and 5). In IL-5 this is where the crossover betweenand M-CSF the crossover is very low and mostly covers the subunits occurs, so that the aC to oaD connecting

xlL-4/GM-CSF

nlL-4/GM-CSF

xl-4/11-2

nlL-4/IL-2

xlL-4/M-CSF

nlL-4/M-CSF

xl-4/IL-5

nlL-4/IL-5

GM-CSF/IL-2

equivrmsd

equivrmsdident

equivrmsd

equivrmsd

indent

equivrmsd

equivrmsdident

equivrmsd

equivrmsdident

14 (11)1.2 (1.2)14 (11)

0.9 (0.9)29 % (36 %)

18 (11)1.2 (1.1)18 (11)

1.3 (1.3)11 % (18%)

15 (11)

2.3 (2.6)15 (11)1.9 (1.9)

13% (18%)

15 (11)1.7 (1.2)15 (11)1.3 (0.9)

13 % (18 %)

10 (6)2.0 (2.0)10 (6)

1.7 (1.9)30 % (50 %)

6 (6)1.7 (1.7)10 (6)

2.5 (1.0)0% (0%)

7 (6)4.2 (3.8)

8 (6)3.9 (3.5)

14 % (17 %)

6 (6)4.5 (4.5)

7 (6)4.8 (4.6)

17 %/ (17 %)

13 (6)2.9 (1.6)

0% (0/%)

13 (13)2.3 (2.3)13 (13)

2.1 (2.1)8%

0/ (80/o)

13 (13)2.4 (2.4)13 (13)

2.3 (2.3)23 % (23 %)

15 (13)2.8 (2.9)15 (13)

2.2 (2.4)13 % (15%)

16 (13)4.1 (4.2)16 (13)

3.9 (4.0)25 % (23 %)

14 (13)3.3 (3.1)

14% (15 %)

GM-CSF/M-CSF

GM-CSF/IL-5

IL-2/M-CSF

IL-2/IL-5

M-CSF/IL-5

10 (6)3.1 (2.2)

20 % (33 %)

11 (6)

3.5 (3.0)9% (17/%)

8 (6)2.8 (2.5)

13 % (17 %)

12 (6)4.6 (3.2)

17 % (17 %)

7 (6)3.4 (3.2)

0% (0%)

13 (13)2.2 (2.2)

150% (15/%)

15 (13)2.7 (2.7)

7 % (8 %)

14 (13)2.7 (2.6)

7% (8/%)

14 (13)4.4 (4.6)

7% (8/%)

15 (13)3.1 (3.1)

7% (8%)

15 (13)2.7 (2.8)

13% (15/%)

20 (13)2.9 (2.6)

10% (15 %)

18 (13)1.6 (1.6)

22 % (31 %)

13 (13)1.9 (1.9)

8% (8/%)

17 (13)2.3 (2.0)

12% (15/%)

19 (18)2.4 (2.4)

16 % (17 %)

19 (18)1.5 (1.4)

26 % (28 %)

21 (18)2.2 (2.1)

10 % (11 %)

25 (18)1.5 (1.5)

16 % (17 %)

22 (18)3.0 (2.9)

9 % (11 %)

68 (61)2.6 (2.4)

15% (16 %)

81 (61)2.5 (2.2)

15 % (20%)

76 (61)2.3 (2.3)

14 % (18 %)

81 (61)2.9 (2.7)

14 % (13 %)

73 (61)2.8 (2.7)

10% (11 %)

Page 6: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

164 Structure 1994, Vol 2 No 3

IL-4 4GMCS IL-2 M-cSF

4D 16V 110r 17N 12L' 18A 13

IL-5

1Ln 191 14L 122 v82 200 1SE 13S 12KSE 21! 14 14G 13e10 22 17L 15H 14IAI 23R1 L 16L 1L12K 24R 19L 17Q 16A13T 25L 20D 18 mlL14L 26L 21L 1S9L 14L15N 27N 22 202 1'916 24 231 2R 20117L 296 241 22L 2215

121' 25L 23I 2219e 26N 24D20Q 27G 25S21K 281 260

23L 34A 3eR24C 35 39M25' 384 44,26z 37N 41T 32D

2L 3se 42F 333I 31L28T 391 43K 341' 3329V 40 44F 35F 33I30T 41E 45Y 36e 3 P31D 42V 46M 37F 3sv

39D

40T 54C 51T 49Y 44C41e 5L 528 S 45142K 562 53A 51K 46043e 57T SW K 2K 47e44T 5OR SW 53A 48145 5NL 546 54F 4SF

47R 61 s 56L 5A

48 62Y 5 9L 5 57V 549A 63K 63 582 53.50T 614Q 8 5D8 SS1v 60 62z 60l 54L521 66 63L 621 56B

3 62E 57S5X 63D 0 585 . 59

77K iT78 MIR 72P79A 92P 73NDI 683R 74 7

aR 84D 751 63G7F 69S EL 76A 64T

a3L 71 86 m77 6984K 71T 7S 7V 6BMR 72K BEN 790 67R8 74L 1 89r 1 8L 604a7D 748 984 Q8 6WStR 7G 91V an 70K69N 76P 921 a83 71N9L 77L 93 s 72L91W 781 94 am 73S92G 7M 95z 86 74L93a 8 4 96L a 7L 7519 81A 97K 889K 76K

109D M2(E 106. 8Y

106Q 100! lUT 107Y 90R107S 10. 112 106 91R1081T 102T 113 109O 9R109 1037 114! 11Q 93V110 14 15? IIL 9091131 10 UG 1110 982112F 106 117F 13 961a3L 1071K 18L 114L 97114 168 1W USB 98D

SR 107N 12R 16 99Y116 1L . 1W 117V 101U17K 12 ^ 1221 USK 1OQ1t- 12D r3T 219N 102B119! 113F 124 120W 103F1284 1141 121 12F 104L12aR 13 12860 I 105012 11W 127! 123E 106V

124125612R9

129! 125K o10N1305 109T

110!

GM-CSF IL-2 MCSF IL-5 IL-2 -CSF IL-S5

12P 7'

14R 5K 6P X 6PS lar . 7T luT . 7T

16V 01 65 110 e617N 12L 9 12L .18?A 133 10. 13Q lOL

19! 14L 12m UV 14L 12G a1V20Q 15E 138 12K 1SE 13 12K21E e1 1 4 13E 1le 14 13E22A 17 1s 14T 17 154 14723R 18 16L 11, 1L 16 15L24R 19L 17Q 16?. 1Ls 17Q 16A25L 2aD 185 m1 20D 1M 126L 21L 19L 18L 21L 14L 18,27N 2Q 20Q 19S 220 20Q 1962L 23M 2IR 201' 234 2R 20T295 24 22L 21 2A 22 21

25L 231 22R2N4 24D273 256281 29

321 34L i33A 37T34A 38 .356 384 A 3. 283s4 4 310 29E 4M 310 29337N 41T 320 3'3036 42F 33! 31L 2 33 31L39 4K 34 32R 43K 34T 32R40V 44F 35F 33I 44F 35F 331412 4sr 36e 3 P 4sY 36e 34P42V 4 37F 35v 46 37F 35V

445 48K 39D 37V 4X 37V45 . 40Q 38 49K 38H

5 . 39K529 4253T54C 51T 49Y55L s2 50L56 53L 52K157T 54K K5R 5 5s3A59L 54 54F6e 5Q 5%61L 50c 56L62Y 59L SN63K 60 5S064Q 61B 59D60 62 601669 63 64

67R 66K

701 88s 77717 a7n 7WV729 8am 780

74 89 8MO789 984 81075 SN 6876P 921 ML

781 9 8.78T 960 884

BON 99!. ex1MlA M7 869an 960 M.

80901845 - . 98919

862sm

.

3.1

982I I1 19100! 1 117 1075101! 112? 168102T 113T 10911037 114! 11091043 115V 111L105S 1160 11101069 1177 113110= Il4 114168r 119 11601089 1208 16111 2221W 117V1ux IM 1111M 127T I1137 126 12W1141 1201 121F118. 1282 1411 126 123N117! 128 1237

44C4sT46047!4849F

5CO

sm53A

51T 49Y 44C52 S& 4s7

54K 52K 4

5 53A 48!56 54F 495s 55. 5058C 56L 5059L S57V 266 580 530682 56 D 5T6i8 60I 5s

L 6M 5B

68K 62 57S

eAR 729e2P 73Ne83 74.8D 751

64T A. 7D

6W I 8 771 65V66e M87S 7WV 66067R I I a 79Q 67R

6 9I I 81 6W7*K 91V a8 70K71 I 921 e3 71N72L a8s 7AL738 94 8 aL 73874L1 9 86 74L751 941 a7 75176K I 1 88K 76K

78W791

(81810

3S1

*is 131 860

9R I 1117 107Y 9GR91R I I1 106 91R

I 113T 109T 92R93V 114 I 11P 93V94N I13 lL 9495 11165 11Q 95096F 117F 113L 96F971 1160 114L 979iD | I mAs lm 9iD9Y 12OR 116K 99Y

101 121W 117V 10o101Q 122! 1l9 10Q10W 123T ll9 1023103 I 124F 12W 103F104L 125 121 104L100 12 I 122N 105G106V 7S 17 123 106V

1291 125K 10o1305 126N 109T131T 127L 11UB132L . 111W

X-CSF IL-5

12G1 V13S 12K14G 13E1M 14T16L 17L17Q 16A

1 1 4 120Q 19S21R 201

23I 2

39D 37V

54 4459. S59!. 52583 547

6016 578

62

751 632

7178 66879 67

83. 72

a. 788 741

90 791F 79

10710 91 6 8

0L B 93V111L 9

L111 96.1160r 901169 57

12W 10371218 104.122N 1050

I= 1084

sr loG127L 1106128L 1111

framwodck

6

20

1a2aA3A4CA5SA6MA7vA80A9A

1O A110

tb17

211

43101

to10

2aBlaB4a85aB6MB7908o89M8

12

10aB

13aB

to30

18:2aC3AC4C5sac69C7ClcaC9ar:

12aC13aC

to17

022A23B2

2aD3D4D5_D6SD7D80090D

10W11D12a13aD14aD15WD

2o26

43L.III I

j j i

Page 7: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Short helical cytokine fold Rozwarski et al. 165

Fig. 3 (opposite). Structurally equivalent residues for each pairwise comparison. The four groups of columns show pairwise comparisonsof IL-4 with the other four structures, then GM-CSF with the remaining three structures, then IL-2 with the remaining two and lastlyM-CSF with IL-5. The family consensus framework produced by all the comparisons is boxed. In the framework column, each residueposition is given a designation according to the segment in which it belongs. In this same column, the range of the number of residuesobserved between each segment in all of the protein sequences is included.

Fig. 4. Stereoviews of the superposi-tion of the ac-carbon backbones of thefive short-chain cytokine structures: IL-4(red), GM-CSF (green), IL-2 (cyan), M-CSF(violet) and IL-5 (yellow). (a) CompleteC,-backbone trace shown in a similarorientation to Fig. la. The central por-tions of the structures, which consti-tute the consensus framework, super-impose well, even though the individ-ual helix lengths are quite diverse. Ofnote is the variation in the path of thechain coming around the left side ofthe structure from the bottom of he-lix C to the top of helix D. In GM-CSF(green) the chain passes near the bot-tom of helix B, in M-CSF (violet) andIL-2 (cyan) it passes somewhat higherand in IL-4 (red) it passes higher still.The structure shown for IL-5 is a com-posite monomer comprising residues5-87 from one chain and 88-112 fromthe other chain. (b) Ca-backbone traceshowing just the consensus frameworkin the same orientation as (a). (c) C,-backbone trace of the consensus frame-work rotated through 900 relative to (a)and (b) such that the view looks downthe axis of the bundle.

segment does not cover ctB at all. In IL-4, GM-CSF andIL-2 this chain crossing is stabilized by a disulfide bondto acB.

Even among the 31 inner-core residues, there are vari-ations in the individual accessibility patterns. These arepartly correlated with small rotations in the helix axes,but mostly are due to the occurrence of residues withdifferent polarity at sites that are near the surface ofthe proteins. There are 13 cases (out of 31 x 5 = 155total positions) for which the individual accessibility

value of an inner-core residues exceeds 20 % (Fig. 5).The composition of the amino acids found at thesepositions is noteworthy: four are lysine, four glutamicacid, two arginine, two glycine and there is one eachof threonine, proline and isoleucine. Ten of these arepolar residues with long apolar stems which can foldto bury their methylene groups and turn outward toplace the polar end in the solvent. A good example ofthis is seen at position 5A, where four structures haveapolar residues, while human GM-CSF has an arginine(Figs 3 and 6a).

Page 8: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

166 Structure 1994, Vol 2 No 3

Fig. 5. Solvent accessibility patterns ofthe framework residues. (a) IL-4 (X-rayderived), (b) GM-CSF, (c) IL-2, (d) M-CSF,(e) IL-5, and ( average. The accessiblesurface area for each residue was calcu-lated with the program DSSP [391 usingall atoms of the structures. For M-CSFthe monomer was used. To convert topercent accessibilities, the surface ar-eas were divided by those calculatedfor amino acids in a Gly-X-Gly tripeptide[56]. Values > 50 % were set to 50 %for plotting purposes but not for theaveraging. The positions which average< 20 % accessibility and are defined asthe inner-core positions are shown byfilled circles. The fact that some residueswhich have average accessibilities of be-tween 20 and 25 % are excluded fromthe inner-core while some residues withexactly 20 /% are included emphasizesthat such a 20 %/o cutoff, although usefulfor analysis, is somewhat artificial.

The overall amino acid compositions of the inner coreand the exterior framework are given in Table 3. Thelarge majority of the inner core residues are hydropho-bic, but still 23 % of them have a side chain withhydrogen-bonding potential. Most of these side chainsare folded so that the hydrogen-bonding group is at thesurface of the protein, but some of them are involved inburied hydrogen-bonding interactions (see below). Ifthe inner core were to be defined more restrictively, in-cluding only residues having an average accessibility of<5 %, 18 positions would qualify, and hydrogen bond-ing side chains would only account for about 12 % ofthe residues in each of the protein cores. The polar sidechains are fairly well distributed among the inner-coreresidues so that only 12 of the 31 inner-core positionsremain purely apolar. If all known sequences of theseproteins are included in the analysis (Fig. 7) then onlynine of the inner-core residues remain consistently ap-olar, and only a single inner-core residue, the leucineat 8xA, is perfectly conserved. (The reported DNA se-quence for sheep IL-4 has a proline at this position,but as only a single base change converts the codonto a leucine, this may be a sequencing error.) Even forthis leucine, however, the side chain torsion angles arenot conserved, suggesting there may not be an absolutestructural requirement for leucine at this position. It is

also of interest that none of the disulfide bonds arewithin the helical bundle core. The cysteine in the innercore at 15caD of IL-2 is present as a sulfhydryl and thatat 8aB is in a disulfide bond with the C to D crossover.

An analysis of the major packing interactions in eachprotein shows that despite the diversity in sequences,the inner core contact patterns are reasonably wellconserved. For inner-core contacts within 4.5 A, IL-4has 52 interactions, GM-CSF has 57 interactions, andIL-2 has 54 interactions. Twenty-four of these inter-actions occur at common residue positions in thesethree proteins. An interesting feature of this is thepresence of hydrophilic side chains that have beensubstituted for hydrophobic side chains at buriedpositions. These substitutions are usually toleratedthrough differences in pairs (or triplets) of residuesand may also involve. changes in main chain confor-mation. An example of this occurs at positions 11cAand 5oSD. In GM-CSF a serine-lysine hydrogen bondpair exists, while in the other structures, and even inthe mouse GM-CSF sequence [21], hydrophobic sidechains occupy this location (Fig. 6b). Other examplesinclude the Asp31-Serl07 pair in IL-4 (Fig. 6c), theTyr62-AsnlO9 pair in GM-CSF, the Gln58-Ser84 pair in

(a) 50O

O(b) 5 1

50 o(c) 50-

(d) 50-0-

(e) 50-

0-o 50

oi

IL-4

:GM

IL-2

:M

: IL-5

: Avg

aA I1 aB aC 2 /aD

LQEIIKTLNSL LTVTDI TEKETFCRUTVL LKRLCRNLGUG QSTLENFLERLKTDEK

IQEARRLLNLS ETVEVI CLQTRLELYKQCL LTKLKGPL)1S IITFESFKENLKDFLLVI

LEHLLLDLI FKFYMP TELKHLQCLEEEL ISNIIWVNLELKG TATVEFLTFCQSI

GScHLQSLQL TFEF YLKKAFLLVDI ZVLQELSLRLKS YETPLOLLEKVKNFINET

VKETULLSTH RIPVP CTEEIFQGIGTLE VERLFKNLSLIK(K RRRIVNQFLDYLQEFLGU

12345678911 123456 123456739111U1216781111 23123456719111111l1 123 123 12345

Page 9: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Short helical cytokine fold Rozwarski et al. 167

Fig. 6. Examples of polar residues ininner-core positions. (a) A slice throughthe four-helix bundle showing residues3cLA-6cA, 7B-10aB, 6cC-9aC and7ocD-10aD. GM-CSF (green), IL-5 (yel-low), and M-CSF (violet) are shown.Arg23 of GM-CSF (at position 5A) andHis15 of M-CSF (at position 4A) ex-emplify residues at inner-core positionswhich adopt conformations that ex-pose their polar moieties to solvent.Dotted lines highlight, buried hydro-gen bonds involving Tyr62...Asn109 inGM-CSF (green), Thr14...Asn71 ...Lys70 ofIL-5 (yellow), and Gln58...Ser84 in M-CSF(violet). The labels give the consensusframework numbering. (b) The Ser29(11aA), Asp31, Lys107 (5D) triplet ofhuman GM-CSF. Asp31 is not a frame-work residue, but is included in the in-teraction. It should be noted that evenin mouse GM-CSF, these three residuesare changed to a hydrophobic triple ofmethionine, valine and isoleucine [21].(c) The Asp31 (5J1), Ser107 (212) pairin IL-4. In (b) and (c) GM-CSF (thicklines) and IL-4 (thin lines) are both shownto contrast the polar and apolar clus-ters and lengths of hydrogen bonds aregiven in A.

M-CSF, and the Thrl4-Asn71 pair of IL-5 (all shown inFig. 6a).

Alignment of unknown structuresIn order to predict the secondary structure locations ofother members of the short-chain subfamily, an accu-rate amino acid sequence alignment is necessary. Un-fortunately, given the high divergence of the sequencesin this family, such alignments have been difficult, ashas been noted previously [22-25]. However, the anal-ysis presented above shows that the short-chain helicalcytokine family has a well conserved structural frame-work whose information content could be extended toother members of the family. Fig. 7 shows a sequencealignment profile for the short-chain subfamily basedon the framework of the five structures analyzed here.The profile was improved by including the sequencesof other species for each cytokine. It consists of thefive segments of the consensus framework in which no

gaps are allowed, a minimum spacing between eachsegment, and a requirement that each framework el-ement map largely onto its appropriate exon (see Fig.7 legend).

The power of the profile was tested by cross-validationanalysis, where the five cytokines were each individuallyremoved from the profile and the remaining four wereused to find the optimal alignment of the one excluded.The cross-validation tests showed that aA and cxC couldbe correctly aligned in all five cases, while 032/aD wascorrectly aligned in four cases, and 31 and cB werecorrectly aligned in only three cases. The lower perfor-mance for 31 may relate to its short length, and for cxBit may relate to the variation in its interaction with theacC to cD connection which causes the hydrophobicitypattern to be less conserved. For the helices that aremisaligned, the alignment is incorrect by either 3 or 7residues (one or two turns) so that the hydrophobicity

(b)

(c)

Page 10: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

168 Structure 1994, Vol 2 No 3

pattem is in register. Fig. 7 also reports the alignmentsyielded by our profile for the putative members of theshort-chain subfamily IL-3, IL-7 and stem cell factor(SCF).

Comparison with growth hormoneHuman growth hormone is a member of the long-chainsubfamily, which clearly has its own unique structuralframework. However, since much is known about theway in which growth hormone binds to its receptor[19,20], a comparison of the short-chain frameworkwith growth hormone could give insight into how theshort-chain helical cytokines bind their receptors. Be-cause the helices are much longer in growth hormoneand the helix packing angles are somewhat different,six different alignments are possible which allow areasonable correspondence of all helical frameworkresidues from the short-chain fold with helical residuesfrom growth hormone (Fig. 8). Overlay statistics werecalculated for each of the six possible overlays (rasters)using each of the five short chain structures (Table 4).No single raster is best for all five structures, with low-est deviations occurring for rasters la (for IL-2), 3a (forIL-4, GM-CSF and M-CSF), and 3b (for IL-5). However,if it is assumed that a single overlay is valid for all of

the structures then raster 3a clearly stands out as thebest choice. The alignment for this raster is includedin Fig. 7 (as human GH).

DiscussionA comparison of the five known short-chain helicalcytokine structures has revealed that there is a struc-turally conserved framework which includes five seg-ments and nearly half of the residues in each protein.However, the level of structural difference (as great as2.9A rms deviation) highlights the malleability of theshort-chain cytokine fold. In a study of protein struc-tural evolution by Chothia and Lesk [26], the pairs withthe greatest divergence (hemoglobin at-chain versuserythrocruorin and plastocyanin versus azurin) have anrms deviation of only 2.3k We speculate that thereare three main factors contributing to the high plas-ticity of the cytokine fold: firstly, there is no active sitewhich demands extremely precise positioning of reac-tive groups; secondly, the individual cytokine structurescan co-evolve with their cognate receptor so that eventhose residues which are important for recognition mayhave more freedom to change and shift, and thirdly,the fold is based on a helical bundle so that, unlikethe -trefoil family of factors [2], there are no mainchain-main chain interactions in the core which mustbe conserved and thus limit structural divergence.

As in all comparative studies, the exact composition ofthe framework segments defined here is not absolute: itis a function of the set of cytokine structures used, thefact that equivalencies were based on local rather thanglobal overlays, and the 3 A cutoff value used. Examin-ing the residues included in the 31 segment illustratesthis point. If only nIL-4, GM-CSF and IL-2 were usedfor comparison, four residues which form a short a-helix between ctA and 31 would have been included inthe framework (Fig. 3). In addition, although nIL-4 andxIL-4 (the NMR-derived and X-ray-derived structures,respectively) are very similar, these same extra fourresidues are not scored as equivalent in the xIL-4/IL-2pair. This is due to a small difference between xIL-4and nIL-4 such that the residue preceding 31 is 3.1 Aaway from its equivalent in IL-2 and is barely excludedby the 3A cutoff.

In addition to the defined structural elements of theframework, other common features of the fold exist.Four of the five proteins have a disulfide bond connect-ing aB to the loop between moC and P2/aD. However,the cysteine in aB occurs at a unique framework posi-tion in each protein (7oLB in IL-4, lcdB in GM-CSF andIL-5, and 8aB in IL-2; M-CSF has a disulfide involving theresidue preceding IcB, but connects to a residue after,32/aD). The disulfide bond and conserved 13-sheet inthe short-chain subfamily may be involved in stabilizingthe chain crossings which differ from those of the othersubfamilies (Fig. 1).

Table 3. Amino acid composition of inner-core and externalframework residues.

Residue type Inner-core External

Non-hydrogen bonding: Gly 4 6Ala 7 1Cys 2 3Pro 4 1Val 13 5lie 20 3

Leu 50 11Met 5 2Phe 15 2

Subtotal 120 (77 %) 34 (23 %)

Hydrogen bonding: Ser 4 9Thr 5 19Asn 2 10Asp 2 4His 3 1GIn 1 16Glu 6 24Lys 6 16Arg 4 12Tyr 1 4Trp 1 1

Subtotal 35 (23 %) 116 (77 %)Totals 155 150

The number of residues is summed over the five structures used in thecomparison. The 31 residues of the inner-core and the 30 residues whichare external are defined in Fig. 5. Although the sulfur atoms of cysteineand methionine are technically polar and can be involved in hydrogenbonding, such interactions are rarely seen [581 and we have classifiedthe residues as non-hydrogen bonding.

Page 11: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Short helical cytokine fold Rozwarski et al. 169

Fig. 7. Amino acid sequence alignments based on framework residues. The first section shows the aligned sequences of the five knownstructures and their cognates from different species. The vertical shading shows the residues that constitute the inner core. The numbersbetween the framework segments are the minimum number of residues required to bridge the adjacent segments. The five lines in thesecond section are the best alignments resulting from the cross-validation tests. The best alignments which correctly match the truestructural alignment are shaded. The four lines in the third section are the optimal alignments of the structurally unknown familymembers IL-3, IL-7, and stem cell factor (SCF). The vertical shading again highlights the inner core. The last line gives the optimal structuralalignment of growth hormone (raster 3a of Fig. 8, Table 4).

Implications for protein design and modelingIn de novo protein design studies, the four helix bun-dle has been a common target because of its sim-plicity [27]. It has been shown that designs employ-ing amphipathic helices, with leucines uniformly onthe inside and hydrophilic residues on the outside,form extremely stable bundles. However, NMR stud-ies reported that the designs had more of a moltenglobule-like nature instead of a unique mature struc-ture [28]. The short-chain helical cytokines, each form-ing a uniquely structured inner core, provide an in-teresting contrast. Their inner cores contain primarilyleucine residues, but also include a significant numberof isoleucine, phenylalanine and valine residues, as wellas 23 % polar side chains (especially the longer ones),of which about one third are truly fully buried.

Although isoleucine and valine are not strong helix-forming residues [29], when they do occur in a he-lix, their P-branched side chains can only adopt a sin-gle conformation at X1 [30], providing a significant or-dering influence on the core. Similarly, the phenylala-nine residues contribute to ordering as they are moreconformationally restrained than leucines and are alsofound primarily as a single rotomer. In addition, buriedpolar side chains will tend to interact with other polargroups (either on the surface or inside) in a uniquefashion, thus further contributing to a specific archi-tecture of the inner core. A detailed analysis of thenon-framework residues in each protein is beyond thescope of this paper, but a cursory inspection reveals anumber of well buried polar side chains which hydro-gen bond to distant portions of the chain to stabilize

Page 12: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

170 Structure 1994, Vol 2 No 3

the tertiary fold. This is reminiscent of protein-proteininterfaces where it is thought that apolar contacts pro-vide the majority of the binding strength and polar con-tacts largely provide specificity [31].

The cross-validation test shows that alignments basedon a four structure profile are imperfect, but are ap-proaching a reasonable level of reliability. It shouldbe noted that since some of the criteria used in thealignment (the minimum connecting lengths and theexon requirements) were derived from the five knownstructures, some bias is present even in the cross-vali-dation tests. For instance, if the exon requirement is leftout, the number of incorrectly aligned segments jumpsfrom five to eight (data not shown). For predictive pur-poses, these additional requirements are only helpfulif they are obeyed by the structures to be aligned. In-deed, if they are not obeyed then their inclusion guar-

antees that the profile will deliver an incorrect answer!We purposely have used no other information to gen-erate the predictive alignments of IL-3, IL-7 and SCF,although such information is available in some cases.For instance, we suspect that cxB of SCF is not correctlyaligned in Fig. 7, but that the alignment with M-CSFreported by Bazan [32] which conserves a cysteine pre-ceding residue lcdB will be correct.

One of the major contributing factors to the diffi-culty of aligning members of this subfamily is the spo-radic presence of polar residues at inner-core posi-tions. There are nine inner-core positions for whichfour structures have apolar residues and only one hasa polar side chain. It is striking that every one of thefive structures contributes such a unique residue: IL-4has aspartic acid at 51 and arginine at 8aB; GM-CSFhas arginine at 5A, tyrosine at 9cdB, and lysine at 5cD;IL-2 has glutamic acid at 12cLB and tryptophan at 8cD;M-CSF has serine at 8C; and IL-5 has glutamic acidat 13ciB. This suggests that even the nine inner-corepositions that are seen to be apolar in all of the se-quences of the profile need not be apolar in othershort-chain cytokines. Since many occurrences of po-lar residues in the inner core involve complementarysubstitutions at two or three interacting positions (Fig.6), sequence structure alignment methods using two-dimensional residue contact information [33,34] mayperform better than one-dimensional methods [35,36].However, even these methods will have difficulty withresidues at the interfaces of the inner core and surfacewhich can tuck into the core or point out to the surfacedepending on the nature of the side chain (Fig. 6a).

Structure/function predictionsBiochemical studies to identify receptor recognitionresidues in the short-chain helical cytokines have uni-

raster

laA llaA

La&_k 1liaA.

laA 11iA

la 13.-C

1ctC 13aC

1aD 15aD

1aD 15aD

laD 15oLD

1aD 15CaD

laD 15aD

laD 15aD

*· ** **· *** **

LSRLFDNAMLRAHRLHQLAFDTYQEFZAY-\-DSNVYDLLKDLEEGIOTLMGR-\-ALLKNYGLLYCFRKDMDKvETFLRIVQCRS

10 20 30 110 120 160 170 180helix A helix C helix D

Fig. 8. The six tested alignments of the short chain consensus core with the ac-helices of human growth hormone. The amino acidsequence of human growth hormone is given with asterisks indicating those residues directly involved in receptor binding [19,20]. Theblack bars above the sequence correspond to the aligned positions of the consensus framework helices A, C, and D. Rasters 1, 2 and3 allow for the sliding of the shorter short-chain consensus helices along the longer growth hormone helices and rasters a and b allowfor different tilts of the short-chain framework to match the different packing angles of the growth hormone helices.

Table 4. Statistics for overlays on human growth hormone.

Raster IL-4 GM-CSF IL-2 M-CSF IL-5 Average

la 1.9 2.4 1.8 2.2 3.0 2.3lb 2.5 2.5 2.2 2.9 2.5 2.52a 2.5 2.8 2.6* 2.9 3.5 2.92b 2.4 2.6 2.5* 3.1 2.9 2.73a 1.8 2.0 1.9' 1.9 2.7 2.13b 2.5 2.3 2.4' 2.9 2.2 2.5

The root mean square deviations (in A) are given for each of the fiveshort-chain structures for each of the six possible ways they can beoverlayed on the structure of human growth hormone (see Fig. 8). Thebest raster for each structure is highlighted in bold. All 53 C, positions(from the four helices of the bundle and residue 32) are found to beequivalent, except for four cases of IL-2 noted by asterisks in which only52 residues were defined as equivalent.

la

lb

2a 1lk 11Ak

2b la, 1 . a

3a laA llaA

3b

laC 13aC

laC 13aC

lac 13aC

lac 13aC

** ** **

Page 13: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Short helical cytokine fold Rozwarski et al. 171

formly identified residues which occur on the A, Cand/or D helices [5,18]. One residue implicated to beimportant in IL-4, GM-CSF, IL-2 and IL-5 is an acidicresidue on helix A. Within the consensus frameworkdefined here, these residues are structurally equivalentin IL-4, GM-CSF and IL-5 (position 3ctA), but in IL-2 theresidue is one turn further up the helix (position 7caA).However, the side chain conformations are such thatthe carboxylates superimpose within about 3A, andcould conceivably interact with an equivalent residueon a receptor. This example highlights the danger ofassigning structural equivalencies on the basis of func-tional similarities.

Helices A, C and D are also the three main regions ofthe growth hormone structure which are recognizedby its receptor [19,20]. This suggests that if an ac-curate structural alignment can be obtained betweengrowth hormone and the short-chain subfamily, thenit may be possible to predict receptor-binding residueswithin the short-chain subfamily. Our comparisons ofthe short-chain framework with growth hormone sug-gest that one of the six conceivable alignments is no-tably better than the others (raster 3a in Fig. 8 andTable 4). This alignment (shown in Fig. 7) is identi-cal to those previously determined by considering allresidues (not just the framework residues) of someindividual short-chain structures [18,25,37]. We will notrepeat the discussion of those papers, but note thatthis alignment does indeed align many known func-tional residues of the short-chain family members withgrowth hormone residues which contact the receptor(for example, Asnl2 of growth hormone aligns withthe above mentioned acidic residues at position 3aA).From a structural perspective, it is interesting to notethat Asp169 of growth hormone aligns with 4tD, aposition which is fully buried and occupied by pheny-lalanine or leucine in all the short-chain structures. Asnoted by de Vos et al. [ 19], Asp169 makes buried polarinteractions with the side chains of Ser55 and Trp86.

Despite the appeal of the comparison with growth hor-mone, we note that there may not be a single choicefor the correct alignment, as the individual helical cy-tokines may not all bind their receptors with equivalentresidues. The growth hormone-receptor complex sug-gests that variation will occur because many residuesoutside of the four-helix bundle are also involved inreceptor binding and these residues have no obviousequivalents in the short-chain subfamily. The possibil-ity of the binding site sliding up or down the helicalframework by a turn or two along the four-helix bun-dle would bring another level of flexibility to the evo-lution of binding specificity for this family fold. Addi-tional cytokine-receptor complex structures are clearlyrequired to provide answers to some of these issues.

Biological implicationsStudies of the structure/function relations for cy-tokines are part of understanding the mecha-nisms of cell-cell communication at the molec-ular level. Structural studies have shown thatcytokines and growth factors can be groupedinto structural classes which transcend the classi-cal nomenclature and groupings originally arisingfrom the different disciplines of immunology, vi-rology, hematology, and physiology. For instance,many proteinaceous messengers which are his-torically and biochemically quite distinct, includ-ing growth hormone, colony-stimulating factors,interferons and some interleukins, are found toshare a common structural framework (a heli-cal bundle) and a common receptor type. Themembers of this structural family can be dividedinto three subfamilies that are represented bythe structures of granulocyte-macrophage colony-stimulating factor (GM-CSF), growth hormoneand interferon-3, and which we refer to as theshort-chain, the long-chain, and the interferon-like fold, respectively. An important goal is to de-termine how these molecules compare with eachother and how information about structure andstructure/function relationships can be general-ized among these molecules.

In this report, we define a number of structuralfeatures shared by the five cytokines that areknown to adopt a short-chain fold: IL-2, IL-4, IL-5,GM-CSF and macrophage colony-stimulating fac-tor. However, we also show that even this largeamount of information is insufficient to allow usto predict which other proteins will adopt thesame fold, because of the high level of divergencein amino acid sequence and in structure seenin this family. We also identify the best struc-tural alignment of the conserved framework ofthe short-chain fold with the growth hormonestructure. This yields potential insights into themode of receptor binding of the short-chain cy-tokines based on the structure of the growthhormone-receptor complex.

Materials and methodsCoordinates of the published structures of the cytokines wereused for the comparisons (Table 1). For IL-4, only one NMR

Page 14: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

172 Structure 1994, Vol 2 No 3

structure [8,37] and one X-ray structure [11] were utilized, al-though more structures have been determined by both methods[9,10]. For IL-2, the high resolution structure from Hatada (un-published data), for which we obtained permission to use theC, coordinates, was used for structural comparisons, but it isquite similar to the structure reported by McKay [14].

Structural superpositions were performed based on c-carbonatoms alone. In order to overlay the structures and assign equiv-alent residues, the strategy developed by Chothia and Lesk [26]was adopted. In this procedure, equivalent secondary structuralelements are chosen manually and used to define an initial con-servative set of structurally equivalent segments. One segment ata time, the structures are optimally superimposed [38] and addi-tional residues, adjacent to the original input residues, are identi-fied as equivalent based on a 3 Ai cutoff distance. The residues ofthe extended segments are then combined to define a completeset of equivalent residues which can be used to calculate thefinal overlay. This method has the advantage of being more gen-erous in defining equivalent residues than methods based onlyon separation of residues in a global overlay. In the cases studiedhere, a-carbons shifted as far as 6.8 A were included in the finaloverlay. However, we have found that the set of equivalent atomsdefined is dependent on which atoms are used in the startingset, and if the output of one round is used as input to the nextround the method does not always lead to a self-consistent andreasonable set of equivalent atoms. In this study, a self-consistentresult was obtained by a cyclic procedure which used the set ofequivalent atoms common to all pairwise comparisons as theinput to the next round.

All accessibility calculations and secondary structure definitionswere accomplished with the program DSSP [39]. Amino acidsequence alignments were carried out with a modified version ofthe package developed at Kansas State University [40] using theMcLachlan similarity matrix [41]. The modification allows oneto specify segments in one of the sequences into which gapscannot be placed and which we have used to disallow gaps inthe segments containing the core fold. Other scoring matriceswere also used, but did not give improved results in the cross-validation tests.

Acknowledgements: We thank Steve Jordan and Mike Milbum, GlaxoResearch Institute, Research Triangle Park, NC 27709, for allowingus access to the full coordinates of IL-5. We thank Brad Graves,Hoffman-LaRoche Inc., Nutley, NJ 07110, for allowing us early accessto the -carbon coordinates of IL-2 and running DSSP on the fullcoordinate set. This work was supported by grant GM43566-03 fromthe National Institutes of Health (NIGMS) to PAK, contract no. NO1-CO-74101 from the National Cancer Institute, DHHS with ABL to AW,and a grant from the AIDS Directed Antiviral Program of the Officeof the Director of the National Institute of Health to AMG and GMC.

References1. Thomson, A.W. (1991). The Cytokine Handbook. Academic

Press, London and San Diego.2. Murzin, AG., Lesk, AM. & Chothia, C. (1992). Beta-trefoil fold

patterns of structure and sequence in the Kunitz inhibitors,interleukins 1-beta and 1-alpha, and fibroblast growth factors.J. Mol BioL 223, 531-544.

3. Clore, G.M., Appella, E., Yamada, M., Matsushima, K. & Gro-nenborn, AM. (1990). The three-dimensional structure ofinterleukin-8 in solution. Biochemistry 29, 1689-1696.

4. Baldwin, E.T., et a, & Wlodawer, A. (1991). Crystal structureof interleukin-8: symbiosis of NMR and crystallography. Proc.Natl. Acad Sci USA 88, 502-506.

5. Sprang, S.R. & Bazan, J.F. (1993). Cytokine structural taxonomyand mechanisms of receptor engagement. Curr. Opin. Struct.BioL 3, 815-827.

6. Boulay, J.L. & Paul, W.E. (1992). The interleukin-4 family oflymphokines. Curr. Opin. ImmunoL 4, 294-298.

7. Bazan, J.F. (1990). Structural design and molecular evolutionof a cytokine receptor superfamily. Proc. Natt Acad Sci. USA87, 6934-6938.

8. Powers, R., Garret, D.S., March, CJ., Frieden, EA, Gronenbom,AM. & Clore, G.M. (1992). Three-dimensional solution struc-ture of human interleukin-4 by multidimensional heteronuclearmagnetic resonance spectroscopy. Science 256, 1673-1677.

9. Smith, L.J., et at, & Dobson, C.M. (1992). Human interleukin-4.J. Mol. BioL 224, 899-904.

10. Walter, M.R., et a, & Bugg, C.E. (1992). Crystal struc-ture of recombinant human interleukin-4. J. Biol Cbem. 267,20371-20376.

11. Wlodawer, A, Pavlovsky, A & Gustchina, A. (1992). Crystalstructure of human recombinant interleukin-4 at 2.25A resolu-tion. FEBS Lett. 309, 59-64.

12. Diederichs, K., Boone, T. & Karplus, PA (1991). Novelfold and putative receptor binding site of granulocyte-macro-phage colony stimulating factor. Science 254, 1779-1782.

13. Walter, M.R., Cook, WJ., Ealick, S.E., Nagabhusan, T.L., Trotta,P.P. & Bugg, C.E. (1992). Three-dimensional structure of re-combinant human granulocyte-macrophage colony stimulatingfactor. J. MoL Bio 224, 1075-1085.

14. McKay, D.B. (1992). Unraveling the structure of IL-2 [Re-sponse]. Science 257, 412-413.

15. Mott, H.R., Driscoll, P.C., Boyd, J., Cooke, R.M., Weir, M.P.& Campbell, I.D. (1992). Secondary structure of humaninterleukin-2 from 3D heteronuclear NMR experiments. Bio,chemstry 31, 7741-7744.

16. Pandit, J., Bohm, A, Jancarik, J., Halenbeck, R., Koths, K. & Kim,S. (1992). Three-dimensional structure of dimeric human re-combinant macrophage colony-stimulating factor Science 258,1358-1362.

17. Milbum, M.V., et aL & Wells, T.N.C. (1993). A novel dimer con-figuration revealed by the crystal structure at 2.4 A resolutionof human interleukin-5. Nature 363, 172-176.

18. Kaushansky, K. & Karplus, PA (1993). Hematopoietic growthfactors: understanding functional diversity in structural terms.Blood 82, 3229-3240.

19. de Vos, AM., Ultsch, M. & Kossiakoff, AA (1992). Humangrowth hormone and extracellular domain of its receptor: crys-tal structure of the complex. Science 255, 306-312.

20. Wells, JA & de Vos, AM. (1993). Structure and function ofhuman growth hormone: implications for the hematopoietins.Annu Rev. Biopbys Biomol Struct. 22, 329-351.

21. DeLamarter, J.F., Mermod, JJ., Liang, C.M., Eliason, J.F. &Thatcher, D.R. (1985). Recombinant murine GM-CSF from E.coli has biological activity and is neutralized by a specific serum.EMBO J. 4, 2575-2582.

22. Schrader, J.W., Ziltner, H.J. & Leslie, K.B. (1986). Structuralhomologies among the hematopoietins. Proc NatL Acad Sci.USA 83, 2458-2463.

23. Parry, DAD., Minasian, E. & Leach S.J. (1991). Cytokine con-formations: predictive studies. J Mo Recogn. 4, 63-75.

24. Bazan, J.F. (1992). Unraveling the structure of IL-2. Science257, 410-412.

25. Wlodawer, A, Pavlovsky, A & Gustchina, A (1993). Hematopoi-etic cytokines: similarities and differences in the structures, withimplications for receptor binding. Protein Sci 2, 1373-1382.

26. Chothia, C. & Lesk, A (1986). The relation between the di-vergence of sequence and structure in proteins. EMBO J. 4,823-826.

27. DeGrado, W.F., Wasserman, Z.R. & Lear, J.D. (1989). Proteindesign, a minimalist approach. Science 243, 622-628.

28. Osterhout, JJ., et al, & DeGrado, W.F. (1992). Characterizationof the structural properties of alB, a peptide designed to forma four-helix bundle. J. Am. Chem. Soc. 114, 331-337.

29. Chou, P.Y. & Fasman, G.D. (1978). Empirical predictions ofprotein conformation. Annu Rev. Biocbem 47, 251-276.

30. McGregor, MJ., Islam, SA & Sternberg, MJ.E. (1987). Analy-sis of the relationship between side chain conformation andsecondary structure in globular proteins. J. Mo Biol 198,295-310.

31. Janin, J. & Chothia, C. (1990). The structure of protein-proteinrecognition sites. J. BioL Chem 265, 16027-16030.

Page 15: Structural comparisons among the short-chain helical cytokineshelix D. (d) Exon structures of the structurally known members of each subfamily (references in the order of appearance:

Short helical cytokine fold Rozwarski et al. 173

32. Bazan, J.F. (1991). Genetic and structural homology of stemcell factor and macrophage colony stimulating .factor. Cell 65,9-10.

33. Godzik, A. & Skolnick, J. (1992). Sequence-structure matchingin globular proteins: applications to supersecondary and ter-tiary structure determination. Proc. Natl. Acad Sci USA 89,12098-12102.

34. Jones, D.T., Taylor, R.W. & Thornton, J.M. (1992). A new ap-proach to protein fold recognition. Nature 358, 86-89.

35. Bowie, J.U., Luthy, R. & Eisenberg, D. (1991). A method toidentify protein sequences that fold into a known three-di-mensional structure. Science 253, 164-170.

36. Ouzounis, C., Sander, C., Scharf, M. & Schneider, R. (1993).Prediction of protein structure by evaluation of sequence-struc-ture fitness. J. Mol Biol 232, 805-825.

37. Powers, R., Garret, D.S., March, CJ., Frieden, EA, Gronenbom,AM. & Clore, G.M. (1993). The high resolution three-dimen-sional solution structure of human interleukin-4 determinedby multidimensional heteronuclear magnetic resonance spectr-oscopy. Biochemistry 32, 6744-6762.

38. Kabsch, W. (1976). A solution for the best rotation to relatetwo sets of vectors. Acta Crystallogr. A 32, 922-923.

39. Kabsch, W. & Sander, C. (1983). Dictionary of protein struc-ture: pattern recognition of hydrogen-bonded and geometricalfeatures. Biopolymers 22, 2577-2637.

40. Reeck, G.R., Jackson, PJ. & Teller, D.C. (1982). Domain struc-ture in high molecular weight high mobility group nonhistonechromatin proteins. Nature 300, 76-78.

41. McLachlan, AD. (1971). Tests for comparing related amino-acid sequences: cytochrome c and cytochrome c551. J Mol.Biol. 61, 409-424.

42. Hill, C.P., Osslund, T.D. & Eisenberg, D. (1993). The struc-ture of granulocyte-colony stimulating factor and its relation-ship to other growth factors. Proc. Nattl Acad Sci. USA 90,5167-5171.

43. Senda, T., et at, & Mitsui, Y. (1992). Three-dimensional crystalstructure of recombinant murine interferon-P. EMBO J. 11,3193-3201.

44. Samudzi, C.T., Burton, L.E. & Rubin, J.R. (1991). Crystal struc-ture of recombinant rabbit interferon-y at 2.7A resolution. J.Biol. Chem. 266, 21791-21797.

45. Ealick, S.E., et al, & Bugg, C.E. (1991). Three-dimensionalstructure of recombinant human interferon-y. Science 252,698-702.

46. Arai, N., et al, & Arai, K. (1989). Complete nucleotide sequenceof the chromosomal gene for human IL-4 and its expression.J Immunol. 142, 274-282.

47. Miyatake, S., et al., & Arai, K. (1985). Structure of the chro-mosomal gene for granulocyte-macrophage colony stimulatingfactor: comparison of the mouse and human genes. EMBO J.4, 2561-2568.

48. Degrave, W., et al., & Fiers, W. (1983). Cloning and structureof the human interleukin 2 chromosomal gene. EMBO J. 2,2349-2353.

49. Kawasaki, E.S., et al, & Mark, D.F. (1985). Molecular cloningof a complementary DNA encoding human macrophage-specificcolony-stimulating factor (CSF-1). Science 230, 291-296.

50. Tanabe, T., et al, & Honjo, T. (1987). Molecular cloning andstructure of the human interleukin 5 gene. J. Biol. Chem. 262,16580-16584.

51. Roskam, W.G. & Rougeon, F. (1979). Molecular cloning andnucleotide sequence of the human growth hormone structuralgene. Nucleic Acids Res 7, 305-320.

52. Nagata, S., et al, & Yamazaki, T. (1986). The chromosomalgene structure and two mRNAs for human granulocyte colony-stimulating factor. EMBO J 5, 575-581.

53. Vodjdani, G., Coulombel, C. & Doly, J. (1988). Structure andcharacterization of a murine chromosomal fragment containingthe interferon beta gene. j Mol. Biol. 204, 221-231.

54. Lawn, R.M., et al., & Goeddel, D.V. (1981). Human fibroblastinterferon gene lacks introns. Nucleic Acids Res 9, 1045-1052.

55. Taya, Y., Devos, R., Tavernier, J., Cheroutre, H., Engler, G. &Fiers, W. (1982). Cloning and structure of the human immuneinterferon-gamma chromosomal gene. EMBO J. 1, 953-958.

56. Miller, S., Janin, J., Lesk, A.M. & Chothia, C. (1987). Interiorand surface of monomeric proteins. J. Mol. Biol. 196, 641-656.

57. Bernstein, F.C., et al, & Tasumi, M. (1977). The protein databank: a computer based archival file for macromolecular struc-tures. J Mol. Biol 112, 535-542.

58. Baker, E.N. & Hubbard, R.E. (1984). Hydrogen bonding inglobular proteins. Prog. Biophys. Mot Biol. 44, 97-177.

Received: 29 Dec 1993; revisions requested: Jan 17 1994;revisions received: 26 Jan 1994. Accepted: 27 Jan 1994.