

J. Mol. Biol. (1995) 254, 497–504

Canonical Structure Repertoire of the Antigen-binding Site of Immunoglobulins Suggests Strong Geometrical Restrictions Associated to the Mechanism of Immune Recognition

Enrique Vargas-Madrazo 1*, Francisco Lara-Ochoa 2 and Juan Carlos Almagro 2*

1 Instituto de Investigaciones Biológicas, Universidad Veracruzana, J. Barrera 54 M. A. Muñoz, Xalapa, Veracruz, Mexico

2 Instituto de Química, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, C.P. 04510 México, D.F.

Is the structural repertoire of immunoglobulins free to adopt an almost infinite number of conformations to build the diversity of the immune response, or does it take advantage of only a few conformations? In this paper we study this question by applying the canonical structure model to characterize the structural repertoire of immunoglobulins.

The results indicate that only ten of the 300 possible canonical structure classes (combinations of canonical structures) make up 87% of the 381 sequences analyzed. This suggests that the structural repertoire of immunoglobulins is restricted to the preferential use of a small number of canonical structure classes. The possible functional significance of these results was studied by analyzing the correspondence between the observed canonical structural repertoire implicit in Ig sequences and the types of antigens recognized. Two different sets of canonical structure classes were distinguished: one with preference for some specific types of antigens, such as proteins, polysaccharides or haptens, and the other with multi-specific binding capabilities.

Analysis of antibodies of known three-dimensional structure shows that for two specific classes, the canonical conformations of H2 and L1 determine the geometrical characteristics of the antigen-binding site, while in at least one multi-specific class, the changes in the general geometry of the antigen-binding site are produced by different conformations of H3. Implications of these results for the molecular recognition process mediated by immunoglobulins are discussed.

© 1995 Academic Press Limited

Keywords: canonical structure; canonical structure class; antigen-antibody complex; molecular recognition

*Corresponding author

Introduction

Antibody molecules are highly antigen-specific receptors of the immune system. Antigen-antibody interaction involves the antibody variable domains VH and VL, each composed of a two β-sheet framework (Amzel & Poljak, 1979). The antigen-binding site is composed of six hypervariable loops: three from VH and three from VL, denoted H1, H2, H3 and L1, L2, L3, respectively (Wu & Kabat, 1970; Poljak et al., 1973).

Analysis of antibodies of known three-dimensional structure has revealed a small number of main-chain conformations, or canonical structures, for H1 and H2 as well as for L1, L2 and L3 (Chothia & Lesk, 1987; Chothia et al., 1989; Tramontano et al., 1990). A canonical structure is determined by the loop size and by the presence of certain residues at key positions in both the loop and the framework regions (Chothia & Lesk, 1987; Chothia et al., 1989; Tramontano et al., 1990). Based on these rules relating the amino acid sequence and the three-dimensional structure of the hypervariable loops, analyses of functional germline genes (Chothia et al., 1992; Williams & Winter, 1993; Cox et al., 1994), pseudogenes (Vargas-Madrazo et al., 1995; Almagro et al., 1995b) and mature amino acid sequences (Chothia et al., 1989; Vargas-Madrazo et al., 1995; Almagro et al., 1995b) have corroborated the existence of canonical structures in most Ig sequences.

Abbreviations used: Ig, immunoglobulin; VL, variable light domain; VH, variable heavy domain; H1, H2 and H3, first, second and third hypervariable loop of the heavy chain, respectively; L1, L2 and L3, first, second and third hypervariable loop of the light chain, respectively; PDB, Brookhaven Protein Data Bank.

0022–2836/95/480497–08 $12.00/0 © 1995 Academic Press Limited

Following a different line of reasoning, it has been suggested that some geometrical features of the antigen-binding site correlate with the type of antigen recognized (Davies et al., 1990; Wilson et al., 1991; Webster et al., 1994; Wilson & Stanfield, 1993); antibodies specific for small molecules have concave antigen-binding sites, while those that bind larger molecules such as proteins tend to have flattened antigen-binding sites (Rees & de la Paz, 1986; Bolger & Sherman, 1991; Wilson et al., 1991; Webster et al., 1994).

This last suggestion, taken together with the fact that canonical structures are present in most Ig sequences, led us to expect that certain combinations of canonical structures could correlate with the type of recognized antigen. Such a correlation between the general architecture of the antigen-binding site and antibody function would provide insight into the general mechanism of the molecular recognition process mediated by Igs. In addition, heuristic schemes based on this kind of correlation, relating the Ig amino acid sequence, its three-dimensional structure and its corresponding function, could be useful for a more rational de novo design of antibodies of desired specificity.

In this paper, the above proposition is examined by extending the concept of canonical structure class, proposed by Chothia et al. (1992) as the combination of canonical structure types in H1 and H2, to include the combinations with L1, L2 and L3. This concept allows us to characterize the structural repertoire implicit in Ig sequences. The possible correlation between different canonical structure classes and the recognized antigen is then studied by classifying the antigens in terms of their gross chemical and biochemical characteristics, for example, as proteins, polysaccharides, haptens, etc. Finally, the results are discussed by analyzing antibodies of known three-dimensional structure.

Results

Distribution of canonical structure classes in Ig sequences

Several canonical structure types have so far been described for five of the six hypervariable loops that form the antigen-binding site (Chothia & Lesk, 1987; Chothia et al., 1989, 1992). Three canonical structure types have been identified for H1, four types for H2, five types for L1, one type for L2, and five types for L3 (Chothia & Lesk, 1987; Chothia et al., 1989, 1992). From these canonical structure types, the total number of possible canonical structure classes for the Igs is 300.
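The figure of 300 follows from multiplying the number of types available for each loop. A minimal Python sketch of that count is given below; it assumes that the types are numbered 1 to n for each loop and that class labels are written in H1-H2-L1-L2-L3 order, as the labels in Table 1 suggest. The function names are illustrative only.

```python
from itertools import product
from math import prod

# Number of canonical structure types per loop, as given in the text.
# Numbering the types 1..n follows the class labels used in Table 1.
TYPES_PER_LOOP = {"H1": 3, "H2": 4, "L1": 5, "L2": 1, "L3": 5}


def count_classes(types_per_loop):
    """Number of possible classes: the product of the per-loop type counts."""
    return prod(types_per_loop.values())


def enumerate_classes(types_per_loop):
    """Yield every class label in H1-H2-L1-L2-L3 order, e.g. '1-2-2-1-1'."""
    ranges = [range(1, n + 1) for n in types_per_loop.values()]
    for combo in product(*ranges):
        yield "-".join(str(t) for t in combo)


total = count_classes(TYPES_PER_LOOP)
labels = list(enumerate_classes(TYPES_PER_LOOP))
print(total, len(labels))  # both equal 3 * 4 * 5 * 1 * 5
```

Because L2 has a single type, it never contributes to the variation between classes; the product is effectively 3 × 4 × 5 × 5.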

In order to study which of the possible canonical structure classes occur in the functional domains of Igs, the VH and VL sequences compiled in Kabat's Data Base (Kabat et al., 1991) were examined. There are approximately 4000 sequences and sequence fragments reported in Kabat's Data Base for each V domain. To be included in the analysis, however, a sequence had to fulfil four conditions. (1) The sequence should be complete. (2) VH and VL for the same antibody should be present in the data base. (3) The sequence should have the patterns corresponding to canonical structures for H1, H2, L1, L2 and L3. (4) The sequence should have a known specificity. Application of these criteria to all of the sequences compiled in Kabat's Data Base reduces the sample to 381 sequences, including 341 sequences from mouse, 38 sequences from human and two sequences from rabbit (see Materials and Methods for the screening details).

Analysis of the aforementioned 381 sequences shows that, of the 300 possible canonical structure classes, only 29 are found. If only the classes containing more than 2% of the total number of sequences analyzed are considered, the result is even more surprising: ten classes are sufficient to describe 86.9% of the sample, while the remaining 19 classes describe only 13.1% (Table 1, first and second columns). That is, a mere 3.3% of the total of possible canonical structure classes represents roughly 87% of the Ig sequences analyzed. These results strongly suggest that the observed structural repertoire of Igs is restricted to the preferential use of a small number of combinations of canonical structures in five of the six hypervariable loops that form the antigen-binding site.
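The percentages quoted above can be checked directly against the second column of Table 1. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the per-class percentages are copied from Table 1.

```python
# Percentage of the 381 sequences found in each of the ten most frequent
# classes (second column of Table 1).
CLASS_PERCENT = {
    "1-1-2-1-1": 3.2,
    "1-1-4-1-1": 3.7,
    "1-2-1-1-1": 7.1,
    "1-2-2-1-1": 24.5,
    "1-2-3-1-1": 2.9,
    "1-2-4-1-1": 14.2,
    "1-3-2-1-1": 7.9,
    "1-3-4-1-1": 10.0,
    "1-4-3-1-1": 6.8,
    "1-4-4-1-1": 6.6,
}
TOTAL_POSSIBLE_CLASSES = 300

# Share of the sequences described by the ten classes.
covered = round(sum(CLASS_PERCENT.values()), 1)
# Share left for the remaining 19 observed classes.
remaining = round(100.0 - covered, 1)
# Ten classes expressed as a percentage of the 300 possible ones.
fraction_of_classes = round(100.0 * len(CLASS_PERCENT) / TOTAL_POSSIBLE_CLASSES, 1)

print(covered, remaining, fraction_of_classes)
```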

Another significant feature of the ten most frequent canonical structure classes observed in the structural repertoire of Igs is that H1, L2 and L3 always appear with canonical structure type 1 (see first column of Table 1). This means that H1, L2 and L3 do not contribute to the variation of the most frequent classes; only H2 and L1 change from one class to another.

Canonical structure classes and gross specificities

Based on the results presented above, we now analyze the specificities of the ten classes comprising 87% of the total sequences. With this aim, a classification in terms of the gross chemical and biochemical nature of the recognized antigen was introduced (see Materials and Methods). Results are reported in columns three to eight of Table 1. It should be noted that the specificities in these columns are arranged in decreasing order of antigen size, from protein to hapten, and that, to a first approximation, this scale represents the radius of curvature of the antigen surface.

Although the classification of antigens based on the chemical and biochemical criteria followed here simplifies the epitope structure, it is possible to observe that for some classes just a few specificities (one or two) actually occur, and do so with high (above 50) frequencies (see for example the class 1-4-3-1-1). In contrast, in other classes (for example the class 1-3-2-1-1), several specificities occur together, each with a low (below 50) frequency. So, the classification of the antigens according to their gross specificities allows us to identify canonical structure classes having a few specificities with high frequencies, and classes with diverse specificities with low frequencies. Therefore, based on this behavior regarding specificity, canonical structure classes can be classified as specific or as multi-specific (see Table 1). According to this classification, six of the ten classes shown in Table 1 are preferentially specific for one type of antigen, while the remaining four are multi-specific. The sequences classified within specific classes represent roughly half of the sequences and cover almost all the antigen types, from protein to hapten (see Table 1). The multi-specific classes, on the other hand, appear to be capable of interacting with various types of antigen and comprise the other half of the sequences.
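The specific versus multi-specific rule can be stated compactly: a class is specific (S) if at least one of its weighted frequencies is above 50, and multi-specific (M) otherwise. The Python sketch below applies that rule to the values of Table 1; the function names are illustrative and the data are copied from the Table.

```python
# Column order of the gross specificities in Table 1.
SPECIFICITIES = ("protein", "surface antigen", "polysaccharide",
                 "nucleic acid", "peptide", "hapten")

# Weighted frequencies for the ten most frequent classes (Table 1).
TABLE_1 = {
    "1-1-2-1-1": (56, 0, 0, 25, 0, 19),
    "1-1-4-1-1": (10, 0, 33, 13, 0, 44),
    "1-2-1-1-1": (5, 5, 80, 5, 0, 4),
    "1-2-2-1-1": (16, 44, 0, 18, 0, 23),
    "1-2-3-1-1": (57, 43, 0, 0, 0, 0),
    "1-2-4-1-1": (11, 4, 5, 24, 52, 4),
    "1-3-2-1-1": (15, 26, 0, 31, 20, 8),
    "1-3-4-1-1": (43, 0, 14, 11, 25, 6),
    "1-4-3-1-1": (11, 0, 0, 0, 0, 89),
    "1-4-4-1-1": (4, 0, 0, 41, 0, 55),
}
THRESHOLD = 50  # a value above this marks a class as specific


def classify(frequencies, threshold=THRESHOLD):
    """Return 'S' if any specificity exceeds the threshold, else 'M'."""
    return "S" if max(frequencies) > threshold else "M"


def preferred_antigen(frequencies):
    """Return the gross specificity with the highest weighted frequency."""
    return SPECIFICITIES[frequencies.index(max(frequencies))]


groups = {label: classify(freqs) for label, freqs in TABLE_1.items()}
for label, group in groups.items():
    note = preferred_antigen(TABLE_1[label]) if group == "S" else "multi-specific"
    print(label, group, note)
```

Applied to Table 1, the rule reproduces the split reported in the text: six specific classes and four multi-specific ones.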

It is worth mentioning that, in the specific classes, the lengths of H2 and L1 correlate with the type of recognized antigen. That is, antibodies with short loops in H2 and L1 (class 1-1-2-1-1) are preferentially specific for large antigens (proteins). In contrast, antibodies with long loops in H2 and L1 (classes 1-4-3-1-1 and 1-4-4-1-1) are preferentially specific for small molecules (haptens). In the multi-specific classes, however, such a correlation could not be established.

Because the above analysis was made using a relatively small number of sequences (381) when compared with the potential repertoire of approximately 10^7 to 10^8 different antibodies in a mouse (Winter & Milstein, 1991), it would be reasonable to suppose that this sample may not be representative of the natural repertoire of Igs. As mentioned before, the sample size used in this analysis is a consequence of several restrictions that should be satisfied by the sequences to be studied. Nevertheless, based on the above mentioned observation that only H2 and L1 contribute to the diversity of the ten most used classes, it is possible to increase the sample size to test its influence on the above analysis. Therefore, considering those sequences in Kabat's Data Base having canonical structures in H2 and L1, independently of the pattern they have in H1, L2 and L3, the sample size can be roughly doubled to 678 sequences. Results for this increased sample are very similar to those obtained for the 381 sequences (data not shown), indicating that the results obtained are independent of the sample size. Moreover, these results prove consistent with the observation that H1, L2 and L3 do not contribute to the variation of the most frequent canonical structure classes.

Discussion

In the preceding sections we have characterized the structural repertoire implicit in Ig sequences by applying the concept of canonical structure classes. The results show that the observed structural repertoire of Igs includes only a very small fraction of all possible combinations of canonical structures.

Table 1. Distribution of canonical structure classes and its correlation with the gross specificities

                                    Percentages of gross specificities for each class (c)
Canonical            Percentage of
structure class (a)  the class (b)  Protein  Surface antigen  Polysaccharide  Nucleic acid  Peptide  Hapten  Group (e)

1-1-2-1-1            3.2            56 (d)   0                0               25            0        19      S
1-1-4-1-1            3.7            10       0                33              13            0        44      M
1-2-1-1-1            7.1            5        5                80              5             0        4       S
1-2-2-1-1            24.5           16       44               0               18            0        23      M
1-2-3-1-1            2.9            57       43               0               0             0        0       S
1-2-4-1-1            14.2           11       4                5               24            52       4       S
1-3-2-1-1            7.9            15       26               0               31            20       8       M
1-3-4-1-1            10             43       0                14              11            25       6       M
1-4-3-1-1            6.8            11       0                0               0             0        89      S
1-4-4-1-1            6.6            4        0                0               41            0        55      S
Others               13.1           26       17               17              6             21       13

(a) For each hypervariable loop the corresponding canonical structure type is reported.

(b) Percentage for each canonical structure class with respect to the total number of sequences analyzed. Only the ten classes with more than 2% of the sequences are reported in detail. The percentages of the remaining 19 classes are categorized as others at the bottom of the Table.

(c) Gross specificities as defined in Materials and Methods.

(d) The values within the Table represent weighted frequencies for each class with respect to its gross specificity. As seen from the number in parenthesis, sample sizes are different, making the results incomparable if the absolute numbers of sequences were considered. To circumvent this difficulty a weighing factor was defined as follows: [Weighing factor] = (Total size sample)/(gross specificity size sample). For example, for anti-protein antibodies: [anti-protein weighing factor] = 381/169 = 2.25. So, for each gross specificity the number of sequences for each class was multiplied by its correspondent weighing factor. This procedure made the number of sequences independent from the size of the sample of each gross specificity and therefore amenable for comparison.

(e) Classification of canonical structure classes as specific or multi-specific. A class is categorized as specific and denoted by S if it presents at least one specificity with a value above 50. Otherwise, the class is categorized as multi-specific and is denoted by M (see the text for details).
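The weighing procedure of footnote d can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' code: the final rescaling of each row to 100 is inferred from the fact that the rows of Table 1 sum to about 100, and only the anti-protein sample size (169) is given in the text, so the hapten sample size and the per-class counts in the example are invented.

```python
TOTAL_SAMPLE = 381  # sequences analyzed


def weighing_factor(total_size, specificity_size):
    """[Weighing factor] = (total sample size) / (gross specificity sample size)."""
    return total_size / specificity_size


def weighted_percentages(class_counts, group_sizes, total_size=TOTAL_SAMPLE):
    """Weight the raw counts of one class, then rescale the row to sum to 100.

    The rescaling step is an assumption (the rows of Table 1 sum to about 100);
    the footnote itself only describes the multiplication.
    """
    weighted = {
        group: count * weighing_factor(total_size, group_sizes[group])
        for group, count in class_counts.items()
    }
    norm = sum(weighted.values())
    if norm == 0:
        return {group: 0.0 for group in weighted}
    return {group: 100.0 * value / norm for group, value in weighted.items()}


# Anti-protein factor from the footnote: 381 / 169.
print(round(weighing_factor(TOTAL_SAMPLE, 169), 2))

# Only the anti-protein sample size (169) comes from the text; the hapten
# size and the per-class counts below are invented for illustration.
group_sizes = {"protein": 169, "hapten": 60}
class_counts = {"protein": 12, "hapten": 12}
row = weighted_percentages(class_counts, group_sizes)
print(row)
```

With equal raw counts, the gross specificity with the smaller sample receives the larger weighted share, which is the correction the footnote describes.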

Moreover, within the observed structural repertoire, roughly 87% of the sequences correspond to only ten canonical structure classes (see Table 1). These ten canonical structure classes, being the major constituents of the observed repertoire, can be classified into two main groups regarding their ability to bind different types of antigens. One group has preferences for certain antigen types, while the other has multi-specific capabilities.

Several lines of evidence suggest that there are significant differences in the shape of the antigen-binding site, depending on whether a large antigen such as a protein or a small ligand such as a hapten is bound (Davies et al., 1990; Wilson et al., 1991; Webster et al., 1994; Wilson & Stanfield, 1993). We found that in the specific classes the lengths of H2 and L1 correlate with the type of recognized antigen, while in the multi-specific classes such a correlation could not be established. Thus, in order to understand the relation between the canonical structural repertoire implicit in Ig sequences and its antigen preference, we analyzed the three-dimensional structures of the Igs reported in the PDB (Bernstein et al., 1977).

Within the classes named as specific, those structures that preferentially bind the largest and the smallest antigen types were considered. These are class 1-1-2-1-1, which preferentially binds proteins, and class 1-4-4-1-1, which preferentially binds small molecules. Among the multi-specific classes, the one most abundant in sequences (class 1-2-2-1-1, see Table 1) was chosen for the analysis. In this class, antibodies with specificities as diverse as those for neuraminidase (a very large antigen) and p-azophenylarsonate (a small antigen) are found.

In Figure 1, a ribbon drawing of the Cα atoms of antibodies fitting the three aforementioned classes is depicted. To compare differences between molecules belonging to the same class, two structures of each class were superposed. This Figure shows a clear difference in the shape of the antigen-binding site depending on the class to which the structure belongs. For the specific classes (Figure 1a and c), those antibodies that preferentially bind large molecules have rather flat antigen-binding sites, which is a consequence of the short loops these structures have in H2 and L1 (Figure 1a). In antibodies that recognize small molecules the antigen-binding site presents a cleft, built by long loops in H2 and L1 (Figure 1c). Additionally, for these classes, H3 apparently is not essential in determining the gross geometrical features of the antigen-binding site. These observations, relating the shape of the antigen-binding site and the size of the antigen recognized, explain the correspondence between the length of the H2 and L1 loops and the antigen recognized that was found in the sequence analysis. In the multi-specific class, on the other hand (Figure 1b), the shape of the antigen-binding site is in between the geometries described above and, seemingly, H3 is the chief factor in modulating this shape.

Figure 1. Ribbon representation of the backbone of antibodies fitting the following canonical structure classes: a, 1-1-2-1-1 (preferentially specific for proteins); b, 1-2-2-1-1 (multi-specific); and c, 1-4-4-1-1 (preferentially specific for small molecules). To compare differences between molecules belonging to the same class, two structures of each class were superposed by fitting the atoms of the VH and VL framework regions as defined by Chothia & Lesk (1987). The Fv fragments superposed in each class are, for the canonical structure class 1-1-2-1-1: Fab HyHEL-10 in blue (PDB entry: 3HFM, Sheriff et al., 1987) and Fv D1.3 in orange (PDB entry: 1VFB, Bhat et al., 1990). Both Igs are anti-lysozyme antibodies but not for overlapping epitopes. For the canonical structure class 1-2-2-1-1: Fab 36-71 in blue (PDB entry: 6FAB, Strong et al., 1991) and Fab NC41 in orange (PDB entry: 1NCA, Tulip et al., 1992). Fab NC41 is specific for neuraminidase, while Fab 36-71 is specific for p-azophenylarsonate. For the canonical structure class 1-4-4-1-1: Fab 4-4-20 in blue (PDB entry: 4FAB, Herron et al., 1989) and Fab BV04-01 in orange (PDB entry: 1CBV, Herron et al., 1991). Fab 4-4-20 is anti-fluorescein and Fab BV04-01 is anti-trinucleotide. To distinguish the antigen-binding site of each molecule, it is shown in a lighter color (structures in blue have the antigen-binding site in light blue and structures in orange have the antigen-binding site in light orange).

In the specific classes, a factor that could be considered to influence the shape of the antigen-binding site resides in the side-chains of the hypervariable residues at the antigen-binding site (Alzari et al., 1990; Mian et al., 1991; Padlan et al., 1995). One way to examine whether or not this factor changes the gross shape of the antigen-binding site is by displaying the solvent-accessible surface of all the heavy atoms that form it (Figure 2). The Figure so depicted shows that the overall shape of the antigen-binding site does not change, regardless of the side-chains of the amino acids of each particular molecule. This observation indicates that the surface generated by the side-chains would modulate the fine complementarity between the antibody and the recognized epitope, but does not determine the gross complementarity between the antibody and the type of antigen recognized. This reinforces the observation that the canonical conformation of H2 and L1 is the main factor in determining gross specificities.

In contrast to the specific classes, the distinctive feature of the structures fitting the multi-specific class 1-2-2-1-1 is the large conformational differences of H3. As observed from a solvent-accessible surface of the molecules belonging to this class (Figure 3a), the general geometry of the antigen-binding site is relatively flat in the anti-neuraminidase structure, while for the anti-p-azophenylarsonate structure a central hole appears in the antigen-binding site. These differences in the overall shape of the antigen-binding site (relatively flat or a central hole) are the consequence of the H3 conformation (Figure 3b). Therefore, this might explain why a correspondence between the canonical structures of H2 and L1 and the type of antigen recognized could not be established in the sequence analysis.

Figure 2. Solvent-accessible surface of the heavy atoms that form the antigen-binding site of Fvs belonging to the canonical structure classes a: 1-1-2-1-1 (preferentially specific for proteins) and b: 1-4-4-1-1 (preferentially specific for small molecules). The solvent-accessible surface was calculated with a 1.7 Å radius probe over all the heavy atoms of the hypervariable regions (Kabat et al. (1991) definition) using the Connolly algorithm (Connolly, 1983). View and color code as in Figure 1.

Figure 3. a, Solvent-accessible surface of the heavy atoms that form the antigen-binding site of antibodies belonging to canonical structure class 1-2-2-1-1 (multi-specific). b, Slice showing the differences in shape of the antigen-binding site due to differences in the Cα conformation of H3. View and color code as in Figure 1.

Finally, it has been well established that, together with the conformation of H3 and the surface generated by the side-chains, the complementarity between the antibody and the antigen depends on factors such as the variable relative disposition of the VH:VL domains (Davis & Metzger, 1983; Padlan, 1994) and conformational rearrangements of the antibody in response to ligand binding (Colman et al., 1987; Wilson & Stanfield, 1993; Padlan, 1994). From Figure 1a and c it should be noted that no major differences are found in the Cα atoms of the antigen-binding site between molecules belonging to the same specific class. Thus, one can conclude that these factors seem to be more related to the fine specificity of these antibodies.

In summary, the above observations taken together show that the observed structural repertoire implicit in Ig sequences includes only a very small fraction of all possible combinations of canonical structures. This suggests that antibodies have geometrical restrictions that would be associated with the mechanism of immune recognition. At a gross level of antigen-antibody interaction, these geometrical restrictions could determine that antibodies with similar shapes in their antigen-binding sites recognize antigens with similar global geometric features. In specific classes, the shape of the antigen-binding site is primarily determined by the canonical structures of H2 and L1. In multi-specific classes, the first level of antigen-antibody recognition would seem to be mainly determined by the H3 conformation. In both cases, the fine adjustment for an increased affinity of the antigen-antibody interaction should be contributed by a complex combination of other factors such as: (1) the side-chains of the residues at the antigen-binding site (Alzari et al., 1990; Mian et al., 1991; Padlan et al., 1995); (2) the variable relative disposition of the VH:VL domains (Davis & Metzger, 1983; Padlan, 1994); and (3) conformational rearrangements of the antibody in response to ligand binding (Colman et al., 1987; Wilson & Stanfield, 1993, 1994; Padlan, 1994).

Because no canonical structure has so far been described for H3 (Chothia & Lesk, 1987; Wilson & Stanfield, 1993), the shape of the antigen-binding site in the multi-specific classes cannot be predicted from the amino acid sequence. However, for the specific classes, comprising about half of the observed structural repertoire implicit in Ig sequences, the scheme of variation by H2/L1 (independently of H3) could be useful to explain overall features of the mechanism of immune recognition mediated by Igs. In addition, the specific classes, which are predicted starting from certain rules in the amino acid sequence (canonical structures), provide a heuristic scheme to correlate the amino acid sequence with antibody function. This scheme could be useful to design de novo antibodies of desired specificity starting only from the amino acid sequence.

Materials and Methods

Ig sequences

The Ig sequences were obtained from Kabat's Data Base (Kabat et al., 1991) via the internet on-line service in August 1994. The total set amounts to 4565 sequences for VH and 3377 sequences for VL. From this sample, the number of sequences considered for the computation of the distribution of canonical structure classes was 381. This number is a consequence of several restrictions that should be satisfied by the sequences to be studied. These restrictions are (numbers in parentheses are the number of sequences remaining at each successive stage of screening): (1) the sequence should be complete (3118 and 2415 sequences for VH and VL, respectively); (2) both domains (VH and VL) of a given antibody should be reported (1192 sequences); (3) the sequences should simultaneously have sequence patterns compatible with some canonical structure type (see below) for H1, H2, L1, L2 and L3 (415 sequences); and (4) the sequences should have a reported specificity (381 sequences). The multiple sequence alignment of the 381 sequences used here is available on request from the authors.
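The four-stage screening described above amounts to applying a sequence of filters and recording how many entries survive each one. The Python sketch below illustrates that idea on invented records. The `AntibodyRecord` layout and its field names are hypothetical and do not reflect the format of Kabat's Data Base, and treating each antibody as a single record is a simplification, since the first criterion in the text is applied to VH and VL sequences separately.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AntibodyRecord:
    """Hypothetical record; the field names are illustrative, not Kabat's format."""
    name: str
    vh_complete: bool                # VH sequence reported in full
    vl_complete: bool                # VL sequence reported in full
    paired: bool                     # VH and VL of the same antibody both reported
    canonical_class: Optional[str]   # None if any of H1, H2, L1, L2, L3 lacks a canonical pattern
    specificity: Optional[str]       # None if no specificity is reported


# The four conditions, in the order given in the text.
CRITERIA = [
    ("complete", lambda r: r.vh_complete and r.vl_complete),
    ("paired", lambda r: r.paired),
    ("canonical", lambda r: r.canonical_class is not None),
    ("specificity", lambda r: r.specificity is not None),
]


def screen(records: List[AntibodyRecord]) -> Tuple[List[AntibodyRecord], List[Tuple[str, int]]]:
    """Apply the criteria in sequence; return survivors and the count left after each stage."""
    remaining = list(records)
    counts = []
    for label, keep in CRITERIA:
        remaining = [r for r in remaining if keep(r)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Toy input (invented records, for illustration only).
toy = [
    AntibodyRecord("ab1", True, True, True, "1-2-2-1-1", "anti-neuraminidase"),
    AntibodyRecord("ab2", True, False, True, "1-1-2-1-1", "anti-lysozyme"),
    AntibodyRecord("ab3", True, True, False, None, None),
    AntibodyRecord("ab4", True, True, True, None, "anti-lysozyme"),
    AntibodyRecord("ab5", True, True, True, "1-4-4-1-1", None),
]
kept, stage_counts = screen(toy)
print([r.name for r in kept], stage_counts)
```

Reporting the count after every stage mirrors the way the text lists the sequences remaining at each step (1192, then 415, then 381).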

Determination of canonical structures and canonical structure classes

Canonical structure types for H1, H2, L1, L2 and L3 in the sequences were determined following all conventions of numbering, localization of hypervariable loops, placement of insertions and the amino acid patterns proposed by Chothia et al. (Chothia & Lesk, 1987; Tramontano et al., 1990; Chothia et al., 1992). Sequence management, analysis and determination of canonical structures were carried out using the VIR package developed in our group (Almagro et al., 1995a). Canonical structure classes are defined as the different combinations of canonical structure types for H1, H2, L1, L2 and L3.
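Once a canonical structure type has been assigned to each loop, building the class label and tallying the distribution of classes is mechanical. The Python sketch below shows one way to do it. It is not the VIR package; the function names are illustrative, the assignment of types from sequence patterns is assumed to have been done already, and the example assignments are invented.

```python
from collections import Counter

# Order of the loops in a class label, as used in Table 1.
LOOP_ORDER = ("H1", "H2", "L1", "L2", "L3")


def class_label(loop_types):
    """Join per-loop canonical structure types into a label such as '1-4-3-1-1'."""
    return "-".join(str(loop_types[loop]) for loop in LOOP_ORDER)


def class_distribution(antibodies):
    """Return {class label: percentage of antibodies} for a list of per-loop assignments."""
    counts = Counter(class_label(a) for a in antibodies)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy assignments (invented for illustration, not taken from the data set).
toy = [
    {"H1": 1, "H2": 2, "L1": 2, "L2": 1, "L3": 1},
    {"H1": 1, "H2": 2, "L1": 2, "L2": 1, "L3": 1},
    {"H1": 1, "H2": 4, "L1": 3, "L2": 1, "L3": 1},
    {"H1": 1, "H2": 1, "L1": 2, "L2": 1, "L3": 1},
]
distribution = class_distribution(toy)
print(distribution)
```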

Definition of gross specificities

The sample of 381 sequences includes 126 different fine specificities. Fine specificity means the specificity reported for a sequence in Kabat's Data Base, for example, anti-lysozyme, anti-neuraminidase, anti-p-azophenylarsonate and so on. To correlate the canonical structure classes with their specificity, a classification in terms of the chemical and biochemical nature of the recognized antigen was introduced. This classification reduced the above 126 specificities to only six groups of gross specificities, as follows: (1) protein, (2) surface antigen, (3) polysaccharide, (4) nucleic acid, (5) peptide, and (6) hapten. This classification was based on previous categorizations reported by other authors (see for example, Bolger & Sherman, 1991; Webster et al., 1994).

To avoid confusion about the above classification, several considerations should be highlighted:

Protein: The antibody recognizes a soluble or membrane-embedded protein. The recognized protein could be an Ig itself. There could be many cases in which the epitope recognized by the antibody is a glyco or glyco-peptide region of the protein. This information was not considered in this classification.

Surface antigen: There are some cases in which the specificity of the antibody is ambiguous, as in "anti-RBC" or "anti-E. coli". For these cases, an ambiguous group of "anti-surface antigen" was defined. Most of these specificities were expected to be anti-protein or anti-polysaccharide.

Polysaccharide: All the carbohydrate polymers recognized were considered as members of this group.

Nucleic acid: DNA, RNA, single chain, double chain and any compound formed by nucleotides are so classified.

Peptide: Defined as a segment of a protein or a small natural polypeptide like angiotensin.

Hapten: Any small molecule is considered a hapten. The effect of the carrier in the process of antigen recognition could be significant in some cases (Kabat, 1978). At this level of analysis, however, it is not possible to consider this information. Antibodies that recognize lipids and catalytic antibodies, the antigen being small, were also included in this group.
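The reduction from fine to gross specificities is essentially a lookup from the reported specificity to one of the six groups. The Python sketch below illustrates it with the few fine specificities named in this paper. The table is not the authors' full list of 126 specificities; the entries are our reading of the definitions above, and the key "anti-angiotensin" is a hypothetical spelling chosen for illustration.

```python
from collections import Counter

GROSS_GROUPS = ("protein", "surface antigen", "polysaccharide",
                "nucleic acid", "peptide", "hapten")

# Fine specificity -> gross specificity. Illustrative subset only, assigned by
# applying the definitions in the text to antigens the paper names.
FINE_TO_GROSS = {
    "anti-lysozyme": "protein",
    "anti-neuraminidase": "protein",
    "anti-RBC": "surface antigen",
    "anti-E. coli": "surface antigen",
    "anti-trinucleotide": "nucleic acid",
    "anti-angiotensin": "peptide",  # hypothetical key; angiotensin is the text's peptide example
    "anti-p-azophenylarsonate": "hapten",
    "anti-fluorescein": "hapten",
}


def gross_specificity(fine):
    """Return the gross group for a fine specificity, or None if it is not in the table."""
    return FINE_TO_GROSS.get(fine)


def tally(fine_specificities):
    """Count sequences per gross group, skipping unknown fine specificities."""
    groups = (gross_specificity(f) for f in fine_specificities)
    return Counter(g for g in groups if g is not None)


sample = ["anti-lysozyme", "anti-lysozyme", "anti-fluorescein", "anti-RBC", "anti-unknown"]
print(tally(sample))
```

Returning `None` for an unlisted specificity, rather than guessing a group, keeps ambiguous entries out of the tally until they are classified by hand.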

Acknowledgements

We thank Dr Eduardo Horjales for valuable discussions on the three-dimensional analysis, V. Hernandez-Mendiola, M. Ramirez-Benites, P. Reidy and H. Delgado for technical assistance and H. Cecena-Alvarez for revision and preparation of the submitted manuscript. E.V. was supported by SNI-Conacyt and FOMES-UV. J.C.A. was supported by the DGAPA grant IN-206093.

References

Almagro, J. C., Vargas-Madrazo, E., Zenteno-Cuevas, R., Hernandez-Mendiola, V. & Lara-Ochoa, F. (1995a). VIR: a computational tool for analysis of immunoglobulin sequences. BioSystems, 35, 25–32.

Almagro, J. C., Lara-Ochoa, F. & Vargas-Madrazo, E. (1995b). Structural repertoire in human VL pseudogenes of immunoglobulins: comparison with functional germline genes and amino acid sequences. Immunogenetics. In the press.

Alzari, P. M., Spinelli, S., Mariuzza, R. A., Boulot, G., Poljak, R. J., Jarvis, J. M. & Milstein, C. (1990). Three-dimensional structure determination of an anti-2-phenyloxazolone antibody: the role of somatic mutation and heavy/light chain pairing in the maturation of an immune response. EMBO J. 9, 3807–3814.

Amzel, L. M. & Poljak, R. J. (1979). Three-dimensional structure of immunoglobulins. Annu. Rev. Biochem. 48, 961–997.

Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Jr, Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank. A computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.

Bhat, T. N., Bentley, G. A., Fischmann, T. O., Boulot, G. & Poljak, R. J. (1990). Small rearrangements in structures of Fv and Fab fragments of antibody D1.3 on antigen binding. Nature, 347, 483–485.

Bolger, M. B. & Sherman, M. A. (1991). Computer modeling of combining site structure of anti-hapten monoclonal antibodies. Methods Enzymol. 203, 21–45.

Chothia, C. & Lesk, A. M. (1987). Canonical structures for the hypervariable regions of immunoglobulins. J. Mol. Biol. 196, 901–917.

Chothia, C., Lesk, A. M., Tramontano, A., Levitt, M., Smith-Gill, S. J., Air, G., Sheriff, S., Padlan, E. A., Davies, D., Tulip, W. R., Colman, P. M., Spinelli, S., Alzari, P. M. & Poljak, R. J. (1989). Conformations of immunoglobulin hypervariable regions. Nature, 342, 877–883.

Chothia, C., Lesk, A. M., Gherardi, E., Tomlinson, I. M., Walter, G., Marks, J. D., Llewelyn, M. B. & Winter, G. (1992). Structural repertoire of the human VH segments. J. Mol. Biol. 227, 799–817.

Colman, P. M., Laver, W. G., Varghese, J. N., Baker, A. T., Tulloch, P. A., Air, G. M. & Webster, R. G. (1987). Three-dimensional structure of a complex of antibody with influenza virus neuraminidase. Nature, 326, 358–363.

Connolly, M. L. (1983). Solvent-accessible surfaces of proteins and nucleic acids. Science, 221, 709–713.

Cox, J. P. L., Tomlinson, I. M. & Winter, G. (1994). A directory of human germ-line V-kappa segments reveals a strong bias in their usage. Eur. J. Immunol. 24, 827–836.

Davies, D. R., Padlan, E. A. & Sheriff, S. (1990). Antibody-antigen complexes. Annu. Rev. Biochem. 59, 439–473.

Davies, D. R. & Metzger, H. (1983). Structural basis of antibody function. Annu. Rev. Immunol. 1, 87–117.

Herron, J. N., He, X., Mason, M. L., Voss Junior, E. W. & Edmundson, A. B. (1989). Three-dimensional structure of a fluorescein-Fab complex crystallized in 2-methyl-2,4-pentanediol. Proteins: Struct. Funct. Genet. 5, 271–280.

Herron, J. N., He, X. M., Ballard, D. W., Blier, P. R., Pace, P. E., Bothwell, A. L. M., Voss Junior, E. W. & Edmundson, A. B. (1991). An autoantibody to single-stranded DNA: comparison of the three-dimensional structures of the unliganded Fab and a deoxynucleotide-Fab complex. Proteins: Struct. Funct. Genet. 11, 159–175.

Kabat, E. A. (1978). The structural basis of antibody complementarity. Advan. Protein Chem. 32, 1–75.

Kabat, E. A., Wu, T. T., Perry, H. M., Gottesman, K. S. & Foeller, C. (1991). Sequences of Proteins of Immunological Interest. 5th edit., Public Health Service, N.I.H., Washington, DC.

Mian, I. S., Bradwell, A. R. & Olson, A. J. (1991). Structure, function and properties of antibody binding sites. J. Mol. Biol. 217, 133–151.

Padlan, E. A. (1994). The anatomy of the antibody molecule. Mol. Immunol. 31, 169–217.

Padlan, E. A., Abergel, C. & Tipper, J. P. (1995). Identification of specificity determining residues in antibodies. FASEB J. 9, 133–139.

Poljak, R. J., Amzel, L. M., Avey, H. P., Chen, B. L., Phizackerley, R. P. & Saul, F. (1973). Three-dimensional structure of the Fab' fragment of a human immunoglobulin at 2.8-Å resolution. Proc. Natl Acad. Sci. USA, 70, 3305–3310.

Rees, A. R. & de la Paz, P. (1986). Investigating antibody specificity using computer graphics and protein engineering. Trends Biochem. Sci. 11, 144–148.

Sheriff, S., Silverton, E. W., Padlan, E. A., Cohen, G. H., Smith-Gill, S. J., Finzel, B. C. & Davies, D. R. (1987). Three-dimensional structure of an antibody-antigen complex. Proc. Natl Acad. Sci. USA, 84, 8075–8079.

Strong, R. K., Campbell, R., Rose, D. R., Petsko, G. A., Sharon, J. & Margolies, M. N. (1991). Three-dimensional structure of murine anti-p-azophenylarsonate Fab 36-71. 1. X-ray crystallography, site-directed mutagenesis, and modeling of the complex with hapten. Biochemistry, 30, 3739–3748.

Tramontano, A., Chothia, C. & Lesk, A. M. (1990). Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region in the VH domains of immunoglobulins. J. Mol. Biol. 215, 175–182.

Tulip, W. R., Varghese, J. N., Laver, W. G., Webster, R. G. & Colman, P. M. (1992). Refined crystal structure of the influenza virus N9 neuraminidase-NC41 Fab complex. J. Mol. Biol. 227, 122–148.

Vargas-Madrazo, E., Almagro, J. C. & Lara-Ochoa, F. (1995). Structural repertoire in VH pseudogenes of immunoglobulins: comparison with human germline genes and human amino acid sequences. J. Mol. Biol. 246, 74–81.

Webster, D. M., Henry, A. H. & Rees, A. R. (1994). Antibody-antigen interactions. Curr. Opin. Struct. Biol. 4, 123–129.

Williams, S. C. & Winter, G. (1993). Cloning and sequencing of human immunoglobulin V-lambda segments. Eur. J. Immunol. 23, 1456–1461.

Wilson, I. A. & Stanfield, R. L. (1993). Antibody-antigen interactions. Curr. Opin. Struct. Biol. 3, 113–118.

Wilson, I. A. & Stanfield, R. L. (1994). Antibody-antigen interactions: new structures and new conformational changes. Curr. Opin. Struct. Biol. 4, 857–867.

Wilson, I. A., Rini, J. M., Fremont, D. H., Fieser, G. G. & Stura, E. A. (1991). X-ray crystallographic analysis of free and antigen-complexed Fab fragments to investigate structural basis of immune recognition. Methods Enzymol. 203, 153–176.

Winter, G. & Milstein, C. (1991). Man-made antibodies. Nature, 349, 293–299.

Wu, T. T. & Kabat, E. A. (1970). An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132, 211–250.

Edited by I. A. Wilson

(Received 7 May 1995; accepted in revised form 11 September 1995)