molecular model-building by...

Molecular Model-building by Computer

In which biochemists observe models of giant molecules

as they are displayed on a screen by a computer and try

to fold them into the shapes that they assume in nature

by Cyrus Levinthal

Many problems of modern biologyare concerned with the detailedrelation between biological

function and molecular structure. Someof the questions currently being askedwill be completely answered only whenone has an understanding of the struc-ture of all the molecular components ofa biological system and a knowledge ofhow they interact. There are, of course,

a large number of problems in biologyinto which biologists have some insightbut concerning which they cannot yetask suitable questions in terms of molec-ular structure. As they see such prob-lems more clearly, however, they in-variably find an increasing need forstructural information. In our laboratoryat the Massachusetts Institute of Tech-nology we have recently started using a

MOLECULAR MODEL of a segment of cytochrome c, a protein that plays an importantrole in cell respiration, is shown as it is displayed on an oscilloscope screen. The proteinhas 104 amino acid subunits; this segment consists of units 5 through 18 (designated hereby their abbreviated names). The heme group, which acts as a carrier of electrons, is knownto be attached to amino acids 14 and 17. In the hypothetical structure shown here thisstretch of the molecule is assumed to be in the characteristic "alpha helix" configuration.

4 2 - . • '"' •

computer to help gain such informationabout the structure of large biologicalmolecules.

For the first half of this century themetabolic and structural relations amongthe small molecules of the living cellwere the principal concern of bio-chemists. The chemical reactions thesemolecules undergo have been studiedintensively. Such reactions are specifical-ly catalyzed by the large protein mole-cules called enzymes, many of whichhave now been purified and alsostudied. It is only within the past fewyears, however, that X-ray-diffractiontechniques have made it possible todetermine the molecular structure ofsuch protein molecules. These giantmolecules, which contain from a thou-sand to tens of thousands of atoms,constitute more than half of the dryweight of cells. Protein molecules notonly act as enzymes but also providemany of the cell's structural compo-nents. Another class of giant molecules,the nucleic acids, determine what kindof protein the cell can produce, but mostof the physiological behavior of a cell isdetermined by the properties of its pro-teins.

The X-ray-diffraction methods for in-vestigating the three-dimensional struc-ture of protein molecules are difficultand time-consuming. So far the struc-tures of only three proteins have beenworked out: myoglobin, hemoglobin andlysozyme [see "The Three-dimensionalStructure of a Protein Molecule," byJohn C. Kendrew, SCIENTIFIC AMERI-CAN, December, 1961, and "The Hemo-globin Molecule," by M. F. Perutz, No-vember, 1964]. In their studies of thehemoglobin molecule M. F. Perutz andhis associates at the Laboratory ofMolecular Biology in Cambridge, En-gland, have observed that the structureof the molecule changes slightly when

CYTOCHROME HELIX is rotated on the screen (left to right, topto bottom), a procedure that makes it possible for the investi-gator to perceive the three-dimensional arrangement of theatoms. A set of three-dimensional vectors connecting the atoms of

the molecule is stored in the computer's high-speed memory. Thecomputer calculates the projection of those vectors on a plane anddraws the projected vectors on the screen. The operator controlsthe rotation of the plane, thus apparently turning the model.

43

GLYCINE LEUCINE LEUCINE SERINE PHENYLALANINE

CH3 CH3

HISTIDINE LYSINE THREONINE

PROTEIN BACKBONE is a chain of peptide groups (six atoms:carbon, carbon, oxygen, nitrogen, hydrogen, carbon). Each aminoacid in a chain contributes a group to the backbone and also has a

ISOLEUCINE LEUCINE LYSINE

distinguishing side group (color tint). The amino acid sequence ofa number of proteins is known. What is shown here is a shortsegment of the protein myoglobin, which stores oxygen in muscle.

ALPHA HELIX results from the arrangement of planar peptidegroups (CCONHC) pivoted about the carbons to which side groups

(color) are attached. Shade of atoms indicates nearness to viewer.The entire helix is held rigid by hydrogen bonds (broken lines).

MYOGLOBIN MOLECULE is a folded, partly helical chain, asshown in drawing (left) of its form as determined through X-rayanalysis by John C. Kendrew of the Laboratory of Molecular

Biology in Cambridge, England. The chain enfolds an oxygen-carrying heme group (colored disk). Every 10th amino acid is num-bered. The computer model of myoglobin is shown at right.

44

oxygen is attached to it or removedfrom it. The hemoglobin molecule isthe only one for which this kind ofstudy has as yet been carried out. Itis known, however, that many proteinschange their shape as they perform theirfunctions and that their shape is furthermodified by the action of the smallmolecules that activate or inhibit them.The large number of enzyme systems in-volved in regulating the complex meta-bolic pathways of the living cell havebeen studied so far only at the level ofthe overall shape of the enzyme mole-cule; practically nothing is known of thespecific structural changes that may beimportant for enzyme function and con-trol.

Another problem currently being in-•^^ vestigated by many workers con-cerns the way in which proteins achievetheir final three-dimensional confieura-tion when they are synthesized. Duringthe past few years many of the process-es involved in protein synthesis havebecome rather well understood. As aresult one knows, at least in generalterms, how the cell determines thesequence of amino acids from whichprotein molecules are assembled andhow this sequence establishes the way inwhich the atoms of a protein are con-nected [see top illustration on oppositepage]. It is not, however, the chemicalsequence, or connectedness, that estab-lishes the functional properties of aprotein. These properties are a conse-quence of the exact three-dimensionalarrangement of the molecule's atoms inspace.. As a result of work in the past 15

years, there is now a considerable bodyof evidence showing that the three-dimensional configuration of a proteinmolecule is determined uniquely by itsamino acid sequence. The number ofpossible sequences is immense becausethe cell has at its disposal 20 kinds ofamino acid building block. The con-figuration assumed by any particularsequence reflects the fact that the mole-cule arranges itself so as to minimize itstotal free energy. In other words, eachprotein has the shape it has becauseoutside energy would be needed to giveit any other shape. The experimentalevidence for this conclusion comes fromresults obtained with many differentproteins.

The first really critical experimentsin this regard were carried out byChristian B. Anfinsen and his collab-orators at the National Institutes ofHealth with the enzyme ribonuclease,a protein consisting of 153 amino acids

COMPUTER MODEL of myoglobin, which was based on coordinates supplied by Kendrew,is rotated on the screen in this sequence (top to bottom, left to right). This display omits theheme group. The pictures are selected frames from a 16-millimeter motion-picture film.

45

ROTATION ANGLES determine the reJation of two successive peptide groups in a chain.Two such groups are shown here. The six atoms (CCONHC) of each group lie in a plane.Two adjacent planes have one atom—the carbon to which a hydrogen and a side group (R)are bonded—in common. Two rotation angles (arrows) give the relation of the two planes.

RECTANGULAR COORDINATE SYSTEMS establish the location of atoms in the model.One (left), at the amino end of the peptide chain, is the frame of reference for the entirechain. Others (right) are local systems defining positions of atoms in side groups. Thecomputer calculates the transformation that refers local systems back to the original system.

in a single chain. In our laboratory wehave studied the enzyme alkaline phos-phatase, which consists of about 800amino acids arranged in two chains.Both proteins can be treated so that theylose their well-defined three-dimensionalconfiguration without breaking any ofthe chemical bonds that establish theconnectedness of the molecule. In this"denatured" state the proteins are nolonger enzymatically active. But if thedenaturing agent is removed and theproteins are put in a solution containingcertain salts and the correct acidity, theactivity can be reestablished.

The alkaline phosphatase moleculehas two identical subunits that are in-active by themselves. They can beseparated from each other by increasingthe acidity of the solution, and they re-assemble to form the active "doublet"when the solution is made neutral onceagain. In addition the subunits them-selves can be denatured, with the resultthat they become random, or structure-less, coils. Under the proper conditionsit takes only a few minutes to reestablishthe enzymatic activity from this dis-rupted state, along with what appears tobe the original three-dimensional con-figuration of the doublet molecule. Anenzymatically active hybrid moleculecan even be formed out of subunitsfrom two different organisms. The in-dividual subunits from the two or-ganisms have different amino acid se-quences, but they fold into a shape suchthat the subunits are still able to recog-nize each other and form an activemolecule. These renaturation processescan take place in a solution containingno protein other than the denaturedone, and without the intervention ofother cellular components.

Apart from the renaturation experi-ments, the mechanism of synthesis hassuggested an additional factor that maybe relevant in establishing the correctthree-dimensional form of the protein.It is known that the synthetic processalways begins at a particular end of theprotein molecule—the end carrying anamino group (NH.,)—and proceeds tothe end carrying a carboxyl group(COOH). It is plausible to imagine thatproteins fold as they are formed in sucha way that the configuration of theamino end is sufficiently stable to pre-vent its alteration while the rest of themolecule is being synthesized. Althoughthis hypothetical mechanism seems tobe contradicted by the renaturation ex-periments just described, it may repre-sent the way some proteins are folded.Because the mechanism would placecertain constraints on the folding of a

46

protein molecule, it implies that theactive protein is not in a state in whichfree energy is at a minimum but ratheris in a metastable, or temporarily stable,state of somewhat higher energy.

A molecular biologist's understanding"^^ of a molecular structure is usuallyreflected in his ability to construct athree-dimensional model of it. If themolecule is large, however, model-build-ing can be frustrating for purely me-chanical reasons; for example, the modelcollapses. In any event, the building ofmodels is too time-consuming if onewishes to examine many different con-figurations, which is the case when oneis attempting to predict an unknownstructure. When one is dealing with thelargest molecules, even a model is notmuch help in the task of enumeratingand evaluating all the small interactionsthat contribute to the molecule's stabil-ity. For this task the help of a computeris indispensable.

Any molecular model is based on thenature of the bonds that hold particularkinds of atoms together. From the view-point of the model-builder the importantfact is that these bonds are the same nomatter where in a large molecule theatom is found. For instance, if a carbonatom has four other atoms bonded to it,they will be arranged as if they werelocated at the corners of a tetrahedron,so that any two bonds form an angleof approximately 109.5 degrees. Thelengths of bonds are even more constantthan their angles; bonds that are onlyone to two angstrom units long are fre-quently known to be constant in lengthwith an accuracy of a few percent. Ingeneral the details of atomic spacing areknown from the X-ray analysis of smallmolecules; this knowledge simplifies thetask of building models of large mole-cules.

An example of the value of suchknowledge was the discovery by LinusPauling and Robert B. Corey at theCalifornia Institute of Technology thatthe fundamental repeating bond in pro-tein structures—the peptide bond thatjoins the CO of one amino acid to theNH of the next—forms an arrangementof six atoms that lie in a plane. Thisknowledge enabled them to predict thatthe amino acid units in a protein chainwould tend to become arranged in aparticular helical form: the alpha helix.It was subsequently found that suchhelixes provide a significant portion ofthe structure of many protein molecules.Thus in advance of any crystallographicinformation about the structure of aparticular protein molecule, one knows

ROTATIONAL POSITION

TOTAL ENERGY of a configuration varies with distance between atoms, which depends onall the rotation angles. It is easy to distinguish lowest-energy point L from local valleys, ormetastable states, by visual inspection, but a computer must calculate each point in thecurve, repeating the operation often enough not to miss a valley. It must calculate theenergy at angles separated by no more than a small fraction of the interval indicated as a.

CUBING PROCEDURE eliminates unnecessary computation by identifying atoms thatare sufficiently near each other to affect the molecule's energy. The computer searches thearea around an atom residing in the center cube and reports if there are any atoms inthe 26 adjacent cubes. This also helps to reveal a molecule's edges and holes inside it.

MANIPULATION of the computer-built model is accomplished by three different rou-tines. The routine called "close" in effect puts a spring between any pair of points indicat-ed by the investigator (a). The routine "glide" causes a single helical region to be pulledalong its own axis (b). "Revolve" imposes a torque that rotates a region about its axis (c).

47

the spatial arrangement of atoms withinthe peptide bonds, as well as the de-tailed geometry of its alpha-helicalregions.

The planar configuration of the pep-tide bond allows an enormous reductionin the number of variables necessaryfor a complete description of a proteinmolecule. Instead of three-dimensionalcoordinates for each atom, all one needsin order to establish the path of thecentral chain of the molecule are thetwo rotation angles where two peptidebonds come together [see top illustra-tion on page 46]. The complete de-scription requires, in addition to thisinformation, the specification of therotation angles of the side chains inthose amino acids whose side chains arenot completely fixed.

A further reduction in the number ofvariables would be possible if one couldpredict from the amino acid sequencewhich parts of the molecule are in theform of alpha helixes. The few proteinswhose structures have been completelydetermined provide some indication ofwhich amino acids are likely to be foundin helical regions, but not enough is yetknown to make such predictions withany assurance.

Because protein chains are formed bylinking molecules that belong to a single

class (the amino acids), the linkageprocess can be expressed mathematical-ly in a form that is particularly suitedfor a high-speed digital computer. Wehave written computer programs thatcalculate the coordinates of each atomof the protein, using as input variablesonly those angles in which rotationalchanges can occur; all other angles andbond lengths are entered as rigid con-straints in the program. The method ofcalculation involves the use of a localcoordinate system for the atoms in eachamino acid unit and a fixed overallcoordinate system into which all localcoordinates are transformed.

The transformation that relates thelocal coordinate systems to the fixedcoordinate system is recalculated bythe computer program each time a newatom is added to the linear peptidebackbone. Each chemical bond istreated as a translation and a rotation ofthis transformation. The process re-quires a substantial amount of calcula-tion, but each time the backbonereaches the central atom of a new aminoacid the relative positions of the side-group atoms can be taken from the com-puter memory where this information isstored for each of the 20 varieties ofamino acid. It is then a simple matter totranslate and rotate the side-group

atoms from their local coordinate systeminto the fixed-coordinate system of theentire molecule.

The new value of the transformationat each step along the backbone is de-termined by the fixed rotation anglesand translation distances that are builtinto the computer program and by thevariable angles that must somehow bedetermined during the running of theprogram. The principal problem, there-fore, is precisely how to provide correctvalues for the variable angles. A num-ber of investigators are working onthis problem in different ways. Beforediscussing any of these approaches Ishould emphasize the magnitude of theproblem that remains even after onehas gone as far as possible in usingchemical constraints to reduce the num-ber of variables from several thousandto a few hundred. Because each bondangle must be specified with an accu-racy of a few degrees, the numberof possible configurations that can re-sult when each angle is varied by asmall amount becomes astronomicaleven for a small protein. Moreover, thesesmall rotations can produce a largeeffect on the total energy of the struc-ture [see top illustration on precedingpage].

One way to understand the difficulty

CYTOCHROME C segment seen in the two opening illustrations isshown on this page and the opposite page in the various degrees of

detail of which the system is capable. At first only the carbon atomsto which the side groups are attached are displayed, along with the

48

of finding a configuration of minimumenergy is to imagine the problem facinga man lost in a mountainous wildernessin a dense fog. He may know that with-in a few miles there is an inhabited rivervalley leading into a calm lake. He mayalso know that the lake is at the lowestpoint in the area, but let us assume thathe can only determine his own elevationand the slope of the ground in his im-mediate vicinity. He can walk downwhatever hill he is standing on, but thiswould probably trap him at the bottomof a small valley far from the lake heseeks. In finding a configuration ofminimum energy the comparable situ-ation would be getting trapped in ametastable state far from a real energyminimum. Our lost man has only twodimensions to worry about, north-southand east-west; the corresponding prob-lem in a molecule involves severalhundred dimensions.

r approach to this problem has as-umed that even sophisticated tech-

niques for energy minimization will not,at least at present, be sufficient to de-termine the structure of a protein fromits amino acid sequence in a fully auto-matic fashion. We therefore decided todevelop programs that would make useof a man-computer combination to do a

kind of model-building that neither aman nor a computer could accomplishalone. This approach implies that onemust be able to obtain information fromthe computer and introduce changes inthe way the program is running in aspan of time that is appropriate to hu-man operation. This in turn suggeststhat the output of the computer mustbe presented not in numbers but invisual form.

I first became aware of the possibili-ties of using visual output from a com-puter in a conversation with RobertM. Fano, the director of Project MACat the Massachusetts Institute of Tech-nology. (MAC stands for multiple-accesscomputer.) It soon became clear thatthe new types of visual display that hadbeen developed would permit directinteraction of the investigator and amolecular model that was being con-structed by the computer. All our sub-sequent work on this problem has madeuse of the large computer of ProjectMAC, which operates on the basis of"time-sharing," or access by many users.The system, developed by Fernando J.Corbato, allows as many as 30 people tohave programs running on the computerat the same time. For all practical pur-poses it seems to each of them that heis alone on the system. A user can have

any of his data in the high-speed mem-ory printed out on his own typewriter,and he can make whatever changes hewants in this stored data. What is moreimportant from our point of view is theability to make changes in the com-mands that control the sequential flowof the program itself.

It is true, of course, that one of 30users has access to a computer that,when it is fully occupied, has only athirtieth of the speed of the normalmachine, but for many problems this isenough to keep the man side of theman-machine combination quite welloccupied. The program that acts tosupervise the time-sharing system is or-ganized in such a way that no user caninterfere with the system or with anyother user, and the computer is notallowed to stand idle if the man takestime out to think.

In working with molecular models weare interested in being able to obtaindata quickly in order to evaluate theeffect of changing the input variablesof the program. For any particularmolecular configuration the computercan readily supply the positions (in thethree-dimensional coordinates x, tj andz) of all the atoms. The important ques-tion is: What can be done with 5,000to 10,000 numbers corresponding to the

square outline of the heme group connected to this segment (1, 2).The other atoms of the backbone are added (3, 4), then the oxygen

and hydrogen atoms (5, 6). The heme group is shown in more de-tail (7-9), and side groups are named (JO) and displayed (11, 12).

49

"CLOSING" ROUTINE is applied to a hypothetical peptide chain assumed to consist ofseveral helical sections. The individual sections do not bend, but the rotation angles be-tween sections are changed (top to bottom, left to right) to close the chain into a circle.

position of every atom in even a smallprotein? Obviously if we could formu-late specific questions concerning ener-gy, bond angles and lengths, overallshape, density and so on, the computercould calculate the answer and therewould be no need for a man ever tolook at the numerical values of the co-ordinates. We realized that our beethope of gaining insight into unexpectedstructural relations—relations that hadnot been anticipated—lay in getting thecomputer to present a three-dimensionalpicture of the molecule.

Although computer-controlled oscil-loscopes have been available for about10 years, they have been used mainlyto display numbers and letters. It isonly recently that they have been usedto produce the output of computer pro-grams in graphical form. The oscillo-scope tube can of course present only atwo-dimensional image. It is no greattrick, however, to have the computerrotate the coordinates of the moleculebefore plotting their projection. If thisis done, the brain of the human viewerreadily constructs a three-dimensionalimage from the sequential display oftwo-dimensional images. Such sequentialprojections seem to be just as useful tothe brain as simultaneous stereoscopicprojections viewed by two eyes. Theeffect of rotation obtained from thecontinuously changing projection none-theless has an inherent ambiguity. Anobserver cannot determine the direc-tion of rotation from observation of thechanging picture alone. In the ProjectMAC display system, designed by JohnWard and Robert Stotz, the rate of rota-tion of the picture is controlled by theposition of a globe on which the ob-server rests his hand; with a littlepractice the coupling between hand andbrain becomes so familiar that anyambiguity in the picture can easily beresolved.

Tn evaluating the energy of a particularconfiguration of a protein molecule

one must add up all the small inter-actions of atomic groups that contributeto this energy. These must includeinteractions not only between the dif-ferent parts of the protein molecule butalso between the parts of the proteinmolecule and the surrounding watermolecules. If we are interested in alter-ing the configuration in the direction oflower energy, we must be able to cal-culate the derivative of each of theenergy terms—that is, the direction inwhich the energy curves slope—aschanges are made in each of the rota-tion angles. Accordingly we must calcu-

50

late a very large number of interatomicdistances and the derivative of thesedistances with respect to the allowedrotation angle.

The derivatives can be calculated,however, without going through theextended transformation calculationsneeded to generate the coordinatesthemselves. The rotation around anychemical bond will cause one part of themolecule to revolve with respect to an-other. If both members of a pair ofatoms are on the same side of the bondbeing altered, the rotation will not giverise to a change in their distance. If thetwo atoms are on opposite sides of therotating bond, one of the atoms willmove in a circle around the axis of rota-tion while the other remains stationary.By analyzing the geometry of this mo-tion we can simplify the derivative cal-culations to the point where they re-quire very little computation.

Even though each of these derivativecalculations can be done in a few hun-dred microseconds of computer time,there would still be an excessive amountof calculation if it were done for allpossible pairs of atoms within the mole-cule. Fortunately the interactions withwhich we are concerned are short-rangeones, and most of the pairs of atomsare too far apart to contribute appre-

ciably to the overall energy. In order toselect out all pairs of atoms that areclose to each other, we have developeda procedure called cube-testing. All thespace in the region of the molecule isdivided into cubes of some predeter-mined size, let us say five angstroms onan edge (two or three times the typicalbond length) and each atom is assignedto a cube. To consider the interactionsinvolving any given atom one need onlydetermine the distance between thisatom and all others in the same cubeand in the 26 surrounding ones [seemiddle illustration on page 47]. Al-though the procedure is still time-con-suming, it is much faster than having todo calculations for all possible pairs ofatoms in the molecule.

In addition to enabling us to screenthe data for close pairs, the cubingprocedure provides information aboutwhich groups in the protein moleculecan interact with the surrounding watermolecules. In order to enumerate suchinteractions, we first define the "inside-ness" and "outsideness" of the molecule,outsideness meaning that a particularatom or group of atoms is accessible tothe surrounding water and insidenessthat it is not. If we examine the cubingpattern for a particular molecule, anatom on the outside would be in a cube

that is surrounded on one side by filledcubes and on the other by empty ones.In a similar way we can detect holes inthe midst of the structure by lookingfor empty cubes that are surrounded onall sides by filled ones.

T)y suitable use of derivative calcula-tion and cubing we can alter any

configuration of the molecule in the di-rection of lower energy. This procedure,however, would almost certainly lead toa structure that is trapped in one of thelocal minima—one of the higher valleysof our wilderness analogy—and may noteven be close to the true minimum-energy configuration we are looking for.For a real molecule floating in solution,the local minima would not representtraps because the normal thermal vi-bration of the molecule and its partssupplies enough energy to move thestructure out of any valley that is not atrue minimum.

Although there are various ways inwhich one can use random elements ina computer calculation to simulatethermal vibration, it is our experiencethat an investigator who is looking atthe molecule can frequently understandthe reason for the local minimum andby making a small alteration in thestructure can return the program to its

DEOXYRIBONUCLEIC ACID (DNA) is modeled by a programdevised by Robert Langridge and Andrew A. MarEwan of theChildren's Cancer Research Foundation and Harvard MedicalSchool. This DNA sequence (left to right, top to bottom) begins

with a single nucleotide: a pentagonal sugar plus a phosphategroup and a base. A second nucleotide is added, and then more tomake a helical chain that joins with a second chain to form thecharacteristic double helix. Then the helix is rotated in apace.

51

DISPLAY UNIT designed by John Ward and Robert Stotz has access to the large time-shared centra] computer at the Massachusetts Institute of Technology. The investigatorcommunicates with the computer by typing commands or punching preset buttons on akeyboard or by pointing with the light pen. He regulates the direction and the speedof rotation of the model by moving the gimbaled control on which his right hand rests.

downhill path. Such alterations can beaccomplished in the computer by chang-ing the program in such a way as tointroduce pseudo-energy terms thathave the effect of pulling on parts ofthe structure. A few simple subprogramsthat introduce the appropriate pseudo-energies enable us to do the same kindof pulling and pushing in the computerthat we can do with our hands whilebuilding actual models.

Pulling a structure by means of thesepseudo-energy terms is also useful forbuilding a model that observes all thechemical constraints and at the sametime has its atoms as close as possibleto the positions indicated by X-raydiffraction studies. In this case thepseudo-energies can be regarded assprings pulling designated atoms totheir experimentally determined posi-tions. Such calculations have beencarried out by Martin Zwick, a gradu-ate student at M.I.T., in order to makea model of myoglobin fit the configura-tion determined from X-ray data byJohn C. Kendrew and Herman Watsonat the Laboratory of Molecular Biologyin Cambridge. For this type of prob-lem a helpful procedure has been de-veloped by William Davidon of Haver-ford College. In his program one startsby "walking" in the direction of thesteepest slope, but with each successivestep one builds up information as tohow the slope of the hillside changes.

Once we had produced a computermodel of myoglobin, we could ask ques-tions concerning the relative importanceof short-range forces acting between

various parts of the molecule. There are,for example, Van der Waals forces: elec-trostatic attractions due to the electricdipoles that all atoms induce in oneanother when they are close together.There are also electrostatic interactionsthat arise from the permanent electricdipoles associated with the peptidebonds. The permanent dipole attractionsturned out to be larger than we had ex-pected. It is thus possible that the elec-trostatic interaction between different re-gions of a protein may play a substantialrole in stabilizing its structure. This re-sults in part from the fact that the elec-tric dipoles in an alpha helix are addedto one another along the direction of theaxis. For this reason two helixes thatwind in the same direction will repeleach other and two that wind in oppo-site directions will attract each other. Inmyoglobin the overall effect is a sub-stantial attractive force. This calculationrequires some form of model-building,because the electrically charged regionsare associated with the C—O and N—Hgroups along the backbone, and the hy-drogen atom is not detected in the X-rayanalysis.

Although the electrostatic interac-tion energy is of the same order of mag-nitude as that required to denature aprotein, it is probably not the dominantenergy for the folding of a protein mole-cule. The primary source of energy forthis purpose probably comes from theinteraction of the amino acid side chainsand the surrounding water; the electro-static interactions may do no more thanmodify the basic structure.

We still have much to learn about themagnitude of the various energy termsinvolved in holding a protein moleculetogether. Meanwhile we have been try-ing to develop our computer technique,using the knowledge we have. Is thisknowledge enough to enable us to findthe lowest energy state of a proteinmolecule and to predict its structurein advance of its determination by X-rayanalysis?

'T'he answer to this question probablydepends on how well we understand

what really happens when a proteinmolecule in a cell folds itself up. Ourwork has been based on the hypothesisthat the folding starts independently inseveral regions of the protein and thatthe first structural development is theformation of a number of segments ofalpha helix. Our assumption is thatthese segments then interact with oneanother to form the final molecularstructure. In this method of analyzingthe problem the units that have tobe handled independently are the heli-cal regions rather than the individ-ual amino acids. Thus the number ofindependent variables is greatly re-duced. The success of the proceduredepends, however, on the assumptionthat we can deduce from the amino acidsequence alone which regions are likelyto be helical. It is not necessary, how-ever, that we guess the helical regionscorrectly the first time; we can see whathappens when helical regions are placedin many different parts of the molecule.Several other groups working on thisproblem are following the hypothesisthat the folding proceeds only from theamino end of the protein. Until one ofthese approaches succeeds in predictingthe structure of a protein and having theprediction confirmed by X-ray analysis,we can only consider the differenthypotheses as more or less plausibleworking guides in studying the problem.

It is still too early to evaluate theusefulness of the man-computer com-bination in solving real problems ofmolecular biology. It does seem likely,however, that only with this combina-tion can the investigator use his "chemi-cal insight" in an effective way. Wealready know that we can use the com-puter to build and display models oflarge molecules and that this procedurecan be very useful in helping us tounderstand how such molecules func-tion. But it may still be a few yearsbefore we have learned just how usefulit is for the investigator to be able tointeract with the computer while themolecular model is being constructed.

52

molecular model-building by...

Documents