protein nmr structure determination with automated noe ... · viously been solved by interactive...

19
Protein NMR Structure Determination with Automated NOE Assignment Using the New Software CANDID and the Torsion Angle Dynamics Algorithm DYANA Torsten Herrmann, Peter Gu ¨ ntert * and Kurt Wu ¨ thrich Institut fu ¨ r Molekularbiologie und Biophysik Eidgeno ¨ssische Technische Hochschule-Ho ¨nggerberg CH-8093 Zu ¨ rich, Switzerland Combined automated NOE assignment and structure determination module (CANDID) is a new software for efficient NMR structure determi- nation of proteins by automated assignment of the NOESY spectra. CANDID uses an iterative approach with multiple cycles of NOE cross- peak assignment and protein structure calculation using the fast DYANA torsion angle dynamics algorithm, so that the result from each CANDID cycle consists of exhaustive, possibly ambiguous NOE cross-peak assign- ments in all available spectra and a three-dimensional protein structure represented by a bundle of conformers. The input for the first CANDID cycle consists of the amino acid sequence, the chemical shift list from the sequence-specific resonance assignment, and listings of the cross-peak positions and volumes in one or several two, three or four-dimensional NOESY spectra. The input for the second and subsequent CANDID cycles contains the three-dimensional protein structure from the previous cycle, in addition to the complete input used for the first cycle. CANDID includes two new elements that make it robust with respect to the presence of artifacts in the input data, i.e. network-anchoring and con- straint-combination, which have a key role in de novo protein structure determinations for the successful generation of the correct polypeptide fold by the first CANDID cycle. Network-anchoring makes use of the fact that any network of correct NOE cross-peak assignments forms a self-consistent set; the initial, chemical shift-based assignments for each individual NOE cross-peak are therefore weighted by the extent to which they can be embedded into the network formed by all other NOE cross- peak assignments. Constraint-combination reduces the deleterious impact of artifact NOE upper distance constraints in the input for a protein struc- ture calculation by combining the assignments for two or several peaks into a single upper limit distance constraint, which lowers the probability that the presence of an artifact peak will influence the outcome of the structure calculation. CANDID test calculations were performed with NMR data sets of four proteins for which high-quality structures had pre- viously been solved by interactive protocols, and they yielded comparable results to these reference structure determinations with regard to both the residual constraint violations, and the precision and accuracy of the atomic coordinates. The CANDID approach has further been validated by de novo NMR structure determinations of four additional proteins. The experience gained in these calculations shows that once nearly complete sequence-specific resonance assignments are available, the auto- mated CANDID approach results in greatly enhanced efficiency of the NOESY spectral analysis. The fact that the correct fold is obtained in 0022-2836/02/$ - see front matter q 2002 Elsevier Science Ltd. All rights reserved Present address: P. Gu ¨ntert, RIKEN Genomic Sciences Center, 1-7-22 Suehiro, Tsurumi, Yokohama 230-0045, Japan. E-mail address of the corresponding author: [email protected] Abbreviations used: NOE, nuclear Overhauser effect; NOESY, nuclear Overhauser enhancement spectroscopy; CANDID, combined automated NOE assignment and structure determination module; DYANA, dynamics algorithm for NMR applications; 2D, 3D, 4D, two, three, four-dimensional. doi:10.1016/S0022-2836(02)00241-3 available online at http://www.idealibrary.com on B w J. Mol. Biol. (2002) 319, 209–227

Upload: lamliem

Post on 27-May-2018

221 views

Category:

Documents


0 download

TRANSCRIPT

Protein NMR Structure Determination with AutomatedNOE Assignment Using the New Software CANDIDand the Torsion Angle Dynamics Algorithm DYANA

Torsten Herrmann, Peter Guntert* and Kurt Wuthrich

Institut fur Molekularbiologieund BiophysikEidgenossische TechnischeHochschule-HonggerbergCH-8093 Zurich, Switzerland

Combined automated NOE assignment and structure determinationmodule (CANDID) is a new software for efficient NMR structure determi-nation of proteins by automated assignment of the NOESY spectra.CANDID uses an iterative approach with multiple cycles of NOE cross-peak assignment and protein structure calculation using the fast DYANAtorsion angle dynamics algorithm, so that the result from each CANDIDcycle consists of exhaustive, possibly ambiguous NOE cross-peak assign-ments in all available spectra and a three-dimensional protein structurerepresented by a bundle of conformers. The input for the first CANDIDcycle consists of the amino acid sequence, the chemical shift list from thesequence-specific resonance assignment, and listings of the cross-peakpositions and volumes in one or several two, three or four-dimensionalNOESY spectra. The input for the second and subsequent CANDID cyclescontains the three-dimensional protein structure from the previous cycle,in addition to the complete input used for the first cycle. CANDIDincludes two new elements that make it robust with respect to thepresence of artifacts in the input data, i.e. network-anchoring and con-straint-combination, which have a key role in de novo protein structuredeterminations for the successful generation of the correct polypeptidefold by the first CANDID cycle. Network-anchoring makes use of thefact that any network of correct NOE cross-peak assignments forms aself-consistent set; the initial, chemical shift-based assignments for eachindividual NOE cross-peak are therefore weighted by the extent to whichthey can be embedded into the network formed by all other NOE cross-peak assignments. Constraint-combination reduces the deleterious impactof artifact NOE upper distance constraints in the input for a protein struc-ture calculation by combining the assignments for two or several peaksinto a single upper limit distance constraint, which lowers the probabilitythat the presence of an artifact peak will influence the outcome of thestructure calculation. CANDID test calculations were performed withNMR data sets of four proteins for which high-quality structures had pre-viously been solved by interactive protocols, and they yielded comparableresults to these reference structure determinations with regard to both theresidual constraint violations, and the precision and accuracy of theatomic coordinates. The CANDID approach has further been validatedby de novo NMR structure determinations of four additional proteins.The experience gained in these calculations shows that once nearlycomplete sequence-specific resonance assignments are available, the auto-mated CANDID approach results in greatly enhanced efficiency of theNOESY spectral analysis. The fact that the correct fold is obtained in

0022-2836/02/$ - see front matter q 2002 Elsevier Science Ltd. All rights reserved

Present address: P. Guntert, RIKEN Genomic Sciences Center, 1-7-22 Suehiro, Tsurumi, Yokohama 230-0045, Japan.

E-mail address of the corresponding author: [email protected]

Abbreviations used: NOE, nuclear Overhauser effect; NOESY, nuclear Overhauser enhancement spectroscopy;CANDID, combined automated NOE assignment and structure determination module; DYANA, dynamics algorithmfor NMR applications; 2D, 3D, 4D, two, three, four-dimensional.

doi:10.1016/S0022-2836(02)00241-3 available online at http://www.idealibrary.com onBw

J. Mol. Biol. (2002) 319, 209–227

cycle 1 of a de novo structure calculation is the single most importantadvance achieved with CANDID, when compared with previously pro-posed automated NOESY assignment methods that do not use network-anchoring and constraint-combination.

q 2002 Elsevier Science Ltd. All rights reserved

Keywords: automated NOE assignment; network-anchoring; constraintcombination; CANDID; DYANA*Corresponding author

Introduction

In de novo three-dimensional structure determi-nations of proteins in solution by NMR spectros-copy, the key conformational data are upperdistance limits derived from NOEs.1 NOEs resultfrom cross-relaxation due to the dipole–dipoleinteractions between nearby pairs of nuclear spinsin a molecule undergoing Brownian motion,2 andin two-dimensional (2D) or higher-dimensionalheteronuclear-resolved [1H,1H]-NOESY spectrathey are manifested by cross-peaks.3,4 In order toextract informative distance constraints from aNOESY spectrum, its cross-peaks have to beassigned, i.e. the pairs of hydrogen atoms thatgive rise to the observed cross-peaks need to beidentified. These NOESY assignments are basedon 1H chemical shift values that result fromprevious sequence-specific resonanceassignments.1 However, because of the limitedaccuracy with which NOESY cross-peak positionsand chemical shift values can be measured, it is ingeneral not possible to unambiguously assign allNOESY cross-peaks on the basis of the knownchemical shift values alone, not even for smallproteins. The number of NOESY cross-peaks thatcan be unambiguously assigned from knowledgeof the 1H chemical shifts decreases rapidly withincreasing uncertainty of the information onchemical shifts and NOE peak positions,5 and maydrop below 10% of the total number of NOEcross-peaks.5 Obtaining a comprehensive set of dis-tance constraints from a NOESY spectrum of a pro-tein has therefore conventionally been an iterativeprocess in which preliminary protein three-dimensional (3D) structures calculated from afraction of the total number of distance constraintsare used to reduce the ambiguity of additionalcross-peak assignments.6 Additional difficultiesmay arise from spectral artifacts and noise, andfrom the absence of expected signals because offast relaxation. These inevitable shortcomings ofcurrent NMR data collection are the main reasonthat laborious interactive procedures are stillprominent in 3D protein structure determinations.

Interactive computer-supported NOESY assign-ment methods have been introduced that usechemical shift fits for initial assignments, and amolecular model from a preliminary structuredetermination to validate individual ones out ofthe list of possible assignments for each cross-peak.6 – 8 The user decides interactively about the

assignment and/or temporary removal of indi-vidual NOESY cross-peaks, possible taking intoaccount supplementary information such as lineshapes or secondary structure data, and performsa structure calculation with the resulting, usuallyincomplete input. In practice, several cycles ofNOESY assignment and structure calculation arerequired to obtain a high-quality structure.9 Auto-mated NOESY assignment/structure calculationmethods attempt to work along the same generalscheme without interactive interventions.10

Because in the initial phase of a structure determi-nation the number of cross-peaks with uniqueassignments based on chemical shift fitting maynot be sufficient to define the protein fold, auto-mated methods should be able to initially extractinformation also from NOESY cross-peaks thatcannot yet be assigned unambiguously. Further-more, an automated procedure must be able tosubstitute for the intuitive decisions made by anexperienced spectroscopist in dealing with spectralnoise, other artifacts, and possibly inaccuratelypositioned real NOE cross-peaks.

The programs DIANA9,11 and DYANA12 havepreviously been supplemented with the automatedNOESY assignment routine NOAH.5,13 In NOAH,the multiple assignment problem is treated bytemporarily ignoring cross-peaks with too many(typically, more than two) assignment possibilitiesand instead generating independent distance con-straints for all assignment possibilities of theremaining cross-peaks, where one takes intoaccount that part of these distance constraints maybe incorrect. NOAH requires high accuracy of thechemical shifts and peak positions in the input. Itmakes use of the fact that only a set of correctassignments can form a self-consistent network,and convergence towards the correct structure hasbeen achieved for several proteins. Another auto-mated NOESY assignment procedure, ARIA, hasbeen interfaced with the programs XPLOR andCNS,14 – 17 and a similar approach has beenimplemented by Savarin et al.18 ARIA introducedthe concept of ambiguous distance constraints forhandling of ambiguities in the initial, chemicalshift-based NOESY cross-peak assignments. Whenusing ambiguous distance constraints eachindividual NOESY cross-peak is treated as thesuperposition of the signals from each of itsmultiple initial assignments, using relativeweights proportional to the inverse sixth powerof the corresponding interatomic distance in a

210 Automated NOE Assignment with CANDID

preliminary model of the molecular structure.19,20

In this way, information from cross-peaks with anarbitrary number of initial assignment possibilitiescan be used for the structure calculation, andalthough inclusion of erroneous assignments for agiven cross-peak results in a loss of information, itwill not lead to inconsistencies as long as one orseveral correct assignments are among the initialassignments. Both of these automated methods arequite efficient for improving and completing theNOESY assignment once a correct preliminarypolypeptide fold is available, for example, basedon a limited set of interactively assigned NOEs.On the other hand, obtaining a correct initial foldat the outset of a de novo structure determinationoften proves to be difficult, because the structure-based filters used in both of these procedures forthe elimination of erroneous cross-peak assign-ments are then not operational. A third approachthat uses rules for assignments similar to the onesused by an expert to generate an initial proteinfold has been implemented in the program AUTO-STRUCTURE, and applied to protein structuredetermination.10,21

The CANDID procedure described here com-bines features from NOAH and ARIA, such as theuse of three-dimensional structure-based filtersand ambiguous distance constraints, with the newconcepts of network-anchoring and constraint-combination that further enable an efficient andreliable search for the correct fold in the initialcycle of de novo NMR structure determinations.

Algorithms

This section starts with a brief overview of theprocess of automated protein structure determi-nation with CANDID, and then presents a tech-nical description of the individual steps of theprocedure in the order in which they appear inFigure 1. The flow diagram (Figure 1) emphasizesthe new elements implemented in the CANDIDalgorithm with thick-framed boxes. The key newfeatures are “network-anchoring” of the initial,chemical shift-based NOE cross-peak assignments,and “constraint-combination”, which representsan extension of the concept of ambiguous NOEassignments.19,20 These two elements are of criticalimportance for the generation of the correct poly-peptide fold during the first cycle of a de novoprotein structure determination.

The automated CANDID method proceeds initerative cycles, each consisting of exhaustive, inpart ambiguous NOE assignment followed by astructure calculation with the DYANA torsionangle dynamics algorithm. Between subsequentcycles, information is transferred exclusivelythrough the intermediary 3D structures, in thatthe protein molecular structure obtained in agiven cycle is used to guide further NOE assign-ments in the following cycle. Otherwise, the sameinput data are used for all cycles, i.e. the amino

acid sequence of the protein, one or several chemi-cal shift lists from the sequence-specific resonanceassignment, and one or several lists containing thepositions and volumes of cross-peaks in 2D, 3D orfour-dimensional (4D) NOESY spectra. The inputmay further include previously assigned NOEupper distance constraints or other previouslyassigned conformational constraints, which willthen not be changed by CANDID, but will beused for the structure calculation.

A CANDID cycle starts by generating for eachNOESY cross-peak an initial assignment list, i.e.hydrogen atom pairs are identified that could,from the fit of chemical shifts within the user-defined tolerance range, contribute to the peak.Subsequently, for each cross-peak these initialassignments are weighted with respect to severalcriteria (listed in Figure 1), and initial assignmentswith low overall score are then discarded. In thefirst cycle, network-anchoring has a dominantimpact, since structure-based criteria cannot beapplied yet. For each cross-peak, the retainedassignments are interpreted in the form of anupper distance limit derived from the cross-peakvolume. Thereby, a conventional distance con-straint is obtained for cross-peaks with a singleretained assignment, and otherwise an ambiguousdistance constraint is generated that embodiesseveral assignments.19,20 All cross-peaks with apoor score are temporarily discarded. In order toreduce deleterious effects on the resulting structurefrom erroneous distance constraints that may passthis filtering step, long-range distance constraintsare incorporated into “combined distance con-straints” (Figure 1). The distance constraints arethen included in the input for the structure calcu-lation with the DYANA torsion angle dynamicsalgorithm.

The structure calculations described here com-prise seven CANDID cycles. The second and sub-sequent CANDID cycles differ from the first cycleby the use of additional selection criteria for cross-peaks and NOE assignments that are based onassessments relative to the protein 3D structurefrom the preceding cycle. Since the precision ofthe structure determination normally improveswith each subsequent cycle, the criteria for accept-ing assignments and distance constraints aretightened in more advanced cycles of the CANDIDcalculation. The output from a CANDID cycleincludes a listing of NOESY cross-peak assign-ments, a list of comments about individual assign-ment decisions that can help to recognize potentialartifacts in the input data, and a 3D proteinstructure in the form of a bundle of conformers.

In the final CANDID cycle (cycle 7), anadditional filtering step ensures that all NOEshave either unique assignments to a single pair ofhydrogen atoms, or are eliminated from the inputfor the structure calculation. This allows for thedirect use of the CANDID NOE assignments insubsequent refinement and analysis programs thatdo not handle ambiguous distance constraints,

Automated NOE Assignment with CANDID 211

and enables here direct comparisons of theCANDID results with the corresponding dataobtained by conventional, interactive procedures.

Experimental input data

The formats of the input files containing theamino acid sequence, chemical shifts, NOE cross-peak positions and volumes, and possiblepreviously assigned cross-peaks or distance con-straints for use in the structure calculation(Figure 1) are compatible with the programsXEASY,22 which supports interactive sequentialassignment protocols, and DYANA.12

Initial assignment list

The chemical shift value of an atom, a, in theinput chemical shift list for a NOESY spectrum,S, is denoted VS

a ^ DVSa; where DVS

a describes atolerance range that allows for the limited pre-cision of experimental chemical shift determi-nations. The absence of a chemical shift value forthe atom a is indicated by setting VS

a ¼ 1: A dif-ferent chemical shift list may be used for eachNOESY spectrum from which peaks are extracted.However, if experimental conditions are carefullymatched when recording multiple NOESY spectra,it is advantageous to generate a single list ofchemical shifts for all the spectra.

Figure 1. Flow chart of NMRstructure determination using theCANDID method for automatedNOE cross-peak assignment.

212 Automated NOE Assignment with CANDID

A peak, p, in an input peak list for the 2D, 3D or4D NOESY spectrum S is characterized by Dchemical shift coordinates, v

pi ði ¼ 1; 2;…;DÞ; by

corresponding tolerance ranges, Dvpi ; and by its

volume, I p. The dimensions of the spectrum arechosen such that v

p1 and v

p2 denote the two 1H

chemical shifts. Each hydrogen atom, a, iscovalently bound to a “heavy” atom, h(a). Ifapplicable, v

p3 and v

p4 refer to the chemical shifts

of the 13C or 15N atoms that are covalently boundto the 1H atoms represented by the dimensions 1and 2, respectively. CANDID can work simul-taneously with several NOESY peak lists ofdifferent dimensionality.

A 1H atom, a, is assigned to dimension i ði ¼ 1; 2Þof a peak, p, from the NOESY spectrum S if thechemical shift value VS

a agrees with the peakposition v

pi within a given tolerance range, i.e. if:

vpi 2VS

a

�� �� # max Dvpi ;DV

Sa

� �ð1Þ

and in the case of 3D or 4D NOESY spectrum:

vpiþ2 2VS

hðaÞ

��� ��� # max Dvpiþ2;DV

ShðaÞ

� �ð2Þ

If Api ði ¼ 1; 2Þ is the set of all hydrogen atoms a that

satisfy the conditions of equations (1) and (2), thenthe chemical-shift based initial assignments for thepeak p are given by the direct product of the setsA

p1 and A

p2; i.e. by the ordered pairs of hydrogen

atoms, (a,b), with a [ Ap1 and b [ A

p2: All peaks

with at least one initial assignment on the diagonal,i.e. a ¼ b; are eliminated from further consider-ation as cross-peaks.

Ranking of the initial assignments

A NOESY cross-peak with a single initial assign-ment gives rise to a conventional upper distanceconstraint, whereas a NOESY cross-peak withn $ 2 initial assignments gives rise to anambiguous distance constraint19,20 that will not dis-tort the protein structure by the inadvertentinclusion of incorrect initial assignments as longas the correct assignment is also present amongthe initial assignments. However, since anambiguous distance constraint has a reduced infor-mation content, it may nonetheless be difficult forthe structure calculation to converge to the correctstructure. It is therefore important, in as far aspossible, to eliminate incorrect initial assignmentsbefore the start of the structure calculation. Forthis filtering process, the initial assignments areranked by their generalized relative contributions,and only the assignments with sufficiently highcontributions are retained as conventional orambiguous distance constraints for the structurecalculation.

This generalized relative contribution, Vk, of aninitial assignment, k, to a cross-peak volume isgiven by the normalized total score from fourstructure-independent and one structure-based

term (see Figure 1), and is defined by:

Vk ¼Ck minðTkOkNk; SmaxÞDkXn

i¼1

Ci minðTiOiNi; SmaxÞDi

ð3Þ

such thatPn

k¼1 Vk ¼ 1: Ck is the weight for thecloseness of the chemical shift fit. The otherweighting factors, Tk, Ok, Nk and Dk are related tothe presence of symmetry-related cross-peaks inthe NOESY spectra, the compatibility with thecovalent polypeptide structure, the convergence ofnetwork-anchoring, and the compatibility with theprotein 3D structure. These factors will be definedin the following sections. The product of the threeweighting factors, Tk, Ok and Nk is capped at auser-defined maximal value, Smax.

Ranking of the initial assignments by thecloseness of the chemical shift fit

For the purpose of discriminating betweenmultiple initial assignments, the agreementbetween peak coordinates, v

pi ði ¼ 1; 2;…;DÞ; and

the chemical shifts of the corresponding initialassignment, VS

aiða1 ¼ a; a2 ¼ b; a3 ¼ hðaÞ; a4 ¼

hðbÞÞ; is quantified by a Gaussian weighting factor:

Ck ¼ Cpa;b

¼ exp 21

2

XD

i¼1

vpi 2VS

ai

Gmax Dvpi ;DV

Sai

� �0@

1A

20@

1A ð4Þ

which has the value 1.0 for a perfect fit. G is a user-defined parameter that determines the weight ofclose chemical shift alignment relative to the otherranking criteria.

Ranking of the initial assignments by thepresence of symmetry-related cross-peaks in3D and 4D heteronuclear-resolvedNOESY spectra

Assume that a peak, p, in a 3D NOESY spectrumhas an initial assignment (a,b,h(a)). Then, peak p p,at the position ðv

pp

1 ;vpp

2 ;vpp

3 Þ is in the transposedposition with respect to the initial assignment(a,b,h(a)) if:

vpp

i 2VSai

��� ��� # max Dvpp

i ;DVSai

� �ði ¼ 1; 2; 3Þ ð5Þ

where a1 ¼ b; a2 ¼ a and a3 ¼ hðbÞ: 4D NOESYspectra can be treated in an analogous manner. Ifa transposed peak is found by the criterion ofequation (5), a weight Tk ¼ T q 1 is attached tothe corresponding initial assignment, k, where T isa user-defined constant, and otherwise Tk is set tounity. To prevent arbitrary discrimination amongthe initial assignments for a given peak, values ofTk – 1 are used only if the heavy atoms h(b) havebeen assigned for all initial assignments of thispeak.

Automated NOE Assignment with CANDID 213

Ranking of the initial assignments by thecompatibility with the covalentpolypeptide structure

The fixed bond lengths, bond angles andchiralities of the covalent structure impose NOE-observable upper limits on certain intraresidualand sequential distances.1,9,23,24 In CANDID theseconformation-independent upper limits, uðccÞ

ab ; arecomputed analytically for atom pairs (a,b) that areseparated by one or two torsion angles. CANDIDgives a weight Ok ¼ O q 1 to an initial assignmentk if the corresponding distance cannot exceed auser-defined maximal value, dmax, where O is auser-defined constant, and otherwise Ok is set tounity. This discrimination in favor of assignmentsthat are expected to yield observable NOEs in allpossible conformations of the protein correspondsto the common treatment of short-range 1H– 1Hconnectivites by experienced spectroscopists in thecourse of interactive NOE assignments.1

Ranking of the initial assignments by thecompatibility with the intermediate 3Dprotein structure

The structure-based volume contribution Dk, ofan initial assignment, k ðk ¼ 1;…; nÞ; is calculatedas an average over all M conformers in the prelimi-nary structure bundle:

Dk ¼1

M

XMj¼1

dðjÞakbk

= �dðjÞ� �2h

ð6Þ

where dðjÞak;bk

denotes the distance between the twoatoms ak and bk in conformer j, and:

�d ¼Xn

k¼1

d26akbk

!21=6

ð7Þ

For the isolated spin pair approximation, h ¼ 6;but smaller values may be used to reduce thesensitivity of Dk to structural variations in thesituation where only an imprecise preliminarystructure is available as a reference. In the absenceof a structure during the first CANDID cycle,uniform weights, Dk ¼ 1=n; are applied for allinitial assignments.

Ranking of the initial assignments of network-anchoring

Network-anchoring exploits the observation thatthe correctly assigned constraints form a self-consistent subset in any network of distance con-straints that is sufficiently dense for the determi-nation of a protein 3D structure. Network-anchoring thus evaluates the self-consistency ofNOE assignments independent of knowledge onthe 3D protein structure, and in this way compen-sates for the absence of 3D structural informationat the outset of a de novo structure determination.The requirement that each NOE assignment must

be embedded in the network of all other assign-ments makes network-anchoring a sensitiveapproach for detecting erroneous, “lonely” con-straints that might artificially constrain unstruc-tured parts of the protein. Such constrains mightnot lead to systematic constraint violations duringthe structure calculation, and could therefore notbe eliminated by 3D structure-based peak filters.

The weighting factor for network-anchoring foran initial assignments of a NOESY cross-peak,Nk ¼ Nab; is calculated as follows. All atomsg – a,b are searched that are connected simul-taneously to both atoms a and b, either by initialassignments of other NOE cross-peaks, or becausethe covalent polypeptide structure implies that thedistance must be sufficiently short to produce aNOE. In addition, the atoms g are required to bein the same residue as either a or b, or in one ofthe neighboring residues. Nab is then defined asthe sum over all indirect pathways that connectthe atoms a and b via a third atoms, g:

Nk ¼ Nab ¼Xg

ffiffiffiffiffiffiffiffiffiffiffiffiffiffinagnbg

pð8Þ

Nab thus represents the number of indirect connec-tions between the atoms a and b through a thirdatom g and their impact on the network-anchoring,with the sum of equation (8) running over allatoms g as defined above. The weights for the con-nections a to g, nag, are defined by:

nag ¼ ~naguð~nag 2 n minÞ; with ~nag

¼ maxX

p

VðpÞag ;VðccÞ

ag

!ð9Þ

where u is the Heaviside function, and nmin is athreshold for the minimal contribution that will beconsidered. ~nag is the sum of all generalizedvolume contributions (equation (3)) taken over allthe peaks with an assignment (a,g), where VðccÞ

ag

ensures that there is a minimal contribution forpairs of hydrogen atoms with intraresidual andsequential relative positioning:

VðccÞag ¼

nmax if uðccÞag # dmax;

nmin all other intraresidual

or sequential combinations;

0 all longer-range connectivities:

8>>>>>>><>>>>>>>:

ð10Þ

The calculation of the network-anchoring contri-bution by equations (8)–(10) is recursive in thesense that its evaluation for a given peak requiresthe knowledge of the generalized volume contri-butions (equation (3)) from other peaks, which inturn is a function of some of these same network-anchoring contributions. Therefore, the calculationof these quantities is iterated alternatingly threetimes. If separate peak lists from different NOESY

214 Automated NOE Assignment with CANDID

spectra are used, the peaks from all peak listscontribute simultaneously to network-anchoring.

Finally, a measure of residue-wise network-anchoring between residues A and B, defined by:

�NAB ¼Xa[A

Xb[B

Nab ð11Þ

is computed by CANDID.

Elimination of low-ranking initial assignments

The list of initial assignments is screened forhigh values of the generalized contributions, Vk,(equation (3)) to the total peak volume, and onlythose assignments are retained for whichVk $ Vmin:

15 These assignments will then be usedto generate distance constraints for the structurecalculation.

In the last CANDID cycle, the remainingambiguous NOE assignments are either replacedwith unambiguous assignments to unique pairs ofprotons or the corresponding NOESY cross-peaksare discarded as a source of information for thestructure calculation. Technically, this filtering isachieved by increasing the acceptable minimalgeneralized volume contribution for an initialassignment to be retained in the input for the struc-ture calculation, Vmin, to a value larger than 50%.

Calibration of NOE upper distance constraints

A NOESY cross-peak with a single assignment,(a,b), gives rise to an upper bound, b, on the dis-tance between the two hydrogen atoms, a and b,dab, in the molecular structure: dab # b: A NOESYcross-peak with n $ 2 assignments can be inter-preted as the superposition of n signals giving riseto an ambiguous distance constraint:19,20

�d ¼Xn

k¼1

d26akbk

!21=6

# b ð12Þ

Each of the distances dakbkcorresponds to one

assignment, (ak,bk). In CANDID the upper distancebound b from a peak p in the NOESY spectrum Swith volume I p and n assignments ðak;bkÞ; k ¼1;…; n; is computed as:

b ¼Xn

k¼1

IpVkffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiQS

akQS

bk

q0B@

1CA

21=6

ð13Þ

QSa and QS

b are atomic calibration constants for theatoms a and b in the spectrum S. In commonpractice, equal calibration constants are used forgiven types of atoms, such as for all backboneatoms, all side-chain atoms, or all methyl groups.9

CANDID provides a choice of three ways to setthe atomic calibration constants, i.e. fixed user-defined calibration, automated structure-independent calibration, or automated structure-based calibration. Fixed calibration uses QS

a and

QSb values chosen by the user, which will be held

constant throughout all cycles of a CANDID calcu-lation. The two automated methods do not requiresuch explicit input from the user. Automatedstructure-independent calibration defines the cali-bration constant in such a manner that the averageof the upper distance bounds for all peaks involv-ing a given combination of atom types attains apredetermined value.5 Structure-based automatedcalibration sets the calibration constant such thatthe available preliminary structure does not violatemore than a predetermined percentage of theupper distance bounds.

Elimination of spurious NOE cross-peaks

Identification and elimination of potentiallyerroneous NOE cross-peaks is an essential step infinding the correct protein 3D structure, since theexperimental input data typically contain hardlyavoidable imperfections. Usually, a limited numberof peaks can therefore not be assigned correctly,e.g. because they correspond to noise artifacts,some peak positions have been determined with alarger error than the chemical shift tolerance, thechemical shift list is incomplete, some peakintegrals have been severely overestimated, andsimilar.

To minimize deleterious effects on the resultingstructure, CANDID applies four different peakfilters, so that a distance constraint is derived froma given peak only if the following conditions aremet: (i) at least one of the generalized contri-butions, Vk (equation (3)), exceeds the thresholdvalue Vmin; (ii) the number of assignments isbelow a user-defined maximal value nmax; (iii) thecorresponding distance constraint is not violatedby more than a cutoff value, dcut, in more than auser-defined percentage of the number of con-formers used to represent a preliminary structure(only in the second and subsequent CANDIDcycles, see Figure 1); (iv) the assignments ofthe peak are well anchored in the network of theassignments of all peaks, as quantified by thefollowing: kf lp ¼

Pnk¼1 Vkfk is the average of a

quantity, f, over the n assignments of a peak, p,weighted by their generalized volume contri-butions, Vk. To be accepted, a peak, p, must eithersatisfy the single condition of having a highaverage network-anchoring per residue (equation(11)), k �Nlp $ �Nhigh; or the combined condition ofhaving a minimal value of average network-anchoring per residue and per atom, k �Nlp $ �Nmin

and kNlp $ Nmin; where �Nhigh; �Nmin and Nmin areconstant parameters of CANDID, with �Nhigh .�Nmin:

Constraint-combination

In the practice of NMR structure determinationwith biological macromolecules, spurious distanceconstraints in the input may arise from mis-interpretation of stochastic noise, and similar, as

Automated NOE Assignment with CANDID 215

well as from real signals that involve atoms thatare not included in the chemical shift list. Thissituation is particularly critical at the outset of astructure determination, before the availability ofa preliminary structure for 3D structure-basedscreening of constraint assignments. Constraint-combination aims at minimizing the impact ofsuch imperfections on the resulting structure atthe expense of a temporary loss of information.Constraint-combination is applied in the earlyCANDID cycles (the first two cycles in the calcu-lations here, unless noted otherwise). It consists ofgenerating virtual distance constraints withcombined assignments from different, in generalunrelated cross-peaks. The basic property ofambiguous distance constraints is that the constraintwill be satisfied by the correct protein structure pro-vided that at least one of the assignments in the com-bined constraint is correct. Overall, combinedconstraints therefore have a lower probability ofbeing erroneous than the individual constraints.

Two different modes of constraint-combinationhave been implemented in CANDID (further com-bination modes can readily be envisaged), i.e.“2 ! 1 combination” of all long-range assignmentsof two peaks into a single new, virtual constraint,and “4 ! 4 pairwise combination” of the long-range assignments of four peaks into four new,virtual constraints. Constraint-combination isapplied only to the long-range peaks, i.e. peaks forwhich all assignments are to pairs of atoms separ-ated by at least five residues in the sequence,1

because the effect of an erroneous long-range con-straint on the global fold of a protein is muchstronger than that of erroneous short and medium-range constraints. For further description of thetwo modes of constraint-combination we denotethe set of all assignments of a given peak with anupper case letter. 2 ! 1 combination replaces twoconstraints with the assignment sets A and B by asingle, ambiguous constraints with assignment setA < B (the union of sets A and B ). 4 ! 4 pairwisecombination replaces four constraints with theassignment sets, A, B, C and D by four ambiguousconstraints with assignment sets A < B, A < C,A < D and B < C. To increase the efficiency of4 ! 4 pairwise combination, the long-range peaksare sorted according to their total residue-wisenetwork-anchoring, i.e. by the sum of the quan-tities defined in equation (11) over all assignmentsof the peak, and the assignment sets A, B, C and Dare selected from the first, second, third, and fourthquarter of the sorted peak list, respectively. Thenumber of long-range constraints is halvedby 2 ! 1 combination, whereas 4 ! 4 pairwisecombination preserves more of the intrinsicstructural information, since the number of con-straints is unchanged. It furthermore takes intoaccount that certain peaks and their assignmentsare more reliably documented than others, becausein the combined constraints the assignment setsA, B, C and D are used 3, 2, 2 and 1 times,respectively.

A quantitative impression of the effect of con-straint-combination on the input for a structurecalculation can be obtained from the following con-siderations. For an experimental data set contain-ing N long-range peaks, we assume a uniformprobability, p ! 1, that any one of these peakswould lead to an erroneous constraint. By 2 ! 1constraint-combination the N experimental con-straints are substituted by N/2 virtual constraintswith a uniform error probability p 2 ! p. In thecase of 4 ! 4 constraint-combination, we assumethat N long-range peaks can be grouped into fourclasses with error probabilities ap, p, p and(22a)p, so that the overall probability for an inputconstraint to be erroneous is again p. The para-meter a, with 0 # a # 1, accounts for the strongerexperimental evidence for the presence of thepeaks in the first class when compared to those inthe two middle classes and the fourth class (seeabove). The N virtual long-range constraintsobtained after 4 ! 4 combination have an overallerror probability of ðaþ ð1 2 a2Þ=4Þp2; which issmaller than p 2 for a , 1, i.e. whenever the rankingof the experimental constraints for variablereliability was successful. For example, 4 ! 4 con-straint-combination with a ¼ 0:5 will transform aninput data set of 900 correct and 100 erroneouslong-range cross-peaks (i.e. N ¼ 1000, p ¼ 0.1) intoa new set of approximately 993 correct and sevenerroneous virtual combined constraints. For thesame system, 2 ! 1 constraint-combination willyield approximately 495 correct and five erroneousvirtual combined constraints. For all calculationshere, 4 ! 4 constraint-combination was used inthe first two CANDID cycles, unless notedotherwise.

The upper distance bound b for a virtual com-bined constraint is derived from the two upper dis-tance bounds b1 and b2 of the two parentexperimental constraints either as the r26-sum, b ¼ðb26

1 þ b262 Þ21=6 (which was used for the calcu-

lations here), or as the maximum, b ¼ maxðb1; b2Þ:The first choice minimizes the loss of informationin situations where two correct constraints arecombined, whereas the second choice avoids intro-ducing a spurious shorter upper bound if a correctand an erroneous constraint are combined.

Structure calculation using DYANA torsionangle dynamics

The program DYANA was adapted for auto-mated structure determination in conjunction withCANDID, so that it accepts ambiguous distanceconstraints in the input and generally interactsefficiently with CANDID. Each CANDID cycle iscompleted by a structure calculation using the fastDYANA torsion angle dynamics algorithm withthe standard simulated annealing schedule,whereby the input comprises the list of distanceconstraints from CANDID, and possibly additionalconformational constraints from other sources(Figure 1). To minimize loss of information,

216 Automated NOE Assignment with CANDID

constraints relating to degenerate groups of protonsare expanded into ambiguous distance constraintsinvolving all hydrogen atoms of the correspondingdegenerate group(s),25 and the absence of stereo-

specific assignments for diastereotopic groups istreated by periodic optimal swapping of the pairsof diastereotopic atoms for minimal target functionvalue during the simulated annealing.26

Table 1. Experimental chemical shift assignments and NOESY peak lists used for the final structure calculations basedon interactive analysis of the NOSEY spectra of four proteins that have been used here to validate the CANDIDprocedure

Proteina Size (residues) Assigned chemical shifts (%)b NOESY spectrac Peaks pickedd

CopZ 68 94.9 2D, H2O 11753D (15N), H2O 1063

WmKT 88 97.0 2D, H2O 1998bPrP(121–230) 110 98.1 3D (15N), H2O 1893

3D (13C), H2O 3859P14a 135 99.4 3D (15N), H2O 1457

3D (13C), H2O 30552D, H2O 19252D, 2H2O 2001

a CopZ: apo-from of the copper chaperone Z;38 WmKT: killer toxin from the yeast Williopsis mrakii;32 bPrP(121–230): globulardomain of the bovine prion protein comprising residues 121–230;39 P14a: pathogenesis-related protein from tomato leaves.36

b Percent of the total number of non-labile hydrogen atoms and backbone amide protons for which the chemical shifts wereassigned. Pairs of diastereotopic protons or methyl groups are considered to be assigned when at least one of the 1H chemical shiftsis known.

c Notation used: 2D, 2D [1H,1H]-NOESY; H2O, solvent of 95% H2O/5% 2H2O; 2H2O, solvent of 100% 2H2O; 3D (15N), 3D 15N-resolved[1H,1H]-NOESY; 3D (13C), 3D 13C-resolved [1H,1H]-NOESY.

d Number of interactively picked NOESY cross-peaks. Not all of these peaks were assigned by the spectroscopist in the final stageof the structure determination (see Table 2 below).

Table 2. Experimental input for the final structure calculations of the four test proteins of Table 1 and statistics of thestructure determinations using either interactive or automated NOESY assignment

Quantity CopZ WmKT bPrP(121–230) P14a

A. CANDIDNOE cross-peaks assigneda 1025 1865 1670 1340

887 3538 291616741715

NOE upper distance limitsb 937 1223 2091 1885Residual DYANA target function (A2)c 1.56 1.74 2.19 4.16RMSD (A)d 0.53 0.76 0.57 0.82

B. Interactive structure determinationNOE cross-peaks assigneda 1024 1421 1493 1248

947 3310 2636244210

NOE upper distance limitsb 993 1053 1797 1701Residual DYANA target function (A2)c 1.67 3.51 0.79 3.13RMSD (A2)d 0.42 0.68 0.58 0.88

C. Comparisons between CANDID and interactive assignmentPeaks with identical assignment (%)e 92.3 92.3 94.1 92.6Peaks with different assignments (%)f 4.7 4.0 3.2 2.4Peaks assigned interactively but not by CANDID (%) 3.0 3.7 2.7 5.0Average rank of the interactive assignment after cycle 1g 1.14 1.08 1.11 1.08Average rank of the interactive assignment after cycle 6g 1.06 1.05 1.04 1.03RMSD between mean structures (A) 0.67 0.80 1.11 1.09

a On the basis of the experimental input data of Table 1. The number of assigned NOE cross-peaks given for each protein from topto bottom corresponds to the listing of the NOESY spectra in Table 1.

b Number of NOE upper distance limits that represent conformational restraints on the polypeptide fold.c The residual DYANA target function value is the average for the bundles of conformers representing the NMR structure. The

target function values before energy minimization are given.d The RMSD is the average of the RMSD values between the individual conformers in the bundle and their mean coordinates for

the backbone atoms N, Ca, C0 of residues 2–67 for CopZ, 4–39 and 47–87 for WmKT, 128–166 and 172–223 for bPrP(121–230), and2–134 for P14a. The RMSD values after energy minimization are given.

e Peaks with identical assignments are those for which the interactive assignment is the same as the assignment made by CANDID.f Peaks with different assignment have been assigned by both approaches, but the interactive assignment is to a different pair of

protons than by CANDID.g The initial assignment of each NOE cross-peak are sorted by decreasing generalized volume contribution (equation (3)). The

average rank given for an assignment from the interactive approach is relative to all retained CANDID assignments.

Automated NOE Assignment with CANDID 217

Results

CANDID calculations with experimentaldata sets

Automated combined structure determinationand NOE assignment with CANDID was validatedusing experimental input data set that had beenprepared for previous conventional NMR structuredetermination of four proteins (Table 1; see alsoMaterials and Methods). The four proteinsrepresent different molecular sizes, differentsecondary structure types, and different isotope

labeling strategies (Table 1). Between 12% and49% of the peaks that had been picked were leftunassigned in the conventional structure determi-nations of the four proteins (Tables 1 and 2). Forthe CANDID calculation the unassigned NOESYpeak lists were used, and seven cycles of CANDIDassignment and DYANA structure calculation wereperformed (see Materials and Methods).

Table 2 provides an overview of the results.Between 88% and 93% of all NOESY peaks wereassigned by CANDID, and for all proteins a lowfinal target function value and a small RMSDwere obtained, which is by conventional criteria

Figure 2. Evolution of character-istic parameters for NMR structuresin the course of the seven cycles ofCANDID structure calculation forthe four proteins CopZ (blue),WmKT (green), bPrP (121–130)(black) and P14a (red). (a) Averagenumber of assignments per NOEcross-peak retained in the input forthe structure calculation. In cycle 7,an additional filtering step enforcesthe value 1.0 (see text). (b) Per-centage of interactively assignedpeaks that were discarded byCANDID. (c) Average final targetfunction value for the bundle ofconformers representing the resultof the DYANA structure calcu-lation. (d) RMSD calculated as theaverage of the RMSD valuesbetween the individual conformersin the bundles and their meancoordinates. (e) RMSD drift, calcu-lated as the RMSD between themean coordinates of the bundles ofconformers obtained after the kthand the seventh CANDID cycles.(f) RMSD between mean structures,calculated as the RMSD betweenthe mean coordinates of thebundles of conformers obtainedafter the kth CANDID cycle andthe final result of the interactivereference structure determination.All RMSD values are calculated forthe backbone atoms N, Ca and C0

of the well-defined segments of thepolypeptide chains given in Table 2.

218 Automated NOE Assignment with CANDID

indicative of a high-quality structure determi-nation. The evolution of various quality par-ameters in the course of the seven CANDID cyclesis illustrated in Figure 2. The average number ofambiguous assignments per cross-peak (Figure2(a)) decreases from about three in cycle 1 toabout 1.5 in cycle 6, whereby in cycle 6 most ofthe ambiguous assignments come from peaks witheither two or three possible assignments. In cycle7 unambiguous assignment is enforced. The frac-tion of interactively assigned peaks that are leftunassigned by CANDID remains below 6%throughout, and varies only slightly in the courseof the calculation (Figure 2(b)). A slight increase ofthis parameter towards the later cycles reflects theincreasing stringency of the filtering criteria. Thepresence of a significant number of artifact cross-peaks in the input peak list, which had beendiscarded in the conventional structure determi-nations, manifests itself in relatively high valuesof the DYANA target function in the early cycles

of the calculations for bPrP (121–230) and P14a(Figure 2(c)). Nonetheless, because network-anchoring detected and eliminated many of theartifact peaks, and constraint-combination reducedthe impact of the remaining artifacts, the calcu-lations converged to quite well-defined structuresalready in the first CANDID cycle (Figure 2(d)).Improved precision is achieved during the cycles2–7 (Figure 2(d)). The system is stable in the sensethat the RMSD drift of the mean coordinates issmall and decreases monotonously towards thefinal structure during the entire CANDID calcu-lations (Figure 2(e)). This important result is alsoapparent from the bundles of conformers obtainedafter the CANDID cycles 1, 6 and 7 (Figure 3(a)–(c)), which show defined structures for cycle 1with readily apparent similarity with the finalstructure (see also Figures 2(f) and 3(c)). Finding adefined and correct fold in the first cycle is thekey to reliable automated NOESY assignment andstructure calculation, since subsequent cycles are

Figure 3. Bundles of conformersof the four proteins used for thevalidation of the CANDID/DYANA procedure. (a) CANDID/DYANA cycle 1 (10 conformers);(b) CANDID/DYANA cycle 6 (20conformers); (c) CANDID/DYANAcycle 7 (20 conformers after energy-refinement); (d) structure from theprevious interactive determination(20 conformers after energy-refinement).

Automated NOE Assignment with CANDID 219

driven by intermediary 3D structure-based assign-ment and peak filters (Figure 1).

Comparison with conventional, interactivestructure determination

For these comparisons we used exclusively theresults obtained after the CANDID/DYANA cycle7 and restrained energy-refinement in a water-shell with the program OPALP,27,28 using theAMBER force field29 (see Materials and Methods),both because the input for the calculation of thereference structures contained only unambiguousdistance constraints, and because these structureshad also been energy-refined with OPAL28 orOPALP.27

The input data of NOE upper distance con-straints obtained with CANDID are in very goodoverall agreement with those of the interactivestructure determinations that are used as areference (Tables 1 and 2). For all peak lists exceptthe one for CopZ, a slightly larger number ofpeaks were assigned by CANDID than had beenassigned interactively (Table 2), which indicatesthat an even more thorough use of the spectralinformation was possible with CANDID than withthe interactive approaches. The overwhelmingmajority of NOESY cross-peaks, i.e. between 92%and 94% of all peaks assigned previously by inter-active approaches for the four proteins, haveidentical assignments (Table 2). For only between2.4% and 4.7% of the peaks the interactiveapproach and CANDID yielded different assign-ments, and between 2.7% and 5.0% of the peaksthat had previously been assigned interactivelywere not assigned by CANDID. These includesome uncertain interactive assignments, but thedifferences have primarily been caused by aslightly different calibration of the distance con-straints. Whereas r26 and r24 relationships between

peak volume and upper distance bounds wereused in the interactive approach,9 r26 relationshipswere used with CANDID (Table 5). Furthermore,the two approaches used different treatments ofdegenerate diastereotopic groups of protons, i.e.pseudoatoms in the interactive calculation of thereference structures, and r26-summation withswapping of diastereotopic atoms for the struc-tures based on CANDID assignments.25,26 Thesedifferent treatments result in slightly differentvalues for some of the upper distance bounds. Forexample, in cases where CANDID uses a tighterupper bound than the interactive approach, thiscan lead to a consistent constraint violation andhence to elimination of the corresponding NOESYcross-peak from the input for the structurecalculation.

The close coincidence of the NOE assignmentsobtained with CANDID and in the reference struc-ture determinations leads to structures that areclosely similar and of comparable quality, asshown by the target function and RMSD values ofTable 2. With both approaches, the final targetfunction values are in the range 1–4 A2

(1A ¼ 0.1 nm), and the RMSD values vary between0.4 and 0.9 A. The RMSD values between the meanstructures obtained by the interactive and the auto-matic approach are in the range 0.7–1.1 A, which isslightly less than the sum of the average RMSDvalues for the two bundles of conformers. Thisgood agreement is evident also by inspection ofthe structures in Figure 3(c) and (d), which showvisible deviations almost exclusively for surfaceloop regions. Evaluation of the stereochemicalquality of the energy-minimized protein structuresdetermined by the automated CANDID approachwith the program PROCHECK30 resulted in verysimilar statistics of the Ramachandran plots as forthe reference structures. The CANDID structureshave between 93% and 97% of the residues in the

Figure 4. Impact of using net-work-anchoring and constraint-combination in the CANDID/DYANA calculation of the proteinsCopZ (a) and bPrP(121–230) (b).The evolution of the averageRMSD for the bundle of conformers(“precision”; lower panels) and ofthe RMSD between the meanstructures obtained by the kthCANDID cycle and the final struc-ture from the interactive approach(“accuracy”; upper panels) areshown. The same CANDID proto-col was used throughout, exceptthat network-anchoring, and con-straint-combination were only usedas follows: no network-anchoring,no constraint-combination (dotted);network-anchoring, no constraint-combination (broken); no network-

anchoring, constraint-combination (dot-dashed); network-anchoring and constraint-combination (continuous).

220 Automated NOE Assignment with CANDID

“most favored” and “additional allowed”regions, as defined by PROCHECK,30 whereasthe corresponding values for the interactivestructure determinations are in the range from95% to 99%.

Effect of network-anchoring and constraint-combination

To assess the impact of the presently introducedtechniques of network-anchoring and constraint-combination on automated NOESY assignment,the CANDID calculations with the data sets ofTable 1 were repeated using the same protocolexcept that either network-anchoring, or con-straint-combination, or both were inactivated. Theresults for CopZ and bPrP(121–230) are shown inFigures 4 and 5 and Table 3 (similar results were

obtained for the other two proteins). The standardCANDID schedule using both network-anchoringand constraint-combination yielded in all cases theclosest structure to the reference. If one or both ofthe new techniques were inactivated, the structuresafter cycle 1 were much more distorted, as mani-fested by RMSD values between the mean struc-tures obtained by CANDID and the interactiveapproach of more than 3.0 A (Figure 4). For CopZ,the computation without network-anchoring andconstraint-combination yielded a relativelyprecisely defined but severely erroneous structure(Figure 4(a)). In the other computations of Figure4, the convergence towards the correct structurewas always fastest when using the standardschedule, and later cycles in the variant protocolswere in most instances able to largely correct thesevere distortions present in the structures fromthe first cycle. The fact that the RMSD values

Figure 5. Bundles of the conformers with the lowest residual target function values for the protein CopZ after theCANDID cycles 1 (top) and 6 (bottom). The structures were obtained with the four CANDID calculations of Figure 4:(a) using network-anchoring and constraint-combination; (b) no network-anchoring, constraint-combination; (c) net-work-anchoring, no constraint-combination; (d) no network-anchoring, no constraint-combination.

Table 3. Consistency with the conventionally determined reference structures of CopZ and bPrP(121–230) of thedistance constraints from cycle 1 of CANDID obtained with and without network-anchoring and/or constraint-combination

Network-anchoring

Constraint-combi-nation

CopZ bPrP(121–230)

Target functiona

(A2)Constraint

violationsb . 5 ATarget functiona

(A2)Constraint

violationsb . 5 A

Yes Yes 53 0 385 2No Yes 221 2 605 5Yes No 3399 25 8136 43No No 6714 56 14842 98

a Value of the DYANA target function calculated for the NOE distance constraints of the first CANDID cycle relative to thereference structure. The mean value for the bundle of 20 conformers representing the reference structure (Tables 1 and 2) is listed.

b Number of constraint violations larger than 5.0 A counted for the input of NOE distance constraints of the first CANDID cyclerelative to the reference structure.

Automated NOE Assignment with CANDID 221

within the bundles are smaller than the RMSDvalues between the mean structures reveals thatthe apparent precision may give a misleading indi-cation of high accuracy of the structure in theabsence of network-anchoring and/or constraint-combination. In the case of CopZ, constraint-combination was particularly important, whereasthe protocol without network-anchoring yielded agood structure (Figure 4(a)). The observations forCopZ are visualized in Figure 5 with the resultsfrom the first and sixth CANDID cycles of calcu-lations with and without network-anchoring and/or constraint-combination. For bPrP(121–230), theuse of either network-anchoring or constraint-combination alone was not sufficient to attain com-parable convergence and structure quality after theearly cycles as when using the standard protocolwith both of these two techniques activated (Figure4(b)). Overall, the results of Table 3 and Figures 4and 5 indicate that automated NOESY assignmentwithout network-anchoring and constraint-combination is hardly reliable unless additionalinput data, such as a certain number of previouslyassigned long-range NOEs or a preliminary poly-peptide fold, is available with the input for thefirst cycle of calculations.

The principal intended role of network-anchoring is to replace the initially unavailablechecks for compatibility with the 3D structurewithin the automated NOESY assignment (Figure1), whereas constraint-combination reduces theimpact of unidentified artifact constraints in theinput for the first structure calculation. The plotsof Figure 2 illustrate that the results from cycle 1are quite in line with those of the subsequent“structure-based” cycles, indicating that theenvisaged goal had been largely attained. In par-ticular, the low number of only about three poss-ible assignments per peak already for the firstcycle (Figure 2(a)) is a direct result of network-anchoring, since solely on the basis of chemicalshift information, there would be 5.7, 9.2, 6.9 and15.5 initial ambiguous assignments per peak forCopZ, WmKT, bPrP(121–230) and P14a, respect-ively. Network-anchoring is thus very effective inidentifying correct assignments among allchemical-shift based initial assignments. This isalso reflected by the fact that the interactivelydetermined final reference assignment is usuallyat top rank if the initial assignments are sorted bydecreasing generalized relative contributions(equation (3)). The average rank of the referenceassignment therefore changes only from 1.08–1.14in cycle 1 to 1.04–1.06 in cycle 6 for the differentproteins studied (Table 2), so that for at least 86%of the peaks assigned by both methods the refer-ence assignment is also the most highly weightedassignment by CANDID in cycle 1.

The effectiveness of network-anchoring and con-straint-combination has also been assessed byevaluating the consistency of the distance con-straints produced in the first CANDID cycles withthe reference structure. To this end the DYANA

target function was calculated (not minimized)and the number of severe constraint violationsabove a threshold value of 5.0 A counted for thereference structure when using the NOE distanceconstraints of the first CANDID cycle as input(Table 3). The results clearly show that theCANDID standard protocol with network-anchor-ing and constraint-combination yields by far thebest set of distance constraints and that constraint-combination is highly effective in reducing theimpact of unidentified artifact peaks on thestructure.

De novo structure determination of WmKTusing an automatically picked peak list

To assess the potential of using CANDID in con-nection with automation of further parts of thestructure determination process, it was applied toa peak list for the 2D [1H,1H]-NOESY spectrum ofWmKT that was created automatically by the pro-gram AUTOPSY.31 The resulting list of 3789 peakspicked and integrated by AUTOPSY was used asinput for CANDID, and automated NOESY assign-ment and structure determination were performedwith the same protocol and using the same chemi-cal shift list as with the interactively preparedWmKT peak list of Table 1 (since AUTOPSY pickspeaks on both sides of the diagonal in a 2D[1H,1H]-NOESY spectrum, the number of peaks inthe AUTOPSY peak list is about twice that of theinteractively prepared peak list; see Materials andMethods).

Overall, the results obtained on the basis of theAUTOPSY peak list are similar to those from theinteractively prepared WmKT peak list (Table 2,Figures 2 and 3). In the seventh CANDID cycle,88.3% of the peaks were assigned, the final averagetarget function value was 5.34 A2, the averageRMSD value of the energy-refined bundle of con-formers relative to the mean coordinates of thebackbone atoms N, Ca and C0 of residues 4–39and 47–87 was 0.56 A, and the RMSD of the meancoordinates to those of the conventionally deter-mined structure was 0.96 A. Using the AUTOPSYpeak list, a slightly lower percentage of the totalnumber of NOESY peaks was thus assigned, thefinal target function values were somewhat higherand the final structure deviates slightly more fromthe reference structure. This data manifest thehigher quality of the interactively prepared peaklist, which had been optimized in multiple roundsof structure refinement and NOE assignment,32

but the calculation on the basis of the AUTOPSYpeak list clearly resulted also in a high-qualitystructure. In terms of precision as measured bythe average RMSD value for the bundle of 20energy-refined conformers, the automatedCANDID method using AUTOPSY actuallyresulted in a somewhat better-defined structurethan either the interactive approach32 or CANDIDusing an interactively prepared peak list (Table 2).

222 Automated NOE Assignment with CANDID

Discussion and Conclusions

This work documents that the CANDIDapproach for automated NOE assignment andprotein structure calculation yields comparableNOE assignments and protein 3D structures tothose obtained by conventional, interactive struc-ture determination. The new concepts of network-anchoring and constraint-combination ensured inall applications so far that the correct fold of theprotein was obtained already in the first cycle,which is the single most important advanceachieved with CANDID when compared withpreviously proposed automated NOE assignmentmethods.5,13,15 The potential of the CANDIDprocedure to become a generally applicablemethod for automated NOE assignment is sup-ported by successful applications for structuredeterminations of three variant human prionproteins,33 the calreticulin P-domain,34 thepheromone-binding protein from Bombyx mori,35

the human Doppel protein (to be published), andthe chicken prion protein (to be published). Thesede novo structure determinations showed that auto-mated NOE assignment with CANDID is fasterand more objective than the conventional, inter-active approach, so that the analysis of the NOESYspectra is no longer a time-limiting step in de novoprotein structure determinations by NMR.

For the validation of the CANDID procedure forautomated NOE assignment, the current versionof CANDID has been interfaced with the programDYANA.12 The CANDID program could in futureapplications be used also in conjunction withother structure calculation programs that canhandle ambiguous distance constraints. For theevaluation of proper performance of CANDID,independent of the availability of an interactivelydetermined reference structure, we propose a setof general guidelines (see (a)–(e) below). Mostimportant, the CANDID input must include nearlycomplete sequence-specific resonance assignments,and CANDID cannot normally make up for lack ofchemical shift assignments.

The guidelines (a)–(e) for checks on the success-ful performance of CANDID for automated NMRstructure determination of globular proteinsemerged from the test calculations here, and fromexperience gained in the aforementioned de novostructure determinations.33 – 35 They consist of twostraightforward requirements on the input chemi-cal shift lists and NOESY cross-peak lists, (a) and(b), and of three output criteria to judge thereliability of the resulting structure, (c)–(e). All thepresently described CANDID test calculationsusing network-anchoring and constraint-combination, as well as the applications for denovo structure determinations33 – 35 fulfilled thesefive criteria.

(a) The input chemical shift list must containmore than 90% of the non-labile and backboneamide 1H chemical shifts. If 3D or 4D hetero-

nuclear-resolved [1H,1H]-NOESY spectra areused, more than 90% of the 15N and/or 13Cchemical shifts must also be available.

(b) The peak lists must be faithful represen-tations of the NOESY spectra, and the chemicalshift positions of the NOESY cross-peaks mustbe correctly calibrated to fit the chemical shiftlists within the chemical shift tolerances. Therange of allowed chemical shift variations(“tolerances”) for 1H should not exceed^0.02 ppm when working with homonuclear[1H,1H]-NOESY spectra, or ^0.03 ppm whenworking with heteronuclear-resolved 3D or4D NOESY spectra, and the tolerances for the15N and/or 13C shifts should not exceed^0.6 ppm.

(c) The average final DYANA target functionvalue for the bundle of conformers used torepresent the structure from the first CANDIDcycle should be below 250 A2, and the corre-sponding value for the last CANDID cycleshould be below 10 A2, with more than 80% ofall picked NOESY cross-peaks assigned and lessthan 20% of the peaks with exclusively long-range assignments1 eliminated by the peak filtersof CANDID. If CANDID is used with a differentstructure calculation algorithm, the parametersgiven for the target function will need to beredefined accordingly.

(d) The average backbone RMSD to the meancoordinates for the structured parts of the poly-peptide chain should be below 3.0 A for thebundle of conformers used to represent thestructure from CANDID cycle 1.

(e) The RMSD drift between the mean atomcoordinates after the first and the lastCANDID cycles calculated for the backboneheavy atoms of the structured part of the poly-peptide chain should be smaller than 3.0 A,and it should not exceed the average RMSDto the mean coordinates after cycle 1 by morethan 25%.

The input requirements (a) and (b) are imposedto ensure that in most instances the correct assign-ment is among the initial assignments of a cross-peak. Incomplete chemical shift lists make itimpossible for CANDID to correctly assign any ofthe NOEs involving the unassigned atoms. There-fore, the assignment criterion (a) is particularlyimportant for atoms with a large number ofexpected NOEs, such as the hydrogen atoms ofthe polypeptide backbone and the hydrophobiccore side-chains, whereas incomplete chemicalshift information for hydrophilic surface side-chains is more tolerable. The criterion (b) is neededbecause one often uses different NMR spectra forobtaining resonance assignments, and for thecollection of the conformational constraints,respectively. Clearly, if the difference between theNOESY cross-peak positions and the chemicalshift value of a given atom is larger than thetolerance range, then CANDID cannot make

Automated NOE Assignment with CANDID 223

correct NOE assignments. Such difficulties do notusually arise in structure determinations withhomonuclear 1H NMR, where the assignments arebased on sequential NOEs observed in the samedata sets that are also used for the collection of con-formational constraints.1 In projects with hetero-nuclear NMR, the chemical shift lists may need tobe updated by reference to the correspondingNOESY cross-peak positions, using NOESY cross-peaks that have been assigned as part of thesequence-specific resonance assignment procedure,which typically makes use of numerous intra-residual and sequential NOEs, in addition to thedata from heteronuclear triple resonanceexperiments.

The three output criteria (c)–(e) emphasize thecrucial importance of getting good results fromthe first CANDID cycle. For reliable automatedNMR structure determination, the bundle of con-formers obtained after cycle 1 should be reasonablycompatible with the input data (criterion (c)) andshow a defined fold of the protein (criterion (d)).Structural changes between the first and sub-sequent CANDID cycles should occur within theconformation space determined by the bundle ofconformers obtained after cycle 1, with the implicitassumption that this conformation space containsthe correct fold of the protein (criterion (e)). Theoutput criteria for target function and RMSDvalues might need to be slightly relaxed for pro-teins with more than 150 amino acid residues, andtightened for small proteins of less than 80residues.

In principle, a de novo protein structure determi-nation requires one round of 7 CANDID cycles(Figure 1). This is realistic for projects where anessentially complete chemical shift list is availableand much effort was made to prepare a complete,high-quality input of NOESY peak lists. In practice,our experience so far has been that it may be moreefficient to start a first round of CANDID analysiswithout excessive work for the preparation of theinput peak list, using an incomplete list of “safelyidentifiable” NOESY cross-peaks, and then use theresult of the first round of CANDID assignmentand structure determination as additional infor-mation from which to prepare an improved, morecomplete NOESY peak list as input for a secondround of 7 CANDID cycles.

Materials and Methods

Experimental NMR data sets used for the validationof automated structure determination with CANDID/DYANA

For the evaluation of the performance of CANDID/DYANA the experimental NMR data sets of fourproteins were used for which high-quality NMR struc-tures had previously been determined by a conventional,interactive approach (Tables 1 and 2; PDB entries: CopZ,1CPZ; WmKT, 1WKT; bPrP(121–230), 1DWZ; P14a,1CFE). For all four proteins nearly complete sequence-

specific resonance assignments for the backbone and theside-chains are available. The chemical shifts of the 1H,15N, and 13C atoms were used as deposited in theBioMagResBank (accession codes: CopZ, 4344; WmKT,5255; bPrP(121–130), 4563; P14a, 4301). A single chemicalshift list was used for each of the proteins CopZ, WmKTand bPrP(121–130). For P14a, separate chemical shiftlists were utilized for each of the four peak lists derivedfrom four different NOESY data sets, as it had beendone in the interactive structure determination in orderto account for slight deviations between correspondingchemical shifts in the different NOESY spectra.36

The positions and volumes of all the peaks in theNOESY peak lists from the original structure determi-nation were used as input for CANDID. These peaklists resulted from interactive peak picking with theprogram XEASY.22 They contain peaks that had beenunambiguously assigned and the corresponding upperdistance constraints used for the structure calculation,as well as unassigned peaks that were not included inthe input for the final structure calculation and maytherefore be artifacts (Tables 1 and 2).

For all four proteins the experimentally determined3J-coupling constants were used as in the previous, inter-active structure determination. In each CANDID cycle,these scalar coupling constants were converted in con-junction with the updated list of NOE upper distanceconstraints into torsion angle constraints by the gridsearch procedure FOUND.24 Stereospecific assignmentsfrom the interactive structure calculations were notincluded into the input for the CANDID/DYANAstructure determination. Each disulfide bridge was con-strained by a set of upper and lower distanceconstraints.37

To explore the potential of CANDID for more fullyautomated structure determination, an additional calcu-lation was performed using a peak list for WmKT thathad been produced automatically with the programAUTOPSY, as described by Koradi et al.31

Standard protocol for automated structuredetermination with CANDID and DYANA

The CANDID/DYANA calculations comprised eithersix iterative cycles of NOESY assignment and structurecalculation with the parameters given in Tables 4 and 5,or seven cycles for all calculations used for comparisonwith the reference structures of Table 1. Upper distancebounds were derived from NOESY cross-peak intensitiesaccording to equation (13). In the CANDID calculationsonly the two automated procedures for determining thecalibration constants were applied, which ensured thatno explicit input from the user was required (seeAlgorithms). The atomic calibration constants of thebackbone and non-methyl b-protons were determinedin the first CANDID cycle by automated, structure-independent calibration with a target average upper dis-tance bound of 3.8 A. In the following cycles, automatedstructure-based calibration was used, requiring that lessthan 15 and 10% of the constraints violate the inputstructures of cycles 2–3 and 4–7, respectively. For theremaining methyl and non-methyl protons the cali-bration constants were set to 3.0 and 1.5 times the valuedetermined for backbone and non-methyl b-protons,respectively.

DYANA structure calculations using the standardsimulated annealing schedule with 8000 torsion angledynamics steps were started from 80 and 60 randomized

224 Automated NOE Assignment with CANDID

conformers in cycles 1 and 2, 3–7, respectively.12 The 10best conformers from the cycles 1 and 2, and the 20 bestconformers from the cycles 3–6 were used as inputstructure for the next CANDID cycle.

Computations were performed on shared-memorymultiprocessor Compaq computers using four Alphaprocessors in parallel for the structure calculations. Thecomputation time for a complete automated structuredetermination with CANDID and DYANA ranged from3.9 h for CopZ to 15.4 h for P14a on a single processor,and was spent predominantly in the DYANA structurecalculation of, in total, 460 conformers per structuredetermination.

Energy-minimization of the protein 3D structuresusing the program OPALP

Since the program DYANA does not optimize againsta general energy force field that would also account forelectrostatic interactions,12 the bundles of conformers tobe used for representation of the NMR structure wereenergy-minimized in a water-shell with the programOPALP,27,28 using the AMBER force field.29 OPALPaccepts as input a bundle of conformers and the confor-mational constraints used in the final DYANA structurecalculation, whereby the NOE upper distance constraintsmust be assigned to single pairs of hydrogen atoms. Theenergy-refined bundle of 20 conformers is used to rep-resent the 3D NMR structure, which can then be directlycompared with the reference structure determinations ofTable 1. These have also used exclusively unambiguous

NOE assignments in the input for the structure calcu-lation, and were similarly refined with OPAL orOPALP.32,36,38,39

Reference NOESY assignments and 3DNMR structures

The outcome of the automated CANDID structuredeterminations was evaluated by comparison with thepublished, interactively determined NOESY assignmentsand structures of CopZ, WmKT, bPrP(121–230), andP14a.32,36,38,39 These structures had been calculated on thebasis of the experimental data of Table 1 by the programDYANA12 for CopZ and bPrP(121–230), and by its pre-decessor DIANA using the REDAC strategy for WmKTand P14a.9,11 All reference calculations were thus per-formed in torsion angle space, with fixed values of thebond lengths, bond angles, planar groups and chiralitiesaccording to the ECEPP/2 force field,40 and using thesame functional form and weighting factors for the targetfunction as in the presently introduced CANDID/DYANA protocol.

Structure analysis and comparison

RMSD values are used for three different types ofcomparisons: The RMSD of a bundle of n conformers isthe average of the n RMSD values between the indi-vidual conformers and the mean coordinates for thebundle. The RMSD between mean structures is theRMSD value between the mean coordinates of two

Table 4. Cycle-independent CANDID parameters used in the structure calculations here

Symbol Parameter Value

Dvp1 Tolerance range for peak positions in the indirect 1H dimension (equation (1)) 0.02 ppm (bPrP: 0.03 ppm) (P14a:

0.025 ppm)Dv

p2 Tolerance range for peak positions in the direct 1H dimension (equation (1)) 0.02 ppm (P14a: 0.025 ppm)

Dvp3 Tolerance range for peak positions in the 13C or 15N dimension (equation (2)) 0.4 ppm

DVSa Tolerance range of chemical shift (equations (1) and (2)) 0.0 ppm

G Scaling factor for chemical shift agreement (equation (4)) 0.5dmax Maximal value for covalent structure-constrained 1H–1H distances for which a NOE

is expected5.5 A

nmin Minimal network-anchoring contribution for all other intraresidual and sequentialdistances (equation (10))

0.1

nmax Maximal network-anchoring contribution for covalent structure-constraineddistances (equation (10))

1.0

nmax Acceptable maximal number of ambiguous assignments per peak 20Mvio Acceptable maximal number of conformers with violation .dcut among all M

conformersM/2

Nhigh Lower limit to qualify a cross-peak for having a “high network-anchoring perresidue” (equation (11))

4.0

Table 5. Cycle-dependent CANDID parameters used in the structure calculations here

Symbol ParameterValue in cycle i ði ¼ 1;…; 7Þ

1 2 3 4 5 6 7

Vmin Acceptable minimal volume contribution per assignment (%) (equation (3)) 1.0 0.1 0.5 1.0 2.5 5.0 50.1T Weight for presence of transposed peak (equation (3)) 10 10 10 10 1 1 1O Weight for covalent structure-constrained distances (equation (3)) 10 10 10 1 1 1 1h Exponent for 3D structure-based volume contribution (equation (6)) 3 3 6 6 6 6 6Smax Maximal structure-independent generalized volume contribution (equation (3)) 20 20 20 20 10 10 10dcut Upper limit on acceptable distance violation for elimination of spurious NOESY

cross-peaks (A)– 1.5 0.9 0.6 0.3 0.1 0.1

�Nmin Threshold for acceptable lower limit of network-anchoring per residue 1.0 0.75 0.5 0.5 0.5 0.5 0.5Nmin Threshold for acceptable lower limit of network-anchoring per atom 0.25 0.25 0.4 0.4 0.4 0.4 0.4

Automated NOE Assignment with CANDID 225

bundles of conformers, for example, correspondingbundles obtained by CANDID and by the interactiveapproach. The RMSD drift for the CANDID cycle k isthe RMSD between the mean coordinates of the bundlesof conformers obtained in the last cycle and in cycle k.The mean coordinates of a structure bundle are calcu-lated by superimposing conformers 2;…; n onto the firstconformer for minimal RMSD for the backbone atomsN, Ca and C0 and subsequent calculation of the arith-metic average of the Cartesian coordinates. RMSD valueswere calculated for well-defined segments of the poly-peptide chain, as published with the original structuredeterminations and given in Table 2. The programMOLMOL was used to visualize the 3D structures andfor the calculation of RMSD values.41

Implementation and availability of CANDID

The core of the current version of CANDID isimplemented in standard Fortran-77 and has been builtupon the data structures and into the framework of theuser interface of the program DYANA. The standardschedule and parameters for a complete automatedstructure determination with CANDID and DYANA arespecified in a script written in the interpreted commandlanguage INCLAN,12 that gives the user high flexibilityin the way automated structure determination is per-formed without need to modify the compiled core partof CANDID. The current versions of CANDID andDYANA are available from P.G.

Acknowledgments

Financial support by the Schweizerischer National-fonds (project 31.49047.96), and the use of the high-performance computing facilities of ETH Zurich, EPFLausanne, and the Centro Svizzero di Calcolo Scientificoare gratefully acknowledged.

References

1. Wuthrich, K. (1986). NMR of Proteins and NucleicAcids, Wiley, New York.

2. Solomon, I. (1955). Relaxation processes in a systemof two spins. Phys. Rev. 99, 559–565.

3. Anil-Kumar, Ernst, R. R. & Wuthrich, K. (1980). Atwo-dimensional nuclear Overhauser enhancement(2D NOE) experiment for the elucidation of completeproton–proton cross-relaxation networks in biologi-cal macromolecules. Biochem. Biophys. Res. Commun.95, 1–6.

4. Macura, S. & Ernst, R. R. (1980). Elucidation of crossrelaxation in liquids by 2D NMR spectroscopy. Mol.Phys. 41, 95–117.

5. Mumenthaler, C., Guntert, P., Braun, W. & Wuthrich,K. (1997). Automated combined assignment ofNOESY spectra and three-dimensional protein struc-ture determination. J. Biomol. NMR, 10, 351–362.

6. Guntert, P., Berndt, K. D. & Wuthrich, K. (1993). Theprogram ASNO for computer-supported collectionof NOE upper distance constraints as input forprotein structure determination. J. Biomol. NMR, 3,601–606.

7. Meadows, R. P., Olejniczak, E. T. & Fesik, S. W.(1994). A computer-based protocol for semi-

automated assignments and 3D structure determi-nation of proteins. J. Biomol. NMR, 4, 79–96.

8. Duggan, B. M., Legge, G. B., Dyson, H. J. & Wright,P. E. (2001). SANE (structure assisted NOE evalu-ation): an automated model-based approach forNOE assignment. J. Biomol. NMR, 19, 321–329.

9. Guntert, P., Braun, W. & Wuthrich, K. (1991).Efficient computation of three-dimensional proteinstructures in solution from nuclear magneticresonance data using the program DIANA and thesupporting programs CALIBA, HABAS andGLOMSA. J. Mol. Biol. 217, 517–530.

10. Moseley, H. N. B. & Montelione, G. T. (1991). Auto-mated analysis of NMR assignments and structuresfor proteins. Curr. Opin. Struct. Biol. 9, 635–642.

11. Guntert, P. & Wuthrich, K. (1991). Improvedefficiency of protein structure calculations fromNMR data using the program DIANA with redun-dant dihedral angle constraints. J. Biomol. NMR, 1,446–456.

12. Guntert, P., Mumenthaler, C. & Wuthrich, K. (1997).Torsion angle dynamics for NMR structure calcu-lation with the new program DYANA. J. Mol. Biol.273, 283–298.

13. Mumenthaler, C. & Braun, W. (1995). Automatedassignment of simulated and experimental NOESYspectra of proteins by feedback filtering and self-correcting distance geometry. J. Mol. Biol. 254,465–480.

14. Nilges, M. (1995). Calculation of protein structureswith ambiguous distance restraints. Automatedassignment of ambiguous NOE crosspeaks anddisulphide connectivities. J. Mol. Biol. 245, 645–660.

15. Nilges, M., Macias, M., O’Donoghue, S. I. &Oschkinat, H. (1997). Automated NOESY interpret-ation with ambiguous distance restraints: the refinedNMR solution structure of the pleckstrin homologydomain from b-spectrin. J. Mol. Biol. 269, 408–422.

16. Brunger, A. T. (1992). X-PLOR, Version 3.1. A Systemfor X-ray Crystallography and NMR, Yale UniversityPress, New Haven, CT.

17. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano,W. L., Gros, P., Grosse-Kunstleve, R. W. et al. (1998).Crystallography and NMR system: a new softwaresuite for macromolecular structure determination.Acta Crystallog. sect. D, 54, 905–921.

18. Savarin, P., Zinn-Justin, S. & Gilquin, B. (2001). Vari-ability in automated assignment of NOESY spectraand three-dimensional structure determination: atest case on three small disulfide-bonded proteins.J. Biomol. NMR, 19, 49–62.

19. Nilges, M. (1993). A calculation strategy for the struc-ture determination of symmetric dimers by 1H NMR.Proteins: Struct. Funct. Genet. 17, 297–309.

20. Nilges, M. & O’Donoghue, S. I. (1998). AmbiguousNOEs and automated NOE assignment. Prog. NMRSpectrosc. 32, 107–139.

21. Greenfield, N. J., Huang, Y. J., Palm, T., Swapna, G. V.T., Monleon, D., Montelione, G. T. & Hitchcock-DeGregori, S. E. (2001). Solution NMR structure andfolding dynamics of the N terminus of a rat non-muscle alpha-tropomyosin in an engineered chimericprotein. J. Mol. Biol. 312, 833–847.

22. Bartels, C., Xia, T., Billeter, M., Guntert, P. &Wuthrich, K. (1995). The program XEASY forcomputer-supported NMR spectral analysis of bio-logical macromolecules. J. Biomol. NMR, 6, 1–10.

23. Wuthrich, K., Billeter, M. & Braun, W. (1983).Pseudo-structures for the 20 common amino acids

226 Automated NOE Assignment with CANDID

for use in studies of protein conformation bymeasurements of intramolecular proton–proton dis-tance constraints with nuclear magnetic resonance.J. Mol. Biol. 169, 949–961.

24. Gutert, P., Billeter, M., Ohlenschlager, O., Brown, L. R.& Wuthrich, K. (1998). Conformational analysis ofprotein and nucleic acid fragments with the newgrid search algorithm FOUND. J. Biomol. NMR, 12,543–548.

25. Fletcher, C. M., Jones, D. N. M., Diamond, R. &Neuhaus, D. (1996). Treatment of NOE constraintsinvolving equivalent or nonstereoassigned protonsin calculations of biomacromolecular structures.J. Biomol. NMR, 8, 292–310.

26. Folmer, R. H. A., Hilbers, C. W., Konings, R. N. H. &Nilges, M. (1997). Floating stereospecific assignmentrevisited: application to an 18 kDa protein and com-parison with J-coupling data. J. Biomol. NMR, 9,245–258.

27. Koradi, R., Billeter, M. & Guntert, P. (2000). Point-centered domain decomposition for parallel molecu-lar dynamics simulation. Comput. Phys. Commun.124, 139–147.

28. Luginbuhl, P., Guntert, P., Billeter, M. & Wuthrich, K.(1996). The new program OPAL for moleculardynamics simulations and energy refinements of bio-logical macromolecules. J. Biomol. NMR, 8, 136–146.

29. Cornell, W. D., Cieplak, P., Bayly, I., Gould, I. R.,Merz, K. M., Ferguson, D. M. et al. (1996). A secondgeneration force field for the simulation of proteins.Nucleic acids, and organic molecules. J. Am. Chem.Soc. 118, 2309–2309.

30. Morris, A. L., MacArthur, M. W., Hutchinson, E. G. &Thornton, J. M. (1992). Stereochemical quality ofprotein structure coordinates. Proteins: Struct. Funct.Genet. 12, 345–364.

31. Koradi, R., Billeter, M., Engeli, M., Guntert, P. &Wuthrich, K. (1998). Towards fully automatic peakpicking and integration of biomolecular NMRspectra. J. Magn. Reson. 135, 288–297.

32. Antuch, W., Guntert, P. & Wuthrich, K. (1996).Ancestral bg-crystallin precursor structure in a yeastkiller toxin. Nature Struct. Biol. 3, 662–665.

33. Calzolai, L., Lysek, D. A., Guntert, P., von Schroetter,C., Riek, R., Zahn, R. & Wuthrich, K. (2000). NMRstructures of three single-residue variants of thehuman prion protein. Proc. Natl Acad. Sci. USA, 97,8340–8345.

34. Ellgaard, L., Riek, R., Herrmann, T., Guntert, P.,Braun, D., Helenius, A. & Wuthrich, K. (2001). NMRstructure of the calreticulin P-domain. Proc. NatlAcad. Sci. USA, 98, 3133–3138.

35. Horst, R., Damberger, F., Luginbuhl, P., Guntert, P.,Peng, G., Nikonova, L. et al. (2001). NMR structurereveals intramolecular regulation mechanism forpheromone binding and release. Proc. Natl Acad. Sci.USA, 98, 14374–14379.

36. Fernandez, C., Szyperski, T., Bruyere, T., Ramage, P.,Mossinger, E. & Wuthrich, K. (1997). NMR solutionstructure of the pathogenesis-related protein P14a.J. Mol. Biol. 266, 576–593.

37. Williamson, M., Havel, R. F. & Wuthrich, K. (1985).Solution conformation of proteinase inhibitor IIAfrom bull seminal plasma by 1H nuclear magneticresonance and distance geometry. J. Mol. Biol. 155,311–319.

38. Wimmer, R., Herrmann, T., Solioz, M. & Wuthrich, K.(1999). NMR structure and metal interactions of theCopZ copper chaperone. J. Biol. Chem. 274,22597–22603.

39. Lopez Garcıa, F., Zahn, R., Riek, R. & Wuthrich, K.(2000). NMR structure of the bovine prion protein.Proc. Natl Acad. Sci. USA, 97, 8334–8339.

40. Nemethy, G., Pottle, S. M. & Scheraga, H. A. (1983).Energy parameters in polypeptides. 9. Updating ofgeometrical parameters, nonbonded interactions,and hydrogen bond interactions for the naturallyoccurring amino acids. J. Phys. Chem. 87, 1883–1887.

41. Koradi, R., Billeter, M. & Wuthrich, K. (1996).MOLMOL: a program for display and analysis ofmacromolecular structures. J. Mol. Graph. 14, 51–55.

Edited by M. F. Summers

(Received 23 January 2002; received in revised form 13 March 2002; accepted 13 March 2002)

Automated NOE Assignment with CANDID 227