

Journal of Magnetic Resonance 239 (2014) 121–129

Contents lists available at ScienceDirect

Journal of Magnetic Resonance

journal homepage: www.elsevier.com/locate/jmr

Collaborative development for setup, execution, sharing and analytics of complex NMR experiments

Alistair G. Irvine a, Vadim Slynko b, Yaroslav Nikolaev b, Russell R.P. Senthamarai a, Konstantin Pervushin a,b,*

a School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
b Biozentrum of University of Basel, Klingelbergstrasse 70, Basel CH-4056, Switzerland

* Corresponding author at: School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore. E-mail address: [email protected] (K. Pervushin).

Submitted jointly with J. Magn. Reson. 239 (2014) 141–149. DOI: http://dx.doi.org/10.1016/j.jmr.2013.10.023

1090-7807/$ - see front matter © 2013 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.jmr.2013.12.004

Article info

Article history:
Received 5 September 2013
Revised 2 December 2013
Available online 16 December 2013

Keywords:
NMR experiment database
NMR Wiki
Collaborative development
Pulse program optimisation
Spin dynamics analysis
Signal-to-noise prediction

Abstract

Factory settings of NMR pulse sequences are rarely ideal for every scenario in which they are utilised. The optimisation of NMR experiments has for many years been performed locally, with implementations often specific to an individual spectrometer. Furthermore, these optimised experiments are normally retained solely for the use of an individual laboratory, spectrometer or even single user. Here we introduce a web-based service that provides a database for the deposition, annotation and optimisation of NMR experiments. The application uses a Wiki environment to enable the collaborative development of pulse sequences. It also provides a flexible mechanism to automatically generate NMR experiments from deposited sequences.

Multidimensional NMR experiments of proteins and other macromolecules consume significant resources, in terms of both spectrometer time and effort required to analyse the results. Systematic analysis of simulated experiments can enable optimal allocation of NMR resources for structural analysis of proteins. Our web-based application (http://nmrplus.org) provides all the necessary information, including the auxiliaries (waveforms, decoupling sequences etc.), for analysis of experiments by accurate numerical simulation of multidimensional NMR experiments. The online database of the NMR experiments, together with a systematic evaluation of their sensitivity, provides a framework for selection of the most efficient pulse sequences. The development of such a framework provides a basis for the collaborative optimisation of pulse sequences by the NMR community, with the benefits of this collective effort being available to the whole community.

© 2013 Elsevier Inc. All rights reserved.

1. Introduction

Nuclear magnetic resonance (NMR) has been an important tool in elucidating the structure and dynamics of many biological molecules. The ability to use NMR for the study of macromolecules has been facilitated by developments in both the spectrometer hardware and the pulse sequences used to detect signal. Continuous improvements in sensitivity of pulse sequences have enabled the elucidation of ever larger macromolecular complexes. Despite these advancements, many pulse programs still require optimisation of parameters before being implemented in a specific experiment. Such customary development of NMR pulse sequences is often done ad hoc by individuals or individual groups on a single spectrometer. In such circumstances a more centralised repository would be beneficial, providing wider access to tailor-made optimisation of pulse sequences.

In this study, we propose a web-based system that enables the collaborative development of NMR experiments. Such a system provides a flexible approach that enables the whole NMR community to share and contribute collectively to the refinement of experiments and each of their components. The general scheme is shown in Fig. 1, which illustrates the iterative manner in which evaluation and development of NMR experiments could be achieved.

In addition to the development of individual pulse sequences, some mechanism for assessing those pulse sequences most appropriate for any given sample would be beneficial. A quantitative assessment of the sensitivity of different pulse sequences would be informative. It would empower experimentalists to make informed judgements about the most appropriate pulse programs to run for each individual sample.

A number of analytical tools are available to assess this sensitivity through the use of spin dynamics simulations [1–6]. However, many of these simulation programs can only be used for very small systems, with most limited to about 5–8 spins, as the computational complexity of the system increases exponentially with the number of spins [7]. For such theoretical tools the assessment of large spin systems remains intractable. One exception is a software package known as Spinach, which is capable of operating with polynomial scaling [8,9]. This polynomial scaling enables Spinach to simulate multidimensional NMR experiments requiring larger spin systems, as is applicable to most biological molecules currently studied [10]. Integration of a web-based NMR database with a powerful analytical tool such as Spinach provides the capability to evaluate and compare the sensitivity of many different experiments.

Fig. 1. Schematic illustration of a web-based system providing an environment for the collaborative development of NMR experiments.

For a specific sample of interest, the relative sensitivity of a wide variety of pulse sequences can be predicted utilising the recent advances in numeric propagation of the density operator throughout the pulse sequence using large spin systems of coupled spins in the presence of relaxation [10]. Thus, experiments likely to yield the greatest amount of information can be selected for execution on the spectrometer. When coupled with a database containing all the relevant components of these experiments, the setup of each selected experiment can be generated algorithmically, with all the appropriate parameters provided. Such an algorithmic setup would enable direct execution of each experiment on any spectrometer described in the database, fully utilising sample-context dependent information. A time-stamped record of these experiments, together with their precise conditions, could be retained by the database, providing transparency and an easy mechanism for replication.

The web-based application outlined in this study facilitates the above processes by providing an online database of all the components necessary for the development of NMR experiments. It facilitates the automatic and flexible generation of NMR experiments based on the selection of appropriate pulse programs, samples and spectrometers. Furthermore, prediction of the sensitivity of experiments is calculated via interfacing with the Spinach spin dynamics software [8,10,11]. Output from Spinach is analysed to calculate the signal-to-noise (S/N) of multidimensional NMR experiments, based on the experimentally measured S/N from a 1D experiment.

2. Methods and design

2.1. Software implementation

The current prototype of the NMR relational database application has been named NMRplus and is available at http://nmrplus.org. NMRplus has been implemented using the C# language within the Microsoft .NET 4.0 framework. The application includes the IronPython (version 2.6) Python interpreter. IronPython is an implementation of the Python programming language targeting the .NET Framework. This dynamic language complements the statically typed language of C#, and allows the execution of Python scripts within .NET applications. NMRplus enables client-side aspects of the system to interact with the server using Ajax controls via the AjaxControlToolkit, maximising its potential for providing an interactive and dynamic web interface. With Ajax, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behaviour of the existing page. The Wiki environment is based on the ScrewTurn Wiki open source Wiki software, version 3.0.2 (http://www.screwturn.eu/). The software was developed using the Microsoft Visual Studio 2013 Professional integrated development environment. The relational database was implemented using Microsoft SQL Server 2008 (Version 10.0.2531.0). The source code for NMRplus is available from the CodePlex Open Source Community at http://ideanmr.codeplex.com. Instructions for building NMRplus from the source code are available in the documentation section of this website.

2.2. Generation of NMR experiments

The setup of an NMR experiment can be broken down into three key components:

1. A pulse sequence, with all the relevant parameters supplied (via a script).

2. A specific spectrometer.

3. A specific sample.

Within NMRplus, these three key components are combined to generate a full setup of the experiment that is executable at the spectrometer without further adjustments. Some sub-components which may be relevant to many pulse sequences, such as composite pulse decoupling (CPD) scripts or waveforms, may be supplied separately and added to each relevant pulse sequence. Once these components have been defined, an experiment can be automatically generated simply by choosing the relevant pulse program, spectrometer and sample. Additional user-defined parameters can also be provided, such as 90 degree pulse durations or frequency offset values explicitly determined experimentally. With these parameters, NMRplus generates a Python script, which can be executed directly on the spectrometer. The current implementation of NMRplus provides scripts for execution on Bruker spectrometers. Upon execution, the Python script generates separate files according to the folder structure required by Topspin, inserting the appropriate data into each relevant folder (see below).
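The combination of the three components can be sketched as follows. This is a minimal illustration only, not the NMRplus implementation: the dictionary layout, the function name `build_experiment`, the parameter names and the assumed precedence (pulse program defaults, then spectrometer calibration, then user-defined values) are all hypothetical.

```python
def build_experiment(pulse_program, spectrometer, sample, overrides=None):
    """Merge the three components into one parameter set (hypothetical layout)."""
    # Start from the defaults supplied with the pulse program script.
    parameters = dict(pulse_program["defaults"])
    # Assumed precedence: spectrometer calibration replaces the defaults...
    parameters.update(spectrometer["calibration"])
    # ...and explicitly measured, user-defined values replace both.
    parameters.update(overrides or {})
    return {
        "pulse_program": pulse_program["name"],
        "spectrometer": spectrometer["name"],
        "sample": sample["name"],
        "parameters": parameters,
    }


# Illustrative values only; none of these come from the NMRplus database.
experiment = build_experiment(
    {"name": "hsqc", "defaults": {"p1_us": 10.0, "o1_hz": 0.0}},
    {"name": "spectrometer_a", "calibration": {"p1_us": 9.2}},
    {"name": "sample_a"},
    overrides={"o1_hz": 2820.0},
)
```

In this sketch a 90 degree pulse duration calibrated on the spectrometer replaces the factory default, while the experimentally determined frequency offset is taken from the user.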

As well as permitting the user to download the generated experiment, the experimental setup is saved within the user's account. This provides an online timestamp for when the experiment was executed, together with all experimental parameters and experiment-specific metadata (a copy of the experimental setup, sample description, pulse sequences used, spectrometer calibration data etc.), contributing to the reproducibility of the experiments.

2.3. Analysis of NMR experiments

In silico analysis of NMR experiments enables the establishment of signal-to-noise statistics. This can be performed for many different multi-dimensional experiments, thus enabling comparison of these experiments under the desired conditions. This in turn allows key decisions to be made about which experiments are most appropriate to run for the given sample. In NMRplus, after generation of an experiment through the selection of appropriate pulse program, spectrometer, sample etc., the experiment can be analysed using the spin dynamics software Spinach, which can run a variety of simulations [10].

2.4. Bruker-to-Spinach translation

Bruker use a particular syntax for specifying the parameters of an experiment. However, the current implementation of Spinach is written in Matlab [9]. To prepare the experiment for analysis by Spinach, the Bruker syntax must be translated into Spinach syntax. This functionality is partially provided in NMRplus, with a Spinach input macro saved into the database. In the current implementation, some manual intervention is required before the macro can be executed in Spinach. Once the appropriate changes have been made in Spinach, the final version of the macro can be saved back to the NMRplus database.

2.5. Spinach execution

At this time, NMRplus does not have a direct connection to the Spinach software. In order to execute Spinach simulations, the Matlab script for the appropriate experiment is copied from NMRplus to the Spinach software.

For each experiment, Spinach establishes a magnetisation transfer pathway. One of the most important parameters used in assessing this pathway is the rotational correlation time (τC), which is directly proportional to the molecular mass of the sample. As well as the τC value associated with the sample of interest, Spinach simulations can be performed for any given τC. As an extension to a single simulation at the τC of the sample, simulations can be repeated for different τC values, producing a series of simulated interferograms, one for each τC value.

Often, a series of simulations over a range of τC values can be informative of how the pulse program is likely to perform with samples over a range of molecular masses. In a typical analysis, multiple Spinach simulations are executed for a single experiment, obtaining the results of a simulation using a different τC value for each run.

This produces a set of both real and imaginary data for each indirect dimension of the experiment for each value of τC.

The output from Spinach is provided in matrix form. This is manually converted to XML format, with the full set of results saved into NMRplus as a single XML file. The XML data captures timepoints of the simulations for each dimension, providing both real and imaginary data.
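The source does not give the XML schema, so the following is only a sketch of how such a file might be organised and read. The element and attribute names (`interferograms`, `interferogram`, `point`, `dimension`, `tauC`, `t`, `re`, `im`) and all numerical values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <interferogram> per (indirect dimension, tauC) pair,
# each holding timepoints with real and imaginary amplitudes.
EXAMPLE_XML = """<interferograms>
  <interferogram dimension="1" tauC="5e-9">
    <point t="0.000" re="0.80" im="0.00"/>
    <point t="0.002" re="0.55" im="0.01"/>
  </interferogram>
  <interferogram dimension="1" tauC="1e-8">
    <point t="0.000" re="0.70" im="0.00"/>
    <point t="0.002" re="0.35" im="0.02"/>
  </interferogram>
</interferograms>"""


def load_interferograms(xml_text):
    """Return {(dimension prefix, tauC in s): [(time in s, complex amplitude), ...]}."""
    result = {}
    for node in ET.fromstring(xml_text).findall("interferogram"):
        key = (node.get("dimension"), float(node.get("tauC")))
        result[key] = [
            (float(p.get("t")), complex(float(p.get("re")), float(p.get("im"))))
            for p in node.findall("point")
        ]
    return result


interferograms = load_interferograms(EXAMPLE_XML)
```

Keying the result by dimension and τC keeps each simulated run separate, which matches the one-interferogram-per-τC organisation described above.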

2.6. Calculating signal-to-noise from Spinach output

Using the simulated interferogram data, predictions of signal-to-noise (S/N) in multi-dimensional experiments can be made. The XML file contains simulated interferograms, with data for each indirect dimension of the experiment at each τC value analysed. In order to establish an accurate S/N analysis for multi-dimensional experiments of individual samples, some sense of the quality of the sample is required. Our current algorithms for predicting the S/N of multi-dimensional experiments use a parameter for the experimentally acquired S/N from a 1D spectrum. Thus, acquisition of a 1D experiment is a prerequisite for accurate S/N prediction in the indirect dimensions. (If a 1D acquisition is not available, users can enter any S/N for the 1D experiment and still observe the relative effects on each of the indirect dimensions.) This seems reasonable, since acquisition of 1D data is usually rapid, requiring very few resources compared to multi-dimensional experiments. Once the S/N value has been established for a 1D experiment, this parameter is propagated through the simulated interferograms for the multi-dimensional experiments. Using this strategy we assume that the noise remains constant for all experiments with a given initial signal, i.e. the noise remains constant for a given S/N. So, if we know the noise associated with a 1D experiment, then we know the noise associated with all multi-dimensional experiments. However, as the signal propagates for each dimension of an experiment, the intensity of the signal is reduced to different extents. Thus, the overall S/N will vary between different experiments. The extent of the reduction of the signal is determined by the magnetisation transfer pathways, which are calculated by Spinach and reflected in the simulated interferograms produced for each experiment. The flow of magnetisation through these pathways can be illustrated graphically, as shown previously by Senthamarai et al. [10].

In predicting the S/N for multi-dimensional experiments, NMRplus provides the user with parameters to specify both the number of points in the time domain and the spectral width of each dimension. A natural cubic spline function is used to interpolate the simulated interferogram data to the specified number of points in the time domain for each dimension. The dwell time is calculated as the inverse of the spectral width. However, for any individual interferogram (for a specific rotational correlation time), the interpolated values cannot extend beyond the maximum simulated timepoint (Tmax). For interferograms where the requested maximum time point exceeds Tmax, zero filling is applied for the additional points.
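The interpolation and zero-filling step can be sketched in plain Python. This is an illustrative implementation of a natural cubic spline (second derivative zero at both ends) and is not taken from the NMRplus source; the function names `natural_cubic_spline` and `resample` are hypothetical, and the sketch handles real values only, so complex data would be passed through once for the real and once for the imaginary part.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; the natural ends stay at zero
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [
            6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
            for k in range(n - 1)
        ]
        for k in range(1, n - 1):  # forward elimination (Thomas algorithm)
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):  # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return (
            (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


def resample(times, values, td, spectral_width_hz):
    """Interpolate to td points spaced by the dwell time; zero-fill beyond Tmax."""
    spline = natural_cubic_spline(times, values)
    dwell = 1.0 / spectral_width_hz
    t_max = times[-1]
    return [spline(k * dwell) if k * dwell <= t_max else 0.0 for k in range(td)]
```

For example, `resample([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 6, 2.0)` evaluates the spline every 0.5 s up to Tmax = 2 s and fills the sixth point, at 2.5 s, with zero.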

The noise is introduced into the simulated interferogram data using a random number generator. For each point in the spectrum, noise is generated using a random number that is inversely proportional to the S/N ratio from the 1D experiment, e.g. for an experiment with a 1D S/N of 10 a random number proportional to 1/10 = 0.1 is selected. Thus, the noise introduced into each indirect dimension of a multi-dimensional simulated experiment is equivalent to the experimentally derived noise of the 1D experiment. Once the noise has been introduced, a cosine bell apodisation function is applied to the interferogram.

The simulated interferogram data is then Fourier transformed into the frequency domain, using a forward Fourier transform function. From the resulting spectrum, the signal is identified from the maximum peak height and the noise is calculated from the perturbation of the baseline. Thus, the S/N for each indirect dimension of the multi-dimensional experiment is calculated.
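The sequence of noise addition, apodisation, Fourier transformation and S/N measurement can be sketched as below. Several details are assumptions rather than the NMRplus method: the noise is drawn from a Gaussian distribution with standard deviation 1/(1D S/N), the baseline is taken as all points more than a quarter of the spectrum away from the peak, and a plain O(N²) discrete Fourier transform stands in for whatever transform routine the application uses. The function names are hypothetical.

```python
import cmath
import math
import random


def cosine_bell(n):
    """Cosine bell window, falling from 1 at the first point towards 0 at the end."""
    return [math.cos(math.pi * k / (2.0 * n)) for k in range(n)]


def forward_dft(points):
    """Plain O(N^2) forward discrete Fourier transform (fine for short interferograms)."""
    n = len(points)
    return [
        sum(p * cmath.exp(-2j * math.pi * j * k / n) for k, p in enumerate(points))
        for j in range(n)
    ]


def predict_sn(interferogram, sn_1d, rng):
    """Estimate S/N for one noisy trial of a simulated interferogram."""
    sigma = 1.0 / sn_1d  # assumed: Gaussian noise scaled by the inverse 1D S/N
    noisy = [p + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
             for p in interferogram]
    window = cosine_bell(len(noisy))
    spectrum = [z.real for z in forward_dft([p * w for p, w in zip(noisy, window)])]
    n = len(spectrum)
    peak = max(range(n), key=lambda i: spectrum[i])  # signal: maximum peak height
    # Assumed baseline region: points far (cyclically) from the peak.
    baseline = [spectrum[i] for i in range(n)
                if min(abs(i - peak), n - abs(i - peak)) > n // 4]
    mean = sum(baseline) / len(baseline)
    noise = math.sqrt(sum((v - mean) ** 2 for v in baseline) / len(baseline))
    return spectrum[peak] / noise


# Illustrative on-resonance decaying interferogram, normalised to 1 at t = 0.
fid = [math.exp(-k / 10.0) for k in range(64)]
sn_estimate = predict_sn(fid, 10.0, random.Random(0))
```

Passing an explicit `random.Random` instance makes each trial reproducible, which is convenient when the calculation is repeated over many trials.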

If the analysis is performed for data with multiple values of τC, then a separate S/N value is calculated for each τC in each indirect dimension. In addition, the user may have a sample of a size not represented in the rotational correlation times used in the Spinach simulations. In this case, an additional parameter for predicting the S/N allows the user to input any τC value. The S/N for this new τC can then be inferred from the τC values used in the simulation. Hence, prediction of S/N in NMRplus can be performed for a wide range of τC values, beyond the few selected for analysis using Spinach.

Once all of the data has been calculated, a 1D spectrum representing the S/N for each indirect dimension is displayed for each value of τC, along with the S/N predictions and other statistics. The time increment used is determined by the inverse of the spectral width, which is input as a parameter for each dimension.

3. Implementation

3.1. Database design

Central to the creation of a web-based NMR database is the design of the database itself. With NMRplus, a relational database was designed and created using the Microsoft SQL Server database management system. The tables and their relationships are illustrated in supplementary Fig. S1. Separate tables were created for users, pulse programs, spectrometers and samples. Tables were also created for each type of auxiliary component, with pulse programs linked to the relevant components. Each experiment includes links to identify a specific user, pulse program, spectrometer and sample. Some abstract classes were introduced to retain data common to a number of entities.
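The linking of each experiment to a user, pulse program, spectrometer and sample can be sketched with foreign keys. The example uses Python's built-in `sqlite3` module purely as a stand-in for Microsoft SQL Server, and every table and column name is hypothetical; the actual schema is the one shown in supplementary Fig. S1.

```python
import sqlite3

# Hypothetical, heavily simplified schema; the real tables hold many more columns.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pulse_programs (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE spectrometers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE samples (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE experiments (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    pulse_program_id INTEGER NOT NULL REFERENCES pulse_programs(id),
    spectrometer_id INTEGER NOT NULL REFERENCES spectrometers(id),
    sample_id INTEGER NOT NULL REFERENCES samples(id),
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def create_database():
    """Create an in-memory database with foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.execute("INSERT INTO pulse_programs (name) VALUES ('hsqc')")
conn.execute("INSERT INTO spectrometers (name) VALUES ('spectrometer_a')")
conn.execute("INSERT INTO samples (name) VALUES ('sample_a')")
conn.execute(
    "INSERT INTO experiments (user_id, pulse_program_id, spectrometer_id, sample_id) "
    "VALUES (1, 1, 1, 1)"
)
```

With the foreign keys in place, an experiment cannot reference a pulse program, spectrometer or sample that is absent from the database, and the `created_at` column gives the time-stamped record described earlier.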

3.2. Software framework

One of the main goals of providing an online NMR database was not only to provide a resource for users to deposit and retain experiments, but also to encourage an environment of discussion and collaboration through which changes could be made on a continuous basis. With this in mind, the software and user interface must be designed in such a way as to facilitate this collaborative environment. One solution is to use a Wiki, a collaboratively edited, information-centred website. Users can typically add, modify or delete its content via a web browser. This makes a Wiki environment a suitable platform for the collaborative development of pulse sequences.

The current implementation of the web-based application adopted a strategy for development using the Microsoft .NET environment. A framework called ScrewTurn Wiki is an implementation of the Wiki concept using the .NET environment (http://www.screwturn.eu/). Hence, ScrewTurn Wiki was used as the basis from which the web application was implemented. Fig. 2 shows the front page developed for the NMRplus application.

3.3. Creating a pulse program

A new pulse program can be entered into the database in a number of different ways.

The current implementation of NMRplus allows input of a Bruker pulse program directly into the database. The pulse program is accompanied by a script used to specify many of the parameters. Once pulse programs exist in the database, optimisations can easily be made, creating a new version with each saved modification. However, major changes to an existing pulse program may warrant the creation of a completely new pulse program. A new pulse program can be created by deriving it from an existing pulse program, using this as a template whereby the majority of pulse program parameters may remain the same, but modifications made to the Bruker pulse program and the associated script are used to specify the new pulse sequence. Although the current implementation of NMRplus only accommodates Bruker pulse programs, the same general principles can be applied for any other vendor.

Fig. 2. Home page of the NMRplus web application.

3.4. Accessibility of pulse programs

A web-based application can provide a flexible approach to user accessibility. One goal of an online NMR database is to provide a collaborative environment in which pulse programs and experimental setups can be both optimised by and shared with the whole NMR community. However, there are a variety of reasons why individual users may prefer to keep a subset of their data confidential. With the current implementation of NMRplus, users that have created NMR experiments can choose to make each one either 'private', whereby only the current user can view the pulse program, or 'public', whereby everyone is able to view and edit the pulse program. Making the pulse program public opens it up to collaborative development by the NMR community at large. Upon logging into the system, users are able to display a list of all components they have access to, including pulse programs, spectrometers and samples (as well as auxiliary components such as waveforms and CPDs).

A possible extension to the online application could provide multiple layers of user options, with users able to join different groups or sub-groups, having flexibility to share pulse programs and experiments at many different levels.

3.5. Calculation of analytical statistics

3.5.1. Interpretation of dimension-specific parameters from interferogram XML data

Some of the input parameters for prediction of the signal-to-noise can specify values for different indirect dimensions, e.g. the number of time domain points (TD) or the spectral width (SW). For a given experiment, these parameters have a specified value taken from the setup script. In NMRplus, these experimental values are used as default values for input into S/N prediction. In order to test different experimental scenarios, the user can then change these default values and thus analyse the consequences of modifying specific parameters.

In order to retrieve dimension-specific data from an experiment, a link needs to be established between the simulated interferogram and the indirect dimension it represents in the experiment. In the Bruker representation of dimension-specific data, an integer prefix is associated with a particular dimension. In this case, a dimension prefix is encoded into the XML data of the interferogram, such that it matches the dimension prefix used in the experimental parameters. For example, a prefix of "1" may represent the ¹⁵N dimension in the experimental setup. In this scenario, parameters specific to the ¹⁵N dimension would then be retrieved from the experiment using the prefix "1", e.g. "1 TD" may be the number of time domain points in the ¹⁵N dimension. To implement this, regular expressions are used to reconstruct the desired parameter label and retrieve its associated value (e.g. "1 TD 2048"), based on the given dimension prefix ("1") and the specific desired parameter ("TD").
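The lookup described above can be sketched with Python's `re` module. The label format ("1 TD 2048") follows the example in the text, but the one-parameter-per-line layout of the setup text and the function name `get_dimension_parameter` are assumptions.

```python
import re


def get_dimension_parameter(setup_text, prefix, name):
    """Return the value stored under '<prefix> <name> <value>', or None if absent."""
    # Rebuild the parameter label from the dimension prefix and parameter name.
    pattern = re.compile(
        r"^\s*%s\s+%s\s+(\S+)\s*$" % (re.escape(prefix), re.escape(name)),
        re.MULTILINE,
    )
    match = pattern.search(setup_text)
    return match.group(1) if match else None


# Illustrative setup text; the values are made up for the example.
setup = "1 TD 2048\n1 SW 30.0\n2 TD 128\n11 TD 64\n"
td_dim1 = get_dimension_parameter(setup, "1", "TD")
```

Anchoring the pattern at the start of the line prevents the prefix "1" from matching a label such as "11 TD".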

3.5.2. Interpolation of simulated interferograms in multiple dimensions

The interferograms output from Spinach simulations typically provide sparse sampling compared to the number of time domain points requested from the relevant input parameter. In this case, it is necessary to interpolate the data for the given number of time points, as described above.

Once the data has been interpolated for all sampled values of τC, the interpolated data can then be used to estimate the values for any desired rotational correlation time τC, specified as a parameter by the user.

For each individual timepoint, t, the amplitude of the signal, A_t, can be calculated based on the following formula:

$$A_t = A_0 \, e^{-\tau_C C t} \qquad (1)$$

where A_0 is the initial amplitude of the signal at time zero in the interferogram, τC is the rotational correlation time, and C is a constant that can be calculated separately for each timepoint, based on the simulated points available for that timepoint. Based on known datapoints an average value for constant C can be calculated. The more datapoints that are available for a given timepoint, the more accurate the constant C will become. Simulated interferograms for very small values of τC (<1 ns) are excluded during this calculation, since these do not reflect the relaxation properties that are being modelled. For complex data, the real and imaginary components are calculated independently.
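Rearranging Eq. (1) gives C = -ln(A_t / A_0) / (τC t) for each simulated τC, and the average of these values can then be used to predict the amplitude at a τC that was not simulated. The sketch below illustrates this under stated assumptions: amplitudes are real and positive, t is non-zero, the caller supplies A_0 for the new τC, and the function names and numerical values are hypothetical.

```python
import math


def estimate_c(samples, t, min_tau_c=1e-9):
    """Average C at timepoint t from (tau_c, a0, a_t) samples, skipping tau_c < 1 ns."""
    values = [
        -math.log(a_t / a0) / (tau_c * t)
        for tau_c, a0, a_t in samples
        if tau_c >= min_tau_c
    ]
    return sum(values) / len(values)


def predict_amplitude(a0, tau_c, c, t):
    """Eq. (1): amplitude at timepoint t for rotational correlation time tau_c."""
    return a0 * math.exp(-tau_c * c * t)


# Synthetic samples generated from Eq. (1) with C = 2e9 at t = 0.01 s.
t = 0.01
samples = [
    (0.5e-9, 0.9, 0.5),  # tau_c below 1 ns: excluded from the average
    (5e-9, 0.8, 0.8 * math.exp(-5e-9 * 2e9 * t)),
    (10e-9, 0.7, 0.7 * math.exp(-10e-9 * 2e9 * t)),
]
c_average = estimate_c(samples, t)
a_new = predict_amplitude(0.75, 7.5e-9, c_average, t)
```

For complex data the same two functions would be applied to the real and imaginary components separately, as stated above.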

Note that amplitudes of the initial signal are normalised based on a value of 1 at the beginning of the pulse sequence. Due to loss of signal during magnetisation transfer, the initial amplitude A_0 of the simulated interferograms will be less than one; the magnitude of the loss of signal is typically proportional to the rotational correlation time. If only relaxation effects are to be observed, independent of the loss of magnetisation during the pulse sequence, then the initial amplitude of each interferogram could be normalised to a value of 1.

3.6. Calculate the signal peak amplitude

An initial value for the signal is given by the maximum peak amplitude of the points in the spectrum. However, this initial value also includes the underlying noise within the spectrum. To establish the true signal, some mechanism must be able to separate the noise from the signal. Since noise is, by definition, random, it will average to zero over a sufficient number of trials. Thus the true signal will in turn emerge if the maximum amplitude is sampled over the same number of trials. Hence a number of iterations are required to gain a reasonable estimate of the analytical parameters. The error in the signal is calculated as the RMSD of the signal across all trials.

3.7. Calculate noise as perturbation of baseline

For each trial, since the noise within a spectrum will average to zero, the magnitude of the noise is measured as the RMSD of the noise from the baseline. For complex data points, each point is multiplied by its complex conjugate, such that a positive real number is produced, representing deviation from the expected value.
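A minimal sketch of this noise measure for complex points is given below. The function name `noise_rmsd` is hypothetical, and the expected baseline value is assumed to be zero unless stated otherwise.

```python
import math


def noise_rmsd(baseline_points, expected=0j):
    """RMSD of complex baseline points from the expected baseline value."""
    total = 0.0
    for point in baseline_points:
        deviation = point - expected
        # Multiplying by the complex conjugate gives a non-negative real number.
        total += (deviation * deviation.conjugate()).real
    return math.sqrt(total / len(baseline_points))


rmsd = noise_rmsd([3 + 4j, -3 - 4j])
```

Here each point has a squared magnitude of 25, so the mean squared deviation is 25 and the RMSD is 5.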

3.8. Calculate linewidth

The linewidth and the amplitude of the signal are directly estimated from the simulated 1D spectrum containing noise and used as the benchmark parameters in analytical statistics. This enables estimating the effect of noise on the expected variation of peak amplitude and peak maximum, translating into uncertainty of the expected chemical shifts. These variations are significant in the case of low signal-to-noise ratio (e.g., S/N < 10), representing a typical and most interesting case of multidimensional NMR of proteins. Processing of the simulated interferogram results in a 1D spectrum for each indirect dimension whereby the signal peak is in the centre. The amplitude of the signal can be retrieved from the value at this midpoint. The linewidth is calculated as the full width at half height. The half-height is simply half of the maximum amplitude. To determine the linewidth, the point below the peak frequency nearest to the half-height is identified (comparing with the real component of each point). This allows the number of points between the half-height and the midpoint to be identified; this number is doubled to get the number of points representing the full width at half-height, i.e. between the lower and upper points at half-height. By calculating the frequency width per point in the spectrum (spectral width/number of points), the linewidth can be converted from a number of points to an actual frequency. Using this approach, the accuracy of the linewidth will be determined by the digital resolution of the signal peak; if there are no points located close to the half-height of the peak, then the linewidth may not be calculated accurately. The linewidth is calculated for each trial, and an RMSD is calculated for the variation of linewidths across these trials, giving an indication of the error in this calculation.
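
The point-counting procedure above can be sketched as follows. This is an illustrative reading of the description rather than the NMRplus code: the function name is hypothetical, the peak is assumed to sit at index `len(spectrum) // 2`, and the search for the half-height point follows the text literally by taking the point below the peak whose real part is nearest to half the peak amplitude.

```python
def linewidth_fwhh(spectrum, spectral_width_hz):
    """Full width at half height (Hz) of a peak centred in the spectrum."""
    real = [complex(p).real for p in spectrum]
    mid = len(real) // 2                      # peak assumed at the midpoint
    half_height = real[mid] / 2.0
    # Point below the peak whose amplitude is nearest to the half-height.
    lower = min(range(mid), key=lambda i: abs(real[i] - half_height))
    width_points = 2 * (mid - lower)          # doubled to span both sides
    hz_per_point = spectral_width_hz / len(real)
    return width_points * hz_per_point


# Example: Lorentzian with a half width of 4 points, 10 Hz per point.
line = [1.0 / (1.0 + ((i - 32) / 4.0) ** 2) for i in range(64)]
print(linewidth_fwhh(line, spectral_width_hz=640.0))
```

Because the result is always an even number of points multiplied by the frequency width per point, the estimate can only change in steps of twice the digital resolution, which is the limitation noted in the text.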

3.9. Calculate signal-to-noise

A signal-to-noise is calculated for each trial of each indirect dimension based on the signal amplitude divided by the RMSD of the noise. This is then averaged across all trials, with an error estimated from the RMSD of these values.
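
As a short sketch of this step, assuming each trial has already yielded a signal amplitude and a noise RMSD (the function name and input layout are hypothetical, and the error is taken as the root-mean-square deviation of the per-trial ratios about their mean):

```python
import math
import statistics


def signal_to_noise(trials):
    """trials: list of (signal_amplitude, noise_rmsd) pairs, one per trial."""
    ratios = [signal / noise for signal, noise in trials]
    mean_sn = statistics.fmean(ratios)
    error = math.sqrt(statistics.fmean([(r - mean_sn) ** 2 for r in ratios]))
    return mean_sn, error


print(signal_to_noise([(0.98, 0.051), (1.03, 0.049), (1.01, 0.050)]))
```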

4. Results

To illustrate the use of the web-based NMR database, a TROSY-ST2-PT pulse program was implemented and used to generate an NMR experiment [12]. This experiment was originally developed by one of the authors (K.P.) and extensively analysed and optimised in terms of relaxation losses in the magnetisation transfer pathways [10,13,14]. Since the main application of this experiment is to obtain 1H–15N correlations in large proteins, it is critically important to have as comprehensive a numerical model of its performance as possible in order to make meaningful predictions of the expected spectral sensitivity; this dictated the choice of this experiment as an illustration of the NMRplus service. Simulations of this experiment were performed using the Spinach spin dynamics implementation [10] and the output was analysed using NMRplus.

126 A.G. Irvine et al. / Journal of Magnetic Resonance 239 (2014) 121–129

4.1. Creation of the TROSY-ST2-PT pulse program

The first step of creating a new pulse program is to assign key attributes to it, such as name, dimensionality and general description. Then the pulse program itself should be inserted. In the case of the TROSY-ST2-PT example [12], the pulse program is written in Bruker syntax and contains all the pulse program parameters associated with the pulses, delays, gradients etc. A setup script is input into the application as a way to specify the parameters of the pulse program. This is similar to specifying the acquisition parameters in Topspin, except the parameters are calculated by a standalone Python script utilising the sample and spectrometer specifications. The setup script used to set the parameters for TROSY-ST2-PT can be reviewed in the corresponding tabs at the NMRplus service site http://nmrplus.cloudapp.net.

Once the setup script is complete, the pulse program can be submitted to the database. The current user becomes the owner of the pulse program. They can keep the pulse program private or make it publicly available. If incomplete, a draft version of the pulse program can be saved and completed at a later date. After a new pulse program has been created, modifications or optimisations can be implemented by creating new versions of the existing pulse program. Alternatively, if large changes are made, an existing pulse program can be used as a template for creating a new pulse program that is derived from the original.

4.2. Additional components

In order to generate an experiment based on the TROSY-ST2-PT pulse program, the experiment needs to be associated with a spectrometer, sample and additional auxiliary data. The specifics of each of these components need to be entered separately into the database. This approach provides considerable flexibility in generating NMR experiments for a given pulse program, whereby it can be tailored to any available spectrometer and any available sample.

Just as with pulse programs, different versions can be created for any spectrometer, sample or auxiliary component, any of which can be made either publicly or privately accessible.

4.3. Enter spectrometer

Details for each spectrometer are entered via the Wiki interface. In the current implementation of NMRplus, a spectrometer should define not only the specifications of the hardware, but also the software used to control the instrument. This ensures that the components of generated NMR experiments can be allocated to the correct locations when the Python script is executed. Specifications include the proton frequency of the B0 field, the current version of the software used (e.g. Topspin 2.1) and the required file structure. Precise specifications of each channel are also included. With this information provided for each spectrometer, the same experiment can be generated for use on many different spectrometers, simply by specifying the spectrometer to be used at the time of execution.

4.4. Enter sample

In a similar manner, the particulars of an individual sample are entered into the online database. Details include the buffer used, isotopic labelling and general remarks about the sample’s purity or origin. Inclusion of the rotational correlation time of the molecules, if known, can assist in predictions of sensitivity for experiments associated with the sample (see below).

4.5. Auxiliary components

For many pulse programs, additional components are required beyond the script used to specify many of the parameters. Any auxiliary components, such as waveforms or CPD scripts, can be added to the online database independently of any pulse program. This enables these components to be reused by many different pulse programs. The auxiliary components are not actually interpreted by the web-based application, but simply saved directly to entries in the database. In preparation for generating a new NMR experiment, the appropriate auxiliary components should be selected by the user. The selected components are then packaged up within the self-extracting archive file such that they will be placed in the appropriate directory of the spectrometer software whenever the Python script is executed.

4.6. Generating a TROSY-ST2-PT experiment

Having entered all the necessary information into the online database, an NMR experiment can now be generated. Central to any experiment is the pulse program itself, so generation of an experiment using NMRplus is performed by first selecting the relevant pulse program. Once the user has logged into the system, all of the pulse programs that they have access to are displayed in a list. This list consists of all pulse programs that have been made publicly accessible by any user plus the pulse programs that are private to the current user. If this list is very long then the user can search for the desired pulse program using a search filter. Once selected, the current attributes of the pulse program are displayed in a variety of tabs. Here, the pulse program and setup parameters can be reviewed. Alongside the pulse program, an interface is provided to select all of the associated attributes to set up an experiment, as illustrated in Fig. 3.

The sample and spectrometer are selected from dynamically generated lists of those in the database that the user has access to. An appropriate name is given to the experiment, which will also become the filename of the generated Python script. Any number of experiment-specific parameters, such as optimised frequency offset or pulse time values, can be specified at this point. These parameters will override any default values specified in the setup script. This provides a powerful mechanism to customise any individual experiment. At this stage, the relevant auxiliary components necessary for the experiment should also be selected. Finally, once the Generate button is pressed, the input attributes are validated and a bundled self-extracting archive Python script is generated. This can be executed directly on the specified spectrometer to run the experiment.
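
The override behaviour can be pictured as a simple merge in which experiment-specific values take precedence over the setup-script defaults. The sketch below is only an illustration of that idea: the function name is hypothetical, the parameter names are borrowed from common Bruker acquisition parameters, and the values are invented for the example. How NMRplus stores and validates these values internally is not described here.

```python
def resolve_parameters(setup_defaults, experiment_overrides):
    """Return the effective parameters; overrides win over defaults."""
    effective = dict(setup_defaults)       # copy so the defaults stay untouched
    effective.update(experiment_overrides)
    return effective


# Illustrative values only.
defaults = {"O1P": 4.7, "P1": 10.0, "D1": 1.0}
overrides = {"O1P": 4.698, "P1": 9.2}
print(resolve_parameters(defaults, overrides))
```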

4.7. Analysis of the TROSY-ST2-PT experiment

4.7.1. Preparation of a Spinach macro for the TROSY-ST2-PT experiment

Once an experiment has been generated, simulations can be run to analyse the performance of the experiment, without the need to execute it on the spectrometer. The spin dynamics software Spinach is capable of simulating a large variety of NMR experiments and this process is facilitated by NMRplus. A draft version of the Spinach macro can be provided by the application. This translates all the relevant experimental parameters into the Spinach syntax. However, due to the complexities of Spinach, some manual intervention is required before execution. Once finalised, the corrected macro can be saved back into the online database, such that non-expert users can run the Spinach analysis at any time.

Fig. 3. Generating an NMR experiment using NMRplus. Illustrated are the sample, spectrometer, parameters and auxiliary components chosen for setup of a TROSY-ST2-PT experiment. Once initiated, the application generates a self-extracting Python script that can be run directly on the specified spectrometer.

4.7.2. Execution of the Spinach macro for the TROSY-ST2-PT experiment

Execution of an experiment in Spinach is performed independently of the online application. The current version of Spinach is implemented as a Matlab application and is freely available for download from spindynamics.org. A variety of simulations of the TROSY-ST2-PT experiment were run using a different rotational correlation time (τC) for each simulation. The output from each simulation is in the form of a matrix. These are then analysed to extract the simulated interferogram for each τC and collate the data into a single XML file (see the script tab at http://nmrplus.org). This interferogram data is illustrated graphically in Fig. 4. In NMRplus, this XML data is entered into the database and associated with the TROSY-ST2-PT pulse program where the data can be analysed.

Fig. 4. Simulated free induction decays produced by Spinach for the TROSY-ST2-PT experiment using a series of rotational correlation times (τC).
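
To give a feel for the collation step, the following Python sketch writes one interferogram per rotational correlation time into a single XML document. The XML layout used by NMRplus is not specified in this text, so the element and attribute names below (`interferograms`, `interferogram`, `point`, `tau_c_ns`) are hypothetical, as is the function name; the actual script is the one referred to on the NMRplus site.

```python
import xml.etree.ElementTree as ET


def collate_interferograms(interferograms):
    """interferograms: dict mapping tau_c (ns) -> list of complex points."""
    root = ET.Element("interferograms")                      # hypothetical schema
    for tau_c_ns, points in sorted(interferograms.items()):
        node = ET.SubElement(root, "interferogram", tau_c_ns=str(tau_c_ns))
        for index, point in enumerate(points):
            value = complex(point)
            # Real and imaginary parts are stored as separate attributes.
            ET.SubElement(node, "point", index=str(index),
                          real=repr(value.real), imag=repr(value.imag))
    return ET.tostring(root, encoding="unicode")


print(collate_interferograms({10: [1.0 + 0.0j, 0.6 - 0.2j], 20: [0.9 + 0.0j]}))
```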

4.7.3. Calculating S/N statistics for the TROSY-ST2-PT experiment

The online application offers a number of parameters that provide flexibility in the calculation of the S/N of the experiment. Most notably, the S/N predicted for each indirect dimension is determined based on the experimentally determined (or postulated) S/N from a 1D spectrum, i.e. the direct dimension. An example of the parameters used to predict S/N statistics for the TROSY-ST2-PT experiment is shown in Fig. 5.

The algorithms used to predict the S/N are as described earlier. The predicted S/N for the TROSY-ST2-PT experiment at a variety of rotational correlation times in the 15N dimension is shown in supplementary Fig. S1. The value of τC for which S/N prediction was requested from user input is always displayed first. As expected, the S/N in the indirect dimension decreases as the rotational correlation time increases.

4.8. Using NMRplus to calculate the S/N of a 1D proton spectrum

The analytical tool of NMRplus primarily facilitates the calculation of S/N in each of the indirect dimensions of a multi-dimensional experiment, based on both simulations and on an S/N parameter input from a 1D experiment. However, the same analytical tool can be used to facilitate the calculation of S/N in a 1D proton spectrum.

In order to accurately quantify the signal-to-noise for a protein sample from a 1D proton spectrum, accurate quantification of the relevant signals is required. Clearly the amplitude of individual proton peaks in the sample will vary greatly, and peaks arising from the solvent should not be considered. The overall signal needs to be a reasonable average of all proton amplitudes for the given sample.


Fig. 5. Parameters for predicting S/N of multi-dimensional NMR experiments in NMRplus. The parameters specified by the user are combined with the simulated interferogram data to predict S/N in each of the indirect dimensions.


Fig. 6 shows a 1D proton spectrum for a 25 kDa soluble protein. To quantify the average signal of the sample, focus was put on the fingerprint region (5.5–10 ppm). Large overlap between individual proton signals makes it difficult to estimate the signal directly. However, since it is known that this region produces signals from the amide and aromatic protons, the total number of expected peaks can be calculated from the primary sequence of the protein.

The β-catenin construct shown in Fig. 6 contains 199 amino acids; however, multimeric aggregates of this protein are expected in solution [15]. The total number of protons expected in the fingerprint region, based on the number of amide protons and aromatic protons, was calculated to be 375. The total number of points for the acquired spectrum was set to 2048. The fingerprint signal was taken as the region spanning from 5.50 to 9.85 ppm (points 498–943). The noise was sampled across 300 points spanning 10.80–13.70 ppm (points 100–399). A summation of the amplitudes for all points gives a total signal for all protons in the region. This value is converted into an average signal per proton. The relationship between the total signal and average amplitude per proton can be expressed as follows:

\[
\sum_{i=1}^{N} S_i = N \alpha \, \Delta\nu_{1/2} \, h \qquad (2)
\]

where N is the number of protons (375 for the β-catenin example), the sum of the S_i is the total signal in the fingerprint region of the spectrum, Δν1/2 is the peak linewidth (full width at half height), h is the peak amplitude (height), and α is a constant representing the proportion of the area (amplitude × linewidth) populated by the peak.

Fig. 6. 1D proton spectrum of a 25 kDa recombinant protein representing the first four armadillo repeats of the transcriptional co-activator β-catenin, showing all signals from the sample. Insert: close-up showing the fingerprint region (5.5–10 ppm). Baseline correction was optimised around the fingerprint region, so that both the signal and noise of this region could be accurately quantified.

Fig. 7. ω1(1H) slices of the TROSY-ST2-PT HSQC experiment for a β-catenin sample. (A) A variety of slices through the experimentally acquired spectrum in the 15N dimension, each showing a single peak. Top, column 178; middle, column 355; bottom, column 752. (B) Analytically predicted spectrum using NMRplus, based on interferogram data from Spinach spin dynamics simulations. Analytical parameters included the experimentally acquired S/N from the direct dimension, spectrometer frequency and the expected rotational correlation time of the protein. Using a rotational correlation time of 68 ns, the analytics produced an S/N and linewidth similar to those of the experimentally derived data.


This formula treats the signal as a rectangular area represented by the amplitude of the peak multiplied by the linewidth. If we assume that peaks approximate to a Gaussian shape for each proton, and since the linewidth is defined as full width at half height, the amount of signal discarded below the half height is approximately equal to the amount of extra signal gained above half height, so the defined area represents a good approximation of the actual area (i.e. we assume α = 1.0). Thus, calculation of the average signal amplitude depends on achieving a reasonable estimation of an average linewidth for the protons in the fingerprint region.
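
As a check on this rectangular approximation (added here for illustration and not part of the original analysis), the area under an ideal Gaussian of height h and full width at half height Δν1/2 can be evaluated in closed form:

```latex
\int_{-\infty}^{\infty} h \exp\!\left(-\frac{4 \ln 2 \, \nu^{2}}{\Delta\nu_{1/2}^{2}}\right) \mathrm{d}\nu
  = h \, \Delta\nu_{1/2} \sqrt{\frac{\pi}{4 \ln 2}}
  \approx 1.064 \, h \, \Delta\nu_{1/2}
```

For an ideal Gaussian the proportionality constant of Eq. (2) would therefore be about 1.06, so setting it to 1.0 underestimates the area by roughly 6%.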

One method for achieving this is to simulate the one-dimensional pulse program using the Spinach software. Simulations were performed for rotational correlation times at 10 ns steps from 10 ns to 60 ns. The resulting interferograms were formatted and entered into NMRplus.

In analysis of the data, the following were used as input parameters to NMRplus: τC = 15 ns; time domains = 2048; spectral width = 20 ppm; spectrometer frequency = 800 MHz; number of points = 2048. The analysis resulted in a linewidth of 15.625 Hz.
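
The reported linewidth can be related to the digital resolution of the spectrum. Assuming the spectrometer frequency is an 800 MHz proton frequency, so that 1 ppm corresponds to 800 Hz (an assumption of this worked example), the frequency width per point and the smallest non-zero full width obtainable by the point-counting procedure of Section 3.8 are:

```latex
\frac{20\ \mathrm{ppm} \times 800\ \mathrm{Hz\,ppm^{-1}}}{2048\ \mathrm{points}}
  = 7.8125\ \mathrm{Hz\ per\ point},
\qquad
2 \times 7.8125\ \mathrm{Hz} = 15.625\ \mathrm{Hz}
```

On this reading, the reported value corresponds to a full width of exactly two points, which suggests that the estimate is limited by the digital resolution rather than resolved more finely.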

A Matlab script was written to compute the average signal, based on the formula above, inputting the calculated linewidth, Δν1/2, as a parameter to the equation. The RMSD of the noise was also calculated from the exported values. The final signal-to-noise for the experimentally acquired 1D proton spectrum was computed as 30.96.
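
The original Matlab script is not reproduced here. As a rough Python sketch of the same calculation, Eq. (2) can be rearranged for the average peak height and divided by the noise RMSD. The function names are hypothetical, the area constant defaults to 1.0 as assumed in the text, and the linewidth is assumed to be expressed in the same units as the spacing implied by the summation (for a plain sum over points, in points); the input values in the example are invented.

```python
def average_peak_height(total_signal, n_protons, linewidth, alpha=1.0):
    """Rearranged Eq. (2): h = total signal / (N * alpha * linewidth)."""
    return total_signal / (n_protons * alpha * linewidth)


def proton_signal_to_noise(total_signal, n_protons, linewidth, noise_rmsd, alpha=1.0):
    """Average per-proton peak height divided by the RMSD of the noise."""
    return average_peak_height(total_signal, n_protons, linewidth, alpha) / noise_rmsd


# Invented inputs for illustration; 375 is the proton count quoted in the text.
print(proton_signal_to_noise(total_signal=1.5e6, n_protons=375,
                             linewidth=2.0, noise_rmsd=80.0))
```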

This signal-to-noise value from the 1D proton spectrum could then be used as a parameter to predict the signal-to-noise that may be expected for this sample in a multi-dimensional experiment, based on simulated experiments using the Spinach spin dynamics software. Knowledge of the S/N for any individual sample in the direct dimension should allow prediction of the S/N expected for each indirect dimension.

For the β-catenin example above, the 1D S/N was used to predict the expected S/N for the 15N dimension of a TROSY-ST2-PT experiment. The previously described Spinach simulations of the pulse sequence at various rotational correlation times allowed analysis of the interferograms and prediction of S/N for the β-catenin sample, by inserting the relevant parameters for this protein. The predicted S/N and linewidth could then be compared to the experimentally derived TROSY-ST2-PT spectrum, by comparing representative slices in the 15N dimension with the 15N spectrum produced from analytics of the interferogram data (Fig. 7).

The monomeric state of the β-catenin protein has a rotational correlation time (τC) of approximately 15 ns. However, when this τC is used in NMRplus with the interferogram data, a much larger S/N is produced compared with the experimental data. If the β-catenin protein is instead in a multimeric state, a much larger τC would be expected. With τC = 68 ns, an S/N of 20.1 is produced, with a linewidth of 30 Hz, which is comparable to the observed experimental data. This is in agreement with evidence of a multimeric state for this protein in another recent structural study of a similar β-catenin construct [15]. NMR relaxation properties of the construct were consistent with a molecular mass >65 kDa.

5. Conclusions

This study has presented a web-accessible database for NMR experiments. The application facilitates collaborative development, annotation, sharing and optimisation of pulse programs and other experimental components. Users can use these components to generate a wide variety of NMR experiments in a flexible manner. Furthermore, the software can predict the sensitivity of these experiments by working in conjunction with the Spinach spin dynamics software.

Acknowledgments

The authors thank the Singapore National Research Foundation, the Singapore Ministry of Education and the Swiss National Science Foundation for financial support (NRF CRP, MOE ARC Tier-2 grants and SNF grant to Konstantin Pervushin, respectively). The authors would like to thank the RIKEN NMR facility (Japan) for the production of truncated constructs of β-catenin and acquisition of triple-resonance spectra.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jmr.2013.12.004.

References

[1] T. Allman, A.D. Bain, J.R. Garbow, SIMPLTN, a program for the simulation of pulse NMR spectra, J. Magn. Reson., Ser. A 123 (1996) 26–31.

[2] M. Bak, J.T. Rasmussen, N.C. Nielsen, SIMPSON: a general simulation program for solid-state NMR spectroscopy, J. Magn. Reson. 147 (2000) 296–330.

[3] W.B. Blanton, BlochLib: a fast NMR C++ tool kit, J. Magn. Reson. 162 (2003) 269–283.

[4] F.S. Debouregas, J.S. Waugh, Antiope, a program for computer experiments on spin dynamics, J. Magn. Reson. 96 (1992) 280–289.

[5] P. Nicholas, D. Fushman, V. Ruchinsky, D. Cowburn, The virtual NMR spectrometer: a computer program for efficient simulation of NMR experiments involving pulsed field gradients, J. Magn. Reson. 145 (2000) 262–275.

[6] S.A. Smith, T.O. Levante, B.H. Meier, R.R. Ernst, Computer-simulations in magnetic-resonance – an object-oriented programming approach, J. Magn. Reson., Ser. A 106 (1994) 75–105.

[7] W. Kohn, Nobel lecture: electronic structure of matter-wave functions and density functionals, Rev. Mod. Phys. 71 (1999) 1253–1266.

[8] I. Kuprov, N. Wagner-Rundell, P.J. Hore, Bloch-Redfield-Wangsness theory engine implementation using symbolic processing software, J. Magn. Reson. 184 (2007) 196–206.

[9] I. Kuprov, Polynomially scaling spin dynamics II: further state-space compression using Krylov subspace techniques and zero track elimination, J. Magn. Reson. 195 (2008) 45–51.

[10] R.R. Senthamarai, I. Kuprov, K. Pervushin, Benchmarking NMR experiments: a relational database of protein pulse sequences, J. Magn. Reson. 203 (2010) 129–137.

[11] I. Kuprov, N. Wagner-Rundell, P.J. Hore, Polynomially scaling spin dynamics simulation algorithm based on adaptive state-space restriction, J. Magn. Reson. 189 (2007) 241–250.

[12] K.V. Pervushin, G. Wider, K. Wuthrich, Single transition-to-single transition polarization transfer (ST2-PT) in [15N,1H]-TROSY, J. Biomol. NMR 12 (1998) 345–348.

[13] D. Yang, L.E. Kay, Improved 1HN-detected triple resonance TROSY-based experiments, J. Biomol. NMR 13 (1999) 3–10.

[14] D. Nietlispach, Suppression of anti-TROSY lines in a sensitivity enhanced gradient selection TROSY scheme, J. Biomol. NMR 31 (2005) 161–166.

[15] M. de la Roche, T.J. Rutherford, D. Gupta, D.B. Veprintsev, B. Saxty, S.M. Freund, M. Bienz, An intrinsically labile α-helix abutting the BCL9-binding site of β-catenin is required for its inhibition by carnosic acid, Nat. Commun. 3 (2012), http://dx.doi.org/10.1038/ncomms1680.