Please cite this article in press as: A. Robbat Jr., et al., Optimizing targeted/untargeted metabolomics by automating gas chromatography/mass spectrometry (GC–GC/MS and GC/MS) workflows, J. Chromatogr. A (2017), http://dx.doi.org/10.1016/j.chroma.2017.05.017

ARTICLE IN PRESS
G Model CHROMA-358514; No. of Pages 10

Journal of Chromatography A, xxx (2017) xxx–xxx
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/chroma

Full length article

Optimizing targeted/untargeted metabolomics by automating gas chromatography/mass spectrometry (GC–GC/MS and GC/MS) workflows

Albert Robbat Jr. a,∗, Nicole Kfoury a, Eugene Baydakov b, Yuriy Gankin b

a Department of Chemistry, Tufts University, 200 Boston Ave, Suite G700, Medford, MA, 02155, United States
b EPAM Systems, 41 University Drive, Newtown, PA 18940, United States

∗ Corresponding author. E-mail address: [email protected] (A. Robbat Jr.).

Article info

Article history: Received 26 January 2017; Received in revised form 20 April 2017; Accepted 5 May 2017; Available online xxx

Keywords: Annotated database building; MS subtraction; Spectral deconvolution; GC–GC/MS; GC/MS

Abstract

New database building and MS subtraction algorithms have been developed for automated, sequential two-dimensional gas chromatography/mass spectrometry (GC–GC/MS). This paper reports the first use of a database building tool, with full mass spectrum subtraction, that does not rely on high resolution MS data. The software was used to automatically inspect GC–GC/MS data of high elevation tea from Yunnan, China, to build a database of 350 target compounds. The database was then used with spectral deconvolution to identify 285 compounds by GC/MS of the same tea. Targeted analysis of low elevation tea by GC/MS resulted in the detection of 275 compounds. Non-targeted analysis, using MS subtraction, yielded an additional eight metabolites, unique to low elevation tea.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Botanicals and their essential oils are highly complex products that contain thousands of volatile organic compounds (VOCs). These plant-based flavorants are used in foods, beverages, and pharmaceuticals, and as odorants in personal care, cleaning supply, and other consumer products. When and where these products, as well as fruits, vegetables, tea, coffee, cocoa, herbs, and spices, are sourced is becoming increasingly important. For example, we recently showed that under extreme weather conditions such as the East Asian monsoon rains the quality of tea changes significantly [1–3]. Not only is tea less flavorful after the heavy rains, which farmers and buyers know well, but the concentrations of antioxidant, anticancer, anti-inflammatory, antifungal, and antibacterial compounds also change drastically 5 days after the start of the monsoon rains. Although some metabolite concentrations changed little, other sensory and nutraceutical compounds increased or decreased by hundreds of percent. These findings make evident that concentration changes are unrelated to leaf growth and are of great importance to industry and consumers alike given the current and expected increases in frequency and duration of extreme weather events [4,5].

Toward this end, we developed new data analysis software to analyze 2-dimensional, automated, sequential gas chromatography/mass spectrometry (GC–GC/MS) data to obtain retention time and mass spectral constituent information in complex samples [6,7]. The same software can be used to track their presence by GC/MS [8–10]. Although GC/MS is quantitative, the technique by itself cannot quantify the number of metabolites identified by GC–GC/MS, differentiate them in samples exposed to extreme weather, or track them in the manufacture of finished goods without employing spectral deconvolution algorithms [11].
While GC–GC/MS is excellent at producing retention time and mass spectral data, it is extremely time-consuming. For example, if the 1st column separation employs a 40-min temperature program and 1-min sample portions are transferred from the 1st to the 2nd column, a total of 40 2nd dimension data files are produced. If the 2nd column is a 50-min separation, the analysis of a single sample takes days. In addition, despite the increase in separation space GC–GC/MS offers, coelution still occurs due to the complexity of natural products. Also, high concentration analytes such as limonene in citrus oils will appear in multiple data files due to flow switch imprecision, which means the same compound must be reconciled to eliminate redundancies in the database. Creating one library takes us months. To overcome these deficiencies, we developed new data analysis software that automatically inspects each peak in the data file, subtracts the mass spectrum of a compound from the total ion current (TIC) chromatogram, and evaluates whether the residual signal approximates background noise. When this occurs, compound identity, retention time, mass spectrum, and deconvolution ions are uploaded to the software.

Rasmussen and Isenhour [12] first assessed the efficiency of library search algorithms to identify unknowns, followed by Stein and Scott [13] and McLafferty et al. [14]. Recently, Stein [15] reviewed the basic principles and factors that affect compound identification using mass spectral reference libraries, while Sparkman [16], Koo [17] and Samokhin [18] compared the performance of newer library-matching algorithms [19–22] to those of Rasmussen, Stein, and McLafferty. The development of early mass spectral deconvolution software aimed at untangling spectra of coeluting compounds was investigated by Champan [23] and Likic [24]. More recent deconvolution software was reviewed by Putri [25], Du [26] and Yi [27], including vendor-specific software such as ChromaTOF (LECO), MassHunter Profinder (Agilent), and MassLynx (Waters) as well as ADAP-GC 2.0 [28], AutoDecon [29], AMDIS [30], MetaboliteDetector [31], MetaboAnalyst [32], MetabolomeExpress Project [33], MetAlign [34], mMass [35], MZmine [36], OpenChrom [37], PyMS [38], PYQAN [39], SpectConnect [40], and TagFinder [41]. The latter group operate on a wide range of data files. All of these solutions provide spectral matching between library and sample data. Until BinBase, none of the aforementioned software included database functions that allowed analysts to add new information, compare sample outputs, or track compounds across multiple samples [42]. Although BinBase and Mass Profiler (Agilent) can compare data sets, they rely on high resolution MS data to differentiate samples. In addition, BinBase is reliant on LECO's ChromaTOF software to deconvolve spectra, limiting its application to LECO instruments. To our knowledge, no software program exists that can differentiate MS fragmentation patterns and automatically subtract a full MS spectrum from the TIC signal to reveal and identify coeluting compounds.

In this paper, we present new data analysis software, which works on all instrument data files that produce an industry standard .cdf extension [43–45]. The Ion Analytics software automatically investigates each GC–GC/MS peak and determines whether the mass spectrum is constant at each scan across the peak. If it is invariant, the software uploads the retention time, mass spectrum, and relative abundance of three to six fragmentation ions used for deconvolution, as well as the identity of the compound after searching the analyst's library, NIST, Wiley, Adams, or any other spectral library that can be saved in NIST format. If peak scans are not constant, the software automatically differentiates fragmentation patterns and subtracts the "clean" mass spectrum from the TIC signal at each scan. After spectral subtraction, the residual signal is compared to background noise. If the two signals approximate one another, the above-mentioned information is uploaded into the database. To demonstrate the utility of the software for quantifying target and non-target compounds, differences in high and low elevation tea are presented.

2. Experimental

2.1. Sample collection and extraction

Tea samples were collected from Yunnan, China, at high (1400 m) and low (650 m) elevation from the same mountain. Tea extracts were prepared by simultaneous distillation-extraction [1]. Ten grams of tea was brewed in 100 mL of deionized water at 90 °C, then cooled in a sealed container for 30 min. The infusion and 12 mL of methylene chloride were distilled at 100 °C and 60 °C, respectively, for 2 h, with VOCs collected in the organic phase. Anhydrous sodium sulfate was added to the distillate, which was then concentrated to 500 µL under a stream of purified nitrogen.

2.2. Automated, sequential 2D GC–GC/MS

The instrument configuration and heartcutting process have been described in detail elsewhere [1]. Briefly, the first GC (Agilent model 6890, Santa Clara, CA) housed the 1st column (C1, 30 m × 250 µm × 0.25 µm Rtx-Wax, Restek, Bellefonte, PA) and was equipped with a flame ionization detector (FID). Operating conditions were: 40 °C for 1 min, then ramped to 240 °C at 5 °C/min. C1 was connected to a CIS4 inlet (Gerstel, Mülheim an der Ruhr, Germany), operating in splitless mode, on one end and to a 5-port crosspiece (Gerstel) on the other. The 2nd column (C2, 30 m × 250 µm × 0.25 µm Rxi-5MS, Restek) was housed in GC 2 (Agilent model 6890), which was connected to the crosspiece through a cryogenic freeze trap (CTS1, Gerstel) on one end and to an Agilent mass spectrometer (model 5975) on the other. C2 operating conditions were: 40 °C for 1 min, then ramped to 280 °C at 5 °C/min. Both columns operated at 1.2 mL/min of constant helium flow. A multi-column switching device (MCS, Gerstel) supplied countercurrent flow to the crosspiece. Based on 1 min sample portions, a total of 40 heartcut data files were obtained. Because each heartcut was an independent analysis, subsequent injections were made after each preceding sample portion eluted from both columns. As a result, total analysis time was 3.5 days for each sample. MS operating conditions were: 230 °C and 150 °C for the ion source and quadrupole, respectively, 70 eV electron impact voltage, and 50–350 mass range at 12 scans/s. A standard mixture of C7–C30 n-alkanes (Sigma-Aldrich, St. Louis, MO) was used to calculate the retention index (RI) for each compound.
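The text does not spell out how the retention index was computed from the n-alkane mixture. As a minimal sketch, assuming the linear (van den Dool and Kratz) form commonly used for temperature-programmed separations, the RI of a peak can be interpolated between the two bracketing alkanes. The function name and the alkane retention times below are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Interpolate a linear retention index from an n-alkane ladder.

    rt         -- retention time of the compound (min)
    alkane_rts -- dict mapping alkane carbon number to retention time (min);
                  times are assumed to increase with carbon number
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the alkane ladder")
    # Index of the alkane eluting at or just before the compound.
    k = min(bisect_right(times, rt) - 1, len(times) - 2)
    n_lo, n_hi = carbons[k], carbons[k + 1]
    t_lo, t_hi = times[k], times[k + 1]
    return 100.0 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))


# Hypothetical alkane retention times (min), for illustration only.
ladder = {10: 10.0, 11: 12.0, 12: 14.5}
ri = linear_retention_index(11.0, ladder)  # peak halfway between C10 and C11
```

A peak eluting halfway between decane and undecane falls halfway between their index values of 1000 and 1100.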

2.3. Tea analysis

GC/MS operating conditions were as described in system 2. Concentrations were calculated as relative peak areas (RPA) except for four compounds. Naphthalene-d8 (Restek) served as the internal standard. Calibration curves were produced for pentanol and terpinolene (TCI, Nihonbashi-honcho, Japan), trans-linalool oxide (Sigma-Aldrich), and toluene (Supelco, Bellefonte, PA) from 0.5 µg/mL to 50 µg/mL. Response factors were calculated for each compound as follows:

$$RF = \frac{A_i \, C_{IS}}{A_{IS} \, C_i}$$

where subscripts i and IS refer to calibration compounds and internal standard, and C and A refer to their corresponding concentration and peak area. Calibration curves were acceptable when the relative standard deviation (RSD) of the response factor over the concentration range was ≤15%, with r2 ≥ 0.99.
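The calibration check above can be sketched in a few lines. This is an illustration only: the function names and the peak areas are made up, and the RSD is taken as the sample standard deviation of the per-level response factors divided by their mean, which is an assumption about how the acceptance test was applied.

```python
from statistics import mean, stdev


def response_factor(area_i, conc_i, area_is, conc_is):
    """RF = (A_i * C_IS) / (A_IS * C_i), as defined in the text."""
    return (area_i * conc_is) / (area_is * conc_i)


def calibration_summary(levels, conc_is):
    """Return (mean RF, %RSD) over the calibration levels.

    levels  -- iterable of (conc_i, area_i, area_is) tuples, one per level
    conc_is -- internal standard concentration, same units as conc_i
    """
    rfs = [response_factor(a_i, c_i, a_is, conc_is) for c_i, a_i, a_is in levels]
    avg = mean(rfs)
    rsd = 100.0 * stdev(rfs) / avg  # sample standard deviation, in percent
    return avg, rsd


# Made-up peak areas for three levels spanning 0.5 to 50 ug/mL.
levels = [(0.5, 50.0, 1000.0), (5.0, 510.0, 1000.0), (50.0, 4900.0, 1000.0)]
avg_rf, rsd = calibration_summary(levels, conc_is=10.0)
acceptable = rsd <= 15.0  # RSD limit stated in the text
```

With these made-up areas the three response factors are 1.00, 1.02, and 0.98, so the curve passes the ≤15% RSD limit comfortably.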

2.4. Data analysis

Fig. 1. GC–GC/MS analysis of high elevation tea from Yunnan, China. The top chromatogram is the separation on Rtx-Wax (C1). The bottom chromatograms are 1 min heartcuts at 9 and 26 min on Rxi-5MS (C2).

New data analysis software (Ion Analytics, Andover, MA) was used to automatically inspect and record compound identities, peak retention times, and mass spectra of GC–GC/MS data for untargeted compounds. For MS subtraction, each software parameter defined below is set by the user. First, each peak was screened to determine if the spectrum at each scan was constant (±20%). If so, the software computed the match between sample and library spectra (e.g., NIST, Wiley, Adams) using the NIST reverse search spectral similarity algorithms [30,46]. If the fit was acceptable, compound name, CAS #, retention time, reference spectrum, 3–6 target ions and relative abundances were recorded in the database. In contrast, sample and library or literature [47–50] retention index (RI) comparisons were manual. Approximately 250 reference compounds were used to confirm compound identity by comparing sample and reference spectra and retention index. These standards were purchased from Sigma-Aldrich, TCI, Acros Organics (Pittsburgh, PA), Alfa Aesar (Ward Hill, MA), MP Biomedicals (Santa Ana, CA), SPEX CertiPrep (Metuchen, NJ) and AccuStandard (New Haven, CT). If sample spectra and reference or library spectra did not match, the above information was uploaded into the database with a numeric identifier. Second, if the spectra across the peak varied, the software employed MS subtraction algorithms to search for constant scans, where the number of contiguous constant scans must be no fewer than three. The average mass spectrum from these scans was subtracted from the TIC signal. Once subtracted, the software automatically inspected residual ion signals to determine if the resulting peak scans were constant or approximated background noise, which was determined by inspecting the highest baseline m/z signal. If constant, the mass spectrum of the second compound was subjected to the treatment described above, with identities, retention times, mass spectra, and deconvolution ions uploaded into the database. If not (unresolved peak), the software repeated the subtraction process until the residual signal approximated background signal. If the resulting signal did not meet the user-defined criteria, no additional information was obtained.
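The MS subtraction steps described above (find at least three contiguous constant scans, average them, subtract, inspect the residual) can be sketched as follows. This is not the Ion Analytics implementation. All function names are hypothetical, the 10% minimum relative abundance used when comparing spectra is an assumption, and so is the scaling rule, which subtracts the largest multiple of the reference spectrum that leaves no ion negative. The text does not state how the subtracted spectrum is scaled at each scan.

```python
def normalize(spectrum):
    """Scale a {m/z: intensity} spectrum so its base ion equals 1."""
    base = max(spectrum.values())
    return {mz: i / base for mz, i in spectrum.items()}


def is_constant(spec_a, spec_b, tol=0.20, min_rel=0.10):
    """True if every ion above min_rel of the base ion agrees within tol."""
    a, b = normalize(spec_a), normalize(spec_b)
    for mz in set(a) | set(b):
        ia, ib = a.get(mz, 0.0), b.get(mz, 0.0)
        if max(ia, ib) >= min_rel and abs(ia - ib) > tol * max(ia, ib):
            return False
    return True


def find_constant_run(scans, min_run=3, tol=0.20):
    """Return (start, end) of the first run of >= min_run constant scans."""
    start = 0
    for k in range(1, len(scans) + 1):
        if k == len(scans) or not is_constant(scans[start], scans[k], tol):
            if k - start >= min_run:
                return start, k
            start = k
    return None


def average_spectrum(scans):
    """Average the base-ion-normalized spectra of the given scans."""
    norm = [normalize(s) for s in scans]
    ions = set().union(*norm)
    return {mz: sum(n.get(mz, 0.0) for n in norm) / len(norm) for mz in ions}


def subtract(scan, reference):
    """Subtract the largest multiple of reference that keeps all ions >= 0."""
    scale = min(scan.get(mz, 0.0) / ref for mz, ref in reference.items() if ref > 0)
    return {mz: i - scale * reference.get(mz, 0.0) for mz, i in scan.items()}


def is_background(residual, noise_level):
    """True when the largest residual ion does not exceed the noise level."""
    return max(residual.values(), default=0.0) <= noise_level


# Made-up scans: three clean scans, then two with a coeluting compound.
scans = [
    {91: 1000.0, 92: 600.0, 65: 120.0},
    {91: 2000.0, 92: 1200.0, 65: 240.0},
    {91: 3000.0, 92: 1800.0, 65: 360.0},
    {91: 2500.0, 92: 1500.0, 65: 300.0, 55: 400.0, 70: 300.0},
    {91: 1500.0, 92: 900.0, 65: 180.0, 55: 800.0, 70: 600.0},
]
run = find_constant_run(scans)
reference = average_spectrum(scans[run[0]:run[1]])
residual = subtract(scans[3], reference)
```

In this made-up peak the first three scans form the constant run, and subtracting their average from the fourth scan leaves only the ions of the coeluting compound, which can then be screened in the same way.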

Once the database is constructed, it is used with spectral deconvolution to identify target compounds. The analyst can also set each of these spectral deconvolution parameters. First, the deviation in mass spectra must be ≤20% for five or more consecutive scans. The software normalizes confirming ions to the base ion at each peak scan. The reduced ion intensity (relative to the base ion, i = 1), $I_i(t)$, at scan (t) is defined as follows:

$$I_i(t) = \frac{A_i(t)}{R_i \, A_1}$$

where $A_i(t)$ is the i-th qualifier ion intensity at scan (t), $R_i$ is the expected relative ion abundance ratio, and $A_1$ is the intensity of the base ion (i = 1). For signals that meet the ion signal ratio criterion (±20%), the molecular and confirming ions appear at the same height. The spectral match, $\Delta I$, is calculated by:

$$\Delta I = \frac{\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left| I_i - I_j \right|}{\sum_{i=1}^{N-1} i}$$

where $\Delta I$ is the average relative reduced intensity deviation of each of the $N$ qualifier ions. The closer $\Delta I$ is to zero, the better the match. Second, the scan-to-scan variance (SSV), $\Delta E$, must be <5:

$$\Delta E = \Delta I \cdot \log(A_1)$$
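The three quantities defined above can be computed for a single scan as in the sketch below. The function names and ion abundances are hypothetical, and the base of the logarithm is not stated in the text, so base 10 is assumed here.

```python
import math
from itertools import combinations


def reduced_intensities(abundances, expected_ratios):
    """I_i = A_i / (R_i * A_1) for one scan; index 0 is the base ion (R_1 = 1)."""
    a1 = abundances[0]
    return [a / (r * a1) for a, r in zip(abundances, expected_ratios)]


def spectral_match(reduced):
    """Delta I: summed pairwise |I_i - I_j| divided by sum_{i=1}^{N-1} i.

    Requires at least two ions; the denominator equals the number of pairs.
    """
    n = len(reduced)
    pairwise = sum(abs(x - y) for x, y in combinations(reduced, 2))
    return pairwise / sum(range(1, n))


def scan_to_scan_variance(delta_i, base_abundance):
    """Delta E = Delta I * log(A_1); the base-10 logarithm is an assumption."""
    return delta_i * math.log10(base_abundance)


# Hypothetical scan: base ion plus two qualifier ions, with the second ion
# inflated by a coeluting compound (expected 500, observed 600).
observed = [1000.0, 600.0, 250.0]
expected = [1.0, 0.5, 0.25]
reduced = reduced_intensities(observed, expected)
delta_i = spectral_match(reduced)
delta_e = scan_to_scan_variance(delta_i, observed[0])
```

When every ion is present at its expected ratio, all reduced intensities equal 1 and $\Delta I$ is zero; the inflated ion in this example is what pushes $\Delta I$ above zero.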

The software eliminates additive ion signal due to coeluting compounds by comparing all ion ratios against one another. The relative error (RE) is computed at each scan. If the ion ratio exceeds the expected ratio, the residual ion signal is subtracted. An acceptable match is determined by:

$$\Delta I \le K + \frac{\Delta_0}{A_1}$$

where $K$ is the acceptable percentage difference set by the analyst and $\Delta_0$ is the additive error attributable to background signal or instrument noise. The compound is considered present when $\Delta I$ or $\Delta E \le \Delta E_{max}$, where $\Delta E_{max}$ is the maximum allowable SSV. This criterion measures both the $\Delta I$ at each scan and its variance from one scan to the next. Third, the Q-value, an integer between 1 and 100, must be ≥93. The Q-value measures the total ion ratio deviation, calculated as the absolute value of the expected minus observed ion ratio, divided by the expected ion ratio, times 100, for each ion across the peak. The closer the value is to 100, the higher the certainty of the match between the database and sample spectra. Fourth, the Q-ratio compares the ratio of molecular ion intensity to confirming ion intensities across the peak. The acceptability limit for this criterion is ±20%. The software assigns a compound name or a numerical identifier when the four compound acceptance criteria are met, establishing a single acceptance criterion [1,9,51,52].

Fig. 2. Inspection of 2nd dimension peak at 13.23 min. When the mass spectrum is constant across the peak, the software compares the sample spectrum to reference compound and commercial library spectra to assign identity, in this case, phenylethyl alcohol.

Fig. 3. Inspection of 2nd dimension peak at 3.834 min. If the mass spectra vary across the peak, due to coeluting compounds, the software searches for 3–5 invariant scans and averages them to compare reference and/or library spectra. Spectra 1–3 correspond to toluene and spectra 4–6 correspond to toluene with a coeluting compound (blue ions). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. MS subtraction of toluene spectrum (b) from the total ion current (TIC) chromatogram (a) yields residual spectrum (c). If the residual spectrum (c) is constant it is compared to reference and/or library spectra (d) to assign identity. Since library spectra include ions <50 mass units, see experimental section, the residual spectrum (e) is based on the base ion at m/z 42, hence the resulting signal. The peak ion detail view shown at the top right depicts the TIC (black), toluene (blue) and residual (red) peaks after toluene subtraction, whereas the middle box illustrates co-maximization of each residual ion trace. The bottom box shows the spectrum match of the residual (blue) and library (black) for pentanol. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
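The four acceptance criteria listed in Section 2.4 combine into a single pass/fail decision. The sketch below shows one way to express that decision, using the limits quoted in the text as defaults. The class and function names are hypothetical, and how the software derives each input (per-scan deviation, SSV, Q-value, Q-ratio deviation) is not reproduced here; the inputs are simply assumed to be available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceLimits:
    max_delta_i_pct: float = 20.0      # spectral deviation limit per scan (%)
    min_consecutive_scans: int = 5     # scans in a row that must meet the limit
    max_delta_e: float = 5.0           # scan-to-scan variance must be below this
    min_q_value: int = 93              # Q-value must be at least this
    max_q_ratio_dev_pct: float = 20.0  # Q-ratio tolerance (+/- %)


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for ok in flags:
        run = run + 1 if ok else 0
        best = max(best, run)
    return best


def compound_accepted(delta_i_pct_by_scan, delta_e, q_value, q_ratio_dev_pct,
                      limits=AcceptanceLimits()):
    """Return True only when all four criteria are met."""
    scans_ok = [d <= limits.max_delta_i_pct for d in delta_i_pct_by_scan]
    return (
        longest_run(scans_ok) >= limits.min_consecutive_scans
        and delta_e < limits.max_delta_e
        and q_value >= limits.min_q_value
        and abs(q_ratio_dev_pct) <= limits.max_q_ratio_dev_pct
    )


# Made-up values for one candidate peak.
accepted = compound_accepted([5, 10, 12, 8, 15, 30], delta_e=2.1,
                             q_value=95, q_ratio_dev_pct=-12.0)
```

Because the limits are user-settable in the software, they are kept in one small object here so that a different set can be passed in without touching the decision logic.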

3. Results and discussion

Although GC–GC/MS is time-consuming, it is the best technique for producing comprehensive libraries of chemical constituents in complex samples. An illustrative example is shown in Fig. 1. The top chromatogram is the separation of high elevation green tea on the 1st column, while the bottom two chromatograms depict 1 min separations at 9 and 26 min. Evident is the increase in separation space, since the first sample portion corresponds to an unresolved region of the chromatogram while the second reveals a few compounds on the wax column. More than 50 compounds have been identified from these two heartcuts.

3.1. Library creation

First, the Automated Method Construction command is used to inspect all 40 data files. If the mass spectrum is constant at each peak scan, see Fig. 2, the software compares the sample mass spectrum and retention time against the user and commercial libraries. When the compound acceptance criterion is met, compound name, CAS #, RT, mass spectrum, and 3–6 target ions and their abundances are uploaded to the database. For example, phenylethyl alcohol elutes at 13.23 min on the Rxi-5MS phase in sample portion 26, with reference compound data confirming compound identity. In cases where commercial libraries, e.g., Adams, provide retention index and mass spectral information, such data also confirm compound identity. In all other cases, e.g., where NIST/Wiley spectra meet the similarity factor match criterion, compounds are considered tentatively identified. If the mass spectrum cannot be matched to a library spectrum but all other peak confirmation criteria are met, the compound is assigned a unique number that can be updated when reference compounds become available.

If, on the other hand, spectra vary, see Fig. 3, the software searches for 3–5 invariant scans, averages the mass spectra, and compares sample vs. reference compound patterns. Spectra 1–3 correspond to toluene and spectra 4–6 correspond to toluene with a coeluting compound (blue ions). When the acceptance criterion is met, the associated information for that compound, in this case toluene, is added to the database. Then, the software subtracts the average toluene mass spectrum (b) from the TIC (a) signal as shown in Fig. 4, resulting in residual signal (c). These ion signals are consistent with scans 4–6 in Fig. 3 after toluene subtraction. Fig. 4 (right, top) shows the TIC (black), toluene (blue) and residual (red) peaks. TIC and residual ion traces co-maximize and are shown in the middle. The bottom box illustrates the match for pentanol when the residual (blue) and library spectra (black) are merged. Recall that the mass spectrometer was scanned from 50 to 350 m/z, which explains the missing sample ions in these figures.

The Method Automation Window in Fig. 5 shows 25 peaks were detected above the user defined peak threshold for sample portion 9. The pink row reports the retention time, peak area and height, and library match similarity values. In this example, the pink row indicates toluene has been subtracted from the TIC in Fig. 4. Similarly, the two pink rows in Fig. 6 indicate toluene (b) and pentanol (c) spectra have been subtracted from the TIC (a) resulting in residual (d) in Fig. 7. The right-hand side makes evident that the residual TIC (red, top) and each of its contributing ions (middle and bottom) approximate baseline noise.

Fig. 5. Method Automation Window for sample portion 9 after subtraction of toluene. The dialog box reports detection of 25 peaks. The pink line and the line that follows indicate MS subtraction, with the highlighted line listing the retention time, peak area and height, as well as similarity factor for toluene after its spectrum was subtracted from the peak.

3.2. Target and non-target compound analysis

More than 350 high elevation metabolites were detected by GC–GC/MS. Of these, 150 were confirmed using reference compounds, with another 104 identified from commercial libraries. In contrast, GC/MS analysis of the same extract detected 285 metabolites. The difference is due to mass on-column. Since 1-min sample portions are spread over the 2nd dimension column, the 1st column was purposely overloaded, which is impractical in GC/MS since detector saturation is more easily achieved. Reference data confirmed 120 compounds, and libraries assigned another 98. Fig. 8 shows the total and reconstructed ion current (RIC) chromatograms for both high and low elevation teas. Each colored peak in the RIC corresponds to a specific compound in the sample. The software lists these compounds by color in the legend. The RIC chromatogram is the base ion signal of the reconstructed ion after spectral deconvolution of target compounds. When the high elevation database is used to analyze the low elevation tea, 275 target compounds are detected. The balance, 10 metabolites, are unique to the high elevation tea. Non-target analysis of the low elevation tea by MS subtraction yielded eight unique metabolites. The unique compounds in both teas are of sensory and human health importance, as are many of the common compounds. The number of common metabolites in each chemical family is informative: 37 hydrocarbons, 34 oxygenated monoterpenes, 33 oxygenated heterocycles, 17 aliphatic alcohols, 15 monoterpene hydrocarbons, 13 oxygenated sesquiterpenes, 12 aliphatic aldehydes, 12 acids, 11 aliphatic ketones, 10 aliphatic esters, 9 sesquiterpene hydrocarbons, 7 nitrogen and 3 sulfur containing compounds, and 2 oxygenated diterpenes. These counts are nonetheless less instructive than the differences in concentration of individual compounds. A follow-up paper will highlight the significance of these findings.

Page 7: G Model ARTICLE IN PRESS - Tufts Universityase.tufts.edu/chemistry/robbat/documents/2017_optimizingTargeted.… · cite this article in press as: A. Robbat Jr., et al., Optimizing

ARTICLE IN PRESSG ModelCHROMA-358514; No. of Pages 10

A. Robbat Jr. et al. / J. Chromatogr. A xxx (2017) xxx–xxx 7

To assess quantitative differences between the deconvolution and MS subtraction algorithms, the concentration of toluene, pentanol, terpinolene and trans-linalool oxide was measured. Table 1 lists the results. Excellent agreement was obtained as evident by the relative percent difference (RPD), which was <5%.

Table 1
Metabolite concentrations and the relative percent difference (RPD) as determined by MS subtraction and spectral deconvolution algorithms.

Compound                         r²     Subtraction (µg/ml)  Deconvolution (µg/ml)  RPD (%)
Toluene                          0.999  6.73                 6.72                   0.08
Pentanol                         0.998  2.94                 2.94                   0.08
Terpinolene                      0.997  3.75                 3.60                   4.26
trans-Linalool oxide (furanoid)  0.999  4.11                 4.06                   1.43

Fig. 6. Method Automation Window for sample portion 9 after subtraction of toluene and pentanol spectra. The second pink line and the two lines that follow indicate two subtractions have occurred, with the highlighted line reporting retention time, peak area and height as well as similarity factor of pentanol after its spectrum was subtracted from the peak. The residual spectrum fails to meet the peak acceptance criterion and ends the compound identity search.

Fig. 7. MS subtraction of target compounds toluene (b) and pentanol (c) from the TIC (a) peak. The resulting residual spectrum (d and bottom right-hand box) is ion signal noise, see baseline (red, top box), whose individual ion traces are also shown (middle box). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. GC/MS total and reconstructed ion current (RIC) chromatograms of high and low elevation teas on Rxi-5MS. Each colored peak in the RIC corresponds to a specific compound in the sample. The software lists these compounds by color in the legend.

4. Conclusion

Although software is available that can differentiate one peak scan from the next, only Ion Analytics combines deconvolution, MS subtraction, and quantitation in the same program to investigate complex samples analyzed by different vendor instruments. Once non-target compounds are included in the database, they become target compounds for the next sample. By building the database and annotating key constituents such as sensory or nutritional compounds, sample comparison is more easily accomplished. Since the MS subtraction and deconvolution algorithms rely on the base ion of a given compound to quantify its concentration in a sample, one calibration curve for each compound is all that is required to quantify target and non-target compounds. If, for example, the analyst produces calibration curves from reference standards and proves that the instrument is in control at some later date by comparing the average response factor obtained over the concentration range to the mid-point calibration concentration on the day of analysis and obtains agreement within ±20%, the concentration of the analyte at time of analysis is valid. Such data can be obtained using either algorithm.
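The two numerical checks discussed in this section, the RPD between algorithms (Table 1) and the ±20% response factor criterion, can be illustrated with a minimal sketch. It is not the Ion Analytics implementation. It assumes the conventional definition of RPD (absolute difference divided by the mean of the two values), which the text does not state, and it assumes that the day-of-analysis comparison is made against the response factor of the mid-point standard. The function names and the calibration peak areas are hypothetical; only the Table 1 concentrations come from the paper. Because those concentrations are rounded, the computed RPDs are of the same magnitude as, but not identical to, the published values.

```python
from statistics import mean


def relative_percent_difference(a: float, b: float) -> float:
    """RPD (%) between two measurements, relative to their mean (assumed definition)."""
    return abs(a - b) / ((a + b) / 2) * 100


def response_factor(peak_area: float, concentration: float) -> float:
    """Base ion peak area per unit concentration."""
    return peak_area / concentration


def calibration_still_valid(cal_areas, cal_concs, check_area, check_conc,
                            tolerance_pct=20.0):
    """Compare the average calibration response factor with the response
    factor of a mid-point standard run on the day of analysis.

    Returns (within_tolerance, percent_difference).
    """
    avg_rf = mean(response_factor(a, c) for a, c in zip(cal_areas, cal_concs))
    check_rf = response_factor(check_area, check_conc)
    pct_diff = (check_rf - avg_rf) / avg_rf * 100
    return abs(pct_diff) <= tolerance_pct, pct_diff


# Table 1 concentrations in ug/ml: (MS subtraction, spectral deconvolution)
table1 = {
    "Toluene": (6.73, 6.72),
    "Pentanol": (2.94, 2.94),
    "Terpinolene": (3.75, 3.60),
    "trans-Linalool oxide (furanoid)": (4.11, 4.06),
}
rpds = {name: relative_percent_difference(s, d) for name, (s, d) in table1.items()}
all_below_5_percent = all(value < 5.0 for value in rpds.values())

# Hypothetical five-point calibration (concentration in ug/ml, base ion area in counts)
cal_concs = [1.0, 2.5, 5.0, 7.5, 10.0]
cal_areas = [1020.0, 2480.0, 5050.0, 7400.0, 10100.0]

# Hypothetical mid-point standard (5.0 ug/ml) analyzed at a later date
in_control, drift_pct = calibration_still_valid(
    cal_areas, cal_concs, check_area=4600.0, check_conc=5.0
)
# A second hypothetical check standard with a much lower response
out_of_control, large_drift_pct = calibration_still_valid(
    cal_areas, cal_concs, check_area=3500.0, check_conc=5.0
)

for name, value in rpds.items():
    print(f"{name}: RPD = {value:.2f}%")
print(f"Mid-point check: drift = {drift_pct:.1f}%, in control = {in_control}")
```

Because both algorithms quantify from the base ion, the same check applies whether the peak area was obtained by MS subtraction or by spectral deconvolution.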

Disclosures

Drs. Gankin and Baydakov wrote the software following direction from Dr. Robbat; all of whom have disclosed their interest to Kfoury. Dr. Robbat is the founder of Ion Analytics.

Acknowledgements

No public, commercial, or not-for-profit entity provided financial support for this work. Gerstel GmbH, Gerstel USA, Agilent Technologies, and Ion Analytics provided instruments and software we used to analyze the samples.
