Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry Parker, S. J., Rost, H., Rosenberger, G., Collins, B. C., Malmström, L., Amodei, D., Venkatraman, V., Raedschelders, K., Eyk, J. E. V., & Aebersold, R. (2015). Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry. Molecular and Cellular Proteomics, 14(10), 2800-2813. https://doi.org/10.1074/mcp.O114.042267 Published in: Molecular and Cellular Proteomics Document Version: Publisher's PDF, also known as Version of record Queen's University Belfast - Research Portal: Link to publication record in Queen's University Belfast Research Portal Publisher rights © 2015 by The American Society for Biochemistry and Molecular Biology, Inc. This work is made available online in accordance with the publisher’s policies. Please refer to any applicable terms of use of the publisher. General rights Copyright for the publications made accessible via the Queen's University Belfast Research Portal is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The Research Portal is Queen's institutional repository that provides access to Queen's research output. Every effort has been made to ensure that content in the Research Portal does not infringe any person's rights, or applicable UK laws. If you discover content in the Research Portal that you believe breaches copyright or violates any law, please contact [email protected]. Download date:13. Feb. 2022



Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry

Sarah J. Parker‡‡, Hannes Rost§¶, George Rosenberger§¶, Ben C. Collins§, Lars Malmström‖, Dario Amodei**, Vidya Venkatraman‡‡, Koen Raedschelders‡‡, Jennifer E. Van Eyk‡‡‡, and Ruedi Aebersold§§§¶¶

Accurate knowledge of retention time (RT) in liquid chromatography-based mass spectrometry data facilitates peptide identification, quantification, and multiplexing in targeted and discovery-based workflows. Retention time prediction is particularly important for peptide analysis in emerging data-independent acquisition (DIA) experiments such as SWATH-MS. The indexed RT approach, iRT, uses synthetic spiked-in peptide standards (SiRT) to set RT to a unit-less scale, allowing for normalization of peptide RT between different samples and chromatographic set-ups. The obligatory use of SiRTs can be costly and complicates comparisons and data integration if standards are not included in every sample. Reliance on SiRTs also prevents the inclusion of archived mass spectrometry data for generation of the peptide assay libraries central to targeted DIA-MS data analysis. We have identified a set of peptide sequences that are conserved across most eukaryotic species, termed Common internal Retention Time standards (CiRT). In a series of tests to support the appropriateness of the CiRT-based method, we show: (1) the CiRT peptides normalized RT in human, yeast, and mouse cell lysate derived peptide assay libraries and enabled merging of archived libraries for expanded DIA-MS quantitative applications; (2) CiRTs predicted RT in SWATH-MS data within a 2-min margin of error for the majority of peptides; and (3) normalization of RT using the CiRT peptides enabled the accurate SWATH-MS-based quantification of 340 synthetic isotopically labeled peptides that were spiked into either human or yeast cell lysate. To automate and facilitate the use of these CiRT peptide lists or other custom user-defined internal RT reference peptides in DIA workflows, an algorithm was designed to automatically select a high-quality subset of datapoints for robust linear alignment of RT. Implementations of this algorithm are available for the OpenSWATH and Skyline platforms. Thus, CiRT peptides can be used alone or as a complement to SiRTs for RT normalization across peptide spectral libraries and in quantitative DIA-MS studies. Molecular & Cellular Proteomics 14: 10.1074/mcp.O114.042267, 2800–2813, 2015.

From the ‡Department of Medicine, Johns Hopkins University, Baltimore, Maryland; §Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; ¶PhD Program in Systems Biology, University of Zurich and ETH Zurich, Zurich, Switzerland; ‖S3IT, University of Zurich; **Stanford University; ‡‡Advanced Clinical Biosystems Research Institute, The Heart Institute, and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, California; §§Faculty of Science, University of Zurich, Zurich, Switzerland

Received June 21, 2014, and in revised form, July 20, 2015. Published, July 21, 2015, MCP Papers in Press, DOI 10.1074/mcp.O114.042267

Author contributions: S.J.P., H.R., G.R., B.C.C., K.R., J.E.V., and R.A. designed research; S.J.P., H.R., G.R., and L.M. performed research; S.J.P., H.R., G.R., L.M., D.A., and K.R. contributed new reagents or analytic tools; S.J.P., H.R., G.R., D.A., V.V., and R.A. analyzed data; S.J.P., H.R., G.R., B.C.C., J.E.V., and R.A. wrote the paper.

¹ The abbreviations used are: LC, Liquid Chromatography; RT, Retention Time; SiRT, Synthetic indexed Retention Time standard peptides; CiRT, Common internal Retention Time standard peptides; DDA, Data-Dependent Acquisition; DIA, Data-Independent Acquisition; iRT-C18, Defined, unitless space described in Escher et al. (1); TPP, Trans Proteomic Pipeline; dRT, Difference between observed and predicted retention time.

Technological Innovations and Resources. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc. This paper is available on line at http://www.mcponline.org

The separation of peptides by reverse phase liquid chromatography (LC)¹ prior to mass spectrometry detection has become an important technique for the in depth analysis of proteomes in data-dependent acquisition (DDA), data-independent acquisition (DIA), and targeted mass spectrometry workflows. In addition, the reproducibility of the retention and subsequent elution of a peptide from a reversed phase (e.g. C18-based) column at a specific time point along the acidic/organic gradient has led to the application of retention time (RT) as an orthogonal characteristic applicable to confident peptide identification (2, 3), targeted quantitative assay scheduling (4), and MS1-based quantitation (5). In targeted workflows, where an MS1 precursor and associated MS2 fragment masses (i.e. a transition group or peptide assay) are monitored and quantified with high sensitivity, accurate prior knowledge of peptide RT is a critical piece of information used for distinguishing between multiple extracted ion chromatogram (XIC) peaks that may occur along the chromatographic space. This is of particular concern in comparative experiments where it is crucial that signals arising from identical peptides are appropriately aligned and quantified across samples.

The importance of accurate RT knowledge is amplified in emerging DIA analytical strategies such as SWATH-MS (6, 7). With the SWATH-MS approach, large “peptide assay libraries” of peptide precursor and selected product ion transition groups, akin to those used in multiple reaction monitoring (MRM) experiments, are assembled from shotgun data sets. These peptide assay libraries are used to probe for and extract tens-to-hundreds of thousands of ion chromatograms from complex and convoluted chimeric MS2-fragmentation spectra derived from “all observable” peptides falling within predetermined mass windows (for instance, 25 m/z) (6). With this high-throughput in silico targeted ion extraction, the currently available data analysis pipelines are challenged, and orthogonal a priori information (such as RT and fragment ion intensity information) is needed for correct identification of the target signal among multiple peak groups. Multiple peak group XIC signals may be detected because of amino acid modifications, closely related peptide isoforms, or simply the existence of multiple peptides with similar chemical properties and thus co-eluting transitions (8). Thus, the precise knowledge of RT is integral to the three major steps in the SWATH-MS analysis workflow: (1) alignment of peptide assay library and DIA-MS chromatograms; (2) prediction of assay library RT for accurate identification of target peptides in DIA-MS files; and (3) estimating confidence of peptide identifications for the accurate quantification of transition group XICs from DIA-MS files. A method proposed as a means to align chromatograms for DIA-MS analysis must be tested in each of these application steps.

Achieving maximum proteome coverage with SWATH-MS is dependent upon the assembly of comprehensive peptide assay libraries, and as such an active goal within the field is to build databases composed of every mass-spectrometry observable peptide within a given proteome (9–13). Although these libraries require a large amount of resources up front to build, once assembled they become a permanent and exchangeable tool for targeted proteomic quantification. Furthermore, these peptide assay libraries are applicable both to the quantification of a single protein via SRM and to the quantification of up to thousands of proteins and potentially their modified forms in a SWATH-MS experiment. Due to the importance of accurate RT knowledge for SWATH-MS, the portability of peptide assay libraries is critically dependent on the establishment of a uniform, normalized RT index to which the raw RTs in peptide assay libraries can be aligned.

Recently, Escher et al. introduced a set of synthetic peptides that are added to samples and used to normalize peptide RT across any reversed phase LC-gradient elution profile, demonstrated in their example with C18 as the stationary phase (iRT-C18). The normalization of RT by the synthetic iRT standards (SiRTs) allows for the meaningful transfer of peptide assay libraries between labs (1). The utility and accuracy of the RT alignment generated by the iRT-C18 approach have by now been demonstrated in multiple studies (14–16), and algorithms for transforming experimental peptide RTs to a normalized iRT-C18 space have been included in software tools used for both SRM-assay design (Skyline) and SWATH-MS data analysis (Skyline, OpenSWATH, Spectronaut).

Although the established iRT-C18 approach is effective and reproducible, a typical iRT-normalization relies upon the presence of spiked synthetic peptides within every sample analyzed, which presents a few drawbacks. First, the inclusion of the SiRT reference peptides in every single MS-injection becomes costly, especially for very large experiments and in high-throughput laboratories. Second, if the SiRT peptides are not included, or are of low signal quality, then automated data analysis of a measurement is currently impossible. Third, the SiRT approach precludes the inclusion of archived MS data, for example shotgun data collected prior to the development of the SiRT-based approach, in building assay libraries.

Here we present a simple solution to these limitations, whereby we have generated lists of common peptides highly conserved among most eukaryotes along with their normalized RT values (termed CiRT, for common iRT peptides). The peptides we have chosen are likely to be present in a range of eukaryotes from yeast to human across a number of experimental preparations and thus can be used to translate raw peptide RTs into a normalized iRT space. Furthermore, in our description of the assembly of these peptide lists we outline the workflow necessary for generating alternative RT calibration peptides for samples in which the particular endogenous CiRT peptides described here are absent or only partially present. In this article we have tested the performance of the CiRT method as it is applied to the three critical steps for DIA-MS targeted analysis: (1) peptide assay RT alignment; (2) peptide identification in DIA-MS data files; (3) quantitative accuracy based on CiRT alignment. Notably, for this work we have chosen to use the same normalized RT space established by Escher and colleagues (iRT-C18) such that currently existing SWATH-MS peptide assay libraries can be directly applied in ongoing projects. However, our approach is more general and applicable to any RT normalization scheme, and the iRT-C18 scheme was chosen as a convenient reference framework for our comparisons.

EXPERIMENTAL PROCEDURES

Biological Sample Preparation—Data sets were generated from three separate biological sources.

(1) HEK293 (ATCC, Manassas, VA) cells (Human cell line) at ~80% confluency were harvested on ice by pipetting in ice-cold PBS with 1 mM EDTA, pelleted by centrifugation for 3 min at 300 × g at 4 °C, and snap frozen. Cells were lysed on ice in HNN lysis buffer (0.5% Nonidet P-40, 50 mM HEPES, pH 7.5, 150 mM NaCl, 50 mM NaF, 200 µM NaVO3, 0.5 mM PMSF) with protease inhibitor mixture (Sigma, St. Louis, MO) and centrifuged at 16,100 × g for 15 min at 4 °C to remove insoluble material. Protein was precipitated from the lysate using 6 volumes of ice cold acetone for 1 h and pelleted by centrifugation at 16,100 × g for 15 min at 4 °C. The protein pellet was washed three times with 200 µl ice cold acetone with interspersed centrifugation, dried briefly by vacuum centrifugation at 45 °C, and resuspended in 8 M urea, 50 mM NH4HCO3 prior to digestion (described below).

(2) The yeast strain BY4741 MATa his3Δ leu2Δ met15Δ ura3Δ (Yeast cell line) was grown in SD medium until the culture reached an A600 of 0.8. The culture medium was quenched by addition of trichloroacetic acid to a final concentration of 6.25% and the cells were harvested by centrifugation at 1500 × g for 5 min at 4 °C. The supernatants were discarded and the cell pellets were washed three times by centrifugation with cold (−20 °C) acetone to remove interfering compounds. The final cell pellets were resolubilized in lysis buffer containing 8 M urea, 0.1 M NH4HCO3 and 5 mM EDTA, and cells were disrupted by glass bead beating (5 times 5 min at 4 °C) prior to digestion (described below).

(3) Mouse immortalized vascular smooth muscle cells (VSMC, ATCC CRL-2797, Manassas, VA) were grown to 90% confluency and lysed by freeze/thaw cycling and cell scraping in ice-cold 8 M urea.

Preparation of Peptide Samples for LC MS—Proteins were reduced with either 10–12 mM DTT (Mouse, Yeast lysate) or 5 mM TCEP (HEK lysate) for 30 min at 37 °C and subsequently alkylated with iodoacetamide (10 mM, 40 mM, and 50 mM for HEK, yeast, and mouse lysates, respectively). Mouse lysates were diluted with 0.1 M NH4HCO3 to a 6 M urea concentration and digested with 0.1 µg Lys-C (Roche, Indianapolis, IN) for 4 h at 37 °C prior to trypsin digestion. All samples were diluted to ~1.5 M urea by addition of 0.1 M NH4HCO3, after which sequencing grade modified trypsin (Promega, Madison, WI) was added at a 1:100 ratio of enzyme/substrate. Digestion of all samples was stopped by acidification with formic acid. To test the alignment capabilities of the CiRT peptides on a library synthesized from a highly fractionated data set, peptides from the mouse lysate only were separated by basic reverse phase chromatography, collecting 80 fractions. The 80 fractions were reduced to 12 fractions by recombining fractions spaced far from each other in the organic elution gradient. Fractions were dried and re-suspended in 0.1% trifluoroacetic acid prior to desalting.

Acidified peptide mixtures derived from all three sources were desalted using reverse phase Sep-Pak C18 cartridges (Waters, Milford, MA) according to the following procedure: wet cartridge with 1 volume of 100% methanol, wash with 1 volume of 80% acetonitrile, equilibrate with 4 volumes of 0.1% formic acid, load acidified digest, wash with 6 volumes of 0.1% formic acid, and elute with 1 volume of 50% acetonitrile in 0.1% formic acid. Peptides were dried using a vacuum centrifuge and resolubilized in 0.1% formic acid with synthetic iRT (SiRT) calibration peptides (Biognosys, Schlieren, Switzerland) at a 1:20 v/v ratio.

Shotgun Data Dependent Mass Spectrometry and Assay Library Generation—Peptides were analyzed by LC MS over a 2 h gradient from 2–35% acetonitrile by an Eksigent NanoLC Ultra 2D Plus HPLC system coupled to a 5600 TripleTOF mass spectrometer (AB Sciex, Framingham, MA) operating in shotgun mode. The top 20 most intense MS1 precursors (collected between 360 and 1460 m/z for 250 ms) with charge states between 2 and 5 were selected for MS2 fragmentation, with a 15 s exclusion window. Fragment MS2 ions were collected for 100 ms across a 50–2000 m/z range.

Raw profile mode wiff files were centroided and converted to mzML using the AB Sciex converter (v1.3) and subsequently converted to mzXML using msconvert (ProteoWizard, v3.04.238). Peptide sequences were assigned using parallel searches with the OMSSA and X!Tandem algorithms, searching against UniProtKB/TrEMBL databases of Human, Yeast, or Mouse proteins, appended with common contaminants and decoy sequences (Human 40,577 proteins and decoys, September 2013; Yeast 12,006 proteins and decoys, October 2013; Mouse 33,330 proteins and decoys, October 2013). Semitryptic peptides with up to two missed cleavages were allowed, with carbamidomethylation of cysteines set as a fixed modification and variable modifications of oxidized methionine and phosphorylation of tyrosine, serine, and threonine. Precursor and product ion mass errors were set to 30 ppm and 75 ppm, respectively. Search engine results were converted to pepXML format using omssa2pepXML (v2.1.9) or Tandem2XML (v4.6.0). Peptide spectral match probability scoring was modeled in PeptideProphet (v4.6.0), and the resulting interact.pepXML files of the two search engines were combined in iProphet (v4.6.0).

Peptide assay libraries were generated with SpectraST (v4.0) from the identified peptides passing a PeptideProphet probability threshold of 0.95. The resulting SpectraST .splib file was submitted as input to the custom spectrast2spectrast_irt.py converter script, which was used to align RT to the normalized iRT-C18 space, initially using the SiRT peptides (1) and subsequently by substituting in the CiRT-reference peptides, generated as described below (note: this functionality has now been incorporated directly into the SpectraST (v5.0) workflow). A nonredundant, consolidated peptide assay library was created from each of the initial iRT normalized libraries with SpectraST. To test the feasibility of merging data sets derived from archived and diverse sources for library creation, two additional human-derived data sets, 11 OFFGEL fractions and 15 basic reverse phase fractions generated from human cell lysates, were accessed and downloaded from the PRIDE data repository (PXD000953, filenames beginning with PFARID, and PXD000442, filenames containing MCF71-MCF715) (17, 18). All three data sets from the human-derived lysates were then merged into a single master library using the SpectraST library manipulation tool available in the Trans Proteomic Pipeline. Spectral libraries were formatted for OpenSWATH using the custom script spectrast2tsv.py, assigning 5 y- or b-ion transitions to each peptide, and including peptides between m/z 350–2000, with charge states 1, 2, 3, or 4. The OpenSWATH tools ConvertTSVtoTraML and OpenSwathDecoyGenerator were then used to format the library for input to OpenSWATH and append shuffled decoys to the full assay libraries as described (7).
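The RT alignment step can be illustrated with a minimal Python sketch, assuming a plain least-squares line fitted through the reference peptides that are found in a library. The function names and the toy peptide values are hypothetical and are not taken from spectrast2spectrast_irt.py or SpectraST.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def normalize_library(raw_rt, reference_irt):
    """Convert raw RTs (peptide -> minutes) to iRT units.

    The line is fitted only on reference peptides present in the library,
    then applied to every peptide in the library.
    """
    shared = sorted(set(raw_rt) & set(reference_irt))
    if len(shared) < 2:
        raise ValueError("need at least two reference peptides in the library")
    slope, intercept = fit_line(
        [raw_rt[p] for p in shared], [reference_irt[p] for p in shared]
    )
    return {pep: slope * rt + intercept for pep, rt in raw_rt.items()}


# Hypothetical toy values: three reference peptides and one library peptide.
reference_irt = {"REFPEPA": 0.0, "REFPEPB": 50.0, "REFPEPC": 100.0}
raw_rt = {"REFPEPA": 20.0, "REFPEPB": 45.0, "REFPEPC": 70.0, "LIBPEPX": 57.5}
normalized = normalize_library(raw_rt, reference_irt)
```

Because the fit uses whichever reference peptides are present, the same function would accept either an SiRT or a CiRT reference dictionary.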

All custom Python scripts are available at: https://github.com/msproteomicstools.

OpenSWATH tools can be found at: http://www.openswath.org.

Identification of CiRT Reference Peptides—Two separate peptide assay libraries comprising 6728 yeast and 9565 human peptides that were derived from cell lysates (confidence of identification set to a 1% FDR threshold) were generated as described above. The raw RT of each peptide was normalized to the iRT-C18 space using the traditional Biognosys SiRT reference peptides (Biognosys, Schlieren, Switzerland). In a separate analysis, we ran a query of a theoretical tryptic digest of the entire Swiss-Prot protein database, selecting all annotated species from bacteria up the phylogenetic tree to Homo sapiens. The query generated a table with a peptide amino acid sequence in one column and the number of times that peptide sequence occurs across all species in the database in a second column, rank ordered with the most commonly occurring peptide sequences at the top of the list. Although the surrounding amino acids of a given conserved peptide sequence may differ across species, by specifically querying the peptide sequences produced by theoretical tryptic digest we ensured that the peptide sequence reported is consistent with the most common method for peptide sample preparation used in mass spectrometry experiments. To identify the CiRT list, the peptides common between the yeast and human libraries were cross-referenced against the 500 most commonly occurring tryptic peptide sequences from the UniProtKB/Swiss-Prot database query. The result of this three way comparison was a list of mass-spectrometry accessible peptides that are present in both yeast and human cell lysates and that were also among the most commonly occurring (i.e. highly conserved) peptides across the entire Swiss-Prot protein database (CiRT; supplemental Table S1). We also included a handful of trypsin autolysis peptides, as these are very commonly detected as a consequence of typical MS sample preparation, for a final CiRT list of 113 peptides. The reference iRT value for each CiRT peptide was calculated as an average of the iRT value calculated from the identification of that peptide in the yeast and human lysates.
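The three way comparison and the averaging of iRT values can be sketched in a few lines of Python. This is only an illustration under simple assumptions: libraries are held as dictionaries of peptide sequence to iRT, the conserved peptides arrive as a list already ranked by occurrence, and all sequences and values shown are made up.

```python
def select_cirt_candidates(yeast_irt, human_irt, conserved_ranked, top_n=500):
    """Intersect two iRT-normalized libraries with the top_n conserved peptides.

    Returns peptide -> reference iRT, taken as the mean of the yeast and
    human iRT values for that peptide.
    """
    top_conserved = set(conserved_ranked[:top_n])
    shared = set(yeast_irt) & set(human_irt) & top_conserved
    return {pep: (yeast_irt[pep] + human_irt[pep]) / 2 for pep in sorted(shared)}


# Hypothetical toy inputs (not real CiRT peptides or iRT values).
yeast_irt = {"AAAK": 10.0, "CCCR": 40.0, "DDDK": 70.0}
human_irt = {"AAAK": 12.0, "CCCR": 44.0, "EEEK": 90.0}
conserved_ranked = ["CCCR", "AAAK", "DDDK", "EEEK"]

# With top_n=2 only the two highest-ranked conserved peptides are eligible.
cirt = select_cirt_candidates(yeast_irt, human_irt, conserved_ranked, top_n=2)
```

Trypsin autolysis peptides, which the text adds separately, would simply be merged into the returned dictionary.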

SWATH Mass Spectrometry and OpenSWATH Data Analysis—Peptides analyzed by SWATH-MS were acquired as described previously (6, 15). Briefly, the TripleTOF 5600 mass spectrometer was tuned to allow a quadrupole resolution of 25 m/z mass selection. Precursor MS1 ions were grouped into 26 m/z windows across a 400–1200 m/z range, creating a set of one precursor scan and 32 MS/MS fragmentation scans with 0.5 m/z overlap at either end of a given window. SWATH MS2 spectra were collected between 100–2000 m/z within each window, with collision energy determined as that appropriate for a 2+ charged ion centered within the window with a spread of 15. Total duty cycle was ~3.4 s, with 100 ms accumulation time set for each of the 32 SWATH scans in high sensitivity mode and one high resolution survey scan at the beginning of each cycle.
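As a rough illustration of the window scheme, the following Python sketch tiles the 400–1200 m/z range with 25 m/z steps and extends each window by 0.5 m/z on both sides. The exact boundary values used on the instrument are not given in the text, so the numbers produced here are an assumption, and `swath_windows` is a hypothetical helper name.

```python
def swath_windows(start=400.0, stop=1200.0, step=25.0, overlap=0.5):
    """Return (lower, upper) precursor isolation windows in m/z.

    Each nominal step-wide window is widened by `overlap` at either end,
    giving 26 m/z windows for the default values.
    """
    count = round((stop - start) / step)
    return [
        (start + i * step - overlap, start + (i + 1) * step + overlap)
        for i in range(count)
    ]


windows = swath_windows()
widths = [upper - lower for lower, upper in windows]
```

With the default arguments this yields 32 windows, matching the 32 MS/MS scans per cycle described above.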

Data were analyzed using the recently published OpenSWATH workflow, described in detail elsewhere (7, 15). Briefly, raw wiff files from SWATH-MS acquisitions were converted in profile mode to mzXML using msconvert. This conversion results in wrongly annotated precursor isolation windows, which were corrected with the custom script fix_swath_windows.py. The repaired mzXML file was then split into 33 files (one for each SWATH window, plus the precursor scans) using the custom script split_mzXML_intoSWATH.py, and each mzXML file was subsequently converted to mzML with msconvert. The resulting mzMLs were input into the OpenSWATH workflow, along with the SiRT or a custom internal iRT alignment TraML file, and the appropriate transition library assay TraML file (yeast, human, mouse, or SGS). The candidate peptide signals from the OpenSWATH output were then classified using the pyprophet implementation of the mProphet algorithm (v0.10.0, https://pypi.python.org/pypi/pyprophet/0.10.0) (17). Only peak group identifications with an assay q-value below 1% were included for all downstream analyses.

Assessment of Quantitative Accuracy using the SWATH-MS Gold Standard Data Set—The effect of iRT alignment on quantitative accuracy by SWATH-MS was assessed on the previously described SWATH-MS Gold Standard (SGS) data set, details for which can be found in Rost et al. (7). Briefly, 340 isotopically labeled “heavy” peptides along with the iRT-kit (Biognosys) were spiked into human or yeast cell lysate at increasing twofold concentration increments from 0.116 fmol up to 60 fmol. Lysates spiked with the peptides at each concentration were analyzed in triplicate SWATH-MS acquisitions. An assay library containing each of the spiked-in peptides, aligned to normalized iRT-C18 space, was used for targeted extraction on each SGS data file. XICs and peak groups were generated for each peptide, when observable, using the OpenSWATH workflow with the RT normalization performed in duplicate extractions, once with the iRT-kit peptides and once with the internal custom iRT peptides. Peak groups that did not pass the 1% q-value threshold were given a zero value. Peptide intensity at each concentration in the twofold dilution series was normalized to the intensity value from the highest concentration (30 fmol/µl). The Log (base 2) values of normalized peptide intensities (nLog2) were averaged across technical replicates. Box and whisker plots were generated to test whether the observed intensity of peptides fit the expected linear pattern of a twofold increase in intensity between consecutive dilution steps, that is, a difference in nLog2 intensity that with perfect quantification should be equal to 1. To compare the accuracy of quantification between CiRT and SiRT normalized data sets, the error of quantification was calculated as described in equation 1:

$$\mathrm{Error} = 1 - \left[\mathrm{norm\,log}_2(\mathrm{Dilution}_{X}) - \mathrm{norm\,log}_2(\mathrm{Dilution}_{X-1})\right] \qquad \text{(Eq. 1)}$$

The average error of all peptides observed between any two dilution steps was calculated for the CiRT and SiRT normalized data sets, and one-way ANOVA was used to examine any differences in quantitative error between the CiRT and SiRT normalized data sets.
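The quantification error of equation 1 can be worked through with a short Python sketch. It assumes intensities are listed from the lowest to the highest concentration, and it treats zero values (peak groups that failed the q-value threshold) as missing. That handling is an assumption made for this example, since the text does not say how zeros entered the log calculation, and the function names are hypothetical.

```python
import math


def nlog2(intensities):
    """Log2 of each intensity relative to the highest-concentration sample.

    `intensities` is ordered from lowest to highest concentration; zero
    values are returned as None (treated as missing in this sketch).
    """
    top = intensities[-1]
    if top <= 0:
        raise ValueError("highest-concentration intensity must be positive")
    return [math.log2(v / top) if v > 0 else None for v in intensities]


def step_errors(intensities):
    """Equation 1 for each pair of consecutive twofold dilution steps."""
    values = nlog2(intensities)
    errors = []
    for lower, upper in zip(values, values[1:]):
        if lower is None or upper is None:
            errors.append(None)  # cannot score a step with a missing value
        else:
            errors.append(1 - (upper - lower))
    return errors


perfect = step_errors([1.0, 2.0, 4.0, 8.0])   # ideal twofold series
imperfect = step_errors([0.0, 2.0, 3.0])      # one missing, one 1.5-fold step
```

An ideal twofold series gives an error of 0 at every step, while a 1.5-fold increase gives 1 − log2(1.5), roughly 0.415.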

Algorithmic Refinements to Improve the Robustness of the Automated RT Normalization—First, through manual inspection of OpenSWATH subscores (described in Rost et al., 7) for high and low quality peptide transition groups, we optimized a filter function that was implemented as an extension to the preliminary scoring of the existing OpenSwathRTNormalizer component tool in the OpenSWATH workflow. This filter uses a simplified linear model with pretrained fixed weights (instead of the semi-supervised learning linear discriminant analysis) and is optimized to filter and remove low-scoring candidate signals without requiring RT information. Second, to automatically remove outlier signals introduced by false positive peptide peak group identifications that might introduce noise in the linear regression, we added two additional outlier removal methods. The originally implemented jackknife method optimizing the coefficient of determination of the linear regression was supplemented with an algorithm selecting the largest residual and an implementation and adaptation of the Random Sample Consensus (RANSAC) algorithm (20). Third, to ensure that the selected endogenous normalization peptides and smoothed linear regression cover the whole retention time range, the algorithms require that at least one sampled peptide is present in eight out of ten bins that collectively span the entire retention time range. This option can be enabled in the OpenSwathRTNormalizer component of the OpenSWATH tool using the parameter “-best_peptides” and an assay library of candidate alignment peptides and their assigned iRT values.
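Two of the ideas above, removal of the point with the largest residual and the bin coverage requirement, can be sketched in plain Python. This is an illustrative reimplementation under assumed settings, not the OpenSwathRTNormalizer code: the stopping rule (an R² target of 0.95 and a minimum of 5 points) and all function names are hypothetical, and RANSAC is not shown.

```python
from statistics import mean


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept over (x, y) pairs."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


def r_squared(points, slope, intercept):
    """Coefficient of determination of the fitted line."""
    my = mean(y for _, y in points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - my) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


def prune_largest_residual(points, min_r2=0.95, min_points=5):
    """Drop the worst-fitting point until the fit is good enough."""
    kept = list(points)
    while True:
        slope, intercept = fit_line(kept)
        if r_squared(kept, slope, intercept) >= min_r2 or len(kept) <= min_points:
            return kept, slope, intercept
        worst = max(kept, key=lambda p: abs(p[1] - (slope * p[0] + intercept)))
        kept.remove(worst)


def covers_range(rts, rt_min, rt_max, n_bins=10, min_bins=8):
    """True if at least min_bins of n_bins equal-width RT bins are occupied."""
    width = (rt_max - rt_min) / n_bins
    occupied = {
        min(int((rt - rt_min) / width), n_bins - 1)
        for rt in rts
        if rt_min <= rt <= rt_max
    }
    return len(occupied) >= min_bins


# Ten points on y = 2x + 1 plus one false identification far off the line.
candidates = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(4.5, 60.0)]
kept, slope, intercept = prune_largest_residual(candidates)
```

In this toy case the single off-line point is removed in the first iteration, after which the remaining ten points fit exactly.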

We then created a benchmark data set to test the accuracy and efficiency of the different algorithms for RT peptide selection and RT normalization. We selected one data set where we manually annotated the position of 113 endogenous CiRT peptides (if they could be detected at all in the samples analyzed) to create a “ground truth” value for this file. The original CiRT assay library was complemented with a varying number of additional known false assays (up to a 10-fold excess) created with one of two methods (“assay decoy” or “iRT decoy”). The assay decoys were generated using the OpenSwathDecoyGenerator (7) and appended in ratios from 1:1 to 1:10. These decoys were used to assess the optimized filter function for undetectable peptides. The “iRT decoys” were generated and appended in ratios from 1:1 to 1:10 by duplication of the target assays and randomized shuffling of the iRT coordinates within the list. The goal here was to generate assays that will result in detected signals but with wrong iRT coordinates, challenging the outlier removal algorithms even more.
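The construction of “iRT decoys” by duplication and shuffling can be sketched as follows. The naming scheme for decoy entries, the seed handling, and the function name are assumptions made for this example; note also that a random shuffle can by chance leave an individual peptide with its own iRT value.

```python
import random


def make_irt_decoys(target_assays, ratio=1, seed=0):
    """Duplicate the target assays `ratio` times with shuffled iRT values.

    target_assays maps peptide -> iRT. Each decoy keeps the peptide (so a
    signal is still detected) but receives an iRT drawn from the shuffled list.
    """
    rng = random.Random(seed)
    peptides = list(target_assays)
    decoys = []
    for copy in range(1, ratio + 1):
        shuffled = [target_assays[p] for p in peptides]
        rng.shuffle(shuffled)
        decoys.extend(
            (f"DECOY{copy}_{pep}", irt) for pep, irt in zip(peptides, shuffled)
        )
    return decoys


# Hypothetical target list; a 1:3 ratio appends three shuffled copies.
targets = {"PEPA": -5.0, "PEPB": 20.0, "PEPC": 55.0, "PEPD": 90.0}
decoys = make_irt_decoys(targets, ratio=3)
```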

The linear regressions and transformations from iRT to RT space as computed by the different algorithms were assessed by using the target CiRT assay library as input for the algorithms without any decoys as ground truth. The performance of the algorithms was judged by computing accuracy and recall (using manually validated datapoints and treating the problem as a classical classification problem). In addition, we report the standard deviation of the difference between the predicted retention times using the computed model and the true, known retention time as an objective goodness-of-fit measure.
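These performance measures can be illustrated with a small Python sketch. The text does not spell out the exact definitions, so the standard classification formulas and the sample standard deviation used here are assumptions, as are the function names and toy data.

```python
from statistics import stdev


def accuracy_and_recall(selected, true_points, candidates):
    """Score a set of selected datapoints against manually validated ones."""
    tp = len(selected & true_points)                    # correctly kept
    fp = len(selected - true_points)                    # wrongly kept
    fn = len(true_points - selected)                    # wrongly discarded
    tn = len(candidates - selected - true_points)       # correctly discarded
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    return accuracy, recall


def rt_error_sd(predicted, observed):
    """Sample standard deviation of predicted minus observed RT."""
    return stdev([p - o for p, o in zip(predicted, observed)])


# Hypothetical toy data: six candidate points, three of them validated.
candidates = {"a", "b", "c", "d", "e", "f"}
true_points = {"a", "b", "c"}
selected = {"a", "b", "d"}
accuracy, recall = accuracy_and_recall(selected, true_points, candidates)
sd = rt_error_sd([10.0, 20.0, 30.0], [11.0, 19.0, 30.0])
```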


Deposition of Data—All unpublished data have been deposited for public access: http://www.peptideatlas.org/PASS/PASS00724.

RESULTS

Our goal was to develop a set of commonly detected peptides for internal RT calibration in DIA workflows. The integration of internal RT standards into the DIA workflow involves the following steps (summarized in Fig. 1): (1) normalizing RT within ion libraries using CiRT; (2) predicting and scoring assay peak group RT in a SWATH-MS data file; (3) ensuring CiRT-based RT prediction does not adversely affect quantitative accuracy by SWATH-MS.

Normalization of Assay Library RTs Using Internal CiRT Peptides—One utility of internal RT standards is to be able to convert the raw RT of a given shotgun sample to a normalized RT for easy transfer of peptide transition libraries across laboratories and experimental setups. We tested the capabilities of the CiRT-reference peptides to normalize RT in a total of five different data sets: two unfractionated whole cell lysate samples generated from human and yeast cell lines; one data set composed of 12 separate shotgun files generated from a basic reverse phase fractionation of mouse VSMC digest; and two additional data sets composed of 11 OFFGEL fractions and 15 basic reverse phase fractions generated from human cell lysates that were accessed and downloaded from the PRIDE data repository. To perform the RT normalization step, CiRT-reference peptides were formatted for insertion into the spectrast2spectrast_irt.py script (supplemental Table S4) as a comma separated list containing only the exact peptide sequence followed by a colon and then the assigned iRT value (e.g. PEPTIDE1:iRT1, PEPTIDE2:iRT2, etc).
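The list format described above (peptide sequence, colon, iRT value, comma separated) can be produced and read back with a few lines of Python. This sketch only builds and parses the string; the helper names are hypothetical, and how the script receives the list or treats whitespace around commas is not specified in the text, so the compact form without spaces is an assumption.

```python
def format_irt_list(irt_by_peptide):
    """Join {peptide: iRT} pairs as 'PEPTIDE1:iRT1,PEPTIDE2:iRT2,...'."""
    return ",".join(f"{peptide}:{irt}" for peptide, irt in irt_by_peptide.items())

def parse_irt_list(text):
    """Inverse of format_irt_list; tolerates spaces after the commas."""
    pairs = {}
    for item in text.split(","):
        peptide, irt = item.strip().split(":")
        pairs[peptide] = float(irt)
    return pairs

# Three CiRT_SW peptides with iRT values from Table I.
cirt = {"DSYVGDEAQSK": -14.8, "AGFAGDDAPR": -8.7, "VATVSLPR": 13.4}
irt_argument = format_irt_list(cirt)
print(irt_argument)  # DSYVGDEAQSK:-14.8,AGFAGDDAPR:-8.7,VATVSLPR:13.4
```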

First, we compared the CiRT normalization peptides to exogenous SiRT standards for library RT alignment. Ninety-eight (86%) of the 113 possible CiRT peptides were identified in the yeast library and used to fit its iRT normalization, and 84 (74%) could be detected and used to fit the human library normalization. The errors between the predicted RT (calculated from the normalized iRT value) and the actual observed RT (ΔRT) were comparable between CiRT-peptide and SiRT-peptide alignments (Fig. 2A). For all normalization strategies, the interquartile range for overall error in RT prediction from normalized iRT was within 30 s.

Next, we tested whether the endogenous CiRT peptides could be used to normalize RT within a series of peptide sets fractionated from a starting sample of trypsin digested complex lysate. This presents a unique challenge for the use of endogenous internal reference peptides, because these normalization reference peptides are present prior to peptide fractionation, and the starting list must be large enough to ensure distribution of reference peptides with adequate coverage of the chromatogram in each fraction. We tested the capability of CiRT peptides to align 12 shotgun runs from separate peptide fractions of mouse vascular smooth muscle cell lysate that had been separated by basic reverse phase (bRP) fractionation, as well as 11 samples of OFFGEL fractionated peptides and 15 samples of bRP fractionated peptides, both derived from human cell lysate and accessed via the PRIDE data repository (PXD000953 and PXD000442, respectively). The CiRT peptide alignments were successful for all three sets of files, with each separate alignment exhibiting R² values > 0.92. Tables of the peptides included for alignment of each fraction for the mouse and human OFFGEL data sets are shown in supplemental Tables S2 and S3. Overall, the CiRT alignment predicted RT within 60 s for the majority of assay library peptides generated from the fractionated data sets (Fig. 2B).
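The alignment step and the ΔRT error reported above can be illustrated with an ordinary least-squares fit. The sketch below is not the SpectraST or OpenSWATH implementation: the helper names and the observed retention times are made up for illustration, and only the iRT values come from Table I.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, (sxy * sxy) / (sxx * syy)

# Reference peptides: (assigned iRT from Table I, observed RT in seconds).
# The observed RTs are hypothetical values for one run.
reference = [(-14.8, 612.0), (13.4, 1450.0), (59.8, 2850.0), (101.8, 4105.0)]
slope, intercept, r2 = fit_line([irt for irt, _ in reference],
                                [rt for _, rt in reference])

def delta_rt(irt, observed_rt):
    """Observed RT minus the RT predicted from the normalized iRT value."""
    return observed_rt - (slope * irt + intercept)

print(f"R^2 = {r2:.4f}")
print(f"delta RT for a library peptide: {delta_rt(25.1, 1800.0):.1f} s")
```

In a fractionated data set, a fit of this kind would be computed separately for each fraction from whichever reference peptides were detected there.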

Normalization and Prediction of RT in SWATH-MS Files Using CiRT—In the SWATH-MS targeted data analysis workflow, accurate alignment of the experimental RT to the peptide assay iRT is crucial to enable sensitive detection of peak group candidates. To perform the RT normalization of a SWATH-MS file in the OpenSWATH workflow, ion chromatograms are first extracted for the RT-reference peptides. In order to confidently and accurately assign peak group identity for a normalization peptide, it is imperative that (1) the peptide is present in high abundance within the sample so that the peak group has high intensity and a high signal-to-noise ratio and (2) there is no ambiguity in peak group assignment. If a peptide from the normalization set is not present in the SWATH-MS sample or is present in a moderate to low abundance, this can lead to misidentification of a peak group, erroneous RT assignment for this peptide, and failure of the RT normalization step.

FIG. 1. A schema depicting the process of identifying internal iRT normalization peptides. Top panel: Samples of lysate are digested with trypsin, spiked with synthetic iRT calibration peptides (SiRT), and analyzed by data-dependent mass spectrometry. The most abundant peptides, or those identified across multiple species, are selected as candidates and iRT values are assigned using the linear regression created by referencing the external iRT calibrant peptides. Middle panel: Retention time is normalized across spectral libraries by replacing the SiRT peptides with the selected CiRT peptides in the SpectraST iRT normalization step. For libraries generated from multiple fractions of peptides, larger CiRT lists are required. Bottom panel: Prediction of retention time in SWATH-MS data files using CiRTs requires calibration candidates with very high intensity and a high signal-to-noise ratio. A filtered list of CiRT normalization peptides, created either manually or via the newly written algorithm for CiRT refinement described here, is extracted from the SWATH-MS data and a linear regression is computed to transform iRT to observed RT for that file. The resulting linear equation is used to predict retention time of a given peptide in the assay library within a user-specified confidence window, typically 5 min. Candidate peak groups selected within this window are scored using the OpenSWATH scoring algorithm, where the difference between the experimental retention time and the predicted retention time of a given peptide is scored and contributes to the composite score used to assess confidence in peak group identification.


To address these requirements, we first generated a refined list of 14 CiRT peptides for SWATH-MS data set normalization (CiRT_SW) that represents the most abundant and unambiguous peptides from the 113 CiRT peptide list. We selected these manually, comparing the extracted transition group chromatograms of only the CiRT peptides across the full chromatographic timespan from SWATH-MS runs generated from human, yeast, and mouse cell lysates using Skyline software. It is important to note that this was not a full OpenSWATH analysis, as we did not attempt to score and quantify the peptides using the full algorithm scoring method; we simply manually selected the peptides for high intensity, matching fragment ion co-elution, and equal distribution across the chromatogram. To clarify this point, the total extracted ion chromatograms of the 14 peptides selected for the CiRT_SW list, as well as examples of peptides not selected, are provided in supplemental Fig. S1. This selection resulted in a set of high intensity, high signal-to-noise, and unambiguous CiRT peptides for use in RT normalization of most SWATH-MS files by any experimenter (Table I).

In our initial test of the CiRT method, the refined CiRT_SW list was used to predict RT and extract target peptides in a yeast, human, and mouse cell SWATH-MS data file, respectively. Independent t-tests comparing the ΔRT values for peptides identified in SWATH analyses performed using the two different alignment methods (CiRT versus SiRT) indicated that the accuracy of the RT prediction was significantly different between sets (Fig. 3A; t = 9.36, p < 0.0001 comparing ΔRT values for CiRT and SiRT aligned peptides in human lysates, t = 43.35, p < 0.0001 comparing ΔRT values for CiRT and SiRT in yeast lysates, and t = 6.77, p < 0.0001 in mouse lysates). Neither the SiRT nor the CiRT peptide set showed a consistent bias toward higher accuracy of alignment. The SiRT normalization set exhibited higher absolute error in the human sample (mean ΔRT ± standard deviation of 14.5 ± 15 versus 23.9 ± 94 for CiRT and SiRT aligned peptides, respectively). Conversely, the CiRT set exhibited higher absolute error in the yeast sample (mean ΔRT ± standard deviation of 21.9 ± 43.2 versus 16.2 ± 50.1 for CiRT and SiRT, respectively) and the mouse sample (mean ΔRT ± standard deviation of 35.6 ± 109.6 versus 23.2 ± 106.7 for CiRT and SiRT, respectively).

As an additional and important proof of principle, we tested whether the CiRT method could be used to normalize RT across multiple files from diverse sources (e.g. different labs, different fractionation methods, etc), ultimately yielding a merged library capable of improved and accurate SWATH-MS quantification. We analyzed the same human derived SWATH file described above against a larger, expanded library of 18,521 proteotypic peptides representing 4063 proteins (compared with 9107 peptides from 2195 proteins in the smaller human library derived from a single, unfractionated cell lysate DDA-MS file). Using the expanded library, we identified an additional 3,899 unique peptides and 774 proteins from the same human SWATH-MS sample originally queried against the smaller library of human peptides (Fig. 4).

FIG. 2. Assay library retention time normalization between synthetic and internal calibration peptides. The difference between observed and predicted retention time (ΔRT) was calculated for each peptide included in the assay libraries generated from the same raw files of human and yeast lysate (A), and from three different sets of fractionated peptides from mouse (generated in house) and human cell lysate (data downloaded from the PRIDE MS data repository), separated by either basic reverse phase (bRP) or OFFGEL fractionation methods (B). Retention times were predicted separately using linear regression equations created from a large list of 113 common iRT (CiRT) peptides or the 11 synthetic iRT peptides (SiRT). Data are presented as box and whisker plots, with the middle quartiles surrounding the median for the entire assay library represented by the box, whiskers showing the 95% data range, and the upper and lower 2.5% of all values plotted as individual data points. Note: for some data sets the distribution of ΔRT was within such a small range that the box and whisker plots appear as a single horizontal line. Y-axis ranges were set at values that demonstrate the full range of error.


Comparison of the Quantitative Accuracy of CiRT Versus iRT Normalization—The SWATH-MS technique is a method of label free quantification. The RT alignment between assay library and SWATH-MS data file is a critical component of peptide identification and subsequent quantification, and therefore inaccurate alignments may negatively influence quantitative accuracy. Given the observed differences in ΔRT between SiRT and CiRT peptide sets, we examined whether these differences were sufficiently large to influence the identification and subsequent quantification of peptides by SWATH-MS. To do this, we compared the peak group intensities of each peptide identified with < 1% FDR between the different RT normalization sets (Fig. 3B). For the human lysate, 530 peptides (6.9%) had different intensities between the two RT normalization sets. Among these, 328 were detected with > 99% confidence of identification exclusively in the CiRT normalized data file, and 20 were found only in the SiRT data file. An additional 208 peptides with different peak group XIC intensity values were detected in both CiRT and SiRT normalized data files, but the observed RTs of these peptides differed between the two files, suggesting erroneous or ambiguous peak group identification. Examination of the ΔRTs for these peptides indicated a roughly equivalent likelihood that the SiRT or CiRT peptide set more accurately predicted RT. Finally, the remaining 74 peptides (0.8%) had equal values for observed RT, and the difference in intensity appears to result from the peak being located at the left or right border of the RT window used for peak extraction, which can lead to slight differences in the extracted intensity for some but not all transition group peaks. The quantitative discrepancies were similarly distributed for the yeast sample, with 2.6% of all peptides found to have different intensity between the CiRT and SiRT alignments. These discrepancies were attributed to a subset of peptides in both standards: 63 peptides identified only in the CiRT aligned data and 8 peptides identified only in SiRT aligned data, with 51 (0.8%) of peptides identified in both alignments but with different observed retention times and 46 (0.7%) detected in both alignments but with the same retention time. For the mouse sample, a higher overall percentage of discrepancies was found (8.1%), with the majority of error attributed to the observation of 365 peptides detected only in the SiRT aligned data and 165 peptides detected only in the CiRT_SW aligned data.

Taken together, these data indicate that slight differences in the accuracy of RT prediction by a given RT alignment set do not appear to substantially bias the overall distribution of quantitative errors in the SWATH-MS data set. This observation can be explained in part by the use of multiple orthogonal peak group scoring parameters in the OpenSWATH analysis software, of which RT is only one component. Finally, it is important to note that most discrepancies occur in the lower intensity range (e.g. for less confident peptide identifications), and that the percentage of quantitative differences between peptides in the SiRT and CiRT aligned data was 10% or less of all peptides quantified in each sample.

TABLE I
CiRT_SW peptides

Peptide sequence             Protein                                   iRT
DSYVGDEAQSK                  Actin                                     -14.8
AGFAGDDAPR                   Actin                                     -8.7
ATAGDTHLGGEDFDSR             Heat shock protein SSA3                   5.2
VATVSLPR                     Trypsin                                   13.4
ELISNASDALDK                 ATP-dependent molecular chaperone HSP82   25.1
IGPLGLSPK                    60S ribosomal protein L12                 29.4
TTPSYVAFTDTER                Heat shock protein SSA3                   34.8
VC[160]ENIPIVLC[160]GNKVDVK  GTP-binding nuclear protein GSP1/CNR1     55.0
DLTDYLMK                     Actin                                     59.8
LGEHNIDVLEGNEQFINAAK         Trypsin                                   60.0
SYELPDGQVITIGNER             Actin                                     66.9
YFPTQALNFAFK                 ADP/ATP translocase                       93.5
SNYNFEKPFLWLAR               GTP-binding nuclear protein GSP1/CNR1     94.0
DSTLIMQLLR                   14-3-3 protein                            101.8

To further ensure that the CiRT_SW normalization achieves comparable quantitative sensitivity and accuracy relative to SiRT normalization, we compared quantitative performance of the two RT alignment methods on a “gold standard” data set composed of 10 twofold dilution steps of 340 chemically synthesized stable isotope standard peptides, beginning at a concentration of 30 fmol/µl and ending at 0.058 fmol/µl (described in detail in Methods). The median and distribution of normalized log2 (nLog2) intensities for the peptides observed at each dilution step are shown in Fig. 5A. A comparison of the slope m in the linear relationship describing observed and expected mean nLog2 intensity (Fig. 5B) showed no significant differences in quantitative accuracy between the RT alignment by SiRT or CiRT, as indicated by overlap in the S.D. of the slopes (m = 0.889 ± 0.018 for SiRT and m = 0.865 ± 0.019 for CiRT in yeast lysate, m = 1.031 ± 0.025 for SiRT and m = 1.037 ± 0.030 for CiRT in human cell lysate). The mean error of quantification (Fig. 5C) indicated that although there were noticeable matrix effects as well as an expected decrease in accuracy and increase in variability at the lowest concentrations, the error in CiRT and SiRT normalized data at each dilution step and in either matrix was not significantly different. These data indicate that the chosen RT normalization approach (CiRT versus SiRT) did not significantly contribute to any errors in quantification.
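The arithmetic behind the dilution series can be checked with a short sketch: ten twofold steps starting at 30 fmol/µl end at 30/2^9, about 0.0586 fmol/µl, in line with the 0.058 fmol/µl quoted above, and the expected normalized log2 intensity drops by one unit per step. The `observed` values and the helper `ls_slope` below are hypothetical and serve only to show how a slope m is obtained; they are not data from the study.

```python
import math

START = 30.0  # fmol/µl, highest concentration in the series
STEPS = 10    # number of twofold dilution steps

concentrations = [START / 2 ** k for k in range(STEPS)]
# Expected normalized log2 intensity relative to the 30 fmol/µl point: 0, -1, ..., -9
expected_nlog2 = [math.log2(c / START) for c in concentrations]

def ls_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical observations with a slightly compressed response (not study data).
observed = [0.9 * e for e in expected_nlog2]
m = ls_slope(expected_nlog2, observed)

print(f"lowest concentration: {concentrations[-1]:.4f} fmol/µl")
print(f"slope m = {m:.3f} (1.0 would indicate perfectly proportional quantification)")
```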

Performance and Robustness of the Automated RT Normalization Algorithms—Although the CiRT lists compiled and tested as described above were chosen to be generalizable to multiple species and experimental designs so that many experimenters can immediately implement them for internal RT calibration of DIA-MS files, these particular peptides will not be found in every possible experimental sample. For experimental preparations in which the CiRT_SW peptides are not present, researchers can use other candidate internal reference peptides from their library to predict RT for peptides assayed in SWATH-MS files. This process is not trivial: as mentioned above, endogenous peptides can exhibit more variation in signal quality among a given set of reference RT standards relative to synthetic peptides, and in our case we needed to hand curate the CiRT_SW peptides to identify the best candidates for SWATH-MS normalization. We have therefore amended the workflow in order to automate the selection of endogenous peptide standards for SWATH-MS normalization from a large list of candidate standards. This modified approach has now been implemented as a new filter function in the OpenSwathRTNormalizer tool (see Methods).

FIG. 3. Accuracy between retention time prediction methods for peak group extraction from SWATH-MS data files. A, The difference between observed and predicted retention time (ΔRT) of each of the confidently identified (FDR < 1%) peak groups defined by an RT-normalized assay library and extracted from a human, yeast, or mouse SWATH-MS run is compared between conditions where the assay library and SWATH-MS normalization were performed with synthetic iRT (SiRT) or common internal iRT (CiRT) normalization peptides. Data are presented as box and whisker plots, with the middle quartiles surrounding the median for the entire assay library represented by the box, whiskers showing the 95% data range, and the upper and lower 2.5% of all values plotted as individual data points. B, Correlation between the intensity of a given peptide as determined by the SiRT or CiRT normalization procedure. Each dot represents the summed intensity of all transitions extracted for a given peptide peak group from the same raw file, with the only difference being the method of RT normalization. C, Distribution of matching and mismatching peptide peak group intensities for the human (left), yeast (middle), and mouse (right) derived samples. Pie charts depict the overall distribution of peptides with matching or mismatching intensity values between CiRT and SiRT aligned data sets. Horizontal bars show the distribution of peptides among different categories explaining mismatched intensity values.

Using a benchmark data set (described in Methods), we evaluated the performance of the new filter function designed to remove low-scoring signals and select the best reference peptides for RT prediction using a combination of three outlier removal algorithms: jackknife, largest residual removal, and Random Sample Consensus (RANSAC) (20). We found that the filter function confidently removes low-scoring signals (originating from decoys) at least up to a tenfold excess of false signals. The best-performing algorithm (iterative jackknife with removal of low-quality peaks) was able to produce robust regression models even in the presence of a very large number of false signals. Specifically, the measured error of the linear model increased only minimally from 3.76 to either 3.81 or 3.86 for the two different methods of adding false signal (Fig. 6A, red and green curves), whereas a substantially larger increase in the model error was observed when not removing low-quality peaks (Fig. 6A, cyan and magenta curves). These results suggest that the new method, implemented in the publicly available development version of the OpenSWATH software, is robust enough to deal with nonoptimal lists of potential CiRT candidates provided by an experimenter and to filter a substantial amount of noise in the data to select the best reference peptides for RT normalization.
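The idea behind an iterative jackknife outlier removal can be sketched as follows: while the fit quality is below a threshold, leave each point out in turn and permanently drop the one whose removal improves R² the most. This is a simplified illustration under stated assumptions and not the OpenSwathRTNormalizer implementation; the function names, the R² threshold of 0.95, the minimum of five retained points, and the example data are all assumptions, and the peak-quality prefilter described above is not modeled.

```python
def r_squared(points):
    """Squared Pearson correlation of (iRT, RT) pairs; 0.0 for degenerate input."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def jackknife_filter(points, min_r2=0.95, min_points=5):
    """Drop one point per iteration until R^2 reaches min_r2.

    In each iteration every point is left out in turn, and the point whose
    removal yields the highest R^2 for the remaining points is discarded.
    """
    kept = list(points)
    while len(kept) > min_points and r_squared(kept) < min_r2:
        drop = max(range(len(kept)),
                   key=lambda i: r_squared(kept[:i] + kept[i + 1:]))
        del kept[drop]
    return kept

# Ten collinear "true" signals plus one false signal (all values made up).
inliers = [(float(x), 2.0 * x + 5.0) for x in range(10)]
outlier = (4.5, 60.0)
kept = jackknife_filter(inliers[:5] + [outlier] + inliers[5:])
print(len(kept), "points retained; outlier removed:", outlier not in kept)
```

Each iteration refits the model once per remaining point, so the cost grows quadratically with the number of candidates; removing low-quality peaks first, as described above, reduces both the cost and the risk of discarding true signals.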

Upon investigation of individual regression models, even in the presence of a large amount of false signals (tenfold excess of decoys relative to targets), the new algorithm was able to retrieve almost all of the correct signals and produce a linear model of high quality (R² = 0.95) (see Fig. 6B). The automated selection of target signals from noise was made with an accuracy of 94.9%, determined from the observation of 62 (5%) decoy peak groups that were incorrectly retained in the linear regression, out of a total of 1364 peak groups initially entered (black dots in Fig. 6B), with 1221 decoy peak groups correctly discarded by the filter function. Of the 81 “known true” target signals from the input data, 74 peak groups were correctly retained and used for the regression, representing a 91.3% recall rate for this data set (red dots in Fig. 6B). As noted above, the regression models obtained from the noisy data were extremely similar to the ones obtained using only the 81 “known true” target data points without the decoys added (black and dotted red lines in Fig. 6B). Thus, we show that the enhanced filter function in the OpenSwathRTNormalizer tool can automatically select correct candidate peptides for accurate RT alignment from a large input list containing up to 90% false signals. To test the practical utility of the novel algorithm, we repeated the OpenSWATH analysis of each yeast, human, and mouse file originally aligned by the CiRT_SW peptides, but instead entered the entire set of 113 CiRT peptides (CiRT_ALL) and allowed the -best_peptides function to automatically select ideal candidates for alignment. This yielded slightly less precise alignment of RT, demonstrated by a slightly higher ΔRT value distribution relative to that of the CiRT_SW; however, the quantitative outcome was essentially comparable to alignments performed using the manually selected peptides (supplemental Fig. S2).
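The accuracy and recall figures above can be retraced from the reported counts. The sketch assumes the usual classification definitions, accuracy = (targets kept + decoys removed) / all peak groups and recall = targets kept / all known targets; the text does not spell out the formulas, but this reading reproduces the reported 94.9% accuracy, and the recall evaluates to roughly 91.4%, which agrees with the reported 91.3% up to rounding.

```python
def classification_summary(targets_kept, targets_total, decoys_removed, decoys_total):
    """Return (accuracy, recall) for a target/decoy filtering result."""
    total = targets_total + decoys_total
    accuracy = (targets_kept + decoys_removed) / total
    recall = targets_kept / targets_total
    return accuracy, recall

total_peak_groups = 1364
targets_total = 81                                 # "known true" target signals
decoys_total = total_peak_groups - targets_total   # 1283 = 1221 removed + 62 retained
accuracy, recall = classification_summary(
    targets_kept=74, targets_total=targets_total,
    decoys_removed=1221, decoys_total=decoys_total,
)
print(f"accuracy = {accuracy:.3f}, recall = {recall:.3f}")  # accuracy = 0.949, recall = 0.914
```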

DISCUSSION

The CiRT peptides identified in this manuscript provide a method to normalize RT for peptide identification and quantification in SWATH data analysis across experiments performed on most eukaryotic species. The endogenous CiRT peptides described here provide an alternative to the use of the spiked-in SiRT standard peptides in every single sample of a SWATH experiment. We show here that the CiRT peptides are detected in human, mouse, and yeast protein samples, and therefore represent tryptic peptide sequences that are highly conserved across eukaryotic species and generalizable to many common experimental preparations. We have tested the CiRT method and shown that these peptides effectively align the raw RTs of a peptide transition library to the normalized iRT space, and further that RT prediction and normalization by the CiRT peptides results in peptide identification and quantification that is comparably accurate to that achieved with SiRT peptides.

FIG. 4. Expansion of peptide assay library using archived data sets improves SWATH-MS quantitative depth. A, The precision of RT prediction within the same SWATH-MS data file was slightly lower but still comparably accurate between a larger peptide assay library composed of human peptide data from three diverse sources and the original, small human library built from the same lysate digest as the SWATH-MS data being analyzed. B, Use of the expanded library increased the number of nonredundant peptide sequences and corresponding proteins quantified from the same DIA-MS data file relative to the smaller library.

FIG. 5. Use of internal retention time prediction peptides does not alter the accuracy of peptide quantitation by SWATH-MS. A, Synthetic, heavy peptides were spiked into cell lysate from either human-derived (upper panel) or yeast (lower panel) cells at progressively twofold decreasing concentrations from 30 femtomoles to 0.058 femtomoles on column. Assay libraries, normalized to iRT, for the 340 synthetic peptides were used to extract peak groups from SWATH files, and the intensity of a given peak group was normalized to its observed intensity at 30 fmol and set to log scale with a base of 2. Perfectly accurate quantification would therefore represent a single unit increase between dilution steps. Quantification subsequent to synthetic iRT prediction is shown on the left and the CiRT shown on the right. B, Comparison of the linear estimate comparing the mean observed normalized Log2 intensities plotted against that which would be expected based on the actual concentration of heavy peptide spiked into a given sample. Each data point represents the mean ± standard deviation (S.D.) of nLog2 intensity across all peptides observed at a given concentration. C, A plot of the mean absolute error calculated for each dilution step of the 10 × 2-fold dilution series (described in Methods). No peptides were detected in the lowest concentration, and as such there are 8 dilution steps used to calculate quantification error (see Methods for the equation used for error estimation). Values are presented as mean ± S.D. at each dilution step, for each RT normalization method (CiRT versus SiRT) in yeast and human samples.

We based the CiRT approach on the original concept introduced by Escher et al. (1). In the design of their 11 synthetic peptides for RT alignment, they applied specific criteria to optimize the performance of their peptides: (1) intensity of the ionized peptide; (2) absence of amino acids prone to modification; (3) nonnaturally occurring peptide sequences; and (4) distribution of peptide RT across the chromatogram. The selection of internal CiRT peptides differed slightly from that of Escher et al. in that the peptides were selected first and foremost based on their frequently occurring and evolutionarily conserved sequence of amino acids across many eukaryotic samples, with the additional caveat that the peptides are readily observed by MS analysis of our own human and yeast samples. We could have potentially expanded the list of CiRT candidates by cross referencing the SWISS-PROT conserved peptide list with other MS data repositories such as NIST or PRIDE; however, we were constrained at the time of the study by the requirement for both (1) a TripleTOF generated DDA spectrum for each peptide in order to compose a transition group library and (2) an iRT value for each peptide, preferably iRTs from multiple observations in different sample matrices (e.g. human, mouse, yeast lysates). As the volume of appropriate and publicly accessible data sets increases, we believe that the list of CiRT reference peptides for iRT alignment can be expanded by our own group as well as the greater MS community.

The CiRT peptides are endogenous to the samples, and as such, in assigning iRT values to the CiRT candidates we allowed for two modifications that occur very commonly in MS sample preparations: oxidation of methionine and carbamidomethylation of cysteines. Allowing these commonly occurring modification states among the CiRT peptides makes them more consistent with the likely state of the peptide in an actual sample. This criterion differs from that necessary for the SiRTs, because the assigned iRT for SiRT peptides is based on their unmodified state, and they are exogenously added to the sample, so any modification to them would result in altered RT properties and render them useless or even detrimental for iRT alignment. With CiRTs, the iRT value corresponding to the exact modified (or unmodified, as was commonly the case) sequence was determined from experimental data based on the most commonly observed state of the peptide in actual biological samples. For both the manual selection of CiRT_SW peptides and the refined algorithm we have applied criteria similar to those of Escher et al., specifically that the CiRT peptides are well distributed across the chromatographic range, and that their extracted peak groups exhibit high intensities so they can be easily distinguished from noise and false positives.

FIG. 6. Robustness of the computed linear alignment in the presence of noise signal when using a jackknife approach for outlier removal. A, Increasing the number of noise signals from 2× to 10× substantially impacts the error of the linear model (measured as standard deviation of the residuals of the correct signals) only if peak quality is not taken into account. If a peak quality threshold is used and low-quality peaks are discarded (open symbols), the error of the linear model is almost constant even if a large number of false signals are present. B, Example robust regression in the presence of a 10-fold excess of noise peaks using the jackknife approach to remove outlier signal. In red are all known correct datapoints and in black are noise datapoints that by chance correlate with the correct ones and thus pass the filtering. The dashed black regression line was obtained from all shown datapoints (R² = 0.95); the solid black regression line is the regression model obtained from only the known correct datapoints (R² = 0.98). The measured error of the linear model is 3.86 (whereas the error when only using the known correct datapoints is 3.76), while achieving an accuracy of 94.9%.

The CiRT endogenous peptide sets can be used synergistically with exogenous SiRT standards or as an alternative RT standard in many eukaryotic samples. The iRT assigned to the CiRT peptides in the current manuscript was calculated as an average of iRT assigned from the observation of the peptide in our experimental preparations of a human, yeast, and mouse sample. Although iRTs for a given peptide are not expected to vary substantially, it may be prudent to acquire one or more test samples spiked with the SiRT peptides and determine experiment-specific iRT values for the endogenous CiRT peptides. Once calibrated in this way, within a given experimental design internal CiRTs can be used rather than spiking SiRT peptides into every single sample as a cost-efficient strategy for RT normalization.
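The calibration suggested above, using one SiRT-spiked test run to assign experiment-specific iRT values to endogenous CiRT peptides, can be sketched as a linear fit from observed RT to iRT on the spiked standards, followed by applying that fit to the CiRT retention times from the same run. All numbers below are hypothetical (chosen so that the CiRT results land on the Table I values), and the function names are introduced here for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def calibrate_cirt_irts(sirt_points, cirt_rts):
    """Assign iRT values to CiRT peptides from a SiRT-spiked run.

    sirt_points: list of (observed RT, known iRT) for the spiked standards.
    cirt_rts:    {peptide: observed RT} for CiRT peptides in the same run.
    """
    slope, intercept = fit_line([rt for rt, _ in sirt_points],
                                [irt for _, irt in sirt_points])
    return {pep: slope * rt + intercept for pep, rt in cirt_rts.items()}

# Hypothetical test run: RT in minutes, with iRT = 2 * RT - 40 for the standards.
sirt_points = [(10.0, -20.0), (30.0, 20.0), (50.0, 60.0), (70.0, 100.0)]
cirt_rts = {"VATVSLPR": 26.7, "DLTDYLMK": 49.9}
for peptide, irt in calibrate_cirt_irts(sirt_points, cirt_rts).items():
    print(f"{peptide}: iRT = {irt:.1f}")
```

With several test runs, the per-run iRT values for each peptide could be averaged, mirroring how the iRT values in this manuscript were averaged across human, yeast, and mouse observations.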

In addition, this approach will allow access to archived LC MS data sets that were generated without standards or with different internal standards, as we have demonstrated with the inclusion of two data sets downloaded from the PRIDE data repository. These can be normalized to iRT space and used for building of peptide assay libraries. This approach is applicable even when the candidate peptides are spread out between multiple fractions, provided the list of candidates is sufficiently large to result in an even spread of RT normalization peptides across all fractions. We demonstrated this concept with the alignment of RT across two separate fractionated data sets acquired from different laboratories and combined with our own data file from human cell lysate to generate a merged and expanded human library, which notably enhanced peptide detection and quantification by SWATH-MS analysis.

Finally, we have ensured the generalizability of the method for workflows in which these specific CiRT peptides are not detected (e.g. nontryptic digestions, PTM enriched samples, affinity purifications) by refining the “OpenSwathRTNormalizer” algorithm for automated selection of internal RT alignment reference peptides. In our example, the refined algorithm was able to normalize RT using a list where as few as 10% of the internal RT reference peptide candidates were actually present in the sample. Therefore, when the CiRT list described here does not apply, alternative lists of internal RT alignment candidates can be generated by the user to be appropriate for their experimental preparation (e.g. immunoprecipitations, PTM enrichments) or organism. The amended “OpenSwathRTNormalizer” algorithm with the “best_peptides” feature will select a refined subset of the user-defined RT alignment candidates from their input list for RT normalization in subsequent SWATH experiments.

The CiRT peptide lists are provided in the appropriate formats for insertion into the OpenSWATH conversion and SWATH-MS iRT-normalization programs, and downloads of these formatted lists are provided in the supplemental material, as are brief instructions for their insertion into the workflow (supplemental Table S4). The algorithms for internal iRT selection from large candidate lists for SWATH-MS iRT alignment are also now implemented in a new version of OpenSWATH (www.openswath.org), which will allow users to generate custom RT normalization sets applicable to virtually any experiment and workflow.

Acknowledgments: We thank L. Gillet for providing and preparing the yeast samples used in the current manuscript, and L. Gillet and P. Navarro, who worked along with the co-authors of the current manuscript H. Rost and G. Rosenberger to prepare and acquire the SWATH Gold Standard data set, as described in (7).

* This work was supported in part by a European Molecular Biology Organization short term fellowship ASTF 456-2013 awarded to S.J.P. H.L.R. was funded by ETH Zurich (ETH-30 11-2). G.R. was funded by the Swiss Federal Commission for Technology and Innovation CTI (13539.1 PFFLI-LS). K.R. acknowledges support from Canadian Institute of Health Research, FRN: MFE123700. The project was also supported in part by the SNSF (Grant# 3100A0-688 107679), the European Research Council (Grant# ERC-2008-AdG 233226) to R.A., the US NHLBI Proteomics Contract HHSN268201000032C and R01 (R01 AR41135–18) to JVE and salary support by the National Marfan Foundation Victor E McKusik Post-Doctoral Fellowship to S.J.P.

This article contains supplemental Figs. S1 and S2 and Tables S1 to S4.

¶¶ To whom correspondence should be addressed: ETH Zurich, Institute of Molecular Systems Biology, HPT E 78, Auguste-Piccard-Hof 1, 8093 Zurich, Switzerland. Tel.: +41 44 633 31 70; Fax: +41 44 633 15 32; E-Mail: [email protected].

REFERENCES

1. Escher, C., Reiter, L., MacLean, B., Ossola, R., Herzog, F., Chilton, J., MacCoss, M. J., and Rinner, O. (2012) Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–1121

2. Klammer, A. A., Yi, X., MacCoss, M. J., and Noble, W. S. (2007) Improving tandem mass spectrum identification using peptide retention time prediction across diverse chromatography conditions. Anal. Chem. 79, 6111–6118

3. Pfeifer, N., Leinenbach, A., Huber, C. G., and Kohlbacher, O. (2009) Improving peptide identification in proteome analysis by a two-dimensional retention time filtering approach. J. Proteome Res. 8, 4109–4115

4. Gallien, S., Peterman, S., Kiyonami, R., Souady, J., Duriez, E., Schoen, A., and Domon, B. (2012) Highly multiplexed targeted proteomics using precise control of peptide retention time. Proteomics 12, 1122–1133

5. Bateman, N. W., Goulding, S. P., Shulman, N. J., Gadok, A. K., Szumlinski, K. K., MacCoss, M. J., and Wu, C. C. (2014) Maximizing peptide identification events in proteomic workflows using data-dependent acquisition (DDA). Mol. Cell. Proteomics 13, 329–338

6. Gillet, L. C., Navarro, P., Tate, S., Rost, H., Selevsek, N., Reiter, L., Bonner, R., and Aebersold, R. (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717

7. Rost, H. L., Rosenberger, G., Navarro, P., Gillet, L., Miladinovic, S. M., Schubert, O. T., Wolski, W., Collins, B. C., Malmstrom, J., Malmstrom, L., and Aebersold, R. (2014) OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223

8. Rost, H., Malmstrom, L., and Aebersold, R. (2012) A computational tool to detect and avoid redundancy in selected reaction monitoring. Mol. Cell. Proteomics 11, 540–549

9. Karlsson, C., Malmstrom, L., Aebersold, R., and Malmstrom, J. (2012) Proteome-wide selected reaction monitoring assays for the human pathogen Streptococcus pyogenes. Nat. Commun. 3, 1301

10. Lam, H., and Aebersold, R. (2011) Building and searching tandem mass (MS/MS) spectral libraries for peptide identification in proteomics. Methods 54, 424–431

11. Picotti, P., Clement-Ziza, M., Lam, H., Campbell, D. S., Schmidt, A., Deutsch, E. W., Rost, H., Sun, Z., Rinner, O., Reiter, L., Shen, Q., Michaelson, J. J., Frei, A., Alberti, S., Kusebauch, U., Wollscheid, B., Moritz, R. L., Beyer, A., and Aebersold, R. (2013) A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature 494, 266–270

12. Schubert, O. T., Mouritsen, J., Ludwig, C., Rost, H. L., Rosenberger, G., Arthur, P. K., Claassen, M., Campbell, D. S., Sun, Z., Farrah, T., Gengenbacher, M., Maiolica, A., Kaufmann, S. H., Moritz, R. L., and Aebersold, R. (2013) The Mtb proteome library: a resource of assays to quantify the complete proteome of Mycobacterium tuberculosis. Cell Host Microbe 13, 602–612

13. Picotti, P., and Aebersold, R. (2012) Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat. Meth. 9, 555–566

14. Huttenhain, R., Soste, M., Selevsek, N., Rost, H., Sethi, A., Carapito, C., Farrah, T., Deutsch, E. W., Kusebauch, U., Moritz, R. L., Nimeus-Malmstrom, E., Rinner, O., and Aebersold, R. (2012) Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics. Sci. Transl. Med. 4, 142ra194

15. Collins, B. C., Gillet, L. C., Rosenberger, G., Rost, H. L., Vichalkovski, A., Gstaiger, M., and Aebersold, R. (2013) Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat. Meth. 10, 1246–1253

16. Liu, Y., Huttenhain, R., Surinova, S., Gillet, L. C., Mouritsen, J., Brunner, R., Navarro, P., and Aebersold, R. (2013) Quantitative measurements of N-linked glycoproteins in human plasma by SWATH-MS. Proteomics 13, 1247–1256

17. Rosenberger, G., Koh, C. C., Guo, T., Rost, H. L., Kouvonen, P., Collins, B. C., Heusel, M., Liu, Y., Caron, E., Vichalkovski, A., Faini, M., Schubert, O. T., Faridi, P., Ebhardt, H. A., Matondo, M., Lam, H., Bader, S. L., Campbell, D. S., Deutsch, E. W., Moritz, R. L., Tate, S., and Aebersold, R. (2014) A repository of assays to quantify 10,000 human proteins by SWATH-MS. Scientific Data 1, Article no. 140031

18. Segura, V., Medina-Aunon, J. A., Mora, M. I., et al. (2014) Surfing transcriptomic landscapes. A step beyond the annotation of chromosome 16 proteome. J. Proteome Res. 13, 158–172

19. Teleman, J., Roest, H. L., Rosenberger, G., Schmitt, U., Malstroem, L., Malstroem, J., and Levander, U. (2015) DIANA - algorithmic improvements for analysis of complex peptide sample data-independent acquisition MS data. Bioinformatics 31, 555–562

20. Fischler, M. A., and Bolles, R. C. (1981) Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 381–395
