
Reproducible single shot plasma proteome profiling with high throughput capabilities on a robust capillary flow setup

Roland M. Bruderer1, Oliver M. Bernhardt1, Tejas Gandhi1, Sebastian Müller1, Polina Mironova2, Ondine Walter2, Jérôme Carayol2, Jörg Hager2, Armand Valsesia2, Loïc Dayon2, Jan Muntel1, Arne Astrup3, Wim H.M. Saris4 and Lukas Reiter1

1) Biognosys, 8952 Zurich-Schlieren, Switzerland; 2) Nestlé Institute of Health Sciences, Lausanne, Switzerland; 3) University of Copenhagen, Copenhagen, Denmark; 4) Maastricht University Medical Centre, Maastricht, The Netherlands

Introduction

Proteins in the circulating blood are indicative of an individual's health status, and comprehensive, robust, high-throughput analysis of the plasma proteome promises a holistic view of the health state. Three recent studies advanced this goal using nano-flow LC-MS with data-dependent acquisition (DDA), combined with either TMT labeling or label-free quantification with MS1 alignment (Cominetti et al. 2015 JPR, Geyer et al. 2016 MSB, Geyer et al. 2016 Cell Systems). The average number of protein identifications per run ranged from 190 with TMT (180 min method, 4-plex) to 284 (33 min method) and 437 (80 min method) with the label-free approaches. However, nano-flow setups are delicate and add long overhead times to every gradient. To reduce this limitation substantially, we established a robust capillary-flow LC-MS workflow with data-independent acquisition (DIA) and applied it to DiOGenes, a pan-European program that addresses obesity from a dietary perspective, seeking new insights and new routes to prevention. In total, 1,546 samples from this project were analyzed.

Methods

Plasma samples were prepared with an optimized in-solution digestion protocol using Biognosys’ sample preparation kit, and Biognosys’ iRT kit was spiked into the samples before injection. Samples were acquired on a Thermo Scientific Fusion Lumos mass spectrometer coupled to a capillary-flow LC setup consisting of a Waters M-Class UPLC and a 300 µm × 15 cm Waters CSH column (1.7 µm particles), connected via an EASY transfer line to an EASY-Spray source. For spectral library generation, samples were fractionated by UHPLC high-pH reversed-phase (HPRP) fractionation and acquired as shotgun runs, which were searched with our new search engine Pulsar, integrated in Spectronaut, and, for comparison, with MaxQuant. Spectral libraries were generated with Biognosys’ Spectronaut. DIA runs were acquired with a 40 min gradient and a 45 min injection-to-injection cycle, and targeted analysis of the DIA runs was performed in Spectronaut.
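The iRT kit spiked into every sample serves as a retention time reference. As a minimal sketch of the underlying idea, assuming a simple linear relationship, the observed retention times of the reference peptides can be regressed against their iRT values, and any other retention time in the run can then be converted onto the iRT scale. The reference and observed values below are hypothetical, the helper names `fit_irt` and `rt_to_irt` are illustrative, and Spectronaut's own calibration may use a different (for example non-linear) model.

```python
from statistics import mean


def fit_irt(irt_values, observed_rt):
    """Ordinary least-squares fit of observed RT (min) = slope * iRT + intercept."""
    x_bar = mean(irt_values)
    y_bar = mean(observed_rt)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(irt_values, observed_rt))
    sxx = sum((x - x_bar) ** 2 for x in irt_values)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def rt_to_irt(rt, slope, intercept):
    """Map a retention time from this run onto the dimensionless iRT scale."""
    return (rt - intercept) / slope


# Hypothetical iRT reference values and apex retention times (min), illustration only.
reference_irt = [-25.0, 0.0, 20.0, 45.0, 70.0, 100.0]
observed_rt = [4.0, 9.0, 13.0, 18.0, 23.0, 29.0]

slope, intercept = fit_irt(reference_irt, observed_rt)
print(f"slope {slope:.2f} min per iRT unit, intercept {intercept:.2f} min")
print(round(rt_to_irt(13.0, slope, intercept), 1))  # 20.0
```

Once a run is calibrated this way, library retention times stored in iRT units can be translated into expected elution times for that run, which narrows the extraction windows for targeted analysis.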

Conclusions

The capillary-flow LC-MS setup enabled robust identification and quantification of more than a thousand plasma samples. A 40 min gradient offered the best balance between reproducible identification, method overhead time and throughput. At a depth of close to 500 proteins per acquisition, data set completeness reached 86%, with peptide coefficients of variation below 10% (ten repeated injections). The setup was successfully used to analyze 1,546 samples of the DiOGenes study. All acquisitions were performed on a single column, and with one ion funnel cleaning and 17 transfer capillary cleanings, the signal intensity remained constant. Variance analysis of the pools included on every plate showed a median protein-level coefficient of variation of 16% across the 1,546 runs. Cumulatively, 590 proteins were identified.
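The completeness and coefficient-of-variation figures can be illustrated with a small protein × injection intensity matrix. This sketch assumes common definitions that the poster does not spell out: completeness as the share of non-missing values, and CV as the sample standard deviation divided by the mean of each protein's observed intensities, summarized by the median across proteins. The protein names and intensities are hypothetical, and Spectronaut's own calculation may differ, for example in normalization or in the handling of missing values.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample CV in percent over observed values; None marks a missing identification."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return None  # CV is undefined with fewer than two observations
    return 100.0 * stdev(observed) / mean(observed)


def median_cv(matrix):
    """Median CV across proteins, skipping proteins whose CV is undefined."""
    cvs = [coefficient_of_variation(row) for row in matrix.values()]
    return median([cv for cv in cvs if cv is not None])


def completeness(matrix):
    """Percentage of protein x injection cells that carry a quantitative value."""
    cells = [v for row in matrix.values() for v in row]
    return 100.0 * sum(v is not None for v in cells) / len(cells)


# Hypothetical intensities for three proteins over four repeated injections.
quant = {
    "P1": [100.0, 110.0, 90.0, 100.0],
    "P2": [50.0, None, 50.0, 50.0],
    "P3": [200.0, 220.0, 180.0, None],
}

print(f"completeness {completeness(quant):.1f}%")  # completeness 83.3%
print(f"median CV {median_cv(quant):.1f}%")        # median CV 8.2%
```

The same two functions apply unchanged to peptide-level matrices or to the pooled reference samples spread across the study.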

Outlook

The DiOGenes study will be subjected to further exploratory and comparative analysis.

Figure 1: Establishment of a capillary flow setup. (A) A Waters M-Class UPLC was connected to a Thermo Scientific Fusion Lumos mass spectrometer. For chromatography, a 300 µm × 15 cm CSH C18 column with 1.7 µm particles (Waters) was selected. (B) Identifications at different gradient lengths. (C) Peak capacity as a function of flow rate. (D) Identifications as a function of sample load.


Figure 2: Optimized DIA method. (A) A 40 min gradient at 5 µL/min with a 5 µg sample load was found to offer the best balance between identifications (96% of the maximum) and overhead (minimized to 13%). This method allows 32 DIA runs per day. (B) Coefficients of variation for ten repeated injections of a plasma reference sample. On average, 486 proteins were identified per injection.
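The throughput numbers in Figure 2 follow from the cycle time. A 45 min injection-to-injection cycle gives 1,440 / 45 = 32 runs per day, and the reported 13% overhead is consistent with the 5 min of non-gradient time expressed relative to the 40 min gradient (5 / 40 = 12.5%); this definition of overhead is an assumption. The sketch also estimates the acquisition days needed for 1,546 runs, assuming continuous operation without downtime. The function names are illustrative.

```python
import math

MINUTES_PER_DAY = 24 * 60


def runs_per_day(cycle_min):
    """Whole injections that fit into one day for a given injection-to-injection cycle."""
    return MINUTES_PER_DAY // cycle_min


def overhead_fraction(gradient_min, cycle_min):
    """Non-gradient time per run relative to the gradient length (assumed definition)."""
    return (cycle_min - gradient_min) / gradient_min


def acquisition_days(n_runs, cycle_min):
    """Days of continuous acquisition needed, ignoring maintenance and cleaning downtime."""
    return math.ceil(n_runs / runs_per_day(cycle_min))


print(runs_per_day(45))                           # 32
print(f"{100 * overhead_fraction(40, 45):.1f}%")  # 12.5%
print(acquisition_days(1546, 45))                 # 49
```

Roughly 49 days of uninterrupted acquisition fits within the 2-month acquisition period reported in Figure 5.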


Figure 4: Randomization design of the sample set on 96-well plates. The DiOGenes sample set of 1,546 samples was randomized and prepared in seventeen 96-well plates. The randomization was checked against multiple clinical variables, such as gender, country, clinical investigation day and age. The resulting design distributed the samples evenly over the plates, which was verified by a conditional independence test. Four sample pools (two CID1 and two CID3) were added to each plate to enable performance tracking; these anchors can also be used for batch correction (e.g., with ComBat). This corresponds to about one QC acquisition per day, or 4% of the acquisition time.
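The poster does not specify how the samples were randomized. One common approach, swapped in here purely as an illustration, is stratified (block) randomization: shuffle the samples within each clinical investigation day and deal them round-robin onto the plates, so that every plate receives nearly the same number of samples from each CID. Balance is then summarized with a plain Pearson chi-square statistic of plate versus CID, a simpler stand-in for the conditional independence test the authors used. The sample records and function names are hypothetical, and only one variable is stratified; gender, country and age would still need to be checked separately.

```python
import random
from collections import Counter, defaultdict


def stratified_plate_assignment(samples, n_plates, stratum_key, seed=0):
    """Shuffle samples within each stratum, then deal them round-robin onto plates."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for s in samples:
        by_stratum[s[stratum_key]].append(s)
    plates = defaultdict(list)
    offset = 0
    for stratum in sorted(by_stratum):
        group = by_stratum[stratum]
        rng.shuffle(group)
        for i, s in enumerate(group):
            plates[(offset + i) % n_plates].append(s)
        offset += len(group)  # next stratum continues where this one stopped
    # Shuffle positions within each plate so well position is not tied to stratum.
    for members in plates.values():
        rng.shuffle(members)
    return dict(plates)


def chi_square_statistic(plates, key):
    """Pearson chi-square statistic for independence of plate and a categorical variable."""
    table = {p: Counter(s[key] for s in members) for p, members in plates.items()}
    levels = sorted({lvl for counts in table.values() for lvl in counts})
    n = sum(sum(counts.values()) for counts in table.values())
    level_totals = {lvl: sum(counts[lvl] for counts in table.values()) for lvl in levels}
    stat = 0.0
    for counts in table.values():
        plate_total = sum(counts.values())
        for lvl in levels:
            expected = plate_total * level_totals[lvl] / n
            stat += (counts[lvl] - expected) ** 2 / expected
    return stat


# Hypothetical sample records with the CID counts of the DiOGenes set.
cid_counts = {"CID1": 477, "CID2": 477, "CID3": 477, "CID4": 115}
samples = [{"id": f"{cid}-{i}", "cid": cid} for cid, n in cid_counts.items() for i in range(n)]
plates = stratified_plate_assignment(samples, n_plates=17, stratum_key="cid", seed=42)
print(sorted(len(members) for members in plates.values()))
print(round(chi_square_statistic(plates, "cid"), 2))
```

With these counts, each plate receives 90 or 91 study samples (28 or 29 each from CID1 to CID3, 6 or 7 from CID4), which leaves room for the four pool wells, and the chi-square statistic stays far below its (17 − 1) × (4 − 1) = 48 degrees of freedom.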

Figure 5: Exploratory analysis of the sample set. The DiOGenes sample set was acquired within 2 months. The data set was analyzed with Spectronaut within 24 h on a workstation with 16 cores and 128 GB of RAM. (A) Summed intensity per run for the 1,546 runs. (B) Identifications for the 1,546 runs. (C) Unsupervised clustering and heat map visualization of the runs. (D) Coefficient of variation analysis of a set of reference pools (33 samples) distributed over the 1,546 runs (median protein CV 16.3%).

[Figure 3 graphic: study of 477 CID1, 477 CID2, 477 CID3 and 115 CID4 samples; high organic wash]

Figure 3: Large-scale study from DiOGenes. 1,546 samples from the DiOGenes data set were prepared in seventeen 96-well plates using the optimized plasma sample preparation from Biognosys. The samples stem from four clinical investigation days (CID1 = baseline, CID2 = after 8-week weight loss, CID3 = after 6 months of weight maintenance, CID4 = after 1 year of weight maintenance).
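The plate layout described in Figures 3 and 4 can be checked with simple arithmetic. Assuming the four pools per plate are injected once each in addition to the 1,546 study samples, the 17 plates hold 1,614 injections in 1,632 wells, and the pools account for about 4.2% of all injections, in line with the roughly 4% QC acquisition time stated in Figure 4. The constant names are illustrative.

```python
# Sample counts per clinical investigation day (Figure 3).
CID_COUNTS = {"CID1": 477, "CID2": 477, "CID3": 477, "CID4": 115}
N_PLATES = 17
WELLS_PER_PLATE = 96
POOLS_PER_PLATE = 4  # two CID1 and two CID3 pools per plate (Figure 4)

study_samples = sum(CID_COUNTS.values())
pool_runs = N_PLATES * POOLS_PER_PLATE  # assumes one injection per pool well
total_runs = study_samples + pool_runs
capacity = N_PLATES * WELLS_PER_PLATE
qc_share = pool_runs / total_runs

print(study_samples)          # 1546
print(capacity - total_runs)  # 18
print(f"{100 * qc_share:.1f}% of injections are QC pools")  # 4.2% of injections are QC pools
```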
