taverna workflows for analysis of mass spectrometry data ... · palmblad et al, asms 2011 poster tp...

1
Jeroen S. de Bruin, André M. Deelder and Magnus Palmblad Biomolecular Mass Spectrometry Unit, Leiden University Medical Center, Leiden, The Netherlands Taverna Workflows for Analysis of Mass Spectrometry Data in Proteomics TP 402 Overview Scientific workflow managers provide interfaces to data analysis pipelines, support scripting and link with numerous resources on the web. We have implemented a number of proteomics workflows in Taverna. Workflow managers provide clean and uniform interfaces to all pipelines. Workflows can be easily modified or expanded. Introduction Scientific workflow managers have become popular in bioinformatics for assembling specialized software modules or scripts into larger, comprehensive data analysis pipelines. They facilitate developing, demonstrat- ing and sharing data analysis workflows. Workflow managers such as Kepler, Galaxy and Taverna Work- bench [1] add visualization and a graphical user interface for designing and executing analytical pipelines. Taverna also has integrated support for BeanShell Java scripts and RShell for data processing in R. Methods We implemented existing workflows, such as the Trans-Proteomic Pipeline (TPP) [2], as well as new work- flows in Taverna Workbench version 2.2.0. The new workflows were designed to integrate a number of tools previously developed for aligning and combining ion trap MS/MS data with accurate MS data from FTICR [3] and for internal calibration of the FTICR MS data [4]. The workflows start with LC-MS and LC-MS/MS data in mzXML, except Workflow 4 that takes raw data direc- tories as input. The user-defined processing parameters can also be saved to an XML file for future refer- ence. The data from Bruker Daltonics instruments are first converted to mzXML by compassXport, version 3.0.2 or later. X!Tandem, Tandem2XML, PeptideProphet and other TPP tools are accessed from the standard TPP installation without modification. Here we used compassXport version 3.0.3 and TPP version 4.4 VU- VUZELA rev 1, Build 201010121551 (MinGW). To demonstrate the workflows, we used data from amaZon ETD ion traps and a 12 T solariX FTICR system (both Bruker Daltonics). For instruments from other vendors, the initial conversion of raw data to mzXML need to be replaced with the converter for that vendor’s data format. Some such converters are already available as Web services for use in Taverna. Results A common TPP analysis workflow was successfully implemented in Taverna. Additional workflows containing multiple levels of nested workflows were also implemented, including combinations of TPP tools and our existing programs. Java Beanshells can be used to call existing command-line tools, R support is still ‘in development’. Figure 1. The core of the TPP with the X!Tandem search engine, format conversion and PeptideProphet. Workflow_1a Workflow input ports Workflow output ports Workflow input ports Workflow output ports FASTA_File Tandem mzXML_File Tandem_Param_File Log_File Tandem2pepXML pepXML_Output PeptideProphet Tandem_Param_File FASTA_File Log_File mzXML_File Prophet_Pars pepXML_Output Workflow_2a Workflow input ports Workflow output ports Workflow output ports Workflow input ports Log_File pepAlign pepWarp msHybrid msHybrid_Pars pepAlign_Pars pepXML_Input FT_mzXML_Input IT_mzXML_Input Warped_pepXML_Output Warped_pepXML_File Hybrid_IT_mzXML_Output Hybrid_IT_mzXML_File FT_mzXML_File Log_File Workflow_1 IT_mzXML_File FASTA_File Tandem_Param_File Prophet_Pars pepAlign_Pars msHybrid_Pars Workflow input ports Workflow output ports FT_mzXML_File Workflow_2 msRecal Tandem_Param_File FASTA_File IT_mzXML_Files Prophet_Pars Log_File pepAlign_Pars msHybrid_Pars msRecal_Pars Loop_cnt Loop_Decrementer IT_mzXML_Files FT_mzXML_File Loop_cnt Echo_List compassXport Workflow input ports Workflow output ports Multi_compassXport Workflow input ports Workflow output ports Workflow output ports Workflow input ports Raw_Data_Dir compassXport Result_Dir Log_File Mode mzXML_Output Workflow_3 Raw_Data_Dir compassXport Result_Dir Log_File Mode mzXML_Output Hybrid_IT_mzXML_Files Recal_FT_mzXML_File Final_count msHybrid_Pars FTICR_Data_Dir pepAlign_Pars Result_Dir IT_Data_Dirs List_Data_Dirs Log_File Tandem_Param_File FASTA_File Prophet_Pars msRecal_Pars Iterations Mode_value Dir_Ext Figure 2. Workflow 2, or the TPP workflow in Figure 1 embedded as Workflow_1 and combined with pepAlign, msHybrid and pepWarp to align accurate mass LC-MS data (here from an FTICR) with LC-MS/MS data and make a hybrid mzXML file with accurate precursor masses from the FTICR and original ion trap tandem mass spectra. Figure 3. Workflow 3 embeds Workflow 2 and adds recalibration of the FTICR LC-MS data using peptides identified by MS/MS in the ion trap as internal calibrants. The alignment-hybridization-database search- recalibration sequence can be iterated an arbitrary number of times using a loop counter. Figure 4. In Workflow 4, compassXport is used to convert raw ion trap and FTICR data to mzXML and feed these files to Workflow_3. This workflow then takes multiple raw datasets from two different mass spectrom- eters, refines the data through chromatographic alignment and calibration and statistically evaluates the qual- ity of peptide-spectrum matches using PeptideProphet. 0 500 1000 1500 2000 2500 3000 3500 4000 -0.5 0 0.5 1 1.5 2 2.5 original hybrid 0 1000 2000 3000 4000 5000 6000 0 1000 2000 3000 4000 5000 6000 0 200 400 600 800 1000 1200 1400 1600 1800 -0.01 -0.005 0 0.005 0.01 hybrid recalibrated hybrid -1.0 0.0 1.0 0 1 2 3 4 5 mean error LC-MS retention time (s) LC-MS/MS retention time (s) PSMs/10 mDa mass measurement error PSMs/1 mDa mass measurement error mass measurement error (Da) iteration mass measurement error (Da) error (ppm) standard deviation pepAlign alignment a b c d Figure 6. Results from using Workflow 4 on E. coli LC-MS (FTICR) and LC-MS/MS (ion trap) datasets, illus- trating (a) successful chromatographic alignment, (b) generation and use of a hybid mzXML file in an X!Tandem search and (c) using identified peptides with aligned retention times as internal calibrants. Even though the refinement sequence can be iterated indefinitely, there is little or no improvement after two itera- tions (d). Conclusions Existing and new data processing workflows for mass spectrometry proteomics data could be imple- mented in Taverna Workbench. Taverna is useful in design, evaluation and presentation of such workflows. The workflows are easily modified and appended with additional modules. Future improvements include support of mzML and additional components, such as rt [5] for retention time prediction and use in peptide/protein identification. References 1. http://www.taverna.org.uk 2. http://www.proteomecenter.org/software.php 3. Palmblad et al, J. Am. Soc. Mass Spectrom, 18(10), 1835-1843 (2007) 4. Palmblad et al, Rapid Commun. Mass Spectrom, 20, 3076-3080 (2006) 5. Palmblad et al, Anal. Chem, 74(21), 5826-5830 (2002) Figure 5. The user designs the workflow ( X ) and provide input parameters, i.e. files or directories and pro- cessing options for the different modules ( Y ). Taverna can then execute the workflow and provide feedback in real time ( Z - a ). The processing can be local or remote, e.g. via Web services or a computer running a Tav- erna server. The processing is visualized in real time and a structured report collecting statistics and any error messages is generated when the workflow is completed. ` X Y Z [ \ ] ^ _ a

Upload: others

Post on 25-Sep-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Taverna Workflows for Analysis of Mass Spectrometry Data ... · Palmblad et al, ASMS 2011 poster TP 402 Created Date: 6/22/2011 2:28:40 PM

Jeroen S. de Bruin, André M. Deelder and Magnus Palmblad

Biomolecular Mass Spectrometry Unit, Leiden University Medical Center, Leiden, The Netherlands

Taverna Workflows for Analysis of Mass Spectrometry Data in Proteomics TP 402

Overview

Scientific workflow managers provide interfaces to data analysis pipelines, support scripting and link with numerous resources on the web.

We have implemented a number of proteomics workflows in Taverna. Workflow managers provide clean and uniform interfaces to all pipelines. Workflows can be easily modified or expanded.

Introduction

Scientific workflow managers have become popular in bioinformatics for assembling specialized software modules or scripts into larger, comprehensive data analysis pipelines. They facilitate developing, demonstrat-ing and sharing data analysis workflows. Workflow managers such as Kepler, Galaxy and Taverna Work-bench [1] add visualization and a graphical user interface for designing and executing analytical pipelines. Taverna also has integrated support for BeanShell Java scripts and RShell for data processing in R.

Methods

We implemented existing workflows, such as the Trans-Proteomic Pipeline (TPP) [2], as well as new work-flows in Taverna Workbench version 2.2.0. The new workflows were designed to integrate a number of tools previously developed for aligning and combining ion trap MS/MS data with accurate MS data from FTICR [3] and for internal calibration of the FTICR MS data [4].

The workflows start with LC-MS and LC-MS/MS data in mzXML, except Workflow 4 that takes raw data direc-tories as input. The user-defined processing parameters can also be saved to an XML file for future refer-ence. The data from Bruker Daltonics instruments are first converted to mzXML by compassXport, version 3.0.2 or later. X!Tandem, Tandem2XML, PeptideProphet and other TPP tools are accessed from the standard TPP installation without modification. Here we used compassXport version 3.0.3 and TPP version 4.4 VU-VUZELA rev 1, Build 201010121551 (MinGW).

To demonstrate the workflows, we used data from amaZon ETD ion traps and a 12 T solariX FTICR system (both Bruker Daltonics). For instruments from other vendors, the initial conversion of raw data to mzXML need to be replaced with the converter for that vendor’s data format. Some such converters are already available as Web services for use in Taverna.

Results

A common TPP analysis workflow was successfully implemented in Taverna. Additional workflows containing multiple levels of nested workflows were also implemented, including

combinations of TPP tools and our existing programs. Java Beanshells can be used to call existing command-line tools, R support is still ‘in development’.

Figure 1. The core of the TPP with the X!Tandem search engine, format conversion and PeptideProphet.

Workflow_1a

Workflow input ports

Workflow output ports

Workflow input ports

Workflow output ports

FASTA_File

Tandem

mzXML_FileTandem_Param_File Log_File

Tandem2pepXML

pepXML_Output

PeptideProphet

Tandem_Param_File FASTA_File Log_FilemzXML_File Prophet_Pars

pepXML_Output

Workflow_2a

Workflow input ports

Workflow output ports

Workflow output ports

Workflow input ports

Log_File

pepAlign

pepWarpmsHybrid

msHybrid_Pars pepAlign_Pars pepXML_InputFT_mzXML_Input IT_mzXML_Input

Warped_pepXML_Output

Warped_pepXML_File

Hybrid_IT_mzXML_Output

Hybrid_IT_mzXML_File

FT_mzXML_File Log_File

Workflow_1

IT_mzXML_File FASTA_FileTandem_Param_File Prophet_ParspepAlign_ParsmsHybrid_Pars

Workflow input ports

Workflow output ports

FT_mzXML_File

Workflow_2

msRecal

Tandem_Param_File FASTA_FileIT_mzXML_FilesProphet_ParsLog_File pepAlign_ParsmsHybrid_ParsmsRecal_Pars Loop_cnt

Loop_Decrementer

IT_mzXML_FilesFT_mzXML_File Loop_cnt

Echo_List

compassXport

Workflow input ports

Workflow output ports

Multi_compassXport

Workflow input ports

Workflow output ports

Workflow output ports

Workflow input ports

Raw_Data_Dir

compassXport

Result_Dir Log_FileMode

mzXML_Output

Workflow_3

Raw_Data_Dir

compassXport

Result_Dir Log_FileMode

mzXML_Output

Hybrid_IT_mzXML_FilesRecal_FT_mzXML_FileFinal_count

msHybrid_ParsFTICR_Data_Dir pepAlign_ParsResult_Dir IT_Data_Dirs

List_Data_Dirs

Log_File Tandem_Param_File FASTA_File Prophet_Pars msRecal_Pars IterationsMode_value Dir_Ext

Figure 2. Workflow 2, or the TPP workflow in Figure 1 embedded as Workflow_1 and combined with pepAlign, msHybrid and pepWarp to align accurate mass LC-MS data (here from an FTICR) with LC-MS/MS data and make a hybrid mzXML file with accurate precursor masses from the FTICR and original ion trap tandem mass spectra.

Figure 3. Workflow 3 embeds Workflow 2 and adds recalibration of the FTICR LC-MS data using peptides identified by MS/MS in the ion trap as internal calibrants. The alignment-hybridization-database search-recalibration sequence can be iterated an arbitrary number of times using a loop counter.

Figure 4. In Workflow 4, compassXport is used to convert raw ion trap and FTICR data to mzXML and feed these files to Workflow_3. This workflow then takes multiple raw datasets from two different mass spectrom-eters, refines the data through chromatographic alignment and calibration and statistically evaluates the qual-ity of peptide-spectrum matches using PeptideProphet.

0

500

1000

1500

2000

2500

3000

3500

4000

-0.5 0 0.5 1 1.5 2 2.5

originalhybrid

0

1000

2000

3000

4000

5000

6000

0 1000 2000 3000 4000 5000 6000

0

200

400

600

800

1000

1200

1400

1600

1800

-0.01 -0.005 0 0.005 0.01

hybridrecalibrated hybrid

-1.0

0.0

1.0

0 1 2 3 4 5

mean error

LC-MS retention time (s)

LC-M

S/M

S re

tent

ion

time

(s)

PS

Ms/

10 m

Da

mas

s m

easu

rem

ent e

rror

PS

Ms/

1 m

Da

mas

s m

easu

rem

ent e

rror

mass measurement error (Da) iteration

mass measurement error (Da)

erro

r (pp

m)

standard deviation

pepAlign alignment

a b

c d

Figure 6. Results from using Workflow 4 on E. coli LC-MS (FTICR) and LC-MS/MS (ion trap) datasets, illus-trating (a) successful chromatographic alignment, (b) generation and use of a hybid mzXML file in an X!Tandem search and (c) using identified peptides with aligned retention times as internal calibrants. Even though the refinement sequence can be iterated indefinitely, there is little or no improvement after two itera-tions (d).

Conclusions

Existing and new data processing workflows for mass spectrometry proteomics data could be imple- mented in Taverna Workbench.

Taverna is useful in design, evaluation and presentation of such workflows. The workflows are easily modified and appended with additional modules. Future improvements include support of mzML and additional components, such as rt [5] for retention

time prediction and use in peptide/protein identification.

References

1. http://www.taverna.org.uk

2. http://www.proteomecenter.org/software.php

3. Palmblad et al, J. Am. Soc. Mass Spectrom, 18(10), 1835-1843 (2007)

4. Palmblad et al, Rapid Commun. Mass Spectrom, 20, 3076-3080 (2006)

5. Palmblad et al, Anal. Chem, 74(21), 5826-5830 (2002)

Figure 5. The user designs the workflow ( ) and provide input parameters, i.e. files or directories and pro-cessing options for the different modules ( ). Taverna can then execute the workflow and provide feedback in real time ( - ). The processing can be local or remote, e.g. via Web services or a computer running a Tav-erna server. The processing is visualized in real time and a structured report collecting statistics and any error messages is generated when the workflow is completed.