taverna workflows for analysis of mass spectrometry data ... · palmblad et al, asms 2011 poster tp...
TRANSCRIPT
Jeroen S. de Bruin, André M. Deelder and Magnus Palmblad
Biomolecular Mass Spectrometry Unit, Leiden University Medical Center, Leiden, The Netherlands
Taverna Workflows for Analysis of Mass Spectrometry Data in Proteomics TP 402
Overview
Scientific workflow managers provide interfaces to data analysis pipelines, support scripting and link with numerous resources on the web.
We have implemented a number of proteomics workflows in Taverna. Workflow managers provide clean and uniform interfaces to all pipelines. Workflows can be easily modified or expanded.
Introduction
Scientific workflow managers have become popular in bioinformatics for assembling specialized software modules or scripts into larger, comprehensive data analysis pipelines. They facilitate developing, demonstrat-ing and sharing data analysis workflows. Workflow managers such as Kepler, Galaxy and Taverna Work-bench [1] add visualization and a graphical user interface for designing and executing analytical pipelines. Taverna also has integrated support for BeanShell Java scripts and RShell for data processing in R.
Methods
We implemented existing workflows, such as the Trans-Proteomic Pipeline (TPP) [2], as well as new work-flows in Taverna Workbench version 2.2.0. The new workflows were designed to integrate a number of tools previously developed for aligning and combining ion trap MS/MS data with accurate MS data from FTICR [3] and for internal calibration of the FTICR MS data [4].
The workflows start with LC-MS and LC-MS/MS data in mzXML, except Workflow 4 that takes raw data direc-tories as input. The user-defined processing parameters can also be saved to an XML file for future refer-ence. The data from Bruker Daltonics instruments are first converted to mzXML by compassXport, version 3.0.2 or later. X!Tandem, Tandem2XML, PeptideProphet and other TPP tools are accessed from the standard TPP installation without modification. Here we used compassXport version 3.0.3 and TPP version 4.4 VU-VUZELA rev 1, Build 201010121551 (MinGW).
To demonstrate the workflows, we used data from amaZon ETD ion traps and a 12 T solariX FTICR system (both Bruker Daltonics). For instruments from other vendors, the initial conversion of raw data to mzXML need to be replaced with the converter for that vendor’s data format. Some such converters are already available as Web services for use in Taverna.
Results
A common TPP analysis workflow was successfully implemented in Taverna. Additional workflows containing multiple levels of nested workflows were also implemented, including
combinations of TPP tools and our existing programs. Java Beanshells can be used to call existing command-line tools, R support is still ‘in development’.
Figure 1. The core of the TPP with the X!Tandem search engine, format conversion and PeptideProphet.
Workflow_1a
Workflow input ports
Workflow output ports
Workflow input ports
Workflow output ports
FASTA_File
Tandem
mzXML_FileTandem_Param_File Log_File
Tandem2pepXML
pepXML_Output
PeptideProphet
Tandem_Param_File FASTA_File Log_FilemzXML_File Prophet_Pars
pepXML_Output
Workflow_2a
Workflow input ports
Workflow output ports
Workflow output ports
Workflow input ports
Log_File
pepAlign
pepWarpmsHybrid
msHybrid_Pars pepAlign_Pars pepXML_InputFT_mzXML_Input IT_mzXML_Input
Warped_pepXML_Output
Warped_pepXML_File
Hybrid_IT_mzXML_Output
Hybrid_IT_mzXML_File
FT_mzXML_File Log_File
Workflow_1
IT_mzXML_File FASTA_FileTandem_Param_File Prophet_ParspepAlign_ParsmsHybrid_Pars
Workflow input ports
Workflow output ports
FT_mzXML_File
Workflow_2
msRecal
Tandem_Param_File FASTA_FileIT_mzXML_FilesProphet_ParsLog_File pepAlign_ParsmsHybrid_ParsmsRecal_Pars Loop_cnt
Loop_Decrementer
IT_mzXML_FilesFT_mzXML_File Loop_cnt
Echo_List
compassXport
Workflow input ports
Workflow output ports
Multi_compassXport
Workflow input ports
Workflow output ports
Workflow output ports
Workflow input ports
Raw_Data_Dir
compassXport
Result_Dir Log_FileMode
mzXML_Output
Workflow_3
Raw_Data_Dir
compassXport
Result_Dir Log_FileMode
mzXML_Output
Hybrid_IT_mzXML_FilesRecal_FT_mzXML_FileFinal_count
msHybrid_ParsFTICR_Data_Dir pepAlign_ParsResult_Dir IT_Data_Dirs
List_Data_Dirs
Log_File Tandem_Param_File FASTA_File Prophet_Pars msRecal_Pars IterationsMode_value Dir_Ext
Figure 2. Workflow 2, or the TPP workflow in Figure 1 embedded as Workflow_1 and combined with pepAlign, msHybrid and pepWarp to align accurate mass LC-MS data (here from an FTICR) with LC-MS/MS data and make a hybrid mzXML file with accurate precursor masses from the FTICR and original ion trap tandem mass spectra.
Figure 3. Workflow 3 embeds Workflow 2 and adds recalibration of the FTICR LC-MS data using peptides identified by MS/MS in the ion trap as internal calibrants. The alignment-hybridization-database search-recalibration sequence can be iterated an arbitrary number of times using a loop counter.
Figure 4. In Workflow 4, compassXport is used to convert raw ion trap and FTICR data to mzXML and feed these files to Workflow_3. This workflow then takes multiple raw datasets from two different mass spectrom-eters, refines the data through chromatographic alignment and calibration and statistically evaluates the qual-ity of peptide-spectrum matches using PeptideProphet.
0
500
1000
1500
2000
2500
3000
3500
4000
-0.5 0 0.5 1 1.5 2 2.5
originalhybrid
0
1000
2000
3000
4000
5000
6000
0 1000 2000 3000 4000 5000 6000
0
200
400
600
800
1000
1200
1400
1600
1800
-0.01 -0.005 0 0.005 0.01
hybridrecalibrated hybrid
-1.0
0.0
1.0
0 1 2 3 4 5
mean error
LC-MS retention time (s)
LC-M
S/M
S re
tent
ion
time
(s)
PS
Ms/
10 m
Da
mas
s m
easu
rem
ent e
rror
PS
Ms/
1 m
Da
mas
s m
easu
rem
ent e
rror
mass measurement error (Da) iteration
mass measurement error (Da)
erro
r (pp
m)
standard deviation
pepAlign alignment
a b
c d
Figure 6. Results from using Workflow 4 on E. coli LC-MS (FTICR) and LC-MS/MS (ion trap) datasets, illus-trating (a) successful chromatographic alignment, (b) generation and use of a hybid mzXML file in an X!Tandem search and (c) using identified peptides with aligned retention times as internal calibrants. Even though the refinement sequence can be iterated indefinitely, there is little or no improvement after two itera-tions (d).
Conclusions
Existing and new data processing workflows for mass spectrometry proteomics data could be imple- mented in Taverna Workbench.
Taverna is useful in design, evaluation and presentation of such workflows. The workflows are easily modified and appended with additional modules. Future improvements include support of mzML and additional components, such as rt [5] for retention
time prediction and use in peptide/protein identification.
References
1. http://www.taverna.org.uk
2. http://www.proteomecenter.org/software.php
3. Palmblad et al, J. Am. Soc. Mass Spectrom, 18(10), 1835-1843 (2007)
4. Palmblad et al, Rapid Commun. Mass Spectrom, 20, 3076-3080 (2006)
5. Palmblad et al, Anal. Chem, 74(21), 5826-5830 (2002)
Figure 5. The user designs the workflow ( ) and provide input parameters, i.e. files or directories and pro-cessing options for the different modules ( ). Taverna can then execute the workflow and provide feedback in real time ( - ). The processing can be local or remote, e.g. via Web services or a computer running a Tav-erna server. The processing is visualized in real time and a structured report collecting statistics and any error messages is generated when the workflow is completed.