
Post on 06-Apr-2017


Page 1: Experiences to learn from the MS proteomics field

Experiences to learn from the mass spectrometry proteomics field

Dr. Juan Antonio Vizcaíno

Proteomics Team Leader, EMBL-EBI, Hinxton, Cambridge, UK

Page 2: Experiences to learn from the MS proteomics field

Juan A. Vizcaíno

12th Conference of the Metabolomics Society, Dublin, 27 June 2016

HUPO Proteomics Standards Initiative

• Develops data format standards for proteomics.
• Both data representation and annotation standards.
• Involves data producers, database providers, software producers, publishers, …
• Active Workgroups: MI, MS, PI and now a new QC group.
• Inter-group activities: MIAPE and Controlled Vocabularies.
• Started in 2002, so some experience already…
• One annual meeting in March-April, regular phone calls.
• Peer Review for standards: PSI document process.

http://www.psidev.info

Page 3: Experiences to learn from the MS proteomics field

Current PSI Proteomics Standard File Formats for Mass Spectrometry

• mzML: MS data
• mzIdentML: Identification
• mzQuantML: Quantitation
• mzTab: Final results
• TraML: SRM

Page 5: Experiences to learn from the MS proteomics field

Reuse of data standards in metabolomics

• mzML is already actively used to store MS data (very flexible format).

• mzTab is a tab-delimited format that is being extended to better support MS metabolomics data. It can be used for both identification and quantification results.

• Meeting next week in Liverpool organised by A. Jones.

• mzQuantML and TraML could be used with small molecule data, but this has not been tested.
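
Because mzTab is plain tab-delimited text, it can be read without special tooling. The sketch below is a minimal illustration, not a reference parser. It assumes the three-letter line prefixes used by mzTab (MTD for metadata, COM for comments, and header/row pairs such as SMH/SML for small molecules). The sample text, its identifiers, the trimmed column set and the `read_mztab` helper are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, heavily trimmed mzTab-like text (NOT a complete, valid file).
SAMPLE = (
    "COM\tillustrative fragment only\n"
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tdescription\ttoy example\n"
    "SMH\tidentifier\tchemical_formula\tsmallmolecule_abundance_assay[1]\n"
    "SML\tEX:0001\tC6H12O6\t1234.5\n"
    "SML\tEX:0002\tH2O\t99.0\n"
)

# Each table section has a header prefix and a matching row prefix.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def read_mztab(handle):
    """Split mzTab-like text into a metadata dict and per-section row lists."""
    metadata = {}
    headers = {}
    tables = defaultdict(list)
    # QUOTE_NONE: treat quote characters as ordinary text.
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0] == "COM":
            continue  # skip blank lines and comments
        prefix = fields[0]
        if prefix == "MTD":
            metadata[fields[1]] = fields[2] if len(fields) > 2 else ""
        elif prefix in HEADER_TO_ROW:
            headers[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in headers:
            tables[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(tables)


metadata, tables = read_mztab(io.StringIO(SAMPLE))
for row in tables["SML"]:
    print(row["identifier"], row["chemical_formula"])
```

A real file carries mandatory metadata and many more columns, so a dedicated mzTab library is preferable for anything beyond a quick inspection.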

Page 6: Experiences to learn from the MS proteomics field

Current Standard File Formats that are or could be used in metabolomics

• mzML: MS data
• mzIdentML: Identification
• mzQuantML *: Quantitation
• mzTab: Final results
• TraML *: SRM

Page 7: Experiences to learn from the MS proteomics field

Current vision for data exchange standards in MS

Neumann (IPB-Halle), Proteomics and HUPO-PSI community

Page 8: Experiences to learn from the MS proteomics field

imzML: data standard for mass imaging data

http://www.imzml.org

Not a PSI format, but based on mzML

Page 9: Experiences to learn from the MS proteomics field

qcML files to be generated after submission

• XML format that captures output from QC pipelines

Page 10: Experiences to learn from the MS proteomics field

Opportunity to reuse and extend existing software

• Don’t reinvent the wheel! There is no need…
• Software libraries (APIs) to handle the standards.
• Data converters.
• Data visualisation tools.
• Data analysis tools and workflows.
• A big proportion of the available software is open source.

Page 11: Experiences to learn from the MS proteomics field

mzML: more software available

The most popular search engines support mzML

Many parser libraries available

Conversion from raw files into mzML

http://www.psidev.info/mzml_1_0_0
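
As an illustration of what such parser libraries handle internally: mzML stores peak lists as base64-encoded binary arrays of little-endian floats, optionally zlib-compressed. The sketch below round-trips a 64-bit float array using only the Python standard library. The function names and values are hypothetical, and in a real file the precision and compression of each array are declared by controlled vocabulary terms that a parser has to read first.

```python
import base64
import struct
import zlib


def encode_array(values, compress=False):
    """Pack floats as little-endian 64-bit values, then base64-encode."""
    raw = struct.pack(f"<{len(values)}d", *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compressed=False):
    """Reverse of encode_array: base64-decode, decompress, unpack."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per 64-bit float
    return list(struct.unpack(f"<{count}d", raw))


# Hypothetical m/z values, used only to demonstrate the round trip.
mz_values = [100.0, 200.5, 300.25]
encoded = encode_array(mz_values, compress=True)
decoded = decode_array(encoded, compressed=True)
print(decoded == mz_values)
```

In practice an existing mzML library takes care of this step, which is the point of reusing the available software.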

Page 12: Experiences to learn from the MS proteomics field

Data visualisation: PRIDE Inspector Toolsuite

PRIDE Inspector Toolsuite supports:

- PRIDE XML
- mzIdentML
- mzML & all types of spectra files
- mzTab identification and quantification

Wang et al., Nat. Biotechnology, 2012
Perez-Riverol et al., MCP, 2016

https://github.com/PRIDE-Toolsuite/

Page 13: Experiences to learn from the MS proteomics field

OpenMS/TOPP

• OpenMS – an open-source C++ framework for computational mass spectrometry
  - Jointly developed at ETH Zürich, FU Berlin, University of Tübingen
  - Open source: BSD 3-clause license
  - Portable: available on Windows, OSX, and Linux
• TOPP – The OpenMS Proteomics Pipeline
  - Building blocks: one application for each analysis step
  - All applications share identical user interfaces
  - Uses PSI standard formats and integrates seamlessly with other applications supporting these formats
• Can be integrated in various workflow systems
  - TOPPAS – TOPP Pipeline Assistant
  - Galaxy
  - WS-PGRADE
  - KNIME

Kohlbacher et al., Bioinformatics (2007), 23:e191

Page 14: Experiences to learn from the MS proteomics field

ProteomeXchange Consortium

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and MassIVE (UCSD, San Diego); jPOST (Japan) will be integrated in July 2016.

• EU FP7 CA (01/2011 to 06/2014).

• Common identifier space (PXD identifiers).

• Two supported data workflows: MS/MS and SRM.

• Main objective: Make life easier for researchers.

http://www.proteomexchange.org

Vizcaíno et al., Nat Biotechnol, 2014
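
A common identifier space makes dataset references easy to check mechanically. The following is a minimal sketch, assuming the familiar form of a PXD accession ("PXD" followed by six digits, e.g. PXD000001). The helper names are hypothetical and the pattern is an assumption, not an official specification.

```python
import re

# Assumed accession shape: "PXD" plus exactly six digits.
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_accession(text):
    """Return True if the whole string looks like a PXD accession."""
    return PXD_PATTERN.fullmatch(text.strip()) is not None


def find_pxd_accessions(text):
    """Return all PXD-like accessions mentioned in free text."""
    return re.findall(r"\bPXD\d{6}\b", text)


sentence = "Data are available via ProteomeXchange with identifier PXD000001."
print(is_pxd_accession("PXD000001"))
print(find_pxd_accessions(sentence))
```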

Page 15: Experiences to learn from the MS proteomics field

PRIDE Archive: datasets submitted up to 1 April 2016

• In the last complete year: on average, >150 submitted datasets per month

• Size of PRIDE Archive: ~ 220TB

Page 16: Experiences to learn from the MS proteomics field

Vendor support for mzIdentML has grown in parallel with the number of submitted datasets

[Diagram labels: Search Engine; Results + MS files; Search engines; mzIdentML]

An increasing number of tools support export to mzIdentML 1.1:

- Mascot
- MSGF+
- Myrimatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker (several open source tools)
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem (from PILEDRIVER version)
- Others: library for X!Tandem conversion, lab internal pipelines, …
- Crux

Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.

Page 17: Experiences to learn from the MS proteomics field

Conclusions

• Develop tools in parallel with the data standards.
• Don’t reinvent the wheel! Many ideas and software already there.
• Ideally, get vendors involved as soon as possible.
• Data repositories and data standards are a perfect match.

Page 18: Experiences to learn from the MS proteomics field

Acknowledgements and further reading…

http://www.psidev.info

Poster P18

Page 19: Experiences to learn from the MS proteomics field

Questions?