Introduction to EBI for Proteomics in ELIXIR
EMBL-EBI Now and in the Future
The EMBL-EBI ELIXIR Node
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader, EMBL-EBI, Hinxton, Cambridge, UK
Juan A. [email protected] meeting, Tuebingen, 1 March 2017
EBI activities split by the current ELIXIR platforms
Data
Tools
Interoperability
Compute
Training
PRIDE (PRoteomics IDEntifications) Archive
http://www.ebi.ac.uk/pride/archive
PRIDE stores mass spectrometry (MS)-based proteomics data:
Peptide and protein expression data (identification and quantification)
Post-translational modifications
Mass spectra (raw data and peak lists)
Technical and biological metadata
Any other related information
Full support for tandem MS approaches: any data workflow is now supported.
Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016
PRIDE is leading the global ProteomeXchange Consortium
Member repositories:
PRIDE (MS/MS data)
MassIVE (MS/MS data)
PASSEL (SRM data)
jPOST (MS/MS data)
[Figure: ProteomeXchange consortium diagram; labels: Raw, ID/Q, Meta]
Mandatory raw data deposition since July 2015
Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
http://www.proteomexchange.org
New in 2016
Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017
PRIDE Archive
~6,000 datasets from over 51 countries and >2,000 groups
Data volume:
Total: ~280 TB
Number of all files: ~560,000
PXD000320-324: ~4 TB
PXD002319-26: ~2.4 TB
PXD001471: ~1.6 TB
>50% of all datasets are publicly accessible
~90% of all ProteomeXchange datasets
[Figure: PRIDE Archive growth; submissions per year, all submissions vs. complete submissions]
In 2016:
1,979 submitted datasets (record), ~165 datasets per month
(> 922 processed by MaxQuant)
Main organisms represented:
Homo sapiens (~50% of datasets)
Mus musculus
Saccharomyces cerevisiae
Arabidopsis thaliana
Rattus norvegicus
>900 reported taxa in total
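The accession ranges listed above use an abbreviated notation (for example, PXD002319-26). The sketch below shows one way to expand such a range into individual ProteomeXchange identifiers. It assumes, without confirmation from the slide, that an accession is "PXD" followed by six digits and that the digits after the hyphen replace the same number of trailing digits of the first accession; `expand_pxd_range` is a hypothetical helper, not part of any PRIDE library.

```python
import re

# Assumption: a ProteomeXchange accession is "PXD" followed by six digits,
# optionally followed by "-<suffix>" abbreviating the last accession of a range.
_PXD_RANGE = re.compile(r"^PXD(\d{6})(?:-(\d{1,6}))?$")


def expand_pxd_range(text: str) -> list[str]:
    """Expand an abbreviated accession range such as 'PXD002319-26'.

    The digits after the hyphen replace the same number of trailing digits
    of the start accession (a convention inferred from the slide's notation).
    """
    match = _PXD_RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PXD accession or range: {text!r}")
    start_digits, end_suffix = match.groups()
    if end_suffix is None:
        # A single accession, not a range.
        return [f"PXD{start_digits}"]
    # Rebuild the full end number by swapping in the abbreviated suffix.
    end_digits = start_digits[: len(start_digits) - len(end_suffix)] + end_suffix
    start, end = int(start_digits), int(end_digits)
    if end < start:
        raise ValueError(f"range end precedes start: {text!r}")
    return [f"PXD{n:06d}" for n in range(start, end + 1)]


print(expand_pxd_range("PXD000320-324"))
```

Under this reading, `expand_pxd_range("PXD000320-324")` gives the five accessions PXD000320 to PXD000324, and `expand_pxd_range("PXD002319-26")` gives eight, ending at PXD002326.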
Public proteomics datasets are being increasingly reused
Martens & Vizcaíno, Trends Biochem Sci, 2017
Data download in 2016: 243 TB
Citations for PRIDE/PX keep increasing
Naik, Nature, 9 Nov 2016
PRIDE and ELIXIR
PRIDE is by far the world's largest proteomics data repository.
PRIDE has submitted an application to become a core ELIXIR resource.
Not only an EMBL-EBI activity. Involvement of ELIXIR-DE in PRIDE activities:
G. Mayer (Bochum) helping with data submissions.
Federated PRIDE in the future?
EBI activities split by the current ELIXIR platforms
Data
Tools
Interoperability
Compute
Training
PRIDE Components: Data Submission Process
[Figure: submission workflow; labels: PRIDE Inspector, PX Submission Tool, mzIdentML, mzTab]
In addition to PRIDE Archive, the PRIDE team develops and maintains several tools and software libraries that facilitate the handling and visualisation of MS proteomics data and the submission process.
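mzTab, one of the formats in the submission workflow, is a tab-delimited text format in which every line starts with a section prefix: MTD for metadata; PRH/PRT, PEH/PEP, PSH/PSM and SMH/SML for the header and data rows of the protein, peptide, PSM and small-molecule sections; and COM for comments. The following is a minimal reader sketch assuming the mzTab 1.0 layout. `read_mztab` is a hypothetical helper that skips the validation, escaping and column-type handling a full library would provide, and the sample rows in the usage below are toy data.

```python
from collections import defaultdict

# mzTab 1.0 header-line prefix -> prefix of the data rows it describes.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def read_mztab(text: str) -> tuple[dict[str, list[str]], dict[str, list[dict[str, str]]]]:
    """Split mzTab text into metadata and per-section row tables (sketch)."""
    metadata: dict[str, list[str]] = defaultdict(list)
    tables: dict[str, list[dict[str, str]]] = defaultdict(list)
    columns: dict[str, list[str]] = {}  # data-row prefix -> column names
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            # MTD <key> <value>; a key may repeat, so keep every value.
            metadata[fields[1]].append(fields[2])
        elif prefix in HEADER_TO_ROW:
            columns[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in columns:
            # Pair each value with the column name from the section header.
            tables[prefix].append(dict(zip(columns[prefix], fields[1:])))
        # Blank lines, COM comments and rows without a preceding header are skipped.
    return dict(metadata), dict(tables)


sample = "\n".join([
    "MTD\tmzTab-version\t1.0.0",
    "COM\ttoy example",
    "PSH\tsequence\tPSM_ID\taccession",
    "PSM\tPEPTIDER\t1\tP12345",
    "PSM\tSAMPLEK\t2\tQ67890",
])
meta, tables = read_mztab(sample)
print(meta["mzTab-version"], [row["sequence"] for row in tables["PSM"]])
```

Keying each data row by the column names of its section header keeps the sketch independent of which optional columns a particular file includes.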
EBI activities split by the current ELIXIR platforms
Data
Tools
Interoperability
Compute
Training
HUPO Proteomics Standards Initiative
http://www.psidev.info
Develops data standards for proteomics, covering both data representation and annotation.
Involves data producers, database providers, software producers, publishers, and everyone else who wants to be involved.
Inter-group activities: MIAPE and controlled vocabularies.
Started in 2002, so it already has considerable experience.
One annual meeting in March-April, plus regular phone calls.
Closer interaction with the metabolomics community (MSI).
Current PSI Standard File Formats for MS
PRIDE Inspector Toolsuite
Wang et al., Nat Biotechnol, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016
PRIDE Inspector: standalone tool for the visualisation and validation of MS data.
Built on top of ms-data-core-api, an open-source library of algorithms for computational proteomics.
Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
Broad functionality.
https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector
Summary and QC charts
Peptide spectra annotation and visualization
Public datasets from different omics: OmicsDI
http://www.ebi.ac.uk/Tools/omicsdi/
Aims to integrate omics datasets (at present proteomics, transcriptomics, metabolomics and genomics).
Proteomics: PRIDE, MassIVE, jPOST, PASSEL, GPMDB
Transcriptomics: ArrayExpress, Expression Atlas
Metabolomics: MetaboLights, Metabolomics Workbench, GNPS
Genomics: EGA
Perez-Riverol et al., Nat Biotechnol, in press
PRIDE Proteomes provides a cross-dataset, quality-filtered view of PRIDE Archive data. High-quality PSMs (peptide-spectrum matches) are assessed using the PRIDE Cluster approach, which is based on spectral clustering (grouping similar MS/MS spectra).
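The slide does not describe the PRIDE Cluster algorithm itself. As a simplified stand-in to illustrate the general idea of grouping similar spectra, the sketch below greedily clusters spectra by the cosine similarity of their binned peak intensities. The bin width (1.0 m/z), the similarity threshold (0.7) and the use of each cluster's first member as its representative are illustrative assumptions, not PRIDE Cluster parameters; a real implementation would typically also constrain precursor m/z and build a consolidated representative spectrum.

```python
import math

Spectrum = list[tuple[float, float]]  # peak list of (m/z, intensity) pairs


def binned_vector(peaks: Spectrum, bin_width: float = 1.0) -> dict[int, float]:
    """Sum intensities into fixed-width m/z bins and L2-normalise the result."""
    bins: dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in bins.values()))
    return {k: v / norm for k, v in bins.items()} if norm > 0 else bins


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two L2-normalised sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def greedy_cluster(spectra: list[Spectrum], threshold: float = 0.7,
                   bin_width: float = 1.0) -> list[list[int]]:
    """Put each spectrum in the first cluster whose representative is similar enough."""
    clusters: list[list[int]] = []
    representatives: list[dict[int, float]] = []
    for index, peaks in enumerate(spectra):
        vector = binned_vector(peaks, bin_width)
        for members, rep in zip(clusters, representatives):
            if cosine(vector, rep) >= threshold:
                members.append(index)
                break
        else:
            # No similar cluster found: this spectrum starts (and represents) a new one.
            clusters.append([index])
            representatives.append(vector)
    return clusters


spectra = [
    [(100.1, 10.0), (200.2, 20.0), (300.3, 5.0)],
    [(100.4, 9.0), (200.1, 21.0), (300.9, 6.0)],
    [(150.0, 30.0), (250.0, 10.0)],
]
print(greedy_cluster(spectra))
```

With these toy spectra, the first two share all their bins and end up in one cluster, while the third shares none and forms its own cluster: `[[0, 1], [2]]`.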
Summary
EBI activities split by the current ELIXIR platforms
Data
Tools
Interoperability
Compute
Training
Compute
Scalable, fully reproducible and freely available pipelines are needed, e.g. for certification purposes.
Need to do it for all the main proteomics analysis approaches, and for multi-omics techniques (e.g. proteogenomics).
It should be possible to deploy them in different computing set-ups (e.g. cloud environments).
Needed to tackle larger studies (e.g. in a clinical context).
ELIXIR Implementation Project
1-year project just started. Led by EMBL-EBI (Vizcaíno) and ELIXIR-Germany (Kohlbacher, Eisenacher).
Aim: Development of reproducible data analysis pipelines for shotgun proteomics using the OpenMS framework.
Deployment in the EMBL-EBI Embassy cloud as proof of concept:
Facilitate deployment in other cloud environments.
Direct connection with public datasets in PRIDE.
EBI activities split by the current ELIXIR platforms
Data
Tools
Interoperability
Compute
Training
Annual WT Proteomics Bioinformatics Course
Co-organised by L. Martens and myself
Running for 10 years
Includes many relevant resources and tools
Sponsored by EuPA
Other, shorter training activities (e.g. EMBL-EBI e-learning platform)
More coordination of training activities is needed
Conclusions
Data -> PRIDE database
Tools -> PRIDE Inspector / PX Submission Tool
Interoperability -> Data standards / PRIDE and other resources
Compute -> Starting to work on data analysis pipelines
Training -> More coordination is needed
Acknowledgements: People
Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)
Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak
Enrique Perez
Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob
Alvis Brazma, Ugis Sarkans & Robert Petryszak
Acknowledgements: The PRIDE Team
@pride_ebi @proteomexchange