Introduction to EBI for proteomics in ELIXIR

The EMBL-EBI ELIXIR Node Dr. Juan Antonio Vizcaíno Proteomics Team Leader EMBL-EBI Hinxton, Cambridge, UK

Upload: juan-antonio-vizcaino

Post on 05-Apr-2017


TRANSCRIPT

EMBL-EBI Now and in the Future

The EMBL-EBI ELIXIR Node

Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK

Juan A. [email protected] meeting, Tuebingen, 1 March 2017

EBI activities split by the current ELIXIR platforms

Data

Tools

Interoperability

Compute

Training


PRIDE (PRoteomics IDEntifications) Archive

http://www.ebi.ac.uk/pride/archive

PRIDE stores mass spectrometry (MS)-based proteomics data:
Peptide and protein expression data (identification and quantification)
Post-translational modifications
Mass spectra (raw data and peak lists)
Technical and biological metadata
Any other related information

Full support for tandem MS approaches.
Any data workflow is now supported.

Martens et al., Proteomics, 2005
Vizcaíno et al., NAR, 2016


PRIDE is leading the global ProteomeXchange Consortium

[Diagram: repositories PASSEL (SRM data), PRIDE (MS/MS data), MassIVE (MS/MS data) and jPOST (MS/MS data); data types Raw, ID/Q, Meta]

Mandatory raw data deposition since July 2015

Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

http://www.proteomexchange.org

New in 2016

Vizcaíno et al., Nat Biotechnol, 2014
Deutsch et al., NAR, 2017

PRIDE Archive

~6,000 datasets from over 51 countries and >2,000 groups

Data volume:
Total: ~280 TB
Number of all files: ~560,000
PXD000320-324: ~4 TB
PXD002319-26: ~2.4 TB
PXD001471: ~1.6 TB

>50% of all are publicly accessible
~90% of all ProteomeXchange datasets

PRIDE Archive growth
[Chart: submissions per year; series: all submissions, complete submissions]

In 2016:
1,979 submitted datasets (record)
~165 datasets per month

Main organisms represented
~50% of datasets
Homo sapiens
Mus musculus
Saccharomyces cerevisiae
Arabidopsis thaliana
Rattus norvegicus
>900 reported taxa in total

(> 922 processed by MaxQuant)
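The 2016 submission rate quoted on this slide can be checked with a line of arithmetic. The sketch below recomputes it from the slide's own figures and, as an extra derived number that does not appear on the slide, estimates the mean file size in the archive. It assumes decimal units (1 TB = 1000 GB); the function names are illustrative only.

```python
# Rough sanity check of the figures quoted on the slide (all approximate).
DATASETS_2016 = 1979       # datasets submitted in 2016
TOTAL_VOLUME_TB = 280      # total archive volume, ~280 TB
TOTAL_FILES = 560_000      # total number of files, ~560,000


def per_month(total_per_year: float) -> float:
    """Average count per month for a yearly total."""
    return total_per_year / 12


def mean_file_size_gb(volume_tb: float, n_files: int) -> float:
    """Mean file size in GB. Assumption: decimal units, 1 TB = 1000 GB."""
    return volume_tb * 1000 / n_files


if __name__ == "__main__":
    print(f"{per_month(DATASETS_2016):.1f} datasets per month")
    print(f"{mean_file_size_gb(TOTAL_VOLUME_TB, TOTAL_FILES):.2f} GB per file on average")
```

The monthly figure rounds to the ~165 given on the slide. The mean file size is only indicative, since both inputs are rounded and file sizes in the archive vary widely.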


Public proteomics datasets are being increasingly reused

Martens & Vizcaíno, Trends Biochem Sci, 2017

Data download in 2016: 243 TB


Citations for PRIDE/PX keep increasing

Naik, Nature, 9 Nov 2016


PRIDE and ELIXIR

PRIDE is by far the world's largest proteomics data repository.

PRIDE has submitted an application to become a core ELIXIR resource.

Not only an EMBL-EBI activity. Involvement of ELIXIR-DE in PRIDE activities:
G. Mayer (Bochum) helping with data submissions.
Federated PRIDE in the future?


PRIDE Components: Data Submission Process

PRIDE Inspector
PX Submission Tool

mzIdentML
mzTab

In addition to PRIDE Archive, the PRIDE team develops and maintains different tools and software libraries to facilitate the handling and visualisation of MS proteomics data and the submission process.


HUPO Proteomics Standards Initiative

http://www.psidev.info

Develops data standards for proteomics.
Both data representation and annotation standards.
Involves data producers, database providers, software producers, publishers, everyone who wants to be involved.
Inter-group activities: MIAPE and Controlled Vocabularies.
Started in 2002, so some experience already.
One annual meeting in March-April, regular phone calls.

Closer interaction with the metabolomics community (MSI).

Current PSI Standard File Formats for MS

PRIDE Inspector Toolsuite

Wang et al., Nat Biotechnol, 2012
Perez-Riverol et al., Bioinformatics, 2015
Perez-Riverol et al., MCP, 2016

PRIDE Inspector: standalone tool to enable visualisation and validation of MS data.
Built on top of ms-data-core-api: open source algorithms and libraries for computational proteomics.
Supported file formats: mzIdentML, mzML, mzTab (PSI standards), and PRIDE XML.
Broad functionality.

https://github.com/PRIDE-Utilities/ms-data-core-api
https://github.com/PRIDE-Toolsuite/pride-inspector

Summary and QC charts

Peptide spectra annotation and visualization


Public datasets from different omics: OmicsDI

http://www.ebi.ac.uk/Tools/omicsdi/

Aims to integrate omics datasets (proteomics, transcriptomics, metabolomics and genomics at present).

PRIDE, MassIVE, jPOST, PASSEL, GPMDB

ArrayExpress, Expression Atlas

MetaboLights, Metabolomics Workbench, GNPS

EGA

Perez-Riverol et al., Nat Biotechnol, in press

PRIDE Proteomes provides an across-dataset, quality-filtered view of PRIDE Archive data. Good PSMs are assessed using the PRIDE Cluster approach, which is based on spectral clustering.

Summary


Compute

Scalable, fully reproducible and freely available pipelines are needed, e.g. for certification purposes.

Need to do it for all the main proteomics analysis approaches, and for multi-omics techniques (e.g. proteogenomics).

It should be possible to deploy them in different computing set-ups (e.g. cloud environments).

Needed to tackle larger studies (e.g. in a clinical context).


ELIXIR Implementation Project

1-year project just started. Led by EMBL-EBI (Vizcaíno) and ELIXIR-Germany (Kohlbacher, Eisenacher).

Aim: Development of reproducible data analysis pipelines for shotgun proteomics using the OpenMS framework.

Deployment in the EMBL-EBI Embassy cloud as proof of concept:
Facilitate deployment in other cloud environments.

Direct connection with public datasets in PRIDE.



Annual WT Proteomics Bioinformatics Course

Co-organised by L. Martens & myself
Running for 10 years
It includes many relevant resources and tools
It has been sponsored by EuPA

Other shorter training activities (e.g. EMBL-EBI e-learning platform)
More coordination of training activities is needed


Conclusions

Data -> PRIDE database

Tools -> PRIDE Inspector / PX submission tool

Interoperability -> Data standards / PRIDE and other resources

Compute -> Starting to work on data analysis pipelines

Training -> More coordination is needed


Acknowledgements: People

Attila Csordas
Tobias Ternent
Gerhard Mayer (de.NBI)

Johannes Griss
Yasset Perez-Riverol
Manuel Bernal-Llinares
Andrew Jarnuczak

Enrique Perez

Former team members, especially Rui Wang, Florian Reisinger, Noemi del Toro, Jose A. Dianes & Henning Hermjakob

Alvis Brazma, Ugis Sarkans & Robert Petryszak

Acknowledgements: The PRIDE Team

@pride_ebi
@proteomexchange
