EBI is an Outstation of the European Molecular Biology Laboratory.
17.10.2010
EMBL-EBI Proteomics data resources and services
Rafael JIMENEZ (EBI, Hinxton, UK)
4th Annual Forum for SMEs, Munich, October 18th-19th 2010
Context
Integration, standards and dissemination
UniProt: Protein Sequences
Reactome: Pathways
IntAct: Interactions
PRIDE: Mass Spec
DAS, PSICQUIC
EnCore
Annotation
Archive
sequence databases
(INSDC)
EMBL
DDBJ
NCBI
interactions
IMEx
IntAct
BIND
DIP
MINT
…
mass spec
ProteomeXchange
PRIDE
PeptideAtlas
GPMDB
Tranche
…
Sharing infrastructures
• Multiple repositories in a particular field
Collaboration and data exchange
More data coverage
• Proteomics Standards Initiative
• Work group of the Human Proteome Organization
• Defines community standards for data in proteomics
• … facilitating data comparison, exchange and verification
PSI
http://www.psidev.info/
• MIAPE: The Minimum Information About a Proteomics Experiment
• Data and metadata from proteomics experiments
• Data: results
• Metadata: data about the data
• Where the samples came from
• How the analyses were performed
PSI-MI
Data format
Data distribution
Controlled vocabulary
Data submission
Website
Standard format
Tools
PSICQUIC
PSI-MI CV
Reporting guideline MIMIx
Tools
PSI-MI XML
PSI-MITAB
XML Java API
MITAB Java API
XMLMakerFlattener
XML Validator
MIF25_view.xsl
MIF25_compact.xsl
MIF25_expand.xsl
PSI-MI XML files
PSI Excel Sheet
PSI Web Form
Data
Servers
Registry
Clients
• Work group of the Proteomics Standards Initiative
• Community coordination effort to ensure deposition of
data in public repositories
• Concentrating on …
• Annotation and representation of published MI data
• Accessibility of MI data to the user community
PSI - Molecular Interactions
Data format
Data distribution
Controlled vocabulary
MIAPE
Reporting guideline
PSI-MI XML
PSI-MITAB
PSICQUIC
MIMIx
PSI-MI CV
http://www.psidev.info/MI
Scoring
PSISCORE
PSI-MI format
• Community standard for Molecular Interactions
• Jointly developed by major data providers: BIND,
CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono,
U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Collecting and combining data from different sources
has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g.
IntAct data in Cytoscape.
psi-mi/xml25 psi-mi/tab25
PSI-MI Controlled vocabulary
• Ontology browser: http://www.ebi.ac.uk/ontology-lookup
MIMIx
• MIAPE document guideline for molecular interactions
• 1. Manuscript information
• 2. Experiment
• 3. Interaction
• 4. Confidence
Data distribution: PSICQUIC
• Proteomics Standards Initiative Common QUery InterfaCe.
• Community effort to standardise the way to access and retrieve data
from Molecular Interaction databases.
• Widely implemented by independent interaction data resources.
• Based on the PSI standard formats (PSI-MI XML and MITAB)
• Not limited to protein-protein interactions, also e.g.
• Drug-target interactions
• Simplified pathway data
• A registry listing resources implementing PSICQUIC
• Documentation: http://psicquic.googlecode.com
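PSICQUIC services can return results as PSI-MITAB, a tab-separated format. As a minimal sketch, assuming the 15-column MITAB 2.5 layout, a row can be split into named fields as below. The column names in `MITAB25_COLUMNS` and the example row are illustrative choices made here, not taken from the slides or from any database, and the naive split on `|` does not handle a `|` inside quoted text.

```python
from typing import Dict, List

# Short names chosen here for the 15 tab-separated columns of a MITAB 2.5 row.
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_values",
]


def parse_mitab25_line(line: str) -> Dict[str, List[str]]:
    """Split one MITAB 2.5 row into named columns.

    Multiple values in a column are separated by '|'; a lone '-' marks an
    empty column. Quoted text containing '|' is not handled by this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(
            f"expected {len(MITAB25_COLUMNS)} columns, got {len(fields)}"
        )
    row = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        row[name] = [] if value == "-" else value.split("|")
    return row


# Made-up row for illustration only; accession and PubMed numbers are placeholders.
EXAMPLE_ROW = "\t".join([
    "uniprotkb:P04637", "uniprotkb:Q00987", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "-", "pubmed:0000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "-",
])

if __name__ == "__main__":
    parsed = parse_mitab25_line(EXAMPLE_ROW)
    print(parsed["id_a"], parsed["id_b"], parsed["detection_methods"])
```

For real data, the Java APIs listed under PSI-MI tools are likely a safer choice than hand-rolled parsing.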
PSICQUIC implementation
PSICQUIC PSICQUIC PSICQUIC
Sample
Observation error
Interaction databases
Publications
PSICQUIC services
Annotation error
User
PSICQUIC
Registry
PSICQUIC client
PSICQUIC
Registry
• 13 sources
• 14,665,530 interactions
http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS
PSICQUIC example: REST queries
Bruno Aranda
http://mint.bio.uniroma2.it/mint/psicquic/webservices/current/search/query/p53
http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current/search/query/p53
http://www.ebi.ac.uk/Tools/webservices/psicquic/chembl/webservices/current/search/interactor/p53
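The three URLs above share one pattern: a service base URL, then `/search/query/` or `/search/interactor/`, then the search term. A minimal sketch of building such URLs follows; the helper name `psicquic_url` is hypothetical, only the two search methods visible in the URLs above are accepted, and the base URLs are copied from the slide without checking that the services still respond.

```python
from urllib.parse import quote

# Base URLs copied from the example queries on the slide.
MINT_BASE = "http://mint.bio.uniroma2.it/mint/psicquic/webservices/current"
INTACT_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/intact/webservices/current"
)
CHEMBL_BASE = (
    "http://www.ebi.ac.uk/Tools/webservices/psicquic/chembl/webservices/current"
)

# Only the search methods that appear in the slide's URLs.
KNOWN_METHODS = ("query", "interactor")


def psicquic_url(base: str, term: str, method: str = "query") -> str:
    """Build a PSICQUIC REST URL of the form <base>/search/<method>/<term>."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unsupported search method: {method!r}")
    # Percent-encode the term so spaces and reserved characters survive in a URL path.
    return f"{base.rstrip('/')}/search/{method}/{quote(term, safe='')}"


if __name__ == "__main__":
    print(psicquic_url(MINT_BASE, "p53"))
    print(psicquic_url(INTACT_BASE, "p53"))
    print(psicquic_url(CHEMBL_BASE, "p53", method="interactor"))
```

Because every PSICQUIC service follows the same URL scheme, the same helper can be pointed at any base URL listed in the registry.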
PSICQUIC client
PSICQUIC clustering
PSISCORE
PSISCORE
Scoring algorithm
description, provided
by scoring server /
registry
Exemplary visualization
of a scoring algorithm
with a 0-1 range
Scoring algorithms
offered by PSISCORE
servers
IMEx website http://www.imexconsortium.org/
IMEx: The International Molecular Exchange Consortium
• Group of major public interaction data providers sharing curation effort: DIP, IntAct, MINT, MPact, MatrixDB, MPIDB and BioGRID
• Independent molecular interaction resources
• Common curation standards for detailed curation
• Common data formats (PSI-MI XML, PSI-MITAB, PSICQUIC)
• Common accession number space
• Coordinated & non-redundant curation
• In production mode since February 2010
• Since 3/2009 supported by the European Commission under PSIMEx, contract number FP7-HEALTH-2007-223411, with additional partners Vital-IT, Nature,
Wiley, BiaCore (GE), U. Maryland, CSIC, TU Munich, MIPS, SCBIT (Shanghai)
Imex.sf.net
IntAct
• Freely available, open-source database system
• Public repository of molecular interactions
• Interactions manually curated and reviewed by experts
• Interactions derived from literature or direct user submissions
• Topic-centric datasets (e.g. Cancer, Chromatin, MSD…)
• Analysis tools for interaction data
• EBI database (part of the IMEx consortium and the PSI-MI)
• Data updated every week: ftp://ftp.ebi.ac.uk/pub/databases/intact
• Data formats available:
http://www.ebi.ac.uk/intact
IntAct statistics
• Interactions by identification method
• ~70% Y2H
• ~25% Affinity purification
• ~3% Physical data
• ~2% Other methods
IntAct statistics
IntAct: Search and results
Export
Custom columns
Filters
More results(PSICQUIC)
IntAct
PSI-MSS PSI-MS
PSI-PI
Data format
Tools
Standard format
Reporting guideline MIAPE-MS
mzML
TraML
- ProDaC
- OpenMS/TOPP
- ProteoWizard
- Proteios
- TPP
- X!Tandem
- Myrimatch
- InSilicoSpectro
- NCBI C++ toolkit
- Mascot
Validation, analysis, exporters, viewers , ...
- Phenyx
- PEAKS
- mzML_Exporter
- CompassXport
- Insilicos Viewer
- jmzML
- Pride Inspector
- Pride Converter
…
Controlled vocabulary PSI-MS
Data format
Tools
Standard format
Reporting guideline MIAPE-MSI
mzIdentML
mzQuantML
- mzIdentML validator
- Mascot
- OMSSA
- Peaks
- Phenyx
- PLGS
- ProCon
- ProteinPilot
- ProteinScape
- SEQUEST
Validation, analysis, exporters, viewers , ...
- SpectraST
- Spectrum Mill
- X!Tandem
- OpenMS/TOPP
- Scaffold
- TPP
- Mascot Integra
- MIAPE MSI exporter
- CSV exporter
…
Tools
Data
Website
Pride Inspector
Pride Converter
Pride Biomart
Pride QProjects
PICR
OLS
• Work group of the Proteomics Standards Initiative
• Community coordination to ensure deposition of data in
public repositories
• Concentrating on …
• Annotation and representation of published MS data
• Accessibility of MS data to the user community
PSI - Mass Spectrometry Standards
Individual
proteins
Peptides
Protein
mixture
Peptide
Mass
Separation 2D-SDS-PAGE
Spot Cutting
Digestion
Trypsin
Mass Spectrometry MALDI-TOF
Database
search
mzML
mzIdentML
Protein
identification
Quantification
mzQuantML
Protein
quantification
mzXML
mzData
analysisXML
PSI-MS Controlled vocabulary
• Shared by PSI-MSS and PSI-PI
• Ontology browser: http://www.ebi.ac.uk/ontology-lookup
MIAPE
PSI-MS PSI-PI
ProteomeXchange website
http://www.proteomeexchange.com
ProteomeXchange: Enhancing Cooperation of Proteomics Data Repositories
• Group of major public Mass Spec data providers
• Single point of submission to proteomics repositories
• Encourage data exchange
• Common data formats (mzML, mzIdentML, mzQuantML)
• Common accession number space
• Coordinated & non-redundant data
• Since 2010 supported by the European Commission
Proposal structure (diagram)
Data submission: local data management systems (ProHITS, MS-Lims, ProCon, Phenyx, OmicsHub, other LIMS), Pride Converter
Repositories: PRIDE (metadata, results), Tranche (raw data), Peptidome (metadata, results), linked by xrefs
Central dataset look-up service: MIAPE validation, accession number / reviewer login, notification, RSS feed
Secondary resources (data reprocessing and notification): PeptideAtlas, UniProt, NIST spectrum libraries, …
Journals: Wiley Proteomics, NBT, JPR, MCP
Standards: mzML, mzIdentML, mzQuantML; implemented in Release 1, Release 2, Release 3
Data release / publications, reprocessing notification
http://www.ebi.ac.uk/pride
The Proteomics Identifications Database
• Centralized, standards compliant, public data repository for proteomics identifications
• Open source
• Open data
• > 100,000,000 spectra
• ~ 4,000,000 protein identifications
• Detailed annotation of meta-data
• Vizcaíno JA, Côté R, Reisinger F, Foster JM, Mueller M, Rameseder J, Hermjakob H, Martens L. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics. 2009 Sep;9(18):4276-83. PMID: 19662629
PRIDE data content
Release of PRIDE Converter
Protein IDs Peptide IDs
PRIDE data content
PRIDE Website
PART_OF
Search by
• Experiment
• Protein id
• Ontology
PRIDE Website
• Results
• Peptide IDs
• Protein IDs
• Mass spectra as peak lists
• Metadata - experiment
• Analysis
BioMart – System Overview
ATGCTGTTGTGCATGCTGGACTGGATGGCCCGATGGATGCTGTTGTGCATGCTGGACTGGATGGCCCGATGG
Source data (MySQL, Oracle, Postgres)
DB
Mart
Bert Overduin
PRIDE Biomart
1. Filter 2. Attributes
3. Results
http://www.ebi.ac.uk/pride/prideMart.do
http://www.ebi.ac.uk/pride/biomart/martservice?query= XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0"
uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "pride" interface = "default" >
<Filter name = "experiment_ac" value = "1632"/>
<Attribute name = "submitted_accession" />
</Dataset>
</Query>
Easy programmatic access!
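The query above is passed to the martservice as the value of the `query` URL parameter. A minimal sketch of assembling that URL in Python follows; the helper name `martservice_url` is hypothetical, the endpoint is copied from the slide, and whether it still responds is not verified here.

```python
from urllib.parse import quote

# Endpoint as shown on the slide.
PRIDE_MARTSERVICE = "http://www.ebi.ac.uk/pride/biomart/martservice"

# The query XML from the slide, unchanged apart from indentation.
PRIDE_QUERY = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0"
       uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
  <Dataset name = "pride" interface = "default" >
    <Filter name = "experiment_ac" value = "1632"/>
    <Attribute name = "submitted_accession" />
  </Dataset>
</Query>"""


def martservice_url(endpoint: str, query_xml: str) -> str:
    """Return <endpoint>?query=<percent-encoded XML>."""
    # Collapse line breaks and indentation so the XML fits on one line.
    compact = " ".join(query_xml.split())
    # Percent-encode everything that is not URL-safe (quotes, angle brackets, spaces).
    return f"{endpoint}?query={quote(compact, safe='')}"


if __name__ == "__main__":
    print(martservice_url(PRIDE_MARTSERVICE, PRIDE_QUERY))
```

With `formatter = "TSV"` in the query, the response would be expected as tab-separated text, one row per result, which can be read line by line.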
Ontology Lookup Service
Web services!
• REST
• SOAP
• A unified, single point of query for over 69 ontologies
(updated daily) and upwards of 850,000 terms.
http://www.ebi.ac.uk/ontology-lookup/
Protein Identifier Cross-Reference Service
Logical xref
(hyperlinked)
Inactive xref
Secondary
Identifier
Active xref
(hyperlinked)
Richard Cote
• Common protein identifier space
• Aliases/synonyms for an identifier
• Maps secondary IDs to recent primary IDs
Web services!
• REST
• SOAP
http://www.ebi.ac.uk/Tools/picr/
PRIDE Converter
• Wizard-like graphical user interface
• Converts data formats into valid PRIDE XML
• Efficient access to the OLS
• FTP submissions
PRIDE Inspector
• Opens mzML and PRIDE XML files
• Browse the PRIDE database locally
• Facilitate publication reviews
74 Protein DAS sources!
PRIDE
DAS 1.6
DAS & Dasty3
Uniform access to multiple repositories of biological data distributed in different geographical locations.
• New resource of high-quality data
• Determine which PRIDE data are of good quality
• Supporting evidence for protein existence in UniProt
Data exports:
• Links, DAS track for all PRIDE data
• Quality controlled, e.g. “Protein Existence”, Expression Atlas from PRIDE-Q
PRIDE-Q *
Curation
Automated rules,
Curator override
PRIDE-Q
•Human pathway knowledgebase
•Manually curated
•Open source, open data
•Collaboration between EBI, OICR and NYU
•Online since 2003
•Matthews L, et al: Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2008 Nov 3.
http://www.ebi.ac.uk/pride
Reactome
Stats
http://reactome.oicr.on.ca
Sidebar
Main text
Navigation bar
New site! Coming soon …
Pathway description
authors
summary
species
GO term
other species
molecules
UniProt
Ensembl
MIM
KEGG
ChEBI
Compound
Entrez Gene
HapMap
UCSC
RefSeq
PubChem
The Pathway Browser
Species selector
Search & Analyze bar
Sidebar
Pathway Diagram Panel
Details Panel (hidden)
Zoom/move
toolbar
Thumbnail
Pathway
Reaction
Black-box
Pathway Analysis – Overrepresentation
“Top-level”
Reveal next level
P-val, In set/In pathway
Species Comparison II
Yellow = human/rat
Blue = human only
Grey = not relevant
Black = Complex
Expression Analysis II
“Hot” = high
“Cold” = low
Molecular Interaction Overlay
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0"
uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "pathway" interface = "default" >
<Filter name = "referencepeptidesequence_uniprot_id_list"
value = "P25205"/>
<Attribute name = "stableidentifier_identifier" />
<Attribute name = "pathway_db_id" />
</Dataset>
</Query>
BioMart
1. Filter
2. Attributes
3. Results
http://www.reactome.org:5555/biomart/martservice?query=XML
Easy programmatic access!
http://www.reactome.org:5555/biomart/martview
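The Filter / Attributes / Results steps map directly onto the elements of the query XML. As a rough sketch, the document can be generated from a dataset name, a set of filters and a list of attributes using Python's standard library. The function name `build_mart_query` is hypothetical; the dataset, filter and attribute names are those from the Reactome query on the slide.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_mart_query(dataset: str,
                     filters: Dict[str, str],
                     attributes: List[str]) -> str:
    """Assemble a BioMart query document shaped like the one on the slide."""
    # Query-level settings copied from the slide's example.
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "0",
        "count": "",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Step 1: filters restrict which records are returned.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Step 2: attributes select the columns of the result table.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


if __name__ == "__main__":
    print(build_mart_query(
        "pathway",
        {"referencepeptidesequence_uniprot_id_list": "P25205"},
        ["stableidentifier_identifier", "pathway_db_id"],
    ))
```

Generating the XML this way escapes attribute values automatically, which matters once filter values come from user input rather than a fixed string.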
Acknowledgments …
• EU:
• ProDaC (to 03/2009)
• ProteomeBinders
• BioSapiens
• Felics
• LipidomicNet
• APO-SYS
• PSIMEx (since 03/2009)
• EMBL
• Wellcome Trust
• NIH
The Funding
Lab B
Private Data in
PRIDE “Collaboration”
Comparison
Reviewer
Lab A
Lab C
PRIDE private mode
Publicly available data
• Private mode allows data analysis within a collaboration
• PRIDE tools are already accessible in private mode, in particular experiment comparison (alpha)
• On manuscript submission, reviewers can access the data in standard format
• On manuscript publication, the data becomes public