The Complex Portal - relationship to Gene Ontology
Sandra Orchard (IntAct)
Project Aim
• To design an online portal to search and visualise protein complexes
• Including cross-referencing to source databases and beyond
• Export to interested parties in a format of their choice
• Incorporate the data into network analysis tools
• Emphasis on major model organisms, chosen to span the taxonomic range:
• Homo sapiens, Saccharomyces cerevisiae, Escherichia coli
• Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Schizosaccharomyces pombe, Arabidopsis thaliana
• All data held in IntAct DB – shares the editor, protein update mechanism and QC procedures
• Separate search and visualisation facility
• wwwdev.ebi.ac.uk/intact/complex/
Definition: stable protein complexes
A stable set (2 or more) of interacting protein molecules which
• can be co-purified and
• have been shown to exist as a functional unit in vivo.
Non-protein molecules (e.g. small molecules, nucleic acids) may also be present in the complex.
What is not a stable complex?
• Two proteins associated in a pulldown / coimmunoprecipitation with no functional link
• Enzyme/substrate, receptor/ligand or similar transient interactions
• Exception - obligate complex that requires substrate/ligand, e.g. PDGF receptors
Source Databases
• PDBe (EBI) – almost 1000 complexes imported
• ChEMBL (EBI) – 81 complexes imported, more to come with each release
• MatrixDB (Sylvie Ricard-Blum, Univ. of Lyon)
• Mining UniProt – yeast (Bernd Roechert, SIB – manually)
• Reactome – human (EBI)
• Manual curation from IMEx DBs & the literature
• Gramene – Arabidopsis
• Unmaintained web resources – CYGD (yeast), CORUM (human), E. coli website, 3D Complexes (Sarah Teichmann, EBI)
Data captured currently for IntAct complexes
• Participants – proteins (UniProt), small molecules (ChEBI), nucleic acids (Ensembl, ChEBI, RNAcentral?)
• Species
• Stoichiometry – when known
• Topology (= binding sites) – when known
Data captured currently for IntAct complexes
• Complex-specific, free-text annotation fields:
• Function and context – UniProt-style (visible in search results)
• Assembly, e.g. homodimer, heterotetramer…
• Physical properties, e.g. MW, size, topology/assembly
• Ligands
• Disease
Data captured currently for IntAct complexes
• Complex names:
• Recommended name: most recognisable name from the literature; use the GO component name if the specific complex exists in GO
• Systematic name: based on Reactome’s new CV names – ‘string of gene names with stoichiometry’
• Synonyms: all other names the complex may be known as
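The systematic-name convention ('string of gene names with stoichiometry') can be sketched in Python. The slides do not give the exact syntax, so the `GENE(n)` form joined by `:` is an assumption made for illustration, and `systematic_name` is a hypothetical helper, not part of IntAct or Reactome.

```python
from typing import Mapping, Optional


def systematic_name(stoichiometry: Mapping[str, Optional[int]]) -> str:
    """Build a name from gene names and copy numbers.

    Assumed format (not confirmed by the source): components joined by ':',
    with the copy number in parentheses when it is known and greater than 1.
    """
    parts = []
    for gene in sorted(stoichiometry):  # sorted so the name is deterministic
        count = stoichiometry[gene]
        if count is not None and count > 1:
            parts.append(f"{gene}({count})")
        else:
            parts.append(gene)  # unknown or single copy: gene name only
    return ":".join(parts)


# Haemoglobin as a familiar heterotetramer: two alpha and two beta chains.
print(systematic_name({"HBB": 2, "HBA1": 2}))
```

Sorting the gene names keeps the generated name stable regardless of the order in which participants were entered.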
Data captured currently for IntAct complexes
• Structured annotation using GO (BP, MF, CC)
• Cross references to experimental evidence:
• IMEx (+ non-IMEx IntAct & DIP), PDB, EMDB
• Cross references to related complex data:
• Reactome (human)
• ChEMBL
• PubMed (for further information)
• IntEnz (enzyme EC numbers)
• OMIM (disease)
• ECO (Evidence Code Ontology)
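The captured fields listed on these slides could be held in a record along the following lines. This is a minimal sketch only: the class and field names are hypothetical and do not reflect the actual IntAct schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Participant:
    identifier: str                      # accession in the source database
    database: str                        # e.g. "UniProt", "ChEBI", "Ensembl"
    stoichiometry: Optional[int] = None  # None when not known


@dataclass
class ComplexRecord:
    recommended_name: str
    species: str
    participants: List[Participant] = field(default_factory=list)
    systematic_name: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    # Free-text fields such as function, assembly, ligands, disease.
    annotations: Dict[str, str] = field(default_factory=dict)
    go_terms: List[str] = field(default_factory=list)
    # Database name -> accessions, e.g. PDB, Reactome, OMIM, ECO.
    xrefs: Dict[str, List[str]] = field(default_factory=dict)

    def protein_molecule_count(self) -> int:
        """Count protein molecules; unknown stoichiometry counts as one."""
        return sum(
            p.stoichiometry or 1
            for p in self.participants
            if p.database == "UniProt"
        )


# Illustrative record: haemoglobin, two alpha and two beta chains.
record = ComplexRecord(
    recommended_name="Hemoglobin complex",
    species="Homo sapiens",
    participants=[
        Participant("P69905", "UniProt", 2),
        Participant("P68871", "UniProt", 2),
    ],
    annotations={"Assembly": "heterotetramer"},
)
print(record.protein_molecule_count())
```

Counting molecules rather than distinct participants lets a homodimer (one participant, stoichiometry 2) satisfy the "2 or more protein molecules" part of the stable-complex definition.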
Parallel Annotation of complexes in GO
• At project start: > 400 complex terms in GO CC, mostly children of GO:0043234 protein complex, lacking hierarchical structure
• Good collaboration with GO to provide structured annotation
• Parent terms mainly based on complex function
• TermGenie (TG) Standard Form <protein_complex_by_activity>
• Otherwise use TG Free Form
• Some complexes still direct children of GO:0043234 protein complex
• Adding “logical definitions” / “cross-products” / “extensions”
• e.g. “capable_of x activity”
ECO – Evidence Code Ontology
• ECO:0000353 physical interaction evidence used in manual assertion (=IPI)
• full experimental evidence for the complexes is present
• ECO:0000266 - sequence orthology evidence used in manual assertion (=ISO)
• only limited experimental evidence exists for the complex in one species (e.g. mouse), but the complex has been curated in another species (e.g. human) and orthologous gene products exist in the first species, e.g. PDGFs
• ECO:0000306 - inference from background scientific knowledge used in manual assertion, if:
• no or only partial experimental evidence can be found but the complex is generally assumed to exist, e.g. GABA receptors, which are present in ChEMBL
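The choice between the three evidence codes can be sketched as a small decision function. The ECO identifiers are those given above; the function name, its boolean parameters and the order in which the conditions are checked are assumptions made for illustration, not a documented curation rule.

```python
from enum import Enum


class EvidenceCode(Enum):
    IPI = "ECO:0000353"         # physical interaction evidence
    ISO = "ECO:0000266"         # sequence orthology evidence
    BACKGROUND = "ECO:0000306"  # background scientific knowledge


def choose_evidence_code(
    full_experimental_evidence: bool,
    curated_ortholog_complex_exists: bool,
    generally_assumed_to_exist: bool,
) -> EvidenceCode:
    """Pick an evidence code for a complex (hypothetical helper)."""
    if full_experimental_evidence:
        return EvidenceCode.IPI
    if curated_ortholog_complex_exists:
        # Complex curated in another species and orthologues exist here.
        return EvidenceCode.ISO
    if generally_assumed_to_exist:
        return EvidenceCode.BACKGROUND
    raise ValueError("no evidence category applies; complex should not be curated")


print(choose_evidence_code(False, True, False).value)
```

Raising an error when no condition holds reflects the reading that every curated complex carries one of these three codes; that reading is an inference from the slide, not something it states outright.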
Download
• At present:
• One PSI-MI XML 2.5.4 file for all complexes on ftp site
• From next IntAct release:
• One file per complex within a folder per species on ftp site and a zip file per species
• Future:
• Separate files for each complex accessible on each complex details page
• List of files for complexes from search results list
• Database-specific dumps
• Network analysis appropriate format (as developed by MIPS)
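The layout planned for the next release (one file per complex, in a folder per species, plus a zip file per species) can be sketched as follows. The folder naming, the `.xml` extension and the complex identifiers are hypothetical; the slides do not specify the actual paths on the ftp site.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple


def plan_layout(complexes: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each species folder to the per-complex files it would contain.

    `complexes` yields (complex_id, species) pairs; both are placeholders.
    """
    layout: Dict[str, List[str]] = defaultdict(list)
    for complex_id, species in complexes:
        folder = species.lower().replace(" ", "_")  # assumed folder naming
        layout[folder].append(str(PurePosixPath(folder) / f"{complex_id}.xml"))
    return {folder: sorted(files) for folder, files in layout.items()}


def zip_names(layout: Dict[str, List[str]]) -> List[str]:
    """One zip archive per species folder."""
    return sorted(f"{folder}.zip" for folder in layout)


layout = plan_layout([
    ("complex-1", "Homo sapiens"),
    ("complex-2", "Homo sapiens"),
    ("complex-3", "Mus musculus"),
])
print(layout)
print(zip_names(layout))
```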
Project status
• Website will move to production site end March
• Further development (particularly graphics) will be made public over the next 6 months
• Curation priorities – human (mouse), yeast, E. coli
- user requests
• Exports to GOA (process and component) and UniProt under discussion
Future Plans - Display
• Add search filters, e.g.
• Species – almost done
• GO terms
• ECO
• Advanced search
• Links to ‘experimental evidence’ and ‘related complexes’ searches
• Schematic view of complex
• Add existing widgets/BioJS components to show content from other databases directly in the Complex Portal
- crystal structure, pathway, enzyme reactions, etc.
Future Plans - Functionality
• Concept of ‘sets’ – important for Reactome import
• Hierarchy: complex sets > specific complex > sub-complex
• Introducing features to indicate, e.g. complex-drug binding sites
Complexes on demand
1. Request via ‘Contact us’ button
1. Name & components
2. Experimental paper
3. Full details including function, stoichiometry and topology
... or we give you access to the editor to create your own
Summary of ‘User Survey’ and own goals
Summary of ‘User Survey’ - Search
Summary of ‘User Survey’ - Display
Summary of ‘User Survey’ - Features
• Expression Atlas?
• Manually for mouse
• ECO xref to exp-evidence
• Definition??? Reactome
Summary of ‘User Survey’ - Downloads
IntAct and Complex Portal homepage
Complex Portal: UniProt-style display
Complex Portal: tab-style display