The ontology of bioinformatics operations, types of data and identifiers, data formats and topics
EDAM ontology
Jon Ison PhD
Matúš Kalaš
Ever more / diverse bioinformatics tools & data resources
Scientists require better ways to handle resources
Tool discovery Organise
Find
Understand
Compare
Select
Problem!
Tool (inter)operability Use Connect
Resource descriptions Relevant for practical scientific / technical purposes
Consistent
Comprehensive
Machine understandable
Searchable
And big efforts… Ontologies Information standards
Annotation & curation Common schema
Registry Community / social
What’s needed?
Unify the vocabulary and semantics of common bioinformatics concepts Terms with definitions
Minimal ontology structure (to help developers and annotators)
Bioinformatics-specific
• computer science & biology excluded!
General purpose
• fill gaps between specialised ontologies
• never encroach on well developed & maintained ontologies!
• maintain organic boundaries with other ontologies
Aim & Scope
Data
Topic
Identifier
Operation
Format
EDAM concepts
Data: Sequence trace, Raw microarray data
Topic: Phylogenetics, Transcriptomics
Identifier: Ensembl ID, UniProt accession, Protein name
Operation: Sequence alignment, SNP detection
Format: FASTQ, SBML
EDAM concepts
Data: Information, represented in an information artefact (data record) that is 'understandable' by dedicated computational tools that can use the data as input or produce it as output.
Topic: A category denoting a rather broad domain or field of interest, of study, application, work, data, or technology. Topics have no clearly defined borders between each other.
Identifier: A text token, number or something else which identifies an entity, but which may not be persistent (stable) or unique (the same identifier may identify multiple things).
Operation: A function that processes a set of inputs and results in a set of outputs, or associates arguments (inputs) with values (outputs).
Format: A defined way or layout of representing and structuring data in a computer file, blob, string, message, or elsewhere.
EDAM concepts
Concept declarations
• Concept URI
• Primary label
• Synonyms
• Definition
• Relations to other EDAM concepts
Other goodies (in progress):
• Regular expressions (Identifiers only, for validation)
• Example (Identifiers only, for convenience)
• Documentation (Formats only, URL to format specification)
Guarantees
• All terms have definitions
• All Identifiers related to Data
• All Formats related to Data
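The declaration fields listed above could be held in a small record type. The following is only a sketch: the class name `Concept`, its field names, the URI `http://example.org/identifier_demo`, the label and the regular expression are all hypothetical and are not taken from EDAM itself. The only real URI is the Sequence record URI that appears later in these slides.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Concept:
    """One concept declaration (hypothetical field names)."""
    uri: str
    label: str
    definition: str
    synonyms: List[str] = field(default_factory=list)
    # (relation name, target concept URI) pairs
    relations: List[Tuple[str, str]] = field(default_factory=list)
    regex: Optional[str] = None          # Identifiers only, for validation
    example: Optional[str] = None        # Identifiers only, for convenience
    documentation: Optional[str] = None  # Formats only, URL to the specification

    def matches(self, token: str) -> bool:
        """Check a token against the concept's regular expression."""
        if self.regex is None:
            raise ValueError(f"{self.label!r} declares no regular expression")
        return re.fullmatch(self.regex, token) is not None


# A made-up identifier concept; the pattern is illustrative only.
demo = Concept(
    uri="http://example.org/identifier_demo",
    label="Demo accession",
    definition="A hypothetical accession used for illustration.",
    relations=[("is_identifier_of", "http://edamontology.org/data_0849")],
    regex=r"[A-Z]{2}\d{4}",
    example="AB1234",
)

print(demo.matches(demo.example))
```

Keeping the regular expression and the example side by side makes it easy to check that each declared example actually satisfies its own pattern.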
Data
Topic
Identifier
Operation
Format is_identifier_of is_format_of
has_topic
has_input has_output
EDAM relations
tool has_function
Relations apply between concepts and/or annotated entities
Operation has_input Data: Sequence annotation has_input Sequence record
Operation has_output Data: RNA structure prediction has_output RNA structure record
Operation or Data has_topic Topic: Phylogenetic tree has_topic Phylogenetics
Format is_format_of Data: CHP is_format_of Processed microarray data
Identifier is_identifier_of Data: InterPro accession is_identifier_of Protein signature
Tool has_function Operation: BLAST has_function Sequence database search
EDAM concept relations
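The relation signatures on this slide can be read as a small domain/range table. The sketch below encodes that table and checks example statements against it. The names `RELATIONS`, `KIND` and `is_valid` are hypothetical, and the kind assigned to each example concept is inferred from the slide's own examples rather than looked up in EDAM.

```python
# relation name -> (allowed subject kinds, required object kind), as on the slide
RELATIONS = {
    "has_input": ({"Operation"}, "Data"),
    "has_output": ({"Operation"}, "Data"),
    "has_topic": ({"Operation", "Data"}, "Topic"),
    "is_format_of": ({"Format"}, "Data"),
    "is_identifier_of": ({"Identifier"}, "Data"),
    "has_function": ({"Tool"}, "Operation"),
}

# Kind of each example entity, inferred from the slide's examples
KIND = {
    "Sequence annotation": "Operation",
    "Sequence record": "Data",
    "RNA structure prediction": "Operation",
    "RNA structure record": "Data",
    "Phylogenetic tree": "Data",
    "Phylogenetics": "Topic",
    "CHP": "Format",
    "Processed microarray data": "Data",
    "InterPro accession": "Identifier",
    "Protein signature": "Data",
    "BLAST": "Tool",
    "Sequence database search": "Operation",
}


def is_valid(subject: str, relation: str, obj: str) -> bool:
    """True if the statement respects the relation's domain and range."""
    if relation not in RELATIONS:
        return False
    domains, required_range = RELATIONS[relation]
    return KIND.get(subject) in domains and KIND.get(obj) == required_range


print(is_valid("BLAST", "has_function", "Sequence database search"))  # True
print(is_valid("CHP", "has_input", "Sequence record"))                # False
```

The second call fails because CHP is a Format, and only an Operation may be the subject of has_input.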
2461 concepts
246 topics
612 data
528 identifiers
456 formats
619 operations
Comprehensive for sequence and structure analysis (including “next gen” sequencing)
Comprehensive list of data formats
Add new content for use-cases as required
Looking to work with domain experts!
Status (v1.2 / March 2013)
• EDAM_1.2 is available
• Quarterly release cycle
• Coordinated with SWO & WSIO
Bioinformatics 2013; doi: 10.1093/bioinformatics/btt113
EDAM
Tools (EMBOSS tools)
Collections (DebianMed, EMBOSS)
Web services (iHOP, WSDbfetch, Bergen WS)
Schema & data formats (BioXSD)
Workbenches & workflow (eSysBio)
Software registries (BioMB, BioCatalogue, ELIXIR)
Resource catalogues (DRCAT)
Information standards (BioDBCore)
Infrastructures (BioMB, ELIXIR)
Databases & ontologies (SWO)
Web portals & pages (Institut Pasteur)
Applications
Example information models
Tool e.g. Web service SAWSDL standard
Data resource e.g. DRCAT BioDBCore standard
Concepts identified by global URIs (in EDAM.owl): http://edamontology.org/<subontology>_<Id>
http://edamontology.org/<Id> (relations)
Concepts have identifiers (in EDAM.obo): EDAM_⟨subontology⟩:⟨Id⟩
EDAM_:⟨Id⟩ (relations)
e.g. Sequence record: http://edamontology.org/data_0849
EDAM_data:0849
http://edamontology.org/has_function
EDAM:has_function
Usage
• Use whatever works for you!
• URIs are stable with clean obsoletion mechanism
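The two concept identifier styles shown on this slide map onto each other mechanically. The sketch below converts between them. It assumes the subontology part is a lowercase word and the Id part is a run of digits, as in the `data_0849` example. The function names are hypothetical, and relation identifiers such as `has_function` are deliberately not handled.

```python
import re

# Patterns follow the data_0849 / EDAM_data:0849 example on the slide.
_URI = re.compile(r"http://edamontology\.org/([a-z]+)_(\d+)")
_OBO = re.compile(r"EDAM_([a-z]+):(\d+)")


def uri_to_obo(uri: str) -> str:
    """Convert an OWL-style concept URI to the OBO-style identifier."""
    m = _URI.fullmatch(uri)
    if m is None:
        raise ValueError(f"not an EDAM concept URI: {uri!r}")
    return f"EDAM_{m.group(1)}:{m.group(2)}"


def obo_to_uri(obo_id: str) -> str:
    """Convert an OBO-style identifier to the OWL-style concept URI."""
    m = _OBO.fullmatch(obo_id)
    if m is None:
        raise ValueError(f"not an EDAM OBO identifier: {obo_id!r}")
    return f"http://edamontology.org/{m.group(1)}_{m.group(2)}"


print(uri_to_obo("http://edamontology.org/data_0849"))  # EDAM_data:0849
print(obo_to_uri("EDAM_data:0849"))  # http://edamontology.org/data_0849
```

The Id is kept as a string so that leading zeros survive the round trip.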
EDAM in OBO format (view in OBOEdit)
EDAM in OWL format (view in Protege or TopBraid)
sourceforge.net/projects/edamontology/files
NCBO BioPortal browser http://bioportal.bioontology.org/
EBI Ontology Lookup Service ebi.ac.uk/ontology-lookup/
Files & Interfaces
edamontology.org
bioportal.bioontology.org/
ebi.ac.uk/ontology-lookup/
Email Jon cc Matus [email protected]
Mailing lists (low traffic!)
https://lists.sourceforge.net/lists/listinfo/edamontology-users
https://lists.sourceforge.net/lists/listinfo/edamontology-developers
https://lists.sourceforge.net/lists/listinfo/edamontology-announce
Links & Contacts