The ontology of bioinformatics operations, types of data and identifiers, data formats and topics EDAM ontology Jon Ison PhD Matúš Kalaš


Post on 31-Jul-2020


Page 1: EMBL-EBI Now and in the Future · ELIXIR) Resource catalogues (DRCAT) Information standards (BioDBCore) Infrastructures (BioMB, ELIXIR) Databases & ontologies (SWO) Web portals &

The ontology of bioinformatics operations, types of data and identifiers, data formats and topics

EDAM ontology

Jon Ison PhD

Matúš Kalaš

Page 2

Problem!

Ever more, and more diverse, bioinformatics tools & data resources

Scientists require better ways to handle resources

Tool discovery: Organise, Find, Understand, Compare, Select

Tool (inter)operability: Use, Connect

Page 4

What’s needed?

Resource descriptions
• Relevant for practical scientific / technical purposes
• Consistent
• Comprehensive
• Machine understandable
• Searchable

And big efforts…
• Ontologies
• Information standards
• Annotation & curation
• Common schema
• Registry
• Community / social

Page 7

Aim & Scope

Unify the vocabulary and semantics of common bioinformatics concepts

• Terms with definitions

• Minimal ontology structure (to help developers and annotators)

Bioinformatics-specific

• computer science & biology excluded!

General purpose

• fill gaps between specialised ontologies

• never encroach on well developed & maintained ontologies!

• maintain organic boundaries with other ontologies

Page 8

EDAM concepts

• Data
• Topic
• Identifier
• Operation
• Format

Page 9

EDAM concepts

• Data: Sequence trace, Raw microarray data
• Topic: Phylogenetics, Transcriptomics
• Identifier: Ensembl ID, UniProt accession, Protein name
• Operation: Sequence alignment, SNP detection
• Format: FASTQ, SBML

Page 10

EDAM concepts

Data: Information, represented in an information artefact (data record) that is 'understandable' by dedicated computational tools that can use the data as input or produce it as output.

Topic: A category denoting a rather broad domain or field of interest, of study, application, work, data, or technology. Topics have no clearly defined borders between each other.

Identifier: A text token, number or something else which identifies an entity, but which may not be persistent (stable) or unique (the same identifier may identify multiple things).

Operation: A function that processes a set of inputs and results in a set of outputs, or associates arguments (inputs) with values (outputs).

Format: A defined way or layout of representing and structuring data in a computer file, blob, string, message, or elsewhere.

Page 11

Concept declarations

• Concept URI
• Primary label
• Synonyms
• Definition
• Relations to other EDAM concepts

Other goodies (in progress):
• Regular expressions (Identifiers only, for validation)
• Example (Identifiers only, for convenience)
• Documentation (Formats only, URL to format specification)

• All terms have definitions
• All Identifiers related to Data
• All Formats related to Data
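The declaration fields listed above map naturally onto a small record type. The following is only a sketch of how an annotator's tool might hold one concept in memory; the class name `Concept`, its field names, the `validates` helper, and the "Toy accession" example (including its URI and regular expression) are all hypothetical and are not taken from EDAM itself. Only the Sequence record URI comes from this deck.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept declaration, mirroring the fields listed on the slide."""

    uri: str
    label: str
    definition: str
    synonyms: list[str] = field(default_factory=list)
    # (relation name, URI of the target concept)
    relations: list[tuple[str, str]] = field(default_factory=list)
    regex: str | None = None          # Identifiers only, for validation
    example: str | None = None        # Identifiers only, for convenience
    documentation: str | None = None  # Formats only, URL to the specification

    def validates(self, token: str) -> bool:
        """Check a candidate identifier against the declared pattern."""
        if self.regex is None:
            raise ValueError(f"{self.label} declares no regular expression")
        return re.fullmatch(self.regex, token) is not None


# Hypothetical identifier concept: the URI and pattern are made up.
toy = Concept(
    uri="http://example.org/identifier_0000",
    label="Toy accession",
    definition="An invented accession used only to illustrate the fields.",
    relations=[("is_identifier_of", "http://edamontology.org/data_0849")],
    regex=r"[A-Z]{2}\d{4}",
    example="AB1234",
)
```

Keeping the pattern and the example on the same record makes it easy to check that every declared example actually matches its own regular expression.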

Page 14

EDAM relations

[Diagram: the concepts Data, Topic, Identifier, Operation and Format, plus an annotated tool, linked by the relations is_identifier_of, is_format_of, has_topic, has_input, has_output and has_function]

Page 15

EDAM concept relations

Relations apply between concepts and/or annotated entities

• Operation has_input Data
  e.g. Sequence annotation has_input Sequence record
• Operation has_output Data
  e.g. RNA structure prediction has_output RNA structure record
• Operation or Data has_topic Topic
  e.g. Phylogenetic tree has_topic Phylogenetics
• Format is_format_of Data
  e.g. CHP is_format_of Processed microarray data
• Identifier is_identifier_of Data
  e.g. InterPro accession is_identifier_of Protein signature
• Tool has_function Operation
  e.g. BLAST has_function Sequence database search
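Each relation listed for this slide has a fixed kind of subject and object, so an annotation can be checked mechanically. The sketch below encodes those signatures and the slide's own examples as plain Python data. The names `RELATIONS`, `KIND` and `is_well_typed` are illustrative choices, and classing "Phylogenetic tree" as Data is an assumption inferred from the has_topic signature.

```python
# Subject kinds and object kind for each relation, as given on the slide.
RELATIONS = {
    "has_input": ({"Operation"}, "Data"),
    "has_output": ({"Operation"}, "Data"),
    "has_topic": ({"Operation", "Data"}, "Topic"),
    "is_format_of": ({"Format"}, "Data"),
    "is_identifier_of": ({"Identifier"}, "Data"),
    "has_function": ({"Tool"}, "Operation"),
}

# Kind of each example entity named on the slide.
KIND = {
    "Sequence annotation": "Operation",
    "Sequence record": "Data",
    "RNA structure prediction": "Operation",
    "RNA structure record": "Data",
    "Phylogenetic tree": "Data",  # assumed; the slide does not say explicitly
    "Phylogenetics": "Topic",
    "CHP": "Format",
    "Processed microarray data": "Data",
    "InterPro accession": "Identifier",
    "Protein signature": "Data",
    "BLAST": "Tool",
    "Sequence database search": "Operation",
}

# The six example statements from the slide, as (subject, relation, object).
EXAMPLES = [
    ("Sequence annotation", "has_input", "Sequence record"),
    ("RNA structure prediction", "has_output", "RNA structure record"),
    ("Phylogenetic tree", "has_topic", "Phylogenetics"),
    ("CHP", "is_format_of", "Processed microarray data"),
    ("InterPro accession", "is_identifier_of", "Protein signature"),
    ("BLAST", "has_function", "Sequence database search"),
]


def is_well_typed(subject: str, relation: str, obj: str) -> bool:
    """True if the subject and object kinds fit the relation's signature."""
    subject_kinds, object_kind = RELATIONS[relation]
    return KIND[subject] in subject_kinds and KIND[obj] == object_kind
```

For instance, `is_well_typed("BLAST", "has_function", "Sequence database search")` holds, while a Format used as the subject of has_input is rejected.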

Page 16

Status (v1.2 / March 2013)

2461 concepts
• 246 topics
• 612 data
• 528 identifiers
• 456 formats
• 619 operations

Comprehensive for sequence and structure analysis (including “next gen” sequencing)

Comprehensive list of data formats

Add new content for use-cases as required

Looking to work with domain experts!

• EDAM_1.2 is available
• Quarterly release cycle
• Coordinated with SWO & WSIO

Page 18

Bioinformatics 2013; doi: 10.1093/bioinformatics/btt113

Page 19

Applications

EDAM is used in:
• Tools (EMBOSS tools)
• Collections (DebianMed, EMBOSS)
• Web services (iHOP, WSDbfetch, Bergen WS)
• Schema & data formats (BioXSD)
• Workbenches & workflow (eSysBio)
• Software registries (BioMB, BioCatalogue, ELIXIR)
• Resource catalogues (DRCAT)
• Information standards (BioDBCore)
• Infrastructures (BioMB, ELIXIR)
• Databases & ontologies (SWO)
• Web portals & pages (Institut Pasteur)

Page 20

Example information models

Tool, e.g. Web service: SAWSDL standard

Data resource, e.g. DRCAT: BioDBCore standard

Page 21

Usage

Concepts identified by global URIs (in EDAM.owl):
http://edamontology.org/<subontology>_<Id>
http://edamontology.org/<Id> (relations)

Concepts have identifiers (in EDAM.obo):
EDAM_<subontology>:<Id>
EDAM:<Id> (relations)

e.g. Sequence record:
http://edamontology.org/data_0849
EDAM_data:0849

e.g. the has_function relation:
http://edamontology.org/has_function
EDAM:has_function

• Use whatever works for you!
• URIs are stable with clean obsoletion mechanism
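Since the OWL URIs and the OBO identifiers follow parallel patterns, one form can be derived from the other. The functions below are a minimal sketch of that mapping, checked only against the two examples on this slide (Sequence record and has_function). The function names are hypothetical, and the sketch assumes that concept Ids are purely numeric while relation Ids are not.

```python
BASE = "http://edamontology.org/"


def uri_to_obo(uri: str) -> str:
    """Map an EDAM.owl URI to the corresponding EDAM.obo identifier."""
    if not uri.startswith(BASE):
        raise ValueError(f"not an EDAM URI: {uri}")
    local = uri[len(BASE):]
    subontology, sep, ident = local.rpartition("_")
    # Assumption: concepts end in "_<digits>"; anything else is a relation.
    if sep and subontology and ident.isdigit():
        return f"EDAM_{subontology}:{ident}"
    return f"EDAM:{local}"


def obo_to_uri(obo_id: str) -> str:
    """Map an EDAM.obo identifier back to the EDAM.owl URI."""
    prefix, sep, ident = obo_id.partition(":")
    if not sep or not ident:
        raise ValueError(f"not an EDAM identifier: {obo_id}")
    if prefix == "EDAM":
        return BASE + ident  # relation
    if prefix.startswith("EDAM_") and len(prefix) > len("EDAM_"):
        return f"{BASE}{prefix[len('EDAM_'):]}_{ident}"  # concept
    raise ValueError(f"not an EDAM identifier: {obo_id}")
```

With these, `uri_to_obo("http://edamontology.org/data_0849")` gives `EDAM_data:0849`, and `obo_to_uri("EDAM:has_function")` gives `http://edamontology.org/has_function`.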

Page 22

Files & Interfaces

EDAM in OBO format (view in OBO-Edit)
EDAM in OWL format (view in Protégé or TopBraid)
sourceforge.net/projects/edamontology/files

NCBO BioPortal browser: http://bioportal.bioontology.org/
EBI Ontology Lookup Service: ebi.ac.uk/ontology-lookup/
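Besides the graphical viewers, the OBO flat file is plain text made of stanzas with `tag: value` lines, so it can also be read with a few lines of script. The parser below is a deliberately minimal sketch: it collects only `[Term]` stanzas and ignores escapes, trailing modifiers and comments after a value. The `SAMPLE` text is hand-written for illustration, reusing the identifiers shown in this deck, and is not copied from the actual EDAM.obo file.

```python
def parse_obo_terms(text: str) -> list[dict[str, list[str]]]:
    """Collect the tag/value pairs of every [Term] stanza in OBO text."""
    terms: list[dict[str, list[str]]] = []
    current = None  # the stanza being filled, or None outside a [Term]
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line and not line.startswith("!"):
            tag, value = line.split(": ", 1)
            current.setdefault(tag, []).append(value)
    return terms


# Hand-written illustration, not an excerpt of the real file.
SAMPLE = """\
format-version: 1.2

[Term]
id: EDAM_data:0849
name: Sequence record

[Typedef]
id: EDAM:has_function
name: has_function
"""

terms = parse_obo_terms(SAMPLE)
```

Values are kept as lists because a tag such as a synonym may legitimately appear more than once in a stanza.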

Page 23

Links & Contacts

edamontology.org
bioportal.bioontology.org/
ebi.ac.uk/ontology-lookup/

Email Jon, cc Matúš:
[email protected]
[email protected]

Mailing lists (low traffic!)
https://lists.sourceforge.net/lists/listinfo/edamontology-users
https://lists.sourceforge.net/lists/listinfo/edamontology-developers
https://lists.sourceforge.net/lists/listinfo/edamontology-announce