Post on 30-Dec-2015

Use of Ontologies in the Life Sciences: BioPAX

Graciela Gonzalez, PhD

(some slides adapted from presentations available at www.biopax.org)

Definition of an Ontology

Conceptualization of a domain of interest
• Concepts, relations, attributes, constraints, objects, values…

An ontology is a specification of a conceptualization
• Formal notation
• Documentation

A variety of forms, but includes:
• A vocabulary of terms
• Some specification of the meaning of the terms

Ontologies – Key Aspects

Focus on semantics!
• Accurately model a complex domain
• Capture semantic nuances
• Rigorously define what each field means
• Adhere to those definitions!

Ontologies – Key Aspects

Ontologies are for people and computers:
• People browse the ontology to learn it
• It encodes the definition of a concept so that the computer “understands” it
• “understands” = automated reasoning with concept definitions
  • Is concept A more general than concept B?
  • Is X an instance of concept A?
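The two questions above can be sketched as simple lookups over a concept hierarchy. This is a minimal illustration, not how an OWL reasoner works internally; the `PARENT` table, the `INSTANCE_OF` table and the concept names in them are assumptions made up for the example.

```python
# Toy taxonomy: each concept maps to its direct parent (None for the root).
# The concept names are illustrative, not the official BioPAX class list.
PARENT = {
    "Entity": None,
    "Interaction": "Entity",
    "PhysicalInteraction": "Interaction",
    "PhysicalEntity": "Entity",
    "Protein": "PhysicalEntity",
}

# Hypothetical instance data: individual -> asserted concept.
INSTANCE_OF = {"MEK1": "Protein"}


def ancestors(concept):
    """Yield the concept itself, then every more general concept."""
    while concept is not None:
        yield concept
        concept = PARENT[concept]


def is_more_general(a, b):
    """Is concept `a` strictly more general than concept `b`?"""
    return a != b and a in ancestors(b)


def is_instance(x, concept):
    """Is individual `x` an instance of `concept`, directly or by inheritance?"""
    return concept in ancestors(INSTANCE_OF[x])


print(is_more_general("Interaction", "PhysicalInteraction"))  # True
print(is_instance("MEK1", "PhysicalEntity"))                  # True
print(is_instance("MEK1", "Interaction"))                     # False
```

A description-logic reasoner answers the same kind of question from formal class definitions rather than from a hand-written parent table.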

Components of an Ontology

• Concepts (Class, Set, Type, Predicate)
  ex: Gene, Reaction, Macromolecule
• Taxonomy of concepts (generalization/specialization hierarchies)
  ex: a physical interaction is an interaction
• Relations and Attributes
• Domains (values allowed for an attribute)
  ex: a feature location consists of a sequence location
• Constraints and other meta-information about relations
  ex: a pathway has at least one interaction
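As a rough sketch of how a constraint such as "a pathway has at least one interaction" might be enforced in an implementation, the check can live next to the data it governs. The `Pathway` and `Interaction` classes and the `validate` method below are hypothetical names chosen for illustration, not part of any BioPAX tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    name: str


@dataclass
class Pathway:
    name: str
    interactions: list = field(default_factory=list)

    def validate(self):
        """Enforce the constraint 'a pathway has at least one interaction'."""
        if not self.interactions:
            raise ValueError(f"pathway {self.name!r} has no interactions")
        # Domain check: only Interaction objects are allowed in this attribute.
        for item in self.interactions:
            if not isinstance(item, Interaction):
                raise TypeError(f"{item!r} is not an Interaction")
        return True


glycolysis = Pathway("Glycolysis", [Interaction("example reaction")])
print(glycolysis.validate())  # True

try:
    Pathway("Empty pathway").validate()
except ValueError as err:
    print("rejected:", err)
```

Rejecting invalid records at entry time is one way to get the data-entry protection that a well-implemented ontology is meant to provide.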

Ontologies in Bioinformatics

Biological DBs need to have a good ontology AND a good mapping (implementation) of it: this prevents errors in data entry and interpretation

Provide a common framework for multidatabase queries

Provide a controlled vocabulary, such as for genome annotation

For information extraction

BioPAX: Biological PAthway eXchange

A data exchange ontology and format for biological pathway integration, aggregation and inference

Open source, ongoing

BioPAX Goals

Include support for these pathway types:
• Metabolic pathways
• Signaling pathways
• Protein-protein interactions
• Genetic regulatory pathways

Note: representing pathways is nothing new

The problem

200+ pathway databases of different kinds (http://www.pathguide.org/)

Rich data, different ontologies

Nightmare for integration and data exchange

Biological pathways

Metabolic Pathways

Molecular Interaction Networks

Signaling Pathways

Ontologies reflect “real life”

A typical pathway would be decomposed into:

A single pathway instance, which would contain several pathway steps, which would each contain one or more interactions occurring between physical entity participants, which each point to one physical entity.
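The decomposition described above can be sketched as nested data classes. The class and field names (including the free-text `role`) are assumptions made for this illustration and do not reproduce the BioPAX class definitions; the MEK1/ERK1 data is only an example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PhysicalEntity:
    name: str


@dataclass
class Participant:
    entity: PhysicalEntity  # each participant points to exactly one entity
    role: str = ""          # hypothetical free-text role, for illustration


@dataclass
class Interaction:
    name: str
    participants: List[Participant] = field(default_factory=list)


@dataclass
class PathwayStep:
    interactions: List[Interaction] = field(default_factory=list)


@dataclass
class Pathway:
    name: str
    steps: List[PathwayStep] = field(default_factory=list)


def entities_in(pathway):
    """Walk pathway -> steps -> interactions -> participants -> entities."""
    seen = []
    for step in pathway.steps:
        for interaction in step.interactions:
            for participant in interaction.participants:
                if participant.entity.name not in seen:
                    seen.append(participant.entity.name)
    return seen


mek1, erk1 = PhysicalEntity("MEK1"), PhysicalEntity("ERK1")
step = PathwayStep([
    Interaction("phosphorylation of ERK1",
                [Participant(mek1, "catalyst"), Participant(erk1, "substrate")]),
])
example = Pathway("example signaling pathway", [step])
print(entities_in(example))  # ['MEK1', 'ERK1']
```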

BioPAX vs other ontologies

Conceptual framework based upon existing DB schemas, allowing wide range of detail, multiple levels of abstraction

Uses (refers to) existing ontologies to provide supplemental annotations where appropriate:
• Cellular location: GO Component
• Cell type: Cell.obo
• Organism: NCBI taxon DB

Incorporates other standards where appropriate

Interoperates with existing standards

BioPAX & other Exchange Formats

[Figure: scope of BioPAX compared with PSI-MI 2, SBML and CellML. Labels: DB Exchange Formats; Simulation Model Exchange Formats; Genetic Interactions; Molecular Interactions (Pro:Pro, All:All); Interaction Networks (Molecular, Non-molecular; Pro:Pro, TF:Gene, Genetic); Regulatory Pathways (Low Detail, High Detail); Metabolic Pathways (Low Detail, High Detail); Biochemical Reactions; Small Molecules (Low Detail, High Detail); Rate Formulas]

BioPAX Level 1

Capturing data at different resolutions
• Metabolic pathway data has a high level of detail
• Molecular interactions have less
  Ex: no causal or temporal aspects of interactions

BioPAX Level 2 captures molecular binding interactions at a relatively high level in the ontology class hierarchy

This reflects the fact that any given binding interaction may be a low-resolution (or more abstract) view of a more specific type of interaction.

Example

A signaling database would likely capture the interaction between MEK1 and ERK1 as a catalysis event (MEK1 catalyzes the phosphorylation of ERK1).

A molecular interaction database would likely store the interaction using a simpler abstraction, such as a protein-protein interaction.

BioPAX Level 2 supports both of these representations.
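One way to picture how both representations can coexist is a class hierarchy in which the detailed view is a subclass of the abstract one. The sketch below is a simplification under stated assumptions: it collapses any intermediate classes, and the constructor arguments are invented for the example rather than taken from the BioPAX Level 2 definitions.

```python
class PhysicalInteraction:
    """Low-resolution view: these entities interact, nothing more."""

    def __init__(self, participants):
        self.participants = frozenset(participants)


class Catalysis(PhysicalInteraction):
    """Higher-resolution view: a controller acts on a specific event."""

    def __init__(self, controller, substrate, event):
        super().__init__([controller, substrate])
        self.controller = controller
        self.substrate = substrate
        self.event = event


# The same MEK1/ERK1 relationship as two hypothetical sources might record it.
from_signaling_db = Catalysis("MEK1", "ERK1", event="phosphorylation")
from_interaction_db = PhysicalInteraction(["MEK1", "ERK1"])

records = [from_signaling_db, from_interaction_db]


def interactions_between(records, a, b):
    """Return every record, at any resolution, that involves both a and b."""
    return [r for r in records
            if isinstance(r, PhysicalInteraction) and {a, b} <= r.participants]


hits = interactions_between(records, "MEK1", "ERK1")
print([type(r).__name__ for r in hits])  # ['Catalysis', 'PhysicalInteraction']
```

Because the detailed record is also an instance of the general class, a query phrased at the abstract level finds both records, while the catalysis-specific fields stay available for sources that provide them.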

Aggregation, Integration, Inference with BioPAX

1. Aggregation: represent multiple kinds of pathway databases
• metabolic
• molecular interactions
• signal transduction
• gene regulatory

2. Integration: special constructs designed for integration
• DB References
• XRefs (Publication, Unification, Relationship)
• Synonyms

3. Inference: OWL DL, to enable reasoning
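To illustrate why unification cross-references help integration, the sketch below groups records from two hypothetical source databases that share the same unification xref. The record layout, the database name `ExampleDB` and the identifiers `X0001` and `X0002` are placeholders invented for the example, not real accessions or BioPAX syntax.

```python
from collections import defaultdict

# Hypothetical records; identifiers are placeholders, not real accessions.
records = [
    {"source": "db_a", "name": "MEK1", "synonyms": ["MAP2K1"],
     "unification_xrefs": [("ExampleDB", "X0001")]},
    {"source": "db_b", "name": "MAP2K1", "synonyms": [],
     "unification_xrefs": [("ExampleDB", "X0001")]},
    {"source": "db_b", "name": "ERK1", "synonyms": [],
     "unification_xrefs": [("ExampleDB", "X0002")]},
]


def merge_by_unification_xref(records):
    """Group records that point at the same external identifier."""
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for rec in records:
        for xref in rec["unification_xrefs"]:
            merged[xref]["names"].update([rec["name"], *rec["synonyms"]])
            merged[xref]["sources"].add(rec["source"])
    return dict(merged)


merged = merge_by_unification_xref(records)
for xref, info in sorted(merged.items()):
    print(xref, sorted(info["names"]), sorted(info["sources"]))
```

This simple grouping assumes each record carries a single unification xref; records with several would need their groups joined transitively, for instance with a union-find structure.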

BioPAX Ontology: Top Level

Pathway
• A set of interactions
• E.g. Glycolysis, MAPK, Apoptosis

Interaction
• A set of entities and some relationship between them
• E.g. Reaction, Molecular Association, Catalysis

Physical Entity
• A building block of simple interactions
• E.g. Small molecule, Protein, DNA, RNA

[Figure: class diagram of Entity, Pathway, Interaction and Physical Entity. Legend: Subclass (is a), Contains (has a)]

BioPAX Ontology: Physical Entities

[Figure: PhysicalEntity with subclasses Complex, RNA, Protein and Small Molecule]

• This class serves as the super-class for all physical entities, although its current set of subclasses is limited to molecules.
• This list may be expanded to include photon, environment, cell and cellular component in later levels of BioPAX.

Interaction Class Structure

Relational implementation