eye what kinds of things exist? what are the relationships between these things? ommatidium sense...
TRANSCRIPT
eye
what kinds of things exist?
what are the relationships between these things?
ommatidium
sense organ
eye disc
is_a
part_of
developsfrom
A biological ontology is:
A machine interpretable representation of some aspect of biological reality
Three practical considerations
Acknowledge the substantive issues of sociology and operations to build community
Follow the principles of ontology construction hygiene
Annotation is the rate-limiting step. It can be facilitated by improved ontologies
Drosophila
Meeting the goal: Drawing inferences
A B C DSP:1234 SP:8723 SP:19345?
PMID:5555 PMID:4444
BSP:48392
B CSP:48291 SP:38921
Direct evidence Direct evidence
Indirect evidence
Indirect evidence
PMID:8976
PMID:9550 PMID:3924
Human
Xenopus
Overview Following basic rules helps make better ontologies
We will work through the principles-based treatment of relations in ontologies, to show how ontologies can become more reliable and more powerful
Why do we need rules for good ontology?
Ontologies must be intelligible both to humans (for annotation) and to machines (for reasoning and error-checking)
Unintuitive rules for classification lead to entry errors (problematic links)
Facilitate training of curators Overcome obstacles to alignment with other ontology and terminology systems
Enhance harvesting of content through automatic reasoning systems
Tactile senseTactionTactition
?
The Challenge of Univocity:People call the same thing by
different names
Tactile senseTactionTactition
perception of touch ; GO:0050975
Univocity: GO uses 1 term and many characterized synonyms
= bud initiation
= bud initiation
= bud initiation
The Challenge of Univocity: People use the same words to describe different things
Bud initiation? How is a computer to know?
= bud initiation
sensu Dentists
= bud initiation
sensu yeasties
= bud initiation
sensu agronomists
Univocity: GO adds “sensu” descriptors to discriminate
among organisms
The Importance of synonyms for utility:
How do we represent the function of tRNA?
Biologically, what does the tRNA do?Identifies the codon and inserts the amino acid in the growing polypeptide
Molecular_function
Triplet_codon amino acid adaptor activity
GO Definition: Mediates the insertion of an amino acid at the correct point in the sequence of a nascent polypeptide chain during protein synthesis.
Synonym: tRNA
The Challenge of Positivity
Some organelles are membrane-bound.A centrosome is not a membrane bound organelle,but it still may be considered an organelle.
The Challenge of Positivity: Sometimes absence is a
distinction in a Biologist’s mind
non-membrane-bound organelle
GO:0043228 membrane-bound organelle
GO:0043227
Positivity
Note the logical difference between “non-membrane-bound organelle” and “not a membrane-bound organelle”
The latter includes everything that is not a membrane bound organelle!
The Challenge of Objectivity: Database users want to know if
we don’t know anything (Exhaustiveness with respect
to knowledge)
We don’t know anything about a gene product with
respect to these
We don’t know anything about the ligand that
binds this type of GPCR
Objectivity
How can we use GO to annotate gene products when we know that we don’t have any information about them? Currently GO has terms in each ontology to describe unknown
An alternative might be to annotate genes to root nodes and use an evidence code to describe that we have no data.
Similar strategies could be used for things like receptors where the ligand is unknown.
Single Inheritance
GO has a lot of is_a diamonds Some are due to incompleteness of the graph
Some are due to a mixture of dissimilar classes within the graph at the same level
Is_a diamond in GO Process
behavior
locomotory behavior larval behavior
larval locomotory behavior
Is_a diamond in GO Function
enzyme regulator activity
GTPase regulatoractivity
enzyme activatoractivity
GTPase activatoracivity
Is_a diamond in GO Cellular Component
organelle
non-membrane boundorganelle
intracellularorganelle
non-membrane boundintracellular organelle
Technically the diamonds are correct, but could
be eliminatedlocomotory behavior larval behavior
GTPase regulatoractivity
enzyme activatoractivity
non-membrane boundorganelle
intracellularorganelle
What do these pairs have in common?
They are all differentiated from the parent term by a
different factorlocomotory behavior larval behavior
GTPase regulatoractivity
enzyme activatoractivity
non-membrane boundorganelle
intracellularorganelle
Type of behavior vs. what is behaving
What is regulated vs. type of regulator
Type of organelle vs. location of organelle
Insert an intermediate grouping term
behavior
locomotory behavior larval behavior
larval locomotory behavior
behavior of a thing
descriptive behavior
Why insert terms that no one would use?
behavior
locomotory behavior larval behavior rhythmic behavior adult behavior
By the structure of this graph, locomotory behaviorhas the same relationship to larval behavior
as to rhythmic behavior
Why insert terms that no one would use?
behavior
locomotory behavior larval behaviorrhythmic behavior adult behavior
But actually, locomotory behavior/rhythmic behaviorand larval behavior/adult behavior group naturally
Descriptivebehavior
Behaviorof athing
This type of single step differentiation of terms between levels would allow us to use distancesbetween nodes and levels to compare similarity.
GO DefinitionsA definition written by
a biologist:necessary & sufficient
conditions written definition(not computable)
Graph structure: necessary conditions
formal(computable)
Relationships and definitions
The set of necessary conditions is determined by the graph This can be considered a partial definition
Important considerations: Placement in the graph- selecting parents
Appropriate relationships to different parents
True path violation
The importance of relationships Cyclin dependent protein kinase
Complex has a catalytic and a regulatory subunit
How do we represent these activities (function) in the ontology?
Do we need a new relationship type (regulates)?Catalytic activity
protein kinase activity
protein Ser/Thr kinase activity
Cyclin dependent protein kinase activity
Cyclin dependent protein kinase regulator activity
Molecular_function
Enzyme regulator activity
Protein kinase regulator activity
Relationships provide the computable definitions
The definition of a class lower down in the hierarchy is provided by specifying the parent of the class together with the relevant differentia.
Differentia tells us what marks out instances of the defined class within the wider parent class.
Structured definitions contain both genus and differentiae
Essence = Genus + Differentiae
neuron cell differentiation =Genus: differentiation (processes whereby a relativelyunspecialized cell acquires the specialized features of..)Differentiae: acquires features of a neuron
Alignment of the Two Ontologies will permit the generation of consistent
and complete definitions
id: CL:0000062name: osteoblastdef: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]is_a: CL:0000055relationship: develops_from CL:0000008relationship: develops_from CL:0000375
GO
Cell type
New Definition
+
=Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
Alignment of the Two Ontologies will permit the generation of consistent and complete definitions
id: GO:0001649
name: osteoblast differentiationsynonym: osteoblast cell differentiationgenus: differentiation GO:0030154 (differentiation)differentium: acquires_features_of CL:0000062 (osteoblast)definition (text): Processes whereby a relatively unspecialized cell acquires the specialized features of an osteoblast, the mesodermal cell that gives rise to bone
Formal definitions with necessary and sufficient conditions, in both human readable and computer readable forms
Basis in Reality
GO is designed by a consortium As long as egos don’t get in the way, GO represents universals rather than concepts
Large-scale developments of the GO are a result of compromise
Gene Annotators have a large say in GO content Annotators are experts in their fields Annotators constantly read the scientific literature
But, since GO is representing a science, GO actually represents paradigms.
Therefore, it is essential that GO is able to change!
Classes and Instances
When should we create a new class as opposed to multiple annotations?
When the the biology represents a universal principal. Receptor signaling protein tyrosine kinase activity does not represent receptor signaling protein activity and tyrosine kinase activity independently.
Consequences of inconsistencies
Hard to synchronize manually can be automated currently requires text mining
Inconsistent user-search results Problems likely to resurface with other ontologies embedded in the GO chemical, protein multiple species-specific anatomical ontologies
OBO ontologies should inform GO
Combinatorial annotation
Allow combinatorial annotations in appropriate context phenotypic annotation
eg “tail fin placement ventralized at stage pharyngula:prim25”
combinatorial GO annotation eg “activation of MAPK in mesenchymal cell”
New AmiGO will allow for these annotations
Animal models
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
Animal disease models
Humans Animal models
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
(disease)
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
(disease model)
Animal disease models
Humans Animal models
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
(disease)
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
(disease model)
Animal disease models
Humans Animal models
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
(disease)
Mutant Gene
Mutant or missing Protein
Mutant Phenotype
(disease model)
Animal disease models
SHH-/+ SHH-/-
shh-/+ shh-/-
Phenotype (clinical sign) = entity + attribute
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
P2 = midface + hypoplastic
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloric
P2 = midface + hypoplastic
P3 = kidney + hypertrophied
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloricP2 = midface + hypoplastic P3 = kidney + hypertrophied
PATO: hypoteloric
hypoplastic
hypertrophied
ZFIN: eye
midface
kidney
+
Phenotype (clinical sign) = entity + attribute
Anatomical ontology
Cell & tissue ontology
Developmental ontology
Gene ontology
biological process
molecular function
cellular component
+ PATO(phenotype and trait ontology)
Phenotype (clinical sign) = entity + attribute
P1 = eye + hypoteloricP2 = midface + hypoplastic P3 = kidney + hypertrophied
Syndrome = P1 + P2 + P3 (disease)
= holoprosencephaly
Human holo-prosencephaly
Zebrafishshh
Zebrafishoep
OMIM gene
ZFIN gene
FlyBase gene
FlyBase mut pub
ZFIN mut pub
mouse rat SNOMED
OMIM disease
LAMB1 lamb1 LanB1 5 15 39 -
FECH fech Ferro-
chelatase
2 5 2 29 Protoporphyria, Erythropoietic
GLI2 gli2a ci 388 41 22 -
SLC4A1 slc4a1 CG8177 7 7 19 Renal Tubular Acidosis, RTADR
MYO7A myo7a ck 84 5 9 3 16 Deafness; DFNB2; DFNA11
ALAS2 alas2 Alas 1 7 14 Anemia, Sideroblastic, X-Linked
KCNH2 kcnh2 sei 27 3 12 -
MYH6 myh6 Mhc 166 3 1 12 Cardiomyopathy, Familial Hypertrophic; CMH
TP53 tp53 p53 64 3 3 19 11 Breast Cancer
ATP2A1 atp2a1 Ca-P60A 32 6 1 11 Brody Myopathy
EYA1 eya1 eya 251 5 4 6 Branchiootorenal Dysplasia
SOX10 sox10 Sox100B 1 17 4 4 Waardenburg-Shah Syndrome
CBiO Bioinformatics Goals
1. Apply the ontology Software toolkit for “annotation”
applications
2. Manage the data Database and interface to store & view
annotations
3. Investigate and compare Linking human diseases to genetic models
4. Maintain Ongoing reconciliation of ontologies
with annotations
Ontology requirements
Ontologies attribute ontology GO anatomical ontologies
cell species-specific
Ontology integration technical semantic
Review of proposed (2003) EAV model
A phenotype is described using an Entity-Attribute-Value triple similiar “slot-value” frame-based systems
Entities are drawn from various OBO ontologies
Attributes and Values in one ontology - PATO
Separation of concerns
Not phenotypes: Genotype Environment Assay, measurement systems Images
Association = Genotype Phenotype Environment AssayPhenotype = Entity Attribute ValueEntity = OBOClassIDAttribute | Value = PATOClassID
schema:
2003 Pilot study
Trial of EAV model on small collection of genotypes FlyBase ZFIN
Genes were non-orthologous New curations - in progress
orthologous genes with clinical relevance Use the same data model and exchange format?
Example data records
Genotype Entity
Attribute Value
npo gut structure dysplastic
gut relative size small
r210 retina
pattern irregular
brain structure fused
ZFIN schema extension: stages
Genotype Stage Entity Attribute Value
npo Hatching:Pec-fin gut structure dysplastic
Hatching:Pec-fin gut relative size
small
r210 Hatching:Long-pec retina pattern irregular
Larval:Protruding-mouth
brain structure fused
Stages
Association = Genotype Phenotype Environment AssayPhenotype = Stage* Entity Attribute ValueEntity = OBOClassIDStage = OBOAnatomicalStageClassIDAttribute | Value = PATOClassID
* means zero or more
Interpretation of the schema
How should EAV data records be interpreted by a computer? What are the instances?
Is the EAV schema just to improve database searching? Can it be used for meaningful cross-species comparisons?
What is the Entity slot used for?
Genotype Entity Attribute Value
npo gut structure dysplastic
gut relative size small
r210 retina pattern irregular
brain structure fused
tm84 d/v pattern formation
qualitative abnormal
blood islands
relative number number increased
Bsb[2] elongation of arista literal
process arrested
C-alpha[1D] adult behaviour
behavioral activity uncoordinated
2003 trial data: FB & ZFIN
What is the Entity slot used for?
In practical terms: An ID from one of the following ontologies
GO CC GO BP GO MF Species-specific anatomical ontology OBO Cell
Or a cross-product e.g. acidification GO:0045851
which has_locationOBOREL midgut FBbt00005383
[example from FBal0062296: Acidification in the midgut of homozygous larvae is often less than in wild-type larvae]
But what does it mean in the context of an annotation?
Universals and particulars
An ontology consist of universals (classes) fruitfly wing flight
Experimental data generally concerns particulars (instances) that instantiate universals
this particular wing of this particular fruitfly this particular fruitfly participating in this particular flight from here to there
in annotation we often use a class ID as a proxy for an (unnamed) instance (or collection of instances)
It is important to always keep this distinction in mind
What is an Entity?UBO - Upper Bio OntologyBFO - Basic Formal Ontology
What is the Attribute slot used for?
Genotype Entity Attribute Value
npo gut structure dysplastic
gut relative size small
r210 retina pattern irregular
brain structure fused
tm84 d/v pattern formation
qualitative abnormal
blood islands
relative number number increased
Bsb[2] elongation of arista literal
process arrested
C-alpha[1D] adult behaviour
behavioral activity uncoordinated
2003 trial data: FB & ZFIN
Attributes: a proposal
Treat as Qualities a dependent entity
a quality must have independent entity(s) as bearer
the quality inheres_in the bearer
Examples The particular shape of this ball The particular structure of this wing The particular length of this tail The particular rate of synaptic transmission between these two neurons
Attribute universals vs Attribute particulars
In an EAV annotation, a PATO class ID typically serves as a proxy for an unnamed attribute instance
Universals (classes) must always be defined in terms of their instances
A Formal Theory of Substances, Qualities, and Universals Fabian NEUHAUS Pierre GRENON Barry SMITH
How is the value slot used?
Genotype Entity Attribute Value
npo gut structure dysplastic
gut relative size small
r210 retina pattern irregular
brain structure fused
tm84 d/v pattern formation
qualitative abnormal
blood islands
relative number number increased
Bsb[2] elongation of arista literal
process arrested
C-alpha[1D] adult behaviour
behavioral activity uncoordinated
Redundancy in EAV triples
entity attribute value
fin shape irregular shape
eye color hue blue
mesenchyme relative thickness
thin
brain structure fused
retinal cells
relative orientation
disoriented
Proposal: EA modelentity attribute value
fin shape irregular shape
eye color hue blue
mesenchyme relative thickness
thin
brain structure fused
retinal cells
relative orientation
disoriented
Proposal: EA modelentity attribute attribute
fin shape irregular shape
eye color hue blue
mesenchyme relative thickness
thin
brain structure fused
retinal cells
relative orientation
disoriented
Association = Genotype Phenotype Environment AssayPhenotype = Stage* Entity Attribute ValueEntity = OBOClassIDAttribute = PATOVersion2ClassID
Proposed schema modification
Monadic and relational attributes
Monadic: the quality/attribute inheres in a single entity
Relational: the quality/attribute inheres in two or more entities
sensitivity of an organism to a kind of drug sensitivity of an eye to a wavelength of light
can turn relational attributes into cross-product monadic attributes
e.g. sensitivityToRedLight better to use relational attributes
avoids redundancy with existing ontologies
Association = Genotype Phenotype Environment AssayPhenotype = Stage* Entity Attribute Entity*Entity = OBOClassIDAttribute = PATOVersion2ClassID
Incorporating relational attributes
Example data record:Phenotype = “organism” sensitiveTo “puromycin”
Measurable attributes
Some attributes are inexact and implicitly relative to a wild-type or normal attribute relatively short, relatively long, relatively reduced
easier than explicitly representing: this tail length shorter-than ‘canonical mouse’ wild-type tail length
Some attributes are determinable use a measure function
unit, value, {time} this tail has length L
measure(L, cm) = 2
Keep measurements separate from (but linked to) attribute ontology
Incorporating measurements
Association = Genotype Phenotype Environment AssayPhenotype = Stage* Entity Attribute Entity* Measurement*Measurement = Unit Value (Time)Entity = OBOClassIDAttribute = PATOVersion2ClassID
Example data record:Phenotype = “gut” “acidic” Measurement = “pH” 5
Composite phenotype classes
Mammalian phenotype has composite phenotype classes e.g. “reduced B cell number”
Compose at annotation time or ontology curation time? False dichotomy
Core 2 will help map between composite class based annotation and EA annotation
Interpreting annotations
Annotations are data records typically use class IDs implicitly refer to instances
How do we map an annotation to instances?
Important for using annotations computationally
Interpreting annotations (1)
What does an EA (or EAV) annotation mean? Annotation:
Genotype=“FBal00123” E=“brain” A=“fused” presumed implied meaning:
this organism has_part x, where
x instance_of “brain”x has_quality “fused”
or in natural language: “this organism has a fused brain”
Various built-in assumptions
Interpreting annotations (II)
What does this mean: annotation:
Genotype=“FBal00123” E=“wing” A=“absent” using same mapping as annotation I:
fly98 has_part x, where x instance_of “wing” x has_quality “absent”
or in natural language: this fly has a wing which is not there !
What we really intend: NOT(this organism has_part x, where x instance_of “wing”)
Interpreting annotations (II)
What does this mean: annotation:
Genotype=“FBal00123” E=“wing” A=“absent” using same mapping as annotation I:
this organism has_part x, where x instance_of “wing” x has_quality “absent”
or in natural language: this fly has a wing which is not there !
What we really intend: this organism has_quality “wingless” “wingless” = the property of having count(has_part “wing”)=0
Are our computational representations intended to capture linguistic statements or reality?
Does this matter?
Logical reasoners will compute incorrect results unless explicitly provided with specific rules for certain attributes such as “absent”
What are the consequences? Basic search will be fine
e.g. “find all wing phenotypes” But computers will not be able to reason correctly
Interpreting annotations (III)
What does this mean: annotation:
E=“digit” A=“supernumery” using same interpretation as annotation I:
this organism has_part x, where x instance_of “digit” x has_quality “supernumery”
or in natural language: this organism has a particular finger which is supernumery
What we really intend: this person has_quality “supernumery finger” “supernumery finger” = the property of having count(has_part “digit”) > wild-type”
!!!
Interpreting annotations (IV)
What does this mean: annotation:
Gt=“mp001” E=“brown fat cell” A=“increased quantity”
using same mapping as annotation I: this organism has_part x, where
x instance_of “brown fat cell” x has_quality “increased quantity”
or in natural language: this organism has a particular brown fat cell which is increased in quantity
What we really intend: this organism has_part population_of(“brown fat cell”) which has_quality increased size
Other use cases
spermatocyte devoid of asters Homeotic transformations increased distance between wing veins
Some vs all
Alternate perspectives
process vs state regulatory processes:
acidification of midgut has_quality reduced rate
midgut has_quality low acidity
development vs behavior wing development has_quality abnormal flight has_quality intermittent
granularity (scale) chemical vs molecular vs cell vs tissue vs anatomical part
Summary
Define attributes in terms of instances
Evaluate proposed new schema measurement proposal relational attribute proposal
Complexity trade-off create library of use cases Core2 will create tools to present user-friendly layer
Alternate perspective annotations are useful
Ontologies must adapt over time
Getting it right It is impossible to get it right the 1st (or 2nd, or 3rd, …) time.
What we know about biology is continually growing
This “standard” requires versioning.
Improve
Collaborate and Learn
Our Outlook
“The needs of the many outweigh the needs of the few, or of the one.”