introduction to anatomy ontology building
+
Introduction to anatomy ontology building
David Osumi-Sutherland
FlyBase (www.flybase.org)
Virtual Fly Brain (www.virtualflybrain.org)
+Take home messages
• An ontology is a classification.
• There are lots of useful ways to classify stuff.
• Maintaining multiple classification schemes by hand is impractical, so you should automate it.
• Everybody makes mistakes, so you should get the computer to find errors for you.
• Re-use other people’s work where possible: import class hierarchies; use common patterns.
Cautionary note – formal languages have limitations. Don’t expect to be able to express everything!
+What is an ontology ?
A set of defined, inter-related terms to use in annotation/metadata/knowledge bases.
A classification
A query-able store of (scientific) knowledge that uses logical inference.
(Diagram: “depends on” arrows link the three descriptions above.)
+What (use) is an ontology?
A set of defined, inter-related terms to use in annotation.
• Annotation of papers; specimens; gene expression; phenotype…
• Use of common annotation terms across multiple databases allows easy integration across them.
• Relations between terms allow annotations to be grouped in scientifically meaningful ways.
This requires an ontology to be an accurate and scientifically meaningful classification and store of scientific knowledge.
+What is an ontology?
A classification.
There are lots of scientifically useful ways to classify a bit of anatomy:
• its parts and their arrangement
• its relation to other structures (what is it part of; connected to; adjacent to; overlapping?)
• its shape
• its function
• its developmental origins
• its species or clade
• its evolutionary history?
+What is an ontology ?
The scientific knowledge an ontology contains can make the reasons for classification explicit. e.g.
Any sense organ that functions in the detection of smell is an olfactory sense organ
All large basiconic sensilla of the antenna function in detection of smell
Therefore all large basiconic sensilla of the antenna are olfactory sense organs
+Virtual Fly Brain Demo
+Why ontology development is like software or database development
Ideal case:
• maintainable: basic maintenance (e.g. correcting simple errors) is easy
• scalable: grow your project as large as you need without breaking
• extensible: easy to add new functionality without breaking existing functionality
• integrate-able: can integrate easily with the work of others, so you don’t have to solve all problems yourself
+Why ontology development is like software or database development
Ideal case: future editors can build on your work. The ontology should be maintainable, scalable and extensible by multiple editors, and integrate-able with the work of others.
+How not to build ontologies - The trap
A small, simple ontology or program with one developer can get away with practices that a large one cannot.
Given:
• shallow, single inheritance classification (each class has 0-1 superclasses)
• very few relationship types
• < 1000 terms
it is feasible to:
• have little annotation/documentation
• have no automated error checking
• have no automated classification
• keep redundancy to a minimum by hand
+How not to build ontologies - The trap
Small, simple ontologies and programs have a habit of growing large and complicated.
• Users demand lots more terms for annotation.
• Users demand multiple axes of classification. There is no scientific reason to favor one over another.
• Users demand, and editors favor, multiple relationship types to record information they believe scientifically important.
• Editors/coders move on and someone else has to continue their work. Is the documentation mainly in the old developer’s head?
+How not to build ontologies- The trap
Worst case scenario – the tangled pit of misery: Difficult, perhaps impossible to maintain or extend
Tangled, convoluted, redundant structure with little or no documentation or annotation.
Editing tends to inadvertently break previous functionality.
Little or no error checking means you don't even notice when you break stuff. Users find out later.
Even you can't easily edit what you built 6 months ago without getting confused and making a mess.
+Avoiding tangled pits of misery
There are no perfect answers, but these might help:
• good annotation and documentation;
• good, consistent style;
• avoidance of redundancy;
• let the computer keep track of things for you;
• modularity;
• automate a consistent set of tests of existing functionality (JUnit / consistency);
• constant testing during development;
• design patterns.
+Good Practice 1: Good annotation and documentation
Clear textual definitions with references:
• ensure accurate manual annotation
• make assertions of scientific fact trace-able
• serve as documentation for future ontology developers
Also useful to record, for users and future developers:
• Experimental evidence for assertions of scientific fact
• Notes on confusing or conflicting usage of terms
• Reasons for design choices/compromises
+Options for formalization
OWL
• W3C standard
• Decidable
• Big open source community of tool developers
• Multiple fast reasoners, getting better all the time
• Easy to read syntax: OWL Manchester syntax (OWL MS)
OBO
• Best thought of as a subset of OWL, with which it is increasingly integrated
• Limited community of tool developers
• Easy(ish) to read syntax
Common logic
• Very powerful. But easy to come up with solutions that can’t be usefully reasoned with.
+Relationships are the formalized part of a definition.
The criteria for class membership are recorded using textual definitions, at least some elements of which are formalized as relationships.
name: insect wing
def: “A membranous dorsal appendage of the meso- or metathorax that functions in flight.” [Snodgrass, 1935]
is_a: appendage
relationship: part_of thoracic segment
relationship: has_function_in flight
+Classification is transitive
If A SubClassOf* B and B SubClassOf C then A SubClassOf C.
All members of class A are members of class C. So, the definition of class C must apply to class A.
* OWL (MS) SubClassOf ≅ OBO is_a
+Classification is transitive
‘material anatomical entity’
 <- is_a ‘sense organ’
  <- is_a sensillum
   <- is_a ‘olfactory sensillum’
    <- is_a ‘antennal basiconic sensillum’
‘material anatomical entity’: “… has mass.”
‘sense organ’: “… functions in the detection of a stimulus involved in sensory perception.”
sensillum: “A sense organ consisting of a small cluster of cells of various types.”
‘olfactory sensillum’: “… functions in the detection of smell”
* OWL (MS) SubClassOf ≅ OBO is_a
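The inheritance of definitions down this chain can be sketched in a few lines of Python. This is only an illustration: the dictionaries are hand-copied from the slide, the function names are invented here, and each class is assumed to have a single is_a parent, which real ontologies do not guarantee.

```python
# Hand-copied fragment of the is_a chain from the slide (child -> parent).
# Assumption for this sketch: every class has at most one is_a parent.
IS_A = {
    "antennal basiconic sensillum": "olfactory sensillum",
    "olfactory sensillum": "sensillum",
    "sensillum": "sense organ",
    "sense organ": "material anatomical entity",
}

# Textual definitions as given on the slide (elided parts kept as "...").
DEFINITIONS = {
    "material anatomical entity": "... has mass.",
    "sense organ": "... functions in the detection of a stimulus "
                   "involved in sensory perception.",
    "sensillum": "A sense organ consisting of a small cluster of cells "
                 "of various types.",
    "olfactory sensillum": "... functions in the detection of smell",
}


def ancestors(term):
    """Return every superclass of term, nearest first, by following is_a."""
    result = []
    while term in IS_A:
        term = IS_A[term]
        result.append(term)
    return result


def inherited_definitions(term):
    """Definitions that must also hold for term, because is_a is transitive."""
    return [(a, DEFINITIONS[a]) for a in ancestors(term) if a in DEFINITIONS]


for name, text in inherited_definitions("antennal basiconic sensillum"):
    print(f"{name}: {text}")
```

Every definition printed must be true of an antennal basiconic sensillum; if one is not, an is_a link somewhere in the chain is wrong.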
+class – class relationships are quantified
Class:Class relationships are many to many.
Does the relation apply to all or just some of the class? We specify this with quantifiers:
∀: for all, all, only, every
∃: there exists, some
Cautionary note – Modeling knowledge as class hierarchies defined with quantified logic is extremely useful but limited. Don’t expect to be able to use it for everything you know! Expressivity of OWL is more limited still.
+relationships specify necessary conditions for class membership
Being part of an insect thorax is a necessary condition of being in the class ‘insect leg’.
English:
All insect legs are part of some (type of) insect thorax
OBO (quantifiers hidden):
name: insect leg
relationship: part_of thorax
OWL (MS):
‘insect leg’ SubClassOf part_of some thorax
PL:
∀x: leg(x) → ∃y: thorax(y) ∧ part_of(x, y) *
* ignoring time argument from OBO RO 2005
+Classification is transitive
If A SubClassOf* B and B SubClassOf C then A SubClassOf C.
All members of class A are members of class C. So, the definition of class C must apply to class A.
* OWL (MS) SubClassOf ≅ OBO is_a
(all) leg part_of some thorax
‘front leg’ SubClassOf leg
therefore (all) ‘front leg’ part_of some thorax
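The front leg example can be reproduced with a small sketch that propagates relationships down SubClassOf links. The data structures and function names are invented for illustration, and the traversal allows several parents per class; a real reasoner does far more than this.

```python
# Asserted SubClassOf links (class -> set of direct superclasses).
SUBCLASS_OF = {
    "front leg": {"leg"},
}

# Asserted "all-some" relationships (class -> set of (relation, target)).
RELATIONSHIPS = {
    "leg": {("part_of", "thorax")},
}


def superclasses(term):
    """All direct and indirect superclasses of term."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def all_relationships(term):
    """Asserted relationships of term plus those inherited from superclasses."""
    rels = set(RELATIONSHIPS.get(term, ()))
    for parent in superclasses(term):
        rels |= RELATIONSHIPS.get(parent, set())
    return rels


print(all_relationships("front leg"))
```

Nothing is asserted directly about ‘front leg’ except its superclass; the part_of relationship arrives by inheritance.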
+Directionality and quantifiers
True: all ‘insect wing’ part_of some ‘insect thorax’
False: all ‘insect thorax’ has_part some ‘insect wing’
True: all ‘claw’ connected_to some ‘tarsal segment’
False: all ‘tarsal segment’ connected_to some claw
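Why the direction matters can be seen by checking an "all-some" statement against a handful of individuals. The individuals below are invented for illustration (a winged fly and a wingless flea), and the check is a closed-world test over this toy data, not how an OWL reasoner evaluates axioms.

```python
# Invented individuals and their classes.
CLASS_OF = {
    "wing_1": "insect wing",
    "wing_2": "insect wing",
    "fly_thorax": "insect thorax",
    "flea_thorax": "insect thorax",  # fleas are wingless
}

# part_of links between individuals; has_part is simply the inverse.
PART_OF = {("wing_1", "fly_thorax"), ("wing_2", "fly_thorax")}
HAS_PART = {(whole, part) for part, whole in PART_OF}


def all_some(subject_class, relation, object_class):
    """True if every individual of subject_class is linked by relation
    to at least one individual of object_class."""
    subjects = [i for i, c in CLASS_OF.items() if c == subject_class]
    return all(
        any(s == a and CLASS_OF[b] == object_class for a, b in relation)
        for s in subjects
    )


print(all_some("insect wing", PART_OF, "insect thorax"))
print(all_some("insect thorax", HAS_PART, "insect wing"))
```

The first statement holds for every wing; the second fails as soon as one thorax without wings exists.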
+Manually maintaining an ontology with multiple classification schemes is impractical
It is difficult to keep track of multiple classification chains to:
• ensure completeness;
• avoid redundancy;
• avoid introducing error due to inheritance of classification criteria from a distant ancestor.
+Automating multiple classification.
The scientific knowledge an ontology contains can make the reasons for classification explicit. e.g.
Any sense organ that functions in the detection of smell is an olfactory sense organ
All large basiconic sensilla of the antenna function in detection of smell
Therefore all large basiconic sensilla of the antenna are olfactory sense organs
+Automating multiple classification.
We can specify that some set of necessary conditions for class membership is sufficient to determine class membership.
English:
Any sense organ that functions in the detection of smell is an olfactory sense organ
OWL (MS):
‘olfactory sense organ’ EquivalentTo: ‘sense organ’ that has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’
OBO:
name: olfactory sense organ
intersection_of: sense organ
intersection_of: has_function_in ‘detection of chemical stimulus involved in sensory perception of smell’
+Automating multiple classification.
‘olfactory sense organ’ EquivalentTo: ‘sense organ’ that has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’
‘large basiconic sensillum of antenna’
 SubClassOf: ‘sense organ’
 SubClassOf: has_function_in some ‘detection of chemical stimulus involved in sensory perception of smell’
Reasoner concludes: ‘large basiconic sensillum of antenna’ SubClassOf ‘olfactory sense organ’
Keene & Waddell, 2007
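A toy version of this inference is sketched below. It is not how a description-logic reasoner works internally: it only compares asserted conditions literally, ignores indirect superclasses and subsumption between fillers, and all names other than the class and relation labels from the slide are assumptions (the ‘mechanosensory bristle’ entry and its function term are included purely as a negative example).

```python
SMELL = "detection of chemical stimulus involved in sensory perception of smell"

# Necessary conditions asserted for ordinary classes.
ASSERTED = {
    "large basiconic sensillum of antenna": {
        "superclasses": {"sense organ"},
        "restrictions": {("has_function_in", SMELL)},
    },
    # Illustrative negative example; the function term is a placeholder.
    "mechanosensory bristle": {
        "superclasses": {"sense organ"},
        "restrictions": {("has_function_in", "detection of mechanical stimulus")},
    },
}

# Defined classes: necessary AND sufficient conditions (EquivalentTo).
DEFINED = {
    "olfactory sense organ": {
        "genus": "sense organ",
        "restrictions": {("has_function_in", SMELL)},
    },
}


def infer_superclasses(cls):
    """Defined classes whose sufficient conditions cls is asserted to meet."""
    asserted = ASSERTED[cls]
    return {
        name
        for name, definition in DEFINED.items()
        if definition["genus"] in asserted["superclasses"]
        and definition["restrictions"] <= asserted["restrictions"]
    }


for cls in ASSERTED:
    print(cls, "->", infer_superclasses(cls))
```

Adding a new defined class to DEFINED reclassifies every existing class automatically, which is the point of the slide: the classification is computed, not maintained by hand.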
+Use other people’s work to build your classification Gene Ontology classification of sensory processes:
+Automating multiple classification.
+Some extra OWL expressivity
In OWL we can also specify number (cardinality):
(all) insect SubClassOf has_component exactly 6 leg
+Error checking is essential – everybody makes mistakes
Some classes don’t have instances in common. Nothing can be an oak tree and a fruit fly; an anatomical structure and a biological process. We say that such classes are disjoint
Declaring classes to be disjoint allows reasoners to find contradictions. This is especially powerful when combined with domain and range constraints.
This is your main means of error checking. Use it extensively. It also speeds up some reasoners.
+Error checking - domain and range constraints
‘cortisol secretion’ SubClassOf ‘endocrine hormone secretion’ SubClassOf process
‘adrenal gland’ SubClassOf ‘endocrine gland’ SubClassOf structure
structure DisjointWith process (nothing can be both a structure, e.g. adrenal gland, and a process, e.g. cortisol secretion)
has_function_in
 domain: structure*
 range: process*
If x has_function_in y then x must be a structure and y must be a process.
Now if I mistakenly add: ‘cortisol secretion’ has_function_in some ‘adrenal gland’
Inconsistency: ‘cortisol secretion’ SubClassOf structure and process
* more strictly, domain = continuant; range = occurrent
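The way disjointness plus domain and range constraints expose this mistake can be imitated with a short check. This is a simplified sketch with invented function names and single-parent hierarchies; an OWL reasoner derives the same contradiction from the axioms themselves.

```python
# Asserted hierarchy from the slide (child -> parent).
SUBCLASS_OF = {
    "cortisol secretion": "endocrine hormone secretion",
    "endocrine hormone secretion": "process",
    "adrenal gland": "endocrine gland",
    "endocrine gland": "structure",
}

# Pairs of classes that can share no members.
DISJOINT = {frozenset({"structure", "process"})}

# relation -> (domain, range)
DOMAIN_RANGE = {"has_function_in": ("structure", "process")}


def superclasses(term):
    """The term itself plus everything above it in the hierarchy."""
    result = {term}
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        result.add(term)
    return result


def check(subject, relation, obj):
    """Return a list of contradictions caused by asserting the relationship."""
    domain, rng = DOMAIN_RANGE[relation]
    errors = []
    # Domain forces a type onto the subject, range onto the object.
    implied = ((subject, superclasses(subject) | {domain}),
               (obj, superclasses(obj) | {rng}))
    for label, types in implied:
        for pair in DISJOINT:
            if pair <= types:
                errors.append(f"{label} would be both {' and '.join(sorted(pair))}")
    return errors


print(check("adrenal gland", "has_function_in", "cortisol secretion"))
print(check("cortisol secretion", "has_function_in", "adrenal gland"))
```

The correct assertion produces no errors; the reversed one is flagged for both the subject and the object.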
+Reasoner assisted error checking by eye
Keep an eye on classification inferred by the reasoner.
Protégé shows inferred classification and inherited relationships – keep an eye on these
+Reasoner assisted error checking by eye
Run some test queries – do they give the answers you expect?
+Mereology
part_of is transitive:
If A part_of B part_of C part_of D
Then A part_of D
overlap is not transitive:
If A overlaps B overlaps C then A may or may not overlap C
(Diagrams: nested regions A, B, C, D; a chain of overlapping regions A, B, C.)
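Both properties are easy to confirm if regions are modelled as sets of points, with part_of read as "is a subset of" and overlaps as "shares at least one point". This set model and the point labels are assumptions made for the sketch; note that under it every region is also part_of itself.

```python
def part_of(x, y):
    """x is part of y if every point of x is also a point of y."""
    return x <= y


def overlaps(x, y):
    """x overlaps y if they share at least one point."""
    return bool(x & y)


# Nested regions: part_of is transitive.
inner, middle, outer, whole = {1}, {1, 2}, {1, 2, 3}, {1, 2, 3, 4}
print(part_of(inner, middle), part_of(middle, outer), part_of(outer, whole))
print(part_of(inner, whole))

# A chain of overlapping regions: overlap is not transitive.
A, B, C = {1, 2}, {2, 3}, {3, 4}
print(overlaps(A, B), overlaps(B, C), overlaps(A, C))
```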
+Transitivity of part_of
Given:
(All) ‘insect coxa’ part_of some ‘insect leg’
(All) ‘insect leg’ part_of some ‘insect thoracic segment’
(All) ‘insect thoracic segment’ part_of some ‘insect thorax’
Then:
(All) ‘insect coxa’ part_of some ‘insect thorax’
+Automating partonomy
As for classification, maintaining multiple overlapping part hierarchies by hand is hard.
There is some scope for auto-populating partonomies, e.g.
English:
Any anatomical structure that functions in endocrine hormone secretion is part of some endocrine system
OWL:
(‘anatomical structure’ that has_function_in some ‘endocrine hormone secretion’) SubClassOf (part_of some ‘endocrine system’)
OBO:
name: endocrine system component
intersection_of: anatomical structure
intersection_of: has_function_in ‘endocrine hormone secretion’
relationship: part_of endocrine system
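A rule of this kind can be mimicked as follows. The sketch assumes a tiny function hierarchy and two example structures (the adrenal gland and cortisol secretion classes appear elsewhere in this talk; the data layout and function names are invented), and it treats every key of HAS_FUNCTION_IN as an anatomical structure.

```python
# Function (process) hierarchy, child -> parent.
FUNCTION_IS_A = {
    "cortisol secretion": "endocrine hormone secretion",
}

# Asserted functions of anatomical structures.
HAS_FUNCTION_IN = {
    "adrenal gland": {"cortisol secretion"},
    "insect wing": {"flight"},
}


def is_kind_of(term, ancestor):
    """True if term equals ancestor or lies below it in FUNCTION_IS_A."""
    while term is not None:
        if term == ancestor:
            return True
        term = FUNCTION_IS_A.get(term)
    return False


def inferred_part_of(structure):
    """Apply the rule: functions in endocrine hormone secretion
    implies part_of some endocrine system."""
    functions = HAS_FUNCTION_IN.get(structure, ())
    if any(is_kind_of(f, "endocrine hormone secretion") for f in functions):
        return {"endocrine system"}
    return set()


for structure in HAS_FUNCTION_IN:
    print(structure, "part_of", inferred_part_of(structure))
```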
+Declaring spatial disjointness provides error checking for partonomy
In OWL: part_of some X DisjointWith part_of some Y
+Reasoning with overlap
A overlaps B if and only if there exists some X such that X part_of A and X part_of B
Rules:
If X part_of A then X overlaps A
If A has_part X then A overlaps X
overlaps
 * part_of
 * has_part
In OWL (MS) * = SubPropertyOf; in OBO * = is_a
(Diagram: regions A and B sharing a common part X.)
+Reasoning with overlap
More rules:
If A has_part X and X part_of B then A overlaps B
If C has_part A and A overlaps B then C overlaps B
If B overlaps A and A part_of C then B overlaps C
In OWL (MS):
has_part o part_of -> overlaps
In OBO:
name: overlaps
holds_over_chain: has_part part_of
(Diagrams: regions A and B sharing a common part X; the same with A inside a larger region C.)
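The chain rule can be applied mechanically to a set of part_of facts. The sketch below works on invented individuals X, A, B and C matching the diagrams, closes part_of under transitivity first, and then derives overlaps from the sub-property and chain rules; trivial self-overlap is left out.

```python
# Asserted facts: X is part of both A and B, and A is part of C.
PART_OF = {("X", "A"), ("X", "B"), ("A", "C")}


def transitive_closure(pairs):
    """Repeatedly join (a, b) and (b, d) into (a, d) until nothing changes."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def derive_overlaps(part_of_facts):
    part_of = transitive_closure(part_of_facts)
    has_part = {(whole, part) for part, whole in part_of}
    # part_of and has_part are both sub-properties of overlaps.
    overlaps = set(part_of) | has_part
    # Chain rule: has_part o part_of -> overlaps.
    for a, x in has_part:
        for x2, b in part_of:
            if x == x2 and a != b:
                overlaps.add((a, b))
    return overlaps


result = derive_overlaps(PART_OF)
print(sorted(result))
```

Because part_of is closed first, C overlaps B is found even though only A was asserted to share a part with B.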
+
Image - Greg Jefferis
Keene & Waddell, 2007
+Shortcut relations
In OWL, we can write compound class expressions:
‘antennal lobe projection neuron’ has_part some (soma that part_of some ‘antennal lobe cortex’)
But these can quickly get long and verbose:
‘DL1 adPN’ has_part some (postsynaptic membrane (GO) that part_of some (synapse (GO) that part_of some ‘DL1 glomerulus’))
+Shortcut relations
Shortcut relations stand in for compound class expressions.
‘DL1 adPN’ has_part some (postsynaptic membrane (GO) that part_of some (synapse (GO) that part_of some ‘DL1 glomerulus’))
> ‘DL1 adPN’ has_postsynaptic_terminal_in some ‘DL1 glomerulus’
Can be expanded if detail needed.
Provides rigorous documentation of meaning.
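One way to picture expansion is as a template keyed by the shortcut relation. The sketch below is plain string substitution with an invented template table; it is not the mechanism any particular OBO or OWL tool uses, and the quoting of class names is simplified.

```python
# Hypothetical expansion templates: shortcut relation -> compound expression.
EXPANSIONS = {
    "has_postsynaptic_terminal_in": (
        "has_part some ('postsynaptic membrane' that part_of some "
        "(synapse that part_of some {filler}))"
    ),
}


def expand(subject, relation, filler):
    """Rewrite 'subject relation some filler' into its long form."""
    template = EXPANSIONS[relation]
    return f"{subject} {template.format(filler=filler)}"


print(expand("'DL1 adPN'", "has_postsynaptic_terminal_in", "'DL1 glomerulus'"))
```

Editors and users work with the short form; the template records exactly what it means and can regenerate the long form when a reasoner needs it.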
+Where to start?
Make a flat list of the terms you need and list the types of classification you want to use to link them together.
Has someone already formalized this type of classification? If so, use their pattern. If not, draft some formalizations yourself:
• Are any simplifications justifiable, or likely to be too misleading?
• DON’T FORMALIZE FOR THE SAKE OF IT! Some classifications are hard to formalize well, or may be best left to human judgment.
Import upper classifications and relations.
Import classifications to root for all foreign terms used.
Work with ontologists to formally define relations where possible. But don’t let this become a road block!
+Technical issues
Imports:
• Importing whole ontologies is easy in both OBO and OWL.
• But importing large ontologies is impractical in both.
• Generating simple slices of OBO ontologies is easy (have perl scripts, happy to share).
• Generating slices of OWL ontologies: some tools exist (Ontofox), but they still need work.
+Developing nested ontologies
CARO
VAO
Present TAO Modularized ontology
+Resources
CARO: upper ontology; new version being prepared, out soon.
• Some standard patterns using qualities
FUNCARO provides standard patterns for representing function using CARO + GO.
ro.owl: new home for OBO relations, particularly shortcut relations.
• Imports fundamental relations from BFO (Basic Formal Ontology)
+Multiple classification
There are lots of scientifically useful ways to classify a bit of anatomy:
• its parts and their arrangement
• its relation to other structures (what is it part of; connected to; adjacent to; overlapping?)
• its shape
• its function
• its developmental origins
• its species or clade
• its evolutionary history?
+
type of classification | relation | object of relation
what parts does it have? | has_part; has_component (for counts) | anatomical entity
what is it part of? | part_of | anatomical entity
quality (e.g. shape) | has_quality | PATO term
function | has_function_in; capable_of (?) | GO; perhaps behavior ontologies?
developmental origin | develops_from | anatomical entity
developmental fate | develops_into | anatomical entity
connectivity (e.g. muscle/tendon to bone) | connected_to | anatomical entity
evolutionary origin | derived_by_descent_from ?; homologous_to ? | anatomical entity
species/clade/taxon | in_taxon ? | species/clade/taxon
+Avoiding tangled pits of misery
There are no perfect answers, but these might help:
You do this by hand:
• good annotation and documentation;
• good, consistent style.
Automated classification and consistency checking gives:
• avoidance of redundancy;
• the computer keeps track of things for you;
• automation of a consistent set of tests of existing functionality (JUnit / consistency);
• constant testing during development.
Importing useful slices of other ontologies gives you:
• modularity.
Upper ontologies give you:
• design patterns.
+Take home messages
• An ontology is a classification.
• There are lots of useful ways to classify stuff.
• Maintaining multiple classification schemes by hand is impractical, so you should automate it.
• Everybody makes mistakes. Let the computer find errors for you. Use the reasoner to test as you build.
• Re-use other people’s work where possible: import class hierarchies; use common patterns.
Cautionary note – formal languages have limitations. Don’t expect to be able to express everything!
+Acknowledgments
Virtual Fly Brain: Michael Ashburner, Cahir O’Kane, Douglas Armstrong, Simon Reeve, Nestor Milyaev
FlyBase
HAO: Andy Deans / Matt Yoder / Jim Balhoff
Chris Mungall, LBL Berkeley
Melissa Haendel, eagle-I
Alan Ruttenberg, SUNY Buffalo
Barry Smith, SUNY Buffalo
Robert Stevens (Co-ode; OWL-API), Manchester University
BBSRC (grant award BB/G02247X/1)
+
+Drosophila anatomy ontology as an example
Circa 2006: tangled pit of misery.
• 6500 terms; 6% with definitions, many of them not suitable (give example).
• Sufficient inconsistency that not reliable for grouping terms / reasoning - give examples
• Sufficiently incomplete that most queries/groupings missed very many terms - use mechanosensory bristle (or something similar) as example.
• Editing a nightmare - unclear what original reasons were for relationships.
• For any term - not clear what relations already inferred or how to
+
class | 2008 (0% inferred) | 2011 (100% inferred)
sense organ | 835 | 759
chemosensory organ | 14 | 96
gustatory organ | 0 | 49
olfactory organ | 0 | 37