How to Build an Ontology
Barry Smith
http://ontology.buffalo.edu/smith
Mission of the NCBO
To create software and support services for science-based ontology development and use in the biomedical domain
Science-based = ontologies for support of scientific research (taken as encompassing evidence-based medicine)
Science-based = using the scientific method as part of the process of ontology development and testing
Scientific ontologies have special features
Every term in a scientific ontology must be such that the developers of the ontology believe it to refer to some entity on the basis of the best current evidence
For scientific ontologies
• reusability is crucial
• compatibility with neighboring scientific ontologies is crucial
• it should not be too easy to add new terms to an ontology
We want to introduce these features in clinical medicine ...
An Ontological Square
Upper-level integrating ontologies
• Ontologies in support of science: BFO (Basic Formal Ontology), DOLCE
• Administrative ontologies (for e-commerce, etc.): FOAF top level: person, topic, document, primary topic ...
Domain ontologies
• Ontologies in support of science: SNOMED, SwissProt, FMA
• Administrative ontologies: Amazon.com ontology, Library of Congress Catalog
Problem of ensuring sensible cooperation in a massively interdisciplinary community
concept, type, instance, model, representation, data
RetailPrice hasA Denomination InstanceOf Dollar (p. 101)
SI-Unit instanceof System-of-Units (p. 40)
from Handbook of Ontology (Semantic Web approach)
from: Ontological Engineering (Semantic Web approach)
location =def. a spatial point identified by a name (p. 12)
arrivalPlace =def. a journey ends at a location (p. 13)
facet =def. ternary relation that holds between a frame, a slot, and the facet (p. 51)
Entity =def
anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software (Levels 1, 2 and 3)
First basic distinction
universal vs. instance
(science text vs. diary)
(man vs. Maximilian)
For databases
it is instances that are important
For scientific ontologies
it is generalizations that are important = universals, types, kinds, species
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
Catalog vs. inventory
Catalog vs. inventory
Catalog of Universals/Types
Ontology Universals Instances
Ontology = A representation of universals
Each node of an ontology consists of:
• preferred term (aka term)
• term identifier (TUI, aka CUI)
• synonyms
• definition, glosses, comments
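As a minimal sketch, such a node can be modelled as a small Python record. The class name `OntologyNode`, the `matches` helper and the identifier `EX:0001` are hypothetical illustrations. They are not part of any particular ontology tool or ID scheme.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One ontology node: preferred term, identifier, synonyms, definition."""

    preferred_term: str
    term_id: str  # hypothetical identifier format
    synonyms: list = field(default_factory=list)
    definition: str = ""  # natural-language definition, glosses, comments

    def matches(self, label: str) -> bool:
        """True if label is the preferred term or a synonym (case-insensitive)."""
        wanted = label.casefold()
        names = [self.preferred_term, *self.synonyms]
        return any(wanted == name.casefold() for name in names)


# Hypothetical node: the ID is invented and the definition is left empty.
lung = OntologyNode(preferred_term="lung", term_id="EX:0001", synonyms=["pulmo"])
```

Lookups by synonym resolve to the same node, so the preferred term and its synonyms all point to one representational unit.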
An ontology is a representation of universals
We learn about universals in reality by looking at the results of scientific experiments, as expressed in scientific theories, which describe not what is particular in reality but what is general
[Figure: a hierarchy of universals (substance, organism, animal, mammal, cat, siamese, frog) above their instances, with the lowest nodes marked as leaf classes]
Domain =def
a portion of reality that forms the subject-matter of a single science or technology or mode of study or administrative practice ...;
proteomics
HIV
epidemiology
Representation =def
an image, idea, map, picture, name or description ... of some entity or entities.
Ontologies are representational artifacts
comparable to science texts
The Periodic Table
Ontologies are here
or here
What do ontologies represent?
Ontologies do not represent concepts in people’s heads
They represent universals in reality
“leg” is not the name of a concept
Concepts do not stand in part_of, connectedness, causes, treats ... relations to each other
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
instances
universals
Inventory vs. Catalog: Two kinds of composite representational artifacts
Databases represent instances
Ontologies represent universals
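The inventory/catalog contrast can be sketched in Python using the parts list above. Two things here are assumptions: reading the six-digit numbers as catalog part numbers, and the serial numbers `S-001` to `S-003`, which are invented. The catalog holds one entry per kind of part. The inventory holds one entry per physical item.

```python
from dataclasses import dataclass

# Catalog: one entry per type of part (a universal), keyed by part number.
catalog = {
    "515287": "DC3300 Dust Collector Fan",
    "521683": "Gilmer Belt",
    "521682": "Motor Drive Belt",
}


@dataclass(frozen=True)
class InventoryItem:
    """A particular item on the shelf (an instance)."""

    serial: str       # hypothetical serial number identifying this very item
    part_number: str  # the catalog entry (type) this item instantiates


# Inventory: hypothetical particulars; two distinct belts of the same type.
inventory = [
    InventoryItem("S-001", "521683"),
    InventoryItem("S-002", "521683"),
    InventoryItem("S-003", "515287"),
]


def count_by_type(items):
    """How many recorded particulars instantiate each catalog type."""
    counts = {number: 0 for number in catalog}
    for item in items:
        counts[item.part_number] += 1
    return counts
```

A catalog entry can exist with no items in stock (here, the motor drive belt), which an inventory alone could not express.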
How do we know which general terms designate universals?
Roughly: terms used by scientists to designate entities about which we have a plurality of different kinds of testable propositions
(cell, electron ...)
Problem: fiat demarcations
male over 30 years of age with family history of diabetes
abnormal curvature of spine
participant in trial #2030
Problem: roles
fist
patient
FDA-approved drug
Administrative ontologies often need to go beyond universals
Fall on stairs or ladders in water transport injuring occupant of small boat, unpowered
Railway accident involving collision with rolling stock and injuring pedal cyclist
Nontraffic accident involving motor-driven snow vehicle injuring pedestrian
universals vs. classes
universals
{a,b,c,...} classes
Class =def. a maximal collection of particulars determined by a general term (‘cell’, ‘electron’), (‘restaurant in Palo Alto’, ‘Italian’)
the class A = the collection of all particulars x for which ‘x is A’ is true
Problem
The same general term can be used to refer both to universals and to collections of particulars. Consider:
HIV is an infectious retrovirus
HIV is spreading very rapidly through Asia
universals vs. classes
universals
{c,d,e,...} classes
Extension =def
The extension of a universal A is the class ‘instance of the universal A’
(it is the class of A’s instances)
(the class of all entities to which the term ‘A’ applies)
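Here is a minimal sketch of the extension relation. It assumes that instance_of facts are stored as (particular, universal) pairs. The particular names such as `cell#1` are hypothetical. The result is a set of particulars, that is, a class, and it changes as recorded instances are added or removed.

```python
# Hypothetical instance_of facts: (particular, universal) pairs.
INSTANCE_OF = {
    ("cell#1", "cell"),
    ("cell#2", "cell"),
    ("electron#1", "electron"),
}


def extension(universal, facts=INSTANCE_OF):
    """The class of all recorded particulars that instantiate the universal."""
    return {particular for (particular, u) in facts if u == universal}
```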
universals vs. classes
universals
defined classes
universals vs. classes
universals
populations, ...
Defined class =def
a class defined by a general term which does not designate a universal
the class of all diabetic patients in Leipzig on 4 June 1952
OWL is a good representation of defined classes
• sibling of Finnish spy
• member of Abba aged > 50 years
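By contrast, a defined class is simply whatever satisfies an arbitrary condition. The sketch below is plain Python, not OWL. It builds the Leipzig example from invented patient records. The `PatientRecord` fields and all data values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical record of a particular patient at a particular date."""

    patient_id: str
    city: str
    diagnosis: str
    seen_on: date


# Invented records, for illustration only.
RECORDS = [
    PatientRecord("p1", "Leipzig", "diabetes", date(1952, 6, 4)),
    PatientRecord("p2", "Leipzig", "asthma", date(1952, 6, 4)),
    PatientRecord("p3", "Dresden", "diabetes", date(1952, 6, 4)),
]


def defined_class(condition, records=RECORDS):
    """All particulars (by ID) that satisfy an arbitrary condition."""
    return {r.patient_id for r in records if condition(r)}


diabetic_in_leipzig_1952_06_04 = defined_class(
    lambda r: r.diagnosis == "diabetes"
    and r.city == "Leipzig"
    and r.seen_on == date(1952, 6, 4)
)
```

Any combination of attributes yields a class in this way. Nothing in the condition requires that it designate a universal.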
Terminology =def.
a representational artifact whose representational units are natural language terms (with IDs, synonyms, comments, etc.) which are intended to designate universals together with defined classes.
universals, classes, concepts
universals
defined classes
‘concepts’
universals < defined classes < ‘concepts’
‘concepts’ which do not correspond to defined classes:
‘Surgical or other procedure not carried out because of patient's decision’
‘Absent nipple’
(Scientific) Ontology =def.
a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. universals in reality
2. those relations between these universals which obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung
Part II
How to Build an Ontology
How to build an ontology
1. Work with scientists to create an initial top-level classification
2. Find the ~50 most commonly used terms corresponding to universals in reality
3. Arrange these terms into an informal is_a hierarchy according to the universality principle:
   A is_a B =def. every instance of A is an instance of B
4. Fill in missing terms to give a complete hierarchy
(Leave it to domain scientists to populate the lower levels of the hierarchy)
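Once a draft is_a hierarchy exists, the universality principle (every instance of A is an instance of B) can be tested against instance data. The sketch below is a minimal illustration. The is_a pairs reuse the siamese/cat/mammal terms. The individuals `tom` and `felix` and their types are invented test data. Any pair (A, B) reported by `violations` has some instance of A that is not an instance of B.

```python
# Hypothetical draft hierarchy: (child, parent) pairs meaning child is_a parent.
IS_A = {("siamese", "cat"), ("cat", "mammal"), ("mammal", "animal")}

# Hypothetical test data: each individual with every universal it instantiates.
TYPES = {
    "tom": {"siamese", "cat", "mammal", "animal"},
    "felix": {"cat", "mammal", "animal"},
}


def instances(universal):
    """All test individuals that instantiate the universal."""
    return {x for x, types in TYPES.items() if universal in types}


def violations(is_a=IS_A):
    """is_a pairs (A, B) where some instance of A is not an instance of B."""
    return [(a, b) for (a, b) in sorted(is_a) if not instances(a) <= instances(b)]
```

For example, a reversed assertion such as `("mammal", "siamese")` is reported, because `felix` is a mammal but not a siamese.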
Principle of Low Hanging Fruit
Include even absolutely trivial assertions (assertions you know to be universally true)
pneumococcal bacterium is_a bacterium
Computers need to be led by the hand
MeSH
MeSH Descriptors > Index Medicus Descriptor > Anthropology, Education, Sociology and Social Phenomena (MeSH Category) > Social Sciences > Political Systems > National Socialism
National Socialism is_a Political Systems
National Socialism is_a Anthropology ...
Principle
Use singular nouns
Terms in ontologies represent universals
Goal: Each term in an ontology represents exactly one universal
There are also universals of collectivities:
population
complex of cells
the use-mention confusion
Conceptual Entities =Def.
An organizational header for concepts representing mostly abstract entities.
swimming is healthy and has eight letters
Principle
Avoid confusing words with things
Avoid confusing concepts in our minds with entities in reality
Recommendation: avoid the word ‘concept’ entirely
Trialbank
‘information’ = def. ‘a written or spoken designation of a concept’
‘Heparin therapy’ is an instance of ‘written or spoken designation of a concept’
What are the problems here?
1. misuse of quotation marks
2. confusion of instances and universals
3. confusion of concept and reality
Trialbank
Plant Ontology
cell = def. plant cell, consisting of protoplast and cell wall; ...
Principle
For the sake of interoperability with other ontologies, do not give special meanings to terms with established general meanings
(Don’t use ‘cell’ when you mean ‘plant cell’)
ICNP: International Classification for Nursing Practice
water =def. a type of Nursing Phenomenon of Physical Environment with the specific characteristics: clear liquid compound of hydrogen and oxygen that is essential for most plant and animal life influencing life and development of human beings.
Principle
Supply definitions wherever possible
(both human-understandable natural language definitions, and equivalent formal definitions)
Principle
Each term should have at most one definition*
*which may have both natural-language and formal versions
The Problem of Circularity
A Person = def. A person with an identity document
cell = def. plant cell, consisting of protoplast and cell wall; ...
Principle
Avoid circular definitions
(The term defined should not appear in its own definition)
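Direct circularity of this kind can be caught mechanically. Here is a minimal sketch using Python's `re` module. It only checks whether the defined term reappears as a whole word or phrase in its own definition. Circularity through a chain of several definitions would need a graph search instead.

```python
import re


def is_circular(term: str, definition: str) -> bool:
    """True if the term occurs as a whole word/phrase in its own definition."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, definition, flags=re.IGNORECASE) is not None
```

For example, `is_circular("person", "A person with an identity document")` returns `True`, while `cell` is not matched inside `cellular`.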
HL7
‘stopping a medication’ = def.
change of state in the record of a Substance Administration Act from Active to Aborted
Principle
A definition should use terms which are easier to understand than the term defined
(HL7 creates a topsy-turvy world, in which simple things are made difficult)
Principle
Use Aristotelian definitions
An A is a B which C’s.
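The Aristotelian schema can be read as a template. A is the term, B is its genus (the is_a parent), and "which C's" is the differentia. Below is a minimal Python sketch. The function names are hypothetical, and the choice of "a" or "an" is a rough heuristic. The example reuses the Plant Ontology wording quoted earlier, with cell as the genus.

```python
def _article(noun: str) -> str:
    """Rough heuristic: 'an' before a vowel letter (ignores cases like 'an hour')."""
    return "an" if noun and noun[0].lower() in "aeiou" else "a"


def aristotelian_definition(term: str, genus: str, differentia: str) -> str:
    """Render 'An A is a B which C's' from term, genus and differentia."""
    text = f"{_article(term)} {term} is {_article(genus)} {genus} which {differentia}."
    return text[0].upper() + text[1:]
```

For example, `aristotelian_definition("plant cell", "cell", "consists of protoplast and cell wall")` yields "A plant cell is a cell which consists of protoplast and cell wall."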
Principle
Do not seek to define everything
In every ontology
some terms and some relations are primitive = they cannot be defined (on pain of infinite regress)
Examples of primitive relations:
identity
instance_of
Rules for formatting terms
• Avoid abbreviations, even when it is clear in context what they mean (‘breast’ for ‘breast tumor’)
• Avoid acronyms
• Avoid mass terms (‘tissue’, ‘brain mapping’, ‘clinical research’ ...)
• Treat each term ‘A’ in an ontology as shorthand for a term of the form ‘the universal A’
Univocity
Terms should have the same meanings on every occasion of use
(They should refer to the same universals)
Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies
Universality
Ontologies should include only those relational assertions which hold universally
pneumococcal bacterium causes pneumonia
Universality
Often, order will matter:
We can assert
adult transformation_of child
but not
child transforms_into adult
Universality
viral pneumonia caused by virus
but not
virus causes pneumonia
pneumococcal bacterium causes pneumonia
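One way to make "holds universally" precise is an all-some reading: A R B holds if every instance of A stands in R to at least one instance of B. Below is a minimal sketch over invented instance data (the names `pn1`, `v1` and so on are hypothetical). It shows why "viral pneumonia caused by virus" passes while the converse "virus causes viral pneumonia" fails.

```python
# Hypothetical instance data: each particular with the universal it instantiates.
INSTANCE_OF = {
    "pn1": "viral pneumonia",
    "pn2": "viral pneumonia",
    "v1": "virus",
    "v2": "virus",
    "v3": "virus",  # a virus that causes no pneumonia
}
CAUSED_BY = {("pn1", "v1"), ("pn2", "v2")}
CAUSES = {(cause, effect) for (effect, cause) in CAUSED_BY}


def instances(universal):
    """All recorded particulars that instantiate the universal."""
    return {x for x, u in INSTANCE_OF.items() if u == universal}


def holds_universally(a, b, pairs):
    """All-some reading: every instance of a is related to some instance of b."""
    return all(
        any((x, y) in pairs for y in instances(b)) for x in instances(a)
    )
```

Note that `all()` is vacuously true when A has no recorded instances. A real check would therefore also need to confirm that instances exist.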
Universality
protocol-design earlier_than results analysis
but not
results analysis later_than protocol-design
Positivity
Complements of universals are not themselves universals.
Terms such as
non-mammal
non-membrane
other metalworker in New Zealand
do not designate universals in reality
Ontology of universals vs. logic of terms
There are no conjunctive and disjunctive universals:
anatomic structure, system, or substance
musculoskeletal and connective tissue disorder
rheumatism, excluding the back
Objectivity
Which universals exist in reality is not a function of our knowledge.
Terms such as
unknown
unclassified
unlocalized
arthropathies not otherwise specified
do not designate universals in reality.
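These principles suggest simple lexical checks that flag candidate terms for human review. Below is a minimal Python sketch. The pattern list is a hypothetical heuristic assembled from the examples on these slides. It will produce false positives, since a comma or "and" does not always signal a conjunction of universals.

```python
import re

# Hypothetical heuristics derived from the examples above; each maps a
# principle to a regular expression over the term's wording.
SUSPECT_PATTERNS = {
    "positivity": r"^(non-|other\b)",
    "objectivity": r"\b(unknown|unclassified|unlocalized|not otherwise specified)\b",
    "no conjunctive/disjunctive universals": r"\b(and|or|excluding)\b|,",
}


def lint_term(term: str) -> list:
    """Names of the principles a term may violate (empty list if none)."""
    return [
        principle
        for principle, pattern in SUSPECT_PATTERNS.items()
        if re.search(pattern, term, flags=re.IGNORECASE)
    ]
```

For example, `lint_term("non-mammal")` returns `["positivity"]`, and `lint_term("lung")` returns an empty list.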
Keep Epistemology Separate from Ontology
If you want to say that
We do not know where A’s are located
do not invent a new class of
A’s with unknown locations
(A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)
Keep Sentences Separate from Terms
If you want to say
I surmise that this is a case of pneumonia
do not invent a new class of surmised pneumonias
Single Inheritance
No kind in a classificatory hierarchy should have more than one is_a parent on the immediate higher level
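Whether a hierarchy respects single inheritance is easy to check mechanically. Here is a minimal sketch. It assumes the hierarchy is given as (child, parent) is_a pairs, and the example reuses the car / blue car case.

```python
from collections import defaultdict


def multiple_parents(is_a_pairs):
    """Map each term with more than one is_a parent to its sorted parents."""
    parents = defaultdict(set)
    for child, parent in is_a_pairs:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


HIERARCHY = {
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
}
```

Here `multiple_parents(HIERARCHY)` reports `{"blue car": ["blue thing", "car"]}`.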
Multiple Inheritance
[Figure: thing at the top; car and blue thing below it; blue car below both, linked to each by is_a]
Multiple Inheritance
is a source of errors
encourages laziness
serves as obstacle to integration with neighboring ontologies
hampers use of Aristotelian methodology for defining terms
hampers use of statistical search tools
Multiple Inheritance
[Figure: the same hierarchy, with the two is_a links from blue car now labelled is_a1 and is_a2]
is_a Overloading
The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.
Compositionality
The meanings of compound terms should be determined
1. by the meanings of component terms
together with
2. the rules governing syntax
Why do we need rules/standards for good ontologies?
Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error-checking): the lack of rules for classification leads to human error and blocks automatic reasoning and error-checking
Intuitive rules facilitate training of curators and annotators
Common rules allow alignment with other ontologies
ontologies are legends for cartoons
Randomized controlled trials
http://rctbank.ucsf.edu/ontology/outline/index.htm
Top-Level Class Hierarchy for RCT
Root
• Secondary-study
• Trial-details
• Trial
• Concept: Generic-concept, Population-concept, Protocol-concept, Design-concept, Outcome-concept, Administrative-concept, Intervention-concept
Trial Details
Root
• Secondary-study
• Trial-details
  – Erratum
  – Publication-details
  – Trial-entry-details
  – Administrative-details
    » Secondary-administrative-details
    » Primary-administrative-details
      · Executed-administrative-details
      · Intended-administrative-details
  – Conclusion-details
  – Background-details
    » Intended-background-details
    » Executed-background-details
  – Stopping-details
  – Retraction-details
  – Correction-details
  – Fraud-details
Concept
• Generic-concept: Term-information, Time-entity, Rule-concept, Situation
• Population-concept: Subgroup, Recruitment-flowchart, Population, Recruitment, Site-enrollment
• Protocol-concept: Follow-up-compliance, Follow-up-activity, Follow-up, Protocol-change, Treatment-assignment, Protocol, Reason, Outcomes-followup, Secondary-study-protocol
Concept
• Design-concept: Survival-analysis-and-results, Statistical-analysis-and-results, Sample-size-calculation, Trial-design, Hypothesis-concept, Study-objective, Study-monitoring, Regression-analysis-and-results, Stopping-rule
• Outcome-concept: Special-variable-information, Outcome-assessment, Miscellaneous-outcome-entity, Result-entity, Outcome-value-entity, Outcome
Concept
• Administrative-concept: Publication-concept, Study-site, Person, Ethics, Study-committee, Funder, Institution, Registry-id
• Intervention-concept: Blinding-concept, Compliance-details, Intervention-step, Intervention-arm, Co-intervention, Intervention, Compliance-result, Intervention-logic
What the top level should look like
Two kinds of entities
occurrents (processes, events, happenings)
continuants (objects, qualities, states...)
Continuants (aka endurants)
• have continuous existence in time
• preserve their identity through change
• exist in toto whenever they exist at all
Occurrents (aka processes)
• have temporal parts
• unfold themselves in successive phases
• exist only in their phases
You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional
Dependent entities
require independent continuants as their bearers
There is no run without a runner
There is no grin without a cat
Dependent vs. independent continuants
Independent continuants (organisms, buildings, environments)
Dependent continuants (quality, shape, role, propensity, function, status, power, right)
All occurrents are dependent entities
They are dependent on those independent continuants which are their participants (agents, patients, media ...)
BFO Top-Level Ontology
Continuant
• Independent Continuant
• Dependent Continuant
Occurrent
(always dependent on one or more independent continuants)
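The dependence constraints ("no run without a runner", "no grin without a cat") can be checked over instance data. Here is a minimal sketch. The particulars, their top-level categories and the dependence links are all invented for illustration, and only the three BFO categories shown above are used.

```python
# Hypothetical particulars, each assigned one top-level category.
CATEGORY = {
    "you": "independent continuant",
    "your life": "occurrent",
    "the cat": "independent continuant",
    "the cat's grin": "dependent continuant",
    "a run": "occurrent",
}

# Hypothetical links: dependent entity -> the entities it depends on
# (bearers for dependent continuants, participants for occurrents).
DEPENDS_ON = {
    "your life": {"you"},
    "the cat's grin": {"the cat"},
    "a run": set(),  # deliberately missing its runner
}

DEPENDENT = {"dependent continuant", "occurrent"}


def unsupported(category=CATEGORY, depends_on=DEPENDS_ON):
    """Dependent entities with no independent continuant to depend on."""
    return sorted(
        x
        for x, kind in category.items()
        if kind in DEPENDENT
        and not any(
            category.get(b) == "independent continuant"
            for b in depends_on.get(x, ())
        )
    )
```

With the sample data, only `"a run"` is reported, because it has no runner recorded.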
= A representation of top-level types
Continuant
• Independent Continuant: cell component
• Dependent Continuant: molecular function
Occurrent: biological process
Top-Level Ontology
Continuant
• Independent Continuant
• Dependent Continuant: Function
Occurrent: Functioning, Side-Effect, Stochastic Process, ...
Top-Level Ontology
Continuant
• Independent Continuant: Spatial Region
• Dependent Continuant: Quality, Function
Occurrent: Functioning, Side-Effect, Stochastic Process, ...
(below the types: their instances, in space and time)
CTO will be part of OBI
Ontology for Biomedical Investigations
http://obi.sourceforge.net
which is in turn part of the OBO Foundry
http://obofoundry.org
Amended Top-Level Class Hierarchy for RCT
Entity
• Continuant
  – Population
  – Protocol
  – Design
• Occurrent
  – Trial
  – Secondary-study
  – Intervention
?? Trial-details
?? Outcome-concept
?? Administrative-concept
Concept
• Generic-concept
  – Term-information
  – Time-entity
  – Rule-concept
    » Clinical-rule: Exclusion-rule, Inclusion-rule
    » Rule-entity: Recursive-rule, Base-rule
    » Ethnicity-language-rule
    » Age-gender-rule
  – Situation
Concept
• Protocol-concept: Follow-up-compliance, Follow-up-activity, Follow-up, Protocol-change, Treatment-assignment, Protocol, Reason, Outcomes-followup, Secondary-study-protocol
Amended Top-Level Class Hierarchy for RCT
Entity
• Continuant
  – Protocol
    » Secondary-study-protocol
  – Reason
• Occurrent
  – Treatment-assignment
  – Follow-up
    » Follow-up-activity
    » Outcomes-follow-up
  – Protocol-change
Concept
• Population-concept: Subgroup, Recruitment-flowchart, Population, Recruitment, Site-enrollment
Amended Top-Level Class Hierarchy for RCT
Entity
• Continuant
  – Protocol
    » Secondary-study-protocol
  – Recruitment-flowchart
  – Reason
  – Population
    » Subgroup
• Occurrent
  – Priors
    » Recruitment
    » Site-enrollment
    » Treatment-assignment
  – Follow-up
    » Follow-up-activity
    » Outcomes-follow-up
  – Protocol-change
Concept
• Administrative-concept: Publication-concept, Study-site, Person, Ethics, Study-committee, Funder, Institution, Registry-id
Continuant
• Information object
  – Publication
  – Registry-ID
• Study-site
• Person
• Institution
  – Study-committee
  – Funder
??? Ethics
Concept
• Intervention-concept: Blinding-concept, Compliance-details, Intervention-step, Intervention-arm, Co-intervention, Intervention, Compliance-result, Intervention-logic
Occurrent
• Intervention
  – Blinding
  – Intervention-step
  – Intervention-arm
  – Co-intervention
• ??? Intervention-logic
• ??? Compliance-result
• ??? Compliance-details