1 ontology tutorial part 1 what is ontology and what can it do? barry smith
Post on 21-Dec-2015
TRANSCRIPT
1
Ontology Tutorial Part 1: What is Ontology
and What Can It Do?
Barry Smith
http://ontology.buffalo.edu/smith
2
The problem of data integration / information fusion
About 30,000 genes in a human
Probably 100-200,000 proteins
Individual variation in most genes
100s of cell types
100,000s of disease types
3
DNA
Protein
Organelle
Cell
Tissue
Organ
Organism
Tissues: Muscle tissue, Nerve tissue, Connective tissue, Epithelial tissue, Blood
Organ systems: Musculo-skeletal system, Circulatory system, Respiratory system, Digestive system, Nervous system, Urinary system, Reproductive system, Endocrine system, Lymphoid system
Organelles: Mitochondria, Nucleus, Endoplasmic reticulum, Cell membrane
4
The Challenge
Each (clinical, pathological, genetic, proteomic, pharmacological …) information system uses its own terminology and category system
Biomedical research demands the ability to navigate through all such information systems
How can we overcome the incompatibilities which become apparent when data from distinct sources are combined?
5
Answer:
“Ontology”
6
Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain
(Semantic Web)
3. Ontology as controlled vocabulary
(Gene Ontology, Open Biological Ontologies Consortium)
8
Ontology as a branch of philosophy
seeks to establish
the basic formal-ontological structures
the kinds and structures of objects, properties, events, processes and relations in each material domain of reality
9
Formal ontology an analogue of pure mathematics
Can be applied to different domains
10
Material ontology a kind of generalized chemistry or zoology
(Aristotle’s ontology grew out of biological classification)
11
Aristotle
world’s first ontologist
12
World’s first ontology
(from Porphyry’s Commentary on Aristotle’s Categories)
13
Linnaean Ontology
14
Formal Ontology
- theory of part and whole
- theory of dependence / unity
- theory of boundary, continuity and contact
- theory of universals and instances
- theory of continuants and occurrents (objects and processes)
- theory of functions and functioning
- theory of granularity
15
Formal Ontology
the theory of those ontological structures
(such as part-whole, universal-particular)
which apply to all domains whatsoever
16
Formal-Ontological Categories
substance
process
function
unity
plurality
site
dependent part
independent part
are able to form complex structures in non-arbitrary ways, joined by relations such as part, dependence, location.
17
A Network of Domain Ontologies
Material (Regional) Ontologies
Basic Formal Ontology
18
19
Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain
(Semantic Web)
3. Ontology as controlled vocabulary
(Gene Ontology, Open Biological Ontologies Consortium)
20
Assumptions
Communication / compatibility problems should be solved automatically
(by machine)
Hence ontologies must be applications running in real time
21
Application ontology:
Ontologies are inside the computer
thus subject to severe constraints on expressive power
(effectively the expressive power of Description Logic)
22
Problem: Confusion of concepts and entities in reality
Don’t construct theories of reality; construct ‘models’ of ‘concepts’
23
The Semantic Web
Ontology in the Knowledge Engineering Sense
24
A new silver bullet
25
The Semantic Web
designed to integrate the vast amounts of heterogeneous online data and services
via dramatically better support at the level of metadata designed to yield the ability to query and integrate across different conceptual systems
26
Tim Berners-Lee, inventor of the World Wide Web
‘sees a more powerful Web emerging, one where documents and data will be annotated with special codes allowing computers to search and analyze the Web automatically. The codes … are designed to add meaning to the global network in ways that make sense to computers’
27
hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’
‘The idea is the codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’
28
Exploiting tools such as:
XML
OWL (Web Ontology Language)
RDF (Resource Description Framework)
DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer)
(confusing syntactic integration with semantic integration)
29
Ontology confused with: the language of ontology
‘Ontology’ for semantic webbers is without content
Philosophical ontology = build a theory of reality
Semantic-web-style ontology = build a model of the data in our computers
30
Defining ‘gene’
GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
Genbank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
31
Example: The Enterprise Ontology
A Sale is an agreement between two Legal-Entities for the exchange of a Product for a Sale-Price.
A Strategy is a Plan to Achieve a high-level Purpose.
A Market is all Sales and Potential Sales within a scope of interest.
32
Example: Statements of Accounts
Company Financial statements may be prepared under either the (US) GAAP or the (European) IASC standards
These allocate cost items to different categories depending on the laws of the countries involved.
33
Job:
to develop an algorithm for the automatic conversion of income statements and balance sheets between the two systems.
Not even this relatively simple problem has been satisfactorily resolved
… why not?
Because the very same terms mean different things and are applied in different ways in different cultures
34
The Semantic Web Initiative
The Web is a vast edifice of heterogeneous data sources
Needs the ability to query and integrate across different conceptual systems
35
How to resolve incompatibilities?
enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which
1. satisfy the constraints of a description logic (DL)
2. are applied as meta-tags to the content of websites
36
Clay Shirky
The Semantic Web is a machine for creating syllogisms.
Humans are mortal
Greeks are human
Therefore, Greeks are mortal
37
Lewis Carroll
- No interesting poems are unpopular among people of real taste
- No modern poetry is free from affectation
- All your poems are on the subject of soap-bubbles
- No affected poetry is popular among people of real taste
- No ancient poetry is on the subject of soap-bubbles
Therefore: All your poems are bad.
38
the promise of the Semantic Web
it will improve all the areas of your life where you currently use syllogisms
39
We can use the Semantic Web to prove that Joe loves Mary
we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site.
To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary". [http://infomesh.net/2001/swintro/]
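The checking procedure quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two document strings, the `trusted` table and the function `joe_loves_mary` are hypothetical stand-ins, and the checksums are computed on the spot rather than obtained from a site maintainer.

```python
import hashlib


def md5_of(text: str) -> str:
    """Return the hex MD5 digest of a text (used here only as a checksum)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical stand-ins for the two documents found on the trusted site.
doc_a = ":Joe :loves :MJS ."
doc_b = ":MJS daml:equivalentTo :Mary ."

# Assumption: these play the role of the checksums received from the maintainer.
trusted = {"a": md5_of(doc_a), "b": md5_of(doc_b)}


def joe_loves_mary(file_a: str, file_b: str) -> bool:
    """Apply the three rules: SuccessA, SuccessB, and their conjunction."""
    success_a = ":Joe :loves :MJS" in file_a and md5_of(file_a) == trusted["a"]
    success_b = (":MJS daml:equivalentTo :Mary" in file_b
                 and md5_of(file_b) == trusted["b"])
    return success_a and success_b


print(joe_loves_mary(doc_a, doc_b))        # True
print(joe_loves_mary(doc_a + " ", doc_b))  # False: checksum no longer matches
```

The sketch verifies that the files are the ones that were vouched for; it does nothing to establish whether what they assert is true.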
40
Merging Databases
Merging databases simply becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it. [http://infomesh.net/2001/swintro/]
Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows? Not the Semantic Web
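A minimal sketch of the merge described above, assuming two toy databases held as Python lists of dictionaries. The records, the `FIELD_EQUIVALENCE` mapping and the `normalize` helper are hypothetical; plain dictionaries stand in for RDF here.

```python
# Hypothetical records from two databases that use different field names.
your_db = [{"Person Name": "John Smith"}]
my_db = [{"Name": "John Q. Smith"}]

# The recorded equivalence: "Person Name" in your database = "Name" in mine.
FIELD_EQUIVALENCE = {"Person Name": "Name"}


def normalize(record: dict) -> dict:
    """Rename fields according to the declared field equivalence."""
    return {FIELD_EQUIVALENCE.get(key, key): value
            for key, value in record.items()}


# "Throwing all of the information together".
merged = [normalize(r) for r in your_db] + [normalize(r) for r in my_db]
distinct_names = sorted({r["Name"] for r in merged})

print(distinct_names)  # ['John Q. Smith', 'John Smith']
```

The field names are now aligned, but the merged table still holds two distinct strings. Nothing in the mapping says whether they denote one person or two.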
41
XML-syntax does not help
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>
43
and with correct XML-syntax:
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>
Is "Jules" the first name of the person, or of the business-card?
44
and with correct XML-syntax:
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>
Is Jules or Newco the member of XTC Group?
45
and with correct XML-syntax:
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>
Do the phone numbers and address belong to Jules or to the business?
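To make the point concrete, here is a small sketch that parses an abbreviated version of the business card with Python's standard xml.etree.ElementTree module. The shortened `CARD` string is an assumption made for brevity; the tag names are those of the slide.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the business card (assumption: shortened for brevity).
CARD = """<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <TEL>+32(0)3.471.99.60</TEL>
</BUSINESS-CARD>"""

root = ET.fromstring(CARD)

# The parser recovers the nesting of elements and nothing more:
# each item is simply a child of BUSINESS-CARD.
children = [(child.tag, child.text) for child in root]

print(root.tag)
for tag, text in children:
    print(tag, "is a child of", root.tag, "with text", repr(text))
```

Well-formed syntax yields a tree in which FIRSTNAME, MEMBEROF and TEL are all siblings under BUSINESS-CARD. Whether they describe the person, the company or the card itself is not recorded anywhere in the markup.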
46
Shirky:
The Semantic Web's philosophical argument -- the world should make more sense than it does -- is hard to argue with. The Semantic Web, with its neat ontologies and its syllogistic logic, is a nice vision. However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to be effective in the real world …
47
Semantic Web effort
thus far devoted primarily to developing systems for standardized representation of web pages and web processes
(= ontology of web typography)
not to the harder task of developing ontologies (term hierarchies) for the content of such web pages
48
Cory Doctorow
A world of exhaustive, reliable metadata would be a utopia.
49
Problem 1: People lie
Meta-utopia is a world of reliable metadata. But poisoning the well can confer benefits to the poisoners
Metadata exists in a competitive world.
Some people are crooks. Some people are cranks. Some people are French philosophers.
50
Problem 2: People are lazy
Half the pages on Geocities are called “Please title this page”
51
Problem 3: People are stupid
The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate
Will internet users learn to accurately tag their information with whatever DL-hierarchy they're supposed to be using?
52
Problem 4: Ontology Impedance
= semantic mismatch between ontologies being merged
This problem recognized in Semantic Web literature:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
53
Solution 1: treat it as (inevitable) ‘impedance’
and learn to find ways to cope with the disturbance which it brings
Suggested here:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
54
Solution 2: resolve the impedance problem on a case-by-case basis
Suppose two databases are put on the web.
Someone notices that "where" in the friends table and "zip" in the places table mean the same thing.
http://www.w3.org/DesignIssues/Semantic.html
55
Both solutions fail
1. treating mismatches as ‘impedance’ ignores the problem of error propagation
(and is inappropriate in an area like medicine)
2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web
56
Ontology Impedance
‘gene’ used in websites issued by
biotech companies involved in gene patenting
medical researchers interested in role of genes in predisposition to smoking
insurance companies
57
The idea:
distinguish two separate tasks:
- developing expressively rich, correct ontologies of given domains
- developing on this basis computer applications capable of running in real time
58
Basic Formal Ontology
BFO: The Vampire Slayer
59
60
BFO
ontology not the ‘standardization’ or ‘specification’ of concepts
(not a branch of knowledge or concept engineering)
but an inventory of the types of entities existing in reality
61
BFO not a computer application
but a reference ontology
in the sense of Aristotelian philosophy
- it sacrifices tractability for the sake of expressive power
62
Defining ‘gene’
GDB: a gene is a DNA fragment that can be transcribed and translated into a protein
Genbank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype
63
Ontology
‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’
... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ …
are ontological terms in the sense of traditional (philosophical) ontology
64
BFO
not just a system of categories
but a formal theory with definitions, axioms, theorems
designed to provide formal resources for the building of reference ontologies for specific domains
the latter should be of sufficient richness that terminological incompatibilities can be resolved intelligently rather than by brute force
65
The Reference Ontology Community
IFOMIS (Saarbrücken)
Laboratories for Applied Ontology (Trento/Rome, Turin)
Foundational Ontology Project (Leeds)
Ontology Works (Baltimore)
Department of Structural Biology (Seattle)
Virtual Soldier Project (DARPA)
Open Biological Ontologies Consortium (Cambridge, Berkeley, Bar Harbor)
66
67
Ontology Tutorial Part 2: The Future of Ontology in Biomedicine
68
Ontology Tutorial Part 2: The Future of Ontology in Buffalo
69
Ontology Tutorial Part 2: The Future of Ontology in Biomedicine
70
Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain
(Semantic Web)
3. Ontology as controlled vocabulary
(Gene Ontology, Open Biological Ontologies Consortium)
71
Philosophical Ontology
Ontologies are WINDOWS ON REALITY
Ontologies deal with classes/universals/invariants in reality
which exist independently of our theorizing
and independently of our language
72
What are universals?
invariants in reality
satisfying biological laws
(there are truths about universals in biological textbooks)
73
A universal is not determined by its instances as a state is not determined by its citizens
A universal may vary with time as an organism may vary with time (by gaining and losing molecules)
74
Universals are Not Sets
A set is an abstract structure, existing outside time and space. The set of Romans timelessly has Julius Caesar as a member.
Universals exist in time.
75
A Window on Reality
76
Medical Diagnostic Hierarchy
a hierarchy in the realm of diseases
77
Dependence Relations
Organisms Diseases
78
A Window on Reality
Organisms Diseases
79
A Window on Reality
80
[Figure: hierarchy of universals (substance, organism, animal, mammal, cat, siamese; frog) and their instances]
81
82
Many current standard ‘ontologies’ are ramshackle because they have no counterpart of formal ontology
The Unified Medical Language System (UMLS)
a compendium of source vocabularies including:
HL7 RIM
SNOMED
International Classification of Diseases
MeSH – Medical Subject Headings
Gene Ontology
83
Three senses of ontology
1. Philosophical sense: an inventory of the types of entities and relations in reality
2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain
(Semantic Web)
3. Ontology as controlled vocabulary
(Gene Ontology, Open Biological Ontologies Consortium)
84
Problem: The different source vocabularies are incompatible with
each other
85
Problem: They contain bad coding
which often derives from failure to pay attention to simple logical or ontological principles or to the principles of good definition
86
Bad Coding
Plant roots is-a Plant
Plant leaves is-a Plant
Pollen is-a Plant
Both testes is-a testis
Both uteri is-a uterus
87
Bad definitions
Heptolysis =def the cause of heptolysis
Biological process =def a biological goal that requires more than one function
88
The Concept Orientation
Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures
This work has focused almost exclusively on ‘concepts’ (sometimes confused with terms or descriptions).
89
The Curse of Linguistics
Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures
This led to the assumption that all that need be said about classes can be said without appeal to time or to instances in reality.
Ontology is about meanings/terms/strings
90
An alternative research programme for ontology
based on philosophical principles
Terms in bio-ontologies refer not to ‘concepts’
but to universals in reality
91
already reformed
Foundational Model of Anatomy Anatomy Reference Ontology
92
[Diagram: is-a hierarchy. Node labels: Anatomical Entity; Physical Anatomical Entity; Conceptual Anatomical Entity; Material Physical Anatomical Entity; Non-material Physical Anatomical Entity; Anatomical Structure; Body Substance; Anatomical Space; Anatomical Relationship; Body Part; Human Body; Organ System; Organ; Cell; Organ Part; Cell Part; Biological Macromolecule; Tissue]
93
[Diagram: is-a hierarchy. Node labels: Anatomical Entity; Physical Anatomical Entity; Conceptual Anatomical Entity; Material Physical Anatomical Entity; Non-material Physical Anatomical Entity; Anatomical Structure; Body Substance; Anatomical Space; Anatomical Relationship; Body Part; Human Body; Organ System; Organ; Cell; Organ Part; Cell Part; Biological Macromolecule; Tissue]
A window on reality
94
[Diagram: anatomical hierarchy around the pleura. Node labels: Anatomical Structure; Anatomical Space; Organ; Organ Part; Organ Subdivision; Organ Component; Organ Cavity; Organ Cavity Subdivision; Serous Sac; Serous Sac Cavity; Serous Sac Cavity Subdivision; Tissue; Pleural Sac; Pleural Cavity; Pleura (Wall of Sac); Parietal Pleura; Visceral Pleura; Mediastinal Pleura; Mesothelium of Pleura; Interlobar recess]
95
To represent ontological relations we need to take instances into account
To say A part_of B is not to say anything about Bs’ need for As as parts
96
part_of as a relation between universals
A part_of B =def
given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y)
human testis part_of human being,
But not:
heart part_of human being.
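The definition above can be read as a check over instance-level data. The following is a minimal sketch under stated assumptions: the toy instances, the `part_pairs` set and the function name `part_of` are hypothetical, and time is ignored.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    name: str
    universal: str


# Hypothetical toy world: individuals and the universals they instantiate.
instances = [
    Instance("testis_1", "human testis"),
    Instance("heart_1", "heart"),
    Instance("heart_2", "heart"),
    Instance("john", "human being"),
    Instance("rex", "dog"),
]

# Instance-level parthood: part(x, y).
part_pairs = {("testis_1", "john"), ("heart_1", "john"), ("heart_2", "rex")}


def part_of(a: str, b: str) -> bool:
    """A part_of B: every instance of A is part of some instance of B.

    Note: this is vacuously true when A has no instances in the data.
    """
    b_names = {i.name for i in instances if i.universal == b}
    return all(
        any((x.name, y) in part_pairs for y in b_names)
        for x in instances
        if x.universal == a
    )


print(part_of("human testis", "human being"))  # True
print(part_of("heart", "human being"))         # False: heart_2 belongs to a dog
```

The second result mirrors the slide: some hearts are parts of non-human organisms, so heart part_of human being fails under this reading.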
97
already reformed
Foundational Model of Anatomy Anatomy Reference Ontology
98
under construction / overhaul
Physiology Reference Ontology
Gene Ontology
OBOL
99
The Gene Ontology
a controlled vocabulary for annotations of genes and gene products
100
When a gene is identified
three important types of questions need to be addressed:
1. Where is it located in the cell?
2. What functions does it have on the molecular level?
3. To what biological processes do these functions contribute?
101
GO has three ontologies
molecular functions
cellular components
biological processes
102
GO astonishingly influential
used by all major species genome projects
used by all major pharmacological research groups
used by all major bioinformatics research groups
103
GO part of the Open Biological Ontologies consortium
Fungal Ontology
Plant Ontology
Yeast Ontology
Disease Ontology
Mouse Anatomy Ontology
Cell Ontology
Sequence Ontology
Relations Ontology
104
Each of GO’s ontologies
is organized in a graph-theoretical structure involving two sorts of links or edges:
is-a (= is a subtype of )
(copulation is-a biological process)
part-of
(cell wall part-of cell)
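One simple way to picture such a structure is as a graph whose edges carry a relation label. The sketch below uses only the two examples from the slide; the data layout and the helper name `parents` are assumptions and do not reflect GO's actual file format.

```python
from collections import defaultdict

# Labelled edges: (child term, relation, parent term).
edges = [
    ("copulation", "is_a", "biological process"),
    ("cell wall", "part_of", "cell"),
]

# Adjacency list keyed by child term.
graph = defaultdict(list)
for child, relation, parent in edges:
    graph[child].append((relation, parent))


def parents(term: str, relation: str) -> list:
    """Return the parents of a term reached via the given relation."""
    return [p for r, p in graph.get(term, []) if r == relation]


print(parents("copulation", "is_a"))    # ['biological process']
print(parents("cell wall", "part_of"))  # ['cell']
print(parents("cell wall", "is_a"))     # []
```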
105
106
107
cellular components
molecular functions
biological processes
1372 component terms
7271 function terms
8069 process terms
108
The Cellular Component Ontology (counterpart of anatomy)
flagellum
chromosome
membrane
cell wall
nucleus
109
The Molecular Function Ontology
ice nucleation
protein stabilization
kinase activity
binding
The Molecular Function ontology is (roughly) an ontology of actions on the molecular level of granularity
110
Biological Process Ontology
glycolysis
copulation
death
An ontology of occurrents on the level of granularity of cells, organs and whole organisms
111
GO built by biologists
free of the Curse of Linguistics
free of the Curse of Computer Science
112
but problems still remain
menopause part_of aging
aging part_of death
menopause part_of death
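The third statement follows from the first two if part_of is treated as transitive, which is an assumption of this sketch. A small transitive-closure computation shows how the unwanted conclusion is derived mechanically; the function name is hypothetical.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (a, b) pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # If a -> b and b -> d are present, add a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


asserted = {("menopause", "aging"), ("aging", "death")}
inferred = transitive_closure(asserted) - asserted

print(inferred)  # {('menopause', 'death')}
```

The logic is sound; the trouble lies in the asserted edges, which a reasoner has no means of questioning.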
113
heptolysis
Definition
The causes of heptolysis …
114
regulation of sleep part_of sleep
extrinsic to membrane part_of membrane
115
GO uses only two relations
is_a and part_of
116
hence GO has only sentences of the forms A is_a B and A part_of B
no way to express ‘not’ and no way to express ‘is localized at’ and no way to express ‘I don’t know’:
117
Holliday junction helicase complex is-a unlocalized
cellular component unknown is-a cellular component
118
Old GO definition of part_of
A part_of B =def A can be part of B
119
New GO definition of part_of as part of current OBOL reform effort
A part_of B =def
given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y)
120
Analogous problems for nearly all foundational relations of ontologies and semantic networks:
A causes B
A is associated with B
A is located in B
etc.
Reference to instances is necessary to clear up these problems
121
122
The Future of Ontology in Buffalo
http://ontology.buffalo.edu/bcor/
to provide a forum within which philosophical ontologists and those involved in ontology
applications can work together in high-level interdisciplinary research
to assist in coordination and integration of projects in ontological research being
pursued in Buffalo
123
Gary Byrd
Charles Dement
Randall Dipert
John Eisner
Daniel Fischer
Louis Goldberg
Jorge Gracia
David Hershenov
Rajiv Kishore
Eric Little
James Llinas
David Mark
Bill Rapaport
Galina Rogova
Ram Ramesh
Stuart C. Shapiro
Barry Smith
Rohini Srihari
Moises Sudit
124
College of Arts and Sciences
Computer Science and Engineering
School of Management
Center of Excellence in Bioinformatics
School of Informatics
School of Dental Medicine
Center for Multisource Information Fusion
National Center for Geographic Information and Analysis
School of Medicine and Biomedical Sciences
125
Computer Science and Engineering
School of Management
Charles Dement
Pharma of the Future
126
Computer Science and Engineering
Daniel Fischer
Bill Rapaport
Stuart Shapiro
Rohini Srihari
127
School of Management
Ram Ramesh
Rajiv Kishore
128
Center of Excellence in Bioinformatics
Daniel Fischer
129
School of Informatics / School of Medicine
Gary Byrd
Medical Informatics Certificate Program
130
School of Dental Medicine
John Eisner
Louis Goldberg
SNODENT
131
Center for Multisource Information Fusion
Eric Little
James Llinas
Galina Rogova
Moises Sudit
132
National Center for Geographic Information and Analysis
David Mark
Barry Smith
133
Department of Philosophy
Barry Smith (Director?)
Randall Dipert
Jorge Gracia
David Hershenov
Ingvar Johansson
Jiyuan Yu
134
Goal
To show how philosophical ontology can contribute to the successful application of ontologies in information systems