Ontology Tutorial Part 1: What Is Ontology and What Can It Do? (Barry Smith)

Post on 21-Dec-2015

1

Ontology Tutorial Part 1: What is Ontology and What Can It Do?

Barry Smith

http://ontology.buffalo.edu/smith

2

The problem of data integration / information fusion

About 30,000 genes in a human

Probably 100-200,000 proteins

Individual variation in most genes

100s of cell types

100,000s of disease types

3

DNA

Protein

Organelle

Cell

Tissue

Organ

Organism

Muscle tissue, Nerve tissue, Connective tissue, Epithelial tissue, Blood

Musculo-skeletal system, Circulatory system, Respiratory system, Digestive system, Nervous system, Urinary system, Reproductive system, Endocrine system, Lymphoidal system

Mitochondria, Nucleus, Endoplasmic reticulum, Cell membrane

4

The Challenge

Each (clinical, pathological, genetic, proteomic, pharmacological …) information system uses its own terminology and category system.

Biomedical research demands the ability to navigate through all such information systems.

How can we overcome the incompatibilities which become apparent when data from distinct sources is combined?

5

Answer:

“Ontology”

6

Three senses of ontology

1. Philosophical sense: an inventory of the types of entities and relations in reality

2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain

(Semantic Web)

3. Ontology as controlled vocabulary

(Gene Ontology, Open Biological Ontologies Consortium)

8

Ontology as a branch of philosophy

seeks to establish

the basic formal-ontological structures

the kinds and structures of objects, properties, events, processes and relations in each material domain of reality

9

Formal ontology an analogue of pure mathematics

Can be applied to different domains

10

Material ontology a kind of generalized chemistry or zoology

(Aristotle’s ontology grew out of biological classification)

11

Aristotle

world’s first ontologist

12

World’s first ontology

(from Porphyry’s Commentary on Aristotle’s Categories)

13

Linnaean Ontology

14

Formal Ontology

– theory of part and whole

– theory of dependence / unity

– theory of boundary, continuity and contact

– theory of universals and instances

– theory of continuants and occurrents (objects and processes)

– theory of functions and functioning

– theory of granularity

15

Formal Ontology

the theory of those ontological structures

(such as part-whole, universal-particular)

which apply to all domains whatsoever

16

Formal-Ontological Categories

substance, process, function, unity, plurality, site, dependent part, independent part

are able to form complex structures in non-arbitrary ways, joined by relations such as part, dependence, location.

17

A Network of Domain Ontologies

Material (Regional) Ontologies

Basic Formal Ontology

18

19

Three senses of ontology

1. Philosophical sense: an inventory of the types of entities and relations in reality

2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain

(Semantic Web)

3. Ontology as controlled vocabulary

(Gene Ontology, Open Biological Ontologies Consortium)

20

Assumptions

Communication / compatibility problems should be solved automatically

(by machine)

Hence ontologies must be applications running in real time

21

Application ontology:

Ontologies are inside the computer

thus subject to severe constraints on expressive power

(effectively the expressive power of Description Logic)

22

Problem: Confusion of concepts and entities in reality

Don’t construct theories of reality; construct ‘models’ of ‘concepts’

23

The Semantic Web

Ontology in the Knowledge Engineering Sense

24

A new silver bullet

25

The Semantic Web

designed to integrate the vast amounts of heterogeneous online data and services

via dramatically better support at the level of metadata designed to yield the ability to query and integrate across different conceptual systems

26

Tim Berners-Lee, inventor of the World Wide Web

‘sees a more powerful Web emerging, one where documents and data will be annotated with special codes allowing computers to search and analyze the Web automatically. The codes … are designed to add meaning to the global network in ways that make sense to computers’

27

hyperlinked vocabularies, called ‘ontologies’, will be used by Web authors ‘to explicitly define their words and concepts as they post their stuff online.’

‘The idea is the codes would let software "agents" analyze the Web on our behalf, making smart inferences that go far beyond the simple linguistic analyses performed by today's search engines.’

28

Exploiting tools such as:

XML

OWL (Web Ontology Language)

RDF (Resource Description Framework)

DAML+OIL (DARPA Agent Markup Language + Ontology Inference Layer)

(confusing syntactic integration with semantic integration)

29

Ontology confused with: the language of ontology

‘Ontology’ for semantic webbers is without content

Philosophical ontology = build a theory of reality

Semantic-web-style ontology = build a model of the data in our computers

30

Defining ‘gene’

GDB: a gene is a DNA fragment that can be transcribed and translated into a protein

GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

31

Example: The Enterprise Ontology

A Sale is an agreement between two Legal-Entities for the exchange of a Product for a Sale-Price.

A Strategy is a Plan to Achieve a high-level Purpose.

A Market is all Sales and Potential Sales within a scope of interest.

32

Example: Statements of Accounts

Company Financial statements may be prepared under either the (US) GAAP or the (European) IASC standards

These allocate cost items to different categories depending on the laws of the countries involved.

33

Job:

to develop an algorithm for the automatic conversion of income statements and balance sheets between the two systems.

Not even this relatively simple problem has been satisfactorily resolved

… why not?

Because the very same terms mean different things and are applied in different ways in different cultures

34

The Semantic Web Initiative

The Web is a vast edifice of heterogeneous data sources

Needs the ability to query and integrate across different conceptual systems

35

How to resolve incompatibilities?

enforce terminological compatibility via standardized term hierarchies, with standardized definitions of terms, which

1. satisfy the constraints of a description logic (DL)

2. are applied as meta-tags to the content of websites

36

Clay Shirky

The Semantic Web is a machine for creating syllogisms.

Humans are mortal

Greeks are human

Therefore, Greeks are mortal

37

Lewis Carroll

- No interesting poems are unpopular among people of real taste

- No modern poetry is free from affectation

- All your poems are on the subject of soap-bubbles

- No affected poetry is popular among people of real taste

- No ancient poetry is on the subject of soap-bubbles

Therefore: All your poems are bad.

38

the promise of the Semantic Web

it will improve all the areas of your life where you currently use syllogisms

39

We can use the Semantic Web to prove that Joe loves Mary

we found two documents on a trusted site, one of which said that ":Joe :loves :MJS", and another of which said that ":MJS daml:equivalentTo :Mary". We also got the checksums of the files in person from the maintainer of the site.

To check this information, we can list the checksums in a local file, and then set up some FOPL rules that say "if file 'a' contains the information Joe loves mary and has the checksum md5:0qrhf8q3hfh, then record SuccessA", "if file 'b' contains the information MJS is equivalent to Mary, and has the checksum md5:0892t925h, then record SuccessB", and "if SuccessA and SuccessB, then Joe loves Mary". [http://infomesh.net/2001/swintro/]
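The check described in the quotation can be sketched in Python. This is only an illustration under stated assumptions, not the procedure of the quoted page: the document contents, the `TRUSTED` table and the helper names `md5_of` and `verified` are hypothetical, and the checksums are computed locally with `hashlib.md5` because the values in the quotation are placeholders.

```python
import hashlib

# Hypothetical contents of the two documents found on the trusted site.
doc_a = b":Joe :loves :MJS ."
doc_b = b":MJS daml:equivalentTo :Mary ."

def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a document."""
    return hashlib.md5(data).hexdigest()

# Checksums as received from the site maintainer.
# Assumption: computed here so that the sketch is self-contained.
TRUSTED = {"a": md5_of(doc_a), "b": md5_of(doc_b)}

def verified(name: str, data: bytes, statement: bytes) -> bool:
    """True if the file matches its trusted checksum and contains the statement."""
    return md5_of(data) == TRUSTED[name] and statement in data

success_a = verified("a", doc_a, b":Joe :loves :MJS")
success_b = verified("b", doc_b, b":MJS daml:equivalentTo :Mary")

# Final rule: if SuccessA and SuccessB, then Joe loves Mary.
joe_loves_mary = success_a and success_b
print(joe_loves_mary)
```

The rules establish only that the two files are the ones the maintainer vouched for. Whether :MJS and :Mary really denote the same person is still taken on trust.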

40

Merging Databases

Merging databases simply becomes a matter of recording in RDF somewhere that "Person Name" in your database is equivalent to "Name" in my database, and then throwing all of the information together and getting a processor to think about it. [http://infomesh.net/2001/swintro/]

Is your "Person Name = John Smith" the same person as my "Name = John Q. Smith"? Who knows? Not the Semantic Web
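A minimal Python sketch of the merge described above, assuming two toy tables and a hand-written equivalence between column names (the tables, `FIELD_EQUIVALENCE` and `normalize` are hypothetical). It shows that renaming fields aligns the schemas but says nothing about whether two records describe the same person.

```python
# Two hypothetical databases that use different column names for the same kind of field.
your_db = [{"Person Name": "John Smith"}]
my_db = [{"Name": "John Q. Smith"}]

# The recorded equivalence: "Person Name" in your database means "Name" in mine.
FIELD_EQUIVALENCE = {"Person Name": "Name"}

def normalize(record: dict) -> dict:
    """Rename fields according to the recorded equivalence."""
    return {FIELD_EQUIVALENCE.get(key, key): value for key, value in record.items()}

# "Throwing all of the information together".
merged = [normalize(r) for r in your_db] + [normalize(r) for r in my_db]
names = [r["Name"] for r in merged]

# The schemas now agree, but the two values are still just two different strings.
same_string = names[0] == names[1]
print(names, same_string)
```

Deciding that the two strings name one individual needs knowledge about people in reality, which the field mapping does not supply.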

41

XML-syntax does not help

<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>

43

and with correct XML-syntax:

<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <JOBTITLE>Business Manager</JOBTITLE>
  <TEL>+32(0)3.471.99.60</TEL>
  <FAX>+32(0)3.891.99.65</FAX>
  <GSM>+32(0)465.23.04.34</GSM>
  <WEBSITE>www.newco.com</WEBSITE>
  <ADDRESS>
    <STREET>Dendersesteenweg 17</STREET>
    <ZIP>2630</ZIP>
    <CITY>Aartselaar</CITY>
    <COUNTRY>Belgium</COUNTRY>
  </ADDRESS>
</BUSINESS-CARD>

Is "Jules" the first name of the person, or of the business-card?

44

Is Jules or Newco the member of XTC Group?

45

Do the phone numbers and address belong to Jules or to the business?
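The point of these questions can be illustrated by parsing a shortened version of the card with Python's standard `xml.etree.ElementTree`. This is a sketch, with the `CARD` string abridged from the example above. The parser recovers the nesting of elements and nothing else.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the business card from the slides.
CARD = """<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <TEL>+32(0)3.471.99.60</TEL>
</BUSINESS-CARD>"""

root = ET.fromstring(CARD)

# Text content of each direct child element.
children = {child.tag: child.text for child in root}

# The only relation the syntax provides: every element is a child of BUSINESS-CARD.
parent_of_each = {child.tag: root.tag for child in root}
print(parent_of_each)
```

FIRSTNAME, COMPANY, MEMBEROF and TEL are all siblings under the same parent, so the document itself cannot tell us whether Jules or Newco is the member of XTC Group, or whose telephone number is listed.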

46

Shirky:

The Semantic Web's philosophical argument -- the world should make more sense than it does -- is hard to argue with. The Semantic Web, with its neat ontologies and its syllogistic logic, is a nice vision. However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to be effective in the real world …

47

Semantic Web effort

thus far devoted primarily to developing systems for standardized representation of web pages and web processes

(= ontology of web typography)

not to the harder task of developing ontologies (term hierarchies) for the content of such web pages

48

Cory Doctorow

A world of exhaustive, reliable metadata would be a utopia.

49

Problem 1: People lie

Meta-utopia is a world of reliable metadata. But poisoning the well can confer benefits to the poisoners

Metadata exists in a competitive world. Some people are crooks. Some people are cranks. Some people are French philosophers.

50

Problem 2: People are lazy

Half the pages on Geocities are called “Please title this page”

51

Problem 3: People are stupid

The vast majority of the Internet's users (even those who are native speakers of English) cannot spell or punctuate.

Will internet users learn to accurately tag their information with whatever DL-hierarchy they're supposed to be using?

52

Problem 4: Ontology Impedance

= semantic mismatch between ontologies being merged

This problem recognized in Semantic Web literature:

http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

53

Solution 1: treat it as (inevitable) ‘impedance’

and learn to find ways to cope with the disturbance which it brings

Suggested here:

http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf

54

Solution 2: resolve the impedance problem on a case-by-case basis

Suppose two databases are put on the web.

Someone notices that "where" in the friends table and "zip" in the places table mean the same thing.

http://www.w3.org/DesignIssues/Semantic.html

55

Both solutions fail

1. treating mismatches as ‘impedance’ ignores the problem of error propagation

(and is inappropriate in an area like medicine)

2. resolving impedance on a case-by-case basis defeats the very purpose of the Semantic Web

56

Ontology Impedance

‘gene’ used in websites issued by

biotech companies involved in gene patenting

medical researchers interested in role of genes in predisposition to smoking

insurance companies

57

The idea:

distinguish two separate tasks:

- developing expressively rich, correct ontologies of given domains

- developing on this basis computer applications capable of running in real time

58

Basic Formal Ontology

BFO: The Vampire Slayer

59

60

BFO

ontology not the ‘standardization’ or ‘specification’ of concepts

(not a branch of knowledge or concept engineering)

but an inventory of the types of entities existing in reality

61

BFO not a computer application

but a reference ontology

in the sense of Aristotelian philosophy

- it sacrifices tractability for the sake of expressive power

62

Defining ‘gene’

GDB: a gene is a DNA fragment that can be transcribed and translated into a protein

GenBank: a gene is a DNA region of biological interest with a name and that carries a genetic trait or phenotype

63

Ontology

‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’, ‘type’

... ‘part’, ‘whole’, ‘function’, ‘inhere’, ‘substance’ …

are ontological terms in the sense of traditional (philosophical) ontology

64

BFO

not just a system of categories

but a formal theory with definitions, axioms, theorems

designed to provide formal resources for the building of reference ontologies for specific domains

the latter should be of sufficient richness that terminological incompatibilities can be resolved intelligently rather than by brute force

65

The Reference Ontology Community

IFOMIS (Saarbrücken)

Laboratories for Applied Ontology (Trento/Rome, Turin)

Foundational Ontology Project (Leeds)

Ontology Works (Baltimore)

Department of Structural Biology (Seattle)

Virtual Soldier Project (DARPA)

Open Biological Ontologies Consortium (Cambridge, Berkeley, Bar Harbor)

66

67

Ontology Tutorial Part 2: The Future of Ontology in Biomedicine

68

Ontology Tutorial Part 2: The Future of Ontology in Buffalo

70

Three senses of ontology

1. Philosophical sense: an inventory of the types of entities and relations in reality

2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain

(Semantic Web)

3. Ontology as controlled vocabulary

(Gene Ontology, Open Biological Ontologies Consortium)

71

Philosophical Ontology

Ontologies are WINDOWS ON REALITY

Ontologies deal with classes/universals/invariants in reality

which exist independently of our theorizing

and independently of our language

72

What are universals?

invariants in reality

satisfying biological laws

(there are truths about universals in biological textbooks)

73

A universal is not determined by its instances as a state is not determined by its citizens

A universal may vary with time as an organism may vary with time (by gaining and losing molecules)

74

Universals are Not Sets

A set is an abstract structure, existing outside time and space. The set of Romans timelessly has Julius Caesar as a member.

Universals exist in time.

75

A Window on Reality

76

Medical Diagnostic Hierarchy

a hierarchy in the realm of diseases

77

Dependence Relations

Organisms Diseases

78

A Window on Reality

Organisms Diseases

79

A Window on Reality

80

[Figure: tree of universals (substance, organism, animal, mammal, cat, siamese, frog) shown above their instances]

81

82

Many current standard ‘ontologies’ are ramshackle because they have no counterpart of formal ontology

The Unified Medical Language System (UMLS)

a compendium of source vocabularies including:

HL7 RIM

SNOMED

International Classification of Diseases

MeSH – Medical Subject Headings

Gene Ontology

83

Three senses of ontology

1. Philosophical sense: an inventory of the types of entities and relations in reality

2. Knowledge engineering sense: an ontology as a consensus representation of the concepts used in a given domain

(Semantic Web)

3. Ontology as controlled vocabulary

(Gene Ontology, Open Biological Ontologies Consortium)

84

Problem: The different source vocabularies are incompatible with each other

85

Problem: They contain bad coding

which often derives from failure to pay attention to simple logical or ontological principles or to principles of good definition

86

Bad Coding

Plant roots is-a Plant

Plant leaves is-a Plant

Pollen is-a Plant

Both testes is-a testis

Both uteri is-a uterus

87

Bad definitions

Heptolysis =def the cause of heptolysis

Biological process =def a biological goal that requires more than one function

88

The Concept Orientation

Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures

Has focused almost exclusively on ‘concepts’ (sometimes confused with terms/descriptions).

89

The Curse of Linguistics

Work on biomedical ontologies grew out of work on medical dictionaries and nomenclatures

This led to the assumption that all that need be said about classes can be said without appeal to time or to instances in reality.

Ontology is about meanings/terms/strings

90

An alternative research programme for ontology

based on philosophical principles

Terms in bio-ontologies refer not to ‘concepts’

but to universals in reality

91

already reformed

Foundational Model of Anatomy Anatomy Reference Ontology

92

[Figure: is-a hierarchy of anatomical entities. Nodes: Anatomical Entity; Physical Anatomical Entity; Material Physical Anatomical Entity; Non-material Physical Anatomical Entity; Conceptual Anatomical Entity; Anatomical Structure; Body Substance; Body Part; Human Body; Organ System; Organ; Cell; Organ Part; Anatomical Space; Anatomical Relationship; Cell Part; Biological Macromolecule; Tissue]

93

[Figure: the is-a hierarchy of anatomical entities from the previous slide, from Anatomical Entity down to Organ, Cell and Tissue]

A window on reality

94

[Figure: anatomy diagram of the pleura. Labels: Pleural Cavity; Interlobar recess; Mesothelium of Pleura; Pleura (Wall of Sac); Visceral Pleura; Pleural Sac; Parietal Pleura; Anatomical Space; Organ Cavity; Serous Sac Cavity; Anatomical Structure; Organ; Serous Sac; Mediastinal Pleura; Tissue; Organ Part; Organ Subdivision; Organ Component; Organ Cavity Subdivision; Serous Sac Cavity Subdivision]

95

To represent ontological relations we need to take instances into account

To say A part_of B is not to say anything about Bs’ need for As as parts

96

part_of as a relation between universals

A part_of B =def

given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y)

human testis part_of human being,

But not:

heart part_of human being.
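The definition above can be checked mechanically against instance data. The following Python sketch assumes a toy world: the particulars `testis1`, `heart1`, `heart2`, `john` and `rex`, and the tables `inst` and `part`, are hypothetical and stand in for the relations inst(x, A) and part(x, y).

```python
# inst maps each particular to the universals it instantiates.
inst = {
    "testis1": {"human testis"},
    "heart1": {"heart"},
    "heart2": {"heart"},
    "john": {"human being"},
    "rex": {"dog"},
}

# part holds instance-level parthood pairs (x, y): x is part of y.
part = {("testis1", "john"), ("heart1", "john"), ("heart2", "rex")}

def instances_of(universal: str) -> set:
    """All particulars that instantiate the given universal."""
    return {x for x, universals in inst.items() if universal in universals}

def part_of(a: str, b: str) -> bool:
    """A part_of B: every instance of A is part of some instance of B."""
    return all(
        any((x, y) in part for y in instances_of(b))
        for x in instances_of(a)
    )

print(part_of("human testis", "human being"))  # True
print(part_of("heart", "human being"))         # False: heart2 belongs to a dog
```

In this toy data the second check fails because one heart is part of a dog, which is why the universal-level assertion does not hold for heart. Note that, written this way, the check is vacuously true for a universal with no instances.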

97

already reformed

Foundational Model of Anatomy Anatomy Reference Ontology

98

under construction / overhaul

Physiology Reference Ontology

Gene Ontology

OBOL

99

The Gene Ontology

a controlled vocabulary for annotations of genes and gene products

100

When a gene is identified

three important types of questions need to be addressed:

1. Where is it located in the cell?

2. What functions does it have on the molecular level?

3. To what biological processes do these functions contribute?

101

GO has three ontologies

molecular functions

cellular components

biological processes

102

GO astonishingly influential

used by all major species genome projects

used by all major pharmacological research groups

used by all major bioinformatics research groups

103

GO part of the Open Biological Ontologies consortium

Fungal Ontology

Plant Ontology

Yeast Ontology

Disease Ontology

Mouse Anatomy Ontology

Cell Ontology

Sequence Ontology

Relations Ontology

104

Each of GO’s ontologies

is organized in a graph-theoretical structure involving two sorts of links or edges:

is-a (= is a subtype of)

(copulation is-a biological process)

part-of

(cell wall part-of cell)
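A graph with two sorts of edges can be represented as a list of labelled triples. The Python sketch below is an assumption about one possible encoding, not GO's actual file format. The names `EDGES`, `graph` and `parents` are hypothetical, and the two edges are the examples from the slide.

```python
from collections import defaultdict

# Each edge is (child term, relation, parent term).
EDGES = [
    ("copulation", "is_a", "biological process"),
    ("cell wall", "part_of", "cell"),
]

# Adjacency list: term -> list of (relation, parent).
graph = defaultdict(list)
for child, relation, parent in EDGES:
    graph[child].append((relation, parent))

def parents(term: str, relation: str) -> list:
    """Direct parents of a term along one kind of edge."""
    return [p for r, p in graph.get(term, []) if r == relation]

print(parents("copulation", "is_a"))
print(parents("cell wall", "part_of"))
```

Keeping the relation as a label on each edge makes it easy to traverse the is-a and part-of hierarchies separately.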

105

106

107

cellular components: 1372 component terms

molecular functions: 7271 function terms

biological processes: 8069 process terms

108

The Cellular Component Ontology (counterpart of anatomy)

flagellum

chromosome

membrane

cell wall

nucleus

109

The Molecular Function Ontology

ice nucleation

protein stabilization

kinase activity

binding

The Molecular Function ontology is (roughly) an ontology of actions on the molecular level of granularity

110

Biological Process Ontology

glycolysis

copulation

death

An ontology of occurrents on the level of granularity of cells, organs and whole organisms

111

GO built by biologists

free of the Curse of Linguistics

free of the Curse of Computer Science

112

but problems still remain

menopause part_of aging

aging part_of death

menopause part_of death
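Assuming the third line is meant as the consequence of the first two when part_of is treated as transitive, the inference can be reproduced with a small transitive-closure routine. This is a Python sketch; `asserted` and `transitive_closure` are hypothetical names.

```python
# The two part_of assertions from the slide.
asserted = {("menopause", "aging"), ("aging", "death")}

def transitive_closure(pairs: set) -> set:
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

# Statements that follow from the assertions but were not asserted directly.
inferred = transitive_closure(asserted) - asserted
print(inferred)  # {('menopause', 'death')}
```

A reasoner applies the rule blindly, so any oddity in the asserted links is carried over into everything derived from them.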

113

heptolysis

Definition

The causes of heptolysis …

114

regulation of sleep part_of sleep

extrinsic to membrane part_of membrane

115

GO uses only two relations

is_a and part_of

116

hence GO has only sentences of the forms A is_a B and A part_of B

no way to express ‘not’ and no way to express ‘is localized at’ and no way to express ‘I don’t know’:

117

Holliday junction helicase complex is-a unlocalized

cellular component unknown is-a cellular component

118

Old GO definition of part_of

A part_of B =def A can be part of B

119

New GO definition of part_of as part of current OBOL reform effort

A part_of B =def

given any x, if inst(x, A) then there is some y such that inst(y, B) and part(x, y)

120

Analogous problems for nearly all foundational relations of ontologies and semantic networks:

A causes B

A is associated with B

A is located in B

etc.

Reference to instances is necessary to clear up these problems

121

122

The Future of Ontology in Buffalo

http://ontology.buffalo.edu/bcor/

to provide a forum within which philosophical ontologists and those involved in ontology

applications can work together in high-level interdisciplinary research

to assist in coordination and integration of projects in ontological research being

pursued in Buffalo

123

Gary Byrd

Charles Dement

Randall Dipert

John Eisner

Daniel Fischer

Louis Goldberg

Jorge Gracia

David Hershenov

Rajiv Kishore

Eric Little

James Llinas

David Mark

Bill Rapaport

Galina Rogova

Ram Ramesh

Stuart C. Shapiro

Barry Smith

Rohini Srihari

Moises Sudit

124

College of Arts and Sciences

Computer Science and Engineering

School of Management

Center of Excellence in Bioinformatics

School of Informatics

School of Dental Medicine

Center for Multisource Information Fusion

National Center for Geographic Information and Analysis

School of Medicine and Biomedical Sciences

125

Computer Science and Engineering

School of Management

Charles Dement

Pharma of the Future

126

Computer Science and Engineering

Daniel Fischer

Bill Rapaport

Stuart Shapiro

Rohini Srihari

127

School of Management

Ram Ramesh

Rajiv Kishore

128

Center of Excellence in Bioinformatics

Daniel Fischer

129

School of Informatics / School of Medicine

Gary Byrd

Medical Informatics Certificate Program

130

School of Dental Medicine

John Eisner

Louis Goldberg

SNODENT

131

Center for Multisource Information Fusion

Eric Little

James Llinas

Galina Rogova

Moises Sudit

132

National Center for Geographic Information and Analysis

David Mark

Barry Smith

133

Department of Philosophy

Barry Smith (Director?)

Randall Dipert

Jorge Gracia 

David Hershenov

Ingvar Johansson

Jiyuan Yu

134

Goal

To show how philosophical ontology can contribute to the successful application of ontologies in information systems
