ontologies and ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl...

23
1 Ontologies and Ontology alignment Patrick Lambrix Linköpings universitet HIT-MSRA 2008 Summer School, Harbin, China, July 2008. Outline Part I: Semantic Web and Ontologies Part II: Ontology alignment Part I Semantic Web and Ontologies Part I: Semantic Web and ontologies Semantic Web Semantic Web Ontologies Definition Use Components Knowledge representation GET THAT PROTEIN! Where? Which? How? Vision: Web services - Databases and tools (service providers) announce their service capabilities - Users request services which may be based on task descriptions - Service matchers find relevant services (composition) based on user needs and user preferences, negotiate service delivery, and deliver results to user Locating relevant information

Upload: others

Post on 04-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

1

Ontologies and Ontology alignment

Patrick LambrixLinköpings universitet

HIT-MSRA 2008 Summer School, Harbin, China, July 2008.

Outline

Part I: Semantic Web and Ontologies

Part II: Ontology alignment

Part ISemantic Web and Ontologies

Part I: Semantic Web and ontologies

Semantic Web Semantic Web

OntologiesDefinition

Use

Components

Knowledge representation

GET THAT PROTEIN!Where?

Which?How?

Vision: Web services

- Databases and tools (service providers) announce their service capabilities- Users request services which may be based on task descriptions- Service matchers find relevant services (composition) based on user needs and user preferences, negotiate service delivery, and deliver results to user

Locating relevant information

Page 2: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

2

Vision:

Based on the meaning of the query:- only relevant information is retrieved- all relevant information is retrieved

Retrieving relevant information

Vision:

Integrate data sources that are heterogeneous in content, data quality, data models, access methods, terminology

Disease information

Targetstructure

Chemicalstructure

Disease models

Clinicaltrials

Metabolism,toxicology

Genomics

DISCOVERY

Integrating information

A library of documents (web pages) interconnected by linksA common portal to applications accessible through web pages, and presenting their results as web pages

A place where computers do the presentation (easy) and people do the linking and interpreting (hard).

Today: syntactic WebW3C: Facilities to put machine-understandable data

on the Web are becoming a high priority for many communities. The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. For the Web to scale, tomorrow's programs must be able to share and process data even when these programs have been designed totally independently. The Semantic Web is a vision: the idea of having data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.

Semantic Web

What is the problem?

Example based on example on slides by P. Patel-Schneider

What information can we see…

Date: 13-15 June, 2005Location: LinköpingSponsors: IEEE, CERC, LiU14th IEEE International Workshops on

Enabling Technologies: Infrastructures for Collaborating Enterprises (WETICE-2005)

Welcome to WETICE-2005

Page 3: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

3

What information can a machine see…

Use XML markup with “meaningful” tags

<date> 13-15 June 2005 </date>

<location> Linköping </location>

<sponsors>IEEE, CERC, LiU </sponsors>

<name> 14th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborating Enterprises (WETICE-2005) </name>

<welcome> Welcome to WETICE-2005 </welcome>

Machine sees …

<date> </date>

<location> </location>

<sponsors>

</sponsors>

<name> </name>

<welcome> </welcome>

But what about …

<date> 13-15 June 2005 </date>

<place> Linköping </place>

<sponsors>IEEE, CERC, LiU </sponsors>

<conf> 14th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborating Enterprises (WETICE-2005) </conf>

<introduction> Welcome to WETICE-2005 </introduction>

Machine sees …<date> </date>

< > </ >

<sponsors>

</sponsors>

< >

</ >

< > </ >

Adding “Semantics” – first approach

External agreement on meaning of annotationsAgree on the meaning of a set of annotation tagsProblems with this approach:

Inflexible Limited number of things can be expressed

Page 4: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

4

Adding “Semantics” – second approach

Use on-line ontologies to specify meaning of annotations

Ontologies provide a vocabulary of termsNew terms can be formed by combining existing onesMeaning (semantics) of such terms is formally specified

First step towards the vision: adding semantic annotation to web resources

Scientific American, May 2001:

Semantic annotations based on ontologies

Locating informationWeb service descriptions use ontologiesUsers use ontologies when formulating requestsService matchers find services based on meaning

Retrieving relevant informationReduce non-relevant information (precision)Find more relevant information (recall)

Integrating informationRelating similar entities in different databases

Part I: Semantic Web and ontologies

Semantic Web

OntologiesOntologiesDefinition

Use

Components

Knowledge representation

Ontologies

“Ontologies define the basic terms and relations comprising the vocabulary of a topic area, as well as the rules for combining terms and relations to define extensions to the vocabulary.”(Neches, Fikes, Finin, Gruber, Senator, Swartout, 1991)

Definitions

Ontology as specification of a conceptualizationOntology as philosophical disciplineOntology as informal conceptual systemOntology as formal semantic accountOntology as representation of conceptual system via a logical theoryOntology as the vocabulary used by a logical theoryOntology as a meta-level specification of a logical theory

(Guarino, Giaretta)

Page 5: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

5

Definitions

An ontology is an explicit specification of a conceptualization (Gruber)An ontology is a hierarchically structured set of terms for describing a domain that can be used as a skeletal foundation for a knowledge base. (Swartout, Patil, Knight, Russ)An ontology provides the means for describing explicitly the conceptualization behind the knowledge represented in a knowledge base. (Bernaras, Lasergoiti, Correra)An ontology is a formal, explicit specification of a shared conceptualization (Studer, Benjamins, Fensel)

Example GENE ONTOLOGY (GO)

immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processingi- cellular defense responsei- cytokine metabolism

i- cytokine biosynthesis synonym cytokine production…

p- regulation of cytokine biosynthesis…

…i- B-cell activation

i- B-cell differentiation i- B-cell proliferation

i- cellular defense response …i- T-cell activation

i- activation of natural killer cell activity …

Example OntologiesKnowledge representation ontology: frame ontologyTop level ontologies: TLO, CycLinguistic ontologies: GUM, WordNetEngineering ontologies: EngMath, PhysSysDomain ontologies: CHEMICALS, Gene Ontology, Open Biomedical Ontologies

Ontologies used …for communication between people and organizationsfor enabling knowledge reuse and sharingas basis for interoperability between systemsas repository of informationas query model for information sources

Key technology for the Semantic Web

Ontologies in biomedical research

many biomedical ontologiese.g. GO, OBO, SNOMED-CT

practical use of biomedical ontologiese.g. databases annotated with GO

GENE ONTOLOGY (GO)

immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processingi- cellular defense responsei- cytokine metabolism

i- cytokine biosynthesissynonym cytokine production…

p- regulation of cytokine biosynthesis

……i- B-cell activation

i- B-cell differentiation i- B-cell proliferation

i- cellular defense response …i- T-cell activation

i- activation of natural killer cell activity

Componentsconcepts - represent a set or class of entities in a domain

immune response- organized in taxonomies (hierarchies based on e.g. is-a or is-part-of)

immune response is-a defense response

instances - often not represented in an ontology

(instantiated ontology)

Page 6: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

6

Components

relations

R: C1 x C2 x … x Cn

Protein hasName ProteinName

Chromosone hasSubcellularLocationNucleus

Componentsaxioms‘facts that are always true’

The origin of a protein is always of the type ‘gene coding origin type’

Each protein has at least one source. A helix can never be a sheet and vice versa.

Different kinds of ontologies

Controlled vocabulariesConcepts

TaxonomiesConcepts, is-a

ThesauriConcepts, predefined relations

Data models (e.g. EER, UML)Concepts, relations, axioms

LogicsConcepts, relations, axioms

Taxonomy - GeneOntologyid: GO:0003674 name: molecular_functiondef: “Elemental activities, such as catalysis or binding, describing the actions of a gene product at the

molecular level. A given gene product may exhibit one or more molecular functions.”

id: GO:0015643 name: bindingdef: “The selective, often stoichiometric, interaction of a molecule with one or more specific sites on

another molecule.”is-a: GO:0003674 ! molecular_function

id: GO:0008289 name: lipid bindingis_a: GO:0015643 ! binding

id: GO:0016209 name: antioxidant activitydef: “Inhibition of the reactions brought about by dioxygen (O2) or peroxides. Usually the

antioxidant is effective because it can itself be more easily oxidized than the substance protected.”

is_a: GO:0003674 ! molecular_function

id: GO:0004601 name: peroxidase activitydef: "Catalysis of the reaction: donor + H2O2 = oxidized donor + 2 H2O." is_a: GO:0016209 ! antioxidant activityis_a: GO:0016684 ! oxidoreductase activity, acting on peroxide as acceptor

Taxonomy - GeneOntology

molecular function

peroxidase activity

antioxidant activity

lipid binding

bindingoxidoreductase activity, acting on peroxide as acceptor

Thesaurus

graph fixed set of relations

(synonym, narrower term, broader term, similar)

Page 7: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

7

Thesaurus - WordNetthesaurus, synonym finder

=> wordbook

=> reference book, reference, reference work, book of facts

=> book

=> publication

=> print media

=> medium

=> means

=> instrumentality, instrumentation

=> artifact, artefact

=> object, inanimate object, physical object

=> entity

=> work, piece of work

=> product, production

=> creation

=> artifact, artefact

=> object, inanimate object, physical object

=> entity

OO Data modelsEERentity types, attributes, relationships, cardinality constraints, taxonomy

UMLclasses, attributes, associations, cardinality constraints, taxonomy, operations

Taxonomy/inheritance – semantics?Intuitive, lots of tools, widely used.

Reference

protein-id

accession definition

source

article-id

title

author

PROTEIN

ARTICLE

m

n

Entity-relationship UML

RDF + RDF Schema

Basic construct: sentence: Subject Predicate Object

Encoded in XML

Can be seen as ground atomic formula

Represented as graph

RDF Schema

Editors, query tools exist

RDF Schema - example

rdfs:Resource

xyz:MotorVehiclerdfs:Class

ss

t

t

xyz:Truck

s

t

xyz:PassengerVehicle

s = rdfs:subClassOft = rdf:type

xyz:Van

s s

xyz:MiniVan s

s

tt

t

t

Page 8: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

8

Logics

Formal languages

Syntax, semantics, inference mechanisms

LogicsReasoning services used in

Ontology designCheck concept satisfiability, ontology satisfiability and (unexpected)

implied relationships

Ontology aligning and mergingAssert inter-ontology relationships.Reasoner computes integrated concept hierarchy/consistency.

Ontology deploymentDetermine if a set of facts are consistent w. r. t. ontology.Determine if individuals are instances of ontology concepts.Query inclusion.Classification-based querying.

Description Logics

A family of KR formalisms tailored for expressing knowledge about concepts and concept hierarchiesBased on FOPL, supported by automatic reasoning systemsBasic building blocks: concepts (concepts), roles (binary relations), individuals (instances)Language constructs can be used to define new concepts and roles (axioms).

Intersection, union, negation, quantification, …

Knowledge base is Tbox + AboxTbox: concept level - axioms: equality and subsumption (is-a)Abox: instance level - axioms: membership, relations

Reasoning servicesSatisfiability of concept, Subsumption/Equivalence/Disjointness between concepts, Classification, Instantiation, Retrieval

Description Logics

IntersectionSignal-transducer-activity ∩ binding

Negation¬ Helix

Quantifiers∃ hasOrigin.Mitochondrion∀ hasOrigin.Gene-coding-origin-type

OWL

OWL-Lite, OWL-DL, OWL-Full: increasing expressivityA legal OWL-Lite ontology is a legal OWL-DL ontology is a legal OWL-Full ontologyOWL-DL: expressive description logic, decidableXML-basedRDF-based (OWL-Full is extension of RDF, OWL-Lite and OWL-DL are extensions of a restriction of RDF)

OWL-Lite

Class, subClassOf, equivalentClassintersectionOf (only named classes and restrictions)Property, subPropertyOf, equivalentPropertydomain, range (global restrictions)inverseOf, TransitiveProperty (*), SymmetricProperty, FunctionalProperty, InverseFunctionalPropertyallValuesFrom, someValuesFrom (local restrictions)minCardinality, maxCardinality (only 0/1)Individual, sameAs, differentFrom, AllDifferent

(*) restricted

Page 9: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

9

OWL-DL

Type separation (class cannot also be individual or property, property cannot be also class or individual), Separation between DatatypePropertiesand ObjectPropertiesClass –complex classes, subClassOf, equivalentClass, disjointWithintersectionOf, unionOf, complementOfProperty, subPropertyOf, equivalentPropertydomain, range (global restrictions)inverseOf, TransitiveProperty (*), SymmetricProperty, FunctionalProperty, InverseFunctionalPropertyallValuesFrom, someValuesFrom (local restrictions), oneOf, hasValueminCardinality, maxCardinalityIndividual, sameAs, differentFrom, AllDifferent

(*) restricted

The Celestial Emporium of Benevolent Knowledge, Borges"On those remote pages it is written that animals are divided into:a. those that belong to the Emperor b. embalmed ones c. those that are trained d. suckling pigse. mermaids f. fabulous ones g. stray dogs h. those that are included in this classificationi. those that tremble as if they were mad j. innumerable ones k. those drawn with a very fine camel's hair brush l. others m. those that have just broken a flower vase n. those that resemble flies from a distance"

Defining ontologies is not so easy ...

Slide from talk by C. Goble

Defining ontologies is not so easy ...Dyirbal classification of objects in the universe

Bayi: men, kangaroos, possums, bats, most snakes, most fishes, some birds, most insects, the moon, storms, rainbows, boomerangs, some spears, etc.Balan: women, anything connected with water or fire,bandicoots, dogs, platypus, echidna, some snakes, some fishes, most birds, fireflies, scorpions, crickets, the stars, shields, some spears, some trees, etc.Balam: all edible fruit and the plants that bear them, tubers, ferns, honey, cigarettes, wine, cake.Bala: parts of the body, meat, bees, wind, yamsticks, some spears, most trees, grass, mud, stones, noises, language, etc.

Slide from talk by C. Goble

Ontology tools

Ontology development tools

Ontology merge and alignment tools

Ontology evaluation tools

Ontology-based annotation tools

Ontology storage and querying tools

Ontology learning tools

Part IIOntology Alignment

Part II – Ontology Alignment

Ontology alignmentOntology alignment

Ontology alignment strategies

Evaluation of ontology alignment strategies

Recommending ontology alignment strategies

Current issues

Page 10: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

10

Ontologies in biomedical research

many biomedical ontologiese.g. GO, OBO, SNOMED-CT

practical use of biomedical ontologiese.g. databases annotated with GO

GENE ONTOLOGY (GO)

immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processingi- cellular defense responsei- cytokine metabolism

i- cytokine biosynthesissynonym cytokine production…

p- regulation of cytokine biosynthesis

……i- B-cell activation

i- B-cell differentiation i- B-cell proliferation

i- cellular defense response …i- T-cell activation

i- activation of natural killer cell activity

Ontologies with overlapping information

SIGNAL-ONTOLOGY (SigO)

Immune Responsei- Allergic Responsei- Antigen Processing and Presentationi- B Cell Activationi- B Cell Developmenti- Complement Signaling

synonym complement activation i- Cytokine Response i- Immune Suppression i- Inflammation i- Intestinal Immunity i- Leukotriene Response

i- Leukotriene Metabolism i- Natural Killer Cell Responsei- T Cell Activationi- T Cell Development i- T Cell Selection in Thymus

GENE ONTOLOGY (GO)

immune responsei- acute-phase response i- anaphylaxis i- antigen presentationi- antigen processingi- cellular defense responsei- cytokine metabolism

i- cytokine biosynthesissynonym cytokine production…

p- regulation of cytokine biosynthesis

……i- B-cell activation

i- B-cell differentiation i- B-cell proliferation

i- cellular defense response …i- T-cell activation

i- activation of natural killer cell activity

Ontologies with overlapping information

Use of multiple ontologies e.g. custom-specific ontology + standard ontology

Bottom-up creation of ontologiesexperts can focus on their domain of expertise

important to know the interimportant to know the inter--ontology ontology relationshipsrelationships

SIGNAL-ONTOLOGY (SigO)

Immune Responsei- Allergic Responsei- Antigen Processing and Presentationi- B Cell Activation i- B Cell Developmenti- Complement Signaling

synonym complement activation i- Cytokine Response i- Immune Suppression i- Inflammation i- Intestinal Immunity i- Leukotriene Response

i- Leukotriene Metabolism i- Natural Killer Cell Response i- T Cell Activation i- T Cell Development i- T Cell Selection in Thymus

GENE ONTOLOGY (GO)

immune response i- acute-phase response i- anaphylaxis i- antigen presentation i- antigen processingi- cellular defense responsei- cytokine metabolism

i- cytokine biosynthesissynonym cytokine production…

p- regulation of cytokine biosynthesis

……i- B-cell activation

i- B-cell differentiation i- B-cell proliferation

i- cellular defense response …i- T-cell activation

i- activation of natural killer cell activity

Ontology Alignment

equivalent concepts

equivalent relations

is-a relation

SIGNAL-ONTOLOGY (SigO)

Immune Responsei- Allergic Responsei- Antigen Processing and Presentationi- B Cell Activationi- B Cell Developmenti- Complement Signaling

synonym complement activation i- Cytokine Response i- Immune Suppression i- Inflammation i- Intestinal Immunity i- Leukotriene Response

i- Leukotriene Metabolism i- Natural Killer Cell Responsei- T Cell Activationi- T Cell Development i- T Cell Selection in Thymus

GENE ONTOLOGY (GO)

immune responsei- acute-phase response i- anaphylaxis i- antigen presentationi- antigen processingi- cellular defense responsei- cytokine metabolism

i- cytokine biosynthesissynonym cytokine production…

p- regulation of cytokine biosynthesis

……i- B-cell activation

i- B-cell differentiation i- B-cell proliferation

i- cellular defense response …i- T-cell activation

i- activation of natural killer cell activity

Defining the relations between the terms in different ontologies

Part II – Ontology Alignment

Ontology alignment

Ontology alignment strategiesOntology alignment strategies

Evaluation of ontology alignment strategies

Recommending ontology alignment strategies

Current issues

Page 11: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

11

An Alignment Framework

According to inputKR: OWL, UML, EER, XML, RDF, …

components: concepts, relations, instance, axioms

According to processWhat information is used and how?

According to output1-1, m-n

Similarity vs explicit relations (equivalence, is-a)

confidence

Classification

Matchers

Strategies based on linguistic matching

Structure-based strategies

Constraint-based approaches

Instance-based strategies

Use of auxiliary information

Matcher Strategies

Strategies based on linguistic matchingStrategies based on linguistic matching

SigO: complement signalingsynonym complement activation

GO: Complement Activation

Example matchers

Edit distanceNumber of deletions, insertions, substitutions required to transform one string into anotheraaaa baab: edit distance 2

N-gramN-gram : N consecutive characters in a stringSimilarity based on set comparison of n-gramsaaaa : {aa, aa, aa}; baab : {ba, aa, ab}

Matcher Strategies

Strategies based on linguistic matching

StructureStructure--based strategiesbased strategies

Constraint-based approaches

Instance-based strategies

Use of auxiliary information

Page 12: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

12

Example matchers

Propagation of similarity valuesAnchored matching

Example matchers

Propagation of similarity valuesAnchored matching

Example matchers

Propagation of similarity valuesAnchored matching

Matcher Strategies

Strategies based on linguistic matching

Structure-based strategies

ConstraintConstraint--basedbased approachesapproaches

Instance-based strategies

Use of auxiliary information

O1O2

Bird

Mammal Mammal

FlyingAnimal

Matcher Strategies

Strategies based on linguistic matching

Structure-based strategies

ConstraintConstraint--basedbased approachesapproaches

Instance-based strategies

Use of auxiliary information

O1O2

Bird

Mammal Mammal

Stone

Example matchers

Similarities between data typesSimilarities based on cardinalities

Page 13: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

13

Matcher Strategies

Strategies based on linguistic matching

Structure-based strategies

Constraint-based approaches

InstanceInstance--basedbased strategiesstrategies

Use of auxiliary information

Ontology

instancecorpus

Example matchers

Instance-based

Use life science literature as instances

Structure-based extensions

Learning matchers – instance-based strategies

Basic intuition A similarity measure between concepts can be

computed based on the probability that documents about one concept are also about the other concept and vice versa.

Intuition for structure-based extensionsDocuments about a concept are also about their

super-concepts.

(No requirement for previous alignment results.)

Learning matchers - steps

Generate corporaUse concept as query term in PubMed

Retrieve most recent PubMed abstracts

Generate text classifiersOne classifier per ontology / One classifier per concept

ClassificationAbstracts related to one ontology are classified by the otherontology’s classifier(s) and vice versa

Calculate similarities

Basic Naïve Bayes matcherGenerate corpora

Generate classifiersNaive Bayes classifiers, one per ontology

ClassificationAbstracts related to one ontology are classified to the concept in the other ontology with highestposterior probability P(C|d)

Calculate similarities

Basic Support Vector Machines matcher

Generate corporaGenerate classifiers

SVM-based classifiers, one per conceptClassification

Single classification variant: Abstracts related to concepts in one ontology are classified to the concept in the otherontology for which its classifier gives the abstract the highestpositive value.Multiple classification variant: Abstracts related to conceptsin one ontology are classified all the concepts in the otherontology whose classifiers give the abstract a positive value.

Calculate similarities

Page 14: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

14

Structural extension ‘Cl’

Generate classifiersTake (is-a) structure of the ontologies into account whenbuilding the classifiers

Extend the set of abstracts associated to a concept by addingthe abstracts related to the sub-concepts

C1

C3

C4

C2

Structural extension ‘Sim’

Calculate similaritiesTake structure of the ontologies into account whencalculating similarities

Similarity is computed based on the classifiers appliedto the concepts and their sub-concepts

Matcher Strategies

Strategies based linguistic matching

Structure-based strategies

Constraint-based approaches

Instance-based strategies

UseUse of of auxiliaryauxiliary informationinformation

thesauri

alignment strategies

dictionary

intermediateontology

Example matchers

Use of WordNetUse WordNet to find synonyms

Use WordNet to find ancestors and descendants in the is-a hierarchy

Use of Unified Medical Language System (UMLS)Includes many ontologies

Includes many alignments (not complete)

Use UMLS alignments in the computation of the similarity values

Ontology

Alignm

entand Mergning

Systems

Combinations

Page 15: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

15

Combination Strategies

Usually weighted sum of similarity values of different matchers Filtering

Threshold filteringPairs of concepts with similarity higher or equal

than threshold are alignment suggestions

Filtering techniques

th

( 2, B )

( 3, F )

( 6, D )

( 4, C )

( 5, C )

( 5, E )

……

suggest

discard

sim

Filtering techniques

lower-th

( 2, B )

( 3, F )

( 6, D )

( 4, C )

( 5, C )

( 5, E )

……

upper-th

Double threshold filtering(1) Pairs of concepts with similarity higher than or equal to upper threshold are

alignment suggestions

(2) Pairs of concepts with similarity between lower and upper thresholds are alignment suggestions if they make sense with respect to the structure of the ontologies and the suggestions according to (1)

Example alignment system SAMBO – matchers, combination, filter

Example alignment system SAMBO – suggestion mode

Page 16: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

16

Example alignment system SAMBO – manual mode

Part II – Ontology Alignment

Ontology alignment

Ontology alignment strategies

Evaluation of ontology alignment strategies Evaluation of ontology alignment strategies

Recommending ontology alignment strategies

Current issues

Evaluation measuresPrecision: # correct suggested alignments

# suggested alignments Recall: # correct suggested alignments

# correct alignments F-measure: combination of precision and recall

Ontology AlignmentEvaluation Initiative

OAEI

Since 2004

Evaluation of systems

Different trackscomparison: benchmark (open)

expressive: anatomy (blind)

directories and thesauri: directory, food, environment, library (blind)

consensus: conference

OAEI

Evaluation measuresPrecision/recall/f-measure

recall of non-trivial alignments

full / partial golden standard

Page 17: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

17

OAEI 200717 systems participated

benchmark (13)ASMOV: p = 0.95, r = 0.90

anatomy (11) AOAS: f = 0.86, r+ = 0.50SAMBO: f =0.81, r+ = 0.58

library (3)Thesaurus merging: FALCON: p = 0.97, r = 0.87Annotation scenario:

FALCON: pb =0.65, rb = 0.49, pa = 0.52, ra = 0.36, Ja = 0.30Silas: pb = 0.66, rb= 0.47, pa = 0.53, ra = 0.35, Ja = 0.29

directory (9), food (6), environment (2), conference (6)

OAEI 2007

Systems can use only one combination of strategies per task

systems use similar strategiestext: string matching, tf-idf

structure: propagation of similarity to ancestors and/or descendants

thesaurus (WordNet)

domain knowledge important for anatomy task

Evaluation of algorithms

CasesGO vs. SigO

MA vs. MeSH

GO-immune defense

GO: 70 terms SigO: 15 terms

SigO-immune defense GO-behaviorGO: 60 terms SigO: 10 terms

SigO-behavior

MA-eyeMA: 112terms MeSH: 45 terms

MeSH-eye

MA-noseMA: 15 terms MeSH: 18 terms

MeSH-nose MA-earMA: 77 terms MeSH: 39 terms

MeSH-ear

Evaluation of matchersMatchers

Term, TermWN, Dom, Learn (Learn+structure), Struc

ParametersQuality of suggestions: precision/recall

Threshold filtering : 0.4, 0.5, 0.6, 0.7, 0.8

Weights for combination: 1.0/1.2

KitAMO(http://www.ida.liu.se/labs/iislab/projects/KitAMO)

Results

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0.4 0.5 0.6 0.7 0.8

threshold

prec

isio

n

B

ID

nose

ear

eye

Terminological matchers

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0.4 0.5 0.6 0.7 0.8

threshold

reca

ll

B

ID

nose

ear

eye

Page 18: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

18

ResultsBasic learning matcher (Naïve Bayes)

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0.4 0.5 0.6 0.7 0.8

threshold

reca

ll

B

ID

nose

ear

eye

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0.4 0.5 0.6 0.7 0.8

threshold

prec

isio

n

B

ID

nose

ear

eye

Naive Bayes slightly better recall, but slightly worse precision than SVM-single

SVM-multiple (much) better recall, but worse precision than SVM-single

Results

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0.4 0.5 0.6 0.7 0.8

threshold

prec

isio

n

B

ID

nose

ear

eye

Domain matcher (using UMLS)

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0.4 0.5 0.6 0.7 0.8

threshold

reca

ll

B

ID

nose

ear

eye

Results

Comparison of the matchers

CS_TermWN CS_Dom CS_Learn

Combinations of the different matchers

combinations give often better results

no significant difference on the quality of suggestions for different

weight assignments in the combinations

(but: did not check yet for large variations for the weights)

Structural matcher did not find (many) new correct alignments

(but: good results for systems biology schemas SBML – PSI MI)

⊇ ⊇

Evaluation of filtering

MatcherTermWN

ParametersQuality of suggestions: precision/recall

Double threshold filtering using structure: Upper threshold: 0.8

Lower threshold: 0.4, 0.5, 0.6, 0.7, 0.8

Results

The precision for double threshold filtering with upperthreshold 0.8 and lower threshold T is higher than for threshold filtering with threshold T

eye

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0,4 0,5 0,6 0,7

(lower) threshold

prec

isio

n

TermWN

filtered

Results

eye

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0,4 0,5 0,6 0,7

(low er) threshold

reca

ll

Ter mWN

f i l ter ed

The recall for double threshold filtering with upperthreshold 0.8 and lower threshold T is about the same as for threshold filtering with threshold T

Page 19: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

19

Part II – Ontology Alignment

Ontology alignment

Ontology alignment strategies

Evaluation of ontology alignment strategies

Recommending ontology alignment Recommending ontology alignment strategies strategies

Current issues

Recommending strategies - 1

Use knowledge about previous use of alignment strategies

gather knowledge about input, output, use, performance, cost via questionnairesNot so much knowledge availableOAEI

(Mochol, Jentzsch, Euzenat 2006)

Recommending strategies - 2

Optimize Parameters for ontologies, similarity assessment,

matchers, combinations and filters Run general alignment algorithm User validates the alignment resultOptimize parameters based on validation

(Ehrig, Staab, Sure 2005)

Recommending strategies - 2Tests

travel in russiaQOM: r=0.618, p=0.596, f=0.607Decision tree 150: r=0.723, p=0.591, f=0.650

bibsterQOM: r=0.279, p=0.397, f=0.328Decision tree 150: r=0.630, p=0.375, f=0.470

Decision trees better than Neural Nets and Support Vector Machines.

Recommending strategies - 3

Based on inherent knowledgeUse the actual ontologies to align to find good

candidate alignment strategies

User/oracle with minimal alignment work

Complementary to the other approaches

(Tan, Lambrix 2007)

Idea

Select small segments of the ontologies

Generate alignments for the segments (expert/oracle)

Use and evaluate available alignment algorithms on the segments

Recommend alignment algorithm based on evaluation on the segments

Page 20: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

20

Frameworko2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

o2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

Experiment case - Ontologies

NCI thesaurusNational Cancer Institute, Center for

Bioinformatics

Anatomy: 3495 terms

MeSHNational Library of Medicine

Anatomy: 1391 terms

Experiment case - Oracle

UMLSLibrary of Medicine

Metathesaurus contains > 100 vocabularies

NCI thesaurus and MeSH included in UMLS

Used as approximation for expert knowledge

919 expected alignments according to UMLS

Experiment case – alignment strategies

Matchers and combinationsN-gram (NG)Edit Distance (ED)Word List + stemming (WL)Word List + stemming + WordNet (WN)NG+ED+WL, weights 1/3 (C1)NG+ED+WN, weights 1/3 (C2)

Threshold filterthresholds 0.4, 0.5, 0.6, 0.7, 0.8

o2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

Segment pair selection algorithms

SubGCandidate segment pair = sub-graphs according

to is-a/part-of with roots with same name; between 1 and 60 terms in segment

Segment pairs randomly chosen from candidate segment pairs such that segment pairs are disjoint

o2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

Segment pair selection algorithms

Clust - Cluster terms in ontologyCandidate segment pair is pair of clusters

containing terms with the same name; at least 5 terms in clusters

Segment pairs randomly chosen from candidate segment pairs

o2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

Page 21: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

21

Segment pair selection algorithmsFor each trial, 3 segment pair sets with 5 segment pairs were generated

SubG: A1, A2, A3 2 to 34 terms in segmentlevel of is-a/part-of ranges from 2 to 6max expected alignments in segment pair is 23

Clust: B1, B2, B35 to 14 terms in segmentlevel of is-a/part-of is 2 or 3max expected alignments in segment pair is 4

Segment pair alignment generator

Used UMLS as oracle

Used KitAMO as toolbox

Generates reports on similarity values produced by different matchers, execution times, number of correct, wrong, redundant suggestions

Alignment toolbox

o2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

o2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

Recommendation algorithm

Recommendation scores: F, F+E, 10F+E

F: quality of the alignment suggestions

- average f-measure value for the segment pairs

E: average execution time over segment pairs, normalized with respect to number of term pairs

Algorithm gives ranking of alignment strategies based on recommendation scores on segment pairs

o2o1

SelectionAlgorithm

Segment Pair

GeneratorAlignment

1o

1o

1sp

nsp

..o2

.2o

matchers filters

combinationalgorithms

alignment strategies

1spa−

na−sp

.

.

.1 2oo

1 o2o

evaluationresult report

Align

men

t S

trategy

Recom

men

dation

Algorith

m

RecommendedAlignment Strategy

Toolb

ox

Align

men

t

Expected recommendations for F

Best strategies for the whole ontologies and measure F:

1. (WL,0.8)

2. (C1,0.8)

3. (C2,0.8)

Results

SPS A1

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0,4 0,5 0,6 0,7 0,8

threshold

Rec

om

men

dat

ion

S

core

NG ED WL WN C1 C2

SubG, F, SPS A1

ResultsTop 3 strategies for SubG and measure F:

A1: 1. (WL,0.8) (WL, 0.7) (C1,0.8) (C2,0.8)A2: 1. (WL,0.8) 2. (WL,0.7) 3. (WN,0.7)A3: 1. (WL,0.8) (WL, 0.7) (C1,0.8) (C2,0.8)

Best strategy always recommended firstTop 3 strategies often recommended(WL,0.7) has rank 4 for whole ontologies

Page 22: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

22

Results

Top 3 strategies for Clust and measure F:

B1: 1. (C2,0.7) 2. (ED,0.6) 3. (C2,0.6)

B2: 1. (WL,0.8) (WL, 0.7) (C1,0.8) (C2,0.8)

B3: 1. (C1,0.8) (ED,0.7) 3. (C1,0.7) (C2,0.7) (WL,0.7) (WN,0.7)

Top strategies often recommended, but not always

(WL,0.7) (C1,0.7) (C2,0.7) ranked 4,5,6 for whole ontologies

Results

SubG gives better results than Clust

Results improve when number of segments is increased

10F+E similar results as F

F+E WordNet gives lower ranking

Runtime environment has influence

Part II – Ontology Alignment

Ontology alignment

Ontology alignment strategies

Evaluation of ontology alignment strategies

Recommending ontology alignment strategies

Current IssuesCurrent Issues

Current issues

Systems and algorithmsComplex ontologies

Use of instance-based techniques

Alignment types (equivalence, is-a, …)

Complex alignments (1-n, m-n)

Connection ontology types – alignment strategies

Current issues

EvaluationsNeed for Golden standards

Systems available, but not always the alignmentalgorithms

Evaluation measures

Recommending ’best’ alignment strategies

Further reading

Starting points for further studies

Page 23: Ontologies and Ontology alignmentpatla00/talks/lambrix-harbin.pdf · + z z oh hpp >> 7ol lzl l o p l h p hz zk pkl li j ml l jl 5lnp l lk h pjp h j p n m h hzph jh

23

Further readingontologies

KnowledgeWeb ( http://knowledgeweb.semanticweb.org/ ) and its predecessorOntoWeb ( http://ontoweb.aifb.uni-karlsruhe.de/ )

Lambrix, Tan, Jakoniene, Strömbäck, Biological Ontologies, chapter 4 in Baker, Cheung, (eds), Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences, 85-99, Springer, 2007. ISBN: 978-0-387-48436-5.

(general about ontologies)

Lambrix, Towards a Semantic Web for Bioinformatics using Ontology-based Annotation, Proceedings of the 14th IEEE International Workshops on EnablingTechnologies: Infrastructures for Collaborative Enterprises, 3-7, 2005. Invited talk.

(ontologies for semantic web)

OWL, http://www.w3.org/TR/owl-features/ , http://www.w3.org/2004/OWL/

Further reading ontology alignment

http://www.ontologymatching.org(plenty of references to articles and systems)

Ontology alignment evaluation initiative: http://oaei.ontologymatching.org(home page of the initiative)

Euzenat, Shvaiko, Ontology Matching, Springer, 2007.

Lambrix, Tan, SAMBO – a system for aligning and merging biomedical ontologies, Journal of Web Semantics, 4(3):196-206, 2006.

(description of the SAMBO tool and overview of evaluations of different matchers)

Lambrix, Tan, A tool for evaluating ontology alignment strategies, Journal on Data Semantics, VIII:182-202, 2007.

(description of the KitAMO tool for evaluating matchers)

Further readingontology alignment

Chen, Tan, Lambrix, Structure-based filtering for ontology alignment,IEEEWETICE workshop on semantic technologies in collaborative applications, 364-369, 2006.

(double threshold filtering technique)

Tan H, Lambrix P, `A method for recommending ontology alignment strategies', International Semantic Web Conference, 494-507, 2007.

Ehrig M, Staab S, Sure Y, ‘Bootstrapping ontology alignment methods with APFEL, International Semantic Web Conference, 186-200, 2005.

Mochol M, Jentzsch A, Euzenat J, ’Applying an analytic method for matchingapproach selection’, International Workshop on Ontology Matching, 2006.

(recommendation of alignment strategies)