Ontology Maintenance with an Algebraic Methodology: a Case Study
Jan Jannink, Gio Wiederhold
Presented by: Lei Lei

Post on 22-Dec-2015



Challenges

Obstacle: Autonomy of diverse knowledge sources

Data volatility and volume increase cost

Major challenge: establish and maintain the application-specific portion of the knowledge sources

An Algebraic Approach

Construct virtual knowledge bases geared to a specific application

Use composable operators to transform contexts into contexts

Operators use rules to express the relevant parts of a source and the conditions that apply to them

Rules define a valid context transformation
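A minimal sketch of composable context-to-context operators whose output is checked by a rule. This is an illustration under assumed names, not the paper's implementation: a context is taken to be an immutable set of (head word, definition) pairs, and `compose`, `checked`, `drop_empty` and `lower_heads` are hypothetical.

```python
from functools import reduce
from typing import Callable, FrozenSet, Tuple

# Assumed representation: a context is an immutable set of (head word, definition) pairs.
Context = FrozenSet[Tuple[str, str]]
Operator = Callable[[Context], Context]


def compose(*ops: Operator) -> Operator:
    """Chain operators left to right; the result is again a context-to-context operator."""
    return lambda ctx: reduce(lambda acc, op: op(acc), ops, ctx)


def checked(op: Operator, rule: Callable[[Context], bool]) -> Operator:
    """Wrap an operator so that its output must satisfy a validity rule."""
    def run(ctx: Context) -> Context:
        out = op(ctx)
        if not rule(out):
            raise ValueError("invalid context transformation")
        return out
    return run


# Two toy operators: drop entries with empty definitions, then lowercase head words.
drop_empty: Operator = lambda ctx: frozenset((h, d) for h, d in ctx if d.strip())
lower_heads: Operator = lambda ctx: frozenset((h.lower(), d) for h, d in ctx)

# Rule: every definition in the resulting context is non-empty.
pipeline = checked(compose(drop_empty, lower_heads), lambda ctx: all(d for _, d in ctx))

source: Context = frozenset({("Egoism", "The doctrine that ..."), ("Stub", "")})
print(sorted(pipeline(source)))  # [('egoism', 'The doctrine that ...')]
```

Because `compose` returns an ordinary operator, a composed pipeline can itself be wrapped, checked, or composed again, which is the property the slide relies on.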

On-line Dictionary: Webster

Autonomously maintained; used to develop a novel thesaurus application

120,000 entries, two million words

Semi-annual updates

Its errors and inconsistencies help test robustness

Target Application

Construct a graph of the definitions to determine related terms, and automatically generate thesaurus entries

Related Work
* Ontology composition (Wiederhold 1994)
* Rule-based approach to semantic integration (Bergamaschi et al. 1999)
* Semantic reconciliation (Siegel 1991)
* Uschold et al. 1998
* Specification morphisms (Smith 1993)
* WordNet system (Miller et al. 1990)
* WHIRL (Cohen 1998)
* PageRank (Page & Brin 1998)
* Latent semantic indexing (Deerwester 1990)
* Hypertext authority (Kleinberg 1998)

Outline

* Algebra Usage Scenario
* Background
* Context Creation
* Ontology Maintenance
* Future Work
* Conclusion
* My Evaluation

Typical Algebra Usage Scenario

A minimal, sufficient set of linkages between items in different resources

Background

Algebraic Operators

Canonical unary operators establish and refine a context within which the source knowledge meets the application's requirements

Background(Cont.)

Semantic Context
* No global notion of consistency
* Defined as objects that encapsulate other objects
* Congruity: relevance of source information to the application
* Similarity: equivalent and mergeable objects between different sources
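A small data-structure sketch of a semantic context as an object that encapsulates other objects, with congruity as a relevance filter and similarity as a match between two sources. The classes `Obj` and `SemanticContext`, the `head`/`def` attribute names and the use of plain key equality for similarity are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Obj:
    """A source object: an identity plus attributes (hypothetical model)."""
    oid: str
    attrs: Dict[str, str]


@dataclass
class SemanticContext:
    """A context is itself an object that encapsulates other objects."""
    oid: str
    members: List[Obj] = field(default_factory=list)


def congruent(ctx: SemanticContext, relevant: Callable[[Obj], bool]) -> SemanticContext:
    """Congruity: keep only the objects relevant to the application."""
    return SemanticContext(ctx.oid + "/congruent", [o for o in ctx.members if relevant(o)])


def similar_pairs(a: SemanticContext, b: SemanticContext, key: str) -> List[Tuple[str, str]]:
    """Similarity: pair objects of two sources that agree on one attribute (merge candidates)."""
    index = {o.attrs[key]: o.oid for o in b.members if key in o.attrs}
    return [(o.oid, index[o.attrs[key]]) for o in a.members
            if key in o.attrs and o.attrs[key] in index]


webster = SemanticContext("webster", [Obj("w1", {"head": "egoism", "def": "..."}),
                                      Obj("w2", {"head": "etc", "def": ""})])
other = SemanticContext("other", [Obj("n7", {"head": "egoism"})])

relevant = congruent(webster, lambda o: bool(o.attrs.get("def")))
print([o.oid for o in relevant.members])       # ['w1']
print(similar_pairs(relevant, other, "head"))  # [('w1', 'n7')]
```

Both checks look only at the contexts passed in, in keeping with the slide's point that there is no global notion of consistency.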

Rule Language(Cont.)

Allows uninterpreted components of an object to become attributes of the object
* Constructors: create new objects
* Constructors: generate proxy objects
* Editors and converters: modify the objects
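A sketch of the three kinds of rules, assuming objects are stored as attribute dictionaries keyed by identity and assuming a toy entry layout `Head (pos) definition`. The real dictionary markup and rule syntax are not shown on the slides, and every name here is hypothetical. The converter is the step that turns an uninterpreted text component into attributes.

```python
import re
from typing import Dict, Optional

# Hypothetical object store: object identity -> attribute dictionary.
Store = Dict[str, Dict[str, str]]

# Toy entry layout "Head (pos) definition"; the real dictionary markup differs.
ENTRY = re.compile(r"^(?P<head>[^(]+?)\s*\((?P<pos>[^)]*)\)\s*(?P<definition>.*)$")


def construct(store: Store, oid: str, raw: str) -> str:
    """Constructor: create a new object holding one uninterpreted text component."""
    store[oid] = {"raw": raw}
    return oid


def make_proxy(store: Store, oid: str) -> str:
    """Constructor: generate a proxy object that refers to the original by identity."""
    pid = oid + "#proxy"
    store[pid] = {"ref": oid}
    return pid


def convert(store: Store, oid: str) -> Optional[Dict[str, str]]:
    """Converter: turn the uninterpreted 'raw' component into named attributes."""
    match = ENTRY.match(store[oid]["raw"])
    if match is None:
        return None  # leave objects that do not fit the layout untouched
    store[oid].update(match.groupdict())
    return store[oid]


store: Store = {}
construct(store, "e1", "Egoism (n.) The doctrine that ...")
pid = make_proxy(store, "e1")
convert(store, "e1")
print(store["e1"]["head"], store["e1"]["pos"])  # Egoism n.
print(store[pid])                               # {'ref': 'e1'}
```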

Object Model(Cont.)

Subsumes existing models

Only objects have an identity to which others can refer

Corresponds to XML supplemented with object identity

Rich enough to model complex relationships
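Since the object model corresponds to XML supplemented with object identity, one way to picture it is XML in which elements carrying an `id` attribute are objects and `ref` attributes point at them. The `id`/`ref` convention and the toy document are assumptions; the sketch uses only the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Toy document: elements carrying an "id" are objects; other elements are plain values.
DOC = """
<dictionary id="webster">
  <entry id="e1"><head>egoism</head><def>the doctrine that ...</def></entry>
  <entry id="e2"><head>egotism</head><see ref="e1"/></entry>
</dictionary>
"""


def index_objects(root: ET.Element) -> dict:
    """Map every object identity to its element; only these can be referenced."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve_refs(root: ET.Element) -> list:
    """Follow each ref attribute to the object it names (KeyError on a dangling ref)."""
    objects = index_objects(root)
    return [(el.tag, objects[el.get("ref")].findtext("head"))
            for el in root.iter() if el.get("ref") is not None]


root = ET.fromstring(DOC.strip())
print(sorted(index_objects(root)))  # ['e1', 'e2', 'webster']
print(resolve_refs(root))           # [('see', 'egoism')]
```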

Context Creation

Summarize Operator (S operator)

Transforms source data based on a predicate

Creates an object: encapsulates and populates it

Data classification: groups the source into equivalence classes

Syntax: (given contexts c1, c2 and a matching rule e)

Context Creation(Cont.)

1. Predicate e partitions the objects of c1 into n equivalence classes
2. c2 consists of n+1 values: e, s1, s2, …, sn
3. One class is an exception class holding the objects that do not match e
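A minimal sketch of the Summarize step as described above, assuming the matching rule e is a function that returns a class label, or `None` for objects that do not match. The dictionary layout of the result, the label `"exception"` and the toy rule `tag_of` are hypothetical; the slides do not give the operator's concrete syntax.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional

EXCEPTION = "exception"  # hypothetical label for the exception class


def summarize(c1: Iterable[str], e: Callable[[str], Optional[str]]) -> Dict[str, object]:
    """Sketch of c2 = S(c1) under matching rule e.

    e maps each object to a class label, or to None when the object does not
    match; non-matching objects are collected in the exception class.
    """
    classes: Dict[str, List[str]] = {}
    for obj in c1:
        label = e(obj)
        classes.setdefault(EXCEPTION if label is None else label, []).append(obj)
    # The new context holds the rule itself plus the populated classes s1, ..., sn.
    return {"rule": e, "classes": classes}


def tag_of(line: str) -> Optional[str]:
    """Toy matching rule: classify a line by its leading markup tag."""
    m = re.match(r"<(\w+)>", line)
    return m.group(1) if m else None


c2 = summarize(["<hw>Egoism</hw>", "<def>The doctrine ...</def>", "garbage"], tag_of)
print({k: len(v) for k, v in c2["classes"].items()})  # {'hw': 1, 'def': 1, 'exception': 1}
```

Keeping the rule inside the result mirrors the n+1 values listed above: the rule e plus the n classes.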

Example with Webster’s Dictionary

Automatic Thesaurus Extraction from Dictionary

Example(Cont.)

Construct a directed graph from the definitions:
1. Each head word and its definition grouping is a node
2. Each word in a definition node is an arc to the node having that head word
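The two construction steps can be sketched as an adjacency map, assuming entries are already available as a head-word-to-definition dictionary and that words are split on letters only. The function name and toy entries are hypothetical, and repeated words collapse into a single arc here, which may differ from the original system.

```python
import re
from typing import Dict, Set


def definition_graph(entries: Dict[str, str]) -> Dict[str, Set[str]]:
    """One node per head word; an arc head -> w for every definition word w
    that is itself a head word. Repeated words collapse into a single arc."""
    heads = {h.lower() for h in entries}
    graph: Dict[str, Set[str]] = {}
    for head, definition in entries.items():
        words = re.findall(r"[a-z]+", definition.lower())
        graph[head.lower()] = {w for w in words if w in heads}
    return graph


toy = {
    "Egoism": "The doctrine that self interest is the motive of action.",
    "Doctrine": "That which is taught.",
    "Action": "The doing of something.",
}
print(definition_graph(toy))
```

Words such as "taught" have no node to point to, so they are silently dropped; irregular forms like this are among the conversion problems that the following slide lists.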

Definition from the dictionary data for Egoism

Context Creation(Cont.)

* Syllable and accent markers in head words
* Misspelled head words
* Mis-tagged fields
* Stemming and irregular verbs (Hopelessness)
* Common abbreviations in definitions (etc.)
* Undefined words with common prefixes (un-)
* Multi-word head words (Water Buffalo)
* Undefined hyphenated and compound words

Set a target of 99% accuracy for the conversion from the data to the graph structure
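A few of the listed problems can be handled by small normalization rules. This is a sketch only: the markers `*` and `"` for syllables and accents are assumed, the irregular-verb table and suffix list are hypothetical, and naive suffix stripping will produce wrong stems for some words. Multi-word, hyphenated and compound head words are not handled.

```python
import unicodedata
from typing import Optional, Set

# Hypothetical rule tables; the slides list the problems but not the rules themselves.
IRREGULAR = {"taught": "teach", "went": "go"}
SUFFIXES = ("ness", "less", "ing", "ed", "s")


def clean_head(raw: str) -> str:
    """Drop assumed syllable/accent markers ('*' and quote characters) and diacritics."""
    text = raw.replace("*", "").replace('"', "").replace("'", "")
    text = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in text if not unicodedata.combining(ch)).lower()


def resolve(word: str, heads: Set[str]) -> Optional[str]:
    """Map a definition word to an existing head word, or None if no rule applies."""
    w = IRREGULAR.get(word.lower(), word.lower())
    if w in heads:
        return w
    # Strip suffixes repeatedly, e.g. hopelessness -> hopeless -> hope.
    stem, changed = w, True
    while changed and stem not in heads:
        changed = False
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix) + 2:
                stem, changed = stem[: -len(suffix)], True
                break
    if stem in heads:
        return stem
    # Undefined word with a common prefix, e.g. unkind -> kind.
    if w.startswith("un") and w[2:] in heads:
        return w[2:]
    return None


heads = {"hope", "teach", "kind"}
print(clean_head('E"go*ism'))          # egoism
print(resolve("Hopelessness", heads))  # hope
print(resolve("taught", heads))        # teach
print(resolve("unkind", heads))        # kind
```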

Constructing the Congruity Expression

An object that represents the entire source, subdivided into chunks of:
* One head word
* One definition

Expresses the congruity relationship between the dictionary and the thesaurus application
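One possible reading of this congruity expression, sketched below: the whole-source object is cut into chunks that each pair one head word with one definition, and entries that cannot be cut that way fall outside the congruent part. The `<hw>`/`<def>` tags, blank-line entry separators and function name are assumptions, not the dictionary's actual markup.

```python
import re
from typing import List, Tuple

HW = re.compile(r"<hw>(.*?)</hw>")     # assumed head-word tag
DEF = re.compile(r"<def>(.*?)</def>")  # assumed definition tag


def chunk_source(source: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split the whole-source object into (head word, definition) chunks.

    Entries (assumed to be separated by blank lines) with exactly one head word
    and at least one definition yield one chunk per definition; anything else
    falls outside the congruent part and is returned for inspection.
    """
    chunks: List[Tuple[str, str]] = []
    rest: List[str] = []
    for entry in source.split("\n\n"):
        heads, defs = HW.findall(entry), DEF.findall(entry)
        if len(heads) == 1 and defs:
            chunks.extend((heads[0], d) for d in defs)
        elif entry.strip():
            rest.append(entry)
    return chunks, rest


SOURCE = ("<hw>Egoism</hw><def>The doctrine ...</def>\n\n"
          "<hw>Ego</hw><def>sense 1</def><def>sense 2</def>\n\n"
          "<hw>Stub</hw>")
chunks, rest = chunk_source(SOURCE)
print(chunks)  # [('Egoism', 'The doctrine ...'), ('Ego', 'sense 1'), ('Ego', 'sense 2')]
print(rest)    # ['<hw>Stub</hw>']
```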

Ontology Maintenance

Context Refinement

Return the ten longest head words of the dictionary
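The refinement query above is simple to express over a set of head words. A sketch using the standard `heapq.nsmallest`, with a hypothetical function name and toy data; ties are broken alphabetically, which is an arbitrary choice.

```python
import heapq
from typing import Iterable, List


def longest_head_words(heads: Iterable[str], k: int = 10) -> List[str]:
    """Return the k longest distinct head words, longest first, ties in alphabetical order."""
    return heapq.nsmallest(k, set(heads), key=lambda h: (-len(h), h))


heads = ["ego", "egoism", "water buffalo", "hopelessness", "a"]
print(longest_head_words(heads, 3))  # ['water buffalo', 'hopelessness', 'egoism']
```

Unusually long head words are one plausible place to look for conversion errors such as fused entries, though the slide itself does not say why this query was chosen.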

Maintaining the Ontology

Changes in the source help correct and extend the dictionary

Maintain statistics with the S operator when extracting the relevant parts of the dictionary

Find and note the rules that are no longer needed

A comparison of the terms reveals new errors
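A sketch of the bookkeeping this slide describes: count how often each rule fires while processing a release, report rules that never fired, and compare term sets between releases. The first-match rule policy, the rule names and the toy data are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Rule = Callable[[str], Optional[str]]  # a rule returns a rewrite, or None if it does not fire


def apply_rules(words: Iterable[str], rules: Dict[str, Rule]) -> Tuple[Dict[str, str], Counter]:
    """Apply the first rule that fires to each word and count firings per rule."""
    fired: Counter = Counter({name: 0 for name in rules})
    rewritten: Dict[str, str] = {}
    for word in words:
        for name, rule in rules.items():
            result = rule(word)
            if result is not None:
                rewritten[word] = result
                fired[name] += 1
                break
    return rewritten, fired


def unused_rules(fired: Counter) -> List[str]:
    """Rules that never fired on this release are candidates for removal."""
    return sorted(name for name, count in fired.items() if count == 0)


def new_terms(old: Set[str], new: Set[str]) -> Set[str]:
    """Terms that appear only in the newer release, to be checked for new errors."""
    return new - old


rules: Dict[str, Rule] = {
    "strip_un": lambda w: w[2:] if w.startswith("un") else None,
    "went_to_go": lambda w: "go" if w == "went" else None,
}
rewritten, fired = apply_rules(["unkind", "undone", "hope"], rules)
print(rewritten)            # {'unkind': 'kind', 'undone': 'done'}
print(unused_rules(fired))  # ['went_to_go']
print(new_terms({"ego", "egoism"}, {"ego", "egoism", "egoismm"}))  # {'egoismm'}
```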

Future Work

A web-based interface to display the results of the ArcRank algorithm, which is based on PageRank

(http://www-db.stanford.edu/SKC)
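ArcRank itself is not specified on these slides, so the sketch below shows only the standard PageRank power iteration it is said to build on, run over a small definition graph given as a map from node to successor set. The damping factor 0.85 and the fixed iteration count are conventional choices, not values from the paper, and the graph is a toy.

```python
from typing import Dict, Set


def pagerank(graph: Dict[str, Set[str]], d: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; nodes without out-arcs spread their rank uniformly."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for src, targets in graph.items():
            for t in targets:
                new[t] += d * rank[src] / len(targets)
        rank = new
    return rank


graph = {"egoism": {"doctrine", "action"}, "doctrine": {"action"}, "action": set()}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # action
```

In a definition graph, words that many definitions point to accumulate rank, which is why a PageRank-style score is a natural starting point for ranking related terms.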

Conclusion

An on-line dictionary is a good test-bed

An algebraic approach improves maintainability

Congruity simplifies the identification and handling of changes

Use Summarize to define and refine a context that prepares the dictionary data for thesaurus service use

My Assessment

Strengths
* Decouples the selection of congruent parts of the source data
* Congruity and similarity measures use an algebra rather than a single language
* Mirrors classes using operators of the algebra instead of low-level abstract primitives that are difficult to compose

Weaknesses
* Details of ci’ = S(ci) are needed
* Difficult to grasp the capability of the S operator
* Accuracy and error accumulation problem
* Ambiguous rule generation

Questions?