Post on 22-Dec-2015
Ontology Maintenance with an Algebraic Methodology: a Case Study
Jan Jannink, Gio Wiederhold
Presented by: Lei Lei
Challenges
Obstacle: Autonomy of diverse knowledge sources
Data volatility and volume increase cost
Major challenge: establish and maintain the application-specific portion of the knowledge sources
An Algebraic Approach
Construct virtual knowledge bases geared to a specific application
Use composable operators to transform contexts into contexts
Operators express relevant parts of a source and the conditions using rules
Rules define a valid context transformation
On-line Dictionary: Webster
Autonomously maintained; used to develop a novel thesaurus application
120,000 entries, two million words
Semi-annual updates
Errors and inconsistencies help test robustness
Target Application
Construct a graph of the definitions to determine related terms, and automatically generate thesaurus entries
Related Work
Ontology composition (Wiederhold 1994)
Rule-based approach to semantic integration (Bergamaschi et al. 1999)
Semantic reconciliation (Siegel 1991)
Uschold et al. 1998
Specification morphisms (Smith 1993)
WordNet system (Miller & al. 1990)
WHIRL (Cohen 1998)
PageRank (Page & Brin 1998)
Latent semantic indexing (Deerwester 1990)
Hypertext authority (Kleinberg 1998)
Outline
Algebra Usage Scenario
Background
Context Creation
Ontology Maintenance
Future Work
Conclusion
My Evaluation
Typical Algebra Usage Scenario
A minimal sufficient set of linkages between items in different resources
Background
Algebraic Operators
Canonical unary operators establish and refine a context within which the source knowledge meets the application requirements
Background (Cont.)
Semantic Context
* No global notion of consistency
* Defined as objects that encapsulate other objects
* Congruity: relevance of source information to the application
* Similarity: equivalent and mergeable objects between different sources
Rule Language (Cont.)
Allow uninterpreted components of an object to become attributes of the object
Constructors: create new objects; generate proxy objects
Editors & converters: modify the objects
Object Model (Cont.)
Subsumes existing models
Only objects have an identity to which others can refer
Corresponds to XML supplemented with object identity
Rich enough to model complex relationships
Context Creation
Summarize Operator (S operator)
Transforms source data based on a predicate
Create object: encapsulates & populates
Data classification: groups source into equivalence classes
Syntax: (given contexts c1, c2 and a matching rule e)
Context Creation (Cont.)
1. Predicate e partitions the objects of c1 into n equivalence classes
2. c2 consists of n+1 values: e, s1, s2, …, sn
3. One of these is an exception class for objects that do not match e
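The partitioning behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `summarize`, the rule `tag_rule`, the `<hw>`/`<def>` tags and the sample lines are all hypothetical. The matching rule is modelled as a function that returns a class label, or `None` when an object does not match.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def summarize(
    objects: Iterable[str],
    rule: Callable[[str], Optional[Hashable]],
) -> Tuple[Dict[Hashable, List[str]], List[str]]:
    """Partition objects into classes keyed by rule(obj).

    Objects for which the rule returns None go to the exception class.
    (Hypothetical sketch of a Summarize-like operator.)
    """
    classes: Dict[Hashable, List[str]] = defaultdict(list)
    exceptions: List[str] = []
    for obj in objects:
        key = rule(obj)
        if key is None:
            exceptions.append(obj)      # no match: exception class
        else:
            classes[key].append(obj)    # match: one of the n classes
    return dict(classes), exceptions


def tag_rule(line: str) -> Optional[str]:
    """Hypothetical matching rule: classify a raw line by its leading tag."""
    if line.startswith("<hw>"):
        return "head_word"
    if line.startswith("<def>"):
        return "definition"
    return None


# Invented sample context c1 (not real dictionary data).
c1 = ["<hw>apple", "<def>a round fruit", "??garbled line", "<hw>pear"]
classes, exceptions = summarize(c1, tag_rule)
```

Here the resulting context holds n = 2 matched classes plus the exception class, which gives the n+1 values mentioned on the slide.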
Example (Cont.)
Construct a directed graph from the definitions:
1. Each head word and definition grouping is a node
2. Each word in a definition node is an arc to the node having that head word
Definition from the dictionary data for Egoism
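The two graph-construction rules can be illustrated with a small Python sketch. It assumes the dictionary has already been reduced to a mapping from lower-case head word to definition text; the function name `build_graph` and the toy entries are invented for illustration and are not taken from Webster.

```python
import re
from typing import Dict, Set


def build_graph(entries: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build a directed graph: one node per head word, and one arc from
    a head word to every head word that occurs in its definition."""
    graph: Dict[str, Set[str]] = {head: set() for head in entries}
    for head, definition in entries.items():
        # Split the definition into lower-case alphabetic tokens.
        for word in re.findall(r"[a-z]+", definition.lower()):
            if word in graph:           # only words that are head words
                graph[head].add(word)
    return graph


# Toy entries with invented definitions.
entries = {
    "egoism": "excessive love of self",
    "self": "one's own person",
    "love": "strong affection",
}
graph = build_graph(entries)
```

Words in a definition that are not head words ("excessive", "of") produce no arc in this sketch; the issues listed on the next slide are about reducing how often that happens.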
Context Creation (Cont.)
* Syllable and accent markers in head words
* Misspelled head words
* Mis-tagged fields
* Stemming and irregular verbs (Hopelessness)
* Common abbreviations in definitions (etc.)
* Undefined words with common prefixes (un-)
* Multi-word head words (Water Buffalo)
* Undefined hyphenated and compound words
Set a 99% accuracy target for the conversion from data to graph structure
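A few of these issues lend themselves to simple rules. The following Python sketch shows one possible handling of markers, prefixes and suffixes; the marker characters, the affix lists and both function names are assumptions made for illustration, since the slides do not give the actual rules.

```python
from typing import Optional, Set

# All three constants are hypothetical; the real marker characters and
# affix lists used for the dictionary are not given in the slides.
MARKERS = '*"`'
PREFIXES = ("un",)
SUFFIXES = ("ness", "less", "s")


def normalize_head_word(raw: str) -> str:
    """Drop syllable/accent marker characters and lower-case the word."""
    return "".join(ch for ch in raw if ch not in MARKERS).lower()


def resolve(word: str, head_words: Set[str]) -> Optional[str]:
    """Map a definition word to a head word, or None if none is found.

    Tries an exact match, then strips one known prefix, then one suffix.
    """
    word = word.lower()
    if word in head_words:
        return word
    for prefix in PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in head_words:
            return word[len(prefix):]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[:-len(suffix)] in head_words:
            return word[:-len(suffix)]
    return None


head_words = {"hopeless", "defined", "water buffalo"}
```

Words that still resolve to `None` would count against the accuracy target, so tracking them is one way to see which rule is missing.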
Constructing the Congruity Expression
An object that represents the entire source
Subdivided into chunks
One head word
One definition
Expresses the congruity relationship between the dictionary and the thesaurus application
Maintaining the Ontology
Changes in the source help correct and extend the dictionary
Maintain statistics with the S operator when extracting the relevant parts of the dictionary
Find and note which rules are no longer needed
A comparison of the terms reveals new errors
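One way to picture the statistics step is to count how often each rule matches on a new release of the source, and to flag rules that never match. The Python sketch below assumes rules are named predicates over words; the rule names, the sample words and both helper functions are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def rule_hits(
    rules: Dict[str, Callable[[str], bool]],
    objects: Iterable[str],
) -> Counter:
    """Count, per rule, how many objects the rule matches."""
    hits = Counter({name: 0 for name in rules})   # keep zero counts visible
    for obj in objects:
        for name, predicate in rules.items():
            if predicate(obj):
                hits[name] += 1
    return hits


def unused_rules(hits: Counter) -> List[str]:
    """Rules that matched nothing: candidates for removal."""
    return sorted(name for name, count in hits.items() if count == 0)


# Hypothetical rules and two invented releases of the source.
rules = {
    "strip_accent": lambda w: "`" in w,
    "split_hyphen": lambda w: "-" in w,
}
old_release = ["e`go", "well-being"]
new_release = ["ego", "well-being"]

old_hits = rule_hits(rules, old_release)
new_hits = rule_hits(rules, new_release)
```

Comparing `old_hits` with `new_hits` shows that `strip_accent` no longer matches anything in the newer release, which is the kind of signal the slide describes.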
Future Work
A web-based interface for display
ArcRank algorithm based on PageRank
(http://www-db.stanford.edu/SKC)
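The slides do not define ArcRank, so the sketch below shows only plain PageRank, the algorithm it is said to build on, computed by power iteration over a graph of head words. The function name, the damping factor of 0.85, the iteration count and the toy graph are assumptions for illustration.

```python
from typing import Dict, Set


def pagerank(
    graph: Dict[str, Set[str]],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Plain PageRank by power iteration (not ArcRank).

    Assumes every arc target is also a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing arcs is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                new_rank[target] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


# Toy graph: "c" points to "a"; "a" and "b" point to each other.
toy_graph = {"a": {"b"}, "b": {"a"}, "c": {"a"}}
ranks = pagerank(toy_graph)
```

In this toy graph "a" receives rank from both other nodes and ends up highest, while "c", which nothing points to, ends up lowest.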
Conclusion
An on-line dictionary is a good test bed
An algebraic approach improves maintainability
Congruity simplifies identification and handling of changes
Use Summarize to define and refine a context that prepares the dictionary data for thesaurus service use
My Assessment
Strength
* Decouples the selection of congruent parts of the source data
* Congruity and similarity measures use an algebra rather than a single language
* Mirror classes using operators of the algebra instead of low-level abstract primitives that are difficult to compose
Weakness
* Details of ci’=S(ci) are needed
* Difficult to grasp the capability of the S operator
* Accuracy and error accumulation problem
* Ambiguous rule generation