august 2000 gio vdr 1 increasing the precision of semantic interoperation gio wiederhold stanford...
Post on 21-Dec-2015
213 views
TRANSCRIPT
August 2000 Gio vdR 1
Increasing the Precision of
Semantic Interoperation
Gio Wiederhold
Stanford University
August 2000
Reind van de Riet celebration
Thanks to Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker.
August 2000 Gio vdR 3
Achievements• Computer Science-based Formalization
• Effective Dissemination - students and writings
• Powerful Database Technology
• Language-sensitive Methods
• Ubiquitous Databases
• World-wide interconnectivity
• All the Information one might want
somewhere . . . .
August 2000 Gio vdR
Information overload Data starvation
• More databases– public & corporate
• Faster communication– digital– packeting: TCP-IP, ATM
• World-wide connectivity– internet– world-wide web
• Disintermediation– ubiquitous publishing
August 2000 Gio vdR
Data and Knowledge
Information is created at theconfluence ofdata -- the state & knowledge -- the ability to select and project the state into the future
Knowledge LoopKnowledge Loop
ExperienceExperience
ActionAction
Data LoopData Loop
StorageStorageEducationEducation
SelectionSelection
IntegrationIntegration
AbstractionAbstraction
Decision-makingDecision-making
State changesState changes
RecordingRecording
August 2000 Gio vdR 6
Language issues
• Our languages are specialized for our needs– efficiency of expression: minimal symbols
Examples: • carpenter versus householder domain• surgeon versus pathologist
• World-wide communication changes ranges– geographical locality is less relevant– conceptual locality is important
• Business requires precision, including that words map consistently to instances
knowledge controls use of data
August 2000 Gio vdR 7
Transform Data to Information
data and simulation resources
value-added services
decision-makers at workstationsApplication Layer
Mediation Layer
Foundation Layer
August 2000 Gio vdR 8
Heterogeneity among Domains
If interoperation involves distinct
domains mismatch ensues• Autonomy conflicts with consistency,
– Local Needs have Priority,– Outside uses are a Byproduct
Heterogeneity must be addressed• Platform and Operating Systems • Representation and Access Conventions • Naming and Ontology
August 2000 Gio vdR 9
Semantic Mismatches
Information comes from many autonomous sources• Differing viewpoints (by source)
– differing terms for similar items { lorry, truck }
– same terms for dissimilar items trunk(luggage, car)
– differing coverage vehicles (DMV, AIA)
– differing granularity trucks (shipper, manuf.)
– different scope student museum fee, Stanford
• Hinders use of information from disjoint sources – missed linkages loss of information, opportunities– irrelevant linkages overload on user or application
program
• Poor precision when merged
ok for web browsing , poor for business
August 2000 Gio vdR 10
Need for precision
adapted from Warren Powell, Princeton Un.
data
err
ors
information quantity
human lim
it
acceptable limit
hum
an w
ith to
ols?
Information Wall
More precision is needed as data volume increases--- a small error rate still leads to too many errors False Positives have to be investigated ( attractive-looking supplier - makes toys apparent drug-target with poor annotation )
False Negatives cause lost opportunities, suboptimal to some degree
False positives = poor precision typically cost more thanfalse negatives = poor recall
August 2000 Gio vdR 11
Means to achieve precision
• Reduce redundancy– omit similar results from alternate sources
reports, workshop papers, journals, books
• Reduce false positives– recognize contextual domains *
• the same word refers to different object types nail (carpentry, anatomy), miter (carpentry, religion)
• Abstract findings to higher levels– Linguistic processing based on customer model
medical case studies have similar formats
August 2000 Gio vdR 12
Proposed Language Solutions
Specify and define terminology usage: ontology
• Domain-specific ontologies XML DTD assumption– Small, focused, cooperating groups– high quality, some examples - genomics, arthritis, Shakespeare plays
– allows sharable, formal tools – ongoing, local maintenance affecting users - annual updates
– poor interoperation, users still face inter-domain mismatches
• Cannot achieve globally consistency – wonderful for users and their programs– too many interacting sources– long time to achieve, 2 sources (UAL, BA), 3 (+ trucks), 4, … all ? – costly maintenance, since all sources evolve – no world-wide authority to dictate conformance
August 2000 Gio vdR 13
Domains and Consistency .
• a domain will contain many objects• the object configuration is consistent• within a domain all terms are consistent &• relationships among objects are consistent
• context is implicit
No committee is needed to forge compromises * within a domain
Compromises hide valuable details
Domain Ontology
August 2000 Gio vdR 14
Many ontologies Interoperation
Have devolved maintenance onto many domain-specific experts / authorities
• Many applications span multiple domains
• Need an algebra to compute composed ontologies that are limited to their articulation terms
• enable interpretation within the source contexts
SKC
August 2000 Gio vdR 15
Sample Operation: INTERSECTION
Source Domain 1:Owned and maintained by Store
Result contains shared terms,useful for purchasing
Source Domain 2:Owned and maintainedby Factory
Articulation
August 2000 Gio vdR 16
Tools to create articulations
Combine ontology graphs with expert selection based on spelling, graph matching, and a nexus derived from a dictionary (O.E.D.)
Veh
icle
reg
istr
atio
n
on
tolo
gy
Veh
icle
sale
s o
nto
log
y
Suggestionsfor articulations
August 2000 Gio vdR 17
An Ontology Algebra
A knowledge-based algebra for ontologies
The Articulation Ontology (AO) consists of matching rules that link domain ontologies
Intersection create a subset ontology keep sharable entries
Union create a joint ontology merge entries
Difference create a distinct ontology remove shared entries
August 2000 Gio vdR 18
INTERSECTION support
Store Ontology
Articulation ontology
Matching rules that use terms from the 2 source domains
Factory Ontology
Terms usefulfor purchasing
August 2000 Gio vdR 19
Other Basic Operations
typically priorintersections
UNION: mergingentire ontologies
DIFFERENCE: materialfully under local control
Arti-culation ontology
August 2000 Gio vdR 20
Features of an algebra
Operations can be composed
Operations can be rearranged
Alternate arrangements can be evaluated
Optimization is enabled
The record of past operations can be
kept and reused when sources change
August 2000 Gio vdR 21
Knowledge CompositionArticulationknowledgefor U
U
U
(A B)U
(B C)U
(C E)
Knowledge resource
B
Knowledge resource
A
Knowledge resource
C Knowledge
resourceD
U
(C D)
U
(B C)
Articulationknowledge
Composed knowledge forapplications using A,B,C,E
Knowledge resource
E
U
(C E)
Legend:
U : unionU: intersection
Articulationknowledgefor (A B)
U
August 2000 Gio vdR 22
Domain Specialization .• Knowledge Acquisition (20% effort) &• Knowledge Maintenance (80% effort *)
to be performed• Domain specialists• Professional organizations• Field teams of modest size
Empowermentautonomouslymaintainable
* based on experience with software
August 2000 Gio vdR 23
Summary
To sustain the growth of web usage1. The value of the results has to keep increasing
precision, relevance not volume2. Value is provided by mediating experts,
encoded as models of diverse resources, diverse customersProblems being addressed redundancy mismatches quality maintenance
Need research to develop tools for these tasks
} Clear models
August 2000 Gio vdR 24
Integration Science ?
IntegrationScience
IntegrationScience
ArtificialIntelligence
knowledge mgmtlinguistic models
uncertainty
ArtificialIntelligence
knowledge mgmtlinguistic models
uncertainty
Systems Engineering
analysisdocumentation
costing
Systems Engineering
analysisdocumentation
costing
Databasesaccessstoragealgebras
scalability
Databasesaccessstoragealgebras
scalability
August 2000 Gio vdR 26
Sample Processing in HPKB
• What is the most recent year an OPEC member nation was on the UN security council?– Related to DARPA HPKB
Challenge Problem– SKC resolves 3 Sources
• CIA Factbook ‘96 (nation)
• OPEC (members, dates)
• UN (SC members, years)
– SKC obtains the Correct Answer
• 1996 (Indonesia)
– Other groups obtained more,
but factually wrong answers
– Problems resolved by SKC* Factbook has out of date
OPEC & UN SC lists
– Indonesia not listed
– Gabon (left OPEC 1994)
* different country names
– Gambia => The Gambia
* historical country names
– Yugoslavia
• UN lists future security council members
– Gabon 1999
• intent of original question
– Temporal variants
August 2000 Gio vdR 27
Tools to create articulations
Combine ontology graphs with expert selection based on spelling, graph matching, and a nexus derived from a dictionary (O.E.D.)
Veh
icle
reg
istr
atio
n
on
tolo
gy
Veh
icle
sale
s o
nto
log
y
Suggestionsfor articulations
August 2000 Gio vdR 28
Tools to create articulations
Graph matcherforArticulation- creatingExpert
Vehicle ontology
Transport ontology
Suggestionsfor articulations
August 2000 Gio vdR 29
continue from initial point
Also suggest similar terms for further articulation:
• by spelling similarity,• by graph position• by term match repository
Expert response:1. Okay2. False3. Irrelevant to this articulation
All results are recorded
Okay’s are converted into articulation rules
August 2000 Gio vdR 30
Candidate Match Repository
Term linkages automatically extracted from 1912 Webster’s dictionary *
* free, other sources .have been processed.
Based on processing headwords definitions using algebra primitives
Notice presence of 2 domains: chemistry, transport
August 2000 Gio vdR 33
Primitive Operations
Unary• Summarize -- structure up• Glossarize - list terms• Filter - reduce instances• Extract - circumscription
Binary • Match - data corrobaration• Difference - distance
measure• Intersect - schem discovery• Blend - schema extension
Constructors• create object• create setConnectors• match object• match setEditors• insert value• edit value• move value• delete valueConverters• object - value• object indirection• reference indirection
Model and Instance