IBM Research
SHER: Scalable Highly Expressive Reasoner 04/18/23 © 2007 IBM Corporation
SHER: A Scalable Highly Expressive Reasoner and its Applications.
J. Dolby, A. Fokoue, A. Kalyanpur, A. Kershenbaum, L. Ma, E. Schonberg, K. Srinivas
IBM Research
Outline
Background and motivation
Core SHER technical innovations
–Scalability via Summarization
–Refinement: Resolving Inconsistencies in a Summary
–Integration with Incomplete Reasoners
–Conjunctive Query Evaluation
SHER concrete applications
–Automated Clinical trials matching using ontologies
–Anatomy Lens: Semantic search over PubMed
–Scalable Text Analytics cleanup
Conclusion
Project background and Motivation
Emergence of OWL as a standardized language for expressing semantic relations in ontologies.
– 2004: OWL becomes a W3C Recommendation
Emergence of standardized ontologies encoded in OWL, especially in healthcare, life sciences:
–Biopax
–SNOMED
–About 80 ontologies at OBO (e.g. GO, FMA)
Emerging use of ontologies in search and retrieval of structured and unstructured data.
Vision: Semantic Information Retrieval
[Diagram: Unstructured data ("John Smith visited the Louvre"), legacy data (DB2), and an ontology definition ("Louvre located in Paris…") are combined into a homogeneous view over an RDF store. A Semantic Information Retrieval System answers the query "Show me all people who visited France?" with: John Smith!]
Problems
Computational complexity of reasoning
– Intractable in the worst case.
– In 2005, intractable in practice on large and expressive KBs
Imprecision/inconsistencies in ontologies
– Reasoners are unable to scale consistency checking
Query answering in expressive ontologies
Dealing with complexity challenges

Reducing the expressivity of DL languages
– Why?
  • 80/20 rule: ~80% of use cases covered by 20% of the language constructs
  • Tractability
  • Ease of implementation
– Results of this line of research:
  • DL-Lite family (Diego Calvanese et al.)
    – Covers ER, UML
    – LogSpace complexity
    – Easy, scalable implementation on top of a relational DBMS
  • EL++ (Franz Baader et al.)
    – Covers most life-science ontologies
    – Polynomial-time complexity (satisfiability, subsumption, and instance checking)
    – Simple rule-style implementation
  • OWL 2.0 profiles

Approximate reasoning
– Screech OWL reasoner (Pascal Hitzler et al.)
SHER – A Highly Scalable SOUND and COMPLETE Reasoner for large OWL-DL KBs
Reasons over highly expressive ontologies
Reasons over data in relational databases
No inferencing on load
– hence deals better with fast changing data
– the downside: reasoning is performed at query time.
Highly scalable -- reasons on 7.7M records in 7.9 s.
– State of the art cannot run on more than 1 million records on a 64 bit dual processor machine with 4G heap.
– Can scale to more than 60 million triples
– Semantically index 300 million triples from the medical literature.
Tolerates inconsistencies
Provides explanations
Outline
Background and motivation
Core SHER technical innovations
–Scalability via summarization
–Refinement: Resolving inconsistencies in a summary
–Integration with incomplete reasoners
–Conjunctive Query Evaluation
SHER concrete applications
–Automated Clinical trials matching using ontologies
–Anatomy Lens: Semantic search over PubMed
–Scalable Text Analytics cleanup
Conclusion and future work
Scalability via Summarization (ISWC 2006)
[Diagram: Original ABox: courses C1, C2 are taught (isTaughtBy) by men M1, M2, who like (likes) hobbies H1, H2; persons P1, P2. Summary ABox: single nodes C′, M′, H′, P′ connected by the same edge labels, where C′ = {C1, C2}. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby.]
The summary mapping function f satisfies the constraints:
– If an individual a is an explicit member of a concept C in the original ABox, then f(a) is an explicit member of C in the summary ABox.
– If a ≠ b is explicitly in the original ABox, then f(a) ≠ f(b) is explicitly in the summary ABox.
– If a relation R(a, b) exists in the original ABox, then R(f(a), f(b)) exists in the summary.
If the summary is consistent, then the original ABox is consistent (the converse is not true).
TBox: Functional(isTaughtBy); Disjoint(Man, Woman)
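The three constraints can be sketched in a few lines of Python. This is an illustrative toy, not the SHER implementation: it picks the simplest possible summary function f, mapping every individual with the same concept set to one summary node, which satisfies all three constraints by construction.

```python
def summarize(types, roles):
    """Build a summary ABox from an original ABox.

    types: dict mapping individual -> frozenset of concept names
    roles: set of (role, subject, object) assertions
    Individuals with the same concept set are merged into one
    summary node (one simple choice of summary function f).
    """
    f = {a: "S:" + ",".join(sorted(cs)) for a, cs in types.items()}
    summary_types = {f[a]: cs for a, cs in types.items()}        # membership preserved
    summary_roles = {(r, f[a], f[b]) for (r, a, b) in roles}     # edges preserved
    return f, summary_types, summary_roles

# Toy ABox from the slide: two courses taught by men who like hobbies.
types = {
    "C1": frozenset({"Course"}), "C2": frozenset({"Course"}),
    "M1": frozenset({"Man"}),    "M2": frozenset({"Man"}),
    "H1": frozenset({"Hobby"}),  "H2": frozenset({"Hobby"}),
}
roles = {("isTaughtBy", "C1", "M1"), ("isTaughtBy", "C2", "M2"),
         ("likes", "M1", "H1"), ("likes", "M2", "H2")}

f, st, sr = summarize(types, roles)
assert f["C1"] == f["C2"]                      # C1, C2 collapse to C'
assert ("isTaughtBy", f["C1"], f["M1"]) in sr  # edges survive in the summary
```

Because every individual's concepts and edges carry over to its image, any model of the summary can be "pulled back", which is why summary consistency implies ABox consistency but not the converse.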
Summarization effectiveness
Ontology   Instances   Role assertions   I   RA
Biopax 261,149 582,655 81 583
LUBM-1 42,585 214,177 410 16,233
LUBM-5 179,871 927,854 598 35,375
LUBM-10 351,422 1,816,153 673 49,176
LUBM-30 1,106,858 6,494,950 765 79,845
NIMD 1,278,540 1,999,787 19 55
ST 874,319 3,595,132 21 183
I – Instances after summarization; RA – Role assertions after summarization
Scalability via Filtering (ISWC 2006)
For expressive ontologies, query answering can be reduced to a consistency check on the Abox.
For the SHIN subset of DL (OWL-DL minus datatype reasoning and nominals), only certain types of relations are key to finding an inconsistency.
Specifically, any role R that appears in a universal restriction (∀R.C) or a maximum cardinality restriction (≤ n R) is key to finding inconsistencies.
All role assertions whose roles do not participate in such concept expressions can be filtered out, provided we can compute all relevant concepts in the ontology.
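A sketch of the filtering idea under simplifying assumptions (it ignores role hierarchies, inverse roles, and TBox normalization, which a full implementation must handle; the tuple encoding of TBox axioms is hypothetical): keep only assertions whose role occurs in a ∀ or ≤ restriction.

```python
def key_roles(tbox_axioms):
    """Roles appearing in universal restrictions ('forall', R, C)
    or max-cardinality restrictions ('atmost', n, R)."""
    keys = set()
    for ax in tbox_axioms:
        if ax[0] == "forall":
            keys.add(ax[1])
        elif ax[0] == "atmost":
            keys.add(ax[2])
    return keys

def filter_abox(roles, tbox_axioms):
    """Drop role assertions that cannot contribute to an inconsistency."""
    keep = key_roles(tbox_axioms)
    return {(r, a, b) for (r, a, b) in roles if r in keep}

tbox = [("forall", "isTaughtBy", "Professor"), ("atmost", 1, "isTaughtBy")]
roles = {("isTaughtBy", "C1", "M1"), ("likes", "M1", "H1")}
assert filter_abox(roles, tbox) == {("isTaughtBy", "C1", "M1")}
```

In this toy TBox, `likes` never appears under a ∀ or ≤ restriction, so its assertions are discarded, mirroring the large reductions shown in the table below.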
Filtering effectiveness
Ontology   Instances   Role assertions   I   RA
Biopax 261,149 582,655 38 98
LUBM-1 42,585 214,177 280 284
LUBM-5 179,871 927,854 426 444
LUBM-10 351,422 1,816,153 474 492
LUBM-30 1,106,858 6,494,950 545 574
NIMD 1,278,540 1,999,787 2 1
ST 874,319 3,595,132 18 50
I – Instances after filtering; RA – Role assertions after filtering
Refinement (AAAI 2007)
What if the summary is inconsistent?
– Either the original ABox has a real inconsistency, or
– the ABox was consistent, but the process of summarization introduced a spurious inconsistency into the summary.
Therefore, we follow a process of refinement to check for a real inconsistency:
• Refinement = selectively decompress portions of the summary
• Use justifications for the inconsistency to select which portion of the summary to refine
  – Justification = a minimal set of assertions responsible for the inconsistency
• Repeat the process iteratively until the refined summary is consistent or the justification is "precise"
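The iteration above is pure control flow, so it can be sketched with the DL-specific steps (consistency check, justification extraction, preciseness test, node splitting) passed in as callables; the names here are illustrative, not SHER's API.

```python
def refine_summary(summary, is_consistent, justification, is_precise, split):
    """Iterative refinement loop (control flow only)."""
    while not is_consistent(summary):
        j = justification(summary)        # minimal inconsistent subset
        if is_precise(j):                 # inconsistency is real
            return summary, j
        summary = split(summary, j)       # decompress nodes named in j
    return summary, None                  # consistent: no real inconsistency

# Toy run: the "summary" is an int that becomes consistent after 2 splits,
# mimicking the two-refinement example on the next slide.
final, j = refine_summary(
    0,
    is_consistent=lambda s: s >= 2,
    justification=lambda s: {"spurious"},
    is_precise=lambda j: False,
    split=lambda s, j: s + 1,
)
assert final == 2 and j is None
```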
Refinement: Resolving inconsistencies in a summary
[Diagram: Original ABox contains courses C1, C2, C3, persons P1, P2, P3, men M1, M2, woman W1, and hobbies H1, H2, connected by isTaughtBy and likes edges. Legend: C – Course, P – Person, M – Man, W – Woman, H – Hobby. TBox: Functional(isTaughtBy); Disjoint(Man, Woman).
Initial summary maps all courses to C′ = {C1, C2, C3}: summary is inconsistent.
After 1st refinement, C′ splits into Cx′ = {C1, C2} and Cy′ = {C3}: summary is still inconsistent!
After 2nd refinement, P′ splits into Px′ = {P1, P2} and Py′ = {P3}: consistent summary.]
Refinement: Solving Membership Query (AAAI 2007)

[Diagram: the same ABox and refinement sequence as the previous slide, annotated for the sample membership query Q = PeopleWithHobby. Not(Q) is asserted on the candidate person nodes of the summary; refinement proceeds as before (C′ = {C1, C2, C3} splits into Cx′ = {C1, C2} and Cy′ = {C3}, then P′ splits into Px′ = {P1, P2} and Py′ = {P3}). The remaining inconsistency isolates Px′, so the solutions are P1 and P2. TBox: Functional(isTaughtBy); Disjoint(Man, Woman).]
Results: Consistency Check
Ontology Instances Role Assertions Time for consistency check (in s)
Biopax 261,149 582,655 2.3
LUBM-1 42,585 214,177 2.9
LUBM-5 179,871 927,854 5.4
LUBM-10 351,422 1,816,153 5.1
LUBM-30 1,106,858 6,494,950 7.9
NIMD 1,278,540 1,999,787 0.8
ST 874,319 3,595,132 0.4
Results: Membership Query Answering

Ontology   Type assertions   Role assertions
UOBM-1 25,453 214,177
UOBM-10 224,879 1,816,153
UOBM-30 709,159 6,494,950
Reasoner   Dataset   Avg. time (s)   St. dev (s)   Range (s)
KAON2 UOBM-1 21 1 18 - 37
KAON2 UOBM-10 448 23 414 - 530
SHER UOBM-1 4 4 2 - 24
SHER UOBM-10 15 26 6 - 191
SHER UOBM-30 35 63 12 - 391
Improving SHER performance through integration with a fast but incomplete reasoner

Refinement
– Critical for completeness
– But time consuming (joins between large tables): the majority of time is spent in refinement
– However, many solutions are "easily" detected using query expansion
  • e.g. ColonNeoplasm(x) = Disease(x) ∧ hasAssociatedMorphology(x, y) ∧ Neoplasm(y) ∧ hasFindingSite(x, z) ∧ Colon(z)
Improved SHER performance by adding a query-expansion module
– General idea:
  • Quickly find solutions to the query
  • Refine the summary to isolate the solution individuals
  • Test the remaining individuals
– Advantages:
  • Any sound technique that finds solutions quickly can be used (query expansion, a forward-chaining rule system)
  • Much less refinement is required if that technique finds many solutions
    – Depending on the expressivity of the logic, refinement may not be needed at all
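The general idea can be sketched as a generic hybrid loop (names and structure are illustrative, not SHER's API): a sound-but-incomplete step harvests "obvious" solutions cheaply, and only the leftover candidates pay for the expensive complete test.

```python
def hybrid_answer(candidates, quick_solve, complete_test):
    """Hybrid membership-query evaluation (sketch)."""
    easy = quick_solve(candidates)                       # e.g. query expansion (sound)
    remaining = candidates - easy
    hard = {x for x in remaining if complete_test(x)}    # e.g. summary refinement
    return easy | hard

patients = {"p1", "p2", "p3", "p4"}
answers = hybrid_answer(
    patients,
    quick_solve=lambda c: {"p1", "p2"},   # found cheaply by expansion
    complete_test=lambda x: x == "p3",    # only found by full reasoning
)
assert answers == {"p1", "p2", "p3"}
```

Soundness of the quick step matters: anything it returns is accepted without re-checking, so only completeness is delegated to the slow path.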
SHER Hybrid Algorithm Evaluation
Avg. query answering time for the clinical-trials use case is down to 15 minutes
– Huge reduction in the number of refinement steps
Conjunctive Query in SHER (ISWC 2008)

SHER also supports grounded conjunctive queries (CQs), which combine membership/type queries and relationship queries R(x, y)
– GraduateStudent(x) ∧ isMemberOf(x, y) ∧ Department(y) ∧ subOrganizationOf(y, z)
Solving CQs is much harder than membership queries
– The summarization/refinement algorithm does not directly apply to relationship queries
  • Intuitively, summarization groups individuals by type, which works well for type queries; relationship queries require considering pairs of individuals
Alternate 3-step approach:
– Use a Datalog rule engine to estimate potential relationships
– Use various heuristics to find definite relationships
– Test the remaining relationships in the summary, solving by splitting
Advantage:
• Graceful degradation with the complexity of the query and the ontology/data (very fast on realistic queries and datasets)
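For intuition, here is a naive evaluator for grounded CQs over explicit assertions only, roughly what the cheap Datalog-style first step computes before reasoning is brought in. It is a sketch with a hypothetical atom encoding, not SHER's implementation, and it performs no DL inference.

```python
def eval_cq(types, roles, atoms):
    """Join CQ atoms left to right over explicit assertions.
    atoms: ('type', Concept, var) or ('role', R, var1, var2)."""
    bindings = [{}]
    for atom in atoms:
        new = []
        if atom[0] == "type":
            _, concept, v = atom
            for b in bindings:
                for ind, cs in types.items():
                    if concept in cs and b.get(v, ind) == ind:
                        new.append({**b, v: ind})
        else:
            _, r, v1, v2 = atom
            for b in bindings:
                for (rr, a, c) in roles:
                    if rr == r and b.get(v1, a) == a and b.get(v2, c) == c:
                        new.append({**b, v1: a, v2: c})
        bindings = new
    return bindings

types = {"alice": {"GraduateStudent"}, "cs": {"Department"}}
roles = {("isMemberOf", "alice", "cs")}
q = [("type", "GraduateStudent", "x"),
     ("role", "isMemberOf", "x", "y"),
     ("type", "Department", "y")]
assert eval_cq(types, roles, q) == [{"x": "alice", "y": "cs"}]
```

The hard part SHER adds on top is finding answers that hold only after reasoning, which is where the summary testing and splitting come in.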
Conjunctive Query Evaluation
Comparison with KAON2 on the UOBM Benchmark
Outline
Background and motivation
Core SHER technical innovations
–Scalability via summarization
–Refinement: Resolving inconsistencies in a summary
–Integration with incomplete reasoners
–Conjunctive Query Evaluation
SHER concrete applications
–Automated Clinical trials matching using ontologies
–Anatomy Lens: Semantic search over PubMed
–Scalable Text Analytics cleanup
Conclusion and future work
Matching Patient Records to Clinical Trials Using Ontologies

In collaboration with Columbia University Medical Center: Chintan Patel and James Cimino
Work presented at ISWC 2007
Problem

In complex domains such as healthcare, there is a "semantic gap" between data and queries, e.g.:

Patient data: "Patient on Hydrocortisone 2%" — Query: "Patients on drugs with steroids as ingredients"
Patient data: "Patient tested positive for mycobacterium tuberculosis" — Query: "Patients with tuberculosis meningitis"

Can analytics using ontologies such as SNOMED bridge this gap?
Case study on patient recruitment for clinical trials problem
Clinical Trials Matching

Current scenario: A day in the life of Columbia's Clinical Trials Investigator
– Look at criteria in the trial protocol
– Pore through patient charts
– Call the physician to discuss consent
Result: poor participation in clinical trials!
Can ontology analytics be used to find patients that match clinical trial criteria and improve participation?
Clinical Trials Matching

What we want to do: A day in the life of Columbia's Clinical Trials Investigator
– Look at criteria in the trial protocol
– Find patients automatically: the criteria become queries over patient data, and an ontology reasoner (using ontologies such as SNOMED) finds the matching patients
– Call the physician to discuss consent
Technical Challenges to using ontologies

Knowledge engineering
– Clinical Data Repository: NY Presbyterian Medical Entities Dictionary (MED)
  • Concepts: 100,210; ISA relationships: 148,821
  • Extensive local knowledge
– Need to map local knowledge (MED) to domain knowledge (SNOMED), e.g.,
  • Presence of MRSA on a lab test means the lab test hasCausativeAgent MRSA (coded in MED).
Technical challenges to using ontologies

Scalability of reasoning
– ABox (patient data): 1 year of data at Columbia (250K patients), ~60M RDF triples
– TBox (SNOMED+MED): 461K concepts

Expressivity of reasoning: SNOMED is EL++, but the ABox contains negation, e.g.,
– Lab results ruled out the presence of an organism

Inconsistent, noisy, and incomplete data
– Lab results that indicate both the presence and absence of an organism
Overall Solution

[Diagram: Patient data ("LabA MRSAOrganism Present", coded in MED) goes through ETL into the ABox assertion LabA: ∃causativeAgent.MRSAOrganism. MED (local knowledge) is mapped, semi-automatically, to SNOMED (domain knowledge), yielding an integrated TBox. The SHER ontology reasoner takes the ABox and integrated TBox; query extraction produces the queries whose answers are the matching patients.]
Results for ~250K patients

Query                      # Matches   Time (min)   Optimized time (min)
MRSA Disorder              1052        68.9         10.8
On Warfarin                3127        63.8         6.3
Breast neoplasm            74          26.4         7.7
Colon neoplasm             164         31.8         6.5
Pneumococcal pneumonia..   107         56.4         10.7
On metronidazole           2           61.4         5.4
Acute amebiasis...         1357        370.8        20.2
Steroids/cyclosporine      5555        145.5        6.6
Corticosteroids            4794        78.8         6.1

Optimizations use query expansion to quickly compute and remove "obvious" solutions.
AlphaWorks Service: Anatomy Lens
Ontology-based PubMed Search
– GOAL: Real-time Ontology Reasoning on the Web
– Overcome Keyword Search: Poor precision & recall
– Link 3 large OWL Ontologies: FMA, GO, MeSH
– Dataset Size: 16 Million MEDLINE Articles, ~300M Triples
– Support Structured queries:
• Find articles about “neuron development” (GO Process) in the “Hippocampus” region (FMA Part) of the brain
• A possible solution article may be about "dendrite morphogenesis" in the "Archicortex"
Real-time Reasoner for Service
EL+ Reasoner in SHER for Anatomy Lens
– Many HCLS Ontologies fall in this EL+ fragment of OWL
Highly optimized:
– Classification times:
  • GO (32K concepts): 3 s
  • FMA (75K concepts): 30 s
  • SNOMED (350K concepts): 8 min
– The state-of-the-art reasoner CEL takes >2 hrs on SNOMED
Additional Features:
– Incremental reasoning
– Explanations support
Anatomy Lens Demo
Online Video: http://anatomylens.alphaworks.ibm.com/AnatomyLens/AnatomyLensVideo/AnatomyLensVideo.html
Scalable Cleanup of Information Extraction Data Using Ontologies
In collaboration with Christopher Welty, James Fan, and William Murdock
Presented at ISWC 2007
Problem

Text extraction from natural language is imperfect. Relationship extraction is especially problematic, e.g.:

"...the decision in September 1991 to withdraw tactical nuclear bombs, missiles and torpedoes from US Navy ships..."

Text extraction:
– nuclear ownerOf bombs
– nuclear type Weapon, bombs type Weapon
Can ontology reasoning be used to improve relationship extraction?
Background: SemantiClean (ISWC 2006)

Ontology
– ownerOf domain (Person ⊔ Organization)
– Person disjointFrom Organization
– Person disjointFrom Weapon
– Organization disjointFrom Weapon
...
Add one triple at a time (e.g., "nuclear ownerOf bomb"), check it with a DL reasoner, and discard the triple if it makes the KB inconsistent.
Improves relationship extraction by 8-15%.
Evaluating the Triple at a Time Approach
Scalability
– Text extraction on a normal desktop can process a million documents/day.
– Each document yields ~70 entities and ~40 relations.
– Consistency detection in DL reasoners does not scale to such large RDF graphs.
Computational Experience

# Documents   # Individuals   # Role assertions   # Just.   Time (min)
100 8,628 15,521 191 10
500 32,787 62,414 625 19
1500 104,507 195,206 1,570 37
3683 286,605 513,522 2,744 67
Conclusions

SHER
– Reasons over highly expressive ontologies
– Reasons over data in relational databases
– No inferencing on load, hence deals better with fast-changing data
– Integrates with fast incomplete reasoners
– Highly scalable: reasons on 7.7M records in 7.9 s
  • Semantically indexed 300 million triples from the medical literature
– Tolerates inconsistencies
– Provides explanations

Many applications:
– Semantic matching for clinical trials
– Semantic search over PubMed
– Scalable text-analytics cleanup

What next?
– SHER code release scheduled for the end of June 2008
THANKS!
QUESTIONS?
More on SHER:
http://domino.research.ibm.com/comm/research_projects.nsf/pages/iaa.index.html
Integrating MED and SNOMED

[Diagram: MED (100,210 concepts) is mapped to SNOMED (361,824 concepts) via UMLS, NLP (MMTX), and manual mapping, producing the integrated TBox.]

• 17,446 concepts in MED are directly mapped by subclass relations to SNOMED concepts (17% of MED).
• Including subclasses of those 17,446 concepts, the coverage of MED is 75,514 concepts.
• 88% of the concepts in the ABox were covered by the integrated TBox.
Modeling patient data in SNOMED

[Diagram: Patient data "LabA MRSAOrganism Present" goes through ETL into the ABox assertion LabA: ∃causativeAgent.MRSAOrganism.]

• Modeling positive and negative results, e.g.
  LabEventA MRSAOrganism Absent
  is modeled as:
  LabEventA: ∀causativeAgent.¬MRSAOrganism
• Modeling groupings of events:
  RadiologyEventA findingSite Colon
  RadiologyEventA morphology Neoplasm
  is modeled as:
  RadiologyEventA: ∃roleGroup.(∃hasMorphology.Neoplasm ⊓ ∃hasFindingSite.Colon)
Validation (100 patient records)

Query   # Matches   # Misses   Time (s)
MRSA Disorder 1 54
On Warfarin 4 78
Breast neoplasm 0 1 29
Colon neoplasm 1 51
Pneumococcal pneumonia.. 0 39
On metronidazole 0 1 29
Acute amebiasis... 0 225
Steroids/cyclosporine 6 117
Corticosteroids 6 8 118
Misses were primarily due to incorrect mappings; there were no false positives.
Solutions to Challenges

Large ABoxes (patient data): Reason on a summarized version of the data (ISWC 2006).

Large TBoxes (SNOMED+MED): Compute the closure of concepts in the ABox, which is 22,561 concepts.

Incomplete data: MRSA is defined as
• ∃hasCausativeAgent.MRSAOrganism ⊓ Infection
A patient record will never indicate the infection, hence no matches.
We convert all conjuncts to disjuncts for user-specified concepts (e.g., MRSA Disorder) in the query:
• ∃hasCausativeAgent.MRSAOrganism ⊔ Infection
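The conjunct-to-disjunct relaxation can be sketched as a recursive rewrite. The tuple encoding of concept expressions here is hypothetical, purely for illustration.

```python
def relax(concept):
    """Rewrite every conjunction ('and', ...) to a disjunction ('or', ...)
    in a concept expression encoded as nested tuples such as
    ('and', c1, c2) or ('some', Role, Concept); strings are atomic concepts."""
    if isinstance(concept, tuple):
        op, *args = concept
        op = "or" if op == "and" else op
        return (op, *[relax(a) for a in args])
    return concept

mrsa = ("and", ("some", "hasCausativeAgent", "MRSAOrganism"), "Infection")
relaxed = relax(mrsa)
assert relaxed[0] == "or"   # the ⊓ became a ⊔; the conjuncts are unchanged
```

The relaxed concept matches records that satisfy either conjunct, trading precision for recall on incomplete data.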
Justification based consistent subset

Justification based consistent subset, by example:

Justifications:
J1 = {x, y, m}
J2 = {x, y, z}
J3 = {y, q}

Set of removed assertions:
– x removed because of J1
– y cannot be removed because of J2, BUT y can be removed due to J3

BUT for knowledge bases filled with thousands of inconsistencies, even this justification-based consistent-subset computation may not scale.
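The removal step can be sketched as a greedy cover over the justifications: remove one assertion per justification, skipping justifications already hit by an earlier removal. This is illustrative only; the slide's exact rule for choosing which assertion to drop may differ.

```python
def repair(justifications):
    """Greedy repair (sketch): ensure every justification loses
    at least one assertion, reusing earlier removals when possible."""
    removed = set()
    for j in justifications:
        if not (removed & set(j)):       # justification not yet covered
            removed.add(sorted(j)[0])    # deterministic, arbitrary pick
    return removed

j1, j2, j3 = {"x", "y", "m"}, {"x", "y", "z"}, {"y", "q"}
r = repair([j1, j2, j3])
# every justification has been broken, so the remaining subset is consistent
assert all(r & j for j in (j1, j2, j3))
```

Each pass requires recomputing justifications against a large KB, which is why this computation may not scale on heavily inconsistent data.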
Approximate cleanup technique
Approximate cleanup

Summarization: Perform consistency detection on a summarized version of the larger RDF graph.

[Diagram: Original data (ABox): partOf(a, b) with a:Nation, b:Organization; ownerOf(c, d) with c:Nation, d:Person; residentOf(e, f) with e:Organization, f:Nation. Summary ABox: nodes u:Nation, v:Organization, s:Person with the corresponding summary edges.]

Mapping function f satisfies:
• If a:C ∈ A, then f(a):C ∈ A′
• If R(a, b) ∈ A, then R(f(a), f(b)) ∈ A′
• If a ≠ b ∈ A, then f(a) ≠ f(b) ∈ A′
If A′ is consistent, then A is consistent. The converse does not hold.
Isolating an inconsistency

• Check consistency of the summary ABox.
• Find a justification (a minimal set of assertions that cause the inconsistency), e.g., u ownerOf s.
• Refine the summary to make the justification more precise.

To make a justification more precise, refine the summary individuals in the justification by the sets of role assertions they have.

[Diagram: In the summary, node u (with a, c, f mapped to it) has an ownerOf edge to s:Person. The refined summary splits u: a remains mapped to u, which keeps the ownerOf edge to s, while c and f are mapped to a new node u′ with an edge to v:Organization.]
Is the inconsistency real?

Stop refining when a justification is precise. A justification J is precise when: for every summary individual s ∈ J and every role assertion R(s, t) ∈ J, every individual a ∈ A with f(a) = s has some individual b ∈ A such that f(b) = t and R(a, b) ∈ A.

[Diagram: Refined summary with a precise justification: a mapped to u, d mapped to s, ownerOf(u, s); c and f mapped to u′.]

Key to scalable inconsistency detection: a precise justification J where each individual in J has many thousands of individuals in A mapped to it.
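The preciseness condition translates almost directly into code. The encodings of the mapping f and the role assertions are hypothetical, for illustration only.

```python
def is_precise(justification_roles, f, abox_roles):
    """A set of justification role assertions R(s, t) is precise when
    *every* original individual mapped to s has the R edge to some
    individual mapped to t."""
    inverse = {}                                  # summary node -> originals
    for a, s in f.items():
        inverse.setdefault(s, []).append(a)
    for (r, s, t) in justification_roles:
        targets = set(inverse.get(t, []))
        for a in inverse.get(s, []):
            if not any((r, a, b) in abox_roles for b in targets):
                return False                      # a lacks the edge: refine more
    return True

f = {"a": "u", "d": "s"}
abox = {("ownerOf", "a", "d")}
assert is_precise({("ownerOf", "u", "s")}, f, abox)            # every pre-image has the edge
assert not is_precise({("ownerOf", "u", "s")},
                      {"a": "u", "c": "u", "d": "s"}, abox)    # c has no ownerOf edge
```

When a precise justification is found, the inconsistency is real, and one conclusion applies to every original individual mapped into it, which is what makes detection scalable.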
Cleaning up inconsistencies

Once a precise justification is found, check whether it is conclusive. A precise justification is conclusive if, for example:
– it is acyclic, or
– it is cyclic, but can be shown to be acyclic after the application of deterministic tableau rules.
– We found no real use cases where justifications are not conclusive.

Remove a single assertion of each precise, conclusive justification. Iterate to find all justifications, until the knowledge base is consistent.
Advantages of approximate cleanup
For acyclic justifications, removing a single assertion of J in the summary is equivalent to removing a single assertion from every justification in the original data that is an "instance" of J.

This produces a justification-based consistent subset in a scalable way, provided the data has only acyclic justifications.
Disadvantages of approximate cleanup

In the case of a cyclic justification, the cycle may map to a large cycle in the ABox, e.g.:

[Diagram: Original ABox: a cycle a → b → c → d → a over role R, where a: A ⊓ ∀R.¬A, b: Q, c: Q, d: Q, and R is transitive. Summary ABox: s: A ⊓ ∀R.¬A and t: Q connected by R edges, forming the justification J.]

Removal of a single role assertion in J eliminates the whole cycle: extra deletion of assertions.
How aggressive is approximate cleanup?
# Documents   # Individuals   # Role assertions   # Deleted   Estimated extra deletions
100 8,628 15,521 299 19
500 32,787 62,414 1,150 89
1500 104,507 195,206 3,910 359
3683 286,605 513,522 9,574 967
Estimated extra deletions, based on the assumption that only one assertion needed to be removed from each justification.