2015 tapp - towards constraint-based explanations for answers and non-answers
Post on 10-Aug-2015
Towards Constraint-based Explanations for Answers and Non-Answers
Boris Glavic, Illinois Institute of Technology
Sean Riddle, Athenahealth Corporation
Sven Köhler, University of California, Davis
Bertram Ludäscher, University of Illinois Urbana-Champaign
Outline
① Introduction ② Approach ③ Explanations ④ Generalized Explanations ⑤ Computing Explanations with Datalog ⑥ Conclusions and Future Work
Overview
• Introduce a unified framework for generalizing explanations for answers and non-answers
• Why/why-not question Q(t)
  • Why is tuple t (not) in the result of query Q?
• Explanation
  • Provenance for the answer/non-answer
• Generalization
  • Use an ontology to summarize and generalize explanations
• Computing generalized explanations for UCQs
  • Use Datalog
Train-Example
• 2hop(X,Y) :- Train(X,Z), Train(Z,Y).
• Why can't I reach Berlin from Chicago?
• Why-not 2hop(Chicago,Berlin)
From | To
New York | Washington DC
Washington DC | New York
New York | Chicago
Chicago | New York
… | …
Berlin | Munich
Munich | Berlin
… | …
[Figure: map showing Seattle, Chicago, Washington DC, New York, Paris, Berlin, and Munich, with the Atlantic Ocean between the US and European cities]
Train-Example Explanations
• 2hop(X,Y) :- Train(X,Z), Train(Z,Y).
• Missing train connections explain why Chicago and Berlin are not connected
• E.g., if only there were a train line between New York and Berlin: Train(New York, Berlin)
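The why-not question above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the instance holds only the rows visible in the table (the "…" rows are left out), and the helper names `two_hop` and `single_tuple_repairs` are made up for this example.

```python
# Assumed instance: only the Train rows visible on the slide.
train = {
    ("New York", "Washington DC"),
    ("Washington DC", "New York"),
    ("New York", "Chicago"),
    ("Chicago", "New York"),
    ("Berlin", "Munich"),
    ("Munich", "Berlin"),
}


def two_hop(edges):
    """2hop(X,Y) :- Train(X,Z), Train(Z,Y)."""
    return {(x, y) for (x, z1) in edges for (z2, y) in edges if z1 == z2}


def single_tuple_repairs(edges, src, dst):
    """Missing Train tuples whose insertion alone makes 2hop(src,dst) true."""
    cities = {c for e in edges for c in e}
    repairs = set()
    for a in cities:
        for b in cities:
            if (a, b) not in edges and (src, dst) in two_hop(edges | {(a, b)}):
                repairs.add((a, b))
    return repairs


answers = two_hop(train)
repairs = single_tuple_repairs(train, "Chicago", "Berlin")
print(("Chicago", "Berlin") in answers)  # the non-answer
print(sorted(repairs))                   # candidate missing connections
```

On this small instance, Train(New York, Berlin) from the slide is one of the single-tuple repairs.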
Why-not Approaches
• Two categories of data-based explanations for missing answers
• 1) Enumerate all failed rule derivations and why they failed (missing tuples)
  • Provenance games
• 2) One set of missing tuples that fulfills an optimality criterion
  • e.g., minimal side-effect on query result
  • e.g., Artemis, …
Why-not Approaches
• 1) Enumerate all failed rule derivations and why they failed (missing tuples)
  • Exhaustive explanation
  • Potentially very large explanations
    – Train(Chicago,Munich), Train(Munich,Berlin)
    – Train(Chicago,Seattle), Train(Seattle,Berlin)
    – …
• 2) One set of missing tuples that fulfills an optimality criterion
  • Concise explanation that is optimal in a sense
  • Optimality criterion is not always a good fit/effective
    – Consider reach (transitive closure)
    – Adding any train connection between USA and Europe has the same effect on the query result
Uniform Treatment of Why/Why-not
• Provenance and missing answer approaches have been treated mostly independently
• Observation:
  • For provenance models that support query languages with "full" negation
  • Why and why-not are both provenance computations!
• Q(X) :- Train(chicago,X).
• Why-not Q(New York)?
  • Equivalent to why Q'(New York)?
  • Q'(X) :- adom(X), not Q(X)
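A small Python sketch of this reduction, under assumptions: the instance is a hypothetical four-tuple Train relation, and the example asks about berlin rather than New York so that the question really concerns a non-answer of that instance. The function names are illustrative only.

```python
# Hypothetical instance for illustration.
train = {
    ("chicago", "newyork"),
    ("newyork", "chicago"),
    ("berlin", "munich"),
    ("munich", "berlin"),
}


def adom(edges):
    """Active domain: every constant that occurs in Train."""
    return {c for e in edges for c in e}


def q(edges):
    """Q(X) :- Train(chicago,X)."""
    return {y for (x, y) in edges if x == "chicago"}


def q_neg(edges):
    """Q'(X) :- adom(X), not Q(X)."""
    return adom(edges) - q(edges)


# Why-not Q(berlin) holds exactly when berlin is an answer of Q'.
print("berlin" in q_neg(train))
```

Asking "why Q'(berlin)" then only needs the machinery for answers, provided the provenance model handles the negated goal.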
Unary Train-Example
• Q(X) :- Train(chicago,X).
• Why-not Q(berlin)
  • Explanation: Train(chicago,berlin)
• Consider an available ontology!
  • More general: Train(chicago,GermanCity)
[Figure: city ontology. Levels from top to bottom: ACity; NACity; EuropeanCity, USCity; IllinoisCity, WashingtonCity, NYStateCity, DCCity, GermanCity, FrenchCity; instances chicago, seattle, newyork, washington_dc, berlin, munich, paris, lyon, dijon]
Unary Train-Example
• Q(X) :- Train(chicago,X).
• Why-not Q(berlin)
  • Explanation: Train(chicago,berlin)
• Consider an available ontology!
  • Generalized explanation:
    – Train(chicago,GermanCity)
  • Most general explanation:
    – Train(chicago,EuropeanCity)
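One way to read the generalization step: a concept can replace berlin in the why-not explanation if it contains berlin and no member of the concept is reachable from chicago. The sketch below assumes hypothetical concept extents (guessed from the concept names in the ontology figure) and a hypothetical Train instance; `generalizes_why_not` is an illustrative name.

```python
# Assumed concept extents, guessed from the concept names.
members = {
    "GermanCity": {"berlin", "munich"},
    "FrenchCity": {"paris", "lyon", "dijon"},
    "EuropeanCity": {"berlin", "munich", "paris", "lyon", "dijon"},
    "USCity": {"chicago", "seattle", "newyork", "washington_dc"},
}
members["ACity"] = members["EuropeanCity"] | members["USCity"]

# Hypothetical instance: chicago only reaches US cities.
train = {("chicago", "newyork"), ("chicago", "seattle")}


def generalizes_why_not(concept, city):
    """Concept covers `city` and Train(chicago,c) is missing for every member c."""
    ext = members[concept]
    return city in ext and all(("chicago", c) not in train for c in ext)


valid = {c for c in members if generalizes_why_not(c, "berlin")}
print(sorted(valid))
```

Here ACity is rejected because it also covers newyork and seattle, which are answers, so EuropeanCity is the most general concept that survives.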
Our Approach
• Explanations for why/why-not questions
  • over UCQ queries
  • Successful/failed rule derivations
• Utilize available ontology
  • Expressed as inclusion dependencies
  • "mapped" to instance
    – E.g., city(name,country)
    – GermanCity(X) :- city(X,germany).
• Generalized explanations
  • Use concepts to describe subsets of an explanation
• Most general explanation
  • Pareto-optimal
Related Work - Generalization
• ten Cate et al. High-Level Why-Not Explanations using Ontologies [PODS '15]
  • Also uses ontologies for generalization
  • We summarize provenance instead of query results!
  • Only for why-not, but extension to why is trivial
• Other summarization techniques using ontologies
  • Data X-ray
  • Datalog-S (datalog with subsumption)
Rule derivations
• What causes a tuple to be or not be in the result of a query Q?
• Tuple in result: there exists at least one successful rule derivation which justifies its existence
  • Existential check
• Tuple not in result: all rule derivations that would justify its existence have failed
  • Universal check
• Rule derivation
  • Replace rule variables with constants from the instance
  • Successful: body is fulfilled
Basic Explanations
• A basic explanation for question Q(t)
  • Why: successful derivations with Q(t) as head
  • Why-not: failed rule derivations
    – Replace successful goals with placeholder T
    – Different ways to fail
2hop(Chicago,Munich) :- Train(Chicago,New York), Train(New York,Munich).
2hop(Chicago,Munich) :- Train(Chicago,Berlin), Train(Berlin,Munich).
2hop(Chicago,Munich) :- Train(Chicago,Paris), Train(Paris,Munich).
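The basic why-not explanation can be sketched as an enumeration over all bindings of Z, printing T for goals that succeed. This is only an illustration: the instance is limited to the Train rows visible earlier in the deck, and `failed_derivations` is a made-up helper.

```python
# Assumed instance: only the Train rows visible in the deck's table.
train = {
    ("New York", "Washington DC"),
    ("Washington DC", "New York"),
    ("New York", "Chicago"),
    ("Chicago", "New York"),
    ("Berlin", "Munich"),
    ("Munich", "Berlin"),
}


def failed_derivations(edges, x, y):
    """Failed derivations of 2hop(x,y); successful goals are shown as T."""
    cities = sorted({c for e in edges for c in e})
    out = []
    for z in cities:
        goal1_ok = (x, z) in edges
        goal2_ok = (z, y) in edges
        if goal1_ok and goal2_ok:
            continue  # successful derivation, not part of a why-not explanation
        goal1 = "T" if goal1_ok else f"Train({x},{z})"
        goal2 = "T" if goal2_ok else f"Train({z},{y})"
        out.append(f"2hop({x},{y}) :- {goal1}, {goal2}.")
    return out


for line in failed_derivations(train, "Chicago", "Munich"):
    print(line)
```

Each binding of Z fails in one of three ways: the first goal is missing, the second goal is missing, or both are.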
Explanations Example
• Why 2hop(Paris,Munich)?
2hop(Paris,Munich) :- Train(Paris,Berlin), Train(Berlin,Munich).
Generalized Explanation
• Generalized explanations
  • Rule derivations with concepts
• Generalizes user question
  • generalize a head variable
2hop(Chicago,Berlin) – 2hop(USCity,EuropeanCity)
• Summarizes provenance of (non-) answer
  • generalize any rule variable
2hop(New York,Seattle) :- Train(New York,Chicago), Train(Chicago,Seattle).
2hop(New York,Seattle) :- Train(New York,USCity), Train(USCity,Seattle).
Generalized Explanation Def.
• For user question Q(t) and rule r
  • r(C1,…,Cn)
① (C1,…,Cn) subsumes user question
② headvars(C1,…,Cn) only cover existing/missing tuples
③ For every tuple t' covered by headvars(C1,…,Cn), all rule derivations for t' covered are explanations for t'
Recap Generalization Example
• r: Q(X) :- Train(chicago,X).
• Why-not Q(berlin)
  • Explanation: r(berlin)
• Generalized explanation:
  • r(GermanCity)
Most General Explanation
• Domination Relationship
  • r(C1,…,Cn) dominates r(D1,…,Dn)
    – if for all i: Ci subsumes Di
    – and exists i: Ci strictly subsumes Di
• Most General Explanation
  • Not dominated by any other explanation
• Example most general explanation:
  • r(EuropeanCity)
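The domination relationship is a Pareto order over concept tuples, which the following Python sketch makes concrete. It assumes the input list already contains only valid generalized explanations, and the child-to-parent edges in `PARENT` are guesses based on the concept names, not taken from the slides.

```python
# Assumed direct "is-a" edges (child -> parent), guessed from the concept names.
PARENT = {
    "chicago": "IllinoisCity",
    "berlin": "GermanCity",
    "IllinoisCity": "USCity",
    "USCity": "NACity",
    "NACity": "ACity",
    "GermanCity": "EuropeanCity",
    "EuropeanCity": "ACity",
}


def subsumes_equal(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def dominates(cs, ds):
    """cs dominates ds: every Ci subsumes Di, and at least one does so strictly."""
    pairs = list(zip(cs, ds))
    return (all(subsumes_equal(c, d) for c, d in pairs)
            and any(c != d for c, d in pairs))


def most_general(explanations):
    """Explanations not dominated by any other explanation in the list."""
    return [e for e in explanations
            if not any(dominates(other, e) for other in explanations)]


unary = [("berlin",), ("GermanCity",), ("EuropeanCity",)]
binary = [("USCity", "GermanCity"), ("IllinoisCity", "EuropeanCity"),
          ("IllinoisCity", "GermanCity")]
print(most_general(unary))
print(most_general(binary))
```

The binary case shows why "most general" is Pareto-optimal rather than unique: the first two tuples are incomparable, so both are kept.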
Datalog Implementation
① Rules for checking subsumption and domination of concept tuples
② Rules for successful and failed rule derivations
  • Return variable bindings
③ Rules that model explanations, generalization, and most general explanations
① Modeling Subsumption
• Basic concepts and concepts
    isBasicConcept(X) :- Train(X,Y).
    isConcept(X) :- isBasicConcept(X).
    isConcept(EuropeanCity).
• Subsumption (inclusion dependencies)
    subsumes(GermanCity,EuropeanCity).
    subsumes(X,GermanCity) :- city(X,germany).
• Transitive closure
    subsumes(X,Y) :- subsumes(X,Z), subsumes(Z,Y).
• Non-strict version
    subsumesEqual(X,X) :- isConcept(X).
    subsumesEqual(X,Y) :- subsumes(X,Y).
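For readers without a Datalog engine at hand, the subsumption rules can be mimicked with a naive fixpoint in Python. This is a sketch under assumptions: the `city` facts are hypothetical, and the set of concepts is simplified to every name that occurs in a subsumes pair instead of being derived from Train as on the slide. As in the rules, a pair (X,Y) reads "X is subsumed by Y".

```python
# Hypothetical city(name,country) facts.
city = {("berlin", "germany"), ("munich", "germany")}

# subsumes(GermanCity,EuropeanCity).
# subsumes(X,GermanCity) :- city(X,germany).
base = {("GermanCity", "EuropeanCity")} | {
    (name, "GermanCity") for (name, country) in city if country == "germany"
}


def transitive_closure(pairs):
    """subsumes(X,Y) :- subsumes(X,Z), subsumes(Z,Y), iterated to a fixpoint."""
    closure = set(pairs)
    while True:
        new = {(x, y)
               for (x, z1) in closure
               for (z2, y) in closure
               if z1 == z2} - closure
        if not new:
            return closure
        closure |= new


subsumes = transitive_closure(base)

# Simplification: treat every name seen so far as a concept.
concepts = {c for pair in subsumes for c in pair}
subsumes_equal = subsumes | {(c, c) for c in concepts}

print(("berlin", "EuropeanCity") in subsumes)
```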
② Capture Rule Derivations
• Rule r1: 2hop(X,Y) :- Train(X,Z), Train(Z,Y).
• Success and failure rules
    r1_success(X,Y,Z) :- Train(X,Z), Train(Z,Y).
    r1_fail(X,Y,Z) :- isBasicConcept(X), isBasicConcept(Y), isBasicConcept(Z), not r1_success(X,Y,Z).
• More general:
    r1(X,Y,Z,true,false) :- isBasicConcept(Y), Train(X,Z), not Train(Z,Y).
③ Model Generalization
• Explanation for Q(X) :- Train(chicago,X).
    expl_r1_success(C1,B1) :- subsumesEqual(B1,C1), r1_success(B1), not has_r1_fail(C1).
• User question: Q(B1)
• Explanation: Q(C1) :- Train(chicago, C1).
• Q(B1) exists and is justified by r1: r1_success(B1)
• r1 succeeds for all B in C1: not has_r1_fail(C1)
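The three body goals of expl_r1_success can be traced in a few lines of Python. The concept extents and the Train instance below are hypothetical, and each basic concept is given a singleton extent so that subsumesEqual(B1,B1) is covered; this is a reading of the rule, not the authors' code.

```python
# Assumed extents: which basic concepts (cities) each concept covers.
EXTENT = {
    "seattle": {"seattle"},
    "newyork": {"newyork"},
    "WashingtonCity": {"seattle"},
    "NYStateCity": {"newyork"},
    "USCity": {"chicago", "seattle", "newyork", "washington_dc"},
}

# Hypothetical instance.
TRAIN = {("chicago", "seattle"), ("chicago", "newyork")}

# r1_success(B1) for Q(X) :- Train(chicago,X).
R1_SUCCESS = {y for (x, y) in TRAIN if x == "chicago"}


def has_r1_fail(concept):
    """Some basic concept covered by `concept` has no successful derivation."""
    return any(b not in R1_SUCCESS for b in EXTENT[concept])


def expl_r1_success(b1):
    """All C1 with subsumesEqual(b1,C1), r1_success(b1), not has_r1_fail(C1)."""
    return {c for c, ext in EXTENT.items()
            if b1 in ext and b1 in R1_SUCCESS and not has_r1_fail(c)}


print(sorted(expl_r1_success("seattle")))
```

USCity is excluded on this instance because it also covers chicago and washington_dc, for which r1 fails.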
③ Model Generalization
• Domination
    dominated_r1_success(C1,B1) :- expl_r1_success(C1,B1), expl_r1_success(D1,B1), subsumes(C1, D1).
• Most general explanation
    most_gen_r1_success(C1,B1) :- expl_r1_success(C1,B1), not dominated_r1_success(C1,B1).
• Why question
    why(C1) :- most_gen_r1_success(C1,seattle).
Conclusions
• Unified framework for generalizing provenance-based explanations for why and why-not questions
• Uses ontology expressed as inclusion dependencies (Datalog rules) for summarizing explanations
• Uses Datalog to find most general (Pareto-optimal) explanations
Future Work I
• Extend ideas to other types of constraints
  • E.g., denial constraints
    – German cities have less than 10M inhabitants
      :- city(X,germany,Z), Z > 10,000,000
• Query returns countries with very large cities
    Q(Y) :- city(X,Y,Z), Z > 15,000,000
• Why-not Q(germany)?
  – Constraint describes set of (missing) data
  – Can be answered without looking at data
• Semantic query optimization?
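The idea that the constraint alone answers the why-not question can be illustrated with a threshold comparison. This sketch follows the denial constraint as written (no German city with Z > 10,000,000) and handles only the special case where both constraint and query are simple lower-bound predicates on the same attribute; the function name is invented for the example.

```python
def constraint_explains_non_answer(forbidden_above, query_above):
    """True if the constraint alone rules out every answer.

    Denial constraint: no tuple with Z > forbidden_above may exist.
    Query condition:   a tuple qualifies only if Z > query_above.
    Every qualifying tuple violates the constraint exactly when
    query_above >= forbidden_above.
    """
    return query_above >= forbidden_above


# :- city(X,germany,Z), Z > 10,000,000   versus   Q(Y) :- city(X,Y,Z), Z > 15,000,000
print(constraint_explains_non_answer(10_000_000, 15_000_000))
```

With a lower query threshold, say 5,000,000, the constraint no longer decides the question and the data has to be inspected.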
Future Work II
• Alternative definitions of explanation or generalization
  – Our generalized explanations are sound, but not complete
  – Complete version: concept covers at least the explanation
  – Sound and complete version: concepts cover the explanation exactly
• Queries as ontology concepts
  – As introduced in ten Cate et al.
Future Work III
• Extension for FO queries
  – Generalization of provenance game graphs
  – Need to generalize interactions of rules
• Implementation
  – Integrate with our provenance game engine
    • Powered by GProM!
    • Negation: not yet
    • Generalization rules: not yet
[Figure: GProM architecture. Labels: User, Parser, Datalog Parser, Query Log, Provenance Game Rewriter, Datalog Translator, SQL Code Generator, Backend Database. Example inputs: SELECT * FROM ... and Q(X) :- R(X,Y). Why(Q(1)).]
move((((((('notREL_' || 'R_LOST') || '(') || 1) || ',') || V0) || ')'),(((((('REL_' || 'R_WON') || '(') || 1) || ',') || V0) || ')')) :- RR_WON_+(1,V0).
move((((((('REL_' || 'R_WON') || '(') || 1) || ',') || V0) || ')'),(((((('EDB_' || 'R_LOST') || '(') || 1) || ',') || V0) || ')')) :- RR_WON_+(1,V0).
move((((('REL_' || 'Q_WON') || '(') || 1) || ')'),(((((('RULE_' || '0_LOST') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
move((((((('RULE_' || '0_LOST') || '(') || 1) || ',') || Y) || ')'),(((((('GOAL_' || '0_0_WON') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
move((((((('GOAL_' || '0_0_WON') || '(') || 1) || ',') || Y) || ')'),(((((('notREL_' || 'R_LOST') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
move((((((('notREL_' || 'R_LOST') || '(') || 1) || ',') || Y) || ')'),(((((('REL_' || 'R_WON') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
r0_WON_+(1,Y) :- r0_WON_+_nonlinked(1,Y).
RR_WON_+_nonlinked(1,V0) :- R(1,V0).
RQ_WON_+_nonlinked(1) :- r0_WON_+_nonlinked(1,Y).
RR_WON_+(1,V1) :- +r0_WON_+(1,V1),RR_WON_+_nonlinked(1,V1).
r0_WON_+_nonlinked(1,Y) :- +RR_WON_+_nonlinked(1,Y).
Questions?
• Boris – http://cs.iit.edu/~dbgroup/index.html
• Bertram – https://www.lis.illinois.edu/people/faculty/ludaesch
Relationship to (Constraint) Provenance Games
[Figure: provenance game graph for TwoHop(Chicago,Berlin), with nodes for rule derivations r7(…), goals g17(…) and g27(…), and Train / ¬Train tuples over Chicago, New York, Washington DC, Berlin, and Munich; shown with the corresponding constraint provenance game, whose TwoHop, R1, G11, G21, Train, and ¬Train nodes carry constraints such as x = CHI, y ≠ NYC, z = BER]