(long version: cs.unb.ca/~boley/talks/rulesobdaboley/talks/rulesobda.pdfmediator vs. warehouse...

105
Department of Informatics Colloquium, University of Zurich, Switzerland, April 10, 2014 11th Annual CS Research Exposition, UNB Fredericton, NB, May 2, 2014 Seminar, IBM Research, Zurich, Switzerland, June 23, 2014 FE Colloquium, WSL Birmensdorf, Switzerland, June 30, 2014 Kolloquium Informatik / Semantic Web Meetup, Freie Universität Berlin, Germany, July 4, 2014 Computational Logic Seminar - MUGS, Stanford Logic Group, Stanford University, CA, October 22, 2014 Seminar, Nuance Communications, Sunnyvale, CA, June 24, 2015 Seminar, Department of Computer Science, Stony Brook University, February 4, 2016 (Long version: cs.unb.ca/~boley/talks/RulesOBDA.pdf)

Upload: others

Post on 17-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Department of Informatics Colloquium, University of Zurich, Switzerland, April 10, 2014

11th Annual CS Research Exposition, UNB Fredericton, NB, May 2, 2014

Seminar, IBM Research, Zurich, Switzerland, June 23, 2014

FE Colloquium, WSL Birmensdorf, Switzerland, June 30, 2014

Kolloquium Informatik / Semantic Web Meetup, Freie Universität Berlin, Germany, July 4, 2014

Computational Logic Seminar - MUGS, Stanford Logic Group, Stanford University, CA, October 22, 2014

Seminar, Nuance Communications, Sunnyvale, CA, June 24, 2015

Seminar, Department of Computer Science, Stony Brook University, February 4, 2016

(Long version: cs.unb.ca/~boley/talks/RulesOBDA.pdf)

Page 2: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Abstract Ontology-Based Data Access (OBDA) enables automated reasoning over an ontology as a generalized global schema for the data in local (e.g., relational or graph) databases reachable through mappings. OBDA can semantically validate, enrich, and integrate heterogeneous data sources. Motivated by rule-ontology synergies, this talk discusses key concepts of OBDA and their foundation in three kinds of (logical) 'if-then' rules, using examples from the DeltaForest case study on the susceptibility of forests to climate change. (1) A query is a special Datalog rule whose conjunctive body can be rewritten (see 2) and unfolded (see 3), and whose n-ary head predicate instantiates the distinguished answer variables of the body predicates. OBDA ontologies beyond RDF Schema (RDFS) expressivity usually permit negative constraints for data validation, which are translated to Boolean conjunctive queries corresponding to integrity rules. (2) The OBDA ontology supports query rewriting and database materialization through global-schema-level reasoning. It usually includes the expressivity of RDFS, whose class and property subsumptions can be seen as single-premise Datalog rules with, respectively, unary and binary predicates, and whose remaining axioms are also definable by rules. OBDA ontologies often extend RDFS to the description logic DL-Lite (as in OWL 2 QL), including subsumption axioms that correspond to (head-)existential rules. Recent work has also explored Rule-Based Data Access, e.g. via Description Logic Programs (as in OWL 2 RL, definable in RIF-Core), Datalog+/-, and Disjunctive Datalog. (3) OBDA data integration is centered on Global-As-View (GAV) mappings, which are safe Datalog rules allowing query unfolding of each global head predicate into a conjunction of local body predicates. These (heterogeneous) conjunctive queries can be further mapped to the database languages of the sources (e.g., to SQL or SPARQL). Conversely, the same GAV mappings allow database folding. The talk develops a unified architecture for (1) to (3).

1

Page 3: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

What is Knowledge-Based Data Access? KBDA applies AI / Semantic Technologies to databases

Founded on (logical) knowledge representation in, e.g., description logics or Datalog / deductive databases, query answering/optimization, and data integration/federation (e.g., Relational Logic Approach), with possible focus on:

Ontology-Based Data Access (OBDA)

Rule-Based Data Access (RBDA)

Knowledge Base (TBox, rule base) as generalized global schema for data in local e.g., relational or graph DBs (ABoxes, fact bases)

KB module amplifies data storage & query execution of distributed, heterogeneous (No)SQL DBs

Provides multi-purpose knowledge level for data

2

Explore synergies!

Page 4: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Global schema

Local schemas

Preview: Unified KBDA Architecture (for Bidirectional QueryData Flow)

3

Rewriting

KB

Q Q’

Folding Unfolding

DB1

. . .

Q1’’

Q2’’

Qn’’

DB2

DBn

DB’ Materialization

DB

Mappings

Page 5: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Why Knowledge-Based Data Access? Domain knowledge utilized to deal with data torrent

Domain experts conceptually unfold queries / fold data

via Mappings defined with IT (SQL, SPARQL, …) experts

User concepts are captured in Knowledge Base for

domain-enriched database querying / materialization

without IT experts (queries also for data integrity)

Engines use KB to deduce answers implicit in DBs

Analytics enabled by queries exploring hypotheses

Enrichment of KB with verified hypotheses

KB as major organizational resource also for, e.g.:

Schema-level query answering (even without DBs) 4

Page 6: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

KB Contains Formal Knowledge as Ontologies and/or Rules

FormalKnowledge

OntologyKnowledge RuleKnowledge

FactKnowledge/Data TaxonomyKnowledge

5

Page 7: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Data Access as Data Integration, Querying, and Management

Data Integration: “read[+write]” combining local (distributed, heterogeneous) data sources under a global schema

Data Querying: “read-only” retrieving & inferring answers from integrated data sources

Data Management: “read+write” querying & updating data sources (including with asserts and retracts)

6

Data Access conceived in layers:

Page 8: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Mediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons

Mediator: KB-enriched queries via Query Rewriting Less space in time/space trade-off ( more expansion, not execution time) Databases are left in original form (for big volume, variety, velocity) KB/DB separation makes all optimizations available from DB engines Repeated transformations for repeated queries ( workaround: caching) Partial views ( workaround: use maximally general, non-ground queries)

Warehouse: KB-enriched sources via Database Materialization Less time in DB time/space trade-off ( more space) Queries are left in original form Synchronization needed between DB original and copy Metadata often lost during DB transformation Total view (staging area for broad analytics)

7

Page 9: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

8

Knowledge-Based Data Access, Using Strategy: 3-Dimensional KBDAs

Integration Querying Management

Access

Ontology

Rule

Kn

ow

led

ge

warehouse

mediator

< <

<

bidirectional

Page 10: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

9

OBDIwarehouse

Three Dimensions of KBDAs: O,I,w

Integration

O

n

t

o

l

o

g

y

-Based Data

Page 11: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

10

OBDQwarehouse O

n

t

o

l

o

g

y

Querying -Based Data

Three Dimensions of KBDAs: O,Q,w

Page 12: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

11

OBDMwarehouse O

n

t

o

l

o

g

y

Management -Based Data

Three Dimensions of KBDAs: O,M,w

Page 13: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

12

RBDIwarehouse R

u

l

e

Integration -Based Data

Three Dimensions of KBDAs: R,I,w

Page 14: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

13

RBDQwarehouse R

u

l

e

Querying -Based Data

Three Dimensions of KBDAs: R,Q,w

Page 15: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

14

RBDMwarehouse R

u

l

e

Management -Based Data

Three Dimensions of KBDAs: R,M,w

Page 16: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

15

OBDImediator

Three Dimensions of KBDAs: O,I,m

Integration

O

n

t

o

l

o

g

y

-Based Data

Page 17: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

16

OBDQmediator O

n

t

o

l

o

g

y

Querying -Based Data

Three Dimensions of KBDAs: O,Q,m

Page 18: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

17

OBDMmediator O

n

t

o

l

o

g

y

Management -Based Data

Three Dimensions of KBDAs: O,M,m

Page 19: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

18

RBDImediator R

u

l

e

Integration -Based Data

Three Dimensions of KBDAs: R,I,m

Page 20: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

19

RBDQmediator R

u

l

e

Querying -Based Data

Three Dimensions of KBDAs: R,Q,m

Page 21: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

20

RBDMmediator R

u

l

e

Management -Based Data

Three Dimensions of KBDAs: R,M,m

Page 22: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

21

KBDAbidirectional K

n

o

w

l

e

d

g

e

Access -Based Data

Three Dimensions of KBDAs: b

Page 23: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

22

RBDQbidirectional R

u

l

e

Querying -Based Data

Three Dimensions of KBDAs: R,Q,b

Page 24: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

RBDA Realizes Uniform KBDA 1 of 3: Queries as Rules

1. a) A conjunctive query is a special Datalog rule whose body can be rewritten (see 2.) and unfolded (see 3.), and whose head instantiates the distinguished answer variables of the body b) KBDA ontologies beyond RDF Schema (RDFS) often permit Boolean conjunctive queries corresponding to integrity rules

2. …

3. …

23

Page 26: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

RBDA Realizes Uniform KBDA 3 of 3: Mappings as Rules

1. …

2. …

3. KBDA data integration is centered on

Global-As-View (GAV) mappings,

which are Datalog rules for, e.g., unfolding

each global head predicate to (a join, i.e.

conjunction, of) local body predicates

25

Page 27: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

RBDA Realizes Uniform KBDA All of: Queries, KBs, and Mappings as Rules

1. a) A conjunctive query is a special Datalog rule whose body can be rewritten (see 2.) and unfolded (see 3.), and whose head instantiates the distinguished answer variables of the body b) KBDA ontologies beyond RDF Schema (RDFS) often permit Boolean conjunctive queries corresponding to integrity rules

2. KBDA KB supports, e.g., query rewriting through global-schema-level reasoning, including with RDFS taxonomies or Datalog rule axioms, and DL-Lite (OWL 2 QL) or (head-)existential rules; KBDA rules also permit Description Logic Programs (OWL 2 RL), Datalog, and Disjunctive Datalog. [Semantics of ontology languages customizable for expressivity and efficiency requirements by adding/deleting rules (SPIN)]

3. KBDA data integration is centered on Global-As-View (GAV) mappings, which are Datalog rules for, e.g., unfolding each global head predicate to (a join, i.e. conjunction, of) local body predicates

26 This uniformity for mediators also holds for warehouses and bidirectionally

Page 28: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

27

Example: Forest/Orchard Knowledge

Page 29: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

EntityWithTree KB: Named Root Class (1)

28

EntityContainingAtLeastOneTree

Forest Orchard

EntityContainingAtLeastOneTree Forest. EntityContainingAtLeastOneTree Orchard. Forest Woodland.

Subsumption axioms (in description-logic syntax):

Woodland

“ ” is taxonomy-style

(description logic)

‘subsumes’ infix

Root of taxonomy tree of

tree-containing entities to

see the forest for the trees

Page 30: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

EntityWithTree KB: Named Root Class (2)

29

EntityContainingAtLeastOneTree

Forest Orchard

EntityContainingAtLeastOneTree Forest. EntityContainingAtLeastOneTree Orchard. Forest Woodland.

Subsumption axioms (in higher-order rule syntax):

Woodland

“ ” is rule-style

‘is implied by’ infix

Root of taxonomy tree of

tree-containing entities to

see the forest for the trees

Page 31: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

EntityContainingAtLeastOneTree

Forest Orchard

EntityContainingAtLeastOneTree :- Forest. EntityContainingAtLeastOneTree :- Orchard. Forest :- Woodland.

Subsumption axioms (in higher-order rule syntax):

Woodland

Root of taxonomy tree of

tree-containing entities to

see the forest for the trees

“ ” is Prolog-style

‘if’ infix

:-

EntityWithTree KB: Named Root Class (3)

“ ” is often used for logical implication rules. “ ” is often used for logic programming rules,

with premises allowing Negation as failure (Naf). While we will not need Naf here, it can be mixed in

(e.g., as in the Relational Logic Approach and Naf Hornlog RuleML)

:-

30

Page 32: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

EntityWithTree KB: Constructed Root Class

31

contains.Tree

contains.Tree :- Forest. contains.Tree :- Orchard. Forest :- Woodland.

Subsumption axioms (in higher-order rule-infix syntax):

Forest Orchard

Woodland

Entities each having a

(multi-valued) contains property with

at least one value in class Tree

Cf. ontology-style (description logic) axioms:

contains.Tree Forest

contains.Tree Orchard

C P is often written as P C. C P is often written as P C.

C P is normally written only in this direction (in Prolog, Datalog, etc.)

:-

Page 33: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

32

RBDQmediator R

u

l

e

Querying -Based Data

Three Dimensions of KBDAs: R,Q,m

Page 34: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Query Rewriting and Unfolding

33

Mediator strategy uses: KB to rewrite Q to Q’ and Mappings (rules) to unfold Q’ to Qi

’’ KB can be ontology, e.g. in OWL 2 QL (DL-Lite), or rules Abstract (relational/graph/…) queries Qi

’’ -grounded (to SQL/SPARQL/…) for DBi Each (relational/graph/…) database DBi left as original; answers at

Rewriting

KB

Q Folding Unfolding

Mappings DB1

. . .

Q1’’

Q2’’

Qn’’

DB2

DBn

Q’

(a)

Two ‘backward’ transformations

Page 35: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Global schema

Local schemas

Query Rewriting and Unfolding

34

Rewriting

KB

Q Folding Unfolding

Mappings DB1

. . .

Q1’’

Q2’’

Qn’’

DB2

DBn

Q’

(a)

Mediator strategy uses: KB to rewrite Q to Q’ and Mappings (rules) to unfold Q’ to Qi

’’ KB can be ontology, e.g. in OWL 2 QL (DL-Lite), or rules Abstract (relational/graph/…) queries Qi

’’ -grounded (to SQL/SPARQL/…) for DBi Each (relational/graph/…) database DBi left as original; answers at

Page 36: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Query Rewriting Use of KB In Information Retrieval:

Query expansion With increased recall

Without loss of precision

From Logic Programming (for Horn expressivity): Can use resolution method for KB-enrichment of a given (conjunctive) query with expanded (conjunctive) queries so that, for any DB, the answers to the enriched queries no longer using the KB are the same as the answers to the original query using the KB

35

Intuitive idea of query rewriting

Page 37: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Ontology-to-Rule Clausification of EntityWithTree KB Omitting Orchard

36

contains.Tree :- Forest. Forest :- Woodland.

(contains(?x s(?x)) Tree(s(?x))) :- Forest(?x). Forest(?x) :- Woodland(?x).

contains(?x s(?x)) :- Forest(?x). Tree(s(?x)) :- Forest(?x). Forest(?x) :- Woodland(?x).

Description Logic subsumptions (as higher-order rules):

Horn Logic rules (the first with conjunctive head):

Horn Logic rules (the first head split into two conjuncts):

s is fresh Skolem function symbol

Higher-order syntactic sugar for existential rule:

?y(contains(?x ?y) Tree(?y)) :- Forest(?x).

Page 38: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

KB Rules Perform Rewriting of Given Query

37

q(?z) :- contains(?z ?y) Tree(?y).

Rewriting Datalog query rule to obtain extra query rules:

contains(?x s(?x)) :- Forest(?x). Tree(s(?x)) :- Forest(?x). Forest(?x) :- Woodland(?x).

KB with Horn rules (from above) for rewriting of query rules:

q(?z):-contains(?z s(?x))Forest(?x)

q(?z) :- Forest(?z) Forest(?z). q(?z) :- Forest(?z).

q(?z):-Forest(?z)Tree(s(?z))

q(?z) :- Woodland(?z).

Q: Given

Expansion Q’: Given Expansion

Page 39: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Query Unfolding Use of Mappings to Original Database Sources Datalog rules bridging between:

KB

Distributed DBs

Use partial deduction-like unfolding (and simplification) of (conjunctive) KB queries to (conjunctive) abstract DB queries Abstract relational queries grounded to SQL,

abstract graph queries grounded to SPARQL, etc.

Lower-level optimization and execution by SQL, SPARQL, etc. engines

Generated queries distributed over multiple DBs as indicated by “source.” name prefixes

38

Intuitive idea of query unfolding

Page 40: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Sample Mapping Rules to Three Local Data Sources

39

contains(?x ?y) :- locDB.cnt(?x ?kind ?y). contains(?x ?y) :- regionDB.sub(?x ?r) locDB.cnt(?r ?kind ?y).

Forest(?x) :- ecoDB.Habitat(?plot "forest" ?size ?x).

Woodland(?x) :- ecoDB.Habitat(?plot "wood" ?size ?x).

Tree(?t) :- locDB.cnt(?plot "tree" ?t). Tree(?t) :- ecoDB.Plant(?plot "tree" ?size ?t).

Map KB predicates to locDB/regionDB tables for geo data:

Map KB predicates to locDB/ecoDB tables for forestry data:

Page 41: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Mapping Rules Perform Unfolding of Rewritten Queries

40

q(?z) :- contains(?z ?y) Tree(?y). Union of conjunctive queries as Datalog rules (rewritten):

q(?z) :- Forest(?z). q(?z) :- Woodland(?z).

q(?z) :- locDB.cnt(?z ?kind ?y) locDB.cnt(?plot "tree" ?y). q(?z) :- locDB.cnt(?z "tree" ?y). q(?z) :- locDB.cnt(?z ?kind ?y) ecoDB.Plant(?plot "tree" ?size ?y). q(?z) :- regionDB.sub(?z ?r) locDB.cnt(?r ?kind ?y) locDB.cnt(?plot "tree" ?y).

q(?z) :- regionDB.sub(?z ?r) locDB.cnt(?r "tree" ?y). q(?z) :- regionDB.sub(?z ?r)locDB.cnt(?r ?kind ?y)ecoDB.Plant(?plot "tree" ?size ?y).

Unfolding above queries via mappings from previous slide:

q(?z) :- ecoDB.Habitat(?plot "forest" ?size ?z).

q(?z) :- ecoDB.Habitat(?plot "wood" ?size ?z).

Simplification: Integrity rules (next slide)

Qi’’: Bold-faced (1 ≤ i ≤ 3)

Q’: Given Expansion

Page 42: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Local Data Source Consistency: Sample Integrity Rules (in 3 Forms)

(?x1 = ?x2 ?k1 = ?k2) :- locDB.cnt(?x1 ?k1 ?y) locDB.cnt(?x2 ?k2 ?y).

Datalog rule with equality (in conjunctive head) for functionality constraints over source.table:

Datalog rules with equality (head split into two conjuncts):

?x1 = ?x2 :- locDB.cnt(?x1 ?k1 ?y) locDB.cnt(?x2 ?k2 ?y). ?k1 = ?k2 :- locDB.cnt(?x1 ?k1 ?y) locDB.cnt(?x2 ?k2 ?y).

Same super-entity xi and kind ki if same sub-entity y

Datalog rules with inequality (in body) and falsity (in head):

⊥ :- locDB.cnt(?x1 ?k1 ?y) locDB.cnt(?x2 ?k2 ?y) ?x1 ≠ ?x2. ⊥ :- locDB.cnt(?x1 ?k1 ?y) locDB.cnt(?x2 ?k2 ?y) ?k1 ≠ ?k2.

Permits Boolean queries

Page 43: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Generalized Chaining-Mapping Rules

42

contains(?x ?y) :- locDB.cnt(?x ?kind ?y). contains(?x ?y) :- regionDB.sub(?x ?r) locDB.cnt(?r ?kind ?y).

Relational product of regionDB.sub and locDB.cnt (s. above):

contains(?x ?y) :- locDB.cnt(?x ?kind ?y). contains(?x ?y) :- regionDB.sub(?x ?r) contains(?r ?y).

Transitive closure of regionDB.sub based on locDB.cnt:

Recursive chaining

Page 44: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

43

RBDQwarehouse R

u

l

e

Querying -Based Data

Three Dimensions of KBDAs: R,Q,w

Page 45: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Database Materialization after Folding

44

KB

Q Folding Unfolding

Mappings DB1

. . .

DB2

DBn

DB’ Materialization

DB

(b)

• Warehouse strategy uses: Mappings (same rules as for unfolding) to fold DBi to DB and

KB to materialize Database DB to DB’ • KB can be ontology, e.g. in OWL 2 RL (DLP), or rules • Query is left as original; answers at solid triangular arrow head

Two ‘forward’ transformations

Page 46: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Global schema

Local schemas 45

KB

Q Folding Unfolding

Mappings DB1

. . .

DB2

DBn

DB’ Materialization

DB

(b)

• Warehouse strategy uses: Mappings (same rules as for unfolding) to fold DBi to DB and

KB to materialize Database DB to DB’ • KB can be ontology, e.g. in OWL 2 RL (DLP), or rules • Query is left as original; answers at solid triangular arrow head

Database Materialization after Folding

Page 47: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Database Folding Uses Mappings to Translate and Integrate Databases Datalog rules bridging between n local DBi and 1 global DB

Usually simpler than under mediator strategy

Materialization often done for higher-level DBi, e.g. with Prolog facts as relational DBi or RDF triples as graph DBi

Even if folding needs no table-to-facts or graph-to-triples translation, it still needs to do n-to-1 integration

For n=1 and higher-level DB1, no folding is needed at all (as assumed here)

46

DB = DB1

Page 48: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Database Materialization Use of KB In Database Systems:

Chase procedure

From Logic Programming (for Horn expressivity): Can use bottom-up/forward-chaining method and KB for enrichment of a given set of facts (DB) with derived facts

so that, for any query, the answers against the enriched DB’ no longer using the KB are the same as the answers against the original DB using the KB

47

Page 49: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

EntityWithTree (EWT) KB

48

contains.Tree

contains.Tree :- Forest. Forest :- Woodland.

Subsumptions (as higher-order rules):

Forest

Woodland

Page 50: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

49

contains.Tree

Data (as higher-order facts):

Forest

Woodland

e

f

w

contains.Tree(e). Forest(f). Woodland(w).

EWT DB

DB: Original

Page 51: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

EWT KB and DB: ‘Populated’ KB

contains.Tree

contains.Tree :- Forest. Forest :- Woodland.

Subsumptions and Data (as higher-order rules and facts):

Forest

Woodland

e

f

w

contains.Tree(e). Forest(f). Woodland(w).

50

Page 52: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Applying All KB Rules to All DB Facts (1)

contains.Tree

contains.Tree :- Forest. Forest :- Woodland.

Forest

Woodland

f

w

contains.Tree(e). contains.Tree(f). Forest(f). Forest(w). Woodland(w).

w

51

e f

Subsumptions and Data (as higher-order rules and facts):

Page 53: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Applying All KB Rules to All DB Facts (2)

contains.Tree

contains.Tree :- Forest. Forest :- Woodland.

Forest

Woodland

f

w

contains.Tree(e). contains.Tree(f). contains.Tree(w). Forest(f). Forest(w). Woodland(w).

w

52

e f w

Subsumptions and Data (as higher-order rules and facts):

Page 54: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Fixpoint with No New Rule-Derived Facts

contains.Tree

contains.Tree :- Forest. Forest :- Woodland.

Forest

Woodland

f

w

contains.Tree(e). contains.Tree(f). contains.Tree(w). Forest(f). Forest(w). Woodland(w).

w

53

e f w

Subsumptions and Data (as higher-order rules and facts):

Page 55: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Unpacking Conjunctions Implicit in Facts

contains.Tree

contains.Tree :- Forest. Forest :- Woodland.

Forest

Woodland

e

f

w

contains.Tree(e). contains(e s(e)). Tree(s(e)). contains.Tree(f). contains(f s(f)). Tree(s(f)). contains.Tree(w). contains(w s(w)). Tree(s(w)). Forest(f). Forest(w). Woodland(w).

f

w

w

54

s is fresh

Skolem function symbol

s(e) s(f) s(w)

con

tain

s

con

tains

con

tains

Tree

Subsumptions and Data (as higher-order rules and facts):

DB’: Materialized

Page 56: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Querying by Lookup in Materialized DB’

55

q4(?z) :- contains(?z ?y) Tree(?y).

Datalog rules for (conjunctive) queries, with answers:

q2(?z) :- Forest(?z).

q1(?z) :- Woodland(?z). ?z = w

?z = f, w

q3(?z) :- contains.Tree(?z). ?z = e, f, w

?z = e, f, w

Separate conjunctive queries qi:

After materialization, no need for

query rewriting with the same KB

Page 57: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

56

RBDQbidirectional R

u

l

e

Querying -Based Data

Three Dimensions of KBDAs: R,Q,b

Page 58: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Query Rewriting Along with Database Materialization after Folding (General)

57

• Bidirectional strategy (Mediator-Warehouse combination) • Both directions traversable concurrently (DB’ as sync point) • Answers at solid triangular arrow head

Rewriting

KB

Q Q’

Folding Unfolding

Mappings DB1

. . .

DB2

DBn

DB’ Materialization

DB

(c)

Page 59: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Global schema

Local schemas 58

• Bidirectional strategy (Mediator-Warehouse combination) • Both directions traversable concurrently (DB’ as sync point) • Answers at solid triangular arrow head

Rewriting

KB

Q Q’

Folding Unfolding

DB1

. . .

DB2

DBn

DB’ Materialization

DB

(c) Mappings

Query Rewriting Along with Database Materialization after Folding (General)

Page 60: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

59

• Bidirectional strategy for n=1: KBUpper disjoint from KBLower

(bipartitioning of KB)

(c1)

Rewriting Q Q’

Folding Unfolding

Mappings

DB1 DB’ Materialization

DB

KBUpper KBLower

Query Rewriting Along with Database Materialization after Folding (Simplified)

Page 61: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Global schema

Local schema 60

• Bidirectional strategy for n=1: KBUpper disjoint from KBLower

(bipartitioning of KB)

Mappings (c1)

Rewriting Q Q’

Folding Unfolding DB1 DB’

Materialization DB

KBUpper KBLower

Query Rewriting Along with Database Materialization after Folding (Simplified)

Page 62: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Partial Query Rewriting Using Part of Bipartitioned KB Can use resolution method

on part of KB for enrichment of a given (conjunctive) query with expanded (conjunctive) queries

so that, for any DB, the answers to the enriched queries only using the other (complement) part of the KB are the same as the answers to the original query using the original KB

In the following, KB = KBUpper KBLower will be bipartitioned into Upper and Lower part, here using KBUpper

61

+

Page 63: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Rule Clausification of Upper EWT KB

62

contains.Tree :- Forest.

(contains(?x s(?x)) Tree(s(?x))) :- Forest(?x).

contains(?x s(?x)) :- Forest(?x). Tree(s(?x)) :- Forest(?x).

Upper Description Logic subsumption (as higher-order rule):

Horn Logic rule (with conjunctive head):

Horn Logic rules (head split into two conjuncts):

s is fresh Skolem function symbol

Higher-order syntactic sugar for existential rule:

?y(contains(?x ?y) Tree(?y)) :- Forest(?x).

Page 64: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Upper KB Rules Perform Rewriting of Query

63

q(?z) :- contains(?z ?y) Tree(?y).

contains(?x s(?x)) :- Forest(?x). Tree(s(?x)) :- Forest(?x).

q(?z):-contains(?z s(?x))Forest(?x)

q(?z) :- Forest(?z) Forest(?z). q(?z) :- Forest(?z).

q(?z):-Forest(?z)Tree(s(?z))

Expansion

Q: Given

Q’: Given Expansion

Rewriting Datalog query rule to obtain extra query rule:

KB with Horn rules (from above) for rewriting of query rules:

Page 65: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Partial Database Materialization Using Part of Bipartitioned KB Can use bottom-up/forward-chaining method

on part of KB for enrichment of a given set of facts (DB) with derived facts so that, for any query, the answers against the enriched DB’ only using the other (complement) part of the KB are the same as the answers against the original DB using the original KB

In the following, KB will again be bipartitioned into Upper and Lower part, here using KBLower

64

Page 66: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Lower EWT KB and Unchanged DB

contains.Tree

Forest :- Woodland.

Lower Subsumption and original Data (as higher-order rule and facts):

Forest

Woodland

e

f

w

contains.Tree(e). Forest(f). Woodland(w).

65

DB = DB1: Original

Page 67: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Applying the KB Rule to All DB Facts

contains.Tree

Forest :- Woodland.

Forest

Woodland

f

w

contains.Tree(e). Forest(f). Forest(w). Woodland(w).

w

66

e

Lower Subsumption and original Data (as higher-order rule and facts):

Page 68: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

contains.Tree

Forest :- Woodland.

Forest

Woodland

f

w

contains.Tree(e). Forest(f). Forest(w). Woodland(w).

w

67

e

Fixpoint with No New Rule-Derived Facts Lower Subsumption and original Data (as higher-order rule and facts):

Page 69: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Unpacking Conjunction Implicit in Fact

contains.Tree

Forest :- Woodland.

Forest

Woodland

e

f

w

contains.Tree(e). contains(e s(e)). Tree(s(e)). Forest(f). Forest(w). Woodland(w).

w

68

s is fresh

Skolem function symbol

s(e)

con

tain

s

Tree DB’: Materialized

Lower Subsumption and original Data (as higher-order rule and facts):

Page 70: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Querying by Lookup in Materialized DB’ (cf. Slide “Upper KB Rules Perform Rewriting of Query”)

69

q(?z) :- contains(?z ?y) Tree(?y).

Datalog rules for (conjunctive) queries, with answers:

q(?z) :- Forest(?z). ?z = f, w

?z = e

Union of conjunctive queries q:

partial query rewriting

and partial materialization

using parts (Upper, Lower) of the same KB

?z = e, f, w Q’

Page 71: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

70

KBDAstrategy K

n

o

w

l

e

d

g

e

Access -Based Data

Three Dimensions of KBDAs

Page 72: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Query Rewriting and Unfolding: Recap

71

Rewriting

KB

Q Folding Unfolding

Mappings DB1

. . .

Q1’’

Q2’’

Qn’’

DB2

DBn

Q’

(a)

Page 73: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Database Materialization after Folding: Recap

72

KB

Q Folding Unfolding

Mappings DB1

. . .

DB2

DBn

DB’ Materialization

DB

(b)

Page 74: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Query Rewriting Along with Database Materialization after Folding: Interlude

73

Rewriting

KB

Q Q’

Folding Unfolding

Mappings DB1

. . .

DB2

DBn

DB’ Materialization

DB

(c)

Page 75: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Unified Architecture

74

Rewriting

KB

Q Q’

Folding Unfolding

Mappings DB1

. . .

Q1’’

Q2’’

Qn’’

DB2

DBn

DB’ Materialization

DB

Q’

(d)

• Combines strategies (a)-(c) of earlier slides • Meets the needs of Forest case study

Page 76: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Global schema

Local schemas

Unified Architecture

75

Rewriting

KB

Q Q’

Folding Unfolding

DB1

. . .

Q1’’

Q2’’

Qn’’

DB2

DBn

DB’ Materialization

DB

Q’

(d)

• Combines strategies (a)-(c) of earlier slides • Meets the needs of Forest case study

Mappings

Page 77: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Will permit (Web-based) interchange of at least the three kinds of rules in KBDAs: Sharing and Reuse of queries (+ integrity constraints),

KBs, and mappings can save repeated work

Language options include ISO: Prolog (extra-logical features),

Common Logic 2 (‘wild-west’ syntax)

W3C: OWL 2 RL (DLP: weak), RIF-BLD (Horn: no head-)

OMG: PRR (metamodel only), SBVR (mostly Controlled English)

RuleML: Deliberation RuleML (customizable expressivity)

Flora-2, Ergo: Rulelog (extends database expressivity) 76

Standard Language for KBDAs Rules

Page 78: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

System of 1.02

77

RuleML

Deliberation Reaction

ECAP KR HOL

FOL

Derivation

Hornlog

Datalog

Fact Query Trigger (EA)

CEP

subClassOf

Overlaps

Syntactic specialization of

ECA

Production (CA)

Consumer

Page 79: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

RuleML system covers a wide rule spectrum: Deliberation Family (1.02) … Reaction Family (1.02)

Rule condition part shared across the spectrum

Syntactic uniformity allows wide reuse (XML, JSON)

System now constitutes a deep language lattice

Major language inclusion path:

Deliberation HOL FOL Derivation Hornlog Datalog …

Naf mix-in customization of Hornlog RuleML (Naf Hornlog RuleML) leads to Logic Programs

78

RuleML System of Families of Languages

Page 80: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

RBDAs-Style KBDAs Architecture: Expressivity of Rule Systems

79

• The language of the global schema can be generalized from unary/binary (OBDAs) to n-ary predicates (RBDAs)

• When decidability of querying is not required, RBDAs

expressivity can be extended from Datalog, Datalog, and description logics to Datalog+, Horn logic, and FOL, as enabled by Deliberation RuleML 1.02, • Features customizable with the MYNG 1.02 GUI

• Moreover, Reaction RuleML 1.02 can express updates, as needed for KBDMs (Ontology-based Data Management)

Page 81: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

RBDAs-Style KBDAs Architecture: Uniformity via Rule Systems

80

• Rule-based style of Unified Architecture (earlier slide) • Presentation syntax (“:-”), serialization (“<RuleML>”),

and semantics approach (model theory) uniform from queries (+ integrity constraints) to KBs to mappings to abstract DBs

• Division of labor between KB rules and mapping rules can be modified without crossing paradigm boundaries • Allows KB- and mapping-directed normal forms

• Assumptions (unique-name and closed-world) of DBs and many rule systems made explicit by semantic styles

Page 82: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Forest: Study Overview RuleML provides a family of rule (incl. fact/data & query)

languages of customizable expressivity, a family-uniform XML format, and a suite of (MYNG) tools for processing

WSL creates knowledge and publishes data about Swiss forests, giving an integrated federal perspective on heterogeneous databases of various (geographically, thematically, ...) specialized sources

RuleML-WSL collaboration has brought RuleML technology to bear on WSL knowledge and DBs for RBDA

Prepared data sets, defined global schema and mappings, formalized knowledge, and specialized RBDA architecture in support of complex statistical data analysis

81

Page 84: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Forest: Three Local Data Sources Productivity Research Areas: Ertragskundeflächen (EKF)

83 areas Pure stands Approximate 10-year intervals (time series of different lengths) Data: .dbf/.csv files (ca. 100 MB) 3 tables

Natural Forest Reserves: Naturwaldreservate (NWR) 36 areas Pure and mixed stands 10-year intervals Data: .txt files (ca. 65 MB) 1 file per area

Long-term Forest Ecosystem Research: Langfristige Waldökosystemforschung (LWF) 18 areas Pure and mixed stands 5-year intervals (since 1995) Data: Oracle DB (ca. 500 GB) RDB tables

83

Page 85: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

84

Local schemas

EKF

vfl

FNUM BBG HA …

trees

FNUM BNR BA …

plots

F_ID NAME NINV…

wr

X F_ID B_ID …

LWF

plots

PLAC CLNR X …

trees

PLAC CLNR BANR…

Global schema

PlotsStatic

plot source x y altitude class

PlotsDynamic

plot year age dmg dg n

SGAbundance

plot species-group percentage

RelMortality

plot year1 year2 relmortality

dom

FNUM SPEC PC …

dom

F_ID SPEC PC ..

dom

FNUM SPEC PC ..

NWR

dyn

PLAC YEAR N...

dyn

F_ID YEAR N...

dyn

FNUM YEAR N...

DB’

Keys – three composite – in bold red

Forest: Schemas for Architecture (a)-(d)

Page 86: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Eligibility Criteria 1. Impact of forest management on tree mortality is

negligible at the investigated sites

NWR: impact is negligible by definition

EKF: impact of grades A, B is negligible (≠ C, D, H, P)

LWF: not recorded

2. Plots are statistically independent of each other

Plots that are located within a distance of 500 meters from each other are possibly dependent

3. Plots contain a time-averaged abundance greater than a threshold value for at least one of the target species

85

Page 87: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Questions Addressed 1. Are there sufficiently many eligible plots in order to

perform an analysis per main tree species?

2. Are there sufficiently many eligible plots in order to perform an analysis per main tree species and climatic region?

3. Which eligible plots represent pure tree stands and which eligible plots represent mixed tree stands?

86

Page 88: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

q(?plot) :- .EligiblePlot(?plot) Query .TreeStandKey(?id ?plot "oak") .TreeStandAbundance(?id ?pct) ?pct >= 15. Exists ?id (.TreeStandKey(?id ?plot ?sp) .TreeStandAbundance(?id ?pct)) :- KB Rule .TreeStandMerged(?plot ?sp ?pct).

q(?plot) :- .EligiblePlot(?plot) Rewritten

.TreeStandMerged(?plot "oak" ?pct) ?pct >= 15.

Query Rewriting

87

Ou

tpu

t In

pu

t

Page 89: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Query Unfolding

88

q2(?plot) :- .TreeStandMerged(?plot "oak" ?pct) Query 2 ?pct >= 15. .TreeStandMerged(?plot "oak" ?pct) :- EKF.dom(?plot "Quercus petraea" ?pct1) Mapping EKF.dom(?plot "Quercus robur" ?pct2) Rule ?pct = ?pct1 + ?pct2.

q2(?plot) :- EKF.dom(?plot "Quercus petraea" ?pct1) Unfolded EKF.dom(?plot "Quercus robur" ?pct2) ?pct1+?pct2 >= 15.

Ou

tpu

t In

pu

t

Page 90: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Conclusions Ontology-Based Data Access (OBDA) founded on three

kinds of rules: Query rules (including integrity rules), KB rules (for query rewriting and DB materialization), as well as mapping rules (for query unfolding and DB folding)

OBDA complemented by Rule-Based Data Access (RBDA) and generalized to Knowledge-Based Data Access (KBDA)

Specified an RBDA-uniform KBDAs architecture with unified mediator, warehouse, and bidirectional strategies

RuleML used for XML-serialized rules, MYNG-customized rule expressivity, and platform-independent RBDA

Introduced Forest specialization of RBDA architecture for statistical data analysis in ecosystem research at WSL

89

Page 91: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Future Work (1) Translate simplified presentation syntax into pre-released

XML serialization of Deliberation RuleML 1.02 / MYNG 1.02 Support implementations of specified architecture reusing

(open source) KBDA technology (cf. RBDA wiki page) For high-precision RBDQs language support, complement

current techniques of Datalog+ RuleML with Datalog RuleML using context-sensitive/semantic validators for “-” constraints

Evaluate (mediator/warehouse, relational/graph, …) trade-offs for KBDQs in PSOA RuleML as executed in PSOATransRun

Develop Forest study at WSL for extended and new data sources of big volume, variety, and velocity (e.g., about climate change)

Augment geospatial KBDQs mappings with Optique mapping (bootstrapping, repair, …) techniques

90

Page 92: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

Future Work (2) Compare engines for OBDQs and RBDQs, including HYDRA

and RDFox, w.r.t. expressivity and efficiency

Adapt Semantic Automated Discovery and Integration (SADI) test cases for KBDQs experiments in PSOA RuleML querying

Evaluate Abstract Logic-based Architecture Storage systems & Knowledge base Analysis (ALASKA) for RBDAs

Extend the KBDAs architecture with semantic annotation rules for (Deep) Web data extraction (Deep Web Mediator)

Use Grailog for KBDAs data and knowledge visualization

Explore synergies between the logical KBDAs approach with statistical approaches, e.g. from Statistical Relational AI

91

Page 93: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

BACKUP Slides

92

Page 94: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

What Kinds of Jeopardy! Queries (from IBM

Watson’s 2011 First Prize Game) are Addressed Here? Queries that are non-metaphorical

“To push one of these paper products is to stretch established limits” (“the envelope”, Brad Rutter)

Queries not asking for a logical predicate/relation

“To bring back someone to his original function or position” (“reinstate”, Ken Jennings)

Queries asking for a logical individual constant

“It's New Zealand's second-largest city” (“Christchurch”, IBM Watson)

93

Page 95: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

What Logic Do the Addressed Jeopardy! Queries Adhere to? Function-free Horn logic / Pure Prolog (Datalog)

Jeopardy! “It's New Zealand's second-largest city”

Paraphrase (with a suggestive Wikipedia link) “Which city is New Zealand's second largest urban area w.r.t. population ranking?”

Formalization (as a query rule with variable ?z) q(?z) :- city(?z) urban_pop_rank(New_Zealand 2 ?z).

94

Page 96: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

KBDAs (e.g. OBDQmediator) Mappings

Can be

relational-to-relational

relational-to-graph (e.g., SQL to SPARQL)

graph-to-relational

graph-to-graph

95

Page 97: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

96

Following slides taken from InteropGraphRel

Page 98: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

97

Page 99: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

98

Page 100: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

99

Page 101: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

100

Page 102: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

101

Page 103: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

102

Page 104: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

103

Page 105: (Long version: cs.unb.ca/~boley/talks/RulesOBDAboley/talks/RulesOBDA.pdfMediator vs. Warehouse Strategy in KBDA Architectures Pros & Cons Mediator: KB-enriched queries via Query Rewriting

104