dl'12 mastro at work

49
Mastro at Work: Experiences on Ontology-based Data Access Domenico Fabio Savo 1 , Domenico Lembo 1 , Maurizio Lenzerini 1 , Antonella Poggi 1 , Mariano Rodriguez-Muro 2 , Vittorio Romagnoli 3 , Marco Ruzzi 1 , Gabriele Stella 3 1 Sapienza Universit` a di Roma lastname @dis.uniroma1.it 2 Free University of Bozen-Bolzano [email protected] 3 Banca Monte dei Paschi di Siena firstname.lastname @banca.mps.it May, 2010 Mastro at Work Savo et. al.

Upload: mariano-rodriguez

Post on 01-Nov-2014

207 views

Category:

Education


2 download

DESCRIPTION

A presentation on my early work on the Mastro system. Some of this research is now part of the ontop system, some evolved into more optimised forms (also in ontop).

TRANSCRIPT

Page 1: DL'12 mastro at work

Mastro at Work: Experiences onOntology-based Data Access

Domenico Fabio Savo1, Domenico Lembo1,Maurizio Lenzerini1, Antonella Poggi1,

Mariano Rodriguez-Muro2, Vittorio Romagnoli3,Marco Ruzzi1, Gabriele Stella3

1 Sapienza Universitadi Roma

[email protected]

2 Free University ofBozen-Bolzano

[email protected] Banca Monte dei

Paschi di Siena

[email protected]

May, 2010Mastro at Work Savo et. al.

Page 2: DL'12 mastro at work

Motivations

DL-Lite OBDA framework

OBDA

Integrated view, semantically richdescription, mapping for concep-tual level and data sources. Ex-ploiting reasoning to overcome in-completeness

Data SourceData Source

Data SourceData Layer

Ontology Semantic Layer

Queries

Mappings

Mastro at Work Savo et. al.

Page 3: DL'12 mastro at work

Motivations

DL-Lite OBDA framework

DL-Lite framework for OBDA

Components:

• A family of OntologyLanguages: DL-Lite.

• A mapping technique forrelational databases:Virtual ABoxes

• Promising proposal.

• However, never evaluated in‘the field’.

Data SourceData Source

Data SourceData Layer

Ontology Semantic Layer

Queries

Mappings

Mastro at Work Savo et. al.

Page 4: DL'12 mastro at work

Motivations

The domain

• Joint project on OBDA by Banca Monte dei Paschi diSiena (MPS), Free University of Bozen-Bolzano, andSAPIENZA Universita di Roma.

• Clusters of Connected Customers (CCCs)

• Data is used in risk estimation in the process of grantingcredit to bank customers

Mastro at Work Savo et. al.

Page 5: DL'12 mastro at work

Motivations

Problems and Solutions

• management is now completely entrusted to the expertof the applications rather than to the domain experts.

• OBDA has been then used for answering queries posed overthe CCCs ontology, not only aimed at easily extractingrelevant information but also to localize inconsistenciesand incompleteness in the data, as well as to devise newdata governance tasks.

Mastro at Work Savo et. al.

Page 6: DL'12 mastro at work

Motivations

Problems and Solutions

• management is now completely entrusted to the expertof the applications rather than to the domain experts.

• OBDA has been then used for answering queries posed overthe CCCs ontology, not only aimed at easily extractingrelevant information but also to localize inconsistenciesand incompleteness in the data, as well as to devise newdata governance tasks.

Mastro at Work Savo et. al.

Page 7: DL'12 mastro at work

Systems

Mastro at Work Savo et. al.

Page 8: DL'12 mastro at work

Mastro

The Mastro-OBDA plugin

A DL-Lite reasoner for the OBDA context that is able to take anontology with with mappings to a relational database (defining a‘virtual Abox’) in order to provide the following services:

Features

• Conjunctive Query Answering

• Epistemic Query Answering (EQL)

• Identification Constraints

• Epistemic Constraints

Mastro at Work Savo et. al.

Page 9: DL'12 mastro at work

Mastro

The Mastro-OBDA plugin

A DL-Lite reasoner for the OBDA context that is able to take anontology with with mappings to a relational database (defining a‘virtual Abox’) in order to provide the following services:

Features

• Conjunctive Query Answering

• Epistemic Query Answering (EQL)

• Identification Constraints

• Epistemic Constraints

Mastro at Work Savo et. al.

Page 10: DL'12 mastro at work

Protege, OBDA and Mastro plugins

Protege 4 and the OBDA Plugin

Features

• Ontology definition

• Datasource and mappingdefinition

• Interaction withOBDA-reasoner (CQs,Epistemic queries, etc.)

Mastro at Work Savo et. al.

Page 11: DL'12 mastro at work

Case Study

Mastro at Work Savo et. al.

Page 12: DL'12 mastro at work

MPS

Methodology

• Developed the Ontology independently from the source

• Tools used:• interviews• questionnaires• existing documentation

• Developed over a period of 6 months

Mastro at Work Savo et. al.

Page 13: DL'12 mastro at work

Ontology

Excerpt of the Ontology

∃inGrouping v Customer

∃inGrouping− v Grouping∃relativeTo v Grouping

∃relativeTo− v CCC

Grouping v ∃inGrouping−

Grouping v ∃relativeTo(functional relativeTo)

(functional inGrouping−)Grouping v δ(timestamp)

JuridicalCCC v CCCJuridicalCCC v δ(timestamp)

∃inMembership v Customer

∃inMembership− v Membership∃hasMembership v CompanyGroup

∃hasMembership− v Memberhip

∃Membership v ∃inMembership−

Memberhip v ∃hasMembership−

(functional inMembership−)(functional hasMembership)

Holding v MembershipMembership v δ(timestamp)

CompanyGroup v δ(id code)

79 concepts, 33 roles, 37concept attributes, 600DL-LiteA,Id axioms

Mastro at Work Savo et. al.

Page 14: DL'12 mastro at work

Constraints

IDCs to impose complex business constraints

(id JuridicalCCC timestamp, relativeTo−

◦ inGrouping− ◦ inMembership ◦ ?Holding◦ hasMembership−)

• At the same time two juridical CCCs cannot comprisecustomers that are lead members, i.e., are the holdings, of thesame company group.

A total of 30 Identification Constraints

Mastro at Work Savo et. al.

Page 15: DL'12 mastro at work

Constraints

IDCs to impose complex business constraints

(id JuridicalCCC timestamp, relativeTo−

◦ inGrouping− ◦ inMembership ◦ ?Holding◦ hasMembership−)

• At the same time two juridical CCCs cannot comprisecustomers that are lead members, i.e., are the holdings, of thesame company group.

A total of 30 Identification Constraints

Mastro at Work Savo et. al.

Page 16: DL'12 mastro at work

Constraints

EQLCs to impose complex business constraints

EQLC( verify not exists (

SELECT jurCCC.jccc

FROM sparqltable(SELECT ?jccc

WHERE{ ?jccc rdf:type ’JuridicalCCC’ })jurCCCWHERE jurCCC.jccc NOT IN (

SELECT withGroupLeader.jccc

FROM sparqltable(SELECT ?jccc, ?mem

WHERE{ ?cus rdf:type ’Customer’.

?cus :inMembership ?mem.?mem rdf:type ’Holding’.

?cus :inGrouping ?gr. ?gr :relativeTo ?jccc.

?jccc rdf:type ’JuridicalCCC’}) withGroupLeader ) ) )

• There does not exist a juridical CCC that does not comprise acustomer which is the holding member of a company group

A total of 27 Epistemic Constraint

Mastro at Work Savo et. al.

Page 17: DL'12 mastro at work

Constraints

EQLCs to impose complex business constraints

EQLC( verify not exists (

SELECT jurCCC.jccc

FROM sparqltable(SELECT ?jccc

WHERE{ ?jccc rdf:type ’JuridicalCCC’ })jurCCCWHERE jurCCC.jccc NOT IN (

SELECT withGroupLeader.jccc

FROM sparqltable(SELECT ?jccc, ?mem

WHERE{ ?cus rdf:type ’Customer’.

?cus :inMembership ?mem.?mem rdf:type ’Holding’.

?cus :inGrouping ?gr. ?gr :relativeTo ?jccc.

?jccc rdf:type ’JuridicalCCC’}) withGroupLeader ) ) )

• There does not exist a juridical CCC that does not comprise acustomer which is the holding member of a company group

A total of 27 Epistemic Constraint

Mastro at Work Savo et. al.

Page 18: DL'12 mastro at work

OBDA Mappings

The Data Source

• Currently, MPS applications managing CCCs rely over a 15million tuple database, stored in 12 relational tables underIBM DB2 RDBMS

Source name Source Description Source sizeGZ0001 Data on customers 3.463.083GZ0002 Data on juridical connections between customers 157.280GZ0003 Data on guarantee connection between customers 1.270.333GZ0004 Data on economical connections between customers 104.033GZ0005 Data on corporation connections between customers 1.021.779GZ0006 Data on patrimonial connections between customers 809.321GZ0007 Data on company groups 55.362GZ0012 Customers loan information 5.966.948GZ0015 Data on monitoring and reporting procedures 1.243GZ0101 Data on membership of customers into CCCs 2.225.466GZ0102 Information on CCCs 663.656GZ0104 Data on bank credit coordinators for juridical CCCs 38.457

Mastro at Work Savo et. al.

Page 19: DL'12 mastro at work

OBDA Mappings

OBDA Mappings: Example

SELECT id cluster, timestamp val FROM GZ0102, GZ0007

WHERE GZ0102.validity code = ‘T’ AND GZ0102.id cluster <> 0

AND GZ0007.validity code = ‘T’ AND GZ0007.id group <> 0

AND GZ0102.id cluster = GZ0007.id group

JuridicalCCC(ccc(id cluster, timestamp val)),timestamp(ccc(id cluster, timestamp val), timestamp val)

If the tuple (243, 24052009112341) is in ans(body) the we havethe following Virtual ABox assertions:

JuridicalCCC(gcc(243, 24052009112341))timestamp(gcc(243, 24052009112341)

Mastro at Work Savo et. al.

Page 20: DL'12 mastro at work

OBDA Mappings

OBDA Mappings: Example

SELECT id cluster, timestamp val FROM GZ0102, GZ0007

WHERE GZ0102.validity code = ‘T’ AND GZ0102.id cluster <> 0

AND GZ0007.validity code = ‘T’ AND GZ0007.id group <> 0

AND GZ0102.id cluster = GZ0007.id group

JuridicalCCC(ccc(id cluster, timestamp val)),timestamp(ccc(id cluster, timestamp val), timestamp val)

If the tuple (243, 24052009112341) is in ans(body) the we havethe following Virtual ABox assertions:

JuridicalCCC(gcc(243, 24052009112341))timestamp(gcc(243, 24052009112341)

Mastro at Work Savo et. al.

Page 21: DL'12 mastro at work

Experimentation Ontology usage

Mastro at Work Savo et. al.

Page 22: DL'12 mastro at work

Ontology usage

Verifying incompleteness in the data through queryanswering

Incompleteness of the data

Querying the database directly vs. querying the ontology providesmore answers.

• To retrieve the identification codes of all company groups.DB operations use id code from GZ0007

• Asking for q(y)← CompanyGroup(x), id code(x , y)

• Mastro indicates that GZ0007 is not the only relevant table.

Mastro at Work Savo et. al.

Page 23: DL'12 mastro at work

Ontology usage

Verifying inconsistencies in the data through queryanswering

Inconsistency of the data

Using epistemic query answering to locate inconsistent tuples.

• (functional ingrouping−)• We can detect the violating tuples using:

SELECT testview.l, testview.c1, testview.c2

FROM sparqltable (SELECT ?l ?c1 ?c2

WHERE{?c1:inGrouping?l. ?c2:inGrouping?l}) testview

WHERE testview.c1 <> testview.c2

Mastro at Work Savo et. al.

Page 24: DL'12 mastro at work

Query structure

Evaluation Performance

Mastro at Work Savo et. al.

Page 25: DL'12 mastro at work

Query structure

Query Performance

Query answering in DL-Lite for OBDA in a nutshell

• Reformulate w.r.t. T• Unfold w.r.t. M• Evaluate

Sources of complexity

• Reformulation - Size of the reformulation

• Unfolding - Size of the unfolding and query structure

Most critical aspect in the MPS scenario: query structure.

Mastro at Work Savo et. al.

Page 26: DL'12 mastro at work

Query structure

Query Performance

Query answering in DL-Lite for OBDA in a nutshell

• Reformulate w.r.t. T• Unfold w.r.t. M• Evaluate

Sources of complexity

• Reformulation - Size of the reformulation

• Unfolding - Size of the unfolding and query structure

Most critical aspect in the MPS scenario: query structure.

Mastro at Work Savo et. al.

Page 27: DL'12 mastro at work

Query structure

Query Performance

Query answering in DL-Lite for OBDA in a nutshell

• Reformulate w.r.t. T• Unfold w.r.t. M• Evaluate

Sources of complexity

• Reformulation - Size of the reformulation

• Unfolding - Size of the unfolding and query structure

Most critical aspect in the MPS scenario: query structure.

Mastro at Work Savo et. al.

Page 28: DL'12 mastro at work

Query structure

Query Performance

Query answering in DL-Lite for OBDA in a nutshell

• Reformulate w.r.t. T• Unfold w.r.t. M• Evaluate

Sources of complexity

• Reformulation - Size of the reformulation

• Unfolding - Size of the unfolding and query structure

Most critical aspect in the MPS scenario: query structure.

Mastro at Work Savo et. al.

Page 29: DL'12 mastro at work

Query structure

Query Structure

In Mastro, query unfolding is done by means of partial evaluationand SQL views.

Given a Virtual Abox defined by DB, the mappings M and a queryQ to be evaluated we:

• Define a set of auxiliary predicates and SQL views

• Associate these to T by means of a logic program P• Compute the partial evaluation of Q with respect to P• Translate the PE into SQL by means of the views.

Mastro at Work Savo et. al.

Page 30: DL'12 mastro at work

Query structure

Query Structure

In Mastro, query unfolding is done by means of partial evaluationand SQL views.

Given a Virtual Abox defined by DB, the mappings M and a queryQ to be evaluated we:

• Define a set of auxiliary predicates and SQL views

• Associate these to T by means of a logic program P• Compute the partial evaluation of Q with respect to P• Translate the PE into SQL by means of the views.

Mastro at Work Savo et. al.

Page 31: DL'12 mastro at work

Query structure

Query Structure

In Mastro, query unfolding is done by means of partial evaluationand SQL views.

Given a Virtual Abox defined by DB, the mappings M and a queryQ to be evaluated we:

• Define a set of auxiliary predicates and SQL views

• Associate these to T by means of a logic program P

• Compute the partial evaluation of Q with respect to P• Translate the PE into SQL by means of the views.

Mastro at Work Savo et. al.

Page 32: DL'12 mastro at work

Query structure

Query Structure

In Mastro, query unfolding is done by means of partial evaluationand SQL views.

Given a Virtual Abox defined by DB, the mappings M and a queryQ to be evaluated we:

• Define a set of auxiliary predicates and SQL views

• Associate these to T by means of a logic program P• Compute the partial evaluation of Q with respect to P

• Translate the PE into SQL by means of the views.

Mastro at Work Savo et. al.

Page 33: DL'12 mastro at work

Query structure

Query Structure

In Mastro, query unfolding is done by means of partial evaluationand SQL views.

Given a Virtual Abox defined by DB, the mappings M and a queryQ to be evaluated we:

• Define a set of auxiliary predicates and SQL views

• Associate these to T by means of a logic program P• Compute the partial evaluation of Q with respect to P• Translate the PE into SQL by means of the views.

Mastro at Work Savo et. al.

Page 34: DL'12 mastro at work

Query structure

T -viewsExample:The mappings

m1: SELECT .... WHERE cd tp = 503 ; linkedTo(cus(idcus), link(linkid))m2: SELECT .... WHERE cd tp = 501 ; linkedTo(cus(idcus), link(linkid))

The view for AuxlinkedTo

SELECT ‘cus(’||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 503) view\_m1

UNION

SELECT ‘cus’(||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 501) view\_m2

Mastro at Work Savo et. al.

Page 35: DL'12 mastro at work

Query structure

T -viewsExample:The mappings

m1: SELECT .... WHERE cd tp = 503 ; linkedTo(cus(idcus), link(linkid))m2: SELECT .... WHERE cd tp = 501 ; linkedTo(cus(idcus), link(linkid))

The view for AuxlinkedTo

SELECT ‘cus(’||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 503) view\_m1

UNION

SELECT ‘cus’(||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 501) view\_m2

Mastro at Work Savo et. al.

Page 36: DL'12 mastro at work

Query structure

T -views, unfoldingProgram

linkedTo(x , y)← AuxlinkedTo(x , y)

The queryq(x , y)← linkedTo(x , z), linkedTo(y , z)

The partial evaluation

q(x , y)← AuxleadsTo(x , z),AuxlinkedTo(y , z)

Mastro at Work Savo et. al.

Page 37: DL'12 mastro at work

Query structure

T -views, unfolding

SELECT leadsto1.term1, leadsto2.term1 FROM (

SELECT ‘cus(’||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 503) view\_m1

UNION

SELECT ‘cus’(||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 501) view\_m2

) as leadsto1,

(

SELECT ‘cus(’||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 503) view\_m1

UNION

SELECT ‘cus’(||idcus||‘)’ as term1, ‘link(’||linkid||‘)’ as term2

FROM (SELECT .... WHERE cd\_tp = 501) view\_m2

) as leadsto2

WHERE leadsto1.term2 = leadsto2.term2

Mastro at Work Savo et. al.

Page 38: DL'12 mastro at work

Query structure

Performance of T -viewsPoor performance, in the order of hours, for trivial queries.

Culprit

Materialization of partial results in the DBMS query plans.

Solution

For relational DBMS queries, simpler is better.

Mastro at Work Savo et. al.

Page 39: DL'12 mastro at work

Query structure

Performance of T -viewsPoor performance, in the order of hours, for trivial queries.

Culprit

Materialization of partial results in the DBMS query plans.

Solution

For relational DBMS queries, simpler is better.

Mastro at Work Savo et. al.

Page 40: DL'12 mastro at work

Query structure

Performance of T -viewsPoor performance, in the order of hours, for trivial queries.

Culprit

Materialization of partial results in the DBMS query plans.

Solution

For relational DBMS queries, simpler is better.

Mastro at Work Savo et. al.

Page 41: DL'12 mastro at work

Query structure

M-views

Example:Mappings

m1: SELECT .... WHERE cd tp = 503 ; linkedTo(cus(idcus), link(linkid))m2: SELECT .... WHERE cd tp = 501 ; linkedTo(cus(idcus), link(linkid))

The views:

Auxm1 = SELECT .... WHERE cd tp = 503

Auxm2 = SELECT .... WHERE cd tp = 503

Mastro at Work Savo et. al.

Page 42: DL'12 mastro at work

Query structure

M-views, unfolding

Program:

linkedTo(cus(idcus), link(linkid))← Auxm1(idcus, linkid)

linkedTo(cus(idcus), link(linkid))← Auxm2(idcus, linkid)

The queryq(x , y)← linkedTo(x , z), linkedTo(y , z)

The partial evaluation

q(cus(idcus1), cus(idcus2))← Auxm1(idcus1, linkid1),Auxm1(idcus2, linkid1)

q(cus(idcus1), cus(idcus2))← Auxm1(idcus1, linkid1),Auxm2(idcus2, linkid1)

q(cus(idcus1), cus(idcus2))← Auxm2(idcus1, linkid1),Auxm2(idcus2, linkid1)

Mastro at Work Savo et. al.

Page 43: DL'12 mastro at work

Query structure

M-views, unfolding

Program:

linkedTo(cus(idcus), link(linkid))← Auxm1(idcus, linkid)

linkedTo(cus(idcus), link(linkid))← Auxm2(idcus, linkid)

The queryq(x , y)← linkedTo(x , z), linkedTo(y , z)

The partial evaluation

q(cus(idcus1), cus(idcus2))← Auxm1(idcus1, linkid1),Auxm1(idcus2, linkid1)

q(cus(idcus1), cus(idcus2))← Auxm1(idcus1, linkid1),Auxm2(idcus2, linkid1)

q(cus(idcus1), cus(idcus2))← Auxm2(idcus1, linkid1),Auxm2(idcus2, linkid1)

Mastro at Work Savo et. al.

Page 44: DL'12 mastro at work

Query structure

M-views, unfolding

Program:

linkedTo(cus(idcus), link(linkid))← Auxm1(idcus, linkid)

linkedTo(cus(idcus), link(linkid))← Auxm2(idcus, linkid)

The queryq(x , y)← linkedTo(x , z), linkedTo(y , z)

The partial evaluation

q(cus(idcus1), cus(idcus2))← Auxm1(idcus1, linkid1),Auxm1(idcus2, linkid1)

q(cus(idcus1), cus(idcus2))← Auxm1(idcus1, linkid1),Auxm2(idcus2, linkid1)

q(cus(idcus1), cus(idcus2))← Auxm2(idcus1, linkid1),Auxm2(idcus2, linkid1)

Mastro at Work Savo et. al.

Page 45: DL'12 mastro at work

Query structure

M-views, unfolding

SELECT ’cus(’||auxm11.idcus||’)’ as x, ’cus(’||auxm12.idcus||’)’ as y

FROM (SELECT .... WHERE cd\_tp = 503) as auxm11,

(SELECT .... WHERE cd\_tp = 503) as auxm12

WHERE auxm11.linkid = auxm12.linkid

UNION

SELECT ’cus(’||auxm11.idcus||’)’ as x, ’cus(’||auxm21.idcus||’)’ as y

FROM (SELECT .... WHERE cd\_tp = 503) as auxm11,

(SELECT .... WHERE cd\_tp = 501) as auxm21

WHERE auxm11.linkid = auxm21.linkid

UNION

SELECT ’cus(’||auxm21.idcus||’)’ as x, ’cus(’||auxm22.idcus||’)’ as y

FROM (SELECT .... WHERE cd\_tp = 501) as auxm21,

(SELECT .... WHERE cd\_tp = 501) as auxm22

WHERE auxm21.linkid = auxm22.linkid

Mastro at Work Savo et. al.

Page 46: DL'12 mastro at work

Query structure

Performance comparison

figures/performances4.pdf

Mastro at Work Savo et. al.

Page 47: DL'12 mastro at work

Conclusions

Conclusions

Mastro at Work Savo et. al.

Page 48: DL'12 mastro at work

Conclusions

MPS feedback

Useful result from the MPS point of view

• Data Integration

• Data Quality

• Knowledge Sharing

From the technical point of view:

• DBMS level performance for on-the-fly OBDA is possible

• Query tuning is mandatory.

• Pinpointed the features of the queries that are needed forgood performance and those that trigger bad performance.

Mastro at Work Savo et. al.

Page 49: DL'12 mastro at work

Conclusions

Current and Future work

• Experiment with live access to the sources

• Extend the current experimentation to other data domains inMPS

Preview of the Mastro OBDA plugin and the OBDA plugin forProtege 4.0

• http://www.dis.uniroma1.it/quonto/

• http://obda.inf.unibz.it

Mastro at Work Savo et. al.