Poster - Completeness Statements about RDF Data Sources and Their Use for Query Answering


Upload: fariz-darari

Post on 11-May-2015


DESCRIPTION

Thousands of RDF data sources are today available on the Web. Machine-readable qualitative descriptions of their content are crucial. We focus on data completeness, an important aspect of data quality. How to formalize and express in a machine-readable way completeness information about RDF data sources? How to leverage such completeness information? Formal framework for expressing completeness information. Study of query completeness from completeness information in various settings.

TRANSCRIPT

Page 1: Poster - Completeness Statements about RDF Data Sources and Their Use for Query Answering

Completeness Statements about RDF Data Sources and Their Use for Query Answering

Fariz Darari

joint work with Werner Nutt, Giuseppe Pirrò, and Simon Razniewski

KRDB, Free University of Bozen-Bolzano, Italy

Thousands of RDF data sources are today available on the Web.

Machine-readable qualitative descriptions of their content are crucial.

We focus on data completeness, an important aspect of data quality.

How to formalize and express in a machine-readable way completeness information about RDF data sources?

How to leverage such completeness information?

1. Formal framework for expressing completeness information.

2. Study of query completeness from completeness information in various settings.

Completeness statement on the Web

Users visiting this source can prefer it to other sources.

However, the completeness statement "verified as complete" is only human-readable!

Why is DBpedia not complete for the query?

The completeness statement in DBpedia says that it is complete for Tarantino’s movies (dv:st1). However, the query asks about all movies for which Tarantino is the director, and also an actor.

It is not stated that DBpedia includes all the actors of Tarantino’s movies. Therefore, DBpedia is possibly not complete for this query.

Why is LinkedMDB complete?

The completeness statements in LMDB say that it is complete for Tarantino’s movies (lv:st1) and also for their actors (lv:st2).

Implementation

http://rdfcorner.wordpress.com

Query completeness in a single data source scenario

lv:st2 c:hasCondition [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].

lv:st2 c:hasCondition [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].

dv:dbpdataset rdf:type void:Dataset.

dv:dbpdataset c:hasComplStmt dv:st1.

dv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].

dv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].

SELECT ?m
WHERE { ?m rdf:type schema:Movie .
        ?m schema:director dbp:Tarantino .
        ?m schema:actor dbp:Tarantino }

Select all the movies for which Tarantino is the director and also an actor.

DBpedia (SPARQL endpoint) is complete for all Tarantino's movies. The answer is incomplete.

LinkedMDB (SPARQL endpoint) is complete for all Tarantino's movies and also movies for which he is an actor. The answer is complete.

@prefix c: <http://inf.unibz.it/ontologies/completeness#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix spin: <http://spinrdf.org/sp#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dv: <http://dbpedia.org/void/> .
@prefix lv: <http://linkedmdb.org/void/> .
@prefix dbp: <http://dbpedia.org/resource/> .
@prefix schema: <http://schema.org/> .


lv:st2 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:actor; c:object [spin:varName "a"]].

Endpoint IRI: DBPe

Endpoint IRI: LMDBe

lv:lmdbdataset rdf:type void:Dataset.

lv:lmdbdataset c:hasComplStmt lv:st1.

lv:lmdbdataset c:hasComplStmt lv:st2.

lv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].

lv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].

For each completeness statement, all the triple patterns defined via hasPattern are collected into a set P1, and all the triple patterns defined via hasCondition are collected into a set P2. A completeness statement is interpreted as:

CONSTRUCT {P1} WHERE {P1 . P2}

When a data source has a completeness statement (defined via hasComplStmt), it means that if the query above is evaluated over an “ideal” graph, then all the results are in the data source.
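To make this interpretation concrete, the following is a small sketch that assembles the CONSTRUCT query for a statement from its two pattern sets. It assumes triple patterns are written as tuples of prefixed names with variables starting with "?"; the function name to_construct is hypothetical and not part of the poster's tooling.

```python
# Sketch only: a triple pattern is a (subject, predicate, object) tuple of
# strings. The helper name `to_construct` is an assumption for illustration.

def to_construct(p1, p2):
    """Build the query CONSTRUCT {P1} WHERE {P1 . P2} as a string."""
    def fmt(patterns):
        return " . ".join(" ".join(triple) for triple in patterns)

    where = fmt(list(p1) + list(p2))
    return f"CONSTRUCT {{ {fmt(p1)} }} WHERE {{ {where} }}"


# Pattern and condition sets of lv:st2 from the poster.
P1 = [("?m", "schema:actor", "?a")]
P2 = [("?m", "rdf:type", "schema:Movie"),
      ("?m", "schema:director", "dbp:Tarantino")]

print(to_construct(P1, P2))
```

For lv:st2 this yields a query whose template contains only the actor pattern, while the WHERE clause additionally requires the movie to be directed by Tarantino.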

SPARQL queries with OPT

Completeness with RDFS inference

Completeness statement on the Semantic Web

Extensions

CoRner: Completeness Reasoner

Given a query Q and a data source with completeness statements S:

1. Create a template answer graph GQ of Q.

2. Over GQ, evaluate all CONSTRUCT queries derived from S.

3. Check whether GQ can be obtained after the evaluation.

If yes, the query is complete; otherwise, it might be incomplete.
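The three steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not CoRner's actual implementation: triple patterns are string tuples, variables start with "?", the template answer graph is obtained by replacing each variable with a fresh constant, and all function names (freeze, match, construct, is_complete) are hypothetical.

```python
# Sketch only: triple patterns are (subject, predicate, object) string tuples;
# terms starting with "?" are variables. All function names are hypothetical.

def is_var(term):
    return term.startswith("?")

def freeze(pattern):
    """Step 1: replace each variable by a fresh constant."""
    return tuple("frozen:" + t[1:] if is_var(t) else t for t in pattern)

def match(patterns, graph, binding=None):
    """Yield every variable binding that maps all patterns into the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        if all(
            candidate.setdefault(p, g) == g if is_var(p) else p == g
            for p, g in zip(first, triple)
        ):
            yield from match(rest, graph, candidate)

def construct(p1, p2, graph):
    """Step 2: evaluate CONSTRUCT {P1} WHERE {P1 . P2} over the graph."""
    result = set()
    for binding in match(list(p1) + list(p2), graph):
        for pattern in p1:
            result.add(tuple(binding.get(t, t) for t in pattern))
    return result

def is_complete(query, statements):
    """True if the statements guarantee completeness; False means 'possibly incomplete'."""
    gq = {freeze(p) for p in query}
    derived = set()
    for p1, p2 in statements:
        derived |= construct(p1, p2, gq)
    return gq <= derived  # step 3: can GQ be obtained?

MOVIE = ("?m", "rdf:type", "schema:Movie")
DIRECTED = ("?m", "schema:director", "dbp:Tarantino")
QUERY = [MOVIE, DIRECTED, ("?m", "schema:actor", "dbp:Tarantino")]

DBPEDIA = [([MOVIE, DIRECTED], [])]                                # dv:st1
LMDB = [([MOVIE, DIRECTED], []),                                   # lv:st1
        ([("?m", "schema:actor", "?a")], [MOVIE, DIRECTED])]       # lv:st2

print(is_complete(QUERY, DBPEDIA))  # False
print(is_complete(QUERY, LMDB))     # True
```

With the DBpedia statement alone, the actor triple of the template graph is never reconstructed, so the check fails; with both LMDB statements, every triple is reconstructed.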

Semantics of completeness statements

Checking query completeness

Context Problem Contributions

lv:st1 c:hasCondition [c:subject [spin:varName "m"]; c:predicate rdf:type; c:object schema:Movie].

lv:st1 c:hasCondition [c:subject [spin:varName "m"]; c:predicate schema:director; c:object dbp:Tarantino].

lv:st1 c:hasPattern [c:subject [spin:varName "m"]; c:predicate schema:actor; c:object [spin:varName "a"]].

lv:lmdbdataset rdf:type void:Dataset.

lv:lmdbdataset c:hasComplStmt lv:st1.

LMDB is complete for all Tarantino’s movies and all their actors.
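Statements in this shape can be generated mechanically from a pattern set and a condition set. The sketch below is illustrative only: it assumes patterns are string tuples with variables starting with "?", and the helper names term and to_turtle are hypothetical. It emits the same c:hasPattern and c:hasCondition layout used in the statements above and omits the prefix declarations.

```python
# Sketch only: serialises pattern/condition sets into the statement layout
# used on the poster. Helper names are assumptions for illustration.

def term(t):
    """Variables become SPIN-style blank nodes; other terms are kept as-is."""
    return f'[spin:varName "{t[1:]}"]' if t.startswith("?") else t

def to_turtle(stmt, p1, p2):
    """Return one line per triple pattern of the completeness statement."""
    lines = []
    for prop, patterns in (("c:hasPattern", p1), ("c:hasCondition", p2)):
        for s, p, o in patterns:
            lines.append(
                f"{stmt} {prop} [c:subject {term(s)}; "
                f"c:predicate {term(p)}; c:object {term(o)}]."
            )
    return "\n".join(lines)


turtle = to_turtle(
    "lv:st1",
    [("?m", "schema:actor", "?a")],
    [("?m", "rdf:type", "schema:Movie"),
     ("?m", "schema:director", "dbp:Tarantino")],
)
print(turtle)
```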

Federated query completeness

SPARQL queries with negations and comparisons

Live, Web-based CoRner

Work In Progress

Empirical evaluation of query completeness checking