Query-Driven Management of Linked Data Quality


Upload: fariz-darari

Post on 15-Jul-2015


TRANSCRIPT

Page 1: Query-Driven Management of Linked Data Quality

Query-Driven Management of Linked Data Quality

Fariz Darari – 2nd Year PhD Student

Supervised by: Prof. Werner Nutt

Linked Data Cloud (1014 datasets)

… but what about its quality?

Generalized 2-step approach to query-driven data quality:

Given a user query, we check whether the query answers reside inside parts of the data that are annotated with data quality aspects such as completeness*, correctness, and timeliness. The annotations themselves are also expressed using queries.
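The two steps can be illustrated with a minimal Python sketch. It is not the poster's formalism: annotations are simplified to single triple patterns with wildcards, and all names (`annotations`, `matches`, `quality_aspects`) and the sample data are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A pattern is a triple in which None acts as a wildcard (assumed simplification
# of "annotating parts of the data using queries").
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(pattern: Pattern, triple: Triple) -> bool:
    """True if every non-wildcard position of the pattern equals the triple."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


# Step 1: annotate parts of the data with a quality aspect (hypothetical examples).
annotations: List[Tuple[str, Pattern]] = [
    ("completeness", (None, "director", "Tarantino")),
    ("timeliness", (None, "releaseYear", None)),
]


def quality_aspects(answer_triples: List[Triple]) -> Set[str]:
    """Step 2: return the aspects whose annotated parts contain every triple
    that an answer was derived from."""
    aspects = {aspect for aspect, _ in annotations}
    return {
        aspect
        for aspect in aspects
        if all(
            any(matches(p, t) for a, p in annotations if a == aspect)
            for t in answer_triples
        )
    }
```

For instance, an answer derived only from `("Pulp Fiction", "director", "Tarantino")` would fall inside the part annotated for completeness, but not inside the part annotated for timeliness.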

Research Directions

Interrelations between Data Quality Aspects and also Their Provenance

Data Completeness and
• Linked Data Streams
• SPARQL Queries
• Data Privacy

Current Results and Future Work

Timestamped Completeness Statements¹

• We add timestamps to completeness statements
• Consequently, completeness statements need not hold all the time
• Example: LMDb has all movies starring Tarantino up to Aug 2010
• Consequently, query completeness may have timestamps as well
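One possible reading of the bullets above, sketched in Python: each statement carries a date up to which it holds, and a query that relies on several statements is guaranteed complete only up to the earliest of those dates. The class and function names are hypothetical, statements are reduced to a (predicate, object) pair, and "Aug 2010" is encoded as 31 August 2010 purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TimestampedStatement:
    predicate: str
    obj: str
    complete_until: date  # the data is asserted complete up to this date


def guaranteed_until(
    statements: List[TimestampedStatement],
    needed: List[Tuple[str, str]],
) -> Optional[date]:
    """Return the date up to which a query needing the given (predicate, object)
    parts is guaranteed complete, or None if some part is not covered."""
    dates = []
    for pred, obj in needed:
        covering = [
            s.complete_until
            for s in statements
            if s.predicate == pred and s.obj == obj
        ]
        if not covering:
            return None  # no statement covers this part of the query
        dates.append(max(covering))  # the most recent guarantee for this part
    # The weakest part bounds the guarantee for the whole query.
    return min(dates) if dates else None


# Hypothetical encoding of the LMDb example from the poster.
lmdb = [TimestampedStatement("starring", "Tarantino", date(2010, 8, 31))]
```

Under this sketch, a query for movies starring Tarantino inherits the timestamp of the statement, while a query touching an uncovered predicate gets no guarantee at all.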

Reasoning for Large Sets of Completeness Statements¹

• In general, to perform a query completeness check from completeness statements (CSs), all CSs have to be considered
• This is problematic if there is a large number of CSs
• We propose an indexing technique over completeness statements to find only the potentially relevant CSs for the check
• One relevance criterion is based on predicates: the predicates of a completeness statement must occur in the query
• We develop our indexing technique based on tries and inverted files
• Experiments have shown that the technique can considerably speed up the check
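The predicate-based relevance criterion can be sketched with a small inverted file in Python. This is only an illustration under simplifying assumptions, not the poster's actual index: the trie component is not shown, each statement is reduced to its set of predicates, and the class name `PredicateIndex` and the statement identifiers are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Set


class PredicateIndex:
    """Inverted file mapping each predicate to the statements that use it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[Hashable]] = defaultdict(set)
        self._sizes: Dict[Hashable, int] = {}

    def add(self, cs_id: Hashable, predicates: Iterable[str]) -> None:
        preds = set(predicates)
        self._sizes[cs_id] = len(preds)
        for p in preds:
            self._postings[p].add(cs_id)

    def candidates(self, query_predicates: Iterable[str]) -> Set[Hashable]:
        """Return statements whose predicates all occur in the query."""
        hits: Counter = Counter()
        for p in set(query_predicates):
            for cs_id in self._postings.get(p, ()):
                hits[cs_id] += 1
        # A statement qualifies only if every one of its predicates was hit.
        return {cs_id for cs_id, n in hits.items() if n == self._sizes[cs_id]}


index = PredicateIndex()
index.add("cs1", ["starring"])
index.add("cs2", ["starring", "director"])
index.add("cs3", ["birthPlace"])
```

With this index, a query that mentions only `starring` and `type` touches the posting list of `starring` alone, and `cs2` is filtered out because its `director` predicate does not occur in the query. Statements on unrelated predicates such as `cs3` are never visited.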

Completeness Reasoning Implementation: CORNER²

• We built a proof of concept of query-driven completeness management using the standard Semantic Web framework
• A Web demo is available at http://corner.inf.unibz.it/
• We plan to develop a technique to extract RDF completeness statements from Wikipedia, and to use CORNER as the hub and testbed for its experimental evaluation

RDF and SPARQL Reconciliation³

• RDF follows the Open World Assumption, while SPARQL follows the Closed World Assumption
• This semantic gap leads to different query answer semantics in terms of certain and possible answers
• For instance, queries with negation may return wrong answers
• We propose the use of completeness statements to bridge the gap
• As next steps, we plan to complete the characterizations of the theorems and handle more negation fragments
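The effect of negation can be illustrated with a toy Python sketch. It does not reproduce the poster's theorems: the function name, the flag `starring_complete`, and the sample data are hypothetical, and the data deliberately leaves out a starring triple to mimic an incomplete source.

```python
from typing import Set, Tuple


def movies_not_starring(
    movies: Set[str],
    starring: Set[Tuple[str, str]],
    actor: str,
    starring_complete: bool,
) -> Tuple[Set[str], Set[str]]:
    """Return (certain, possible) answers to 'movies NOT starring the actor'."""
    # Closed-world evaluation: every movie without a matching triple qualifies.
    absent = {m for m in movies if (m, actor) not in starring}
    # Under the open-world assumption a missing triple may still hold in reality,
    # so the answers are certain only if a completeness statement covers the
    # negated part of the query.
    certain = absent if starring_complete else set()
    return certain, absent


# Toy data: no starring triple is recorded for "Reservoir Dogs".
movies = {"Pulp Fiction", "Reservoir Dogs"}
starring = {("Pulp Fiction", "Tarantino")}
```

Without a completeness statement, "Reservoir Dogs" is only a possible answer, since the missing triple might simply be unrecorded. Once the source is asserted complete for movies starring the actor, the same closed-world answer becomes certain.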

Query-Driven Data Completeness: Marty Moviegoer wants complete answers

Query-Driven Approach to Linked Data Quality

Data completeness

Data correctness

Data timeliness


* At ISWC 2013, we proposed a query-driven approach to completeness management over Linked Data. This became the foundation of our research, horizontally and vertically, as reflected in the research directions.

¹ To be submitted to the ACM Transactions on the Web journal

² Published at ESWC 2014 Demo

³ Published at ISWC 2014 Poster

Wikipedia already has some English quality annotations (e.g., completeness).

In the center of the Cloud, there is DBpedia (RDF version of Wikipedia).

The potential of the statements is, however, still untapped for DBpedia.

Winter Seminar 2015 at UniBZ

We’ve got quantity …