Query-Driven Management of Linked Data Quality
Fariz Darari – 2nd Year PhD Student
Supervised by: Prof. Werner Nutt
Linked Data Cloud (1014 datasets)
… but what about its quality?
Generalized 2-step approach to query-driven data quality:
Given a user query, the query answers are checked to see whether they reside inside parts of the data that we annotate, also using queries, with data quality aspects such as completeness*, correctness and timeliness.
Research Directions
Interrelations between Data Quality Aspects and also Their Provenance
Data Completeness and
• Linked Data Streams
• SPARQL Queries
• Data Privacy
Current Results and Future Work
Timestamped Completeness Statements¹
• We add timestamps to completeness statements
• Consequently, completeness statements need not hold all the time
• Example: LMDb has all movies starring Tarantino up to Aug 2010
• Consequently, query completeness may have timestamps as well
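The LMDb example above can be sketched in a few lines of Python. This is only an illustration, not the representation used in the actual work: the class `TimestampedStatement`, its fields, and both helper functions are hypothetical names, and "up to Aug 2010" is assumed here to mean 31 August 2010.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimestampedStatement:
    """Hypothetical record: `source` is complete for `topic` up to `complete_until`."""
    source: str
    topic: str
    complete_until: date


def guaranteed_until(statements, source, topic):
    """Latest date up to which `source` is stated complete for `topic`, or None."""
    dates = [s.complete_until for s in statements
             if s.source == source and s.topic == topic]
    return max(dates) if dates else None


def is_complete_as_of(statements, source, topic, as_of):
    """True if some statement guarantees completeness at the date `as_of`."""
    until = guaranteed_until(statements, source, topic)
    return until is not None and as_of <= until


# Assumption: "up to Aug 2010" is read as the last day of that month.
statements = [
    TimestampedStatement("LMDb", "movies starring Tarantino", date(2010, 8, 31)),
]

print(is_complete_as_of(statements, "LMDb", "movies starring Tarantino", date(2010, 1, 15)))
print(is_complete_as_of(statements, "LMDb", "movies starring Tarantino", date(2015, 1, 1)))
```

The value returned by `guaranteed_until` is the timestamp that a query answer would inherit in this simplified reading: the answer is guaranteed complete only up to that date.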
Reasoning for Large Sets of Completeness Statements¹
• In general, to perform a query completeness check from completeness statements (CSs), all CSs have to be considered
• This is problematic if there are a large number of CSs
• We propose an indexing technique over completeness statements to find only potentially relevant CSs for the check
• One of the relevance criteria is based on predicates, that is, the predicates of completeness statements must occur in the query
• We develop our indexing technique based on tries and inverted files
• Experiments have shown that the technique can considerably speed up the check
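The predicate-based relevance criterion can be illustrated with a small inverted file. The sketch below is not the indexing technique developed in this work (which also uses tries); it only shows the filtering idea, and the class `PredicateIndex`, its methods, and the statement ids are hypothetical.

```python
from collections import defaultdict


class PredicateIndex:
    """Toy inverted file from predicates to completeness statements (CSs)."""

    def __init__(self):
        self._postings = defaultdict(set)  # predicate -> ids of CSs that mention it
        self._predicates = {}              # CS id -> frozenset of its predicates

    def add(self, stmt_id, predicates):
        preds = frozenset(predicates)
        if not preds:
            raise ValueError("a statement needs at least one predicate")
        if stmt_id in self._predicates:
            raise ValueError(f"statement {stmt_id!r} is already indexed")
        self._predicates[stmt_id] = preds
        for p in preds:
            self._postings[p].add(stmt_id)

    def candidates(self, query_predicates):
        """Ids of CSs whose predicates all occur in the query."""
        hits = defaultdict(int)
        for p in set(query_predicates):
            for stmt_id in self._postings.get(p, ()):
                hits[stmt_id] += 1
        # Keep a CS only if every one of its predicates was matched.
        return {sid for sid, n in hits.items()
                if n == len(self._predicates[sid])}


index = PredicateIndex()
index.add("cs1", ["actor", "type"])
index.add("cs2", ["director"])
index.add("cs3", ["actor", "budget"])

print(index.candidates(["actor", "type", "label"]))
```

Only the posting lists of predicates that occur in the query are visited, so statements sharing no predicate with the query are never touched. The surviving candidates would still have to go through the full completeness check.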
Completeness Reasoning Implementation: CORNER²
• We made a proof of concept of the query-driven completeness management, built using the standard Semantic Web framework
• A Web demo is available at http://corner.inf.unibz.it/
• We plan to develop a technique to extract RDF completeness statements from Wikipedia and use CORNER as the hub and testbed for its experimental evaluation
RDF and SPARQL Reconciliation³
• RDF follows the Open World Assumption, while SPARQL follows the Closed World Assumption
• This semantic gap leads to different query answer semantics in terms of certain and possible answers
• For instance, queries with negation may return wrong answers
• We propose the use of completeness statements to bridge the gap
• As next steps, we plan to complete the characterizations of the theorems and handle more negation fragments
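The problem with negation can be made concrete with a toy example. The sketch below is a simplified reading of the idea, not the formal characterization from this work: the triples, the function names, and the way a completeness statement is represented (a set of actors whose "starring" triples are assumed complete) are all illustrative assumptions.

```python
# Toy graph; it may be missing "starring" triples that hold in the real world.
graph = {
    ("PulpFiction", "type", "Movie"),
    ("PulpFiction", "starring", "Tarantino"),
    ("Desperado", "type", "Movie"),
}


def movies_not_starring(graph, actor):
    """Closed-world evaluation: movies with no 'starring' triple for `actor`."""
    movies = {s for (s, p, o) in graph if p == "type" and o == "Movie"}
    starring = {s for (s, p, o) in graph if p == "starring" and o == actor}
    return movies - starring


def classify_negation_answers(graph, actor, complete_for_starring):
    """Label each answer 'certain' if the graph is stated complete for
    the actor's 'starring' triples, and 'possible' otherwise."""
    label = "certain" if actor in complete_for_starring else "possible"
    return {movie: label for movie in movies_not_starring(graph, actor)}


# Without a completeness statement, the answer cannot be trusted.
print(classify_negation_answers(graph, "Tarantino", set()))
# With a statement covering Tarantino's movies, the same answer is safe.
print(classify_negation_answers(graph, "Tarantino", {"Tarantino"}))
```

Under the Open World Assumption the absence of a triple does not mean it is false, so the closed-world answer is only possible unless a completeness statement rules out missing triples.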
Query-Driven Data Completeness: Marty Moviegoer wants complete answers
Query-Driven Approach to Linked Data Quality
Data completeness
Data correctness
Data timeliness
* At ISWC 2013, we proposed a query-driven approach to completeness management over Linked Data. This became the foundation of our research, horizontally and vertically, as seen in the research directions below.
¹ To be submitted to the ACM Transactions on the Web journal
² Published at ESWC 2014 Demo
³ Published at ISWC 2014 Poster
Wikipedia already has some English quality annotations (e.g., completeness).
In the center of the Cloud, there is DBpedia (RDF version of Wikipedia).
The potential of the statements is, however, still untapped for DBpedia.
Winter Seminar 2015 at UniBZ
We’ve got quantity …