Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements - ISWC 2014 Poster
Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements
Landscape
SPARQL assumes RDF data to be complete, following the Closed-World Assumption (CWA)
The semantic gap between RDF and SPARQL makes a query yield different answer semantics depending on the assumption used, in terms of:
> Certain answers: if new facts are given, query answers cannot be withdrawn
> Possible answers: query answers are complete
For example, query answer semantics with respect to the movie data sources above:
Query | OWA only | CWA only | Combined*
All actors who won both an Oscar and a Golden Globe | Certain | Certain & Possible | Certain & Possible
All actors with tattoos who won an Oscar | Certain | Certain & Possible | Certain
All actors who won an Oscar but no Golden Globe | - | Certain & Possible | Certain & Possible
All actors with tattoos who did not win an Oscar | - | Certain & Possible | Certain
All actors without tattoos who won an Oscar | - | Certain & Possible | Possible**
Components
Completeness Statements
To say a data source is complete for a topic, we can use completeness statements.
For the sources above, we formulate two statements:
C1. “Complete for all Oscar winners”: Compl(?act won oscar | true)
C2. “Complete for all Golden Globe winners”: Compl(?act won globe | true)
Valid Interpretations
Using completeness statements, we can restrict the possible interpretations of a data source. So, by attaching to a data source the completeness statement “Complete for all Oscar winners”: Compl(?act won oscar | true), we accept as valid only those interpretations of that source that contain no other Oscar winner.
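The restriction to valid interpretations can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that triples are plain tuples, that an interpretation is any superset of the source, and that an unconditional statement such as Compl(?act won oscar | true) can be modelled as the (predicate, object) pair it covers. The function name `is_valid` and the actors `alice` and `bob` are hypothetical.

```python
# Hypothetical sketch: triples are (subject, predicate, object) tuples.
# An unconditional statement Compl(?act won oscar | true) is modelled
# here simply as the (predicate, object) pair it is complete for.

def is_valid(source, interpretation, statements):
    """Return True if `interpretation` extends `source` without adding
    any triple that a completeness statement declares complete."""
    if not source <= interpretation:
        return False  # an interpretation must contain the source data
    extra = interpretation - source
    return not any((p, o) in statements for (_, p, o) in extra)

# Made-up example data (not from the poster).
source = {("alice", "won", "oscar"), ("bob", "won", "globe")}
c1 = ("won", "oscar")  # "Complete for all Oscar winners"

more_globes = source | {("alice", "won", "globe")}
more_oscars = source | {("bob", "won", "oscar")}

print(is_valid(source, more_globes, {c1}))  # True: no new Oscar winner
print(is_valid(source, more_oscars, {c1}))  # False: adds an Oscar winner
```

Under these assumptions, an interpretation that adds a Golden Globe winner stays valid with respect to C1, while one that adds an Oscar winner is rejected.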
Certain and Possible Answers in the presence of Completeness Statements
Certain and possible answers are defined with respect to valid interpretations.
An answer A of query Q is certain if A is in Q(I) for all valid interpretations I.
An answer A of Q is possible if A is in Q(I) for some valid interpretation I.
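The two definitions above can be tried out by brute force on a toy example. The following Python sketch is illustrative only: it assumes a small, finite set of candidate triples (real interpretations are not bounded this way), models unconditional completeness statements as (predicate, object) pairs, and hard-codes the query "actors who won an Oscar but no Golden Globe" as the function `oscar_no_globe`. All names and data are hypothetical.

```python
from itertools import chain, combinations

def powerset(items):
    """All subsets of `items`, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))

def valid_interpretations(source, candidates, complete_for):
    """Yield every superset of `source` drawn from `candidates` that adds
    no triple whose (predicate, object) pair is declared complete."""
    addable = [t for t in candidates - source
               if (t[1], t[2]) not in complete_for]
    for extra in powerset(addable):
        yield source | set(extra)

def oscar_no_globe(graph):
    """Query Q: actors who won an Oscar but no Golden Globe."""
    oscars = {s for (s, p, o) in graph if (p, o) == ("won", "oscar")}
    globes = {s for (s, p, o) in graph if (p, o) == ("won", "globe")}
    return oscars - globes

# Made-up source data and a finite universe of candidate facts.
source = {("alice", "won", "oscar"),
          ("bob", "won", "oscar"), ("bob", "won", "globe")}
candidates = {(a, "won", prize)
              for a in ("alice", "bob", "carol")
              for prize in ("oscar", "globe")}

def answers(complete_for):
    """Return (certain, possible) answers of Q under the given statements."""
    results = [oscar_no_globe(i)
               for i in valid_interpretations(source, candidates, complete_for)]
    certain = set.intersection(*results)   # in Q(I) for all valid I
    possible = set.union(*results)         # in Q(I) for some valid I
    return certain, possible

C1, C2 = ("won", "oscar"), ("won", "globe")
for label, stmts in [("none", set()), ("C1", {C1}), ("C1+C2", {C1, C2})]:
    certain, possible = answers(stmts)
    print(label, sorted(certain), sorted(possible))
```

In this toy setting, no answer is certain without completeness statements, C1 alone bounds the possible answers to `alice`, and C1 together with C2 makes `alice` both certain and the only possible answer.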
Crucial Statements of a Query
To capture which information is crucial for getting certain and possible answers for query Q, we define positive and negative crucial completeness statements for Q.
For example, for the query Q that asks for “actors winning an Oscar but no Golden Globe”, the positive crucial statement is Compl(?act won oscar | true) and the negative crucial statement is Compl(?act won globe | ?act won oscar).
Theorem 1: Bounded Possible Answers
If the positive crucial statements of a query are entailed by the given completeness statements, then: (1) all query answers are certain, and (2) all possible answers are retrieved over the source.
Theorem 2: Queries with Negation
If the negative crucial statements are entailed, then every answer is certain.
If, in addition, the positive statements are entailed, then there cannot be any other possible answers than those returned by the query.
RDF data is often treated as incomplete, following the Open-World Assumption (OWA)
which suits the case here, since it is fair to assume that the data source is incomplete for actors with tattoos (why? hard to know all)
which suits the case here, since it is fair to assume that the data source is complete for Oscar winning actors and Golden-Globe winning actors (why? easy to verify)
RDF data source containing Oscar winning actors and Golden-Globe winning actors
Fariz Darari, Simon Razniewski and Werner Nutt
Free University of Bozen-Bolzano, Italy
RDF data source containing actors with tattoos
Gap
Problem
Different query answer semantics in terms of certain and possible answers!
*Combined semantics takes into account completeness information over data sources, while leaving other parts of data incomplete
**This means that, since we expect no certain answers, any answer may later turn out to be wrong, because we do not know whether the data is complete for tattooed actors
We built the bridge
A technique to identify query answer semantics in the presence of completeness statements
Since the completeness statements C1 and C2 of the data sources above entail both the positive and negative crucial statements of the query Q, the answers returned by Q are both certain and possible: every returned answer is certain, and there are no other possible answers.
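The entailment claimed here can be illustrated with a simple syntactic check in Python. This sketch is not the entailment procedure from the paper. It assumes only one sufficient condition: a given statement with the same pattern and a weaker (subset) condition entails the target statement, so that Compl(?act won globe | true) entails Compl(?act won globe | ?act won oscar). It ignores variable renaming and any reasoning across several statements. The class `Compl` and the function `entails_syntactically` are hypothetical names.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]

class Compl(NamedTuple):
    """Compl(pattern | condition); an empty condition stands for 'true'."""
    pattern: Triple
    condition: FrozenSet[Triple]

def entails_syntactically(given, target):
    """Sufficient (not complete) check: some given statement has the same
    pattern and a condition that is a subset of the target's condition."""
    return any(g.pattern == target.pattern and g.condition <= target.condition
               for g in given)

WON_OSCAR = ("?act", "won", "oscar")
WON_GLOBE = ("?act", "won", "globe")

c1 = Compl(WON_OSCAR, frozenset())  # Complete for all Oscar winners
c2 = Compl(WON_GLOBE, frozenset())  # Complete for all Golden Globe winners

# Crucial statements of Q: "actors winning an Oscar but no Golden Globe"
positive_crucial = Compl(WON_OSCAR, frozenset())
negative_crucial = Compl(WON_GLOBE, frozenset({WON_OSCAR}))

print(entails_syntactically([c1, c2], positive_crucial))  # True
print(entails_syntactically([c1, c2], negative_crucial))  # True
print(entails_syntactically([c1], negative_crucial))      # False
```

With only C1 available, the negative crucial statement is not covered by this check, which matches the intuition that completeness for Golden Globe winners is needed before answers to a query with negation can be called certain.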