Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements - ISWC 2014 Poster


Upload: fariz-darari

Post on 15-Jul-2015



Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements

Landscape

SPARQL assumes RDF data to be complete, following the Closed-World Assumption (CWA)

The semantic gap between RDF and SPARQL makes the same query yield different answer semantics depending on the assumption used, in terms of:

> Certain answers: if new facts are given, query answers cannot be withdrawn
> Possible answers: query answers are complete, that is, no further answers can appear when new facts are given
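The difference between the two notions can be illustrated with a small Python sketch. The triples, the actor names, and the two query functions are made up for illustration and are not taken from the poster's data sources. A query without negation keeps its answers when facts are added, whereas a query with negation can lose them.

```python
# Triples are (subject, predicate, object) tuples; all data here are hypothetical.

def oscar_and_globe(graph):
    """Actors who won both an Oscar and a Golden Globe (no negation)."""
    return {s for (s, p, o) in graph
            if (p, o) == ("won", "oscar") and (s, "won", "globe") in graph}

def oscar_no_globe(graph):
    """Actors who won an Oscar but no Golden Globe (uses negation)."""
    return {s for (s, p, o) in graph
            if (p, o) == ("won", "oscar") and (s, "won", "globe") not in graph}

graph = {("alice", "won", "oscar"), ("bob", "won", "oscar"), ("bob", "won", "globe")}
extended = graph | {("alice", "won", "globe")}  # one new fact arrives

# Without negation, every old answer survives the new fact.
print(oscar_and_globe(graph) <= oscar_and_globe(extended))  # True
# With negation, "alice" is withdrawn once her Golden Globe is known.
print(oscar_no_globe(graph), oscar_no_globe(extended))  # {'alice'} set()
```

Under the OWA, the answer "alice" to the second query is therefore not certain unless something guarantees that no Golden Globe fact about her is missing.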

For example, query answer semantics with respect to the two movie data sources (one containing Oscar and Golden Globe winners, one containing actors with tattoos):

Query                                               | OWA only | CWA only           | Combined*
----------------------------------------------------|----------|--------------------|-------------------
All actors who won both an Oscar and a Golden Globe | Certain  | Certain & Possible | Certain & Possible
All actors with tattoos who won an Oscar            | Certain  | Certain & Possible | Certain
All actors who won an Oscar but no Golden Globe     | -        | Certain & Possible | Certain & Possible
All actors with tattoos who did not win an Oscar    | -        | Certain & Possible | Certain
All actors without tattoos who won an Oscar         | -        | Certain & Possible | Possible**

Components

Completeness Statements

To say that a data source is complete for a topic, we can use completeness statements. For the movie data source containing Oscar and Golden Globe winners, we formulate two statements:

C1. “Complete for all Oscar winners”: Compl(?act won oscar | true)

C2. “Complete for all Golden Globe winners”: Compl(?act won globe | true)

Valid Interpretations

Using completeness statements, we can restrict the possible interpretations of a data source. So, by attaching to a data source the completeness statement “Complete for all Oscar winners”: Compl(?act won oscar | true), we accept as valid only those interpretations of that source that contain no other Oscar winner.
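A minimal sketch of this validity check, assuming a simplified setting that is not spelled out on the poster: triples are Python tuples, an interpretation is any superset of the available data, and only unconditional statements of the form Compl(?act p o | true) are supported. The function name `is_valid` and the sample data are hypothetical.

```python
def is_valid(available, candidate, complete_for):
    """Return True if `candidate` is a valid interpretation of `available`.

    `complete_for` is a set of (predicate, object) pairs, each standing for an
    unconditional statement such as Compl(?act won oscar | true).
    """
    if not available <= candidate:
        return False  # an interpretation must contain the available data
    extra = candidate - available
    # No extra triple may fall under a topic the source is complete for.
    return not any((p, o) in complete_for for (_, p, o) in extra)

available = {("alice", "won", "oscar")}
c1 = {("won", "oscar")}  # "Complete for all Oscar winners"

# Adding another Oscar winner contradicts C1, so the interpretation is rejected.
print(is_valid(available, available | {("bob", "won", "oscar")}, c1))   # False
# Adding a tattoo fact is fine, since nothing is said about tattoos.
print(is_valid(available, available | {("bob", "has", "tattoo")}, c1))  # True
```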

Certain and Possible Answers in the Presence of Completeness Statements

Certain and possible answers are defined with respect to valid interpretations.
An answer A of query Q is certain if A is in Q(I) for all valid interpretations I.
An answer A of Q is possible if A is in Q(I) for some valid interpretation I.
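These two definitions can be tried out by brute force on a toy example. The sketch below assumes a small finite universe of candidate triples (real interpretations are not bounded this way), supports only unconditional completeness statements, and uses hypothetical names and data throughout.

```python
from itertools import combinations

def oscar_no_globe(graph):
    """Actors who won an Oscar but no Golden Globe."""
    return {s for (s, p, o) in graph
            if (p, o) == ("won", "oscar") and (s, "won", "globe") not in graph}

def valid_interpretations(available, universe, complete_for):
    """Yield every superset of `available` within `universe` that adds no
    triple covered by a (predicate, object) pair in `complete_for`."""
    extras = [t for t in universe - available if (t[1], t[2]) not in complete_for]
    for size in range(len(extras) + 1):
        for combo in combinations(extras, size):
            yield available | set(combo)

def certain_and_possible(query, available, universe, complete_for):
    results = [query(i) for i in valid_interpretations(available, universe, complete_for)]
    certain = set.intersection(*results)   # in Q(I) for all valid I
    possible = set.union(*results)         # in Q(I) for some valid I
    return certain, possible

available = {("alice", "won", "oscar"), ("bob", "won", "oscar"), ("bob", "won", "globe")}
universe = available | {("alice", "won", "globe"),
                        ("carol", "won", "oscar"), ("carol", "won", "globe")}

no_statements = set()
c1_and_c2 = {("won", "oscar"), ("won", "globe")}

print(certain_and_possible(oscar_no_globe, available, universe, no_statements))
print(certain_and_possible(oscar_no_globe, available, universe, c1_and_c2))
```

Without any statements, no answer is certain and both "alice" and "carol" are possible. With both statements attached, the only valid interpretation is the available data itself, so "alice" is the single certain and the single possible answer.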

Crucial Statements of a Query

To capture which information is crucial for getting certain and possible answers for query Q, we define positive and negative crucial completeness statements for Q. For example, for the query Q that asks for “actors winning an Oscar but no Golden Globe”, the positive crucial statement is Compl(?act won oscar | true) and the negative crucial statement is Compl(?act won globe | ?act won oscar).
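The example can be reproduced mechanically for queries of exactly this shape, that is, one positive triple pattern and one negated triple pattern. The sketch below is only an illustration of that special case: the `Compl` class and the `crucial_statements` helper are hypothetical names, and the general definition for arbitrary queries is not given on the poster.

```python
from typing import NamedTuple

class Compl(NamedTuple):
    """A completeness statement Compl(pattern | condition)."""
    pattern: str
    condition: str = "true"

    def __str__(self):
        return f"Compl({self.pattern} | {self.condition})"

def crucial_statements(positive, negated):
    """Crucial statements for a query with one positive and one negated pattern.

    The positive statement asks for completeness of the positive pattern.
    The negative statement asks for completeness of the negated pattern,
    but only for the bindings that satisfy the positive pattern.
    """
    return Compl(positive), Compl(negated, positive)

pos, neg = crucial_statements("?act won oscar", "?act won globe")
print(pos)  # Compl(?act won oscar | true)
print(neg)  # Compl(?act won globe | ?act won oscar)
```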

Theorem 1: Bounded Possible Answers

If the positive crucial statements of a query are entailed by the given completeness statements, then:
(1) all query answers are certain, and
(2) all possible answers are retrieved over the source.

Theorem 2: Queries with Negation

If the negative crucial statements are entailed, then every answer is certain.
If, in addition, the positive statements are entailed, then there cannot be any other possible answers than those returned by the query.
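As a rough sketch of how Theorem 2 could be applied, the code below checks entailment in one simple special case only: a given statement entails a required one when both have the same pattern and the given condition is either "true" or identical to the required condition. This simplified check, the function names, and the tuple encoding of statements are assumptions made for illustration; the poster does not describe the general entailment procedure.

```python
# A statement Compl(pattern | condition) is encoded as a (pattern, condition) tuple.

def entails(given, required):
    """Simplified entailment: same pattern, and the given statement is
    unconditional or has exactly the required condition."""
    pattern, condition = required
    return any(p == pattern and c in ("true", condition) for (p, c) in given)

def answer_semantics(given, positive, negative):
    """Report what Theorem 2 guarantees for a query with negation."""
    guarantees = set()
    if entails(given, negative):
        guarantees.add("certain")
        if entails(given, positive):
            guarantees.add("possible")  # no possible answers beyond those returned
    return guarantees

c1 = ("?act won oscar", "true")
c2 = ("?act won globe", "true")
positive = ("?act won oscar", "true")
negative = ("?act won globe", "?act won oscar")

print(sorted(answer_semantics([c1, c2], positive, negative)))  # ['certain', 'possible']
print(sorted(answer_semantics([c2], positive, negative)))      # ['certain']
print(sorted(answer_semantics([c1], positive, negative)))      # []
```

The function deliberately reports only the guarantees stated in Theorem 2, so it returns nothing when the negative crucial statement is not entailed.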

RDF data is often treated as incomplete, following the Open-World Assumption (OWA), which suits the case here, since it is fair to assume that the data source is incomplete for actors with tattoos (why? it is hard to know them all).

The CWA also suits the case here, since it is fair to assume that the data source is complete for Oscar-winning actors and Golden Globe-winning actors (why? this is easy to verify).

RDF data source containing Oscar winning actors and Golden-Globe winning actors

Fariz Darari, Simon Razniewski and Werner Nutt
Free University of Bozen-Bolzano, Italy


RDF data source containing actors with tattoos

Gap

Problem

Different query answer semantics in terms of certain and possible answers!

*Combined semantics takes into account completeness information over data sources, while leaving other parts of the data incomplete.

**That is, we expect no certain answers: since we do not know whether we are complete for tattooed actors, any answer can later be found to be wrong.

We built the bridge

A technique to identify query answer semantics in the presence of completeness statements


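Since the completeness statements C1 and C2 of the data sources above entail both the positive and negative crucial statements of the query Q (“actors winning an Oscar but no Golden Globe”), the answers returned by Q are both certain and possible: none of them can be withdrawn, and no further answers can appear.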