1 RDF Aggregate Queries and Views Edward Hung, Yu Deng, V.S. Subrahmanian University of Maryland, College Park ICDE 2005, April 7, Tokyo, Japan


RDF Aggregate Queries and Views

Edward Hung, Yu Deng, V.S. Subrahmanian

University of Maryland, College Park

ICDE 2005, April 7, Tokyo, Japan


Maintenance of RDF Aggregate Views
  Introduction of RDF and RDQL
  RDQL Extension for Aggregate Views
  Aggregate View Maintenance Algorithms AMX
  Implementation and Experiments
  Related Work


Introduction

Resource Description Framework (RDF)
  W3C Recommendation
  Represents metadata about resources identifiable on the web (by Uniform Resource Identifier (URI))
  Triple: (Resource, Property, Value)
    (Artist, rdf:type, rdfs:Class)
    (Painter, rdf:type, rdfs:Class)
    (Painter, rdfs:subClassOf, Artist)


RDF Schema

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://www.auctionschema.com/schema1#">
  <rdfs:Class rdf:ID="Artist"/>
  <rdfs:Class rdf:ID="Painter">
    <rdfs:subClassOf rdf:resource="#Artist"/>
  </rdfs:Class>
  <rdfs:Datatype rdf:about="&xsd;string"/>
  <rdf:Property rdf:ID="fname">
    <rdfs:domain rdf:resource="#Artist"/>
    <rdfs:range rdf:resource="&xsd;string"/>
  </rdf:Property>
</rdf:RDF>

RDF Instance

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns1="http://www.auctionschema.com/schema1#">
  <rdf:Description rdf:about="http://www.artist.net#guyrose">
    <rdf:type rdf:resource="ns1:Painter"/>
    <ns1:fname rdf:datatype="&xsd;string">Guy</ns1:fname>
  </rdf:Description>
</rdf:RDF>


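[Figure: graph view of the RDF schema and instance. Schema nodes: Artist, Painter, String; edge labels: subClassOf, fname. Instance: resource &r1 with fname "Guy", where &r1 = http://www.artist.net#guyrose]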


RDQL: RDF Query Language

SELECT ?highprice
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>

The triple patterns in the WHERE clause form the graph pattern.


RDQL Extension for Aggregates and Views

CREATEVIEW AS
SELECT max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>


Aggregate Query
  Aggregate operators, e.g. min, max, sum, count, average
  GROUP BY clause
  Output a table of tuples
    Output can be (i) an RDF instance or (ii) a table
    Advantage of (i): allows us to further query the result
    However, (ii) allows any form of table, including an RDF instance when the table consists of a set of RDF tuples.


We are expanding the syntax of RDQL so that it allows constants in SELECT clauses, which is equivalent to creating new resources from the constants.

For example, the previous query can be modified as follows:

CREATEVIEW AS
SELECT <ns1:works_by_guyrose>, <ns1:maxprice>, max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>

The result is a valid RDF statement:
(<ns1:works_by_guyrose>, <ns1:maxprice>, "800000"^^ns1:USD)


Aggregate View Maintenance

Relational Approach
  Store all triples in a relational table with schema (Resource, Property, Value)
  OR
  Store resources and values of the same property in a separate relational table with schema (Resource, Value)
  #self-joins = (#triples in where-clause) – 1
  Large number of delta rules during relational view maintenance, which makes maintenance expensive
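To make the self-join count concrete, here is a minimal sketch, assuming the single-table layout with a table named triples(resource, property, value). The table name, column names, and the helper pattern_to_sql are illustrative and not taken from the talk. It translates a non-empty list of triple patterns into SQL in which every pattern after the first adds one self-join.

```python
def pattern_to_sql(patterns, table="triples"):
    """Translate triple patterns into SQL over a (resource, property, value) table.

    Terms starting with '?' are variables; everything else is a constant.
    Returns (sql, number_of_self_joins). Assumes at least one pattern and
    constants without quote characters (illustration only).
    """
    cols = ("resource", "property", "value")
    first_seen = {}   # variable -> "alias.column" of its first occurrence
    conditions = []
    for i, pattern in enumerate(patterns):
        for col, term in zip(cols, pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    conditions.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                conditions.append(f"{ref} = '{term}'")
    from_clause = ", ".join(f"{table} t{i}" for i in range(len(patterns)))
    sql = f"SELECT * FROM {from_clause} WHERE " + " AND ".join(conditions)
    return sql, len(patterns) - 1


patterns = [
    ("?artist", "ns1:lname", "Rose"),
    ("?artist", "ns1:fname", "Guy"),
    ("?artist", "ns1:creates", "?artifact"),
    ("?artifact", "ns1:estimated", "?price"),
    ("?price", "ns1:high", "?highprice"),
    ("?artifact", "ns1:presented", "?date"),
]
sql, joins = pattern_to_sql(patterns)
print(joins)  # six patterns give five self-joins
print(sql)
```

The six-triple WHERE clause of the example query therefore scans the same table six times, which is the cost the relational approach has to pay again in every delta rule.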


Aggregate View Maintenance

Our Approach
  Localized search in RDF graphs
    Modified version of breadth-first search starting at the inserted/deleted edge
  Auxiliary data are needed for certain aggregate views
    min, max, avg


Distributive Aggregate Function
  An aggregate function f is distributive w.r.t. a source update operation if and only if the updated value can be computed from its old value and the update alone, without reference to the source.
  Examples: count, sum, average w.r.t. insertion, deletion and update
    For average, we will need an additional attribute size, which stores the size of the intermediate result S, in order to compute the correct updated value (or we can use sum and count to calculate it)
  max and min are distributive w.r.t. insertion, but not w.r.t. deletion and update
    Auxiliary data computed from S help to avoid the need to refer to the source.

[Figure: the graph pattern is matched against the RDF graph step by step; the matched ?highprice values are collected into a bag, first 800000 and then 500000, and SELECT max(?highprice) is evaluated over the bag {800000, 500000}]


Compute Aggregates Algorithm CAA

Algorithm CAA(I, Q)
/* Input: RDF graph I, query Q */
/* Output: table T(Q, I) */
1) GP ← BuildGP(Q); X ← aggregate variables of Q;
2) Y ← GROUP BY variables of Q;
3) S ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearchAll(GP, Q, I)];
4) Return T(Q, I) ← TCompute(S, Q);
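The four steps of CAA can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a set of Python tuples, the graph pattern is given directly as a list of triple patterns (standing in for BuildGP), search_all is a brute-force stand-in for MSearchAll, and constraints such as the date range are omitted. The resource names (guyrose, a1, p1, ...) are made up for the example.

```python
from collections import defaultdict


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def search_all(patterns, graph):
    """Return every substitution (dict: variable -> value) matching all patterns."""
    matchings = [{}]
    for pattern in patterns:
        extended = []
        for theta in matchings:
            for triple in graph:
                new_theta = dict(theta)
                ok = True
                for term, value in zip(pattern, triple):
                    if is_var(term):
                        # Bind the variable, or check an existing binding.
                        if new_theta.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(new_theta)
        matchings = extended
    return matchings


def caa(graph, patterns, agg, agg_var, group_vars=()):
    """Compute one aggregate per group; returns {group_key: aggregate value}."""
    # Step 3: S is a bag of (group key, aggregated value) pairs.
    groups = defaultdict(list)
    for theta in search_all(patterns, graph):
        key = tuple(theta[v] for v in group_vars)
        groups[key].append(theta[agg_var])
    # Step 4: apply the aggregate operator to each group.
    return {key: agg(values) for key, values in groups.items()}


graph = {
    ("guyrose", "lname", "Rose"), ("guyrose", "fname", "Guy"),
    ("guyrose", "creates", "a1"), ("a1", "estimated", "p1"), ("p1", "high", 800000),
    ("guyrose", "creates", "a2"), ("a2", "estimated", "p2"), ("p2", "high", 500000),
}
patterns = [
    ("?artist", "lname", "Rose"), ("?artist", "fname", "Guy"),
    ("?artist", "creates", "?artifact"),
    ("?artifact", "estimated", "?price"), ("?price", "high", "?highprice"),
]
print(caa(graph, patterns, max, "?highprice"))  # {(): 800000}
```

Without GROUP BY variables there is a single group with the empty key; passing group_vars=("?artist",) yields one row per artist instead.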


Aggregate View Maintenance Algorithms AMX
  AMI – Insertion
  AMD – Deletion
  AMT – Triple Modification
  AMR – Resource Modification

Update: Insertion

[Figure: RDF graph with an edge labeled "paints"; after the insertion the bag of matched ?highprice values grows from {800000, 500000} to {800000, 500000, 60000}, and SELECT max(?highprice) is evaluated over the new bag]


AMI for Insertion

Algorithm AMI(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), inserted triple t */
/* Output: table T(Q, I ∪ t), auxiliary data A(Q, I ∪ t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I ∪ t)];
   b) return (T(Q, I ∪ t), A(Q, I ∪ t)) ← TMaintainI(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I ∪ t), A(Q, I ∪ t)) ← (T(Q, I), A(Q, I));


Algorithm MSearch(GP, Q, t, I)
/* Input: graph pattern GP, query Q, triple t, RDF graph I */
/* Output: Θ = {θ | θ is a pattern matching} */
1) Θ ← ∅;
2) for each t' ∈ GP s.t. ∃θ', tθ' = t'θ',
   a) for each θ ∈ bSearch(t, t', GP, I),
      i. if θ satisfies the constraints in Q, then Θ ← Θ ∪ {θ};
3) return Θ;
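The idea of MSearch, finding only those matchings that involve the updated triple t, can be sketched as follows. This is a simplified stand-in under stated assumptions: the breadth-first bSearch is replaced by seeding the substitution from t and then extending it over the remaining patterns with a plain scan of the graph, the constraints of Q are passed as a hypothetical predicate named constraint, and the resource names are made up.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, triple, theta):
    """Extend substitution theta so that pattern matches triple, or return None."""
    theta = dict(theta)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if theta.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return theta


def msearch(patterns, constraint, t, graph):
    """Return the matchings of patterns in graph that map some pattern to t."""
    found = {}
    for i, seed_pattern in enumerate(patterns):
        seed = unify(seed_pattern, t, {})
        if seed is None:
            continue  # t cannot play the role of this pattern
        partial = [seed]
        for j, pattern in enumerate(patterns):
            if j == i:
                continue
            extended = []
            for theta in partial:
                for triple in graph:
                    theta2 = unify(pattern, triple, theta)
                    if theta2 is not None:
                        extended.append(theta2)
            partial = extended
        for theta in partial:
            if constraint(theta):
                # Key by content so a matching found from two seeds is kept once.
                found[frozenset(theta.items())] = theta
    return list(found.values())


graph = {
    ("guyrose", "lname", "Rose"), ("guyrose", "fname", "Guy"),
    ("guyrose", "creates", "a1"), ("a1", "estimated", "p1"), ("p1", "high", 800000),
    ("a3", "estimated", "p3"), ("p3", "high", 60000),
}
patterns = [
    ("?artist", "lname", "Rose"), ("?artist", "fname", "Guy"),
    ("?artist", "creates", "?artifact"),
    ("?artifact", "estimated", "?price"), ("?price", "high", "?highprice"),
]
t = ("guyrose", "creates", "a3")          # inserted triple
delta = msearch(patterns, lambda theta: True, t, graph | {t})
print([theta["?highprice"] for theta in delta])  # [60000]
```

The existing matching through a1 is not revisited, so only the new value 60000 reaches the maintenance step.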


Handling GROUP BY

Because of the GROUP BY clause, each tuple in ΔS affects one particular group.

TMaintainI maintains only the affected groups (and their corresponding auxiliary data), using the affecting tuples.

Delete empty groups and insert new groups.
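A small sketch of per-group maintenance, assuming a sum view stored as a dictionary from group key to value, with a per-group tuple count kept as auxiliary data so that empty groups can be detected. The function name maintain_groups and the data layout are illustrative assumptions.

```python
def maintain_groups(view, sizes, delta, sign):
    """Apply delta, a list of (group_key, value) pairs, to a per-group SUM view.

    sign is +1 for insertion and -1 for deletion. sizes holds the number of
    tuples per group. Both dictionaries are updated in place and returned.
    """
    for key, value in delta:
        view[key] = view.get(key, 0) + sign * value   # creates new groups on demand
        sizes[key] = sizes.get(key, 0) + sign
        if sizes[key] == 0:                           # group became empty: drop it
            del view[key], sizes[key]
    return view, sizes


view = {"guyrose": 1300000}
sizes = {"guyrose": 2}
maintain_groups(view, sizes, [("monet", 700000)], +1)   # a new group appears
print(view)
maintain_groups(view, sizes, [("monet", 700000)], -1)   # and disappears again
print(view)
```

Only the groups named in delta are touched; all other rows of the view are left as they are.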


TMaintainI

Handling sum, count, min, max
  No auxiliary data required
  Suppose f(x) is an aggregate function on attribute x, F the original result, F' the new result
    $F' = F + \sum_{v \in \pi_x(\Delta S)} v$ if f = sum
    $F' = F + |\Delta S|$ if f = count
    $F' = \min([F] \cup \pi_x(\Delta S))$ if f = min
    $F' = \max([F] \cup \pi_x(\Delta S))$ if f = max
  $\pi_x(\Delta S)$ projects a bag of values of x from ΔS
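The four insertion rules translate directly into code. A minimal sketch, assuming the projected bag of inserted values is passed as a Python list named delta_values and the aggregate is selected by a string; the function name is hypothetical.

```python
def tmaintain_insert(f, old, delta_values):
    """Return the new result F' from the old result F and the inserted values.

    delta_values is the bag of x values projected from delta S.
    """
    delta_values = list(delta_values)
    if not delta_values:
        return old                      # nothing inserted, nothing changes
    if f == "sum":
        return old + sum(delta_values)
    if f == "count":
        return old + len(delta_values)
    if f == "min":
        return min([old] + delta_values)
    if f == "max":
        return max([old] + delta_values)
    raise ValueError(f"unsupported aggregate: {f}")


print(tmaintain_insert("max", 800000, [60000]))    # 800000
print(tmaintain_insert("sum", 1300000, [60000]))   # 1360000
```

In every case the new value depends only on the old value and the inserted values, which is why no auxiliary data is needed for insertion.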


TMaintainI

Handling average
  We need the size of S
  $size' = size + |\Delta S|$
  $F' = \dfrac{F \cdot size + \sum_{v \in \pi_x(\Delta S)} v}{size'}$
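A worked sketch of the average rule, assuming the auxiliary attribute size is stored next to the view value. The function name and the convention of returning None for an empty result are illustrative choices.

```python
def tmaintain_insert_avg(old_avg, size, delta_values):
    """Return (F', size') for an average view after inserting delta_values.

    old_avg is ignored when size is 0, since an empty bag has no average.
    """
    delta_values = list(delta_values)
    new_size = size + len(delta_values)
    if new_size == 0:
        return None, 0
    old_total = old_avg * size if size else 0   # recover the old sum from F and size
    return (old_total + sum(delta_values)) / new_size, new_size


# Bag {800000, 500000} has average 650000 and size 2; inserting 60000
# gives (650000 * 2 + 60000) / 3.
print(tmaintain_insert_avg(650000, 2, [60000]))
```

Storing sum and count instead of average and size would work equally well, since each pair can be derived from the other.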

Update: Deletion

[Figure: RDF graph with an edge labeled "paints"; after the deletion the bag of matched ?highprice values shrinks from {800000, 500000, 60000} to {500000, 60000}, and SELECT max(?highprice) is evaluated over the new bag]


AMD for Deletion

Algorithm AMD(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), deleted triple t */
/* Output: table T(Q, I - t), auxiliary data A(Q, I - t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I)];
   b) return (T(Q, I - t), A(Q, I - t)) ← TMaintainD(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I - t), A(Q, I - t)) ← (T(Q, I), A(Q, I));


TMaintainD

Handling min, max
  Min and max are not distributive w.r.t. deletion
  We need to store πx(S), which projects a bag of values of x from S
  The new aggregate value F' is obtained by:
    F' = min(πx(S - ΔS)) if f = min
    F' = max(πx(S - ΔS)) if f = max
  We need to update πx(S) to become πx(S) - πx(ΔS)
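A minimal sketch of the deletion rule, assuming the auxiliary bag πx(S) is kept as a collections.Counter (a multiset), so that bag difference is a single subtraction. The function name and the None result for an empty bag are illustrative choices.

```python
from collections import Counter


def tmaintain_delete(f, bag, delta_values):
    """Return (F', updated bag) after deleting delta_values from the bag.

    bag is a Counter holding the stored x values of S; delta_values holds
    the x values of the deleted tuples. F' is None if the bag becomes empty.
    """
    if f not in ("min", "max"):
        raise ValueError(f"unsupported aggregate: {f}")
    new_bag = bag - Counter(delta_values)   # bag difference, drops exhausted values
    if not new_bag:
        return None, new_bag
    agg = min if f == "min" else max
    return agg(new_bag), new_bag            # iterating a Counter yields its distinct values


bag = Counter([800000, 500000, 60000])
new_max, bag = tmaintain_delete("max", bag, [800000])
print(new_max)   # 500000
```

Deleting the current maximum shows why the old value alone is not enough: the runner-up 500000 is only known because the bag was stored.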


Implementation and Experiment
  Implemented in Java
  Jena – RDQL Engine of HP
  Comparison with Relational Approach (standard view maintenance algorithm on relational tables)
    Counting Algorithm in Gupta et al., "Maintaining Views Incrementally", SIGMOD 1993
  Dataset: Chef Moz Project RDF dump
  Data stored in memory


Other Related Work

Volz, Oberle, Studer [DBFUSION'02]
  the first to introduce a view mechanism for RDF data
  Their views require that
    1. the results contain class instances (i.e., a subject or object variable), or
    2. the result itself has the pattern of an RDF statement (i.e., a triple containing subject, predicate and object).

Magkanaraki et al. [ISWC'03]
  proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies such that new resources, property values, classes and property types can be created.

None of these works specifically addresses (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.


Summary

Aggregate Views are important for RDF applications

RDQL Extension for Views and Aggregates

Aggregate View Maintenance Algorithms AMX
  Localized search in RDF graphs


Thank you very much!

Questions and Answers