1 rdf aggregate queries and views edward hung, yu deng, v.s. subrahmanian university of maryland,...

1

RDF Aggregate Queries and Views

Edward Hung, Yu Deng, V.S. Subrahmanian

University of Maryland, College Park

ICDE 2005, April 7, Tokyo, Japan

2

Maintenance of RDF Aggregate Views

Introduction to RDF and RDQL
RDQL Extension for Aggregate Views
Aggregate View Maintenance Algorithms AMX
Implementation and Experiments
Related Work

3

Introduction: Resource Description Framework (RDF)

W3C Recommendation
Represents metadata about resources identifiable on the web (by a Uniform Resource Identifier (URI))

Triple: (Resource, Property, Value)
(Artist, rdf:type, rdfs:Class)
(Painter, rdf:type, rdfs:Class)
(Painter, rdfs:subClassOf, Artist)

RDF Schema:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://www.auctionschema.com/schema1#">
  <rdfs:Class rdf:ID="Artist"/>
  <rdfs:Class rdf:ID="Painter">
    <rdfs:subClassOf rdf:resource="#Artist"/>
  </rdfs:Class>
  <rdfs:Datatype rdf:about="&xsd;string"/>
  <rdf:Property rdf:ID="fname">
    <rdfs:domain rdf:resource="#Artist"/>
    <rdfs:range rdf:resource="&xsd;string"/>
  </rdf:Property>
</rdf:RDF>

RDF Instance:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns1="http://www.auctionschema.com/schema1#">
  <rdf:Description rdf:about="http://www.artist.net#guyrose">
    <rdf:type rdf:resource="http://www.auctionschema.com/schema1#Painter"/>
    <ns1:fname rdf:datatype="&xsd;string">Guy</ns1:fname>
  </rdf:Description>
</rdf:RDF>


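[Figure: RDF graph of the example. Schema: class Painter is a subClassOf class Artist; property fname links Artist to String. Instance: resource &r1 has fname "Guy", where &r1 = http://www.artist.net#guyrose.]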
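As a rough illustration (not part of the slides), the schema and instance above can be held as a plain set of (Resource, Property, Value) tuples. Prefixed names such as `ns1:Painter` are kept as strings for brevity, and the helpers `superclasses` and `is_instance_of` are hypothetical: they apply only the rdfs:subClassOf rule, not full RDFS entailment.

```python
# Sketch: an RDF graph as a set of (Resource, Property, Value) triples.
# Prefixed names are plain strings here (an assumption for brevity).
R1 = "http://www.artist.net#guyrose"

graph = {
    ("ns1:Artist", "rdf:type", "rdfs:Class"),
    ("ns1:Painter", "rdf:type", "rdfs:Class"),
    ("ns1:Painter", "rdfs:subClassOf", "ns1:Artist"),
    ("ns1:fname", "rdfs:domain", "ns1:Artist"),
    ("ns1:fname", "rdfs:range", "xsd:string"),
    (R1, "rdf:type", "ns1:Painter"),
    (R1, "ns1:fname", "Guy"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable via rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        c = stack.pop()
        for s, p, o in triples:
            if s == c and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_instance_of(resource, cls, triples):
    """True if resource has an rdf:type that is cls or one of its subclasses."""
    return any(
        cls in superclasses(o, triples)
        for s, p, o in triples
        if s == resource and p == "rdf:type"
    )


print(is_instance_of(R1, "ns1:Artist", graph))  # True: Painter subClassOf Artist
```

Only the rdf:type edge of &r1 is stored; its membership in Artist is derived by following the subClassOf edge.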

7

RDQL: RDF Query Language

SELECT ?highprice
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>

[Figure: the graph pattern formed by the triples of the WHERE clause]
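To make "graph pattern" concrete, here is a minimal Python sketch that evaluates the WHERE clause above by backtracking over its triple patterns and then applies the date constraint. The toy graph (resources `a1`, `w1`, `p1`, ...) and the helpers `unify` and `match` are hypothetical; this is not the Jena RDQL engine.

```python
# Sketch: evaluating an RDQL-style graph pattern by backtracking.
# The toy graph below is hypothetical, not the paper's dataset.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if new.get(p, v) != v:   # variable already bound to another value
                return None
            new[p] = v
        elif p != v:                 # constant mismatch
            return None
    return new


def match(patterns, graph, binding=None):
    """Yield every binding that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in graph:
        b = unify(patterns[0], triple, binding)
        if b is not None:
            yield from match(patterns[1:], graph, b)


where = [
    ("?artist", "ns1:lname", "Rose"),
    ("?artist", "ns1:fname", "Guy"),
    ("?artist", "ns1:creates", "?artifact"),
    ("?artifact", "ns1:estimated", "?price"),
    ("?price", "ns1:high", "?highprice"),
    ("?artifact", "ns1:presented", "?date"),
]

graph = {
    ("a1", "ns1:lname", "Rose"), ("a1", "ns1:fname", "Guy"),
    ("a1", "ns1:creates", "w1"), ("w1", "ns1:estimated", "p1"),
    ("p1", "ns1:high", 800000), ("w1", "ns1:presented", "2004-04-10"),
    ("a1", "ns1:creates", "w2"), ("w2", "ns1:estimated", "p2"),
    ("p2", "ns1:high", 500000), ("w2", "ns1:presented", "2004-04-20"),
    ("a1", "ns1:creates", "w3"), ("w3", "ns1:estimated", "p3"),
    ("p3", "ns1:high", 90000), ("w3", "ns1:presented", "2004-05-02"),
}

# AND 2004-04-01 <= ?date <= 2004-04-30 (ISO dates compare correctly as strings)
highprices = sorted(
    b["?highprice"] for b in match(where, graph)
    if "2004-04-01" <= b["?date"] <= "2004-04-30"
)
```

The artifact presented in May matches the pattern but is filtered out by the constraint, so only two ?highprice values remain.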

8

RDQL Extension for Aggregates and Views

CREATE VIEW AS
SELECT max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>

9

Aggregate Query
Aggregate operators, e.g. min, max, sum, count, average
GROUP BY clause
Output: a table of tuples

The output can be (i) an RDF instance or (ii) a table.
Advantage of (i): the result can be queried further.
However, (ii) allows tables of any form, including output in the form of an RDF instance when the table consists of a set of RDF triples.

We extend the RDQL syntax to allow constants in the SELECT clause, which in effect creates new resources from those constants.

For example, the previous query can be modified as follows:

CREATE VIEW AS
SELECT <ns1:works_by_guyrose>, <ns1:maxprice>, max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>

The result is a valid RDF statement: (<ns1:works_by_guyrose>, <ns1:maxprice>, "800000"^^ns1:USD)
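A small Python sketch of why constants in SELECT matter: the two constants plus the aggregate form an RDF-shaped tuple that can be added back into a graph and queried again, which is advantage (i) above. Modelling a typed literal as a (lexical form, datatype) pair and the helper name `aggregate_statement` are assumptions for illustration; the bag of prices is hypothetical.

```python
# Sketch: SELECT <const>, <const>, max(?x) yields one RDF-shaped tuple.
# A typed literal is modelled as (lexical form, datatype), an assumption.

def aggregate_statement(subject, prop, datatype, values, agg=max):
    """Build (subject, prop, (lexical value, datatype)) from an aggregate."""
    return (subject, prop, (str(agg(values)), datatype))


highprices = [800000, 500000]   # hypothetical bag of ?highprice values
stmt = aggregate_statement(
    "ns1:works_by_guyrose", "ns1:maxprice", "ns1:USD", highprices
)

graph = set()
graph.add(stmt)                 # the view result is itself RDF, so it can be queried
```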

11

Aggregate View Maintenance

Relational Approach
Store all triples in one relational table with schema (Resource, Property, Value), OR store the resources and values of each property in a separate relational table with schema (Resource, Value).

Number of self-joins = (number of triples in the WHERE clause) - 1.
This produces a large number of delta rules during relational view maintenance, which is expensive.
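To see where the self-joins come from, here is a Python sketch of the first relational option using an in-memory SQLite table `triples(r, p, v)`. The translator `pattern_to_sql` is a hypothetical illustration (not the Counting Algorithm of Gupta et al.); storing every value as TEXT is a simplification.

```python
import sqlite3


def pattern_to_sql(patterns):
    """Translate triple patterns into SQL over one table triples(r, p, v).

    Pattern i becomes alias t<i>; a repeated variable becomes an equality
    between columns, so k patterns produce k - 1 self-joins.
    """
    cols = ("r", "p", "v")
    first_ref = {}                      # variable -> column where it first occurs
    conds, params = [], []
    for i, pat in enumerate(patterns):
        for col, term in zip(cols, pat):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_ref:
                    conds.append(f"{ref} = {first_ref[term]}")   # join condition
                else:
                    first_ref[term] = ref
            else:
                conds.append(f"{ref} = ?")                        # constant filter
                params.append(term)
    select = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in first_ref.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (r TEXT, p TEXT, v TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("a1", "ns1:lname", "Rose"),
        ("a1", "ns1:creates", "w1"), ("w1", "ns1:estimated", "p1"), ("p1", "ns1:high", "800000"),
        ("a1", "ns1:creates", "w2"), ("w2", "ns1:estimated", "p2"), ("p2", "ns1:high", "500000"),
    ],
)

patterns = [
    ("?artist", "ns1:lname", "Rose"),
    ("?artist", "ns1:creates", "?artifact"),
    ("?artifact", "ns1:estimated", "?price"),
    ("?price", "ns1:high", "?highprice"),
]
sql, params = pattern_to_sql(patterns)
rows = conn.execute(sql, params).fetchall()
```

Four patterns give four aliases of the same table (three self-joins); an incremental maintenance scheme then needs a delta rule per occurrence of the table.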

12

Aggregate View Maintenance

Our Approach
Localized search in RDF graphs: a modified version of breadth-first search starting at the inserted/deleted edge.
Auxiliary data are needed for certain aggregate views: min, max, avg.

13

Distributive Aggregate Function
An aggregate function f is distributive w.r.t. a source update operation if and only if the updated value can be computed from its old value and the update alone, without reference to the source.
Examples: count, sum and average w.r.t. insertion, deletion and update.
For average, we need an additional attribute size, which stores the size of the intermediate result S, in order to compute the correct updated value (alternatively, average can be computed from sum and count).
max and min are distributive w.r.t. insertion, but not w.r.t. deletion and update. Auxiliary data computed from S avoid the need to refer to the source.
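A worked check of the definition, in Python: two hypothetical sources with the same old max, hit by the same deletion, end up with different new maxima, so the old value and the update cannot determine the result. For sum, the old value and the deleted value suffice. The bags echo the slide's example values; `s2` is invented for contrast.

```python
# Worked check: max is not distributive w.r.t. deletion, sum is.

def max_after_delete(source, deleted):
    rest = list(source)
    rest.remove(deleted)          # bag difference: drop one occurrence
    return max(rest)


def sum_after_delete(old_sum, deleted):
    return old_sum - deleted      # needs only the old value and the update


s1 = [800000, 500000]
s2 = [800000, 60000]              # hypothetical second source

same_old = max(s1) == max(s2)                                           # both 800000
differ = max_after_delete(s1, 800000) != max_after_delete(s2, 800000)   # 500000 vs 60000
```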

[Figure: graph pattern and intermediate result S for SELECT max(?highprice), a bag of ?highprice values such as BAG{800000, 500000}]

18

Compute Aggregates Algorithm CAA

Algorithm CAA(I, Q)
/* Input: RDF graph I, query Q */
/* Output: table T(Q, I) */
1) GP ← BuildGP(Q); X ← aggregate variables of Q;
2) Y ← GROUP BY variables of Q;
3) S ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearchAll(GP, Q, I)];
4) Return T(Q, I) ← TCompute(S, Q);
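The CAA steps can be sketched in Python as follows, assuming triples are tuples and variables are strings starting with "?". Here `msearch_all` is a plain backtracking stand-in for MSearchAll, projecting each binding onto Y and x plays the role of VRetrieve, and the grouping loop plays TCompute; all names and the toy graph are hypothetical.

```python
from collections import defaultdict


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if new.get(p, v) != v:
                return None
            new[p] = v
        elif p != v:
            return None
    return new


def msearch_all(gp, graph, binding=None):
    """All matchings θ of graph pattern gp in graph (stand-in for MSearchAll)."""
    binding = {} if binding is None else binding
    if not gp:
        yield binding
        return
    for triple in graph:
        b = unify(gp[0], triple, binding)
        if b is not None:
            yield from msearch_all(gp[1:], graph, b)


def caa(graph, gp, x, agg, group_by=(), constraint=lambda b: True):
    """S <- bag of (Y values, x) per matching; then aggregate x per Y group."""
    S = [tuple(b[v] for v in group_by) + (b[x],)
         for b in msearch_all(gp, graph) if constraint(b)]
    groups = defaultdict(list)
    for row in S:
        groups[row[:-1]].append(row[-1])
    return {key: agg(values) for key, values in groups.items()}


graph = {
    ("a1", "ns1:paints", "w1"), ("w1", "ns1:high", 800000),
    ("a1", "ns1:paints", "w2"), ("w2", "ns1:high", 500000),
    ("a2", "ns1:paints", "w3"), ("w3", "ns1:high", 60000),
}
gp = [("?artist", "ns1:paints", "?artifact"), ("?artifact", "ns1:high", "?highprice")]
T = caa(graph, gp, "?highprice", max, group_by=("?artist",))
```

S is kept as a list so that equal projected values are counted with bag semantics, which matters for count, sum and avg.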

19

Aggregate View Maintenance Algorithms AMX
AMI: Insertion
AMD: Deletion
AMT: Triple Modification
AMR: Resource Modification

Update: Insertion

[Figure: inserting a paints triple adds a new match of the graph pattern; for SELECT max(?highprice), the bag of ?highprice values grows from BAG{800000, 500000} to BAG{800000, 500000, 60000}.]

23

AMI for Insertion

Algorithm AMI(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), inserted triple t */
/* Output: table T(Q, I ∪ t), auxiliary data A(Q, I ∪ t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I ∪ t)];
   b) return (T(Q, I ∪ t), A(Q, I ∪ t)) ← TMaintainI(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I ∪ t), A(Q, I ∪ t)) ← (T(Q, I), A(Q, I));

24

Algorithm MSearch(GP, Q, t, I)

/* Input: graph pattern GP, query Q, triple t, RDF graph I */
/* Output: Θ = {θ | θ is a pattern matching} */
1) Θ ← ∅;
2) for each t' ∈ GP s.t. ∃θ', tθ' = t'θ',
   a) for each θ ∈ bSearch(t, t', GP, I),
      i. if θ satisfies the constraints in Q, then Θ ← Θ ∪ {θ};
3) return Θ;
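A minimal Python sketch of MSearch together with an AMI driver for a single max view. Assumptions: triples are tuples, variables start with "?", TMatch is approximated as "t unifies with some triple of GP", and bSearch is approximated by backtracking over the remaining patterns starting from the seed binding (the slides use a modified breadth-first search, which is not reproduced here). The names `msearch` and `ami_max` and the toy graph are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, triple, binding):
    """Return binding extended so pattern matches triple, else None."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            if new.get(p, v) != v:
                return None
            new[p] = v
        elif p != v:
            return None
    return new


def extend(patterns, graph, binding):
    """Yield bindings satisfying all patterns, starting from binding."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        b = unify(patterns[0], triple, binding)
        if b is not None:
            yield from extend(patterns[1:], graph, b)


def msearch(gp, constraint, t, graph):
    """Θ: matchings of gp in graph in which some pattern triple maps to t."""
    theta = set()
    for i, t_prime in enumerate(gp):
        seed = unify(t_prime, t, {})            # ∃θ' with tθ' = t'θ'
        if seed is None:
            continue
        rest = gp[:i] + gp[i + 1:]
        for b in extend(rest, graph, seed):     # search outward from t only
            if constraint(b):
                theta.add(frozenset(b.items()))  # Θ is a set: drop duplicates
    return [dict(b) for b in theta]


def ami_max(graph, gp, x, constraint, current_max, t):
    """Insert t, then update max(x) using only the new matchings (ΔS)."""
    graph = graph | {t}
    if not any(unify(tp, t, {}) is not None for tp in gp):   # TMatch
        return graph, current_max
    delta = [b[x] for b in msearch(gp, constraint, t, graph)]
    candidates = ([] if current_max is None else [current_max]) + delta
    return graph, max(candidates) if candidates else None


gp = [("?artist", "ns1:lname", "Rose"),
      ("?artist", "paints", "?artifact"),
      ("?artifact", "ns1:high", "?highprice")]
graph = {("a1", "ns1:lname", "Rose"),
         ("a1", "paints", "w1"), ("w1", "ns1:high", 800000),
         ("a1", "paints", "w2"), ("w2", "ns1:high", 500000),
         ("w3", "ns1:high", 60000)}
graph2, new_max = ami_max(graph, gp, "?highprice", lambda b: True,
                          800000, ("a1", "paints", "w3"))
```

The search only touches triples reachable from the inserted edge, which is the point of the localized approach; the max update itself is F' = max({F} ∪ πx(ΔS)).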

25

Handling GROUP BY

Given the GROUP BY clause, each tuple in ΔS affects exactly one group.

TMaintainI maintains only the affected groups (and their auxiliary data), using the tuples that affect them.

Empty groups are deleted and new groups are inserted.
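A Python sketch of group-wise maintenance, using count for brevity. The table maps each GROUP BY key to its count; only keys that occur in ΔS are touched, unseen keys become new groups, and groups whose count drops to zero are deleted. The function name `maintain_group_counts` and the `sign` parameter are hypothetical.

```python
from collections import Counter


def maintain_group_counts(table, delta_keys, sign):
    """Update count per GROUP BY key from ΔS only.

    table: dict group_key -> count; delta_keys: the group key of each
    tuple in ΔS; sign: +1 for insertion, -1 for deletion.
    """
    table = dict(table)                          # leave the caller's table intact
    for key, n in Counter(delta_keys).items():   # only affected groups are touched
        new = table.get(key, 0) + sign * n
        if new == 0:
            table.pop(key, None)                 # delete an emptied group
        else:
            table[key] = new                     # update, or insert a new group
    return table


t1 = maintain_group_counts({"a1": 2}, ["a1", "a2"], +1)
t2 = maintain_group_counts(t1, ["a2"], -1)
```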

26

TMaintainI: Handling sum, count, min, max

No auxiliary data required. Suppose f(x) is an aggregate function on attribute x, F the original result, and F' the new result:

$F' = F + \sum_{v \in \pi_x(\Delta S)} v$ if f = sum
$F' = F + |\Delta S|$ if f = count
$F' = \min(\{F\} \cup \pi_x(\Delta S))$ if f = min
$F' = \max(\{F\} \cup \pi_x(\Delta S))$ if f = max

where $\pi_x(\Delta S)$ projects a bag of values of x from ΔS.
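The four insertion rules, written as a Python sketch for one group. `tmaintain_insert` is a hypothetical name; `delta_x` stands for the bag πx(ΔS) as a list, and `F=None` is an assumed convention for a group that did not exist before.

```python
def tmaintain_insert(f, F, delta_x):
    """Apply the TMaintainI rule for f to old value F and bag delta_x."""
    if f == "count":
        return (F or 0) + len(delta_x)           # F' = F + |ΔS|
    if f == "sum":
        return (F or 0) + sum(delta_x)           # F' = F + Σ v
    if f in ("min", "max"):
        pick = min if f == "min" else max
        bag = list(delta_x) if F is None else [F, *delta_x]
        return pick(bag)                         # F' = min/max({F} ∪ πx(ΔS))
    raise ValueError(f"unsupported aggregate: {f}")
```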

27

TMaintainI

Handling average: we need the size of S.

$size' = size + |\Delta S|$

$F' = \dfrac{F \cdot size + \sum_{v \in \pi_x(\Delta S)} v}{size'}$
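A Python sketch of the average rule: the auxiliary value `size` turns the old average back into a total, which is then updated with the new values and divided by size'. The function name is hypothetical and the numbers reuse the slides' bag {800000, 500000} plus the inserted 60000.

```python
def tmaintain_avg_insert(F, size, delta_x):
    """Return (F', size') for avg under insertion.

    F is the old average over `size` values; delta_x is the bag πx(ΔS).
    """
    new_size = size + len(delta_x)               # size' = size + |ΔS|
    if new_size == 0:
        return None, 0
    old_total = F * size if size else 0          # F · size recovers the old sum
    return (old_total + sum(delta_x)) / new_size, new_size


F2, s2 = tmaintain_avg_insert(650000, 2, [60000])
```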

Update: Deletion

[Figure: deleting a paints triple removes a match of the graph pattern; for SELECT max(?highprice), the bag of ?highprice values shrinks from BAG{800000, 500000, 60000} to BAG{500000, 60000}.]

31

AMD for Deletion

Algorithm AMD(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), deleted triple t */
/* Output: table T(Q, I - t), auxiliary data A(Q, I - t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I)];
   b) return (T(Q, I - t), A(Q, I - t)) ← TMaintainD(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I - t), A(Q, I - t)) ← (T(Q, I), A(Q, I));

32

TMaintainD

Handling min, max: min and max are not distributive w.r.t. deletion, so we store $\pi_x(S)$, the bag of values of x projected from S. The new aggregate value F' is obtained by:

$F' = \min(\pi_x(S - \Delta S))$ if f = min

$F' = \max(\pi_x(S - \Delta S))$ if f = max

We then update $\pi_x(S)$ to become $\pi_x(S) - \pi_x(\Delta S)$.
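A Python sketch of TMaintainD for min/max, assuming the auxiliary data πx(S) is kept as a `collections.Counter` (a multiset), so that bag difference is `Counter` subtraction. The function name and the convention of returning `None` for an emptied group are assumptions; it also assumes ΔS is contained in S.

```python
from collections import Counter


def tmaintain_delete_minmax(f, aux_bag, delta_x):
    """Return (F', updated auxiliary bag) after deleting delta_x.

    aux_bag: Counter holding πx(S); delta_x: list πx(ΔS).
    F' is None if the group becomes empty.
    """
    if f not in ("min", "max"):
        raise ValueError(f"unsupported aggregate: {f}")
    remaining = aux_bag - Counter(delta_x)       # πx(S) - πx(ΔS), bag difference
    if not remaining:
        return None, remaining
    pick = min if f == "min" else max
    return pick(remaining.elements()), remaining


F_new, aux = tmaintain_delete_minmax("max", Counter([800000, 500000, 60000]), [800000])
```

Keeping counts rather than a set matters: if 800000 occurred twice, deleting one occurrence would leave the max unchanged.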

33

Implementation and Experiment

Implemented in Java, using Jena, the RDQL engine from HP.

Compared with the relational approach (a standard view maintenance algorithm on relational tables): the Counting Algorithm of Gupta et al., "Maintaining Views Incrementally", SIGMOD 1993.

Dataset: ChefMoz project RDF dump; data stored in memory.

35

Other Related Work

Volz, Oberle, Studer [DBFUSION'02] were the first to introduce a view mechanism for RDF data. Their views require that either

1. the results contain class instances (i.e., a subject or object variable), or

2. the result itself has the pattern of an RDF statement (i.e., a triple containing a subject, predicate and object).

Magkanaraki et al. [ISWC'03] proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies so that new resources, property values, classes and property types can be created.

None of these works specifically addresses (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.

36

Summary

Aggregate Views are important for RDF applications

RDQL Extension for Views and Aggregates

Aggregate View Maintenance Algorithms AMX: Localized search in RDF graphs

37

Thank you very much!

Questions and Answers
