1
RDF Aggregate Queries and Views
Edward Hung, Yu Deng, V.S. Subrahmanian
University of Maryland, College Park
ICDE 2005, April 7, Tokyo, Japan
2
Maintenance of RDF Aggregate Views
  Introduction of RDF and RDQL
  RDQL Extension for Aggregate Views
  Aggregate View Maintenance Algorithms AMX
  Implementation and Experiments
  Related Work
3
Introduction
Resource Description Framework (RDF)
  W3C Recommendation
  Represents metadata about resources identifiable on the web (by Uniform Resource Identifier (URI))
  Triple: (Resource, Property, Value)
    (Artist, rdf:type, rdfs:Class)
    (Painter, rdf:type, rdfs:Class)
    (Painter, rdfs:subClassOf, Artist)
RDF Schema
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xml:base="http://www.auctionschema.com/schema1#">
  <rdfs:Class rdf:ID="Artist"/>
  <rdfs:Class rdf:ID="Painter">
    <rdfs:subClassOf rdf:resource="#Artist"/>
  </rdfs:Class>
  <rdfs:Datatype rdf:about="&xsd;string"/>
  <rdf:Property rdf:ID="fname">
    <rdfs:domain rdf:resource="#Artist"/>
    <rdfs:range rdf:resource="&xsd;string"/>
  </rdf:Property>
</rdf:RDF>

RDF Instance
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ns1="http://www.auctionschema.com/schema1#">
  <rdf:Description rdf:about="http://www.artist.net#guyrose">
    <rdf:type rdf:resource="http://www.auctionschema.com/schema1#Painter"/>
    <ns1:fname rdf:datatype="&xsd;string">Guy</ns1:fname>
  </rdf:Description>
</rdf:RDF>
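[Figure: RDF graph of the schema and instance. Schema: Painter is linked to Artist by subClassOf, and Artist is linked to String by fname. Instance: &r1 is linked to "Guy" by fname, where &r1 = http://www.artist.net#guyrose]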
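The triple view of this example can be sketched in Python. This is only an illustration, assuming triples are plain tuples and using shortened names instead of full URIs; the helpers `superclasses` and `is_instance_of` are hypothetical and follow nothing but rdfs:subClassOf edges.

```python
# Schema and instance triples from the slides, written as (resource, property, value).
R1 = "http://www.artist.net#guyrose"  # the resource abbreviated as &r1

TRIPLES = {
    ("Artist", "rdf:type", "rdfs:Class"),
    ("Painter", "rdf:type", "rdfs:Class"),
    ("Painter", "rdfs:subClassOf", "Artist"),
    (R1, "rdf:type", "Painter"),
    (R1, "fname", "Guy"),
}

def superclasses(cls, triples):
    """Return cls plus every class reachable through rdfs:subClassOf edges."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c in seen:
            continue
        seen.add(c)
        todo.extend(v for (r, p, v) in triples if r == c and p == "rdfs:subClassOf")
    return seen

def is_instance_of(resource, cls, triples):
    """Hypothetical helper: True if resource has rdf:type cls or a subclass of cls."""
    direct = {v for (r, p, v) in triples if r == resource and p == "rdf:type"}
    return any(cls in superclasses(d, triples) for d in direct)
```

With these triples, `is_instance_of(R1, "Artist", TRIPLES)` holds because Painter is a subclass of Artist.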
7
RDQL: RDF Query Language
SELECT ?highprice
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
The triples in the WHERE clause form the graph pattern.
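To make the idea of a graph pattern concrete, here is a naive Python sketch that evaluates the query above by trying every triple for every pattern. It is not how Jena evaluates RDQL; the function name `match_pattern`, the shortened property names, and all data in `GRAPH` are invented for illustration.

```python
def match_pattern(pattern, triples, binding=None):
    """Yield every variable binding that maps all triple patterns into the graph."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):          # variable: bind it or check the old binding
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:               # constant: must be equal
                ok = False
                break
        if ok:
            yield from match_pattern(rest, triples, new)

# Invented sample data: two works by Guy Rose, only one presented in April 2004.
GRAPH = [
    ("a1", "lname", "Rose"), ("a1", "fname", "Guy"),
    ("a1", "creates", "w1"), ("w1", "estimated", "p1"),
    ("p1", "high", 800000), ("w1", "presented", "2004-04-12"),
    ("a1", "creates", "w2"), ("w2", "estimated", "p2"),
    ("p2", "high", 500000), ("w2", "presented", "2004-05-03"),
]
PATTERN = [
    ("?artist", "lname", "Rose"), ("?artist", "fname", "Guy"),
    ("?artist", "creates", "?artifact"), ("?artifact", "estimated", "?price"),
    ("?price", "high", "?highprice"), ("?artifact", "presented", "?date"),
]
# The AND clause becomes a filter on the bindings (ISO dates compare as strings).
results = [b["?highprice"] for b in match_pattern(PATTERN, GRAPH)
           if "2004-04-01" <= b["?date"] <= "2004-04-30"]
```

Here `results` contains only the high price of the work presented in April 2004.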
8
RDQL Extension for Aggregates and Views
CREATEVIEW AS
SELECT max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
9
Aggregate Query
  Aggregate operators, e.g. min, max, sum, count, average
  GROUP BY clause
  Output a table of tuples
Output can be (i) an RDF instance or (ii) a table
  Advantage of (i): it allows us to further query the result
  However, (ii) allows any form of table, including an RDF instance when the table consists of a set of RDF tuples.
We extend the syntax of RDQL to allow constants in SELECT clauses, which is equivalent to creating new resources from those constants.
For example, the previous query can be modified as follows:
CREATEVIEW AS
SELECT <ns1:works_by_guyrose>, <ns1:maxprice>, max(?highprice)
WHERE (?artist, <ns1:lname>, "Rose"),
      (?artist, <ns1:fname>, "Guy"),
      (?artist, <ns1:creates>, ?artifact),
      (?artifact, <ns1:estimated>, ?price),
      (?price, <ns1:high>, ?highprice),
      (?artifact, <ns1:presented>, ?date)
AND 2004-04-01 <= ?date <= 2004-04-30
USING ns1 FOR <http://www.auctionschema.com/schema1#>
The result is a valid RDF statement: (<ns1:works_by_guyrose>, <ns1:maxprice>, "800000"^^ns1:USD)
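As a rough Python illustration of constants in the SELECT clause, the two constants become the resource and property of the output triple and the aggregate becomes its value. The function `view_triple` and the way the typed literal is rendered as a string are assumptions made for this sketch.

```python
def view_triple(subject, prop, values, unit="ns1:USD"):
    """Build the single result triple of the view, or None if nothing matched."""
    if not values:
        return None
    # Render the aggregate as a typed literal string, e.g. "800000"^^ns1:USD.
    return (subject, prop, f'"{max(values)}"^^{unit}')

triple = view_triple("ns1:works_by_guyrose", "ns1:maxprice", [800000, 500000])
```

Because the output is itself a triple, it can be stored and queried like any other RDF statement.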
11
Aggregate View Maintenance
Relational Approach
  Store all triples in a relational table with schema (Resource, Property, Value)
  OR
  Store resources and values of the same property in a separate relational table with schema (Resource, Value)
  #self-joins = (#triples in where-clause) - 1
  Large number of delta rules during relational view maintenance: expensive
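The self-join count can be seen in a small Python sketch that uses the standard `sqlite3` module. It assumes the first storage scheme, with a single table named `triples(r, p, v)`; the table name, the column names, and the translator `pattern_to_sql` are all hypothetical.

```python
import sqlite3

def pattern_to_sql(pattern):
    """Translate triple patterns into one SQL query over triples(r, p, v).

    Every pattern gets its own alias of the table, so n patterns need n - 1 self-joins.
    """
    seen, where, params = {}, [], []
    for i, tp in enumerate(pattern):
        for col, term in zip(("r", "p", "v"), tp):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in seen:
                    where.append(f"{ref} = {seen[term]}")  # join on a shared variable
                else:
                    seen[term] = ref
            else:
                where.append(f"{ref} = ?")                 # constant becomes a parameter
                params.append(term)
    tables = ", ".join(f"triples t{i}" for i in range(len(pattern)))
    select = ", ".join(f'{ref} AS "{var[1:]}"' for var, ref in seen.items())
    where_sql = " AND ".join(where) or "1"
    return f"SELECT {select} FROM {tables} WHERE {where_sql}", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (r TEXT, p TEXT, v TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("a1", "lname", "Rose"), ("a1", "creates", "w1"),
    ("a2", "lname", "Monet"), ("a2", "creates", "w2"),
])
sql, params = pattern_to_sql([("?artist", "lname", "Rose"),
                              ("?artist", "creates", "?artifact")])
rows = conn.execute(sql, params).fetchall()
```

Two triple patterns produce two aliases of `triples` and therefore one self-join; the six-pattern query from the earlier slides would need five.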
12
Aggregate View Maintenance
Our Approach
  Localized search in RDF graphs
  Modified version of breadth-first search starting at the inserted/deleted edge
  Auxiliary data are needed for certain aggregate views: min, max, avg
13
Distributive Aggregate Function
  An aggregate function f is distributive w.r.t. a source update operation if and only if the updated value can be computed from its old value and the update, without reference to the source.
  Examples: count, sum, average w.r.t. insertion, deletion and update
    For average, we need an additional attribute size, which stores the size of the intermediate result S, in order to compute the correct updated value (or we can use sum and count to calculate it)
  max and min are distributive w.r.t. insertion, but not w.r.t. deletion and update
    Auxiliary data computed from S help to avoid the need to refer to the source.
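The definition can be illustrated with a short Python sketch in which each update uses only the old value and the changed values, never the source. The function names are invented here, and `delta` stands for the values contributed by the inserted or deleted matches.

```python
def update_count(count, delta, inserted=True):
    """New count from the old count and the number of changed values."""
    return count + len(delta) if inserted else count - len(delta)

def update_sum(total, delta, inserted=True):
    """New sum from the old sum and the changed values."""
    return total + sum(delta) if inserted else total - sum(delta)

def update_avg(avg, size, delta, inserted=True):
    """Return (new_avg, new_size); the stored size is what makes average distributive."""
    sign = 1 if inserted else -1
    new_size = size + sign * len(delta)
    if new_size == 0:
        return None, 0          # the intermediate result became empty
    return (avg * size + sign * sum(delta)) / new_size, new_size
```

No comparable function exists for max under deletion: if the deleted value equals the old maximum, the old value alone does not reveal the next largest one, which is why auxiliary data are kept.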
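[Figure: the graph pattern of SELECT max(?highprice) is matched against the RDF graph; the matched values are collected in a bag (BAG), which grows from empty to 800000 and then to 800000, 500000]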
18
Compute Aggregates Algorithm CAA
Algorithm CAA(I, Q)
/* Input: RDF graph I, query Q */
/* Output: table T(Q, I) */
1) GP ← BuildGP(Q); X ← aggregate variables of Q;
2) Y ← GROUP BY variables of Q;
3) S ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearchAll(GP, Q, I)];
4) Return T(Q, I) ← TCompute(S, Q);
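Steps 3 and 4 of CAA can be sketched in Python as follows. The sketch assumes the pattern matches θ are already available as dictionaries (the role MSearchAll plays on the slide) and handles a single aggregate variable; the name `caa_from_matches` and the sample bindings are invented.

```python
from collections import defaultdict

def caa_from_matches(matches, agg_var, group_vars, f):
    """Build the bag S from the matches, then aggregate each group.

    matches    -- iterable of bindings, each a dict from variable to value
    agg_var    -- the aggregate variable (X on the slide)
    group_vars -- the GROUP BY variables (Y on the slide)
    f          -- aggregate function applied to a list of values, e.g. max
    """
    # Step 3: S keeps one (group key, value) tuple per match, duplicates included.
    s = [(tuple(theta[y] for y in group_vars), theta[agg_var]) for theta in matches]
    # Step 4: group the bag and apply f to every group.
    groups = defaultdict(list)
    for key, value in s:
        groups[key].append(value)
    return {key: f(values) for key, values in groups.items()}

# Invented bindings, standing in for the output of MSearchAll.
matches = [
    {"?artist": "a1", "?highprice": 800000},
    {"?artist": "a1", "?highprice": 500000},
    {"?artist": "a2", "?highprice": 300000},
]
table = caa_from_matches(matches, "?highprice", ["?artist"], max)
```

Passing an empty list of GROUP BY variables puts every match into one group with the empty key, which corresponds to a query without a GROUP BY clause.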
19
Aggregate View Maintenance Algorithms AMX
  AMI – Insertion
  AMD – Deletion
  AMT – Triple Modification
  AMR – Resource Modification
Update: Insertion
[Figure: a paints edge is inserted into the RDF graph; for SELECT max(?highprice) the BAG grows from 800000, 500000 to 800000, 500000, 60000]
23
AMI for Insertion
Algorithm AMI(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), inserted triple t */
/* Output: table T(Q, I ∪ t), auxiliary data A(Q, I ∪ t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I ∪ t)];
   b) return (T(Q, I ∪ t), A(Q, I ∪ t)) ← TMaintainI(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I ∪ t), A(Q, I ∪ t)) ← (T(Q, I), A(Q, I));
24
Algorithm MSearch(GP, Q, t, I)
/* Input: graph pattern GP, query Q, triple t, RDF graph I */
/* Output: Θ = {θ | θ is a pattern matching} */
1) Θ ← ∅;
2) for each t' ∈ GP s.t. ∃θ', tθ' = t'θ',
   a) for each θ ∈ bSearch(t, t', GP, I),
      i. if θ satisfies the constraints in Q, then Θ ← Θ ∪ {θ};
3) return Θ;
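The localized search can be approximated by the following Python sketch. It is a simplification rather than the authors' bSearch: it seeds a binding from the changed triple, then extends it only through triples that touch an already bound node, and it omits the constraint check against Q. All names (`is_var`, `unify`, `msearch`) and the sample data are invented.

```python
from collections import defaultdict

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def unify(tp, triple, binding):
    """Extend binding so that pattern tp maps onto triple, or return None."""
    new = dict(binding)
    for term, value in zip(tp, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def msearch(gp, t, graph):
    """Find the matches of gp that use triple t, visiting only nearby triples."""
    by_node = defaultdict(set)      # node -> triples having it as resource or value
    for tr in graph:
        by_node[tr[0]].add(tr)
        by_node[tr[2]].add(tr)

    def extend(binding, todo):
        if not todo:
            yield binding
            return
        # Pick a remaining pattern with an end point that is already known.
        for i, tp in enumerate(todo):
            ends = [binding.get(x, x) for x in (tp[0], tp[2])]
            anchors = [e for e in ends if not is_var(e)]
            if anchors:
                break
        else:
            return                  # disconnected pattern: outside this sketch
        rest = todo[:i] + todo[i + 1:]
        for tr in by_node[anchors[0]]:
            new = unify(tp, tr, binding)
            if new is not None:
                yield from extend(new, rest)

    found = []
    for i, tp in enumerate(gp):     # step 2: every pattern that t can play the role of
        seed = unify(tp, t, {})
        if seed is not None:
            for b in extend(seed, gp[:i] + gp[i + 1:]):
                if b not in found:
                    found.append(b)
    return found

GP = [("?artist", "creates", "?artifact"),
      ("?artifact", "estimated", "?price"),
      ("?price", "high", "?highprice")]
graph = [("a1", "creates", "w1"), ("w1", "estimated", "p1"), ("p1", "high", 800000),
         ("a1", "creates", "w2"), ("w2", "estimated", "p2")]
t = ("p2", "high", 500000)          # the inserted triple
hits = msearch(GP, t, graph + [t])
```

In this example only the triples around p2 and w2 are inspected; the existing match through w1 and p1 is never revisited.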
25
Handling GROUP BY
By the GROUP BY clause, each tuple in ΔS affects one particular group.
TMaintainI maintains only the affected groups (and their corresponding auxiliary data), using the tuples that affect them.
Empty groups are deleted and new groups are inserted.
26
TMaintainI
Handling sum, count, min, max
  No auxiliary data required
  Suppose f(x) is an aggregate function on attribute x, F the original result, F' the new result
    $F' = F + \sum_{v \in \pi_x(\Delta S)} v$ if f = sum
    $F' = F + |\Delta S|$ if f = count
    $F' = \min([F] \cup \pi_x(\Delta S))$ if f = min
    $F' = \max([F] \cup \pi_x(\Delta S))$ if f = max
  $\pi_x(\Delta S)$ projects a bag of values of x from $\Delta S$
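A per-group version of these insertion rules might look like the Python sketch below. It assumes the view is kept as a dictionary from group key to aggregate value and that ΔS arrives as (group key, value) pairs; the function name `tmaintain_insert` and this representation are choices made for the sketch, not taken from the slides.

```python
def tmaintain_insert(table, delta_s, f):
    """Return the maintained table after inserting the tuples in delta_s.

    table   -- dict: group key -> old aggregate value F
    delta_s -- list of (group key, x value) pairs
    f       -- one of "sum", "count", "min", "max"
    """
    new_table = dict(table)
    by_group = {}
    for key, v in delta_s:                  # each tuple affects exactly one group
        by_group.setdefault(key, []).append(v)
    for key, values in by_group.items():    # untouched groups are left alone
        old = new_table.get(key)            # None means a new group appears
        if f == "sum":
            new_table[key] = (old or 0) + sum(values)
        elif f == "count":
            new_table[key] = (old or 0) + len(values)
        elif f == "min":
            new_table[key] = min(values if old is None else [old] + values)
        elif f == "max":
            new_table[key] = max(values if old is None else [old] + values)
        else:
            raise ValueError(f"unsupported aggregate: {f}")
    return new_table
```

For example, inserting a value of 60000 into a group whose max is 800000 leaves that group unchanged, while a tuple for an unseen group creates a new row.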
27
TMaintainI
Handling average
  We need the size of S
    $size' = size + |\Delta S|$
    $F' = \frac{F \cdot size + \sum_{v \in \pi_x(\Delta S)} v}{size'}$
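As a worked instance of the average update, take the bag 800000, 500000 from the running example, so that F = 650000 and size = 2, and assume an insertion contributes the single value 60000. The new average is the old sum F · size plus the inserted values, divided by the new size:

```latex
\begin{align*}
size' &= size + |\Delta S| = 2 + 1 = 3 \\
F' &= \frac{F \cdot size + \sum_{v \in \pi_x(\Delta S)} v}{size'}
    = \frac{650000 \cdot 2 + 60000}{3}
    = \frac{1360000}{3} \approx 453333.33
\end{align*}
```

This agrees with averaging the three values 800000, 500000 and 60000 directly.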
Update: Deletion
[Figure: a paints edge is deleted from the RDF graph; for SELECT max(?highprice) the BAG shrinks from 800000, 500000, 60000 to 500000, 60000]
31
AMD for Deletion
Algorithm AMD(I, Q, A(Q, I), T(Q, I), t)
/* Input: RDF graph I, query Q, auxiliary data A(Q, I), query result T(Q, I), deleted triple t */
/* Output: table T(Q, I - t), auxiliary data A(Q, I - t) */
1) GP ← BuildGP(Q);
2) X ← aggregate variables of Q;
3) Y ← GROUP BY variables of Q;
4) If TMatch(GP, t) == TRUE, then
   a) ΔS ← [VRetrieve(θ, GP, X ∪ Y) | θ ∈ MSearch(GP, Q, t, I)];
   b) return (T(Q, I - t), A(Q, I - t)) ← TMaintainD(T(Q, I), ΔS, A(Q, I), Q);
5) else, return (T(Q, I - t), A(Q, I - t)) ← (T(Q, I), A(Q, I));
32
TMaintainD
Handling min, max
  Min and max are not distributive w.r.t. deletion
  We need to store $\pi_x(S)$, which projects a bag of values of x from S
  The new aggregate value F' is obtained by:
    $F' = \min(\pi_x(S - \Delta S))$ if f = min
    $F' = \max(\pi_x(S - \Delta S))$ if f = max
  We need to update $\pi_x(S)$ to become $\pi_x(S) - \pi_x(\Delta S)$
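The deletion rule can be sketched in Python with `collections.Counter` as the stored bag, since a bag must remember how often each value occurs. The function name `tmaintain_delete` is invented, and the sketch covers a single group only.

```python
from collections import Counter

def tmaintain_delete(bag, delta, f=max):
    """Return (F', new bag) after removing the deleted values from the stored bag.

    bag   -- Counter holding the projected values of x from S
    delta -- list of values of x from the deleted matches
    f     -- min or max
    """
    new_bag = bag - Counter(delta)        # bag difference; zero counts are dropped
    values = list(new_bag.elements())
    return (f(values) if values else None), new_bag

bag = Counter([800000, 500000, 60000])
new_max, bag_after = tmaintain_delete(bag, [800000])
```

With the bag from the deletion example, removing 800000 yields the new maximum 500000 without touching the RDF graph. If the same value occurs twice in the bag, deleting one copy leaves the maximum unchanged.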
33
Implementation and Experiment
  Implemented in Java
    Jena: RDQL engine from HP
  Comparison with the relational approach (standard view maintenance algorithm on relational tables)
    Counting algorithm in Gupta et al., "Maintaining Views Incrementally", SIGMOD 1993
  Dataset: Chef Moz Project RDF dump
  Data stored in memory
34
35
Other Related Work
Volz, Oberle, Studer [DBFUSION'02]
  The first to introduce a view mechanism for RDF data
  Their views require that
    1. the results contain class instances (i.e., a subject or object variable), or
    2. the result itself has the pattern of an RDF statement (i.e., a triple containing subject, predicate and object).
Magkanaraki et al. [ISWC'03]
  Proposed RVL, a view definition language that can also create virtual RDF schemas and restructure class and property hierarchies such that new resources, property values, classes and property types can be created.
None of these works specifically address (i) aggregates in RDF or (ii) the problem of maintaining aggregate RDF views.
36
Summary
Aggregate Views are important for RDF applications
RDQL Extension for Views and Aggregates
Aggregate View Maintenance Algorithms AMX
  Localized search in RDF graphs
37
Thank you very much!
Questions and Answers