Opportunistic Linked Data Querying through Approximate Membership Metadata

Miel Vander Sande

Upload: miel-vander-sande

Posted on 12-Apr-2017


TRANSCRIPT

Page 1: Opportunistic Linked Data Querying through Approximate Membership Metadata

Opportunistic Linked Data Querying through Approximate Membership Metadata

Miel Vander Sande

Page 2

“Solve a query for a client, and it will be happy for a day.

Teach a client to SPARQL, and it’ll query happily ever after.”

— Confucius, 431 BC

Page 3

Linked Data Fragments: a uniform view on publishing Linked Data

Exploring the axis: selector and metadata

Approximate Membership Metadata

Querying through Approximate Membership Metadata

Opportunistic Querying


Page 5

Interaction between client & server. The hunt for trade-offs: What can we learn?

[Figure: the interface offered by the server, on an axis from data dump to SPARQL endpoint]

data dump: low server cost, high client cost, high availability, high bandwidth, out-of-date data

SPARQL endpoint: high server cost, low client cost, low availability, low bandwidth, live data

Page 6

Linked Data Fragments are a uniform view on Linked Data interfaces.

[Figure: the interface offered by the server, on an axis from data dump to SPARQL endpoint]

Every Linked Data interface offers specific fragments of a Linked Data set.

Page 7

data: What triples does it contain?

metadata: What do we know about it?

controls: How to access more data?

Each type of Linked Data Fragment is defined by three characteristics.

Page 8

data dump

data: all dataset triples

metadata: number of triples, file size

controls: (none)

Each type of Linked Data Fragment is defined by three characteristics.

Page 9

SPARQL query result

data: triples matching the query

metadata: (none)

controls: (none)

Each type of Linked Data Fragment is defined by three characteristics.
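The three characteristics can be pictured as a small record. The following is an illustrative Python sketch only, not an API of any Linked Data Fragments implementation; the `LinkedDataFragment` class, the `ex:` terms and the metadata keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class LinkedDataFragment:
    """Hypothetical model of the three characteristics of a Linked Data Fragment."""
    data: List[Triple]                                         # What triples does it contain?
    metadata: Dict[str, object] = field(default_factory=dict)  # What do we know about it?
    controls: List[str] = field(default_factory=list)          # How to access more data?


# A data dump: all dataset triples, size metadata, no controls.
dump = LinkedDataFragment(
    data=[("ex:a", "ex:p", "ex:b"), ("ex:b", "ex:p", "ex:c")],
    metadata={"triples": 2, "file_size_bytes": 2048},
)

# A SPARQL query result: only the matching triples, no metadata, no controls.
result = LinkedDataFragment(data=[("ex:a", "ex:p", "ex:b")])
```

Seen this way, the interfaces on the axis differ only in how they fill these three fields.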


Page 11

You have to start somewhere: Triple Pattern Fragments.

[Figure: interfaces on an axis from data dump to SPARQL query results, including Linked Data documents and triple pattern fragments; triple pattern fragments offer low server cost, high availability and live data, but high bandwidth]

Verborgh, R., Hartig, O., …: Querying datasets on the Web with high availability. ISWC 2014

Page 12

[Figure: a triple pattern fragment, consisting of data (the first 100 triples), metadata (total count) and controls (links to other fragments)]

Page 13

Controls: triple pattern fragment servers enable clients to be intelligent.

<http://fragments.dbpedia.org/2014/en#dataset> hydra:search [
    hydra:template "http://fragments.dbpedia.org/2014/en{?subject,predicate,object}";
    hydra:mapping [ hydra:variable "subject";   hydra:property rdf:subject ],
                  [ hydra:variable "predicate"; hydra:property rdf:predicate ],
                  [ hydra:variable "object";    hydra:property rdf:object ]
].

The RDF representation explains: “you can query by triple pattern”.
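A client that understands this control can build the URL of any triple pattern fragment. Below is a minimal Python sketch, assuming the template is the form-style query expansion `{?subject,predicate,object}` of RFC 6570 and that a variable in the pattern is expressed by leaving that component out. The function `expand_tpf_template` is hypothetical; a real client would read the template from `hydra:search` instead of hard-coding it, use a complete URI Template library, and also handle literal serialization, which this sketch does not.

```python
from typing import Optional
from urllib.parse import quote


def expand_tpf_template(base: str,
                        subject: Optional[str] = None,
                        predicate: Optional[str] = None,
                        obj: Optional[str] = None) -> str:
    """Expand '{?subject,predicate,object}' appended to `base`.

    Components left as None (variables in the triple pattern) are omitted,
    as RFC 6570 does for undefined variables.
    """
    pairs = []
    for name, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            # safe="" percent-encodes reserved characters such as ':' and '/'
            pairs.append(f"{name}={quote(value, safe='')}")
    return base + ("?" + "&".join(pairs) if pairs else "")


# Fragment for the pattern ?artist dbont:birthPlace dbpedia:Padua.
url = expand_tpf_template(
    "http://fragments.dbpedia.org/2014/en",
    predicate="http://dbpedia.org/ontology/birthPlace",
    obj="http://dbpedia.org/resource/Padua",
)
print(url)
```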

Page 14

Metadata: triple pattern fragment servers enable clients to be intelligent.

<#fragment> void:triples 8141.

The RDF representation explains: “this is the number of matches”.

Page 15

How can intelligent clients solve SPARQL queries over fragments?

Give them a SPARQL query. Give them a URL of any dataset fragment.

They look inside the fragment to see how to access the dataset, and use the metadata to decide how to plan the query.

Page 16

The client splits the query into the available fragments.

SELECT ?artist ?name WHERE {
  ?artist a dbpedia-owl:Artist;
          rdfs:label ?name;
          dbpedia-owl:birthPlace dbpedia:Padua.
  FILTER LANGMATCHES(LANG(?name), "EN")
}

Page 17

The client gets the fragments and inspects their metadata.

?artist a dbpedia-owl:Artist.  first 100 triples, 96,000 matches

?artist rdfs:label ?name.  first 100 triples, 12,000,000 matches

?artist dbont:birthPlace dbpedia:Padua.  first 100 triples, 135 matches

Page 18

The metadata enables the client to choose the right starting point.

?artist a dbpedia-owl:Artist.  (96,000)

?artist rdfs:label ?name.  (12,000,000)

?artist dbont:birthPlace dbpedia:Padua.  (135)
    dbpedia:Alberto_Benettin dbont:birthPlace dbpedia:Padua.
    dbpedia:Alberto_Bigon dbont:birthPlace dbpedia:Padua.

Each binding of ?artist makes the remaining patterns concrete:

dbpedia:Alberto_Benettin a dbont:Artist.

dbpedia:Alberto_Benettin rdfs:label ?name.
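The choice of starting point can be sketched in a few lines of Python, assuming the match counts have already been read from each fragment's metadata. This is a simplified illustration of the greedy idea, not the TPF client's actual algorithm; `choose_start` and `bind` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, str, str]


def choose_start(counts: Dict[Pattern, int]) -> Pattern:
    """Greedy choice: the pattern whose fragment metadata reports the fewest matches."""
    return min(counts, key=lambda pattern: counts[pattern])


def bind(pattern: Pattern, binding: Dict[str, str]) -> Pattern:
    """Replace the variables that have a value in `binding`; keep all other terms."""
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)


# Match counts from the fragment metadata shown above.
counts: Dict[Pattern, int] = {
    ("?artist", "a", "dbpedia-owl:Artist"): 96_000,
    ("?artist", "rdfs:label", "?name"): 12_000_000,
    ("?artist", "dbpedia-owl:birthPlace", "dbpedia:Padua"): 135,
}

start = choose_start(counts)
remaining: List[Pattern] = [p for p in counts if p != start]

# One binding taken from the start fragment makes the other patterns concrete.
binding = {"?artist": "dbpedia:Alberto_Benettin"}
next_patterns = [bind(p, binding) for p in remaining]
```

Note that the first bound pattern, `dbpedia:Alberto_Benettin a dbpedia-owl:Artist`, has no variables left: it is exactly the kind of “is this triple in the dataset?” request discussed next.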

Page 19

For some patterns, many requests are of type “is this triple in the dataset?”

[Chart: fraction of membership queries (0% to 100%) for 20 WatDiv queries: linear (L1-L5), star (S1-S7), snowflake-shaped (F1-F5) and complex (C1-C3)]

Page 20

Advancing in selector and/or metadata dimensions.

[Figure: two axes, selector (from simple questions to complex questions) and metadata (from no information for the client to extensive useful information for the client)]

Triple Pattern Fragments: low server cost, high availability, live data, high bandwidth

Page 21

Advancing in selector and/or metadata dimensions.

[Figure: selector and metadata axes]

Triple Pattern Fragments

Substring search

Van Herwegen, J., et al.: Substring Filtering for Low-Cost Linked Data Interfaces. Last talk of this session!

Page 22

Advancing in selector and/or metadata dimensions.

[Figure: selector and metadata axes]

Triple Pattern Fragments

Substring search

Approximate Membership Function (AMF)


Page 24

Append to the TPF response a compact representation of all possible mappings.

[Figure: metadata axis]

Triple Pattern Fragments + Approximate Membership Function (AMF)

Approximate set membership assessment with a predefined false positive probability.

Bloom filter / Golomb-coded set
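To make “predefined false positive probability” concrete, here is a minimal Bloom filter in Python that sizes itself from the expected number of elements n and the target probability p, using the standard optimal formulas. It is a sketch, not the implementation used in the experiments: SHA-256 with double hashing stands in for the MurmurHash3/FNV-1 functions mentioned later, and the class is hypothetical.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter for about n elements at false positive probability p."""

    def __init__(self, n: int, p: float) -> None:
        # Optimal number of bits: m = -n ln p / (ln 2)^2
        self.m = max(1, math.ceil(-n * math.log(p) / math.log(2) ** 2))
        # Optimal number of hash functions: k = (m / n) ln 2
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, hence non-zero, step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: present, or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


members = ["dbpedia:Alberto_Bigon", "dbpedia:Alberto_Da_Zara"]
bf = BloomFilter(n=len(members), p=1 / 1024)
for member in members:
    bf.add(member)
```

Added elements are always reported as present (no false negatives); absent elements are wrongly reported as present with probability close to p.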

Page 25

“Can we reduce the number of HTTP requests?”

“Can we reduce the total execution time?”

“What is the overhead on server CPU load?”

Page 26

[Figure: Bloom filter vs. Golomb-coded set (GCS). Bloom filter: each element (e.g. dbpedia:Alberto_Benettin, dbpedia:Alberto_Bigon) is hashed by k hash functions, each setting one bit in an array of m bits. GCS: elements are hashed into a range, and the sorted hash values are stored as Golomb-coded differences, which follow a geometric distribution.]
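The GCS side of the figure can be sketched as follows, assuming p = 1/M with M a power of two (e.g. p = 1/1024 gives M = 2^10), so that Golomb coding reduces to Rice coding. Elements are hashed into [0, n·M), sorted, and each gap is written as a unary quotient plus an r-bit remainder. The class is hypothetical, SHA-256 stands in for FNV-1, and bits are kept in a string for readability rather than compactness.

```python
import hashlib
from typing import Iterator, List


def _hash(item: str, modulus: int) -> int:
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class GolombCodedSet:
    """Minimal GCS with Rice parameter M = 2**r, i.e. false positive probability ~1/M."""

    def __init__(self, items: List[str], r: int) -> None:
        if r < 1:
            raise ValueError("r must be at least 1")
        self.r = r
        self.modulus = max(1, len(items)) << r  # n * M
        values = sorted({_hash(x, self.modulus) for x in items})
        parts, prev = [], 0
        for value in values:
            gap, prev = value - prev, value
            quotient, remainder = gap >> r, gap & ((1 << r) - 1)
            parts.append("1" * quotient + "0")         # quotient in unary
            parts.append(format(remainder, f"0{r}b"))  # remainder in r bits
        self.bits = "".join(parts)

    def _decode(self) -> Iterator[int]:
        # Linear scan over the whole set; real implementations add an index.
        pos, value = 0, 0
        while pos < len(self.bits):
            quotient = 0
            while self.bits[pos] == "1":
                quotient += 1
                pos += 1
            pos += 1  # skip the terminating '0'
            remainder = int(self.bits[pos:pos + self.r], 2)
            pos += self.r
            value += (quotient << self.r) + remainder
            yield value

    def might_contain(self, item: str) -> bool:
        target = _hash(item, self.modulus)
        for value in self._decode():
            if value >= target:
                return value == target
        return False


gcs = GolombCodedSet(["dbpedia:Alberto_Bigon", "dbpedia:Alberto_Da_Zara"], r=10)
```

Compared with the Bloom filter, the set is smaller for the same p, but every lookup has to decode the gaps.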

Page 27

“this BloomFilter with false positive probability X and hash function Y represents the presence of all bindings for ?s”.

metadata

Server enables clients to avoid membership requests.

<#fragment> void:triples 96300.    # existing count metadata

_:membershipFunction a ms:BloomFilter;    # AMF metadata
    ms:hashSize 524288;
    ms:hashFunction <MyMurmur1>, <MyMurmur2>;
    ms:memberCollection [
        ms:sourceCollection <#fragment>;
        ms:projectedProperty rdf:subject
    ];
    ms:falsePositiveRate 0.05;
    ms:falseNegativeRate 0.0;
    ms:binaryRepresentation "QmF...ZTY"^^xsd:base64Binary.
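As a reading aid for this metadata, the standard approximation of a Bloom filter's false positive probability relates its fields. This assumes m is `ms:hashSize` in bits, k is the number of `ms:hashFunction` values, and n is the number of distinct projected bindings (here the subjects, which need not equal the `void:triples` count), with independent, uniformly distributed hash functions:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2 \;\Rightarrow\; p \approx 2^{-k_{\mathrm{opt}}}
```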

Page 28

Client filters non-members locally with one extra (cached) request

GET ?artist dbont:birthPlace dbpedia:Padua.  (135 matches)
    dbpedia:Alberto_Benettin dbont:birthPlace dbpedia:Padua.

GET ?artist a dbont:Artist.  (the extra, cached request; its metadata carries the approximate membership filter)

Membership checks against the filter (0 = not in the filter, so no request is needed):

GET dbpedia:Alberto_Benettin a dbont:Artist.  0
GET dbpedia:Alberto_Bigon a dbont:Artist.  1
GET dbpedia:Alberto_Da_Zara a dbont:Artist.  1
GET dbpedia:Alberto_Gallo a dbont:Artist.  0

GET …
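The client-side filtering step can be sketched in Python. `might_contain` stands for the approximate membership test decoded from the cached fragment, and `fetch` for the membership request itself; both are hypothetical callables. In the usage example, an exact set plays both roles and reproduces the 0/1 values above as toy data.

```python
from typing import Callable, Iterable, List, Tuple


def filter_candidates(
    candidates: Iterable[str],
    might_contain: Callable[[str], bool],
    fetch: Callable[[str], bool],
) -> Tuple[List[str], int]:
    """Return the confirmed members and the number of membership requests sent."""
    members: List[str] = []
    requests = 0
    for candidate in candidates:
        if not might_contain(candidate):
            continue  # AMFs have no false negatives: definitely not a member, skip
        requests += 1  # "maybe": confirm with a membership request
        if fetch(candidate):
            members.append(candidate)
    return members, requests


server_artists = {"dbpedia:Alberto_Bigon", "dbpedia:Alberto_Da_Zara"}
candidates = ["dbpedia:Alberto_Benettin", "dbpedia:Alberto_Bigon",
              "dbpedia:Alberto_Da_Zara", "dbpedia:Alberto_Gallo"]
# Stand-ins: an exact set lookup acts as both the AMF and the server.
members, requests = filter_candidates(
    candidates, server_artists.__contains__, server_artists.__contains__
)
```

Here two of the four membership requests are avoided. With a real AMF, a false positive only costs one unnecessary request; it never changes the result.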

Page 29

We evaluated for request count, server cost and speedup in a Web setting.

Bloom filter: MurmurHash3, GCS: FNV-1

1 HTTP cache with 1 Mbps

p = 1/1024 (≈ 0.1%), 1/128 (≈ 0.8%), 1/64 (≈ 1.6%)

250 queries from 125 diverse WatDiv templates on an Amazon EC2 machine

WatDiv dataset with 100M triples

Timeout: 3 min
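For scale, the three probabilities translate into the following space per summarized binding. These figures come from the standard optimal Bloom filter sizing and the information-theoretic lower bound for approximate membership, not from measurements in this evaluation:

```latex
\begin{aligned}
\left.\frac{m}{n}\right|_{\mathrm{Bloom}} &= \frac{\log_2(1/p)}{\ln 2} \approx 1.44\,\log_2(1/p)
  && \text{(optimal Bloom filter)}\\
\left.\frac{m}{n}\right|_{\min} &= \log_2(1/p)
  && \text{(lower bound for approximate membership)}\\
p = 1/1024 &: \quad 14.4 \text{ vs. } 10 \text{ bits per element}\\
p = 1/128  &: \quad 10.1 \text{ vs. } 7 \text{ bits per element}\\
p = 1/64   &: \quad 8.7 \text{ vs. } 6 \text{ bits per element}
\end{aligned}
```

Golomb-coded sets typically come closer to the lower bound than Bloom filters, at the cost of decoding on every lookup.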

Page 30

We evaluated for request count, server cost and speedup in a Web setting.

vs. vanilla TPF server & client

2 client algorithms:
Original “greedy” algorithm
Optimized join-tree algorithm*

250 queries from 125 diverse WatDiv templates on an Amazon EC2 machine

* Van Herwegen, J., et al.: Query Execution Optimization for Clients of Triple Pattern Fragments. ESWC 2015

Page 31

About half or more of the queries need fewer requests; fewer than 20% need more.

Percentage of queries (p = 1/1024), compared with vanilla TPF:

                   Fewer requests   Equal   More requests
Greedy Bloom            59%          35%         6%
Greedy GCS              62%          33%         5%
Optimized Bloom         49%          33%        18%
Optimized GCS           50%          32%        17%

Page 32

Queries with relatively many HTTP requests (45,000+ per query) benefit greatly.

[Chart: difference in number of requests per query (axis 0 to 16,000), split into fewer and more requests, for Greedy Bloom, Greedy GCS, Optimized Bloom and Optimized GCS; annotation: “< 35”]

Page 33

No query has a lower execution time; roughly a third even has a higher one.

Percentage of queries (p = 1/1024), compared with vanilla TPF:

                   Lower execution time   Equal   Higher execution time
Greedy Bloom                0%             84%            16%
Greedy GCS                  0%             69%            31%
Optimized Bloom             0%             67%            33%
Optimized GCS               0%             62%            38%

Page 34

The server remains low-cost: the CPU overhead stays acceptable (< 6%).

[Chart: server CPU usage (%) for the original TPF server and for Bloom filters and GCS at p = 1/1024, 1/128 and 1/64; the plotted values range from 9.2% to 14.9%]


Page 36

During execution, a result candidate could already be correct (1 - p).

Can we be opportunistic here, and temporarily allow imprecise results?
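One way to be opportunistic is to show every candidate that passes the AMF immediately, then verify it and revoke the false positives afterwards. The Python generator below is a minimal sketch of that idea under this assumption, not the client implementation used in the evaluation; `might_contain` and `fetch` are hypothetical callables, as in the filtering sketch above.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def opportunistic_results(
    candidates: Iterable[str],
    might_contain: Callable[[str], bool],
    fetch: Callable[[str], bool],
) -> Iterator[Tuple[str, str]]:
    """Yield ("emit", x) as soon as the AMF accepts x, later ("revoke", x) for false positives."""
    pending: List[str] = []
    for candidate in candidates:
        if might_contain(candidate):
            pending.append(candidate)
            yield ("emit", candidate)  # shown right away; correct unless a false positive
    for candidate in pending:
        if not fetch(candidate):
            yield ("revoke", candidate)  # the AMF was wrong: withdraw the result


true_members = {"ex:a", "ex:b"}
amf_says_maybe = {"ex:a", "ex:b", "ex:x"}  # "ex:x" simulates a false positive
events = list(opportunistic_results(
    ["ex:a", "ex:x", "ex:y", "ex:b"],
    amf_says_maybe.__contains__,
    true_members.__contains__,
))
```

After the three emits, recall is already 100% while precision is 2/3; precision reaches 100% once `ex:x` is revoked.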

Page 37

“Can we reduce the time to 100% recall?”

[Figure: execution timelines for “only allow certain results” and “temporarily allow uncertain results”, with the milestones start execution, 1st result computed, n < r results computed, r results computed and r + f results computed, running from 0% recall to 100% recall and 100% precision]

Fig. 2. This SPARQL query execution timeline compares regular and opportunistic query execution, assuming r total query results and f false positives. Note how both approaches achieve 100% recall and precision at a shared point in the end, but there exists a period during which only opportunistic execution reaches 100% recall (shaded).

need to be discarded. The user thus sees the photos faster than if they had only been retrieved after full precision was achieved. This example indicates that opportunistic query answering has direct concrete uses in Web applications.

7 Evaluation

In the following, we discuss our evaluation of executing SPARQL queries against TPF interfaces with an AMF feature. From these experiments, we aim to assess whether AMFs are a valuable asset in the metadata dimension. We first describe the experiments and their setup. Then, we discuss their results to validate the three hypotheses of Section 3.2.

7.1 Experimental setup

We extended the existing implementations of the TPF client³ and server⁴ to support both Bloom filters and Golomb-coded sets. The server is configured by specifying the AMF and the desired false positive probability. We chose the 32-bit MurmurHash3 hash function for GCS and FNV-1 for the Bloom filter. The server calculates a membership function on the fly for each request for a triple pattern with a single variable.
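The “single variable” condition determines what the membership function summarizes: the values that the one variable takes in the fragment. The Python sketch below shows that projection step, assuming triples are plain string tuples; the resulting set is what a Bloom filter or GCS would then encode, matching `ms:projectedProperty rdf:subject` in the metadata example. `projected_bindings` is a hypothetical helper, and it treats a repeated variable such as `?x ?p ?x` as several variable positions.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
POSITIONS = ("rdf:subject", "rdf:predicate", "rdf:object")


def projected_bindings(pattern: Triple,
                       matches: List[Triple]) -> Optional[Tuple[str, Set[str]]]:
    """Return (projected property, distinct bindings) for a single-variable pattern."""
    variable_positions = [i for i, term in enumerate(pattern) if term.startswith("?")]
    if len(variable_positions) != 1:
        return None  # no membership function for 0 or 2+ variable positions
    i = variable_positions[0]
    return POSITIONS[i], {triple[i] for triple in matches}


pattern = ("?artist", "a", "dbpedia-owl:Artist")
matches = [("dbpedia:Alberto_Bigon", "a", "dbpedia-owl:Artist"),
           ("dbpedia:Alberto_Da_Zara", "a", "dbpedia-owl:Artist")]
projected = projected_bindings(pattern, matches)
```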

We ran the experiments with different false positive probabilities p: 1/1024 ≈ 0.1%, 1/128 ≈ 0.8%, and 1/64 ≈ 1.6%. In each experiment, we executed 250 queries generated from 125 diverse WatDiv SPARQL templates on three interfaces: i) a regular TPF interface, ii) TPF with Bloom filters, and iii) TPF with GCS. All three cases were tested with both the original and the optimized client; the last two setups were tested with and without opportunistic

³ https://github.com/LinkedDataFragments/Client.js/tree/amq

⁴ https://github.com/LinkedDataFragments/Server.js/tree/amq

Page 38

Temporarily allowing < 100% precision can reduce the time to 100% recall by one third.

[Chart: execution time (s) per query until 100% recall and until 100% precision, Greedy + Bloom (p = 1/1024)]

Number of revoked results was 0 or 1.


Page 40

For some query types, bandwidth decreases substantially for TPF query execution.

Approximate Membership Metadata is a nuanced debate.

For larger fragments, real-time computation hurts execution time. We expect gains from pre-caching and out-of-band delivery.

Opportunistic querying is a promising direction for further exploration.

Page 41

[Figure: triple pattern fragments: data + approximate membership filter]

No one size fits all: explore the axis. Find metrics that fit your use case:

Client & server load
Request & response size
Protocol (HTTP) impact…

Try your own trade-off server at our demo (and get a nice cup of coffee).

Start serving Linked Data like a barista
