hibiscus: hypergraph-based source selection for sparql endpoint federation

39
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation Muhammad Saleem , Axel-Cyrille Ngonga Ngomo { lastname }@ informatik.uni-leipzig.de Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany 11th Extended Semantic Web Conferece (ESWC), Crete, Greece, 25th-29th of May 2014

Upload: muhammad-saleem

Post on 15-Jan-2015

358 views

Category:

Engineering


2 download

DESCRIPTION

Efficient federated query processing is of significant importance to tame the large amount of data available on the Web of Data. Previous works have focused on generating optimized query execution plans for fast result retrieval. However, devising source selection approaches beyond triple pattern-wise source selection has not received much attention. This work presents HiBISCuS, a novel hypergraph-based source selection approach to federated SPARQL querying. Our approach can be directly combined with existing SPARQL query federation engines to achieve the same recall while querying fewer data sources. We extend three well-known SPARQL query federation engines with HiBISCus and compare our extensions with the original approaches on FedBench. Our evaluation shows that HiBISCuS can efficiently reduce the total number of sources selected without losing recall. Moreover, our approach significantly reduces the execution time of the selected engines on most of the benchmark queries.

TRANSCRIPT

Page 1: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint

Federation

Muhammad Saleem, Axel-Cyrille Ngonga Ngomo

{lastname}@informatik.uni-leipzig.deAgile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig, Germany11th Extended Semantic Web Conferece (ESWC), Crete, Greece, 25th-29th of May 2014

Page 2: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

SPARQL Endpoint Federation

S1

S2

S3

S4

RDF RDF RDF RDF

Parser

Source Selection

Federator Optimizer

Integrator

Get individual triple patterns

Identify capable source against individual triple patterns

Generate optimized sub-query execution plan

Integrate sub-queries results

Execute sub-queries

Page 3: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Our Contribution

S1

S2

S3

S4

RDF RDF RDF RDF

Parser

Source Selection

Federator Optimizer

Integrator

Efficiently identify capable source against individual triple patterns of a SPARQL query

Page 4: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

Page 5: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

TP2 = S1

Page 6: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

TP2 = S1

TP3 = S1

Page 7: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

TP2 = S1

TP3 = S1 TP4 = S4

Page 8: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

TP2 = S1

TP3 = S1 TP4 = S4

TP5 = S1 S2 S4 S5

S6 S7 S8 S9

Page 9: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

TP2 = S1

TP3 = S1 TP4 = S4

TP5 = S1 S2 S4 S5

S6 S7 S8 S9 Total triple pattern-wise selected sources = 12Total SPARQL ASK queries : 9*5 = 45

Page 10: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

TP2 = S1

TP3 = S1 TP4 = S4

TP5 = S1 S2 S4 S5

S6 S7 S8 S9 Total triple pattern-wise selected sources = 12Total SPARQL ASK queries : 9*5 = 45

Page 11: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Motivation

FedBench (LD3): Return for all US presidents their party membership and news pages about them.

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

dbpedia

RDF

Source Selection Algorithm

Triple pattern-wise source selection

S1TP1 =

TP3 = S1

Optimal triple pattern-wise selected sources 5

KEGG

RDF

ChEBI

RDF

NYT

RDF

SWDF

RDF

LMDB

RDF

Jamendo

RDF

Geo Names

RDF

DrugBank

RDF

S1 S2 S3 S4 S5 S6 S7 S8 S9

//TP1

//TP3//TP4

//TP5

//TP2

TP2 = S1

TP4 = S4

TP5 = S1 S2 S4 S5

S6 S7 S8 S9

Page 12: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Problem Statement

• An overestimation of triple pattern-wise source selection can be expensive– Resources are wasted– Query runtime is increased– Extra traffic is generated

• How do we perform join-aware triple pattern wise source selection in time efficient way?

Page 13: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Key Concept

• Makes use of the URI’s authorities

http://dbpedia.org/ontology/partyScheme Authority Path

For URI details: http://tools.ietf.org/html/rfc3986

Page 14: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: SPARQL Query as HypergraphSELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

Page 15: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: SPARQL Query as HypergraphSELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

Page 16: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: SPARQL Query as HypergraphSELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

dbpedia:party ?party

Page 17: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: SPARQL Query as HypergraphSELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

dbpedia:party ?party

?x

nyt:topicPage

?page

Page 18: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: SPARQL Query as HypergraphSELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

dbpedia:party ?party

?x

nyt:topicPage

?page

owl:SameAs

Page 19: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: SPARQL Query as HypergraphSELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

Star simple hybrid Tail of hyperedge

Page 20: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Data Summaries[] a ds:Service ; ds:endpointUrl <http://dbpedia.org/sparql> ; ds:capability [ ds:predicate dbpedia:party ; ds:sbjAuthority <http://dbpedia.org/> ; ds:objAuthority <http://dbpedia.org/> ; ] ; ds:capability [ ds:predicate rdf:type ; ds:sbjAuthority <http://dbpedia.org/> ; ds:objAuthority owl:Thing, dbpedia:President; #we store all distinct

classes ] ; ds:capability [ ds:predicate dbpedia:postalCode ; ds:sbjAuthority <http://dbpedia.org/> ; #No objAuthority as the object value for dbpedia:postalCode is string ] ;

Page 21: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Triple Pattern-wise Source Selection

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

dbpedia

dbpedia

dbpedia

NYT

dbpedia KEGG NYT SWDF LMDB Geo DrgBnk Jamendo

Page 22: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Triple Pattern-wise Source Pruning

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

dbpedia

dbpedia

dbpedia

Sbj. auth.

Sbj. auth.

Sbj. auth.

NYTSbj. auth.

dbpedia KEGG NYT SWDFDrgBnk LMDB Geo Jamendo

Obj.auth.

dbpedia

Sbj. auth.

KEGG

Sbj. auth.

NYT

Sbj. auth.

SWDF

Sbj. auth.

LMDB

Sbj. auth.

Geo

Sbj. auth.

DrgBnkSbj. auth.

Jamendo

Sbj. auth.

Page 23: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Triple Pattern-wise Source Pruning

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

dbpedia

dbpedia

dbpedia

Sbj. auth.

Sbj. auth.

Sbj. auth.

NYTSbj. auth.

dbpedia

Sbj. auth.

KEGG

Sbj. auth.

NYT

Sbj. auth.

SWDF

Sbj. auth.

LMDB

Sbj. auth.

Geo

Sbj. auth.

DrgBnkSbj. auth.

Jamendo

Sbj. auth.

dbpedia KEGG NYT SWDFDrgBnk LMDB Geo Jamendo

Obj.auth.

Page 24: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Triple Pattern-wise Source Pruning

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

dbpedia

dbpedia

dbpedia

Sbj. auth.

Sbj. auth.

Sbj. auth.

NYT

dbpedia KEGG NYT SWDFDrgBnk LMDB Geo Jamendo

Obj.auth.

NYT

Page 25: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Triple Pattern-wise Source Pruning

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

dbpedia

dbpedia

dbpedia

Sbj. auth.

Sbj. auth.

Sbj. auth.

NYT

NYTObj. auth.

NYT

Page 26: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Triple Pattern-wise Source Pruning

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

dbpedia

dbpedia

dbpedia

Sbj. auth.

Sbj. auth.

Sbj. auth.

NYT

NYTObj. auth.

NYT

Page 27: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

HiBISCuS: Triple Pattern-wise Source Pruning

SELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}

?president

rdf:type dbpedia:President

dbpedia:United_S

tates

dbpedia:nationality

?x

owl:SameAs

dbpedia:party ?party

nyt:topicPage

?page

dbpedia

dbpedia

dbpedia

NYT

NYT

Total triple pattern-wise selected sources = 5Total SPARQL ASK queries : 0

Page 28: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Experimental Setup

• Benchmark– FedBench– Real-world datasets collection – Real queries showing typical request– Used all of the 25 queries

• HiBISCuS Extensions– FedX 2.0– DARQ– SPLENDID

• We also included ANAPSID (version of 12/2013) without HiBISCus extension

Page 29: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Experimental Setup

• Metrics– Index generation time– Index size– Total triple pattern-wise sources selected– Total number of SPARQL ASK requests used– Source selection time– Query execution time

Page 30: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Index Generation Time and Compression Ratio

FedX SPLENDIDLHD DARQ ANAPSID ADERIS Qtree HiBISCusIndex Generation Time (min) NA 75 75 102 6 6 - 36Compression Ratio NA 99.998 99.998 99.997 99.999 99.999 96.000 99.997

Compression Ratio = 1 - index size/ total datadump size

Page 31: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Efficient Source SelectionFedBench Cross Domain Queries

System #TP #AR SST(ms)FedX(cold) 78 252 317.8FedX(warm) 78 0 7.3SPLENDID 78 99 320.8DARQ 84 0 7.2ANAPSID 36 43 186HiBISCuS(cold) 35 27 63HiBISCuS(warm) 35 0 30.4

FedBench Life Sciences QueriesSystem #TP #AR SST(ms)FedX(cold) 56 297 375.5FedX(warm) 56 0 8.2SPLENDID 56 90 307.2DARQ 77 0 7.5ANAPSID 44 63 477.4HiBISCuS(cold) 41 18 31.8HiBISCuS(warm) 41 0 23.1

FedBench Linked Data QueriesSystem #TP #AR SST(ms)FedX(cold) 97 369 307.8FedX(warm) 97 0 8SPLENDID 97 126 279DARQ 113 0 7.7ANAPSID 54 37 803.5HiBISCuS(cold) 47 9 20.6HiBISCuS(warm) 47 0 16

OverallSystem #TP #AR SST(ms)FedX(cold) 231 918 330FedX(warm) 231 0 8SPLENDID 231 315 299DARQ 274 0 7.5ANAPSID 134 143 554HiBISCuS(cold) 123 54 35.6HiBISCuS(warm) 123 0 22

Page 32: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Efficient Source SelectionFedBench Cross Domain Queries

System #TP #AR SST(ms)FedX(cold) 78 252 317.8FedX(warm) 78 0 7.3SPLENDID 78 99 320.8DARQ 84 0 7.2ANAPSID 36 43 186HiBISCuS(cold) 35 27 63HiBISCuS(warm) 35 0 30.4

FedBench Life Sciences QueriesSystem #TP #AR SST(ms)FedX(cold) 56 297 375.5FedX(warm) 56 0 8.2SPLENDID 56 90 307.2DARQ 77 0 7.5ANAPSID 44 63 477.4HiBISCuS(cold) 41 18 31.8HiBISCuS(warm) 41 0 23.1

FedBench Linked Data QueriesSystem #TP #AR SST(ms)FedX(cold) 97 369 307.8FedX(warm) 97 0 8SPLENDID 97 126 279DARQ 113 0 7.7ANAPSID 54 37 803.5HiBISCuS(cold) 47 9 20.6HiBISCuS(warm) 47 0 16

OverallSystem #TP #AR SST(ms)FedX(cold) 231 918 330FedX(warm) 231 0 8SPLENDID 231 315 299DARQ 274 0 7.5ANAPSID 134 143 554HiBISCuS(cold) 123 54 35.6HiBISCuS(warm) 123 0 22

Page 33: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

FedX Extension with HiBISCuS

Improvement in 20/25 queries with net performance improvement 24.61%

CD1 CD2 CD3 CD4 CD5 CD6 CD7 LS1 LS2 LS3 LS4 LS5 LS6 LS7 LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10 LD11 Avg.0

50

100

150

200

250

300

350

400

FedX (warm) FedX+HiBISCus

Que

ry e

xecu

tion

time

(ms)

Page 34: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

SPLENDID Extension with HiBISCuS

Improvement in 25/25 queries with net performance improvement 82.72%

CD1 CD2 CD3 CD4 CD5 CD6 CD7 LS1 LS2 LS3 LS4 LS5 LS6 LS7 LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10 LD11 Avg.0

200

400

600

800

1000

1200

SPLENDID SPLENDID+HiBISCus

Que

ry e

xecu

tion

time

(ms)

Page 35: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

DARQ Extension with HiBISCuS

Improvement in 21/21 queries with net performance improvement 92.22%

CD1 CD2 CD3 CD4 CD5 CD6 CD7 LS1 LS2 LS3 LS4 LS5 LS6 LS7 LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10 LD11 Avg1

10

100

1000

10000

100000

1000000

10000000

DARQ DARQ+HiBISCus

Que

ry e

xecu

tion

time

(ms)

log

scal

e

Not

sup

port

edN

ot s

uppo

rted

Runti

me

erro

r

Runti

me

erro

r

Runti

me

erro

rRu

ntim

e er

ror

Tim

e ou

tTi

me

out

Not

sup

port

edN

ot s

uppo

rted

Tim

e ou

t

Tim

e ou

t

Page 36: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

SPLENDID+HiBISCuS vs ANAPSID

Improvement in 24/24 queries with net performance improvement 98%

CD1 CD2 CD3 CD4 CD5 CD6 CD7 LS1 LS2 LS3 LS4 LS5 LS6 LS7 LD1 LD2 LD3 LD4 LD5 LD6 LD7 LD8 LD9 LD10 LD11 Avg.1

10

100

1000

10000

100000 ANAPSID SPLENDID+HiBISCus

Que

ry e

xecu

tion

time

(ms)

log

scal

e

Zero

Res

ults

Page 37: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Conclusion and Future Work

• An overestimation of triple pattern-wise source selection can be expensive

• Join-aware triple pattern-wise source selection is more efficient than simple triple pattern-wise source selection

• Performance improvements– FedX : 24.61 %– SPLENDID: 82.72 %– DARQ: 92.22 %– SPLENDID+HiBISCus is 98% faster than ANAPSID

• BigRDFBench is on the roadmap

Page 38: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Thanks!

homepage and source code: https://code.google.com/p/hibiscusfederation/

{saleem,ngonga}@informatik.uni-leipzig.de AKSW, University of Leipzig, Germany

Page 39: HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation

Source Selection

• Triple pattern-wise source selection– Ensures 100% recall– Can over-estimate capable sources– Can be expensive, e.g., total number of SPARQL ASK requests used– Performed by FedX, SPLENDID, LHD, DARQ, ADERIS etc.

• Join-aware triple-pattern wise source selection– Ensures 100% recall– May selects optimal/close to optimal capable sources– Can be expensive, e.g., total number of SPARQL ASK requests used– Can significantly reduce the query execution time– Performed by ANAPSID