![Page 1: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/1.jpg)
Federated SPARQL Query Processing Over the Web of Data
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo
Agile Knowledge Engineering and Semantic Web (AKSW), University of Leipzig,
Germany, 25/11/2014
![Page 2: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/2.jpg)
Agenda
• SPARQL Query Federation Approaches
• SPARQL Query Federation Optimization
– Query Rewriting
– Source Selection
– Data Integration Options
– Join Order Selection
– Join Order Optimization
– Join Implementations
• Performance Metrics and Discussion
![Page 3: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/3.jpg)
SPARQL Query Federation Approaches
• SPARQL Endpoint Federation (SEF)
• Linked Data Federation (LDF)
• Distributed Hash Tables (DHTs)
• Hybrid of SEF+LDF
![Page 4: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/4.jpg)
SPARQL Endpoint Federation Approaches
• Most commonly used approaches
• Make use of SPARQL endpoint URLs
• Fast query execution
• RDF data needs to be exposed via SPARQL endpoints
• E.g., HiBISCuS, FedX, SPLENDID, ANAPSID, LHD, etc.
![Page 5: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/5.jpg)
Linked Data Federation Approaches
• Data need not be exposed via SPARQL endpoints
• Uses URI lookups at runtime
• Data should follow Linked Data principles
• Slower than SPARQL endpoint federation approaches
• E.g., LDQPS, SIHJoin, WoDQA, etc.
![Page 6: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/6.jpg)
Query federation on top of Distributed Hash Tables
• Uses DHT indexing to federate SPARQL queries
• Space efficient
• Cannot deal with the whole of LOD
• E.g., ATLAS
![Page 7: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/7.jpg)
Hybrid of SEF+LDF
• Federation over SPARQL endpoints and Linked Data
• Can potentially deal with the whole of LOD
• E.g., ADERIS-Hybrid
![Page 8: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/8.jpg)
SPARQL Endpoint Federation
• Parsing/Rewriting: rewrite the query and extract the individual triple patterns
• Source Selection: identify the capable sources for each individual triple pattern
• Federator Optimizer: generate an optimized sub-query execution plan
• Integrator: integrate the sub-query results
• The sub-queries are executed against the SPARQL endpoints (S1-S4), each of which exposes an RDF dataset
![Page 9: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/9.jpg)
SPARQL Query Rewriting
![Page 10: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/10.jpg)
SPARQL Query Rewriting
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality ?nationality.
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
FILTER (?nationality = dbpedia:United_States)
}
Rewritten query (the FILTER is replaced by binding the object of the nationality triple pattern directly):
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President .
?president dbpedia:nationality dbpedia:United_States .
?president dbpedia:party ?party .
?x nyt:topicPage ?page .
?x owl:sameAs ?president .
}
Try to simplify/avoid SPARQL FILTER and REGEX expressions
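The rewriting step above can be sketched in a few lines of Python. This is a minimal illustration, not the rewriting logic of any particular engine: triple patterns are modelled as plain tuples, and the helper `inline_equality_filter` is a hypothetical name. It assumes the filtered variable is not projected in the SELECT clause and is compared against an IRI, which holds for `?nationality` in LD3.

```python
# Triple patterns are (subject, predicate, object) tuples; variables start with "?".
def inline_equality_filter(patterns, variable, constant):
    """Substitute `constant` for `variable` in every triple pattern.

    Mirrors rewriting FILTER(?v = <iri>) into a bound term. Only safe under the
    assumptions stated above (variable not projected, IRI comparison).
    """
    return [
        tuple(constant if term == variable else term for term in pattern)
        for pattern in patterns
    ]

# LD3 before rewriting: the nationality is a variable constrained by a FILTER.
original = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "?nationality"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

rewritten = inline_equality_filter(original, "?nationality", "dbpedia:United_States")
for pattern in rewritten:
    print(" ".join(pattern), ".")
```

After the substitution the second triple pattern has a bound object, so source selection and the endpoint itself can use it directly instead of evaluating a filter over all nationalities.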
![Page 11: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/11.jpg)
Source Selection
![Page 16: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/16.jpg)
Source Selection
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
[Figure: a source selection algorithm maps each triple pattern to the RDF data sources S1-S9: DBpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank]
Triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total triple pattern-wise sources selected = 1+1+1+1+8 = 12
![Page 17: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/17.jpg)
Types of Source Selection
• Index-free
– Uses SPARQL ASK queries
– No index maintenance required
– Potentially ensures result set completeness
– SPARQL ASK queries can be expensive
– Can make use of a cache to store recent SPARQL ASK query results
– E.g., FedX
• Index-only
– Only makes use of index/data summaries
– Less efficient but fast source selection
– Result set completeness is not ensured
– E.g., DARQ, LHD
• Hybrid
– Makes use of index + SPARQL ASK
– Most efficient
– Result set completeness is not ensured
– Can make use of a cache to store recent SPARQL ASK query results
– E.g., HiBISCuS, ANAPSID, SPLENDID
![Page 18: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/18.jpg)
Index-free Source Selection
Input: SPARQL query Q, set of all data sources D
Output: Triple pattern to relevant data sources map M
for each triple pattern ti in SPARQL query Q
    Ri = {}; // set of relevant data sources for triple pattern ti
    for each data source dj in D
        if SPARQL ASK(dj, ti) = true
            Ri = Ri U {dj};
        end if
    end for
    M = M U {Ri};
end for
return M
What is the total number of SPARQL ASK requests used?
total number of triple patterns * total number of data sources
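The pseudocode above can be turned into a small runnable sketch. Everything about the data is an assumption made for illustration: `toy_sources` builds hypothetical miniature stand-ins for the nine endpoints, and `ask` is a local pattern matcher standing in for a real SPARQL ASK request. Only the control flow follows the slide.

```python
# Variables start with "?"; everything else is treated as a bound term.
LD3 = [
    ("?president", "rdf:type", "dbpedia:President"),               # TP1
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),  # TP2
    ("?president", "dbpedia:party", "?party"),                     # TP3
    ("?x", "nyt:topicPage", "?page"),                              # TP4
    ("?x", "owl:sameAs", "?president"),                            # TP5
]

def toy_sources():
    """Hypothetical toy data for S1-S9 (not the real datasets)."""
    sources = {
        "S1": [("dbr:p1", "rdf:type", "dbpedia:President"),
               ("dbr:p1", "dbpedia:nationality", "dbpedia:United_States"),
               ("dbr:p1", "dbpedia:party", "dbr:party1"),
               ("dbr:p1", "owl:sameAs", "ex:p1")],
        "S3": [("chebi:c1", "rdf:type", "chebi:Compound")],
        "S4": [("nyt:n1", "rdf:type", "skos:Concept"),
               ("nyt:n1", "nyt:topicPage", "nytpage:n1"),
               ("nyt:n1", "owl:sameAs", "dbr:p1")],
    }
    for name in ("S2", "S5", "S6", "S7", "S8", "S9"):
        sources[name] = [(f"{name}:a", "rdf:type", f"{name}:Thing"),
                         (f"{name}:a", "owl:sameAs", f"{name}:b")]
    return sources

def ask(triples, pattern):
    """Stand-in for SPARQL ASK: does any triple match the pattern?"""
    return any(
        all(p.startswith("?") or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )

def index_free_selection(query, sources):
    """Send one ASK per (triple pattern, data source) pair."""
    selection, ask_count = {}, 0
    for pattern in query:
        relevant = []
        for name in sorted(sources):
            ask_count += 1
            if ask(sources[name], pattern):
                relevant.append(name)
        selection[pattern] = relevant
    return selection, ask_count

selection, ask_count = index_free_selection(LD3, toy_sources())
total_selected = sum(len(names) for names in selection.values())
print(ask_count, total_selected)
```

On this toy data the sketch issues 5 * 9 = 45 ASK requests and selects 12 triple pattern-wise sources, the same counts the LD3 walkthrough reports.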
![Page 23: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/23.jpg)
Index-free Source Selection
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
[Figure: a source selection algorithm maps each triple pattern to the RDF data sources S1-S9: DBpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank]
Triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total number of SPARQL ASK requests used = 45
Total triple pattern-wise sources selected = 12
![Page 24: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/24.jpg)
Index-only Source Selection (LHD)
Input: SPARQL query Q, set of all data sources D, data source index I storing all distinct predicates for all data sources in D
Output: Triple pattern to relevant data sources map M
for each triple pattern ti in SPARQL query Q
    Ri = {}; // set of relevant data sources for triple pattern ti
    p = Pred(ti); // predicate of ti
    if (bound(p))
        Ri = Lookup(I, p); // index lookup for predicate of ti
    else
        Ri = D; // all data sources are relevant
    end if
    M = M U {Ri};
end for
return M
Why is it the less efficient approach (i.e., why does it greatly overestimate the relevant data sources)?
• Source selection is based only on the predicates of the triple patterns
• All data sources are simply selected for triple patterns with unbound predicates
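A runnable sketch of the index-only approach follows, under the same kind of assumptions as any toy example: `toy_sources` is hypothetical miniature data standing in for S1-S9, and `build_predicate_index` is an illustrative helper rather than the index format of LHD or DARQ. The point is that only predicates are consulted and no request is sent to any endpoint.

```python
LD3 = [
    ("?president", "rdf:type", "dbpedia:President"),               # TP1
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),  # TP2
    ("?president", "dbpedia:party", "?party"),                     # TP3
    ("?x", "nyt:topicPage", "?page"),                              # TP4
    ("?x", "owl:sameAs", "?president"),                            # TP5
]

def toy_sources():
    """Hypothetical toy data for S1-S9 (not the real datasets)."""
    sources = {
        "S1": [("dbr:p1", "rdf:type", "dbpedia:President"),
               ("dbr:p1", "dbpedia:nationality", "dbpedia:United_States"),
               ("dbr:p1", "dbpedia:party", "dbr:party1"),
               ("dbr:p1", "owl:sameAs", "ex:p1")],
        "S3": [("chebi:c1", "rdf:type", "chebi:Compound")],
        "S4": [("nyt:n1", "rdf:type", "skos:Concept"),
               ("nyt:n1", "nyt:topicPage", "nytpage:n1"),
               ("nyt:n1", "owl:sameAs", "dbr:p1")],
    }
    for name in ("S2", "S5", "S6", "S7", "S8", "S9"):
        sources[name] = [(f"{name}:a", "rdf:type", f"{name}:Thing"),
                         (f"{name}:a", "owl:sameAs", f"{name}:b")]
    return sources

def build_predicate_index(sources):
    """Map each distinct predicate to the set of sources that use it."""
    index = {}
    for name, triples in sources.items():
        for _, predicate, _ in triples:
            index.setdefault(predicate, set()).add(name)
    return index

def index_only_selection(query, sources):
    """Select sources by predicate lookup only; no ASK requests are sent."""
    index = build_predicate_index(sources)
    selection = {}
    for pattern in query:
        predicate = pattern[1]
        if predicate.startswith("?"):
            selection[pattern] = sorted(sources)  # unbound predicate: all sources
        else:
            selection[pattern] = sorted(index.get(predicate, set()))
    return selection

selection = index_only_selection(LD3, toy_sources())
total_selected = sum(len(names) for names in selection.values())
print(total_selected)
```

Because every toy source contains some `rdf:type` triple, TP1 is mapped to all nine sources even though only S1 holds `dbpedia:President` instances. That is the overestimation described above, and it yields 9+1+1+1+8 = 20 selected sources here.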
![Page 29: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/29.jpg)
Index-only Source Selection
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
[Figure: a source selection algorithm maps each triple pattern to the RDF data sources S1-S9: DBpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank]
Triple pattern-wise source selection
TP1 = S1-S9
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total number of SPARQL ASK requests used = 0
Total triple pattern-wise sources selected = 20
![Page 30: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/30.jpg)
Hybrid Source Selection
Input: SPARQL query Q, set of all data sources D, data source index I storing all distinct predicates for all data sources in D
Output: Triple pattern to relevant data sources map M
for each triple pattern ti in SPARQL query Q
    Ri = {}; // set of relevant data sources for triple pattern ti
    s = Subj(ti), p = Pred(ti), o = Obj(ti); // subject, predicate, and object of ti
    if (!bound(p) || bound(s) || bound(o))
        for each data source dj in D
            if SPARQL ASK(dj, ti) = true
                Ri = Ri U {dj};
            end if
        end for
    else
        Ri = Lookup(I, p); // index lookup for predicate of ti
    end if
    M = M U {Ri};
end for
return M
What is the total number of SPARQL ASK requests used?
(number of triple patterns with a bound subject, a bound object, or an unbound predicate) * (total number of data sources)
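The hybrid strategy can be sketched by combining a predicate index with ASK requests. As with any toy example, the data is assumed: `toy_sources` is hypothetical, `ask` is a local matcher standing in for a real SPARQL ASK request, and the helper names are illustrative rather than taken from any of the engines listed.

```python
LD3 = [
    ("?president", "rdf:type", "dbpedia:President"),               # TP1
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),  # TP2
    ("?president", "dbpedia:party", "?party"),                     # TP3
    ("?x", "nyt:topicPage", "?page"),                              # TP4
    ("?x", "owl:sameAs", "?president"),                            # TP5
]

def toy_sources():
    """Hypothetical toy data for S1-S9 (not the real datasets)."""
    sources = {
        "S1": [("dbr:p1", "rdf:type", "dbpedia:President"),
               ("dbr:p1", "dbpedia:nationality", "dbpedia:United_States"),
               ("dbr:p1", "dbpedia:party", "dbr:party1"),
               ("dbr:p1", "owl:sameAs", "ex:p1")],
        "S3": [("chebi:c1", "rdf:type", "chebi:Compound")],
        "S4": [("nyt:n1", "rdf:type", "skos:Concept"),
               ("nyt:n1", "nyt:topicPage", "nytpage:n1"),
               ("nyt:n1", "owl:sameAs", "dbr:p1")],
    }
    for name in ("S2", "S5", "S6", "S7", "S8", "S9"):
        sources[name] = [(f"{name}:a", "rdf:type", f"{name}:Thing"),
                         (f"{name}:a", "owl:sameAs", f"{name}:b")]
    return sources

def bound(term):
    return not term.startswith("?")

def ask(triples, pattern):
    """Stand-in for SPARQL ASK: does any triple match the pattern?"""
    return any(
        all(not bound(p) or p == t for p, t in zip(pattern, triple))
        for triple in triples
    )

def hybrid_selection(query, sources):
    """Use ASK when subject/object is bound or predicate is unbound, else the index."""
    index = {}
    for name, triples in sources.items():
        for _, predicate, _ in triples:
            index.setdefault(predicate, set()).add(name)
    selection, ask_count = {}, 0
    for pattern in query:
        s, p, o = pattern
        if not bound(p) or bound(s) or bound(o):
            relevant = []
            for name in sorted(sources):
                ask_count += 1
                if ask(sources[name], pattern):
                    relevant.append(name)
            selection[pattern] = relevant
        else:
            selection[pattern] = sorted(index.get(p, set()))
    return selection, ask_count

selection, ask_count = hybrid_selection(LD3, toy_sources())
total_selected = sum(len(names) for names in selection.values())
print(ask_count, total_selected)
```

Only TP1 and TP2 have a bound object, so ASK requests are sent for those two patterns alone (2 * 9 = 18), while TP3-TP5 are resolved from the index. The selection is as precise as the index-free result (12 sources) at less than half the request cost.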
![Page 35: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/35.jpg)
Hybrid Source Selection
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
[Figure: a source selection algorithm maps each triple pattern to the RDF data sources S1-S9: DBpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank]
Triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4-S9
Total number of SPARQL ASK requests used = 18
Total triple pattern-wise sources selected = 12
Does anything still need to be improved?
![Page 36: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/36.jpg)
Source Selection
• Triple pattern-wise source selection
– Ensures 100% recall
– Can over-estimate capable sources
– Can be expensive, e.g., in the total number of SPARQL ASK requests used
– Performed by FedX, SPLENDID, LHD, DARQ, ADERIS, etc.
• Join-aware triple pattern-wise source selection
– Ensures 100% recall
– May select optimal or close-to-optimal capable sources
– Can be expensive, e.g., in the total number of SPARQL ASK requests used
– Can significantly reduce the query execution time
– Performed by ANAPSID, HiBISCuS
![Page 37: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/37.jpg)
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
• Hybrid source selection
• Join-aware triple-pattern wise source selection
• Makes use of the hypergraph representation of SPARQL queries
• Makes use of the URI authorities
• Makes use of the cache to store recent SPARQL ASK queries results
![Page 43: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/43.jpg)
Motivation
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
[Figure: a source selection algorithm maps each triple pattern to the RDF data sources S1-S9: DBpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank]
Triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4, S5, S6, S7, S8, S9
Total triple pattern-wise selected sources = 12
Total SPARQL ASK queries: 9*5 = 45
![Page 45: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/45.jpg)
Motivation
FedBench (LD3): Return for all US presidents their party membership and news pages about them.
SELECT ?president ?party ?page
WHERE {
?president rdf:type dbpedia:President . # TP1
?president dbpedia:nationality dbpedia:United_States . # TP2
?president dbpedia:party ?party . # TP3
?x nyt:topicPage ?page . # TP4
?x owl:sameAs ?president . # TP5
}
[Figure: a source selection algorithm maps each triple pattern to the RDF data sources S1-S9: DBpedia, KEGG, ChEBI, NYT, SWDF, LMDB, Jamendo, GeoNames, DrugBank]
Triple pattern-wise source selection
TP1 = S1
TP2 = S1
TP3 = S1
TP4 = S4
TP5 = S1, S2, S4, S5, S6, S7, S8, S9
Optimal triple pattern-wise selected sources = 5 (one per triple pattern)
![Page 46: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/46.jpg)
Problem Statement
• An overestimation of triple pattern-wise source selection can be expensive
– Resources are wasted
– Query runtime is increased
– Extra traffic is generated
• How do we perform join-aware triple pattern-wise source selection in a time-efficient way?
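To make the idea of join-aware pruning concrete, here is a deliberately simplified sketch. It is not the HiBISCuS algorithm: it handles a single subject-subject join (TP4 and TP5 share `?x`) and assumes a hypothetical summary, `subject_authorities`, that records which URI authorities occur as subjects of a predicate in each source. All authority values are made up for illustration.

```python
# Triple pattern-wise selection for the two patterns that join on ?x.
selected = {
    "TP4": {"S4"},
    "TP5": {"S1", "S2", "S4", "S5", "S6", "S7", "S8", "S9"},
}
predicates = {"TP4": "nyt:topicPage", "TP5": "owl:sameAs"}

# Hypothetical per-source summaries: authorities of the subjects of a predicate.
subject_authorities = {
    ("S4", "nyt:topicPage"): {"data.nytimes.com"},
    ("S1", "owl:sameAs"): {"dbpedia.org"},
    ("S4", "owl:sameAs"): {"data.nytimes.com"},
}
for name in ("S2", "S5", "S6", "S7", "S8", "S9"):
    subject_authorities[(name, "owl:sameAs")] = {f"{name.lower()}.example.org"}

def prune_subject_join(tp_a, tp_b):
    """Keep only sources whose subject authorities can join with the other pattern."""
    def authorities(tp):
        return set().union(
            *(subject_authorities[(s, predicates[tp])] for s in selected[tp])
        )
    common = authorities(tp_a) & authorities(tp_b)
    return {
        tp: {s for s in selected[tp]
             if subject_authorities[(s, predicates[tp])] & common}
        for tp in (tp_a, tp_b)
    }

pruned = prune_subject_join("TP4", "TP5")
# TP1-TP3 each already have exactly one source (S1) in the walkthrough.
total_after_pruning = 3 + sum(len(sources) for sources in pruned.values())
print(pruned, total_after_pruning)
```

Under these assumptions, a source can contribute to the join on `?x` only if it holds `owl:sameAs` subjects from the same authority as the `nyt:topicPage` subjects, which leaves S4 alone for TP5 and brings the total from 12 down to 5 without sending any further ASK requests.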
![Page 47: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/47.jpg)
HiBISCuS: Key Concept
• Makes use of URI authorities
http://dbpedia.org/ontology/party
Scheme: http, Authority: dbpedia.org, Path: /ontology/party
For URI details: http://tools.ietf.org/html/rfc3986
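The decomposition can be reproduced with Python's standard library. This is only an illustration of the RFC 3986 components; `urlsplit` reports the authority under the attribute name `netloc`.

```python
from urllib.parse import urlsplit

# Split the example URI into its RFC 3986 components.
parts = urlsplit("http://dbpedia.org/ontology/party")

scheme = parts.scheme      # "http"
authority = parts.netloc   # "dbpedia.org"
path = parts.path          # "/ontology/party"

print(scheme, authority, path)
```

Grouping the subjects and objects of a dataset by authority gives a compact summary: many distinct URIs collapse into a handful of authority strings.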
![Page 48: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/48.jpg)
HiBISCuS: SPARQL Query as HypergraphSELECT ?president ?party ?page WHERE {?president rdf:type dbpedia:President .?president dbpedia:nationality dbpedia:United_States .?president dbpedia:party ?party .?x nyt:topicPage ?page .?x owl:sameAs ?president .}
?president
rdf:typedbpedia:President
![Page 49: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/49.jpg)
![Page 50: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/50.jpg)
![Page 51: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/51.jpg)
![Page 52: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/52.jpg)
![Page 53: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/53.jpg)
HiBISCuS: SPARQL Query as Hypergraph
Figure: the complete query hypergraph, with one hyperedge per triple pattern:
– ?president, rdf:type, dbpedia:President
– ?president, dbpedia:nationality, dbpedia:United_States
– ?president, dbpedia:party, ?party
– ?x, nyt:topicPage, ?page
– ?x, owl:sameAs, ?president
Legend: star, simple, hybrid, tail of hyperedge
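The hypergraph view can be sketched in a few lines of Python. This is a minimal illustration, assuming each triple pattern becomes one hyperedge whose tail is the subject and whose heads are the predicate and the object. The slide does not define how nodes are classified as star, simple, or hybrid, so the sketch only reports how often each node appears as a tail or as a head; all function names are made up for this example.

```python
from collections import Counter

# Triple patterns of the example query as (subject, predicate, object).
TRIPLE_PATTERNS = [
    ("?president", "rdf:type", "dbpedia:President"),
    ("?president", "dbpedia:nationality", "dbpedia:United_States"),
    ("?president", "dbpedia:party", "?party"),
    ("?x", "nyt:topicPage", "?page"),
    ("?x", "owl:sameAs", "?president"),
]

def to_hyperedges(patterns):
    """One hyperedge per triple pattern: subject is the tail, predicate and object are heads."""
    return [{"tail": s, "heads": (p, o)} for s, p, o in patterns]

def degrees(edges):
    """Count, per node, how often it occurs as a tail (outgoing) and as a head (incoming)."""
    out_deg, in_deg = Counter(), Counter()
    for edge in edges:
        out_deg[edge["tail"]] += 1
        for head in edge["heads"]:
            in_deg[head] += 1
    return out_deg, in_deg

edges = to_hyperedges(TRIPLE_PATTERNS)
out_deg, in_deg = degrees(edges)
print(out_deg["?president"], in_deg["?president"], out_deg["?x"])
```

In this query ?president is the tail of three hyperedges and a head of one, while ?x is the tail of two.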
![Page 54: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/54.jpg)
HiBISCuS: Data Summaries
[] a ds:Service ;
  ds:endpointUrl <http://dbpedia.org/sparql> ;
  ds:capability [
    ds:predicate dbpedia:party ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    ds:objAuthority <http://dbpedia.org/> ;
  ] ;
  ds:capability [
    ds:predicate rdf:type ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    ds:objAuthority owl:Thing, dbpedia:President ; # we store all distinct classes
  ] ;
  ds:capability [
    ds:predicate dbpedia:postalCode ;
    ds:sbjAuthority <http://dbpedia.org/> ;
    # No objAuthority as the object value for dbpedia:postalCode is a string
  ] ;
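To make the role of such a summary concrete, here is a small Python sketch that mirrors the capabilities above in a dictionary and looks up candidate sources for a triple pattern by its predicate. The dictionary layout and the function `candidate_sources` are assumptions for illustration only; they are not the HiBISCuS index format or API.

```python
# Hypothetical in-memory mirror of the data summary shown above.
SUMMARIES = {
    "http://dbpedia.org/sparql": {
        "dbpedia:party": {"sbj": {"http://dbpedia.org/"}, "obj": {"http://dbpedia.org/"}},
        "rdf:type": {"sbj": {"http://dbpedia.org/"}, "obj": {"owl:Thing", "dbpedia:President"}},
        "dbpedia:postalCode": {"sbj": {"http://dbpedia.org/"}, "obj": set()},
    },
}

def candidate_sources(predicate, summaries, rdf_class=None):
    """Return endpoints whose summary lists the predicate (and, for rdf:type, the class)."""
    selected = []
    for endpoint, capabilities in summaries.items():
        capability = capabilities.get(predicate)
        if capability is None:
            continue
        if rdf_class is not None and rdf_class not in capability["obj"]:
            continue
        selected.append(endpoint)
    return sorted(selected)

print(candidate_sources("dbpedia:party", SUMMARIES))
print(candidate_sources("rdf:type", SUMMARIES, rdf_class="dbpedia:President"))
print(candidate_sources("nyt:topicPage", SUMMARIES))
```

Because the lookup runs against the local index, it needs no SPARQL ASK request to the endpoint.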
![Page 55: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/55.jpg)
HiBISCuS: Triple Pattern-wise Source Selection
Figure: query hypergraph annotated with candidate sources; the source labels shown are dbpedia, KEGG, NYT, SWDF, LMDB, Geo, DrgBnk, Jamendo
![Page 56: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/56.jpg)
HiBISCuS: Triple Pattern-wise Source Pruning
Figure: query hypergraph with the candidate sources dbpedia, KEGG, NYT, SWDF, DrgBnk, LMDB, Geo, Jamendo, annotated with their subject authorities (Sbj. auth.) and object authorities (Obj. auth.)
![Page 57: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/57.jpg)
![Page 58: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/58.jpg)
HiBISCuS: Triple Pattern-wise Source Pruning
Figure: query hypergraph; the candidate sources dbpedia, KEGG, NYT, SWDF, DrgBnk, LMDB, Geo, Jamendo are shown with their object authorities (Obj. auth.)
![Page 59: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/59.jpg)
HiBISCuS: Triple Pattern-wise Source Pruning
Figure: query hypergraph; only the source NYT remains, shown with its object authorities (Obj. auth.)
![Page 60: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/60.jpg)
![Page 61: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/61.jpg)
HiBISCuS: Triple Pattern-wise Source Pruning
Total triple pattern-wise selected sources = 5
Total SPARQL ASK queries: 0
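The pruning idea can be approximated as follows. This is a deliberately simplified sketch, not the HiBISCuS algorithm: it assumes that a source stays relevant for a triple pattern only if the authorities recorded for it overlap the authorities of the pattern it joins with. All authority values below are invented for illustration, and the function name is hypothetical.

```python
def prune_by_authority(candidates, authorities, required):
    """Keep the sources whose recorded authorities overlap the required ones."""
    return [src for src in candidates if authorities.get(src, set()) & required]

# Hypothetical subject authorities of owl:sameAs triples per source (illustrative values only).
SAMEAS_SBJ_AUTH = {
    "dbpedia": {"http://dbpedia.org"},
    "NYT": {"http://data.nytimes.com"},
    "KEGG": {"http://bio2rdf.org"},
}

# Hypothetical subject authorities of nyt:topicPage triples, the join partner on ?x.
TOPICPAGE_SBJ_AUTH = {"http://data.nytimes.com"}

remaining = prune_by_authority(["dbpedia", "KEGG", "NYT"], SAMEAS_SBJ_AUTH, TOPICPAGE_SBJ_AUTH)
print(remaining)
```

The check is a set intersection over locally stored summaries, which is consistent with the slide's count of zero SPARQL ASK queries.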
![Page 62: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/62.jpg)
Data Integration Options
![Page 63: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/63.jpg)
Complete Local Integration
• Triple patterns are individually and completely evaluated against every endpoint
• Triple pattern results are locally integrated using different join techniques, e.g., NLJ, hash join, etc.
• Less efficient if the query contains common predicates such as rdf:type and owl:sameAs
• Retrieves a large number of potentially irrelevant intermediate results
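A local hash join over two fully retrieved result sets might look like the sketch below. The bindings are invented sample data, and `hash_join` is a name chosen for this example rather than code from any of the engines mentioned.

```python
from collections import defaultdict

def hash_join(left, right, var):
    """Join two lists of variable bindings on a shared variable using a hash index."""
    index = defaultdict(list)
    for row in left:
        index[row[var]].append(row)
    # Probe the index with every row of the other input and merge matching bindings.
    return [{**l, **r} for r in right for l in index.get(r[var], [])]

# Invented results of two triple patterns, each fetched completely from its endpoint.
party_rows = [
    {"?president": "dbpedia:A", "?party": "dbpedia:P1"},
    {"?president": "dbpedia:B", "?party": "dbpedia:P2"},
]
sameas_rows = [
    {"?x": "nyt:1", "?president": "dbpedia:A"},
    {"?x": "nyt:2", "?president": "dbpedia:C"},
]

joined = hash_join(party_rows, sameas_rows, "?president")
print(joined)
```

Only one of the two rows on each side finds a partner, which shows why fetching everything first can move many irrelevant intermediate results over the network.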
![Page 64: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/64.jpg)
Iterative Integration
• Evaluate the query iteratively, pattern by pattern
• Start with a single triple pattern
• Substitute mappings from the previous triple pattern in the subsequent evaluation
• Evaluate the query in NLJ fashion
• NLJ can cause many remote requests
• Evaluating in block NLJ fashion reduces the number of remote requests
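The effect of blocking on the request count is simple arithmetic. The sketch below assumes one remote request per block of input bindings; the numbers are arbitrary examples and the function name is made up.

```python
import math

def remote_requests(n_bindings: int, block_size: int = 1) -> int:
    """Number of remote requests when bindings are shipped in blocks of block_size."""
    return math.ceil(n_bindings / block_size)

# Plain NLJ sends one request per intermediate binding.
print(remote_requests(100))
# Block NLJ with an assumed block size of 15 groups the same bindings into fewer requests.
print(remote_requests(100, block_size=15))
```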
![Page 65: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/65.jpg)
Join Order Selection
![Page 66: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/66.jpg)
Join Order Selection
• Left-deep trees
– Joins take place in a left-to-right sequential order
– Result of the join is used as an outer input for the next join
– Used in FedX, DARQ
• Right-deep trees
– Joins take place in a right-to-left sequential order
– Result of the join is used as an inner input for the next join
• Bushy trees
– Joins take place in sub-trees on both the left and right sides
– Used in ANAPSID
• Dynamic programming
– Used in SPLENDID
![Page 67: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/67.jpg)
Join Order Selection Example
Compute Micronutrients using Drugbank and KEGG
SELECT ?drug ?title WHERE {
?drug drugbank:drugCategory drugbank-cat:micronutrient. // TP1
?drug drugbank:casRegistryNumber ?id . // TP2
?keggDrug rdf:type kegg:Drug . // TP3
?keggDrug bio2rdf:xRef ?id . // TP4
?keggDrug dc:title ?title . // TP5
}
Figure: three join trees over TP1 to TP5, each topped by the projection π ?drug, ?title: a left-deep tree, a right-deep tree, and a bushy tree
Goal: Execute smallest cardinality joins first
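One way to pursue this goal for a left-deep plan is a greedy ordering: start with the pattern of lowest estimated cardinality, then repeatedly append the cheapest pattern that shares a variable with what is already joined. The sketch below is an assumption-laden illustration, not the strategy of any specific engine, and the cardinalities are invented.

```python
def left_deep_order(patterns, card):
    """Greedy left-deep join order: cheapest pattern first, then cheapest joinable pattern."""
    remaining = dict(patterns)  # pattern name -> set of variables
    first = min(remaining, key=card.get)
    order, bound = [first], set(remaining.pop(first))
    while remaining:
        # Prefer patterns sharing a variable with the partial plan to avoid cross products.
        joinable = [n for n, vs in remaining.items() if vs & bound] or list(remaining)
        nxt = min(joinable, key=card.get)
        order.append(nxt)
        bound |= remaining.pop(nxt)
    return order

PATTERNS = {
    "TP1": {"?drug"},
    "TP2": {"?drug", "?id"},
    "TP3": {"?keggDrug"},
    "TP4": {"?keggDrug", "?id"},
    "TP5": {"?keggDrug", "?title"},
}
# Invented cardinality estimates for illustration only.
CARD = {"TP1": 50, "TP2": 5000, "TP3": 8000, "TP4": 9000, "TP5": 8500}

order = left_deep_order(PATTERNS, CARD)
print(order)
```

With these made-up estimates, TP4 is scheduled before TP3 because it is the first KEGG pattern that shares a variable (?id) with the Drugbank side.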
![Page 68: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/68.jpg)
Join Order Optimization
![Page 69: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/69.jpg)
Join Order Optimization
• Exclusive Groups
– Group triple patterns with the same relevant data source
– Evaluation in a single (remote) sub-query
– Push join to the data source, i.e., endpoint
• Variable count heuristic
– Iteratively determine the join order based on the free variable count of triple patterns and groups
– Consider "resolved" variable mappings from earlier iterations
• Using selectivities
– Store distinct predicates, avg. subject selectivities, and avg. object selectivities for each predicate in the index
– Use the predicate count, avg. subject selectivities, and avg. object selectivities to estimate the join cardinality
![Page 70: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/70.jpg)
Exclusive Groups
SELECT ?President ?Party ?TopicPage WHERE {
?President rdf:type dbpedia-yago:PresidentsOfTheUnitedStates .
?President dbpedia:party ?Party .
?nytPresident owl:sameAs ?President .
?nytPresident nytimes:topicPage ?TopicPage .
}
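Source Selection:
– Pattern 1 (rdf:type): @ DBpedia
– Pattern 2 (dbpedia:party): @ DBpedia
– Pattern 3 (owl:sameAs): @ DBpedia, NYTimes
– Pattern 4 (nytimes:topicPage): @ NYTimes
Exclusive Group: patterns 1 and 2, for which DBpedia is the only relevant source
Advantage:
Delegate joins to the endpoint by forming exclusive groups (i.e., executing the respective patterns in a single subquery)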
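Forming exclusive groups from a source selection result can be sketched as below. The sketch assumes that a group consists of at least two triple patterns whose only relevant source is the same endpoint; the function name and data layout are made up for this example.

```python
from collections import defaultdict

def exclusive_groups(sources_by_pattern):
    """Group patterns that have exactly one relevant source, keyed by that source."""
    groups = defaultdict(list)
    for pattern, sources in sources_by_pattern.items():
        if len(sources) == 1:
            groups[next(iter(sources))].append(pattern)
    # Keep only real groups; a single pattern offers no join to push down.
    return {src: tps for src, tps in groups.items() if len(tps) > 1}

# Source selection result from the slide.
SELECTED = {
    "TP1": {"DBpedia"},
    "TP2": {"DBpedia"},
    "TP3": {"DBpedia", "NYTimes"},
    "TP4": {"NYTimes"},
}

groups = exclusive_groups(SELECTED)
print(groups)
```

The two DBpedia-only patterns can then be sent as one sub-query, so their join is evaluated at the endpoint rather than locally.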
![Page 71: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/71.jpg)
Exclusive Groups: Join Order Optimization
Compute Micronutrients using Drugbank and KEGG
1. SPARQL Query
SELECT ?drug ?title WHERE {
?drug drugbank:drugCategory drugbank-cat:micronutrient .
?drug drugbank:casRegistryNumber ?id .
?keggDrug rdf:type kegg:Drug .
?keggDrug bio2rdf:xRef ?id .
?keggDrug dc:title ?title .
}
2. Unoptimized Internal Representation
3. Optimized Internal Representation
Figure annotations: 4x Local Join = 4x NLJ; Exclusive Group = Remote Join
![Page 72: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/72.jpg)
Selectivity Based Join Order Optimization
[] a sd:Service ;
  sd:endpointUrl <http://localhost:8890/sparql> ;
  sd:capability [
    sd:predicate diseasome:name ;
    sd:totalTriples 147 ; # total number of triples with predicate value sd:predicate
    sd:avgSbjSel "0.0068" ; # 1 / distinct subjects with predicate value sd:predicate
    sd:avgObjSel "0.0069" ; # 1 / distinct objects with predicate value sd:predicate
  ] ;
  sd:capability [
    sd:predicate diseasome:chromosomalLocation ;
    sd:totalTriples 160 ;
    sd:avgSbjSel "0.0062" ;
    sd:avgObjSel "0.0072" ;
  ] ;
Example: for the four triples
S1 P O1 .
S1 P O2 .
S2 P O1 .
S3 P O2 .
totalTriples = 4, avgSbjSel(p) = 1/3, avgObjSel(p) = 1/2
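The per-predicate statistics of the four-triple example can be recomputed with a short Python sketch. It assumes, as the comments on the slide state, that the average selectivities are the reciprocals of the distinct subject and object counts; the function name is hypothetical, and the predicate is assumed to occur at least once.

```python
from fractions import Fraction

# The four example triples from the slide.
TRIPLES = [("S1", "P", "O1"), ("S1", "P", "O2"), ("S2", "P", "O1"), ("S3", "P", "O2")]

def predicate_stats(triples, predicate):
    """Compute totalTriples, avgSbjSel and avgObjSel for one predicate."""
    matching = [(s, o) for s, p, o in triples if p == predicate]
    subjects = {s for s, _ in matching}
    objects = {o for _, o in matching}
    return {
        "totalTriples": len(matching),
        "avgSbjSel": Fraction(1, len(subjects)),  # 1 / distinct subjects
        "avgObjSel": Fraction(1, len(objects)),   # 1 / distinct objects
    }

stats = predicate_stats(TRIPLES, "P")
print(stats)
```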
![Page 73: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/73.jpg)
Selectivity Based Join Order Optimization
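• Triple pattern cardinality
$p = \mathrm{pred}(tp)$, $T$ = total number of triples having predicate $p$
$$C(tp) = \begin{cases} T & \text{if neither subject nor object is bound} \\ T \times \mathit{avgSbjSel}(p) & \text{if subject is bound} \\ T \times \mathit{avgObjSel}(p) & \text{if object is bound} \end{cases}$$
• Join cardinality
$$C(J(tp_1, tp_2)) = \begin{cases} C(tp_1) \times C(tp_2) \times \mathit{avgPredJoinSel}(tp_1) \times \mathit{avgPredJoinSel}(tp_2) & \text{if p-p join} \\ C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgSbjJoinSel}(tp_2) & \text{if s-s join} \\ C(tp_1) \times C(tp_2) \times \mathit{avgSbjJoinSel}(tp_1) \times \mathit{avgObjJoinSel}(tp_2) & \text{if s-o join} \end{cases}$$
How to calculate avgPredJoinSel, avgSbjJoinSel, and avgObjJoinSel?
DARQ selected 0.5 as the avgJoinSel value for all joins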
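The two estimates translate directly into code. In the sketch below the input numbers are taken from the diseasome summary on the previous slide; plugging 0.5 into each of the two join selectivity factors is one possible reading of the DARQ remark and should be treated as an assumption. The case where both subject and object are bound is not covered on the slide, so the sketch simply applies the subject rule first.

```python
def tp_cardinality(total, avg_sbj_sel, avg_obj_sel, sbj_bound=False, obj_bound=False):
    """Estimated cardinality C(tp) of a triple pattern with predicate p."""
    if sbj_bound:
        return total * avg_sbj_sel
    if obj_bound:
        return total * avg_obj_sel
    return total

def join_cardinality(c1, c2, sel1=0.5, sel2=0.5):
    """Estimated cardinality C(J(tp1, tp2)); sel1 and sel2 are the avg join selectivities."""
    return c1 * c2 * sel1 * sel2

# diseasome:name with a bound subject: 147 triples, avgSbjSel 0.0068.
c_name = tp_cardinality(147, 0.0068, 0.0069, sbj_bound=True)
# Join of two unbound patterns with 147 and 160 triples, using 0.5 for both factors.
c_join = join_cardinality(147, 160)
print(c_name, c_join)
```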
![Page 74: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/74.jpg)
Join Implementations
![Page 75: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/75.jpg)
Join Implementations
• Bound Joins
– Start with a single triple pattern (lowest cardinality)
– Substitute mappings from previous triple pattern in the subsequent evaluation
– Bound Joins in NLJ fashion
  • Execute bound joins in nested loop join fashion
  • Too many remote requests
– Bound Joins in Block NLJ fashion
  • Execute bound joins in block nested loop join fashion
  • Make use of SPARQL UNION construct
  • Remote requests are reduced by the block size
• Other Join techniques
– E.g., Hash Joins
![Page 76: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/76.jpg)
Bound Joins in Block NLJ
SELECT ?President ?Party ?TopicPage WHERE {
?President rdf:type dbpedia:PresidentsOfTheUnitedStates .
?President dbpedia:party ?Party .
?nytPresident owl:sameAs ?President .
?nytPresident nytimes:topicPage ?TopicPage .
}
Assume that the following intermediate results have been computed as input for the last triple pattern
Block Input:
“Barack Obama”
“George W. Bush”
…
Before (NLJ):
SELECT ?TopicPage WHERE { “Barack Obama” nytimes:topicPage ?TopicPage }
SELECT ?TopicPage WHERE { “George W. Bush” nytimes:topicPage ?TopicPage }
…
Now: Evaluation in a single remote request using a SPARQL UNION construct + local post-processing (SPARQL 1.0)
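A block of input bindings can be turned into one UNION query roughly as follows. This is a sketch under assumptions: the output variable is numbered per branch so that local post-processing can tell which input binding a result belongs to, and the example.org IRIs stand in for the president resources, which the slide abbreviates as quoted names.

```python
def bound_join_query(subjects, predicate, var):
    """Build one SPARQL UNION query that evaluates a pattern for a block of bound subjects."""
    names = [f"?{var}_{i}" for i in range(1, len(subjects) + 1)]
    branches = [f"{{ {s} {predicate} {v} }}" for s, v in zip(subjects, names)]
    return f"SELECT {' '.join(names)} WHERE {{ {' UNION '.join(branches)} }}"

# Hypothetical IRIs for the two bindings in the block.
block = ["<http://example.org/Barack_Obama>", "<http://example.org/George_W_Bush>"]
query = bound_join_query(block, "nytimes:topicPage", "TopicPage")
print(query)
```

A result row that binds ?TopicPage_2, for instance, can be mapped back to the second input binding without a further request.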
![Page 77: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/77.jpg)
Parallelization and Pipelining
• Execute sub-queries concurrently on different data sources
• Multithreaded worker pool to execute the joins and UNION operators in parallel
• Pipelining approach for intermediate results
• See FedX and LHD implementations
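Concurrent sub-query execution with a worker pool can be sketched with Python's standard library. The `fake_execute` function stands in for a real HTTP request to a SPARQL endpoint and is purely hypothetical; this is not how FedX or LHD are implemented, only an illustration of the idea.

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(subqueries, execute, max_workers=4):
    """Submit one sub-query per endpoint to a thread pool and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(execute, name, q) for name, q in subqueries.items()}
        return {name: future.result() for name, future in futures.items()}

def fake_execute(endpoint, query):
    """Placeholder for a remote request; returns a description instead of real bindings."""
    return f"{endpoint} answered a query of {len(query)} characters"

SUBQUERIES = {
    "DBpedia": "SELECT ?president ?party WHERE { ?president dbpedia:party ?party }",
    "NYTimes": "SELECT ?x ?page WHERE { ?x nytimes:topicPage ?page }",
}

results = run_concurrently(SUBQUERIES, fake_execute)
print(results)
```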
![Page 78: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/78.jpg)
Performance Metrics and Discussion
![Page 79: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/79.jpg)
Performance Metrics
• Efficient source selection in terms of
– Total triple pattern-wise sources selected
– Total number of SPARQL ASK requests used during source selection
– Source selection time
• Query execution time
• Results completeness and correctness
• Number of remote requests during query execution
• Index compression ratio (1- index size/datadump size)
• See https://code.google.com/p/bigrdfbench/
![Page 80: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/80.jpg)
Evaluation Setup
• Local dedicated network
• Local SPARQL endpoints (One per machine)
• Run each query 10 times and present the average results
• Statistically analyze the results, e.g., with the Wilcoxon signed-rank test or Student's t-test
![Page 81: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/81.jpg)
SPARQL Query Federation Engines
• FedX
• SPLENDID
• HiBISCuS+FedX
• HiBISCuS+SPLENDID
• ANAPSID
• LHD
• DARQ
![Page 82: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/82.jpg)
AKSW SPARQL Federation Publications
• HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation by Muhammad Saleem and Axel-Cyrille Ngonga Ngomo, in (ESWC, 2014)
• DAW: Duplicate-AWare Federated Query Processing over the Web of Data by Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena Deus, and Manfred Hauswirth, in (ISWC, 2013)
• TopFed: TCGA Tailored Federated Query Processing and Linking to LOD by Muhammad Saleem, Shanmukha Sampath, Axel-Cyrille Ngonga Ngomo, Aftab Iqbal, Jonas Almeida, and Helena F. Deus, in (Journal of Biomedical Semantics, 2014)
• A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems by Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, and Axel-Cyrille Ngonga Ngomo, in (Semantic Web Journal, 2014)
• BigRDFBench: A Billion Triples Benchmark for SPARQL Query Federation by Muhammad Saleem, Ali Hasnain, and Axel-Cyrille Ngonga Ngomo, in (submitted to WWW, 2015)
• SAFE: Policy-Aware SPARQL Query Federation Over RDF Data Cubes by Yasar Khan, Muhammad Saleem, Aftab Iqbal, Muntazir Mehdi, Aidan Hogan, Panagiotis Hasapis, Axel-Cyrille Ngonga Ngomo, Stefan Decker, and Ratnesh Sahay, in (SWAT4LS, 2014)
• QFed: Query Set For Federated SPARQL Query Benchmark by Nur Aini Rakhmawati, Sarasi Lalithsena, Muhammad Saleem, and Stefan Decker, in (iiWAS, 2014)
![Page 83: Federated SPARQL query processing over the Web of Data](https://reader034.vdocuments.us/reader034/viewer/2022042602/55a021381a28ab23788b467c/html5/thumbnails/83.jpg)
Thanks
{saleem,ngonga}@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany