Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution

Post on 11-May-2015


Zero-Knowledge Query Planning

for an Iterator Implementation of Link Traversal Based Query Execution

Olaf Hartig

http://olafhartig.de/foaf.rdf#olaf

@olafhartig

Database and Information Systems Research Group

Humboldt-Universität zu Berlin

Olaf Hartig - Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversal Based Query Execution 2

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query

Iterator Based Execution Plan

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

Iterator Based Execution Plan

query-local dataset

Next?

Next?

Next?

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

Iterator Based Execution Plan

Next?

Next?

Next?

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

query-local dataset

:

<http://.../alice> ex:affiliated_with <http://.../orgaX>

:

Descriptor object

Iterator Based Execution Plan

Next?

Next?

Next?

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

query-local dataset

:

<http://.../alice> ex:affiliated_with <http://.../orgaX>

:

Iterator Based Execution Plan

Next?

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

:

<http://.../alice> ex:affiliated_with <http://.../orgaX>

:

query-local dataset

{ ?p = <http://.../alice> }

Next?

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

Iterator Based Execution Plan

Next?

Next?

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

query-local dataset

{ ?p = <http://.../alice> }

Iterator Based Execution Plan

Next?

Next?

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

query-local dataset

{ ?p = <http://.../alice> }

:

<http://.../alice> ex:interested_in <http://.../b1>

:

Iterator Based Execution Plan

Next?

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

{ ?p = <http://.../alice> }

:

<http://.../alice> ex:interested_in <http://.../b1>

:

query-local dataset

{ ?p = <http://.../alice> , ?b = <http://.../b1> }

Iterator Based Execution Plan

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

Next?

{ ?p = <http://.../alice> }

query-local dataset

tp3 = ( ?b , rdf:type , <http://.../Book> )

tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

I3

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

{ ?p = <http://.../alice> , ?b = <http://.../b1> }

Iterator Based Execution Plan

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

Next?

{ ?p = <http://.../alice> }

tp3 = ( ?b , rdf:type , <http://.../Book> )

tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

I3

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

{ ?p = <http://.../alice> , ?b = <http://.../b1> }

query-local dataset

:

<http://.../Book> rdfs:subClassOf <http://.../CreativeWork>

:

Iterator Based Execution Plan

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

Next?

{ ?p = <http://.../alice> }

tp3 = ( ?b , rdf:type , <http://.../Book> )

tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

I3

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

{ ?p = <http://.../alice> , ?b = <http://.../b1> }

query-local dataset

:

<http://.../b1> rdf:type <http://.../Book>

:

Iterator Based Execution Plan

{ ?p = <http://.../alice> , ?b = <http://.../b1> }

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

{ ?p = <http://.../alice> }

tp3 = ( ?b , rdf:type , <http://.../Book> )

tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

I3

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

{ ?p = <http://.../alice> , ?b = <http://.../b1> }

query-local dataset
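The walk-through above can be sketched as a pipeline of pull-based iterators. This is only a minimal illustration under stated assumptions, not the author's implementation: the `WEB` dictionary is a hypothetical stand-in for URI lookups, the class and function names are invented for this sketch, and the elided URIs from the slides are used as opaque strings.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web: a URI lookup yields a descriptor object
# (a list of triples). URIs are copied from the slides as opaque strings.
WEB: Dict[str, List[Triple]] = {
    "http://.../orgaX": [("http://.../alice", "ex:affiliated_with", "http://.../orgaX")],
    "http://.../alice": [("http://.../alice", "ex:interested_in", "http://.../b1")],
    "http://.../b1": [("http://.../b1", "rdf:type", "http://.../Book")],
    "http://.../Book": [("http://.../Book", "rdfs:subClassOf", "http://.../CreativeWork")],
}


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Return the variable bindings that make pattern equal to triple, else None."""
    mu: Binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu


class QueryLocalDataset:
    """Grows during execution as descriptor objects are retrieved."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.looked_up: Set[str] = set()

    def look_up(self, uri: str) -> None:
        if uri not in self.looked_up:
            self.looked_up.add(uri)
            self.triples.update(WEB.get(uri, []))


class TriplePatternIterator:
    """Iterator I_k for one triple pattern; pulls input from its predecessor."""

    def __init__(self, tp: Triple, predecessor, dataset: QueryLocalDataset) -> None:
        self.tp, self.predecessor, self.dataset = tp, predecessor, dataset
        self.buffer: List[Binding] = []
        self.started = False

    def next(self) -> Optional[Binding]:
        while not self.buffer:
            if self.predecessor is None:
                if self.started:
                    return None  # "END!"
                self.started = True
                mu: Binding = {}
            else:
                pulled = self.predecessor.next()  # "Next?"
                if pulled is None:
                    return None
                mu = pulled
            # tp' = mu[tp]: substitute the variables bound so far.
            bound = tuple(mu.get(term, term) for term in self.tp)
            for term in bound:
                if not term.startswith("?"):
                    self.dataset.look_up(term)
            # Match against a snapshot of the query-local dataset.
            for triple in sorted(self.dataset.triples):
                ext = match(bound, triple)
                if ext is not None:
                    self.buffer.append({**mu, **ext})
        return self.buffer.pop(0)


def run(plan: List[Triple]) -> List[Binding]:
    """Execute a non-empty plan with an initially empty query-local dataset."""
    dataset = QueryLocalDataset()
    iterator = None
    for tp in plan:
        iterator = TriplePatternIterator(tp, iterator, dataset)
    results = []
    mu = iterator.next()
    while mu is not None:
        results.append(mu)
        mu = iterator.next()
    return results


plan = [
    ("?p", "ex:affiliated_with", "http://.../orgaX"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "http://.../Book"),
]
print(run(plan))
```

With the toy data above, the run reproduces the single solution of the slides, binding `?p` to alice and `?b` to b1.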

Example Query Execution Plan

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

An Alternative Execution Plan

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query

tp1 = ( ?b , rdf:type , <http://.../Book> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3

An Alternative Execution Plan

query-local dataset

Next?

Next?

Next?

tp1 = ( ?b , rdf:type , <http://.../Book> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3

An Alternative Execution Plan

Next?

Next?

Next?

tp1 = ( ?b , rdf:type , <http://.../Book> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3

query-local dataset

:

<http://.../Book> rdfs:subClassOf <http://.../CreativeWork>

:

An Alternative Execution Plan

Next?

Next?

END!

tp1 = ( ?b , rdf:type , <http://.../Book> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3

query-local dataset

:

<http://.../Book> rdfs:subClassOf <http://.../CreativeWork>

:

An Alternative Execution Plan

END!

END!

END!

tp1 = ( ?b , rdf:type , <http://.../Book> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3

query-local dataset

An Alternative Execution Plan

END!

END!

END!

tp1 = ( ?b , rdf:type , <http://.../Book> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I3

query-local dataset

Number of results may depend on the order of triple patterns (= logical query execution plan)
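The dependence of the result count on the order of the triple patterns can be reproduced with a compact, generator-based sketch. It is an illustration under assumptions only: `WEB` is a hypothetical stand-in for URI lookups, `execute` and `match` are names invented here, and the elided URIs from the slides are treated as opaque strings.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web (URI -> triples of its descriptor object).
WEB: Dict[str, List[Triple]] = {
    "http://.../orgaX": [("http://.../alice", "ex:affiliated_with", "http://.../orgaX")],
    "http://.../alice": [("http://.../alice", "ex:interested_in", "http://.../b1")],
    "http://.../b1": [("http://.../b1", "rdf:type", "http://.../Book")],
    "http://.../Book": [("http://.../Book", "rdfs:subClassOf", "http://.../CreativeWork")],
}


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Bindings that make pattern equal to triple, or None if they differ."""
    mu: Binding = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if mu.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return mu


def execute(plan: List[Triple]) -> List[Binding]:
    """Run one plan against an initially empty query-local dataset."""
    dataset: Set[Triple] = set()
    looked_up: Set[str] = set()

    def evaluate(i: int, mu: Binding) -> Iterator[Binding]:
        if i == len(plan):
            yield mu
            return
        bound = tuple(mu.get(term, term) for term in plan[i])
        # Traverse links: look up every URI of the (partially bound) pattern.
        for term in bound:
            if not term.startswith("?") and term not in looked_up:
                looked_up.add(term)
                dataset.update(WEB.get(term, ()))
        # sorted() takes a snapshot, so later lookups do not affect this step.
        for triple in sorted(dataset):
            ext = match(bound, triple)
            if ext is not None:
                yield from evaluate(i + 1, {**mu, **ext})

    return list(evaluate(0, {}))


affiliated = ("?p", "ex:affiliated_with", "http://.../orgaX")
interested = ("?p", "ex:interested_in", "?b")
is_book = ("?b", "rdf:type", "http://.../Book")

print(len(execute([affiliated, interested, is_book])))  # first plan
print(len(execute([is_book, interested, affiliated])))  # alternative plan
```

In this toy setting the first order finds one solution, while the alternative order finds none: looking up the Book URI only yields the subclass triple, so the first iterator has nothing to match and the traversal never reaches alice or b1.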

Query Plan Selection

● Assessment criteria:
   ● Cost (query execution time)
   ● Benefit (number of results)

● Cost and benefit must be estimated without plan execution

● Estimation impossible due to “zero knowledge”

● Heuristic Based Plan Selection
   ● DEPENDENCY RESPECT RULE
   ● SEED TP RULE
   ● NO VOCAB SEED RULE
   ● FILTERING TP RULE

SEED TP RULE

● Potential seed triple pattern

… is a triple pattern that contains at least one HTTP URI

● Seed triple pattern of a plan

… is the first triple pattern in the plan and

… is a potential seed triple pattern

● Rationale: good starting point

Use a plan with a seed triple pattern

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query

√√
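As a rough sketch, the SEED TP RULE can be checked as follows. The prefix table is an assumption made for this example: the `rdf` namespace is the standard one, while the expansion of `ex` is hypothetical because the slides do not give it. All function names are invented here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/vocab#",  # hypothetical; not given in the slides
}


def expand(term: str) -> str:
    """Expand a prefixed name such as ex:interested_in; other terms pass through."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def is_http_uri(term: str) -> bool:
    return expand(term).startswith(("http://", "https://"))


def is_potential_seed(tp: Triple) -> bool:
    """A potential seed triple pattern contains at least one HTTP URI."""
    return any(is_http_uri(term) for term in tp)


def has_seed(plan: List[Triple]) -> bool:
    """A plan has a seed triple pattern if its first pattern is a potential seed."""
    return bool(plan) and is_potential_seed(plan[0])
```

Note that under this definition alone a pattern such as `?p ex:interested_in ?b` also qualifies, because its predicate is an HTTP URI; the NO VOCAB SEED RULE narrows the choice further.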

NO VOCAB SEED RULE

● The seed triple pattern should not contain only vocabulary term URIs

● Patterns to avoid: ?s ex:any_property ?o

?s rdf:type ex:any_class

● Rationale: URIs for vocabulary terms usually resolve to vocabulary definitions with little instance data

Avoid a seed triple pattern with vocabulary terms

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query
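One possible reading of the NO VOCAB SEED RULE is sketched below. It is a heuristic under assumptions, not the author's definition: a URI in predicate position, or in object position of an `rdf:type` pattern, is treated as a vocabulary term, and every non-variable term is assumed to be a URI (no literals or blank nodes). The function name is invented for this sketch.

```python
from typing import Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def satisfies_no_vocab_seed(tp: Triple) -> bool:
    """True if tp contains a URI that is presumably not a vocabulary term.

    Patterns to avoid as seeds, following the slide:
      ?s ex:any_property ?o      (only a property URI)
      ?s rdf:type ex:any_class   (only property and class URIs)
    """
    s, p, o = tp
    if not is_var(s):
        return True  # assumption: a subject URI denotes instance data
    return not is_var(o) and p != "rdf:type"


query = [
    ("?p", "ex:affiliated_with", "http://.../orgaX"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "http://.../Book"),
]
seeds = [tp for tp in query if satisfies_no_vocab_seed(tp)]
```

For the example query only the first triple pattern remains as a seed candidate under this heuristic.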

FILTERING TP RULE

● Filtering triple pattern: each variable already occurs in one of the preceding triple patterns

● For each result consumed as input, a filtering TP can only report 1 or 0 results as output

● Rationale: Reduce cost

tp2 = ( ?p , ex:interested_in , ?b )

tp2' = ( <http://.../alice> , ex:interested_in , ?b )

I2

tp3 = ( ?b , rdf:type , <http://.../Book> )

tp3' = ( <http://.../b1> , rdf:type , <http://.../Book> )

I3

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

{ ?p = <http://.../alice> }

{ ?p = <http://.../alice> , ?b = <http://.../b1> }

Use a plan where all filtering triple patterns are as close to the seed triple pattern as possible
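The positions of the filtering triple patterns in a plan can be found with a short helper. This is a sketch only; the function names are invented, and variables are assumed to be the terms that start with `?`.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def variables(tp: Triple) -> Set[str]:
    return {term for term in tp if term.startswith("?")}


def filtering_positions(plan: List[Triple]) -> List[int]:
    """Zero-based positions of the filtering triple patterns in the plan.

    A pattern is filtering if each of its variables already occurs in a
    preceding pattern (vacuously true for a pattern without variables).
    """
    seen: Set[str] = set()
    positions = []
    for i, tp in enumerate(plan):
        vs = variables(tp)
        if vs <= seen:
            positions.append(i)
        seen |= vs
    return positions


plan = [
    ("?p", "ex:affiliated_with", "http://.../orgaX"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "http://.../Book"),
]
print(filtering_positions(plan))  # only the rdf:type pattern is filtering
```

Comparing these position lists across candidate plans is one way to prefer plans whose filtering patterns come early, as the rule suggests.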

Evaluation Procedure

● Generate all possible plans

● Execute each plan:
   ● 5 runs (+ 1 initial warm-up run)
   ● Use an initially empty query-local dataset for each run

● Measure for each plan:
   ● Avg. execution time
   ● Avg. number of descriptor objects retrieved during execution
   ● Avg. number of query results
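The procedure above could be scripted roughly as follows. This is a sketch under assumptions: `execute` is a hypothetical callable supplied by the caller that runs one plan with a fresh, empty query-local dataset and returns the solutions together with the number of descriptor objects retrieved; it does not refer to any real engine API.

```python
import itertools
import statistics
import time
from typing import Callable, Dict, Sequence, Tuple


def evaluate_plans(
    bgp: Sequence,
    execute: Callable[[Tuple], Tuple[list, int]],
    runs: int = 5,
) -> Dict[Tuple, Tuple[float, float, float]]:
    """Measure every ordering of the BGP.

    Returns, per plan, the averages of (execution time in seconds,
    retrieved descriptor objects, query results) over the measured runs.
    """
    report = {}
    for plan in itertools.permutations(bgp):  # generate all possible plans
        execute(plan)  # initial warm-up run, not measured
        times, descriptors, results = [], [], []
        for _ in range(runs):
            start = time.perf_counter()
            solutions, retrieved = execute(plan)
            times.append(time.perf_counter() - start)
            descriptors.append(retrieved)
            results.append(len(solutions))
        report[plan] = (
            statistics.mean(times),
            statistics.mean(descriptors),
            statistics.mean(results),
        )
    return report
```

Enumerating all permutations is only feasible for small BGPs; a BGP with seven triple patterns already has 5040 orderings before any heuristic filtering.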

Evaluation Query (Example)

SELECT ?spec ?genus WHERE {

geospecies:4qyn7 gs:inFamily ?fam .

?fam skos:narrowerTransitive ?spec .

?spec skos:closeMatch ?sp2 .

?sp2 rdfs:subClassOf ?genus .

?spec gs:isExpectedIn ?loc .

geospecies:4qyn7 gs:isExpectedIn ?loc .

?loc rdf:type gs:State . }

● 2 potential seed triple patterns that satisfy our NO VOCAB SEED RULE

● 56 different plans, each containing 2 filtering triple patterns

Of what genus are the species that are

● classified in the same family as the American Badger,

● and expected in the same states as the American Badger?

Picture source: Wikipedia

Measurements

[Figure: number of retrieved descriptor objects plotted against query execution time (in seconds), and number of query results plotted against query execution time (in seconds).]

[Figure: percentage of plans in each group with a filtering TP in specific positions; one panel for the 1st filtering TP and one for the 2nd filtering TP, each over the TP position (1 to 7) in the ordered BGP.]

Conclusions

● Approach that uses iterators to implement Link Traversal Based Query Execution

… is sound

… guarantees termination

… cannot guarantee (reachability-) completeness

● Degree of completeness depends on the query plan (i.e. the order of the triple patterns in the BGP)

● Heuristic based plan selection

● Next steps:
   ● Algorithm to generate most suitable plans only
   ● Integrate adaptive query processing techniques

Backup Slides

Outline

1. Link Traversal Based Query Execution

2. Characteristics of the Iterator Based Implementation Approach

3. Query Plan Selection

4. Evaluation

Link Traversal Based Querying

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query

● Semantics defined in two phases:

1. Reachability

2. Query Results

Descriptor object

Link Traversal Based Querying

● Semantics defined in two phases:

1. Reachability

2. Query Results

● The two phases do not reflect the idea of the actual execution strategy
   ● Intertwine query evaluation with the traversal of data links

Reachable descriptor object

The mapping $\mu : V \to U \cup B \cup L$ is a result for the BGP $bgp$ iff:

Condition 1: $\mathrm{dom}(\mu) = \mathrm{vars}(bgp)$

Condition 2: $\mu[bgp] \subseteq D_{\mathrm{reachable}}$
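The two conditions can be checked mechanically once the triples of all reachable descriptor objects are known. The sketch below assumes that this set is given as a Python set of triples and that variables are the terms starting with `?`; the function name is invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_result(mu: Dict[str, str], bgp: List[Triple], reachable: Set[Triple]) -> bool:
    """Check whether the mapping mu is a result for bgp.

    Condition 1: dom(mu) = vars(bgp)
    Condition 2: mu[bgp] is a subset of the reachable triples
    """
    bgp_vars = {term for tp in bgp for term in tp if term.startswith("?")}
    if set(mu) != bgp_vars:
        return False
    substituted = {tuple(mu.get(term, term) for term in tp) for tp in bgp}
    return substituted <= reachable


bgp = [
    ("?p", "ex:affiliated_with", "http://.../orgaX"),
    ("?p", "ex:interested_in", "?b"),
    ("?b", "rdf:type", "http://.../Book"),
]
reachable = {
    ("http://.../alice", "ex:affiliated_with", "http://.../orgaX"),
    ("http://.../alice", "ex:interested_in", "http://.../b1"),
    ("http://.../b1", "rdf:type", "http://.../Book"),
    ("http://.../Book", "rdfs:subClassOf", "http://.../CreativeWork"),
}
mu = {"?p": "http://.../alice", "?b": "http://.../b1"}
print(is_result(mu, bgp, reachable))
```

A mapping that binds only `?p` fails Condition 1, and a mapping that binds `?b` to a resource without a matching `rdf:type` triple fails Condition 2.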

Characteristics

● No need to know all data sources in advance

● Never complete w.r.t. all data on the Web
   ● Reason: reachability based on links that match query patterns
   ● New concept: reachability-completeness

● No guarantee for termination
   ● Reason: Web of Data is infinite (at any point in time)

Outline

1. Characteristics of the Implementation Approach

2. Query Plan Selection

3. Evaluation

Characteristics

● Sound

● Number of results may depend on order of triple patterns

● Reachability-completeness not guaranteed
   ● Main reason: inflexibility due to fixed order

● Termination is guaranteed
   ● No such guarantee for link traversal based query execution in general because the Web of Data is infinite (at any point in time)

● Efficient

● Easy to apply in existing query engines

Plan Selection Problem

● Number of results may depend on order of triple patterns (= logical query execution plan)

➔ Problem: select a suitable plan

DEPENDENCY RESPECT RULE

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query

● Dependency respect: at least one variable of each triple pattern (except the first) already occurs in one of the preceding triple patterns

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?p , ex:interested_in , ?b ) I2

tp3 = ( ?b , rdf:type , <http://.../Book> ) I3

Use a dependency respecting query plan

DEPENDENCY RESPECT RULE

?p ex:affiliated_with <http://.../orgaX>

?p ex:interested_in ?b

?b rdf:type <http://.../Book>

Query

Use a dependency respecting query plan

● Dependency respect: at least one variable of each triple pattern (except the first) already occurs in one of the preceding triple patterns

● Rationale: Avoid Cartesian products

tp1 = ( ?p , ex:affiliated_with , <http://.../orgaX> ) I1

tp2 = ( ?b , rdf:type , <http://.../Book> ) I2

tp3 = ( ?p , ex:interested_in , ?b ) I3
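Whether a plan respects dependencies can be tested with a few lines. This is a sketch with invented function names, assuming that variables are the terms starting with `?`.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def variables(tp: Triple) -> Set[str]:
    return {term for term in tp if term.startswith("?")}


def is_dependency_respecting(plan: List[Triple]) -> bool:
    """True if every pattern after the first shares a variable with an earlier one."""
    seen: Set[str] = set()
    for i, tp in enumerate(plan):
        vs = variables(tp)
        if i > 0 and not (vs & seen):
            return False  # this step would require a Cartesian product
        seen |= vs
    return True


affiliated = ("?p", "ex:affiliated_with", "http://.../orgaX")
interested = ("?p", "ex:interested_in", "?b")
is_book = ("?b", "rdf:type", "http://.../Book")

print(is_dependency_respecting([affiliated, interested, is_book]))
print(is_dependency_respecting([affiliated, is_book, interested]))
```

The second ordering is rejected because the `rdf:type` pattern in second position shares no variable with the first pattern.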

These slides have been created by Olaf Hartig

http://olafhartig.de

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License

(http://creativecommons.org/licenses/by-sa/3.0/)
