Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Master thesis by: Riccardo Tommasini (799120). Advisor: Prof. Emanuele Della Valle. Co-Advisors: Daniele Dell’Aglio and Marco Balduini. Scuola di Ingegneria Industriale e dell’Informazione, Computer Science and Engineering. Academic Year 2013 – 2014.

Upload: riccardo-tommasini

Post on 17-Feb-2017


Page 1: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Master thesis by: Riccardo Tommasini (799120)

Advisor: Prof. Emanuele Della Valle
Co-Advisors: Daniele Dell’Aglio and Marco Balduini

Scuola di Ingegneria Industriale e dell’Informazione, Computer Science and Engineering

Academic Year 2013 – 2014

Page 2: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Master Degree Thesis – Riccardo Tommasini

Agenda

Motivations
Research Question
Development
Evaluation
Conclusion

Page 3: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Stream Reasoning

Reasoning upon rapidly changing information flows

- Emanuele Della Valle, Stefano Ceri, 2009

Page 4: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Motivations

Computer Science research mainly focuses on proposing new systems and models, and lacks empirical evaluations of the existing ones.

- Walter F. Tichy, 25 August 1994

Page 5: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

State of the art in RSP Comparison

RSP Engines: C-SPARQL Engine, CQELS, SPARQLstream, EP-SPARQL, INSTANS, SparkWAVE, DynamiTE, Trowl

C-SPARQL Engine ≡ ✔ ✔ ✔ ✔
CQELS ✔ ≡ ✔
SPARQLstream ✔ ≡
EP-SPARQL ✔ ✔ ≡
INSTANS ≡
SparkWAVE ✔ ≡
DynamiTE ≡
Trowl ≡

Page 6: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Agenda

Motivations ✓
Research Question
Development
Evaluation
Conclusion

Page 7: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Problem Setting - Comparative Method

In Social Science…

Comparative Analysis is Case Driven
Cases are seen as a combination of properties
Similarities and differences are examined with shared methods
Baselines define analysis guidelines

Page 8: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Problem Setting - Test Stand

In Aerospace engineering…

Evaluate engines with Test Stands

Experimental Environment
Reproducibility, Repeatability, Comparability
Evaluation of running systems

Page 9: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

RSP Engine

RDF Stream Processing Engine ⟨T, Q⟩

heterogeneous data streams
data streams integration through RDF data model
continuously infers implied triples w.r.t. ontology T
continuous query (Q) answering

Page 10: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

RSP Engine - Complexity

RDF Stream Model + Execution Semantics + Inference Rule

Page 11: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

State of the art of RSP Engine Benchmarking

Benchmark   DataStreams & Ontologies   Queries   Metrics                                Test Stand   Baselines   Method
SR Bench    ✔                          ✔         Feasibility                            ✖            ✖           ✖
LS Bench    ✔                          ✔         Feasibility, Throughput                ≈            ✖           ≈
CSRBench    ✔                          ✔         Feasibility, Throughput, Correctness   ≈            ✖           ✖

Page 12: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Research Question

Can an engine test stand, together with existing queries, datasets and metrics, enable a Systematic Comparative Research Approach of RSP Engines?

Contribution

We developed and released as open source Heaven, a framework to enable a Systematic Comparative Research Approach (SCRA) of RDF Stream Processing (RSP) Engines.

Page 13: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Agenda

Motivations ✓
Research Question ✓
Development
Evaluation
Conclusion

Page 14: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Heaven - Test Stand Requirements

Extendible Design
Independence from Engine, Query, Dataset and Ontologies allows exploiting the existing solutions presented before

Moreover…

Do not influence the experiment
Extendible Measurement Set

Page 15: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Heaven - Test Stand

[Diagram: Test Stand. Labels: E, D, T, Q; Start, Stop; Streamer (D); input Interface; RSPEngine ⟨T, Q⟩; output Interface; ResultCollector]

Page 16: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Heaven - Test Stand

[Diagram: TestStand execution. Labels: Experiment, Streamer, RSPEngine, ResultCollector, Disk, Analyser; Start, Stop; MB]
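The components named on the two Test Stand slides (Streamer, RSPEngine, ResultCollector) can be read as a pipeline that is timed from the outside. The following is only a minimal Python sketch of that reading, not Heaven's actual code: `EchoEngine`, `ResultCollector.collect`, `run_experiment` and the record fields are hypothetical names, and the stand-in engine performs no reasoning at all.

```python
import time
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
Event = List[Triple]


class EchoEngine:
    """Hypothetical stand-in for an RSP engine: returns its input unchanged."""

    def process(self, event: Event) -> Event:
        return list(event)


class ResultCollector:
    """Keeps one record per event in memory (an assumption; no disk I/O here)."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def collect(self, event_id: int, result: Event, latency_ms: float) -> None:
        self.records.append(
            {"event": event_id, "triples_out": len(result), "latency_ms": latency_ms}
        )


def run_experiment(stream: Iterable[Event], engine, collector: ResultCollector) -> None:
    """Play the stream through the engine, timing each event from outside the engine."""
    for event_id, event in enumerate(stream):
        started = time.perf_counter()
        result = engine.process(event)
        latency_ms = (time.perf_counter() - started) * 1000.0
        collector.collect(event_id, result, latency_ms)


# Two toy events standing in for the streamed dataset D.
stream = [
    [("s1", "p", "o1")],
    [("s2", "p", "o2"), ("s3", "p", "o3")],
]
collector = ResultCollector()
run_experiment(stream, EchoEngine(), collector)
print([r["triples_out"] for r in collector.records])  # [1, 2]
```

Because the engine is only reached through `process`, any other engine object with the same method could be swapped in without touching the measurement code, which is the kind of engine independence the requirements slide asks for.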

Page 17: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Agenda

Motivations ✓
Research Question ✓
Development ✓
Evaluation
Conclusion

Page 18: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Do

Heaven extends the traditional top-down analysis, enabling comparative methods

How to start the research?

We evaluate four naive RSP Engines, called Baselines, included in the framework

Page 19: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Heaven - Baseline Engines

[Diagram: Naive and Incremental baselines. Labels: RDF Stream, DSMS, Reasoner, Incremental Reasoner, Δ+, Δ-, active window, Input Triple, Inferred Triple]

Page 20: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Heaven - Data

The RDF2RDFStream Module

adapts LUBM data to a streaming scenario
generates multiple RDF Streams, controlling the number of contemporary triples

[Plots: Constant Flow Rate and Step Flow Rate; axes: contemporary triples vs. time]
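The two flow-rate profiles can be illustrated by listing how many contemporary triples each successive event carries. This is a minimal Python sketch under stated assumptions: the function names and the parameters `initial`, `step_height` and `step_width` are hypothetical and are not taken from the RDF2RDFStream module.

```python
from typing import List


def constant_flow(n_events: int, triples_per_event: int) -> List[int]:
    """Constant flow rate: every event carries the same number of triples."""
    return [triples_per_event] * n_events


def step_flow(n_events: int, initial: int, step_height: int, step_width: int) -> List[int]:
    """Step flow rate: the event size grows by step_height every step_width events."""
    return [initial + (i // step_width) * step_height for i in range(n_events)]


print(constant_flow(5, 10))     # [10, 10, 10, 10, 10]
print(step_flow(6, 10, 10, 2))  # [10, 10, 20, 20, 30, 30]
```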

Page 21: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Doing - Queries

Variations of the full materialisation query

Window Dimension ω [ms]
Slide Parameter β = 100 [ms]

ω = β × S, S ∈ ℕ

Tumbling Window: S = 1
Sliding Window: S > 1
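The relation ω = β × S can be made concrete by enumerating the windows it produces. The following minimal Python sketch assumes time-based windows that open every β milliseconds and span ω milliseconds; the function names `window_dimension` and `windows` are hypothetical and not part of Heaven.

```python
from typing import List, Tuple

BETA_MS = 100  # slide parameter β from the slide, in milliseconds


def window_dimension(s: int, beta_ms: int = BETA_MS) -> int:
    """Return ω = β × S for a positive integer S."""
    if s < 1:
        raise ValueError("S must be a positive integer")
    return beta_ms * s


def windows(s: int, horizon_ms: int, beta_ms: int = BETA_MS) -> List[Tuple[int, int]]:
    """List the [start, end) windows that close within horizon_ms.

    A new window opens every β ms and spans ω ms, so S = 1 yields
    non-overlapping (tumbling) windows and S > 1 yields overlapping
    (sliding) ones.
    """
    omega = window_dimension(s, beta_ms)
    return [(start, start + omega)
            for start in range(0, horizon_ms - omega + 1, beta_ms)]


print(windows(1, 300))  # [(0, 100), (100, 200), (200, 300)]
print(windows(3, 500))  # [(0, 300), (100, 400), (200, 500)]
```

With S = 3 each instant is covered by up to three windows, which is why a sliding configuration makes the engine re-examine the same triples several times.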

Page 22: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Experiments

15 SOAK Tests: 10 times for each baseline, 168 hours of execution
6 STEP Tests: 10 times for each baseline, 150 hours of execution

[Plots: contemporary triples vs. time for the two test types]

Page 23: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Heaven - Analyser

We exploit a layered investigation method, which answers different possible questions about RSP Engine analysis

L0 - How to choose an engine?
L1 - What distinguishes an engine?
L2 - When to choose an engine?
L3 - Why choose this engine?

Page 24: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Doing - Analyser L0 - Dashboard

[Figure: dashboard panels plotting Memory (MB) and Latency (ms) for increasing Window Dimension (ms)]

Page 25: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Doing - Analyser L1 - Statistical Comparison

6.3 SOAK Test Evaluation Results

(a) Incremental

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         G
10        G     ≃
100       G     ≃     ≃
1000      G     ≃     ≃     ≃
10000     NA    T     S     T     T

(b) Triple

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         I
10        I     I
100       N     I     I
1000      N     I     I     I
10000     NA    I     I     I     I

(c) Naive

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         ≃
10        ≃     ≃
100       G     ≃     T
1000      G     ≃     T     T
10000     NA    ≃     ≃     T     T

(d) Graph

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         I
10        I     I
100       ≃     I     I
1000      N     I     I     I
10000     NA    I     I     I     I

Table 6.7 – Analyser Investigation Stack - Level 1 - SOAK Test average latency comparison through a qualitative approach. Underlined symbols (G, T, N, I) indicate that the baseline has not reached the Steady State Condition. (a), (c) - latency comparison of Graph-based (G) and Triple-based (T) models under the Incremental and Naive approaches; (b), (d) - latency comparison of Incremental (I) and Naive (N) approaches under the Triple-based and Graph-based models.

representation, but Heaven also allows more detailed analysis with quantitative comparisons, as shown in Section 5.4. To properly read the tables, note that they report that a baseline is better than another one when the difference in terms of latency or memory is bigger than 5%; otherwise we consider the two terms of comparison as equal and we use the symbol ≃. Moreover, we indicate that the better solution has not reached the Steady State Condition with the underlined symbols G, T, N, I.
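The reading rule for the tables can be written as a small comparison function. This is a minimal Python sketch under one explicit assumption: the text does not say what the 5% is relative to, so the difference is measured here against the larger of the two values. The function name `compare` and its parameters are hypothetical.

```python
def compare(label_a: str, value_a: float, label_b: str, value_b: float,
            threshold: float = 0.05) -> str:
    """Return the label of the better baseline, or "≃" when they are equal.

    Lower values are better (latency or memory). Two values count as equal
    unless they differ by more than `threshold`, measured relative to the
    larger value (an assumption made for this sketch).
    """
    reference = max(value_a, value_b)
    if reference == 0:
        return "≃"
    if abs(value_a - value_b) / reference <= threshold:
        return "≃"
    return label_a if value_a < value_b else label_b


print(compare("N", 100.0, "I", 90.0))  # I
print(compare("G", 100.0, "T", 97.0))  # ≃
```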

When N > 1, the results in Table 6.7.a and 6.7.c show that using a Triple-based RDF stream is faster than a Graph-based one. In particular, for the case N=1000 when the window contains 1000 triples (i.e., each CTEvent contains only one triple), the Naive Triple-based approach is about 10% faster than the Naive Graph-based one, while the Incremental Graph-based is even about 20% faster. These findings confirm [Hp.2], while the cases when N=10 do not confirm the hypothesis, because the results can be considered equal (the differences are smaller than 5%). A possible explanation is that the dimension of the graph cannot be considered small w.r.t. the window when N=10.

(a) Incremental

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         T
10        G     T
100       G     T     G
1000      G     G     G     T
10000     NA    G     G     G     G

(b) Triple

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         N
10        I     N
100       N     N     I
1000      N     I     I     I
10000     NA    I     I     I     I

(c) Naive

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         G
10        G     T
100       G     G     T
1000      G     G     G     T
10000     NA    G     G     T     T

(d) Graph

Triple in Window (rows) / Slots Number (columns)
          1     10    100   1000  10000
1         N
10        N     N
100       ≃     N     I
1000      ≃     I     I     I
10000     NA    N     I     I     I

Table 6.8 – Analyser Investigation Stack - Level 1 - SOAK Test average memory comparison through a qualitative approach. Underlined symbols (G, T, N, I) indicate that the baseline has not reached the Steady State Condition. (a), (c) - memory comparison of Graph-based (G) and Triple-based (T) models under the Incremental and Naive approaches; (b), (d) - memory comparison of Incremental (I) and Naive (N) approaches under the Triple-based and Graph-based models.

When N=1 (i.e., the window contains only one CTEvent) instead, the results in Table 6.7.b and Table 6.7.d show that for large events the Naive approach is faster than the Incremental one, as we stated when we formulated [Hp.1]. Instead, when the CTEvent contains only a few triples, the Incremental approach is faster; this is not intuitive, because to formulate [Hp.1] we considered the dimension of the changes as a percentage.

The results in Table 6.7.b and 6.7.d support [Hp.1] by stating that when the number of changing triples in Δ+ and Δ- (Section 4.2) is a small fraction of those in the window, an Incremental approach is faster than the Naive one. The exception is the case N=1, but it can be seen as a limit case, where the reasoner is asked to deduce all the implicit triples implied by the only explicit triple in the window.
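The intuition behind [Hp.1] can be illustrated by contrasting a full recomputation with an update driven only by Δ+ and Δ-. The sketch below is not the thesis' baseline code: it is a minimal Python illustration that swaps in a simple counting technique, uses a single subclass rule, and relies on a made-up ontology (`SUPERCLASSES`). All names are hypothetical, and it assumes Δ- only contains triples that were previously added.

```python
from collections import Counter

# Hypothetical ontology T: each class maps to all of its superclasses,
# already transitively closed.
SUPERCLASSES = {
    "GraduateStudent": {"Student", "Person"},
    "Student": {"Person"},
}


def consequences(triple):
    """Return the triple itself plus the rdf:type triples it entails under T."""
    s, p, o = triple
    derived = {triple}
    if p == "rdf:type":
        derived |= {(s, p, sup) for sup in SUPERCLASSES.get(o, ())}
    return derived


def naive_materialise(window):
    """Naive approach: recompute every entailment from the whole window."""
    result = set()
    for triple in window:
        result |= consequences(triple)
    return result


class IncrementalMaterialiser:
    """Incremental approach: only the triples in Δ+ and Δ- are processed."""

    def __init__(self):
        self.support = Counter()  # how many window triples justify each triple

    def update(self, delta_plus, delta_minus):
        for triple in delta_plus:
            for derived in consequences(triple):
                self.support[derived] += 1
        for triple in delta_minus:
            for derived in consequences(triple):
                self.support[derived] -= 1
                if self.support[derived] == 0:
                    del self.support[derived]
        return set(self.support)


w1 = [("alice", "rdf:type", "GraduateStudent"), ("bob", "rdf:type", "Student")]
w2 = [("bob", "rdf:type", "Student"), ("carol", "rdf:type", "Person")]

inc = IncrementalMaterialiser()
print(inc.update(w1, []) == naive_materialise(w1))              # True
print(inc.update([w2[1]], [w1[0]]) == naive_materialise(w2))    # True
```

The naive function does work proportional to the window size on every slide, while the incremental one does work proportional to the size of Δ+ and Δ-. When a single event replaces the whole window, the two sets are as large as the window itself and the bookkeeping no longer pays off.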

I: Incremental, N: Naive

Window Dimension (ω) = Slide Parameter (β) × S

Page 26: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Doing - Analyser L2 - Pattern Identification

[Figure grids over 1, 10, 100, 1000, 10000 on both axes: (a) Graph Naive, (b) Graph Incremental]

Table 6.11 – The figure shows the representation in the time domain of memory for GN (a) and GI (b).

Page 27: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Doing - Analyser L3 - Visual Comparison

Page 28: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Agenda

Motivations ✓
Research Question ✓
Development ✓
Evaluation ✓
Conclusion

Page 29: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Done

Can an engine test stand, together with existing queries, datasets and metrics, enable SCRA of RSP Engines?

My contributions are

Test Stand
Baselines
Method
Analysis

Page 30: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Future Works

SCRA of RSP Engines is just at the beginning. Further developments of Heaven are possible.

Benchmark Suite
Heaven as a Service
Research on Baselines
Research on Existing RSP Engines

Page 31: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Agenda

Motivations ✓
Research Question ✓
Development ✓
Evaluation ✓
Conclusion ✓

Page 32: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Thank You!

Page 33: Heaven: Supporting Systematic Comparative Research of RDF Stream Processing Engines

Questions?