
Page 1: A Pragmatic Approach to Semantic Repositories Benchmarking


A Pragmatic Approach to Semantic Repositories Benchmarking

Dhaval Thakker, Taha Osman, Shakti Gohil, Phil Lakin

© Dhaval Thakker, Press Association, Nottingham Trent University

Page 2: A Pragmatic Approach to Semantic Repositories Benchmarking


Outline

Introduction to the Semantic Technology Project at PA Images

Semantic repository benchmarking: parameters and datasets

Results and analysis: loading and querying results, modification tests

Conclusions

Page 3: A Pragmatic Approach to Semantic Repositories Benchmarking


http://www.pressassociation.com


Press Association & its operations

UK's leading multimedia news & information provider

Core news agency operation

Content and editorial services: sports data, entertainment guides, weather forecasting, photo syndication

Page 4: A Pragmatic Approach to Semantic Repositories Benchmarking


Press Association Images


Page 5: A Pragmatic Approach to Semantic Repositories Benchmarking


Current Search Engine


Page 6: A Pragmatic Approach to Semantic Repositories Benchmarking


Browsing Engine

Images of Sports, Entertainment, and News domain entities: people, events, locations, etc.

The current engine lacks metadata-rich browsing functionality that can exploit these entities for a richer browsing experience.

Goal: to help searchers browse images through these entities and their relationships.

Solution: a Semantic Web-based browsing engine (a sketch of an entity-centric query follows below).
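To make the idea concrete, below is a minimal sketch of an entity-centric browsing query issued from a Java client with Apache Jena. The pa: namespace, the pa:depicts and pa:playsFor properties, and the endpoint URL are illustrative assumptions, not the actual PA ontology or deployment.

```java
// Hypothetical sketch: entity-centric image browsing via SPARQL, using
// Apache Jena. Namespace, properties, and endpoint are invented examples.
import org.apache.jena.query.*;

public class BrowseByEntity {
    public static void main(String[] args) {
        String sparql =
            "PREFIX pa: <http://example.org/pa#> " +
            // Find images depicting players of a given team, i.e. browsing
            // the entity graph rather than matching free-text captions.
            "SELECT ?image ?player WHERE { " +
            "  ?image  pa:depicts  ?player . " +
            "  ?player pa:playsFor pa:Team_England . " +
            "}";

        Query query = QueryFactory.create(sparql);
        try (QueryExecution qexec = QueryExecutionFactory
                .sparqlService("http://localhost:8890/sparql", query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.getResource("image") + " depicts "
                        + row.getResource("player"));
            }
        }
    }
}
```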


Page 7: A Pragmatic Approach to Semantic Repositories Benchmarking


Semantic Repository Benchmarking

“a tool, which combines the functionality of an RDF-based DBMS and an inference engine and can store data and evaluate queries, regarding the semantics of ontologies and metadata schemata.” *

Criteria for selection: analytical parameters, such as the expected level of reasoning and query language support.

Selected semantic repositories: AllegroGraph, BigOWLIM, Oracle, Sesame, Jena TDB, Virtuoso.


* Kiryakov, A., Measurable Targets for Scalable Reasoning, Ontotext Technology White Paper, Nov 2007.

Page 8: A Pragmatic Approach to Semantic Repositories Benchmarking


PA Dataset

Ontology: OWL-Lite to OWL-DL. Classification, subproperties, inverse properties, and hasValue for automatic classification (a sketch of such a hasValue restriction follows below); 147 classes, 60 object properties, and 30 data properties.

Knowledge base (entities): 6.6M triples, approx. 1.2M entities, disk space 1.23 GB.

Image annotations: 2-4 triples per image; 8M triples, approx. 5M images, disk space 1.57 GB.
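As an illustration of the hasValue-based automatic classification mentioned above, the sketch below builds an equivalent-class restriction with Apache Jena's ontology API. The Footballer class, the playsSport property, and the pa: namespace are invented for the example rather than taken from the PA ontology.

```java
// Hedged sketch: hasValue restriction for automatic classification,
// built with Jena's ontology API. All names are illustrative.
import org.apache.jena.ontology.*;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

public class HasValueExample {
    public static void main(String[] args) {
        String ns = "http://example.org/pa#";
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);

        ObjectProperty playsSport = m.createObjectProperty(ns + "playsSport");
        Resource football = m.createResource(ns + "Football");

        // Footballer is equivalent to (playsSport hasValue Football): any
        // entity asserted to play football is classified as a Footballer
        // by the repository's reasoner.
        HasValueRestriction playsFootball =
                m.createHasValueRestriction(null, playsSport, football);
        OntClass footballer = m.createClass(ns + "Footballer");
        footballer.addEquivalentClass(playsFootball);

        m.write(System.out, "TURTLE");
    }
}
```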


Page 9: A Pragmatic Approach to Semantic Repositories Benchmarking


Published Benchmarks & datasets

The Lehigh University Benchmark (LUBM): the first standard platform for benchmarking OWL systems, but it gradually fell behind the increasing expressivity of OWL reasoning.

The University Ontology Benchmark (UOBM): improves the reasoning coverage of LUBM with OWL-DL and OWL-Lite inferencing.

Berlin SPARQL Benchmark (BSBM): provides a comprehensive evaluation of SPARQL query features.


Page 10: A Pragmatic Approach to Semantic Repositories Benchmarking


Benchmarking Parameters

1) Classification of semantic stores as native, memory-based, or database-based (A)
2) Forward-chaining or backward-chaining (A)
3) Load time (P)
4) Query response time (P)
5) Query results analysis (P)
6) RDF store update tests (P)
7) Study of different serialisations and their impact on performance
8) Scalability (A)
9) Reasoner integration (A)
10) Query language supported (A)
11) Clustering supported (A)
12) Programming languages supported (A)
13) Platform support (A)
14) RDF-view support (support for non-RDF data) (A)

(P) = practical (functional) parameter; (A) = analytical (non-functional) parameter


Page 11: A Pragmatic Approach to Semantic Repositories Benchmarking


UOBM: Load time


[Chart: UOBM loading time in minutes (0-350) for Virtuoso, Sesame, Jena TDB, Oracle, BigOWLIM, and AllegroGraph across the UOBM1, UOBM5, UOBM10, and UOBM30 datasets, plus the total.]

Page 12: A Pragmatic Approach to Semantic Repositories Benchmarking


Load Time: PA Dataset

[Chart: PA dataset loading time in minutes (0-350) per semantic repository (Virtuoso, AllegroGraph, Sesame, Jena TDB, BigOWLIM), broken down into knowledge base (KB), image captions, and total.]


Page 13: A Pragmatic Approach to Semantic Repositories Benchmarking


Dataset Queries

Measuring query execution speed: SPARQL queries issued from a Java-based client, with execution speed measured over three runs (a sketch of the timing harness follows below).

PA Dataset: 13 queries to test the expressiveness supported: subsumption, inverse properties (Q6, Q12, Q15), and automatic classification.

UOBM Dataset: 15 queries, of which 12 fall under OWL-Lite and 3 are of OWL-DL expressivity. Q5 and Q7 involve transitive properties (owl:TransitiveProperty), Q6 relies on repository support for owl:inverseOf, and Q10 requires symmetric property support (owl:SymmetricProperty).
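Below is a minimal sketch of such a timing harness, assuming an Apache Jena client talking to a SPARQL endpoint. The endpoint URL and the sample query are placeholders; the benchmark's actual 13 PA and 15 UOBM queries are not reproduced here.

```java
// Hedged sketch of the Java-based timing client: run a SPARQL query three
// times against an endpoint and report the average execution time.
import org.apache.jena.query.*;

public class QueryTimer {
    static final String ENDPOINT = "http://localhost:8890/sparql"; // assumed

    public static void main(String[] args) {
        String sparql = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"; // placeholder
        final int RUNS = 3; // execution speed measured over three runs

        long totalNanos = 0;
        for (int run = 0; run < RUNS; run++) {
            long start = System.nanoTime();
            try (QueryExecution qexec =
                    QueryExecutionFactory.sparqlService(ENDPOINT, sparql)) {
                ResultSet rs = qexec.execSelect();
                ResultSetFormatter.consume(rs); // force full result retrieval
            }
            totalNanos += System.nanoTime() - start;
        }
        System.out.printf("average execution time: %.3f s%n",
                totalNanos / (double) RUNS / 1e9);
    }
}
```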


Page 14: A Pragmatic Approach to Semantic Repositories Benchmarking


UOBM: Query execution

Execution timings (seconds). (P) = partially answered; N = query was not answered by this tool.

No.   Virtuoso    AllegroGraph   Oracle      Sesame      Jena TDB      BigOWLIM
Q1    6.766 (P)   21.921         0.141       0.203       0.031         0.047
Q2    N           8.906 (P)      N           0.001 (P)   0.001 (P)     0.062
Q3    N           651.237        N           0.109       0.016         0.062
Q4    N           N (infinite)   N           0.14        120           0.063
Q5    N           1.281          N           N           N             0.047
Q6    N           1153.025       N           N           N             0.047
Q7    N           300.12         N           N           N             0.001
Q8    N           6.843 (P)      N           N           N             0.031
Q9    N           N              N           N           N             0.031 (P)
Q10   0           0.25           0.001 (P)   0.001       0.001         0.016
Q11   N           N (infinite)   0.001 (P)   0.094 (P)   N (infinite)  0.062
Q12   N           476.507        N           N           N             0.016
Q13   N           N              N           N           N             N
Q14   N           N (infinite)   N           N           N             0.016
Q15   N           N              N           N           N             N


Page 15: A Pragmatic Approach to Semantic Repositories Benchmarking


PA Dataset: Query execution

Execution timings (seconds). (P) = partially answered; N = query was not answered by this tool.

Query No.  Virtuoso    AllegroGraph  Sesame      Jena TDB  BigOWLIM
Q1         2.234 (P)   26.422        0.469 (P)   0.047     0.219
Q2         N           N             N           N         0.063
Q4         N           N             N           N         0.047
Q5         0.172       1.719         0.141       N         0.078
Q6         N           3.765         N           0.001     0.45
Q7         84.469      28.688        0.203       N         0.093
Q8         0.047       3.39          0.11        0.001     0.062
Q9         0.156       1.782         0.171       N         0.016
Q10        0.001       1.734         0.047       N         0
Q11        N           1.734         0.11        0.001     0.062
Q12        N           16.14         N           N         0.079
Q13        5.563 (P)   1.812         0.016 (P)   0.001     0.641
Q15        N           1.688         N           N         0.031

Page 16: A Pragmatic Approach to Semantic Repositories Benchmarking


Two complete stores: BigOWLIM vs. AllegroGraph

[Chart: BigOWLIM vs. AllegroGraph query execution times in seconds (0-25) for queries Q1, Q5-Q13, and Q15.]


Page 17: A Pragmatic Approach to Semantic Repositories Benchmarking


Two fast stores: BigOWLIM vs. Sesame

[Chart: BigOWLIM vs. Sesame query execution times in seconds (0-0.25) for queries Q5 and Q7-Q11.]


Page 18: A Pragmatic Approach to Semantic Repositories Benchmarking


Modification Tests: Insert

[Charts: execution time of three insert operations (U1-U3) per semantic repository. Image insert operations: 0-20 seconds across Virtuoso, AllegroGraph, Sesame, Jena TDB, and BigOWLIM. KB insert operations: 0-2 seconds across the same repositories.]
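For illustration, an insert operation of this kind can be expressed as a SPARQL Update and timed the same way as the queries. This is a hedged sketch: the endpoint URL and the pa: triples (mimicking the 2-4 annotation triples per image) are assumptions, not the benchmark's actual U1-U3 operations.

```java
// Hedged sketch: timing a SPARQL INSERT DATA against a remote endpoint
// with Apache Jena. Endpoint and triple names are invented examples.
import org.apache.jena.update.*;

public class InsertTest {
    public static void main(String[] args) {
        String update =
            "PREFIX pa: <http://example.org/pa#> " +
            // e.g. annotating a newly ingested image with a few triples
            "INSERT DATA { " +
            "  pa:image_42 pa:depicts pa:Person_X ; " +
            "              pa:takenAt pa:Location_Y . " +
            "}";

        long start = System.nanoTime();
        UpdateRequest req = UpdateFactory.create(update);
        UpdateExecutionFactory
                .createRemote(req, "http://localhost:8890/sparql") // assumed
                .execute();
        System.out.printf("insert took %.3f s%n",
                (System.nanoTime() - start) / 1e9);
    }
}
```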


Page 19: A Pragmatic Approach to Semantic Repositories Benchmarking


Modification Tests: Update & Delete

[Chart: modification execution speed in seconds (0-15) for three delete (D1-D3) and three update (U1-U3) operations, plus their average, across BigOWLIM, AllegroGraph, Virtuoso, and Sesame.]
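An update of this kind can likewise be sketched as a single SPARQL DELETE/INSERT pair; again the endpoint and the pa: names are illustrative, not the benchmark's actual D1-D3/U1-U3 operations.

```java
// Hedged sketch: replace the value of a property (delete old, insert new)
// in one SPARQL DELETE/INSERT request. All names are invented examples.
import org.apache.jena.update.*;

public class UpdateTest {
    public static void main(String[] args) {
        String update =
            "PREFIX pa: <http://example.org/pa#> " +
            // Replace whatever pa:image_42 currently depicts with Person_Z.
            "DELETE { pa:image_42 pa:depicts ?old } " +
            "INSERT { pa:image_42 pa:depicts pa:Person_Z } " +
            "WHERE  { pa:image_42 pa:depicts ?old }";

        UpdateRequest req = UpdateFactory.create(update);
        UpdateExecutionFactory
                .createRemote(req, "http://localhost:8890/sparql") // assumed
                .execute();
    }
}
```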


Page 20: A Pragmatic Approach to Semantic Repositories Benchmarking


Conclusions

PA Dataset benchmarking

We translated the essential and desirable requirements of our application into a set of functional (practical) and non-functional (analytical) parameters.

To consolidate our findings we used UOBM, a public benchmark that satisfies the requirements of our target system.

Analysis

All the repositories are sound, but none is complete.

BigOWLIM provides the best average query response time and answers the maximum number of queries for both datasets, but it is slower in the loading and modification tests.

Sesame, Jena TDB, Virtuoso, and Oracle offered sub-second query response times for the majority of the queries they answered.

AllegroGraph answers more queries than the former four repositories and hence offers better coverage of OWL properties, but its average query response time was the highest for both datasets.

Further work

Expanding this benchmarking exercise to a billion triples and to more repositories.

Adding extra benchmarking parameters, such as the performance impact of concurrent users and of transaction-related operations.
