How much semantic data on small devices?

Mathieu d’Aquin, Andriy Nikolov and Enrico Motta

Knowledge Media Institute, The Open University, UK

[email protected]

@mdaquin

Semantic Data on Small Devices?

Benchmarking Semantic Data Tools

Large Scale Benchmarks

LUBM(1,0): 103,397 triples

Extracting sets of small-scale ontologies

Clusters of ontologies having similar characteristics, except for size

• Characteristics of ontologies

– Size (triples): varies from very small scale to medium scale

– Ratio class/prop: allowing 50% variance

– Ratio class/inst.: allowing 50% variance

– DL expressivity: complexity of the language

• 99 automatically created clusters

• Manual selection of 10
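
The grouping criteria above can be sketched in code. This is only an illustration under stated assumptions, not the authors' clustering procedure: the slides do not say how "50% variance" was computed, so the `within` test (relative difference of at most 0.5 between two ratios), the greedy first-fit grouping, and all names (`OntologyStats`, `similar`, `cluster`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyStats:
    """Hypothetical summary of one ontology; field names are illustrative."""
    name: str
    triples: int
    classes: int
    properties: int
    instances: int
    dl: str  # DL expressivity label, e.g. "AL" or "EL"


def ratio(a, b):
    """Return a / b, or None when the ratio is undefined (b == 0)."""
    return a / b if b else None


def within(a, b, tolerance=0.5):
    """Assumed reading of '50% variance': relative difference <= tolerance."""
    if a is None or b is None:
        return a is None and b is None
    hi = max(a, b)
    return hi == 0 or (hi - min(a, b)) / hi <= tolerance


def similar(x, y):
    """Same DL expressivity and comparable ratios; size is deliberately ignored."""
    return (
        x.dl == y.dl
        and within(ratio(x.classes, x.properties), ratio(y.classes, y.properties))
        and within(ratio(x.classes, x.instances), ratio(y.classes, y.instances))
    )


def cluster(ontologies):
    """Greedy first-fit grouping: compare each ontology to each cluster's first member."""
    clusters = []
    for onto in sorted(ontologies, key=lambda o: o.triples):
        for group in clusters:
            if similar(group[0], onto):
                group.append(onto)
                break
        else:
            clusters.append([onto])
    return clusters
```

Because size is excluded from `similar`, each resulting cluster can span a wide range of triple counts, which is what makes it usable as a scaling series.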

Results

Size (triples)   Prop/class   Ind/class     DL
9-2742           0.65-1.0     1.0-2.0       ALO
27-3688          0.21-0.48    0.07-0.14     ALH
2-8502           N/A          N/A           -
17-3696          0.66-2.0     4.5-20.5      -
3208-658808      N/A          N/A           EL
1514-153298      N/A          N/A           ELR+
8-3657           N/A          N/A           -
7-4959           1.41-4.0     N/A           AL
1-2759           N/A          N/A           -
43-5132          1.0-2.0      13.0-22.09    -

Queries

• Using real-life ontologies requires domain-independent queries

• A set of 8 generic queries of varying complexity, whose results might depend on inference

Select all labels

Select all comments

Select all labels and comments

Select all RDFS classes

Select all classes (RDFS/OWL/DAML)

Select all instances of all classes

Select all properties applied to instances of all classes

Select all properties by their domain
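
One possible way to phrase the eight queries above is as SPARQL strings held in a small Python table. This is a sketch only: the slides give neither the query language nor the exact query text, so every query body, the dictionary keys, and the `full_query` helper are assumptions made for illustration.

```python
# Standard namespace prefixes shared by all queries.
PREFIXES = """\
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX daml: <http://www.daml.org/2001/03/daml+oil#>
"""

# Assumed SPARQL renderings of the eight generic queries (not the authors' text).
QUERIES = {
    "labels": "SELECT ?s ?label WHERE { ?s rdfs:label ?label }",
    "comments": "SELECT ?s ?comment WHERE { ?s rdfs:comment ?comment }",
    "labels_and_comments": (
        "SELECT ?s ?label ?comment "
        "WHERE { ?s rdfs:label ?label . ?s rdfs:comment ?comment }"
    ),
    "rdfs_classes": "SELECT ?c WHERE { ?c rdf:type rdfs:Class }",
    "all_classes": (
        "SELECT ?c WHERE { { ?c rdf:type rdfs:Class } "
        "UNION { ?c rdf:type owl:Class } "
        "UNION { ?c rdf:type daml:Class } }"
    ),
    "instances": "SELECT ?i ?c WHERE { ?c rdf:type rdfs:Class . ?i rdf:type ?c }",
    "instance_properties": (
        "SELECT DISTINCT ?p "
        "WHERE { ?c rdf:type rdfs:Class . ?i rdf:type ?c . ?i ?p ?o }"
    ),
    "properties_by_domain": "SELECT ?p ?d WHERE { ?p rdfs:domain ?d }",
}


def full_query(name):
    """Return the named query with the shared prefixes prepended."""
    return PREFIXES + QUERIES[name]
```

Queries such as `rdfs_classes` and `instances` are the ones most likely to return more results when RDFS reasoning is enabled, since inferred `rdf:type` statements then become visible to the pattern.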

Running the benchmarks – Triple Stores

Jena with TDB persistent storage

As above + RDFS reasoning

Sesame with persistent storage

As above + RDFS reasoning

Mulgara with default configuration

Running the benchmarks – Device

Asus EEE PC 700 (2G)

Running the benchmarks – Measures

• Loading time: for each ontology, in an empty, re-initialized store.

• Disk space: of the persistent store right after loading.

• Memory consumption: of the triple store process right after loading the ontology.

• Query time: for each ontology, averaged over the 8 queries.
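
Three of these measures can be sketched as a small harness. This is a minimal illustration, not the authors' benchmarking code: `load` and `run_query` are hypothetical callables standing in for whatever a given triple store exposes, and `store_dir` is assumed to be the directory holding the persistent store. Memory consumption is left out because reading another process's memory is platform-specific.

```python
import os
import time


def directory_size(path):
    """Total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def benchmark(load, run_query, queries, store_dir):
    """Measure loading time, disk space after loading, and average query time.

    `load` is assumed to load one ontology into an empty store located in
    `store_dir`; `run_query` is assumed to execute one query to completion.
    """
    start = time.perf_counter()
    load()
    loading_time = time.perf_counter() - start

    # Disk space is read right after loading, before any query runs.
    disk_space = directory_size(store_dir)

    query_times = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        query_times.append(time.perf_counter() - start)

    return {
        "loading_time_s": loading_time,
        "disk_space_bytes": disk_space,
        "avg_query_time_s": sum(query_times) / len(query_times),
    }
```

Calling `benchmark` once per ontology, each time against a freshly re-initialized store, matches the "empty, re-initialized store" condition stated above.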

Results – Loading time

[Chart: loading time for each triple store]

Results – Disk Space

[Chart: disk space for each triple store]

Results – Memory consumption

[Chart: memory consumption for each triple store]

Results – Query time

[Chart: query time for each triple store]

Conclusion – on tests

• Sesame performs best in almost all aspects, even when including reasoning

• Reasoning has a big impact on Jena TDB at query time

• Mulgara is clearly not adequate in a small-scale scenario

Conclusion – on small-scale benchmarking

• Validates our assumption that small-scale benchmarks give different results than large-scale benchmarks

• Points out the need for more work to tackle the small-scale scenarios

• Results are not always clear-cut in every aspect: benchmarks serve as support for deciding which tool to use, depending on the application constraints
