Perplexity of Index Models over Evolving Linked Data

Institute for Web Science & Technologies – WeST. Perplexity of Index Models over Evolving Linked Data. Thomas Gottron, Christian Gottron. May 27th, 2014, ESWC, Crete. #eswc2014GottronTC

Upload: thomas-gottron

Post on 23-Aug-2014


DESCRIPTION

ESWC presentation on the stability of 12 different index models for linked data. Provides a formalisation of the index models as well as stability evaluation based on data distributions and information theoretic metrics.

TRANSCRIPT

Page 1: Perplexity of Index Models over Evolving Linked Data

Institute for Web Science & Technologies – WeST

Perplexity of Index Models over Evolving Linked Data

Thomas Gottron, Christian Gottron

May 27th, 2014, ESWC, Crete

#eswc2014GottronTC

Page 2: Perplexity of Index Models over Evolving Linked Data

Thomas Gottron ESWC 27.5.2014, 2 Perplexity of Index Models Over Evolving LOD

Motivation

[Figure: an "Index" built "once upon a time..." next to a "New index" built "... some time later", connected by "???". Question raised: Accuracy?]

Page 3: Perplexity of Index Models over Evolving Linked Data


Index Models Over Linked Data

Page 4: Perplexity of Index Models over Evolving Linked Data


Data Format

Linked Data as N-Quads: (s, p, o, c)

(s, p, o): triple – what is the information?

c: context URI – where does it come from?
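As a minimal sketch of this data format, a quad can be modelled in Python as a named tuple. The class name `Quad` and the context URI below are hypothetical and chosen only for illustration; the triple itself is borrowed from the subject-based example later in the talk.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """One N-Quads statement: a triple (s, p, o) plus its context URI c."""
    s: str  # subject
    p: str  # predicate
    o: str  # object
    c: str  # context: the source the triple was obtained from


# Hypothetical example; the context URI is made up for illustration.
q = Quad(
    s="ukob:Gottron",
    p="foaf:knows",
    o="ukob:Staab",
    c="http://example.org/source.rdf",
)

triple = q[:3]  # what is the information?
context = q.c   # where does it come from?
print(triple, context)
```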

Page 5: Perplexity of Index Models over Evolving Linked Data


(Abstract) Index Models

D: Data elements to be retrieved (payload)
K: Key elements to access the data (index elements)
σ: Selection function: how to get data for a key

Keys → Data items (payload)
k1 → d1,1 d1,2 d1,3 ...
k2 → d2,1 d2,2
k3 → d3,1 d3,2 d3,3 ...
...
kn → dn,1 dn,2 dn,3 ...

Search data structure: efficient storage and retrieval
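The abstract model (D, K, σ) can be sketched in Python with a plain dictionary as the search data structure. This is only an illustration under that assumption: the names `build_index`, `keys_of`, and `select` are hypothetical and are not taken from the authors' implementation.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, TypeVar

Item = TypeVar("Item")


def build_index(
    items: Iterable[Item],
    keys_of: Callable[[Item], Iterable[Hashable]],
) -> dict:
    """Map every key in K to the list of data elements from D stored under it."""
    index = defaultdict(list)
    for item in items:
        for key in keys_of(item):
            index[key].append(item)
    return dict(index)


def select(index: dict, key: Hashable) -> list:
    """Selection function sigma: the data elements retrievable for a key."""
    return index.get(key, [])


# Toy data: three triples, indexed by their predicate.
triples = [
    ("ukob:Gottron", "foaf:knows", "ukob:Staab"),
    ("tud:CGottron", "foaf:knows", "ukob:Gottron"),
    ("ukob:Staab", "foaf:name", "Steffen Staab"),
]
by_predicate = build_index(triples, keys_of=lambda t: [t[1]])
print(select(by_predicate, "foaf:knows"))
print(select(by_predicate, "rdf:type"))
```

Swapping the `keys_of` function is what turns this one sketch into different index models: the key could just as well be the subject, the object, or something derived from the triple.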

Page 6: Perplexity of Index Models over Evolving Linked Data


Concrete Example: Subject Based Index Model

ukob:Gottron → (ukob:Gottron, rdf:type, foaf:Person) (ukob:Gottron, foaf:knows, ukob:Staab) ...
ukob:Staab → (ukob:Staab, swrc:institution, ukob:WeST) (ukob:Staab, foaf:name, "Steffen Staab") ...
ukob:Schegi → (ukob:Schegi, rdf:type, foaf:Person) (ukob:Schegi, foaf:name, "Stefan Scheglmann")
...
tud:CGottron → (tud:CGottron, swrc:institution, tud:KOM) (tud:CGottron, foaf:knows, ukob:Gottron) ...
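The subject-based index on this slide can be reproduced with a few lines of Python. The sketch below uses only the triples that are visible on the slide (the elided "..." entries are left out) and assumes a dictionary from subject URI to a list of triples; the variable names are illustrative.

```python
from collections import defaultdict

# Triples shown on the slide; elided entries ("...") are not invented here.
triples = [
    ("ukob:Gottron", "rdf:type", "foaf:Person"),
    ("ukob:Gottron", "foaf:knows", "ukob:Staab"),
    ("ukob:Staab", "swrc:institution", "ukob:WeST"),
    ("ukob:Staab", "foaf:name", "Steffen Staab"),
    ("ukob:Schegi", "rdf:type", "foaf:Person"),
    ("ukob:Schegi", "foaf:name", "Stefan Scheglmann"),
    ("tud:CGottron", "swrc:institution", "tud:KOM"),
    ("tud:CGottron", "foaf:knows", "ukob:Gottron"),
]

# Key = subject URI, payload = all triples with that subject.
subject_index = defaultdict(list)
for s, p, o in triples:
    subject_index[s].append((s, p, o))

for key, payload in subject_index.items():
    print(key, len(payload))
```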

Page 7: Perplexity of Index Models over Evolving Linked Data


12 Implemented Index Models

Triple based

Meta data

Schema-level

https://github.com/gottron/lod-index-models

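[Figure: one small triple diagram per index model, each highlighting the key element. Labels recoverable from the slide, in order: s, p, o, term, c, PLD, type, SchemEX, followed by models keyed on sets of types (t), sets of properties (p), sets of incoming properties (p-1), and a combination of types and properties.]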

Page 8: Perplexity of Index Models over Evolving Linked Data


Index Accuracy over Evolving Data

Page 9: Perplexity of Index Models over Evolving Linked Data


Comparing Indices

Once upon a time... ... some time later

[Figure: the index (keys k1 ... kn with their data items) at the earlier time next to the index at the later time, connected by "???".]

Page 10: Perplexity of Index Models over Evolving Linked Data


Metrics

First indicator of interest: Stability of the key element set

Relative size of the overlap of two sets

Jaccard Similarity
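A short Python sketch of the Jaccard similarity between two key sets, i.e. the size of the intersection divided by the size of the union. The key names are made up, and returning 1.0 for two empty sets is a convention chosen here, not something stated on the slide.

```python
def jaccard(a: set, b: set) -> float:
    """|a ∩ b| / |a ∪ b|; defined here as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


old_keys = {"k1", "k2", "k3", "k4"}
new_keys = {"k2", "k3", "k4", "k5", "k6"}

# 3 shared keys out of 6 distinct keys
print(jaccard(old_keys, new_keys))  # 0.5
```

A value of 1 means the key set did not change between the two snapshots; a value near 0 means almost no key survived.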

Page 11: Perplexity of Index Models over Evolving Linked Data


How to Measure Accuracy?

Queries? (SPARQL)
No established query log for the data set used
Different key elements require different queries
Queries would have to cover all of the index

Distributions!
Relevant to several applications
Established metrics for comparison

Page 12: Perplexity of Index Models over Evolving Linked Data


Obtaining a Distribution from an Index

k1 → d1,1 d1,2 d1,3 ...
k2 → d2,1 d2,2
k3 → d3,1 d3,2 d3,3 ...
...
kn → dn,1 dn,2 dn,3 ...

Page 13: Perplexity of Index Models over Evolving Linked Data


Obtaining a Distribution from an Index

Number of data items per key:
k1: 4
k2: 2
k3: 10
...
kn: 8

Relative frequencies + smoothing (see paper)
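The step from counts to a distribution can be sketched as follows. The slide defers the smoothing method to the paper, so the additive (add-alpha) smoothing used here is an assumption made purely for illustration, as are the function name `key_distribution` and the stand-in key names `k4` and `k5`.

```python
def key_distribution(counts: dict, vocabulary: set, alpha: float = 0.0) -> dict:
    """Relative frequency of each key, with optional additive smoothing.

    ASSUMPTION: add-alpha smoothing; the talk only says "see paper".
    """
    total = sum(counts.get(k, 0) for k in vocabulary) + alpha * len(vocabulary)
    return {k: (counts.get(k, 0) + alpha) / total for k in vocabulary}


# Counts from the slide (k4 stands in for the slide's kn).
counts = {"k1": 4, "k2": 2, "k3": 10, "k4": 8}

plain = key_distribution(counts, set(counts))
print(plain["k3"])  # 10 of 24 data items

# With smoothing, a key that is absent from this index (hypothetical k5)
# still receives a small non-zero probability.
smoothed = key_distribution(counts, set(counts) | {"k5"}, alpha=1.0)
print(smoothed["k5"])  # (0 + 1) / (24 + 5)
```

Smoothing matters for the later comparison: a key that appears in only one of two snapshots would otherwise have probability zero in the other, which makes the logarithm in the cross-entropy undefined.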

Page 14: Perplexity of Index Models over Evolving Linked Data


Comparing Indices

Once upon a time... ... some time later

???

Page 15: Perplexity of Index Models over Evolving Linked Data


Comparing Distributions

Information theoretic measures

???

Entropy of P: expected length (in bits) of an optimal encoding of a (randomly chosen) key
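For a concrete feel, the Shannon entropy of a key distribution can be computed as below. The formula image is missing from this transcript, so the standard definition H(P) = -Σ P(k) log2 P(k) is assumed; the two toy distributions are made up.

```python
import math


def entropy(p: dict) -> float:
    """Shannon entropy in bits; zero-probability keys contribute nothing."""
    return -sum(pk * math.log2(pk) for pk in p.values() if pk > 0)


uniform = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
skewed = {"a": 0.5, "b": 0.25, "c": 0.25}

print(entropy(uniform))  # 2.0 bits: four equally likely keys
print(entropy(skewed))   # 1.5 bits
```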

Page 16: Perplexity of Index Models over Evolving Linked Data


Metrics

Cross-Entropy of P and Q: expected length when the encoding is based on a different distribution

Perplexity: how many uniformly distributed keys would have the same entropy

Normalized Perplexity: perplexity relative to a uniform distribution over the keys
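The formulas themselves did not survive in this transcript. The standard definitions matching the descriptions on this slide are given below as a reference sketch. Dividing by |K| for the normalised variant is an assumption based on the fact that a uniform distribution over |K| keys has perplexity |K|; the exact form used by the authors should be checked against the paper.

```latex
\begin{align}
  H(P)    &= -\sum_{k \in K} P(k)\,\log_2 P(k)  && \text{entropy of } P \\
  H(P, Q) &= -\sum_{k \in K} P(k)\,\log_2 Q(k)  && \text{cross-entropy of } P \text{ and } Q \\
  PP(P, Q) &= 2^{H(P, Q)}                        && \text{perplexity} \\
  PP_{\mathrm{norm}}(P, Q) &= \frac{PP(P, Q)}{|K|} && \text{normalised perplexity (assumed form)}
\end{align}
```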

Page 17: Perplexity of Index Models over Evolving Linked Data


Metrics: How to Interpret Perplexity

Perplexity based on cross entropy: "How surprised are you about the outcome of an experiment, given you have some expectations?"

[Figure: two histograms over the die faces 1 to 6, labelled "Unfair die" and "model"; annotated "Baseline".]

Page 18: Perplexity of Index Models over Evolving Linked Data


Metrics: How to Interpret Perplexity

[Figure: two histograms over the die faces 1 to 6, the unfair die and a model; annotated "Minimal value".]

Page 19: Perplexity of Index Models over Evolving Linked Data


Metrics: How to Interpret Perplexity

[Figure: two histograms over the die faces 1 to 6, the unfair die and a further model.]

Page 20: Perplexity of Index Models over Evolving Linked Data


Metrics: How to Interpret Perplexity

[Figure: two histograms over the die faces 1 to 6, the unfair die and a further model.]
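The die illustration can be worked through numerically. The probabilities below are invented, since the histograms are not recoverable from this transcript, and reading "Baseline" as a uniform model and "Minimal value" as a model identical to the unfair die is an interpretation of the slide sequence, not something the transcript states.

```python
import math


def cross_entropy(p, q):
    """Expected code length in bits when outcomes follow p but the code is built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def perplexity(p, q):
    return 2 ** cross_entropy(p, q)


# Hypothetical unfair die: face 1 comes up half of the time.
unfair = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

fair_model = [1 / 6] * 6                       # expects a fair die
exact_model = unfair                           # expects exactly the unfair die
wrong_model = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]   # expects the wrong face to dominate

pp_fair = perplexity(unfair, fair_model)    # about 6: as surprised as by a fair die
pp_exact = perplexity(unfair, exact_model)  # about 4.47 (sqrt(20)): smallest possible
pp_wrong = perplexity(unfair, wrong_model)  # larger than the fair-model value

print(pp_fair, pp_exact, pp_wrong)
```

The better the expectations match the real die, the lower the perplexity; expectations that are confidently wrong do worse than assuming a fair die.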

Page 21: Perplexity of Index Models over Evolving Linked Data


Stability of Index Models over Evolving Data

Page 22: Perplexity of Index Models over Evolving Linked Data


Comparing Indices

Once upon a time... ... some time later

Compared using: Jaccard, Perplexity

[Figure: the index (keys k1 ... kn with their data items) at the earlier time next to the index at the later time.]

We are using data from the Dynamic Linked Data Observatory: 77 weekly snapshots, 16M triples

Page 23: Perplexity of Index Models over Evolving Linked Data


Experimental Setup

Index construction / estimation of distributions for each snapshot: T0 (Base), T1, T2, T3, ..., Tn-1, Tn

Perplexity measures the "deviation" of T1, T2, T3, ..., Tn-1, Tn from T0
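The setup can be sketched end to end on toy data. Everything concrete here is an assumption made for illustration: the snapshots are invented lists of key occurrences, add-one smoothing stands in for the smoothing described in the paper, the base snapshot T0 is used as the model and each later snapshot as the observed distribution, and the result is normalised by the number of keys.

```python
import math
from collections import Counter


def key_distribution(keys, vocabulary, alpha=1.0):
    """Smoothed relative key frequencies (ASSUMPTION: add-alpha smoothing)."""
    counts = Counter(keys)
    total = len(keys) + alpha * len(vocabulary)
    return {k: (counts[k] + alpha) / total for k in vocabulary}


def perplexity(p, q):
    """2 to the power of the cross-entropy of observed p under model q."""
    return 2 ** -sum(p[k] * math.log2(q[k]) for k in p)


# Toy snapshots: one key occurrence per indexed data item.
snapshots = [
    ["a", "a", "b", "c"],       # T0 (base)
    ["a", "a", "b", "c"],       # T1: unchanged
    ["a", "b", "b", "c", "d"],  # T2: slight drift
    ["b", "d", "d", "d", "e"],  # T3: strong drift
]

vocabulary = set().union(*snapshots)
base = key_distribution(snapshots[0], vocabulary)

deviation = []
for later in snapshots[1:]:
    current = key_distribution(later, vocabulary)
    deviation.append(perplexity(current, base) / len(vocabulary))

for i, value in enumerate(deviation, start=1):
    print(f"T{i}: normalised perplexity {value:.3f}")
```

In this toy run the value grows as the key distribution drifts further from T0, which is the kind of curve the result slides plot per index model.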

Page 24: Perplexity of Index Models over Evolving Linked Data


Results: Jaccard Similarity of Key Set

Page 25: Perplexity of Index Models over Evolving Linked Data


Results: Normalised Perplexity

Page 26: Perplexity of Index Models over Evolving Linked Data


Results: Normalised Perplexity (Zoom in)

Page 27: Perplexity of Index Models over Evolving Linked Data


Conclusion

Summary

Evaluation of stability of 12 LOD index models
Application independent evaluation framework
Good stability of schema-level indices

Future Work

Index specific assessment of quality based on samples
Accuracy in answering queries

Page 28: Perplexity of Index Models over Evolving Linked Data


Thanks!

Contact: Thomas Gottron, WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau, [email protected], #eswc2014GottronTC