perplexity of index models over evolving linked data
DESCRIPTION
ESWC presentation on the stability of 12 different index models for linked data. Provides a formalisation of the index models as well as a stability evaluation based on data distributions and information-theoretic metrics.
TRANSCRIPT
Institute for Web Science & Technologies – WeST
Perplexity of Index Models over Evolving Linked Data
Thomas Gottron, Christian Gottron
May 27th, 2014, ESWC, Crete
#eswc2014GottronTC
Thomas Gottron ESWC 27.5.2014, 2Perplexity of Index Models Over Evolving LOD
Motivation
Index
Once upon a time... ... some time later
New index
???
Accuracy?
Index Models Over Linked Data
Data Format
Linked Data as N-Quads:
(s, p, o, c)
triple (s, p, o) – what is the information?
context URI c – where does it come from?
(Abstract) Index Models
D : Data elements to be retrieved (payload)
K : Key elements to access the data (index elements)
σ : Selection function: how to get the data for a key
[Figure: keys k1, k2, k3, ..., kn; a search data structure (efficient storage and retrieval) maps each key ki to its data items (payload) di,1, di,2, di,3, ...]
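A minimal sketch of the abstract model (D, K, σ), assuming an in-memory dictionary stands in for the search data structure. The class name `IndexModel` and the parameters `key_fn` and `payload_fn` are illustrative; they are not taken from the slides or from the linked repository.

```python
from collections import defaultdict
from typing import Callable, Hashable


class IndexModel:
    """Abstract index model: key set K, data elements D, selection function sigma."""

    def __init__(self, key_fn: Callable, payload_fn: Callable):
        # key_fn: maps one input record to its key element(s) (hypothetical helper)
        # payload_fn: maps one input record to the data element stored under the key
        self.key_fn = key_fn
        self.payload_fn = payload_fn
        self._store = defaultdict(set)  # stands in for the search data structure

    def add(self, record) -> None:
        for key in self.key_fn(record):
            self._store[key].add(self.payload_fn(record))

    def keys(self) -> set:
        """K: the key elements currently in the index."""
        return set(self._store)

    def sigma(self, key: Hashable) -> set:
        """Selection function: the data elements retrieved for a key."""
        return set(self._store.get(key, ()))


# Toy records of the form (key, data item), mirroring k1 -> d1,1 d1,2 ...
index = IndexModel(key_fn=lambda r: [r[0]], payload_fn=lambda r: r[1])
for record in [("k1", "d1,1"), ("k1", "d1,2"), ("k2", "d2,1")]:
    index.add(record)
```

Different index models would then differ only in what `key_fn` and `payload_fn` extract from the data.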
Concrete Example: Subject Based Index Model
Key -> data items (payload)
ukob:Gottron -> (ukob:Gottron, rdf:type, foaf:Person) (ukob:Gottron, foaf:knows, ukob:Staab) ...
ukob:Staab -> (ukob:Staab, swrc:institution, ukob:WeST) (ukob:Staab, foaf:name, "Steffen Staab") ...
ukob:Schegi -> (ukob:Schegi, rdf:type, foaf:Person) (ukob:Schegi, foaf:name, "Stefan Scheglmann")
...
tud:CGottron -> (tud:CGottron, swrc:institution, tud:KOM) (tud:CGottron, foaf:knows, ukob:Gottron) ...
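The subject based example can be sketched as follows, assuming the N-Quads have already been parsed into `(s, p, o, c)` tuples. The function name `build_subject_index` and the context URIs `ctx:ukob` and `ctx:tud` are made up for illustration.

```python
from collections import defaultdict

# N-Quads as (s, p, o, c) tuples; the context URIs are illustrative placeholders.
quads = [
    ("ukob:Gottron", "rdf:type", "foaf:Person", "ctx:ukob"),
    ("ukob:Gottron", "foaf:knows", "ukob:Staab", "ctx:ukob"),
    ("ukob:Staab", "swrc:institution", "ukob:WeST", "ctx:ukob"),
    ("tud:CGottron", "swrc:institution", "tud:KOM", "ctx:tud"),
    ("tud:CGottron", "foaf:knows", "ukob:Gottron", "ctx:tud"),
]


def build_subject_index(quads):
    """Key: subject URI. Payload: the triples that have this subject."""
    index = defaultdict(list)
    for s, p, o, _c in quads:
        index[s].append((s, p, o))
    return dict(index)


subject_index = build_subject_index(quads)
```

Looking up `subject_index["ukob:Gottron"]` returns the two triples stored for that subject.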
12 Implemented Index Models
Triple based: subject (s), predicate (p), object (o), term
Meta data: context (c), pay-level domain (PLD)
Schema-level: type, type set, property set, incoming property set (p-1), type and property set, SchemEX
https://github.com/gottron/lod-index-models
Index Accuracy over Evolving Data
Comparing Indices
Once upon a time... ... some time later
[Figure: the index built at the earlier time (keys k1, k2, k3, ..., kn with their data items) next to the index built some time later, with "???" between them]
Metrics
First indicator of interest: stability of the key element set
Jaccard similarity: relative size of the overlap of two sets, J(A, B) = |A ∩ B| / |A ∪ B|
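A short sketch of the Jaccard similarity applied to two key sets. The key names are toy values, and returning 1.0 for two empty sets is a convention chosen here, not something stated on the slide.

```python
def jaccard(keys_a: set, keys_b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = keys_a | keys_b
    if not union:
        return 1.0  # convention chosen here for two empty key sets
    return len(keys_a & keys_b) / len(union)


old_keys = {"k1", "k2", "k3", "k4"}
new_keys = {"k2", "k3", "k4", "k5", "k6"}
similarity = jaccard(old_keys, new_keys)  # 3 shared keys out of 6 distinct keys
```

A value of 1 means the key set did not change at all; values near 0 mean that almost no keys survived.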
How to Measure Accuracy?
Queries? (SPARQL)
No established query log for the data set used
Different key elements require different queries
Cover all of the index
Distributions!
Relevant to several applications
Established metrics for comparison
Obtaining a Distribution from an Index
[Figure: index with keys k1, k2, k3, ..., kn and the data items selected by each key]
[Figure: number of data items per key, e.g. k1: 4, k2: 2, k3: 10, ..., kn: 8]
Relative frequencies + smoothing (see paper)
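The step from an index to a distribution might look like the sketch below. It assumes additive (Laplace-style) smoothing with a pseudo-count `alpha`; the slide only points to the paper for the smoothing actually used, so this choice, like the function name `key_distribution`, is an assumption.

```python
def key_distribution(index: dict, all_keys: set, alpha: float = 1.0) -> dict:
    """Relative frequency of data items per key, with additive smoothing.

    alpha is a pseudo-count added for every key in all_keys, so that keys
    missing from this index still get a non-zero probability (an assumed
    smoothing scheme; the paper's method may differ).
    """
    counts = {k: len(index.get(k, ())) + alpha for k in all_keys}
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


# Counts as on the slide: 4, 2, 10 and 8 data items per key.
index = {"k1": ["d"] * 4, "k2": ["d"] * 2, "k3": ["d"] * 10, "kn": ["d"] * 8}
p = key_distribution(index, all_keys=set(index), alpha=0.0)        # plain relative frequencies
q = key_distribution(index, all_keys=set(index) | {"k_new"}, alpha=1.0)  # smoothed
```

With smoothing, a key such as `k_new` that only appears in another snapshot still receives a small probability, which keeps the logarithms in the later metrics finite.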
Comparing Indices
Once upon a time... ... some time later
???
Comparing Distributions
Information theoretic measures
???
Entropy of P: expected length (in bits) of an optimal encoding of a (randomly chosen) key
Metrics
Cross-entropy of P and Q: expected length when the encoding is based on a different distribution
Perplexity: how many uniformly distributed keys would have the same entropy
Normalized perplexity: perplexity relative to a uniform distribution over the keys
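The formulas did not survive in this transcript. Written out under the standard definitions, with P and Q distributions over the key set K, they would read as below. The last line is an assumption: it reads "relative to a uniform distribution over the keys" as division by |K|, the perplexity of a uniform distribution over K.

```latex
\begin{align*}
H(P)     &= -\sum_{k \in K} P(k)\,\log_2 P(k)  && \text{entropy} \\
H(P, Q)  &= -\sum_{k \in K} P(k)\,\log_2 Q(k)  && \text{cross-entropy} \\
PP(P, Q) &= 2^{H(P, Q)}                        && \text{perplexity} \\
PP_{\mathrm{norm}}(P, Q) &= \frac{PP(P, Q)}{|K|} && \text{normalized perplexity (assumed form)}
\end{align*}
```

For Q = P the cross-entropy reduces to the entropy H(P), which is its minimum over all Q.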
Metrics: How to Interpret Perplexity
Perplexity based on cross entropy: "How surprised are you about the outcome of an experiment, given you have some expectations?"
[Figure: bar charts over the die outcomes 1 to 6 for the baseline model and the unfair die]
[Figure: bar charts over the die outcomes 1 to 6 for the model and the unfair die, labelled "Minimal value"]
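The die illustration can be worked through numerically. The probabilities of the unfair die and of the third model below are made up for this sketch, since the charts are not part of the transcript; the function names are illustrative as well.

```python
import math


def cross_entropy(p: dict, q: dict) -> float:
    """Expected code length in bits when outcomes follow p but the code is built for q."""
    return -sum(p[x] * math.log2(q[x]) for x in p if p[x] > 0)


def perplexity(p: dict, q: dict) -> float:
    return 2 ** cross_entropy(p, q)


faces = range(1, 7)
unfair = {1: 0.5, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1}  # made-up unfair die
fair = {x: 1 / 6 for x in faces}                            # baseline model: uniform
wrong = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}  # confident, but wrong

pp_baseline = perplexity(unfair, fair)    # uniform expectations: as surprised as by 6 equal faces
pp_minimal = perplexity(unfair, unfair)   # expectations match the die: minimal value
pp_wrong = perplexity(unfair, wrong)      # worse than having no particular expectations
```

The ordering `pp_minimal < pp_baseline < pp_wrong` is the point of the illustration: expectations that match the real distribution minimise the surprise, while strongly wrong expectations are worse than the uniform baseline.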
Stability of Index Models over Evolving Data
Comparing Indices
Once upon a time... ... some time later
Jaccard, Perplexity
[Figure: the index built at the earlier time next to the index built some time later, each with keys k1, k2, k3, ..., kn and their data items]
We are using data from the Dynamic Linked Data Observatory: 77 weekly snapshots, 16M triples
Experimental Setup
Index construction / estimation of distributions for each snapshot: T0 (base), T1, T2, T3, ..., Tn-1, Tn
Perplexity: "deviation" of T1, T2, T3, ..., Tn-1, Tn from T0
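The setup can be sketched as a loop over snapshot indices. Several choices here are assumptions rather than details from the slides: additive smoothing, normalisation by the number of keys, and treating the later snapshot as the observed distribution with the base index T0 as the expectation. The indices are toy data, not snapshots from the Dynamic Linked Data Observatory.

```python
import math


def distribution(index: dict, keys: set, alpha: float = 1.0) -> dict:
    """Smoothed relative frequencies of data items per key (additive smoothing assumed)."""
    counts = {k: len(index.get(k, ())) + alpha for k in keys}
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def normalised_perplexity(p: dict, q: dict) -> float:
    """2^H(p, q) divided by the number of keys (assumed form of the normalisation)."""
    h = -sum(p[k] * math.log2(q[k]) for k in p)
    return 2 ** h / len(p)


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


def compare_to_base(snapshots: list) -> list:
    """Compare each later snapshot index T1..Tn with the base index T0."""
    base = snapshots[0]
    results = []
    for later in snapshots[1:]:
        keys = set(base) | set(later)   # shared key space, so smoothing covers new keys
        p = distribution(later, keys)   # observed distribution at the later time
        q = distribution(base, keys)    # expectation derived from the base index
        results.append((jaccard(set(base), set(later)), normalised_perplexity(p, q)))
    return results


# Toy indices (key -> data items).
t0 = {"k1": [1, 2, 3, 4], "k2": [1, 2]}
t1 = {"k1": [1, 2, 3, 4], "k2": [1, 2]}
t2 = {"k1": [1], "k2": [1, 2], "k3": [1, 2, 3, 4, 5]}
results = compare_to_base([t0, t1, t2])  # one (Jaccard, normalised perplexity) pair per later snapshot
```

In this toy run the unchanged snapshot `t1` keeps a Jaccard similarity of 1, while `t2`, which adds a key and shifts the counts, scores a lower Jaccard similarity and a higher normalised perplexity.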
Results: Jaccard Similarity of Key Set
Results: Normalised Perplexity
Results: Normalised Perplexity (Zoom in)
Conclusion
Summary
Evaluation of the stability of 12 LOD index models
Application-independent evaluation framework
Good stability of schema-level indices
Future Work
Index-specific assessment of quality based on samples
Accuracy in answering queries
Thanks!
Contact:
Thomas Gottron
WeST – Institute for Web Science and Technologies
Universität Koblenz-Landau
[email protected]
#eswc2014GottronTC