Making Use of the Linked Data Cloud: The Role of Index Structures

Post on 05-Sep-2014


DESCRIPTION

The rapid growth of the Linked Open Data (LOD) cloud has spawned a web of data in which a multitude of data sources provide huge amounts of valuable information across different domains. Today, when accessing and using Linked Data, the challenging question is increasingly not whether relevant data is available, but where it can be found and how it is structured. Index structures therefore play an important role in making use of the information in the LOD cloud. In this talk I will address three aspects of Linked Data index structures: (1) a high-level view and categorization of index structures and how they can be queried and explored, (2) approaches for building index structures and the need to maintain them, and (3) some example applications that greatly benefit from indices over Linked Data.

TRANSCRIPT

Institute for Web Science & Technologies – WeST

Making Use of the Linked Data Cloud:

The Role of Index Structures

Thomas Gottron

March 20th, 2014 FGDB Frühjahrstreffen

Thomas Gottron FGDB Frühjahrstreffen 20.3.2014, 2 Role of Index Structures on LOD

Making Use of the Linked Data Cloud ...

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

LOD: a rich, huge, diverse, public and distributed knowledge base on the Web.

Pros: rich, diverse, public
Cons: huge, diverse, distributed

Shall I?

Challenges Underlying the "Cons"

•  Volume (huge)
•  Semi-structured
•  No schema
•  No central access point
•  Multitude of data sources
•  Quality
•  Dynamics
•  Availability


20 years ago ...

Making Use of the World Wide Web ... Shall I?

Source: Chris 73 / Wikimedia Commons

A rich, huge, diverse, public and distributed document collection on the Internet.

Pros: rich, diverse, public
Cons: huge, diverse, distributed

Technical solutions to the problems

Making Use of the Linked Data Cloud ... Shall I?

Pros: rich, diverse, public
Cons: huge, diverse, distributed

Index structures provide: solutions for the storage, management, organization of, and access to a rich, huge, diverse, distributed knowledge base on the Web.

Types of Indices: search data structures for efficient storage and retrieval (diagram: keys k1, k2, k3, ..., kn, each pointing to data items d1,1 d1,2 d1,3 ...)

Building Indices (diagram: quads are aggregated per subject, then inverted into index entries)

Using Indices (diagram: example graph with entities E1 and E2, rdf:type foaf:Document and swrc:InProceedings, dc:creator, and dc:title "Bad News ...")


Data Format

•  Linked Data as N-Quads: (s, p, o, c)
    -  triple (s, p, o): what is the information?
    -  context URI c: where does it come from?
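As a rough illustration of the quad format, the following sketch splits one N-Quads line into subject, predicate, object and context. The `Quad` type, the `parse_nquad` helper and the example IRIs are illustrative assumptions, and the regular expression covers only simple IRIs, blank nodes and literals, not the full N-Quads grammar.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One statement: the triple (s, p, o) plus the context c it came from."""
    s: str
    p: str
    o: str
    c: Optional[str]  # context URI; None when the line carries a plain triple


# Simplified term pattern (an assumption, not the full N-Quads grammar):
# <IRI>, _:blankNode, or "literal" with an optional @lang tag or ^^<datatype>.
_TERM = re.compile(r'<[^>]*>|_:\w+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?')


def parse_nquad(line: str) -> Quad:
    """Split a single N-Quads line into its three or four terms."""
    terms = _TERM.findall(line)
    if len(terms) not in (3, 4):
        raise ValueError(f"expected 3 or 4 terms, got {len(terms)}: {line!r}")
    context = terms[3] if len(terms) == 4 else None
    return Quad(terms[0], terms[1], terms[2], context)


# Hypothetical example line; the IRIs are made up for illustration.
line = ('<http://example.org/Gottron> <http://xmlns.com/foaf/0.1/knows> '
        '<http://example.org/Staab> <http://example.org/people.rdf> .')
quad = parse_nquad(line)
print(quad.s, quad.p, quad.o)  # the triple: what is the information?
print(quad.c)                  # the context: where does it come from?
```

For real data, a dedicated RDF parser is the safer choice; the point here is only that the fourth term carries the provenance that several of the index models use as key or payload.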

Index Models

(Abstract) Index Models

•  D: data elements to be retrieved (payload)
•  K: key elements to access the data (index elements)
•  σ: selection function, how to get data for a key: σ : K → ℘(D)

Search data structure for efficient storage and retrieval (diagram: each key k1, k2, k3, ..., kn selects a set of data items, e.g. k1 → d1,1 d1,2 d1,3 ..., k2 → d2,1 d2,2, kn → dn,1 dn,2 dn,3 ...)

Concrete Example: Subject Based Index Model

Key (subject) → payload (triples):

•  ukob:Gottron → (ukob:Gottron, rdf:type, foaf:Person) (ukob:Gottron, foaf:knows, ukob:Staab) ...
•  ukob:Staab → (ukob:Staab, swrc:institution, ukob:WeST) (ukob:Staab, foaf:name, "Steffen Staab") ...
•  ukob:Schegi → (ukob:Schegi, rdf:type, foaf:Person) (ukob:Schegi, foaf:name, "Stefan Scheglmann")
•  ...
•  tud:CGottron → (tud:CGottron, swrc:institution, tud:KOM) (tud:CGottron, foaf:knows, ukob:Gottron) ...
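The abstract model (data elements D, key elements K, selection function σ) and its subject-based instance could be sketched roughly as follows. The names `build_index` and `select` are hypothetical helpers chosen for this illustration, and the triples are a subset of the example data shown on the slide.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_index(data: Iterable[Triple],
                key_of: Callable[[Triple], Hashable]) -> Dict[Hashable, Set[Triple]]:
    """Group the data elements D under the key elements K computed by key_of."""
    index: Dict[Hashable, Set[Triple]] = defaultdict(set)
    for item in data:
        index[key_of(item)].add(item)
    return dict(index)


def select(index: Dict[Hashable, Set[Triple]], key: Hashable) -> Set[Triple]:
    """Selection function sigma: K -> P(D); unknown keys yield the empty set."""
    return index.get(key, set())


triples = [
    ("ukob:Gottron", "rdf:type", "foaf:Person"),
    ("ukob:Gottron", "foaf:knows", "ukob:Staab"),
    ("ukob:Staab", "swrc:institution", "ukob:WeST"),
    ("ukob:Staab", "foaf:name", "Steffen Staab"),
    ("tud:CGottron", "foaf:knows", "ukob:Gottron"),
]

# Subject-based index model: the key of a triple is its subject.
subject_index = build_index(triples, key_of=lambda t: t[0])
for triple in sorted(select(subject_index, "ukob:Gottron")):
    print(triple)
```

Swapping the key function (for example `lambda t: t[1]` for a predicate-based index) gives a different index model over the same payload, which is the sense in which the model is abstract.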

Schema-level Indices

Schema Information on the LOD Cloud

•  (No) Schema?
•  Guidelines / best practices, automatic tools, social effects
•  Emerging Schema! Induce from data observations

Examples for Schema Information

•  Property Set: an entity x with properties p1, p2, p3 is indexed under the key {p1, p2, p3}, which maps to x, ...
•  Type Set: an entity y with rdf:type cA and rdf:type cB is indexed under the key {cA, cB}, which maps to y, ...
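A minimal sketch of how type set and property set keys might be derived from triples is given below. The function name `schema_level_indices` and the sample data are assumptions made for illustration; whether `rdf:type` itself counts as a property is also an assumption here (it is excluded).

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def schema_level_indices(triples: Iterable[Triple]):
    """Return (type_set_index, property_set_index), each mapping a key set to entities."""
    types: Dict[str, Set[str]] = defaultdict(set)
    props: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)   # classes the entity is an instance of
        else:
            props[s].add(p)   # assumption: rdf:type is not counted as a property

    def invert(per_entity: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
        index: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
        for entity, key_elements in per_entity.items():
            index[frozenset(key_elements)].add(entity)
        return dict(index)

    return invert(types), invert(props)


# Hypothetical sample data mirroring the x / y entities of the slide.
sample = [
    ("x", "p1", "o1"), ("x", "p2", "o2"), ("x", "p3", "o3"),
    ("y", RDF_TYPE, "cA"), ("y", RDF_TYPE, "cB"), ("y", "p1", "o4"),
    ("z", RDF_TYPE, "cB"), ("z", RDF_TYPE, "cA"),
]
ts_index, ps_index = schema_level_indices(sample)
print(ts_index[frozenset({"cA", "cB"})])
print(ps_index[frozenset({"p1", "p2", "p3"})])
```

Using `frozenset` as the key makes the order in which types or properties are encountered irrelevant, so y and z end up under the same type set key.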

Indexing "Styles" for the Payload

Each style is drawn as a split between what is held locally and what stays on the Web:

•  Full Caching: (s, p, o, c)
•  Triples: (s, p, o)
•  Entities: s
•  Data Sources: c

Schema-based Access to the LOD cloud

Query pattern (diagram): an unknown entity of type foaf:Document and swrc:InProceedings whose dc:creator x is of type fb:Computer_Scientist.

SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}

Schema-based Access to the LOD cloud

Schema-level Index

Where?
•  ACM
•  DBLP

SELECT ?x
WHERE {
  ?x rdf:type foaf:Document .
  ?x rdf:type swrc:InProceedings .
  ?x dc:creator ?y .
  ?y rdf:type fb:Computer_Scientist
}
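To make the "Where?" step more concrete, the sketch below shows one way a schema-level index could answer which data sources hold entities matching the type constraints of such a query. It is an illustrative simplification: only the `rdf:type` patterns are handled (not the `dc:creator` join), and the entity names and `ctx:` context identifiers are invented placeholders.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical quads (s, p, o, c); all names and contexts are made up.
quads = [
    ("acm:paper1", RDF_TYPE, "foaf:Document", "ctx:acm"),
    ("acm:paper1", RDF_TYPE, "swrc:InProceedings", "ctx:acm"),
    ("dblp:paper7", RDF_TYPE, "foaf:Document", "ctx:dblp"),
    ("dblp:paper7", RDF_TYPE, "swrc:InProceedings", "ctx:dblp"),
    ("blog:post3", RDF_TYPE, "foaf:Document", "ctx:blog"),
]


def build_source_index(quads):
    """Map each observed type set to the data sources (contexts) that use it."""
    types, sources = defaultdict(set), defaultdict(set)
    for s, p, o, c in quads:
        if p == RDF_TYPE:
            types[s].add(o)
            sources[s].add(c)
    index = defaultdict(set)
    for entity, type_set in types.items():
        index[frozenset(type_set)] |= sources[entity]
    return dict(index)


def find_sources(index, required_types):
    """Return sources with entities whose type set contains all required types."""
    required = frozenset(required_types)
    hits = set()
    for type_set, srcs in index.items():
        if required <= type_set:
            hits |= srcs
    return hits


source_index = build_source_index(quads)
print(sorted(find_sources(source_index, {"foaf:Document", "swrc:InProceedings"})))
```

Here the payload of the index is the data source rather than the entity, which corresponds to the "Data Sources" indexing style: the index only says where to look, and the actual data stays on the Web.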

Building Indices

Index Construction

Building Indices: Operators

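•  Combination of a few simple operations
    -  Aggregate, Join, Invert

•  Example: Property Set index

Input quads (s, p, o, c):
(s1, p1, o1, c1)
(s1, p2, o1, c1)
(s2, p2, o2, c1)
(s3, p1, o3, c1)
(s3, p2, o4, c1)
(s4, p3, o1, c1)

Aggregate (properties per subject):
s1 → {p1, p2}
s2 → {p2}
s3 → {p1, p2}
s4 → {p3}

Invert (subjects per property set):
{p1, p2} → s1, s3
{p2} → s2
{p3} → s4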
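The Aggregate and Invert steps of the property set example can be reproduced with a few lines of Python. This is a sketch under the assumption that both operators work on simple in-memory mappings; the function names `aggregate` and `invert` are chosen here for illustration, and the Join operator is not shown.

```python
from collections import defaultdict

# The six quads (s, p, o, c) from the example.
quads = [
    ("s1", "p1", "o1", "c1"),
    ("s1", "p2", "o1", "c1"),
    ("s2", "p2", "o2", "c1"),
    ("s3", "p1", "o3", "c1"),
    ("s3", "p2", "o4", "c1"),
    ("s4", "p3", "o1", "c1"),
]


def aggregate(pairs):
    """Collect all values seen per key: (k, v) pairs -> {k: frozenset of v}."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)
    return {key: frozenset(values) for key, values in groups.items()}


def invert(mapping):
    """Swap keys and values: {k: v} -> {v: set of all k mapped to v}."""
    inverted = defaultdict(set)
    for key, value in mapping.items():
        inverted[value].add(key)
    return dict(inverted)


# Aggregate: subject -> set of its properties.
property_sets = aggregate((s, p) for s, p, o, c in quads)
# Invert: property set -> subjects sharing exactly that set.
ps_index = invert(property_sets)

for key, subjects in ps_index.items():
    print(sorted(key), "->", sorted(subjects))
```

The aggregated values are `frozenset` objects so that they are hashable and can serve directly as index keys after inversion.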

12 Implemented Index Models

•  Triple based
    -  Subject → Triple
    -  Predicate → Triple
    -  Object → Triple
•  Meta data
    -  Keywords → Triple
    -  Context → Triple
    -  PLD → Triple
•  Schema-level
    -  RDF Type → Entity
    -  Type set (TS) → Entity
    -  Property set (PS) → Entity
    -  Incoming property set (IPS) → Entity
    -  Type and properties (ECS) → Entity
    -  SchemEX → Entity

https://github.com/gottron/lod-index-models

Indices over Evolving Data

Index Maintenance

(Figure: snapshots for 2007, 2008, 2009, 2010 and 2011.)

Not just growth, but also deletion and modification of data

How to Measure Accuracy?

•  Queries (SPARQL)?
    -  No established query log for data set
    -  Different key elements require different queries
    -  Cover all of the index
•  Distributions!
    -  Relevant to several applications
    -  Established metrics for comparison

Quantifying Divergence of Index Accuracy over Time

(Figure: index construction / estimation of distributions for each data snapshot T0 (base), T1, T2, T3, ..., Tn-1, Tn; the "deviation" of each of T1, T2, T3, ..., Tn-1, Tn from the base T0 is then measured.)
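One plausible way to turn this idea into numbers is sketched below: estimate a distribution over index keys for the base snapshot, then compute a cross-entropy-based perplexity of a later snapshot against it. This is an assumption-laden illustration, not the normalisation used in the talk; the function names, the crude `epsilon` smoothing for unseen keys and the toy indices are all hypothetical.

```python
import math
from collections import Counter


def key_distribution(index):
    """Relative frequency of each key, weighted by its number of data items."""
    counts = Counter({key: len(items) for key, items in index.items()})
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def cross_perplexity(observed, estimated, epsilon=1e-6):
    """2 ** H(observed, estimated); keys unseen in the estimate get epsilon.

    The epsilon fallback is a crude smoothing assumption and leaves the
    estimated distribution slightly unnormalised.
    """
    cross_entropy = -sum(
        p * math.log2(estimated.get(key, epsilon))
        for key, p in observed.items() if p > 0
    )
    return 2 ** cross_entropy


# Toy indices (key -> data items) for a base snapshot T0 and a later snapshot T1.
index_t0 = {"a": {"d1", "d2", "d3"}, "b": {"d4"}}
index_t1 = {"a": {"d1"}, "b": {"d4", "d5", "d6"}}

base = key_distribution(index_t0)
later = key_distribution(index_t1)

print(cross_perplexity(base, base))   # base snapshot against its own estimate
print(cross_perplexity(later, base))  # later snapshot against the stale estimate
```

The second value is larger than the first because the distribution estimated at T0 no longer describes the data at T1 well; tracking that gap over successive snapshots gives a deviation curve of the kind plotted in the talk.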

Evolving Data: Normalised Perplexity

(Figure: four panels plotting normalised perplexity, y-axis from 0.0 to 1.0, against the week of the data snapshot, x-axis from 0 to 70. Panels: Subject, Predicate, Object; Context, Keywords, PLD; RDF Type, TS, PS, IPS; ECS, SchemEX.)

Evolving Data: Normalised Perplexity (Zoom in)

(Figure: four panels plotting normalised perplexity, y-axis restricted to 0.00 to 0.10, against the week of the data snapshot, x-axis from 0 to 70. Panels: Subject, Predicate, Object; Context, Keywords, PLD; RDF Type, TS, PS, IPS; ECS, SchemEX.)

Using Indices

Programming Support

LITEQ and NPQL

•  Support programming with Linked Data sources
•  NPQL (Node Path Query Language)
    -  Intensional queries → class descriptions, properties
    -  Extensional queries → instance data
•  LITEQ
    -  Implementation of NPQL (F# type provider)
    -  Autocompletion

•  RDF type and property navigation (intension)

Subtype navigation, with autocompletion suggestions:

dC.``http://example.org/ns#creature``
  .SubTypeNavigation.``
    (suggestions: ``http://example.org/ns#dog``, ``http://example.org/ns#cat``, ``http://example.org/ns#person``, ...)

After selecting a subtype:

dC.``http://example.org/ns#creature``
  .SubTypeNavigation.``http://example.org/ns#dog``

Property navigation, with autocompletion suggestions:

dC.``http://example.org/ns#creature``
  .SubTypeNavigation.``http://example.org/ns#dog``
  .PropNavigation.``
    (suggestions: ``http://example.org/ns#hasOwner``, ``http://example.org/ns#hasName``, ``http://example.org/ns#taxNumber``, ...)

After selecting a property:

dC.``http://example.org/ns#creature``
  .SubTypeNavigation.``http://example.org/ns#dog``
  .PropNavigation.``http://example.org/ns#hasOwner``

•  Accessing instances (extension)

let allDogs =
  dC.``http://example.org/ns#creature``
    .SubTypeNavigation.``http://example.org/ns#dog``
    .Extension

•  Accessing individuals

let bello =
  dC.``http://example.org/ns#creature``
    .SubTypeNavigation.``http://example.org/ns#dog``
    .Individuals.``http://example.org/ns#bello``
    .getRdfObject

bello.get_hasName()
bello.get_taxNumber()


Exploring Entity Descriptions


Schema-level Search of Relevant Data Sources


Searching for a Suitable Description

SELECT ?x WHERE { ?x rdf:type foaf:Document }

SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type foaf:PersonalProfileDocument }

SELECT ?x WHERE { ?x rdf:type foaf:Document . ?x rdf:type sioc:Post . }

Did you mean ...

Related Queries ...

So far: gentle, iterative modification

Parallel Indices Over the Data

(Diagram: two indices over the same data. One has keys ts1, ts2, ts3, ..., tsn pointing to data items d1,1 d1,2 d1,3 ...; d2,1 d2,2; d3,1 d3,2 d3,3 ...; dn,1 dn,2 dn,3 .... The other has keys psA, psB, psC, ..., psM pointing to data items dA,1 dA,2 dA,3 ...; dB,1 dB,2; dC,1; dM,1 dM,2 dM,3 ....)


General Idea for Mapping

(Diagram: a description such as c1, c2 selects an entity set through one index (keys ts1 ... tsn). From that entity set an alternative description such as p3, p4, p5 is derived through the parallel index (keys psA ... psM). The alternative description selects an approximate entity set, so it is an approximate description of the original entity set.)
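The mapping idea can be approximated with a very simple ranking: look up the entity set for a key in one index, then score every key of the parallel index by how well its entity set overlaps. The sketch below swaps in plain Jaccard similarity for whatever derivation the talk actually uses, and all index contents and function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def alternative_descriptions(entity_set, other_index):
    """Rank keys of a parallel index by how well their entities match entity_set."""
    scored = [(jaccard(entity_set, entities), key)
              for key, entities in other_index.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical parallel indices over the same entities e1 ... e5.
type_set_index = {
    frozenset({"c1", "c2"}): {"e1", "e2", "e3"},
    frozenset({"c3"}): {"e4"},
}
property_set_index = {
    frozenset({"p3", "p4", "p5"}): {"e1", "e2", "e3", "e5"},
    frozenset({"p1"}): {"e4"},
    frozenset({"p2"}): {"e2"},
}

target = type_set_index[frozenset({"c1", "c2"})]
best_score, best_key = alternative_descriptions(target, property_set_index)[0]
print(sorted(best_key), best_score)
```

In this toy data the property set {p3, p4, p5} is the closest alternative description of the entities typed {c1, c2}; it also covers e5, which is why the resulting entity set is only approximate.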


Summary

Pros: rich, diverse, public
Cons: huge, diverse, distributed

Index structures (keys k1, k2, k3, ..., kn pointing to data items d1,1 d1,2 d1,3 ...): technical solutions to some of the problems


Thank you!

Contact: Thomas Gottron, WeST – Institute for Web Science and Technologies, Universität Koblenz-Landau, gottron@uni-koblenz.de


Sources

•  Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/, This work is available under a CC-BY-SA license.

•  WorldWideWeb Around Wikipedia – Wikipedia as part of the world wide web, This Wikipedia and Wikimedia Commons image is from the user Chris 73 and is freely available at //commons.wikimedia.org/wiki/File:WorldWideWebAroundWikipedia.png under the creative commons CC-BY-SA 3.0 license.
