rdf analytics: lenses over semantic graphs

May 14, 2014. RDF Analytics: Lenses over Semantic Graphs. Dario Colazzo (3,1), François Goasdoué (4,1), Ioana Manolescu (1,2), Alexandra Roatiş (2,1). Affiliations: 1 OAK – Inria, France; 2 LRI – Université Paris-Sud, France; 3 LAMSADE – Université Paris Dauphine, France; 4 PILGRIM – Université Rennes 1, France

Page 1: RDF Analytics: Lenses over Semantic Graphs

May 14, 2014

RDF Analytics: Lenses over Semantic Graphs

Dario Colazzo 3,1, François Goasdoué 4,1, Ioana Manolescu 1,2, Alexandra Roatiş 2,1

1 OAK – Inria, France
2 LRI – Université Paris-Sud, France
3 LAMSADE – Université Paris Dauphine, France
4 PILGRIM – Université Rennes 1, France

Page 2: RDF Analytics: Lenses over Semantic Graphs

RDF Analytics: Lenses over Semantic Graphs May 14, 2014 – 2

RDF data warehousing scenario

Alice, a software engineer, worksFor an IT company that builds user applications over open RDF data (Grenoble).

- DS: Restaurants
  App: clickable map of #restaurants by region & average rating, type of cuisine
  Action: build RDW, a relational data warehouse; extract tabular data (SPARQL queries)
  Problem (i): heterogeneous data
- DS2: Shops, DS3: Museums, with RDW2 and RDW3
  Action: merge
  Problem (ii): new central concepts
- Bug: landmarks ≠ museums
  Action: find, redesign
  Problem (iii): other missing relationships?
- Feature: query relationships between regions and famous people
  Action: add
  Problem (iv): query schema
- Feature: new type of aggregation: for each landmark, show how many restaurants are nearby
  Action: add
  Problem (v): impossible! (separate star schemas; restaurants and landmarks are both central entities)

Page 12: RDF Analytics: Lenses over Semantic Graphs

RDF data warehousing

Application needs:
(i) support of heterogeneous data
(ii) multiple central concepts
(iii) support for RDF semantics when querying
(iv) possibility to query the relationships between entities (the schema)
(v) flexible choice of aggregation dimensions

This work:
- redesign the core data analytics concepts and tools for RDF
- formal framework for warehouse-style analytics on RDF data, suited to heterogeneous, semantic-rich corpora of Linked Data

Page 13: RDF Analytics: Lenses over Semantic Graphs

Summary

1. RDF Graphs & BGP Queries
2. RDF Graph Analysis
3. On-Line Analytical Processing
4. Empirical Evaluation
5. Sum Up

Page 14: RDF Analytics: Lenses over Semantic Graphs

RDF Graphs & BGP Queries (recall)

Page 15: RDF Analytics: Lenses over Semantic Graphs

The Resource Description Framework (RDF)

RDF graph: a set of triples

Assertion   Triple          Relational notation
Class       s rdf:type o    o(s)
Property    s p o           p(s, o)

[Figure: example RDF graph. Node labels: user1, user2, Bill, 28, Madrid, Student, _:b1, blog1. Edge labels: worksWith, hasName, hasAge, inCity, rdf:type, wrote, inBlog. Legend: resource (URI), blank node, literal (string), property.]
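As a rough illustration of the triple and relational notations in the table, the sketch below stores a graph as a Python set of (subject, property, object) tuples. The specific triples are assumptions: the figure's edge endpoints do not survive in this transcript, so only the labels are taken from the slide. The helper `relational` is a hypothetical name.

```python
RDF_TYPE = "rdf:type"

# Hypothetical triples built from the labels of the slide's figure;
# which subject carries which property is an assumption.
graph = {
    ("user1", RDF_TYPE, "Student"),
    ("user1", "hasName", "Bill"),
    ("user1", "hasAge", "28"),
    ("user1", "worksWith", "user2"),
}


def relational(triple):
    """Render one triple in the relational notation of the table."""
    s, p, o = triple
    if p == RDF_TYPE:
        return f"{o}({s})"       # class assertion: o(s)
    return f"{p}({s}, {o})"      # property assertion: p(s, o)


facts = sorted(relational(t) for t in graph)
for fact in facts:
    print(fact)
```

A set is used because an RDF graph is defined as a set of triples, so duplicates collapse automatically.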

Page 16: RDF Analytics: Lenses over Semantic Graphs

RDF Schema (RDFS)

RDFS declares semantic constraints between classes and properties.

Constraint      Triple                    Relational notation
Subclass        s rdfs:subClassOf o       s ⊆ o
Subproperty     s rdfs:subPropertyOf o    s ⊆ o
Domain typing   s rdfs:domain o           Π_domain(s) ⊆ o
Range typing    s rdfs:range o            Π_range(s) ⊆ o

[Figure: example constraints over the nodes Person, Student, knows and worksWith, with edges labeled rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain and rdfs:range.]

Page 17: RDF Analytics: Lenses over Semantic Graphs

Open-world assumption and RDF entailment

The RDF data model is based on the open-world assumption.
→ constraints are deductive: they implicitly propagate tuples

Entailment is the reasoning mechanism:
set of explicit triples + some entailment rules → derive implicit triples

Exhaustive application of entailment → saturation (closure)

The semantics of an RDF graph is its saturation.

[Figure: user1 rdf:type Student and Student rdfs:subClassOf Person; a second rdf:type edge shows the implicit triple user1 rdf:type Person.]
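Saturation can be sketched as a fixpoint computation. The code below is a minimal illustration, assuming only four instance-level RDFS rules (subclass, subproperty, domain, range); it is not the full RDFS rule set and not necessarily how the authors' system computes the closure. The function name `saturate` is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def saturate(graph):
    """Apply a small subset of RDFS entailment rules until nothing new appears."""
    closure = set(graph)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # s rdf:type C, C rdfs:subClassOf D  =>  s rdf:type D
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # s p o, p rdfs:subPropertyOf q  =>  s q o
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # s p o, p rdfs:domain C  =>  s rdf:type C
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))
                # s p o, p rdfs:range C  =>  o rdf:type C
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


# The two explicit triples of the slide's example.
graph = {
    ("user1", RDF_TYPE, "Student"),
    ("Student", SUBCLASS, "Person"),
}
saturation = saturate(graph)
implicit = saturation - graph
print(implicit)
```

The nested loop is quadratic per round, which is fine for a toy graph; a real system would index triples by property.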

Page 18: RDF Analytics: Lenses over Semantic Graphs

Basic Graph Pattern (BGP) queries

→ subset of SPARQL; a BGP is a conjunction of triple patterns

q(y) :- x rdf:type Person, x hasName y

query evaluation ≠ query answering
- the evaluation of a query only uses the graph's explicit triples
- the (complete) answer set is obtained by evaluating q against the graph's saturation

[Figure: user1 rdf:type Student, Student rdfs:subClassOf Person, user1 hasName Bill; the implicit triple user1 rdf:type Person is what makes Bill an answer to q.]
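The difference between evaluation and answering can be made concrete with a small pattern matcher. This is a sketch under stated assumptions: variables are written with a leading "?", the helper names `evaluate` and `saturate_subclass` are hypothetical, and only the subclass rule is applied when saturating.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def evaluate(head, body, triples):
    """Evaluate a BGP query against exactly the given triples."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                # A variable must bind consistently; a constant must match.
                if all(
                    candidate.setdefault(term, value) == value
                    if term.startswith("?") else term == value
                    for term, value in zip(pattern, triple)
                ):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


def saturate_subclass(triples):
    """Saturate with the subclass rule only (enough for this example)."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closure):
            for (c, p2, d) in list(closure):
                derived = (s, RDF_TYPE, d)
                if p == RDF_TYPE and p2 == SUBCLASS and c == o and derived not in closure:
                    closure.add(derived)
                    changed = True
    return closure


graph = {
    ("user1", RDF_TYPE, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("user1", "hasName", "Bill"),
}

# q(y) :- x rdf:type Person, x hasName y
head = ("?y",)
body = [("?x", RDF_TYPE, "Person"), ("?x", "hasName", "?y")]

evaluation = evaluate(head, body, graph)                  # explicit triples only
answers = evaluate(head, body, saturate_subclass(graph))  # against the saturation
print(evaluation, answers)
```

Evaluation over the explicit triples returns nothing, because user1 is only explicitly a Student; answering over the saturation returns Bill.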

Page 19: RDF Analytics: Lenses over Semantic Graphs

RDF Graph Analysis: formal framework for warehousing RDF data

Page 20: RDF Analytics: Lenses over Semantic Graphs

Analytical schema (AnS) and instance (I)

RDF graph:
[Figure: RDF graph over the nodes Person, user1, user2, Bill, post1, post2, blog1 and "Code Blog", with edges labeled rdf:type, hasName, wrote and inBlog.]

Analytical schema: → labeled directed graph

n1
λ(n1) ← Blogger
δ(n1) ← q(x) :- x rdf:type Person, x wrote y, y inBlog z

n2
λ(n2) ← Name
δ(n2) ← q(x) :- y hasName x

e2
λ(e2) ← identifiedBy
δ(e2) ← q(x, y) :- x rdf:type Person, x hasName y

Instance of the analytical schema w.r.t. the graph:

x rdf:type λ(n1)
  user1 rdf:type Blogger
  user2 rdf:type Blogger

x λ(e2) y
  user1 identifiedBy Bill

x rdf:type λ(n2)
  Bill rdf:type Name
  Code Blog rdf:type Name
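The instance above can be reproduced with a short sketch: each node or edge of the analytical schema carries a label (λ) and a defining query (δ), and the instance collects one triple per query answer. The graph's edge endpoints below are an assumption chosen to be consistent with the instance listed on the slide, and `evaluate` and `materialize` are hypothetical helper names. The example graph has no RDFS constraints, so evaluating over explicit triples is enough here.

```python
RDF_TYPE = "rdf:type"

# Assumed edge endpoints; only the labels are recoverable from the slide.
graph = {
    ("user1", RDF_TYPE, "Person"), ("user2", RDF_TYPE, "Person"),
    ("user1", "hasName", "Bill"),
    ("user1", "wrote", "post1"), ("user2", "wrote", "post2"),
    ("post1", "inBlog", "blog1"), ("post2", "inBlog", "blog1"),
    ("blog1", "hasName", "Code Blog"),
}


def evaluate(head, body, triples):
    """Return the bindings of the head variables that match every pattern."""
    bindings = [{}]
    for pattern in body:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = dict(binding)
                if all(
                    candidate.setdefault(term, value) == value
                    if term.startswith("?") else term == value
                    for term, value in zip(pattern, triple)
                ):
                    extended.append(candidate)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


# label (lambda) -> defining query (delta), as on the slide
nodes = {
    "Blogger": (("?x",), [("?x", RDF_TYPE, "Person"),
                          ("?x", "wrote", "?y"), ("?y", "inBlog", "?z")]),
    "Name": (("?x",), [("?y", "hasName", "?x")]),
}
edges = {
    "identifiedBy": (("?x", "?y"), [("?x", RDF_TYPE, "Person"),
                                    ("?x", "hasName", "?y")]),
}


def materialize(nodes, edges, triples):
    """Build the instance: typed resources for nodes, labeled edges for edges."""
    instance = set()
    for label, (head, body) in nodes.items():
        for (x,) in evaluate(head, body, triples):
            instance.add((x, RDF_TYPE, label))
    for label, (head, body) in edges.items():
        for (x, y) in evaluate(head, body, triples):
            instance.add((x, label, y))
    return instance


instance = materialize(nodes, edges, graph)
for triple in sorted(instance):
    print(triple)
```

Note that the result is itself a set of triples, so the instance can be queried like any other RDF graph.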


Remarks highlighted during the slide build-up:
- data heterogeneity is preserved
- the analytical schema is easy to extend

Page 29: RDF Analytics: Lenses over Semantic Graphs

Analytical query (AnQ)

Analytical schema:
n1 : Blogger, n2 : City, n3 : Value, n4 : BlogPost, n5 : Site
e2 : from (Blogger → City), e3 : age (Blogger → Value), e4 : posted (Blogger → BlogPost), e5 : on (BlogPost → Site)

Instance:
[Figure: instance graph over user1, user2, user3, the values 28, 40, 35, Madrid and New York, post1 to post4, blog1 and blog2, with edges labeled age, from, posted and on.]

Query: Find the number of sites where each blogger posts, classified by the blogger's age and city.

Classifier: c(x, d1, d2) :- x age d1, x from d2
  { ⟨user1, "28", "Madrid"⟩, ⟨user3, "35", "New York"⟩ }

Measure: m(x, v) :- x posted y, y on v
  { ⟨user1, blog1⟩, ⟨user1, blog2⟩, ⟨user2, blog2⟩, ⟨user3, blog2⟩ }

Aggregation: count
  { ⟨"28", "Madrid", 2⟩, ⟨"35", "New York", 1⟩ }
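The three parts of the analytical query (classifier, measure, aggregation function) can be traced in a few lines. This is a sketch, not the authors' implementation: the instance triples are an assumption reconstructed so that the classifier and measure return the answers shown on the slide, and the measure is treated as a set of pairs, which coincides with a bag for this data.

```python
from collections import defaultdict

# Assumed instance triples, consistent with the answers on the slide.
instance = {
    ("user1", "age", "28"), ("user1", "from", "Madrid"),
    ("user2", "age", "40"),
    ("user3", "age", "35"), ("user3", "from", "New York"),
    ("user1", "posted", "post1"), ("user1", "posted", "post2"),
    ("user2", "posted", "post3"), ("user3", "posted", "post4"),
    ("post1", "on", "blog1"), ("post2", "on", "blog2"),
    ("post3", "on", "blog2"), ("post4", "on", "blog2"),
}


def pairs(prop):
    """All (subject, object) pairs connected by the given property."""
    return {(s, o) for (s, p, o) in instance if p == prop}


# c(x, d1, d2) :- x age d1, x from d2
classifier = {(x, d1, d2)
              for (x, d1) in pairs("age")
              for (x2, d2) in pairs("from") if x == x2}

# m(x, v) :- x posted y, y on v
measure = {(x, v)
           for (x, y) in pairs("posted")
           for (y2, v) in pairs("on") if y == y2}

# Group the measure values by the classifier's dimensions, then count.
groups = defaultdict(list)
for (x, d1, d2) in classifier:
    for (x2, v) in measure:
        if x == x2:
            groups[(d1, d2)].append(v)

answer = {dims + (len(values),) for dims, values in groups.items()}
print(sorted(answer))
```

user2 has an age but no city, so it is absent from the classifier and contributes to no group, which matches the slide's result.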

Page 33: RDF Analytics: Lenses over Semantic Graphs

Analytical query answering

Two approaches:
- through analytical schema materialization
- through analytical query reformulation

Analytical schema:

n1
λ(n1) ← Blogger
δ(n1) ← q(x) :- x rdf:type Person, x wrote y, y inBlog z

e1
λ(e1) ← acquaintedWith
δ(e1) ← q(x, y) :- z rdfs:subPropertyOf knows, x z y

Query:
c(x, d) :- x rdf:type Blogger, x acquaintedWith d

Reformulated query:
cZ(x, d) :- x rdf:type Person, x wrote y1, y1 inBlog y2, z1 rdfs:subPropertyOf knows, x z1 d
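Reformulation can be read as unfolding: each atom of the query that mentions a schema label is replaced by the body of that label's defining query, with fresh names for the variables that do not appear in the head. The sketch below assumes this reading; the function name `unfold` and the fresh-variable scheme (`?v1`, `?v2`, ...) are hypothetical and assume the query itself uses no variable of that form.

```python
import itertools

RDF_TYPE = "rdf:type"

# Defining queries from the slide: label -> (head variables, body)
NODES = {
    "Blogger": (("?x",), [("?x", RDF_TYPE, "Person"),
                          ("?x", "wrote", "?y"), ("?y", "inBlog", "?z")]),
}
EDGES = {
    "acquaintedWith": (("?x", "?y"), [("?z", "rdfs:subPropertyOf", "knows"),
                                      ("?x", "?z", "?y")]),
}


def unfold(body):
    """Replace schema-level atoms by the bodies of their defining queries."""
    fresh = itertools.count(1)
    result = []
    for (s, p, o) in body:
        if p == RDF_TYPE and o in NODES:
            head, definition = NODES[o]
            args = (s,)
        elif p in EDGES:
            head, definition = EDGES[p]
            args = (s, o)
        else:
            result.append((s, p, o))  # not a schema label: keep as is
            continue
        # Head variables take the atom's terms; the others get fresh names.
        rename = dict(zip(head, args))
        for atom in definition:
            renamed = []
            for term in atom:
                if term.startswith("?"):
                    if term not in rename:
                        rename[term] = f"?v{next(fresh)}"
                    term = rename[term]
                renamed.append(term)
            result.append(tuple(renamed))
    return result


# c(x, d) :- x rdf:type Blogger, x acquaintedWith d
query = [("?x", RDF_TYPE, "Blogger"), ("?x", "acquaintedWith", "?d")]
reformulated = unfold(query)
for atom in reformulated:
    print(atom)
```

Up to variable names, the output has the same five atoms as the reformulated query on the slide, and it can be run directly on the base RDF graph without materializing the instance.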

Page 34: RDF Analytics: Lenses over Semantic Graphs

RDF Analytics: Lenses over Semantic Graphs May 14, 2014 – 14

On-Line Analytical Processing– applying OLAP operations –

Page 39: RDF Analytics: Lenses over Semantic Graphs

Slice, dice, drill-in and drill-out

Query: Find the number of sites where each blogger posts, classified by the blogger's age and city.
c(x, d1, d2) :- x age d1, x from d2
m(x, v) :- x posted y, y on v
count

Slice: bind an aggregation dimension to a single value
cΣ′(x, d1, d2) :- x age d1, x from d2
Σ′ = { d1 ← "35" }

Dice: bind several aggregation dimensions to sets of values
cΣ′(x, d1, d2) :- x age d1, x from d2
Σ′ = { d1 ← {"28"}, d2 ← {"Madrid", "Kyoto"} }

Drill-in: remove a dimension from the classifier
c′(x, d2) :- x from d2

Drill-out: add a dimension to the classifier
c′(x, d1, d2, d3) :- x age d1, x from d2, x acquaintedWith d3
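The four operations can be sketched as variations of the classifier that are re-evaluated over the instance: slice and dice restrict dimension values through Σ′, while drill-in and drill-out change the list of dimensions. The instance triples are assumptions consistent with the earlier example, the acquaintedWith triple is purely hypothetical (added so that drill-out has data), and `classify` and `answer` are hypothetical helper names.

```python
from collections import defaultdict
from itertools import product

instance = {
    ("user1", "age", "28"), ("user1", "from", "Madrid"),
    ("user2", "age", "40"),
    ("user3", "age", "35"), ("user3", "from", "New York"),
    ("user1", "posted", "post1"), ("user1", "posted", "post2"),
    ("user2", "posted", "post3"), ("user3", "posted", "post4"),
    ("post1", "on", "blog1"), ("post2", "on", "blog2"),
    ("post3", "on", "blog2"), ("post4", "on", "blog2"),
    ("user1", "acquaintedWith", "user3"),  # hypothetical, for drill-out only
}


def classify(props, sigma=None):
    """c(x, d1..dn) :- x p1 d1, ..., x pn dn, restricted by sigma."""
    sigma = sigma or {}
    values = defaultdict(lambda: defaultdict(set))
    for s, p, o in instance:
        values[s][p].add(o)
    rows = set()
    for s, by_prop in values.items():
        if all(p in by_prop for p in props):
            for combo in product(*(by_prop[p] for p in props)):
                # An unrestricted dimension accepts any value.
                if all(d in sigma.get(p, {d}) for p, d in zip(props, combo)):
                    rows.add((s,) + combo)
    return rows


def answer(props, sigma=None):
    """Count the measure m(x, v) :- x posted y, y on v per dimension tuple."""
    posted = {(s, o) for s, p, o in instance if p == "posted"}
    on = {(s, o) for s, p, o in instance if p == "on"}
    measure = {(x, v) for x, y in posted for y2, v in on if y == y2}
    counts = defaultdict(int)
    for row in classify(props, sigma):
        for x, v in measure:
            if x == row[0]:
                counts[row[1:]] += 1
    return dict(counts)


original = answer(["age", "from"])
sliced = answer(["age", "from"], {"age": {"35"}})
diced = answer(["age", "from"], {"age": {"28"}, "from": {"Madrid", "Kyoto"}})
drilled_in = answer(["from"])
drilled_out = answer(["age", "from", "acquaintedWith"])
print(original, sliced, diced, drilled_in, drilled_out, sep="\n")
```

Drill-in is re-evaluated rather than obtained by projecting the earlier classifier result, because a blogger with a city but no age would be missed by a projection.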

Page 40: RDF Analytics: Lenses over Semantic Graphs

Roll-up and drill-down

Query: Find the number of sites where each blogger posts, classified by the blogger's age and city.

c(x, d1, d2) :- x age d1, x from d2
m(x, v) :- x posted y, y on v
count

nextLevel relationship: hierarchies among nodes or edges

Analytical schema:
n1 : Blogger, n2 : City, n3 : Value, n4 : BlogPost, n5 : Site, n6 : State
e2 : from (Blogger → City), e3 : age (Blogger → Value), e4 : posted (Blogger → BlogPost), e5 : on (BlogPost → Site), e6 : nextLevel (City → State)

Roll-up: along the City dimension to the State level
c′(x, d1, d3) :- x age d1, x from d2, d2 nextLevel d3
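A roll-up can be sketched by joining the classifier with the nextLevel edges, so that bloggers from different cities of the same state fall into one group. Everything beyond the query shapes is an assumption here: the state names, the extra blogger user4 and the city Buffalo are hypothetical data added so that two cities roll up into the same state.

```python
from collections import defaultdict

instance = {
    ("user1", "age", "28"), ("user1", "from", "Madrid"),
    ("user3", "age", "35"), ("user3", "from", "New York"),
    ("user4", "age", "35"), ("user4", "from", "Buffalo"),   # hypothetical
    ("user1", "posted", "post1"), ("user1", "posted", "post2"),
    ("user3", "posted", "post4"), ("user4", "posted", "post5"),
    ("post1", "on", "blog1"), ("post2", "on", "blog2"),
    ("post4", "on", "blog2"), ("post5", "on", "blog1"),
    # Hypothetical City -> State hierarchy
    ("Madrid", "nextLevel", "Community of Madrid"),
    ("New York", "nextLevel", "New York State"),
    ("Buffalo", "nextLevel", "New York State"),
}


def objects(subject, prop):
    """Objects reachable from a subject through one property."""
    return {o for (s, p, o) in instance if s == subject and p == prop}


subjects = {s for (s, p, o) in instance}

# c'(x, d1, d3) :- x age d1, x from d2, d2 nextLevel d3
classifier = {
    (x, d1, d3)
    for x in subjects
    for d1 in objects(x, "age")
    for d2 in objects(x, "from")
    for d3 in objects(d2, "nextLevel")
}

# m(x, v) :- x posted y, y on v
measure = {(x, v)
           for x in subjects
           for y in objects(x, "posted")
           for v in objects(y, "on")}

rolled_up = defaultdict(int)
for (x, d1, d3) in classifier:
    for (x2, v) in measure:
        if x2 == x:
            rolled_up[(d1, d3)] += 1
rolled_up = dict(rolled_up)
print(rolled_up)
```

With this data, the two 35-year-old bloggers from New York and Buffalo end up in a single ("35", "New York State") group.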

Page 41: RDF Analytics: Lenses over Semantic Graphs

Empirical Evaluation: experiments and demo

Page 42: RDF Analytics: Lenses over Semantic Graphs

Experiments

Settings:
- kdb+ v3.0 (64 bits), a highly efficient in-memory column store
- q, an interpreted programming language

Dataset:
- DBpedia Download 3.8
- Ontology and Ontology Infobox datasets

Hardware:
- 8-core DELL server at 2.13 GHz
- 16 GB of RAM
- running Linux 2.6.31.14

Results:
- linear scale-up w.r.t. the data size for instance materialization and query answering

Page 43: RDF Analytics: Lenses over Semantic Graphs

Analytical query answering

12 patterns, 1,097 queries

c: number of triple patterns in the classifier query
v: number of dimension variables in the classifier query
m: number of triple patterns in the measure query

Pattern   Queries
c1v1m1    73
c1v1m2    53
c1v1m3    62
c2v1m3    71
c3v2m3    76
c4v3m3    130
c5v1m3    144
c5v2m3    216
c5v3m3    144
c5v4m1    28
c5v4m2    64
c5v4m3    36

[Figure: two charts over the 12 patterns, with logarithmic tick marks: evaluation time (s), showing average, minimum and maximum; and number of results.]

Page 44: RDF Analytics: Lenses over Semantic Graphs

Java GUI using the Prefuse toolkit (collaboration with Tushar Ghosh)

Page 47: RDF Analytics: Lenses over Semantic Graphs

Sum Up

Page 48: RDF Analytics: Lenses over Semantic Graphs

Related works

- Graph cube: on warehousing and OLAP multidimensional networks [SIGMOD 2011]
  → does not handle heterogeneous graphs, nor data semantics, both central in RDF
  → only focuses on counting edges, in contrast with our flexible analytical queries

- Business intelligence on complex graph data [EDBT/ICDT 2012 Workshops]
  → graph data aggregated in a spatial fashion (group connected nodes into regions)
  → our framework is RDF-specific and supports more general aggregation

- No Size Fits All – Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views [ESWC 2013]
  → techniques for transforming OLAP queries into SPARQL
  → could be used to further optimize analytical query answering in our framework

- The MD-join: An Operator for Complex OLAP [ICDE 2001]
  → the separation between grouping and aggregation present in our analytical queries is similar to the MD-join operator for RDWs

- W3C's SPARQL 1.1 Query Language
  → features SQL-style grouping and aggregation
  → efficient SPARQL 1.1 platforms are ideal for deploying our framework

Page 49: RDF Analytics: Lenses over Semantic Graphs

Sum up and perspectives

Sum up:
Approach for specifying and exploiting an RDF data warehouse
- define an analytical schema that captures the information of interest
- formalize analytical queries (or cubes) over the analytical schema

Instances of analytical schemas are RDF graphs themselves, which makes it possible to exploit their rich semantics and heterogeneous structure.

Perspectives:
- semi-automatic analytical schema design
- optimized OLAP operations on analytical query results
- efficient methods for deploying analytical schemas and analytical queries in parallel contexts

Page 50: RDF Analytics: Lenses over Semantic Graphs

Questions?

[Figure: a small RDF graph over the blank nodes _:b1, _:b2 and _:b3 and the nodes You, Attention and Question, with edges labeled thank, payed, ask and rdf:type.]

[email protected]

https://team.inria.fr/oak/warg/