centro de investigacio´n de la web sem´antica center for...

263
Centro de Investigaci´ on de la Web Sem´ antica Center for Semantic Web Research www.ciws.cl, @ciwschile

Upload: others

Post on 08-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Centro de Investigacion de la Web SemanticaCenter for Semantic Web Research

www.ciws.cl, @ciwschile

Page 2: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Four institutions

PUC, Chile

◮ Marcelo Arenas (Boss)◮ SW, data exchange,

semistrucured data

◮ Juan Reutter◮ SW, graph DBs, DLs

◮ Cristian Riveros◮ data exchange, semistr. data

◮ Jorge Baier◮ planning, search

Univ. of Talca

Renzo Angles (SW)

UTFSM

Carlos Buil (SW)

Univ. of Chile

◮ Pablo Barcelo (Deputy)◮ graph DBs, DB theory

◮ Claudio Gutierrez◮ SW, graph DBs

◮ Jorge Perez◮ SW, data exchange

◮ Aidan Hogan◮ SW, semistr. data

◮ Barbara Poblete◮ web mining, SNA

◮ Benjamın Bustos◮ multimedia

Page 3: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Query Languages for Graph DBs(Bridging the Gap Between Theory and Practice)

Pablo Barcelo and Aidan HoganDCC, Universidad de Chile

Center for Semantic Web Research (www.ciws.cl)

Page 4: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

BACKGROUND AND OBJECTIVES

Page 5: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Graph databases

Trendy applications:

◮ Social network analysis

◮ Semantic web

◮ Scientific databases

◮ Software bug localization

◮ Geo-routing

Page 6: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Graph databases

Trendy applications:

◮ Social network analysis

◮ Semantic web

◮ Scientific databases

◮ Software bug localization

◮ Geo-routing

More in general:

◮ Wherever connections are as important as data

Page 7: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What is a graph database?

A data management system that exposes a graph data model.1

1Graph databases. Robinson, Webber, & Eifrem. O’Reilly, 2013.

Page 8: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What is a graph database?

A data management system that exposes a graph data model.1

Several existing graph DB engines and query languages:

◮ DEX/Sparksee - basic algebra

◮ IBM System G - Gremlin

◮ Neo4J - Cypher

◮ Oracle PGX - PGQL

◮ RDF stores (Virtuoso, AllegroGraph, Oracle, IBM) - SPARQL

1Graph databases. Robinson, Webber, & Eifrem. O’Reilly, 2013.

Page 9: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What graph databases are good for?

◮ Flexible modelling of interconnected data

◮ Agile evolution of the data model

◮ Scalable evaluation of join-intensive queries

Page 10: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

My personal story

◮ Since 2009: Working on theory of query languages for graph DBs

◮ Since 2015: Working group of LDBC for the design of such language

Page 11: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

My personal story

◮ Since 2009: Working on theory of query languages for graph DBs

◮ Since 2015: Working group of LDBC for the design of such language

Conclusion:

◮ Theory and practice are more connected than expected

Page 12: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Objectives

◮ Identify topics of common interest for theoreticians and developers

◮ Formalize relevant concepts (syntax, semantics, terminology, etc)

◮ Understand tradeoff expressiveness/efficiency

Page 13: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Overview of the course

////////Aidan/////will///////teach///////////Monday,///////////Tuesday,//////and///////////////Wednesday:

◮ ///////////Semantic//////web,////////RDF,/////and///////////SPARQL

◮ /////Half//////class////////based////on////////////////presentation,//////half///////based////on/////lab

Pablo will teach on Thursday and Friday:

◮ Property graphs, graph patterns, path queries

◮ Lectures presenting how theory of graph DBs relates to practice

Page 14: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Syllabus

Page 15: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Syllabus

//////////Monday:////////////Semantic///////web,//////RDF//////data/////////model,////////////Wikidata

//////////Tuesday:////////////SPARQL/////1.0,///////basic////////graph///////////patterns,///////basic/////////////operations//////////(union,/////////////projection,////////////optional),//////////solution/////////////modifiers,///////graph////////////construct/

//////////////Wednesday:////////////SPARQL/////1.1,///////////property////////paths,////////////negation,////////////////aggregation,//////////////distribution,///////////updates

Page 16: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Syllabus

//////////Monday:////////////Semantic///////web,//////RDF//////data/////////model,////////////Wikidata

//////////Tuesday:////////////SPARQL/////1.0,///////basic////////graph///////////patterns,///////basic/////////////operations//////////(union,/////////////projection,////////////optional),//////////solution/////////////modifiers,///////graph////////////construct/

//////////////Wednesday:////////////SPARQL/////1.1,///////////property////////paths,////////////negation,////////////////aggregation,//////////////distribution,///////////updates

Thursday: Graph DBs, property graphs, graph patterns, expressiveness,complexity of evaluation

Page 17: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Syllabus

//////////Monday:////////////Semantic///////web,//////RDF//////data/////////model,////////////Wikidata

//////////Tuesday:////////////SPARQL/////1.0,///////basic////////graph///////////patterns,///////basic/////////////operations//////////(union,/////////////projection,////////////optional),//////////solution/////////////modifiers,///////graph////////////construct/

//////////////Wednesday:////////////SPARQL/////1.1,///////////property////////paths,////////////negation,////////////////aggregation,//////////////distribution,///////////updates

Thursday: Graph DBs, property graphs, graph patterns, expressiveness,complexity of evaluation

Friday: Path query languages, regular path queries, ungrouping,expressiveness, complexity

Page 18: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation

35% in-class labs, 15% applied assignment, 50% theoretical assignment

Page 19: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

THE DATA MODEL:PROPERTY GRAPHS

Page 20: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The data model is important as it must be:

◮ Flexible enough to accomodate scenarios of practical interest

◮ Simple enough to allow for a clean presentation

◮ Expressive enough for theoretical issues to appear in full force

Page 21: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The data model is important as it must be:

◮ Flexible enough to accomodate scenarios of practical interest

◮ Simple enough to allow for a clean presentation

◮ Expressive enough for theoretical issues to appear in full force

This is accomplished by the model of property graphs

Page 22: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A bit of history and context is needed

Page 23: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A bit of history and context is needed

The simplest possible graph data model: Directed graphs

Page 24: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A bit of history and context is needed

The simplest possible graph data model: Directed graphs

Clint EastwoodThe Bridges of

Madison County

Meryl Streep Out of Africa

Page 25: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Any problems with this data model?

Page 26: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Any problems with this data model?

It does not allow to classify relationships

Page 27: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Any problems with this data model?

It does not allow to classify relationships

Clint EastwoodThe Bridges of

Madison County

C. Eastwood acts in and directed this movie

Page 28: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Edge-labeled graphs

Allow for such classification

Page 29: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Edge-labeled graphs

Allow for such classification

Clint EastwoodThe Bridges of

Madison County

acts in

directs

Meryl Streepacts in

Page 30: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Formal definition of edge-labeled graphs

An edge-labelled graph G is a pair (V ,E ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of labeled edges; formally,

E ⊆ V × Lab× V ,

where Lab is a set of labels.

Page 31: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What if actors play different roles in movies?

Page 32: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What if actors play different roles in movies?

Peter Sellers

Lionel Mandrake

plays

Merkin Muffley

plays

Dr.Strangelove

movie

movie

18 minutes

34 minutes

screentime

screentime

Page 33: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Any problems with this approach?

Page 34: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Any problems with this approach?

We might need to modify the schema every time we add attributes

Page 35: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Any problems with this approach?

We might need to modify the schema every time we add attributes

More convenient to allow attributes in nodes/edges explicitly

Page 36: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Any problems with this approach?

We might need to modify the schema every time we add attributes

More convenient to allow attributes in nodes/edges explicitly

◮ This corresponds to what property graphs do

Page 37: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A property graph

role = Scorpio

ref = IMDb

name = Clint Eastwood

gender = male

n1 : Person

title = The Bridges of

Madison County

n2 : Movie

role = Robert Kincaid

ref = IMDb

e1 : acts in

e2 : directs

title = Dirty Harry

n5 : Movie

role = Harry Callahan

ref = IMDb

e5 : acts in

name = Andy Robinson

gender = male

n6 : Person

e6 : acts in

Page 38: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Another property graph

firstName = Julie

lastName = Freud

country = Chile

n1 : Person

firstName = John

lastName = Cook

gender = male

country = Chile

n2 : Person

e1 : knows

e2 : knows

name = U2

n3 : Tag

e3 : hasInterest

content = I love U2

language = en

n4 : Post

e4 : hasTag

content = Queen is awesome

n5 : Post

date = 14-09-15

e5 : likes

date = 15-03-14

e6 : likes

date = 23-10-15

e7 : likes

Page 39: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Yet another property graph

authored by

Author

name: Leonid Libkin

affil: U. of Edinburgh

Author

name: Robert Tarjan

affil: Princeton Univ.

Article

name: jacmHopcroft74

year: 1974

Venue

name: FOCS67

authored by

corresponding: YES

Author

Author

Author

Author

name: John Hopcroft

affil: Cornell Univ.

name: Jeff Ullman

affil: Stanford Univ.

name: Ron Fagin

affil: IBM Almaden

name: Moshe Vardi

affil: Rice University

Author

name: Limsoon Wong

affil: NU Singapore

Article

name: FOCSHopcroft67

year: 1967

Article

name: PODSUllman89

year: 1989

Article

name: PODSFaginUV83

year: 1983

Article

name: PODSVardi95

year: 1995

Article

name: PODSLibkin95

year: 1995

Article

name: IPLLibkinW95

year: 1995

Journal

name: JACM

Venue

Venue

Venue

Journal

name: PODS89

name: PODS83

name: PODS95

name: IPL

journal

journal

part of

part of

part of

part of

Conference

Conference

name: FOCS

name: PODS

series

series

part of

serie

s

series

authored by

authored by

authored by

authored by

authored by

corresponding: YES

corresponding: YES

authored by

authored by

cooresponding: YES

cooresponding: YES

authored by

authored by

authored by

cooresponding: YES

Page 40: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What is a property graph then?

◮ It is a graph

◮ It is directed

◮ It is labeled in nodes and edges

◮ Nodes and edges can be attributed

Page 41: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What is a property graph then?

◮ It is a graph

◮ It is directed

◮ It is labeled in nodes and edges

◮ Nodes and edges can be attributed

Page 42: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What is a property graph then?

◮ It is a graph

◮ It is directed

◮ It is labeled in nodes and edges

◮ Nodes and edges can be attributed

Page 43: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What is a property graph then?

◮ It is a graph

◮ It is directed

◮ It is labeled in nodes and edges

◮ Nodes and edges can be attributed

Page 44: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What is a property graph then?

◮ It is a graph

◮ It is directed

◮ It is labeled in nodes and edges

◮ Nodes and edges can be attributed

Page 45: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 46: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 47: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 48: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 49: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 50: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 51: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 52: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 53: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A formal definition

A property graph G is a tuple (V ,E , ρ, λ, σ), where:

1. V is a finite set of vertices (or nodes).

2. E is a finite set of edges.

3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.

4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.

5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.

◮ If σ(m, p) = v , then v is the value of property p in m.

Page 54: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

How would you represent property graphs?

Page 55: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

How would you represent property graphs?

Labeled graphs: set of triples

Property graphs:set of triples + special triples describing property/value pairs of objects

Page 56: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

GRAPH PATTERNS

Page 57: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Basic graph patterns are:

The basic unit for querying property graphs

Page 58: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Basic graph patterns are:

The basic unit for querying property graphs

A graph-structured query that is to be matched against a property graph

Page 59: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Basic graph patterns are:

The basic unit for querying property graphs

A graph-structured query that is to be matched against a property graph

Definition of basic graph pattern (bgp):

◮ A property graph that uses variables and constants

◮ Variables can appear as:node and edge names, labels, properties, and values

Page 60: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of basic graph patterns

x3

x1 x2

acts in acts in

firstName = x1

lastName = x2

x10 : Person

firstName = x3

lastName = x4

x11 : Person

x12 : knows

x13 : knows

x8 = x9

x16 : x7

date = x5

x14 : likes

date = x6

x15 : likes

Page 61: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of bgps

Set p(G ) of all possible “matchings” of bgp p in property graph G

Page 62: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of bgps

Set p(G ) of all possible “matchings” of bgp p in property graph G

Matching:

◮ Replacement of variables in p by constants such that resultingproperty graph G ′ is “contained” in G

Page 63: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of evaluation

Page 64: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of evaluation

name = Clint Eastwood

x1 : Person

title = x5

x4 : Movie

x2 : x3title = x9

x8 : Movie

x6 : x7

Page 65: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of evaluation

name = Clint Eastwood

x1 : Person

title = x5

x4 : Movie

x2 : x3title = x9

x8 : Movie

x6 : x7

x1 x2 x3 x4 x5 x6 x7 x8 x9

n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry

n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C.

n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry

n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C.

n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.

n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.

n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.

n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.

n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry

Page 66: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Which semantic is the right semantics?

◮ Homomorphism-based semantics: All matchings are valid

◮ Isomorphism-based semantics: Only 1-1 matchings are valid

◮ Node/edge injective semantics: Self-defining

Page 67: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Which semantic is the right semantics?

◮ Homomorphism-based semantics: All matchings are valid

◮ Isomorphism-based semantics: Only 1-1 matchings are valid

◮ Node/edge injective semantics: Self-defining

Page 68: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Which semantic is the right semantics?

◮ Homomorphism-based semantics: All matchings are valid

◮ Isomorphism-based semantics: Only 1-1 matchings are valid

◮ Node/edge injective semantics: Self-defining

Page 69: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Which semantic is the right semantics?

◮ Homomorphism-based semantics: All matchings are valid

◮ Isomorphism-based semantics: Only 1-1 matchings are valid

◮ Node/edge injective semantics: Self-defining

Page 70: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Different semantics by example

Homomorphism-based semantics:

x1 x2 x3 x4 x5 x6 x7 x8 x9

n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X

n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X

n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. X

n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. X

n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. X

n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry X

Page 71: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Different semantics by example

Isomorphism-based semantics:

x1 x2 x3 x4 x5 x6 x7 x8 x9

n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X

n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry ✗

n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. ✗

n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗

n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗

n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗

n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗

n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry ✗

Page 72: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Different semantics by example

Node-injective semantics:

x1 x2 x3 x4 x5 x6 x7 x8 x9

n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X

n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X

n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗

n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗

n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗

n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗

n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry ✗

Page 73: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Different semantics by example

Edge-injective semantics:

x1 x2 x3 x4 x5 x6 x7 x8 x9

n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X

n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X

n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. X

n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. X

n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗

n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗

n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry ✗

Page 74: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

To make bgps usable we need more features

◮ Projection: Only some variables appear in the output, i.e., π(p(G ))

◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))

◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )

◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )

◮ Optional: Extend matchings of p1 with those in p2, if possible

Page 75: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

To make bgps usable we need more features

◮ Projection: Only some variables appear in the output, i.e., π(p(G ))

◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))

◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )

◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )

◮ Optional: Extend matchings of p1 with those in p2, if possible

Page 76: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

To make bgps usable we need more features

◮ Projection: Only some variables appear in the output, i.e., π(p(G ))

◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))

◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )

◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )

◮ Optional: Extend matchings of p1 with those in p2, if possible

Page 77: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

To make bgps usable we need more features

◮ Projection: Only some variables appear in the output, i.e., π(p(G ))

◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))

◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )

◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )

◮ Optional: Extend matchings of p1 with those in p2, if possible

Page 78: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

To make bgps usable we need more features

◮ Projection: Only some variables appear in the output, i.e., π(p(G ))

◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))

◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )

◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )

◮ Optional: Extend matchings of p1 with those in p2, if possible

Page 79: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

To make bgps usable we need more features

◮ Projection: Only some variables appear in the output, i.e., π(p(G ))

◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))

◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )

◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )

◮ Optional: Extend matchings of p1 with those in p2, if possible

Page 80: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Projection

Some variables in the output are distinguished

Page 81: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Projection

Some variables in the output are distinguished

Evaluation under projection:

◮ Consider all possible matchings of the graph pattern

◮ Remove columns that do not correspond to distinguished variables

Page 82: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of projection

name = Clint Eastwood

x1 : Person

title = x5

x4 : Movie

x2 : x3title = x9

x8 : Movie

x6 : x7

Page 83: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of projection

name = Clint Eastwood

x1 : Person

title = x5

x4 : Movie

x2 : x3title = x9

x8 : Movie

x6 : x7

x1 x2 x3 x4 x5 x6 x7 x8 x9

n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry

n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C.

n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry

n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C.

n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.

n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.

n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.

n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.

n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry

Page 84: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of projection

name = Clint Eastwood

x1 : Person

title = x5

x4 : Movie

x2 : x3title = x9

x8 : Movie

x6 : x7

x3 x5 x7 x9

directs The Bridges of Madison C. acts in Dirty Harry

acts in Dirty Harry directs The Bridges of Madison C.

acts in The Bridges of Madison C. acts in Dirty Harry

acts in Dirty Harry acts in The Bridges of Madison C.

acts in The Bridges of Madison C. directs The Bridges of Madison C.

directs The Bridges of Madison C. acts in The Bridges of Madison C.

acts in The Bridges of Madison C. acts in The Bridges of Madison C.

directs The Bridges of Madison C. directs The Bridges of Madison C.

acts in Dirty Harry acts in Dirty Harry

Page 85: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of filter

name = Clint Eastwood

x1 : Person

title = x5

year = z

x4 : Movie

x2 : x3title = x9

year = y

x8 : Movie

x6 : x7

FILTER x5 6= x9 AND y < 1985 AND z < 1985

Page 86: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of union and difference

◮ Movies in which C. Eastwood acted or directed

◮ Movies in which C. Eastwood acted but not directed

Page 87: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of union and difference

◮ Movies in which C. Eastwood acted or directed

◮ Movies in which C. Eastwood acted but not directed

Page 88: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of union and difference

◮ Movies in which C. Eastwood acted or directed

◮ Movies in which C. Eastwood acted but not directed

Page 89: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

OPTIONAL operator

Allows to match parts of the data only if available

◮ Important in the context of semistructured data

◮ Developed by the RDF community

◮ Corresponds to left-outer join in relational algebra

Page 90: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

An example with OPTIONAL

The property graph

evaluated in

Paper

year: 1985

name: AKV85

Conference

name: STOC85

published in

Paper

year: 1985

name: Saf85

Review

text: "This paper..."

publ

ished

in

Page 91: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

An example with OPTIONAL

The property graph

evaluated in

Paper

year: 1985

name: AKV85

Conference

name: STOC85

published in

Paper

year: 1985

name: Saf85

Review

text: "This paper..."

publ

ished

in

The pattern with optional

(x ,published in, y) OPTIONAL (y , evaluated in, z)

Page 92: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

An example with OPTIONAL

A matching

name: STOC85

Paper

year: 1985

name: AKV85

published in

Paper

year: 1985

name: Saf85

Review

text: "This paper..."

publ

ished

in

evaluated in

Conference

The pattern with optional

(x ,published in, y) OPTIONAL (x , evaluated in, z)

Page 93: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

An example with OPTIONAL

Another matching

name: STOC85

published in

Paper

year: 1985

name: Saf85

Review

text: "This paper..."

publ

ished

in

evaluated in

Paper

year: 1985

name: AKV85

Conference

The pattern with optional

(x ,published in, y) OPTIONAL (x , evaluated in, z)

Page 94: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Pattern trees

A way to formalize bgps + optional operator

Page 95: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Pattern trees

A way to formalize bgps + optional operator

A well-designed pattern tree (WDPT) is a tuple (T , λ) such that:

1. T is a rooted tree

2. λ maps each node t in T to a bgp

3. for every variable y , the set of nodes of T where y appears isconnected (well-designedness)

Page 96: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of WDPT

WDPT t

{(x , rec by , y), (x , publ in, “after2010”)}

{(x ,NMErank , z)} {(y , formed in, z ′)}

Page 97: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of WDPT

WDPT t

{(x , rec by , y), (x , publ in, “after2010”)}

{(x ,NMErank , z)} {(y , formed in, z ′)}

It represents the following graph pattern:

((x , rec by , y), (x , publ in, “after2010”) OPT (x ,NMErank , z)

)

OPT (y , formed in, z ′)

Page 98: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of WDPTs

Consider WDPT t = (T , λ) with root r and subtree T ′ of T rooted in r :

Page 99: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of WDPTs

Consider WDPT t = (T , λ) with root r and subtree T ′ of T rooted in r :

pT ′ := bgp formed by all triples in the nodes of T ′.

Page 100: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of WDPTs

Consider WDPT t = (T , λ) with root r and subtree T ′ of T rooted in r :

pT ′ := bgp formed by all triples in the nodes of T ′.

A matching from p to G is a replacement hof some variables of p with constants such that:

1. There is subtree T ′ of T rooted in r such thath is a matching from pT ′ to G .

2. h cannot be extended any further.

Page 101: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

COMPLEXITY OF EVALUATION FOR BGPS

Page 102: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A simplified scenario

◮ We consider edge-labeled graphs only

◮ Variables only occur as node names

◮ Bgps are equipped with projection

Page 103: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The decision problem we study

PROBLEM : Evaluation

INPUT : An edge-labeled graph G and a bgp p

QUESTION : Is it the case that p(G ) 6= ∅?

Page 104: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The decision problem we study

PROBLEM : Evaluation

INPUT : An edge-labeled graph G and a bgp p

QUESTION : Is it the case that p(G ) 6= ∅?

Recall that:

◮ p(G ) 6= ∅ ⇐⇒ there is a matching from p to G .

◮ Matching: replacement of variables in p by constants such thatresulting graph G ′ is contained in G (homomorphism)

Page 105: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

How many such replacements we have to consider?

Page 106: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

How many such replacements we have to consider?

At most nm, where:

◮ n = number of nodes in G

◮ m = number of variables in p

Page 107: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

How many such replacements we have to consider?

At most nm, where:

◮ n = number of nodes in G

◮ m = number of variables in p

This is exponential in the size of the input!

Page 108: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

How many such replacements we have to consider?

At most nm, where:

◮ n = number of nodes in G

◮ m = number of variables in p

This is exponential in the size of the input!

But can we do better?

Page 109: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

How many such replacements we have to consider?

At most nm, where:

◮ n = number of nodes in G

◮ m = number of variables in p

This is exponential in the size of the input!

But can we do better?

◮ Most likely no, under complexity-theoretical assumptions

Page 110: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The complexity of evaluation

Proposition (Chandra and Merlin, 1977)

The evaluation problem is NP-complete

Page 111: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The complexity of evaluation

Proposition (Chandra and Merlin, 1977)

The evaluation problem is NP-complete

◮ For the upper bound:Guess a replacement, check it is a matching

◮ For the lower bound:Reduce from 3-colorability

Page 112: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Proof of the lower bound

Given an undirected graph G = (V ,E ), where:

◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)

Page 113: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Proof of the lower bound

Given an undirected graph G = (V ,E ), where:

◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)

Represent G as the bgp p:

{(xi ,E , xj ) | (i , j) ∈ E}

Page 114: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Proof of the lower bound

Given an undirected graph G = (V ,E ), where:

◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)

Represent G as the bgp p:

{(xi ,E , xj ) | (i , j) ∈ E}

Consider edge-labeled graph G ′ given as:

{(a,E , r ), (r ,E , a), (a,E , g ), (g ,E , a), (g ,E , r ), (r ,E , g)}

Page 115: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Proof of the lower bound

Given an undirected graph G = (V ,E ), where:

◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)

Represent G as the bgp p:

{(xi ,E , xj ) | (i , j) ∈ E}

Consider edge-labeled graph G ′ given as:

{(a,E , r ), (r ,E , a), (a,E , g ), (g ,E , a), (g ,E , r ), (r ,E , g)}

=⇒ p(G ′) 6= ∅ iff G is 3-colorable

Page 116: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Does this proof represent a practical situation?

Page 117: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Does this proof represent a practical situation?

The bgp p is bigger than the edge-labeled graph G ′

◮ In practice, G ′ is large and p is much smaller

Page 118: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Does this proof represent a practical situation?

The bgp p is bigger than the edge-labeled graph G ′

◮ In practice, G ′ is large and p is much smaller

Question:Should p and G be be assigned the same “relevance” in evaluation?

Page 119: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A radical solution

Consider only G to be part of the input: The bgp p is fixed

Page 120: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A radical solution

Consider only G to be part of the input: The bgp p is fixed

This corresponds to the notion of data complexity (Vardi, 1981)

Page 121: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A radical solution

Consider only G to be part of the input: The bgp p is fixed

This corresponds to the notion of data complexity (Vardi, 1981)

Data complexity of bgp evaluation:

◮ Can be solved in polynomial time O(|G ||p|), since now p is fixed

◮ Actually, it belongs to a low complexity class inside LOGSPACE

◮ Evaluation of bgps in data complexity is “parallelizable”

Page 122: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A radical solution

Consider only G to be part of the input: The bgp p is fixed

This corresponds to the notion of data complexity (Vardi, 1981)

Data complexity of bgp evaluation:

◮ Can be solved in polynomial time O(|G ||p|), since now p is fixed

◮ Actually, it belongs to a low complexity class inside LOGSPACE

◮ Evaluation of bgps in data complexity is “parallelizable”

Data complexity: good explanation to success of evaluation in practice

Page 123: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Still, if G is very large...

... a running time of O(|G ||p|) is not so good

Page 124: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Still, if G is very large...

... a running time of O(|G ||p|) is not so good

A different idea:

◮ Understand the structure of bgps used in practice

◮ We have good indications that many of them are nearly-acyclic

Page 125: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Acyclic bgps

Those whose underlying undirected graph is acyclic

Page 126: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Acyclic bgps

Those whose underlying undirected graph is acyclic

◮ (x , a, y), (y , b, z), (x , c , z ′) is acyclic

◮ (x , a, y), (y , b, z), (x , c , z ′), (z ′, d , x), (x , e, x) is also acyclic

◮ (x , a, y), (y , b, z), (x , c , z) is not acyclic

Page 127: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of acyclic bgps

Theorem (Yannakakis, 1981)

Acyclic bgps can be evaluated in linear time O(|G | · |p|).

Page 128: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of acyclic bgps

Theorem (Yannakakis, 1981)

Acyclic bgps can be evaluated in linear time O(|G | · |p|).

Idea of the proof:

◮ Bottom-up dynamic programming algorithm

◮ Labels each node v with the set

{h(v) | h is a matching of subtree of p rooted in v}

Page 129: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

In practice bgps are not acyclic, but “nearly” acyclic

Question: How do we measure the degree of acyclicity of a bgp?

Page 130: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

In practice bgps are not acyclic, but “nearly” acyclic

Question: How do we measure the degree of acyclicity of a bgp?

Approach taken in the literature:

◮ Use notion of treewidth: measures the degree of acyclicity of a graph

◮ Measure the degree of acyclicity of the underlying graph of the bgp

Page 131: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Treewidth of a graph

Measures how much an undirected graph “resembles” a tree:

◮ Acyclic graphs have treewidth one

◮ Cycles have treewidth two

◮ The complete graph on k elements has treewidth (k − 1)

◮ The k × k grid has treewidth k

Page 132: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Tree decompositions

Notion used to define treewidth

◮ It decomposes a graph as a tree while preserving connectivity

Page 133: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Tree decompositions

Notion used to define treewidth

◮ It decomposes a graph as a tree while preserving connectivity

A tree decomposition of G = (V ,E ) is a pair (T , λ) such that:

◮ T is a tree

◮ λ assigns a set of nodes in V to each node t in T

◮ If {u, v} ∈ E , then u, v appear together in one such λ(t)

◮ For u ∈ V , the set of λ(t)’s that contain u is connected

Page 134: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Tree decompositions

Notion used to define treewidth

◮ It decomposes a graph as a tree while preserving connectivity

A tree decomposition of G = (V ,E ) is a pair (T , λ) such that:

◮ T is a tree

◮ λ assigns a set of nodes in V to each node t in T

◮ If {u, v} ∈ E , then u, v appear together in one such λ(t)

◮ For u ∈ V , the set of λ(t)’s that contain u is connected

The width of (T , λ) is(max {|λ(t)| : t ∈ T} − 1

)

Page 135: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Definition of treewidth

The treewidth of G = (V ,E ) is

◮ The minimum width of its tree decompositions

Page 136: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Definition of treewidth

The treewidth of G = (V ,E ) is

◮ The minimum width of its tree decompositions

The treewidth of a bgp is the treewidth of its underlying graph

Page 137: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Bgps of small treewidth are well-behaved

Theorem (Chekuri and Rajaraman, 1997)

Evaluation for bgps of treewidth k can be solved in time

O(|G |k · |p|).

Page 138: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Bgps of small treewidth are well-behaved

Theorem (Chekuri and Rajaraman, 1997)

Evaluation for bgps of treewidth k can be solved in time

O(|G |k · |p|).

Idea of the proof:

◮ Compute a tree decomposition (T , λ) of p of width k

◮ Perform a dyn programming procedure on (T , λ) as for acyclic bgps

Page 139: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Can one improve on bounded treewidth?

Page 140: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Can one improve on bounded treewidth?

Grohe’s Theorem (Grohe, 2002): The following are equivalent

◮ A class C of bgps is tractable

◮ The treewidth of the bgps in C is bounded modulo equivalence

Page 141: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

OPTIMIZATION OF BGPS

Page 142: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A theoretical notion of optimization

Find an equivalent bgp of minimal size

Page 143: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A theoretical notion of optimization

Find an equivalent bgp of minimal size

Bgps p and p′ are equivalent if

◮ p(G ) = p′(G ), over every edge-labeled graph

Page 144: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A theoretical notion of optimization

Find an equivalent bgp of minimal size

Bgps p and p′ are equivalent if

◮ p(G ) = p′(G ), over every edge-labeled graph

We assume that bgps are equipped with projection

◮ We write p(x) to denote that x are the distinguished variables of p

Page 145: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Equivalence of bgps is the same than evaluation

Theorem (Chandra and Merlin, 1977)

The following are equivalent for bgps p(x) and p′(x ′):

◮ p(x) and p′(x ′) are equivalent

◮ x ∈ p′(p) and x ′ ∈ p(p′)

Page 146: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Equivalence of bgps is the same than evaluation

Theorem (Chandra and Merlin, 1977)

The following are equivalent for bgps p(x) and p′(x ′):

◮ p(x) and p′(x ′) are equivalent

◮ x ∈ p′(p) and x ′ ∈ p(p′)

As a corollary:

◮ Checking equivalence of bgps is NP-complete

Page 147: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Equivalence of bgps is the same than evaluation

Theorem (Chandra and Merlin, 1977)

The following are equivalent for bgps p(x) and p′(x ′):

◮ p(x) and p′(x ′) are equivalent

◮ x ∈ p′(p) and x ′ ∈ p(p′)

As a corollary:

◮ Checking equivalence of bgps is NP-complete

This is considered to be a positive result (bgps are small)

Page 148: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Minimization of bgps

Theorem (Hell and Nesetril, 1992)

A minimal equivalent bgp of p is

◮ unique (up to renaming of variables)

◮ always contained in p

◮ corresponds to the core of p

Page 149: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Minimization of bgps

Theorem (Hell and Nesetril, 1992)

A minimal equivalent bgp of p is

◮ unique (up to renaming of variables)

◮ always contained in p

◮ corresponds to the core of p

As a corollary:

◮ Minimization of bgps can be carried out in PNP (feasible)

Page 150: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

NAVIGATION

Page 151: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Graphs are there to be navigated

Traversing the edges of the graph recursively is useful for:

◮ Detecting implicit relations over the data

◮ Allowing more flexibility at the moment of querying

Page 152: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Recall our property graph?

authored by

Author

name: Leonid Libkin

affil: U. of Edinburgh

Author

name: Robert Tarjan

affil: Princeton Univ.

Article

name: jacmHopcroft74

year: 1974

Venue

name: FOCS67

authored by

corresponding: YES

Author

Author

Author

Author

name: John Hopcroft

affil: Cornell Univ.

name: Jeff Ullman

affil: Stanford Univ.

name: Ron Fagin

affil: IBM Almaden

name: Moshe Vardi

affil: Rice University

Author

name: Limsoon Wong

affil: NU Singapore

Article

name: FOCSHopcroft67

year: 1967

Article

name: PODSUllman89

year: 1989

Article

name: PODSFaginUV83

year: 1983

Article

name: PODSVardi95

year: 1995

Article

name: PODSLibkin95

year: 1995

Article

name: IPLLibkinW95

year: 1995

Journal

name: JACM

Venue

Venue

Venue

Journal

name: PODS89

name: PODS83

name: PODS95

name: IPL

journal

journal

part of

part of

part of

part of

Conference

Conference

name: FOCS

name: PODS

series

series

part of

serie

s

series

authored by

authored by

authored by

authored by

authored by

corresponding: YES

corresponding: YES

authored by

authored by

cooresponding: YES

cooresponding: YES

authored by

authored by

authored by

cooresponding: YES

Page 153: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

An example of navigation in such graph

Find pairs of authors linked by a coauthorship sequence

Page 154: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

An example of navigation in such graph

Find pairs of authors linked by a coauthorship sequence

(x , (authored by−1 · authored by)∗ , y

)

Page 155: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Regular path queries (RPQs)

Expressions of the form (x , r , y), where r is a regular expression:

r , r ′ := ∅ | ǫ | a | a−1 | r ∪ r ′ | r · r ′ | r∗ (a ∈ Lab)

Page 156: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Regular path queries (RPQs)

Expressions of the form (x , r , y), where r is a regular expression:

r , r ′ := ∅ | ǫ | a | a−1 | r ∪ r ′ | r · r ′ | r∗ (a ∈ Lab)

Evaluation:

◮ Set r(G ) of pairs of nodes linked by a path whose label is in L(r)

Page 157: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Paths by example

A path in the edge-labeled graph:

:Ronald FagininPods:83

:John E. HopcroftinFocs:FOCS8

journal:jacm Jacm:HopcroftT74 :Robert E Tarjan

:Jeffrey Ullman

conf:focs Focs:HopU67a

:Moshe Y. Vardi

series

series

journal

partOf

partOf

creatorcreator

creatorcreator

creator

creator

Pods:FaginUV83creator

creator:Leonid Libkin

partOf

creatorcreator

:Limsoon Wongjournal

inPods:89

series

Pods:Ullman89creatorpartOf

Pods:Vardi95

conf:pods

partOf

IPL:LibkinW95

inPods:95

Pods:Libkin95

journal:IPL

creator

series

Page 158: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Labels of path by example

The label of such path is partOf series:

:Ronald FagininPods:83

:John E. HopcroftinFocs:FOCS8

journal:jacm Jacm:HopcroftT74 :Robert E Tarjan

:Jeffrey Ullman

conf:focs Focs:HopU67a

:Moshe Y. Vardi

series

series

journal

partOf

partOf

creatorcreator

creatorcreator

creator

creator

Pods:FaginUV83creator

creator:Leonid Libkin

partOf

creatorcreator

:Limsoon Wongjournal

inPods:89

series

Pods:Ullman89creatorpartOf

Pods:Vardi95

conf:pods

partOf

IPL:LibkinW95

inPods:95

Pods:Libkin95

journal:IPL

creator

series

Page 159: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

More formally

Given an edge-labeled graph G = (V ,E ) such that

◮ E ⊆ V × Lab× V

Page 160: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

More formally

Given an edge-labeled graph G = (V ,E ) such that

◮ E ⊆ V × Lab× V

A path in G is a sequence of consecutive edges in G , i.e.,

v1a1−→ v2

a2−→ v3 · · · vkak−→ vk+1

Page 161: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

More formally

Given an edge-labeled graph G = (V ,E ) such that

◮ E ⊆ V × Lab× V

A path in G is a sequence of consecutive edges in G , i.e.,

v1a1−→ v2

a2−→ v3 · · · vkak−→ vk+1

The label of such path is the word

a1a2 · · · ak ∈ Lab∗

Page 162: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation problem for RPQs

PROBLEM : Evaluation for RPQs

INPUT : An edge-labeled graph G ,

a pair (u, v) of nodes in G , and a RPQ (x , r , y)

QUESTION : Is it the case that (u, v) ∈ r(G )?

(i.e., is there a path from u to v with label in L(r)?)

Page 163: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The complexity of evaluation for RPQs

Theorem (Folklore)

The evaluation of RPQs can be solved in linear time O(|G | · |r |)

Page 164: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The complexity of evaluation for RPQs

Theorem (Folklore)

The evaluation of RPQs can be solved in linear time O(|G | · |r |)

For the proof:Exploit the connection between edge-labeled graphs, RPQs, and NFAs

Page 165: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Non-deterministic finite automata (NFAs)

Tuple A = (Q, q0,F , δ), where:

◮ Q is a finite set of states

◮ q0 ∈ Q is the initial state

◮ F ⊆ Q are the final states

◮ δ ⊆ Q × Lab× Q is the transition relation

Page 166: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Acceptance by NFAs

L(A) = set of words “accepted” by A

Page 167: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Acceptance by NFAs

L(A) = set of words “accepted” by A

When is the worda1a2 . . . ak ∈ Lab∗

accepted by A?

Page 168: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Acceptance by NFAs

L(A) = set of words “accepted” by A

When there are states

q0a1q1a2q2 . . . qk−1akqk ∈ Lab∗

such that qk ∈ F and (qi−1, ai , qi ) ∈ δ for each 1 ≤ i ≤ k

Page 169: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The emptiness problem for NFAs

Given an NFA A, is L(A) 6= ∅?

Page 170: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The emptiness problem for NFAs

Given an NFA A, is L(A) 6= ∅?

◮ Can be solved in time O(|A|) by DFS

Page 171: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The emptiness problem for NFAs

Given an NFA A, is L(A) 6= ∅?

◮ Can be solved in time O(|A|) by DFS

Given NFAs A1 and A2, is L(A1) ∩ L(A2) 6= ∅?

Page 172: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The emptiness problem for NFAs

Given an NFA A, is L(A) 6= ∅?

◮ Can be solved in time O(|A|) by DFS

Given NFAs A1 and A2, is L(A1) ∩ L(A2) 6= ∅?

◮ Construct the cross product A1 ×A2 in time O(|A1| · |A2|)

◮ Check if L(A1 ×A2) 6= ∅ in time O(|A1| · |A2|)

Page 173: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Edge-labeled graphs can be seen as NFAs

◮ Nodes are states

◮ Edges ua−→ v are transitions

◮ There are no initial and final states

Page 174: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Regular expressions can be turned into NFAs

Proposition

There is an algorithm that given a regular expression r

◮ constructs in time O(|r |) an NFA A such that L(A) = L(r)

Page 175: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Proof of the theorem on evaluation of RPQs

Theorem

The evaluation of RPQs can be solved in linear time O(|G | · |r |)

Page 176: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Proof of the theorem on evaluation of RPQs

Theorem

The evaluation of RPQs can be solved in linear time O(|G | · |r |)

Proof idea:

◮ Compute in linear time from r an equivalent NFA A◮ Compute in linear time the NFA (G , u, v):

◮ Obtained from G by setting u and v as initial and final state, resp.

◮ Then (u, v) ∈ r(G ) iff NFA (G , u, v)×A accepts a word

◮ Check the latter in time O(|G | · |A|) = O(|G | · |r |)

Page 177: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A semantics based on simple paths?

Simple paths: No node is repeated

Page 178: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

A semantics based on simple paths?

Simple paths: No node is repeated

Simple paths semantics:

◮ Motivated by apps for which simple paths are more natural

◮ Studied in the context of semistructured data in the early 90s

◮ Revival due to application in early versions of SPARQL 1.1

Page 179: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

RPQs under simple paths semantics

Problem: Evaluation for RPQs under simple paths

Input: An edge-labeled graph G ,

a pair (u, v) of nodes in G , and a RPQ (x , r , y)

Question: Is there a simple path from

u to v in G whose label is in L(r)?

Page 180: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of finding regular simple paths

Theorem (Mendelzon and Wood, 1989)

Evaluation for RPQs under simple paths is NP-complete

◮ This holds even if r = (aa)∗

◮ In fact, even if r = w∗, for w any word of length ≥ 2

Page 181: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of finding regular simple paths

Theorem (Mendelzon and Wood, 1989)

Evaluation for RPQs under simple paths is NP-complete

◮ This holds even if r = (aa)∗

◮ In fact, even if r = w∗, for w any word of length ≥ 2

Page 182: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of finding regular simple paths

Theorem (Mendelzon and Wood, 1989)

Evaluation for RPQs under simple paths is NP-complete

◮ This holds even if r = (aa)∗

◮ In fact, even if r = w∗, for w any word of length ≥ 2

Page 183: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of finding regular simple paths

Theorem (Mendelzon and Wood, 1989)

Evaluation for RPQs under simple paths is NP-complete

◮ This holds even if r = (aa)∗

◮ In fact, even if r = w∗, for w any word of length ≥ 2

Proof:

◮ The following is NP-complete (Lapaugh and Papadimitriou, 1984)• Given a digraph G = (V ,E ) and two nodes u, v ,

is there a directed path from u to v of even length?

Page 184: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

In conclusion

◮ Evaluation of RPQs under simple paths is NP-complete

◮ This holds even in data complexity

◮ This casts serious doubts on the applicability of the semantics

Page 185: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

EXTENDING BGPS WITH NAVIGATION

Page 186: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Conjunctive regular path queries (CRPQs)

Obtained by combining bgps + RPQs

Page 187: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of C2RPQ

The C2RPQ

(x , creator−, y), (y , partOf · series, z), (y , creator , u)

computes pairs (a1, a2) that are coauthors of a conference paper

:Ronald FagininPods:83

:John E. HopcroftinFocs:FOCS8

conf:pods

journal:jacm Jacm:HopcroftT74 :Robert E Tarjan

:Jeffrey Ullman

conf:focs Focs:HopU67a

:Moshe Y. Vardi

series

series

journal

partOf

partOf

creatorcreator

creatorcreator

creator

creator

Pods:FaginUV83creator

creator:Leonid Libkin

partOf

creatorcreator

:Limsoon Wongjournal

creatorPods:Ullman89inPods:89

partOf

series

Pods:Libkin95

IPL:LibkinW95

partOf creatorinPods:95

journal:IPL

series

Pods:Vardi95

Page 188: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of C2RPQ

The C2RPQ

(x , creator−, y), (y , partOf · series, z), (y , creator , u)

computes pairs (a1, a2) that are coauthors of a conference paper

creator

inPods:83

:John E. HopcroftinFocs:FOCS8

journal:jacm Jacm:HopcroftT74 :Robert E Tarjan

:Jeffrey Ullman

conf:focs Focs:HopU67apartOf

creatorcreator

creatorcreator

Pods:FaginUV83

creator:Leonid Libkin

partOf

creatorcreator

:Limsoon Wongjournal

creator

partOfseries creatorconf:pods :Ronald Fagin

zy

u

x

:Moshe Y. Vardi

series

journal

series

partOfPods:Ullman89

creatorinPods:89

Pods:Vardi95inPods:95partOf

IPL:LibkinW95

Pods:Libkin95

series

journal:IPL

creator

Page 189: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Example of C2RPQ

The C2RPQ

(x , creator−, y), (y , partOf · series, z), (y , creator , u)

computes pairs (a1, a2) that are coauthors of a conference paper

inPods:83

:John E. HopcroftinFocs:FOCS8

conf:pods

journal:jacm Jacm:HopcroftT74 :Robert E Tarjan

:Jeffrey Ullman

conf:focs Focs:HopU67a

:Moshe Y. Vardi

series

journal

partOf

creatorcreator

creatorcreator

creator

Pods:FaginUV83

creator:Leonid Libkin

partOf

creatorcreator

:Limsoon Wongjournal

creator

partOfseries creator:Ronald Fagin

a1

a2creator

inPods:89 Pods:Ullman89

series

partOf

IPL:LibkinW95

series

journal:IPL

creatorpartOf

Pods:Libkin95

Pods:Vardi95inPods:95

Page 190: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Formal definition of CRPQS

Set q of the form:

{(x1, L1, y1), . . . , (xm, Lm, ym)},

such that the xi , yi ’s are variables and the ri ’s reg exps

Page 191: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Formal definition of CRPQS

Set q of the form:

{(x1, L1, y1), . . . , (xm, Lm, ym)},

such that the xi , yi ’s are variables and the ri ’s reg exps

Evaluation: Set q(G ) of all possible “matchings” of q in G , i.e.,

◮ Replacements of the xi , yi by constants ci , disuch that (ci , di ) is linked by path with label in L(ri )

Page 192: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

CRPQs vs RPQs

The CRPQ

(x , creator−, y), (y , partOf · series, z), (y , creator , u)

is not expressible as a RPQ over the graph database:

:Ronald FagininPods:83

:John E. HopcroftinFocs:FOCS8

conf:pods

journal:jacm Jacm:HopcroftT74 :Robert E Tarjan

:Jeffrey Ullman

conf:focs Focs:HopU67a

:Moshe Y. Vardi

series

series

journal

partOf

partOf

creatorcreatorcreatorcreator

creator

creator

Pods:FaginUV83creator

creator:Leonid Libkin

partOf

creatorcreator

:Limsoon Wongjournal

creatorPods:Ullman89inPods:89

partOf

series

Pods:Libkin95

IPL:LibkinW95

partOf creatorinPods:95

journal:IPL

series

Pods:Vardi95

Page 193: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation for CRPQs

PROBLEM : Evaluation for CRPQs

INPUT : An edge-labeled graph G and a CRPQ q

QUESTION : Is it the case that q(G ) 6= ∅?

Page 194: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of evaluation for CRPQs

Proposition (Folklore)

Evaluation of CRPQs is NP-complete

◮ But polynomial in data complexity

Page 195: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of evaluation for CRPQs

Proposition (Folklore)

Evaluation of CRPQs is NP-complete

◮ But polynomial in data complexity

Restrictions on degree of acyclicity yield tractability

Page 196: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

QUERIES ON GRAPHS WITH DATA

Page 197: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Querying the data

So far queries only talk about the topology of the data

◮ We have restricted ourselves to data graphs

Page 198: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Querying the data

So far queries only talk about the topology of the data

◮ We have restricted ourselves to data graphs

Queries that combine topology and data are important in practice:

◮ Example: People of the same age connected by professional links

Page 199: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Querying the data

So far queries only talk about the topology of the data

◮ We have restricted ourselves to data graphs

Queries that combine topology and data are important in practice:

◮ Example: People of the same age connected by professional links

We need to develop navigational queries for property graphs

Page 200: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Data graphs and data paths

We work with data graphs as a simplification of property graphs

Page 201: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Data graphs and data paths

We work with data graphs as a simplification of property graphs

A data graph G is a tuple (V ,E , κ), where:

◮ (V ,E ) is an edge-labeled graph

◮ κ : V → D, where D is a set of data values

Page 202: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Data graphs and data paths

We work with data graphs as a simplification of property graphs

A data graph G is a tuple (V ,E , κ), where:

◮ (V ,E ) is an edge-labeled graph

◮ κ : V → D, where D is a set of data values

With each pathv1

a1−→ v2 · · · vkak−→ vk+1

we associate a data path

κ(v1)a1−→ κ(v2) · · · κ(vk)

ak−→ κ(vk+1)

Page 203: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The choice of a formalism

Formalism for querying data paths has to be chosen with care:

Theorem (Libkin and Vrgoc, 2012)

The following problem is NP-complete:

◮ Is there a data path from u to v with no repeated data value?

Page 204: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

The choice of a formalism

Formalism for querying data paths has to be chosen with care:

Theorem (Libkin and Vrgoc, 2012)

The following problem is NP-complete:

◮ Is there a data path from u to v with no repeated data value?

If a language expresses this property:

◮ It is NP-hard in data complexity ⇒ impractical

Page 205: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Regular expressions with memory (REMs)

◮ They permit to specify when data values are remembered and used

◮ Data values are remembered in k registers {x1, . . . , xk}

◮ At any point we can compare a data value with one in the registers

◮ Based on the notion of register automata

Page 206: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REM: Example

Consider the REM ↓x .a+[x=].

Intuition:

◮ Store the current data value d in register x

◮ After reading a word in a+ check that d is seen again

Evaluation: Pairs (v , v ′) of nodes:

◮ Linked by a path labeled in a+.

◮ v and v ′ contain the same data value.

Page 207: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REM: Conditions

Conditions: Compare a data value with the ones in the registers

Conditions over {x1, . . . , xk} are given by the grammar:

c := x=i | ¬c | c ∧ c (1 ≤ i ≤ k)

Then (d , τ) |= c for d ∈ D and τ = (d1, . . . , dk) ∈ Dk :

◮ (d , τ) |= x=i iff d = di

◮ Boolean combinations are standard

Page 208: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REMs: Syntax and semantics (Intuition)

REMs over Σ and {x1, . . . , xk} are defined by grammar:

e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e

where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}

Page 209: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REMs: Syntax and semantics (Intuition)

REMs over Σ and {x1, . . . , xk} are defined by grammar:

e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e

where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}

Intuition: Evaluation of REM e on data graph G is:• pairs (v , v ′) of nodes linked by a data path ρ such that ρ |= e, where:

Page 210: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REMs: Syntax and semantics (Intuition)

REMs over Σ and {x1, . . . , xk} are defined by grammar:

e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e

where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}

Intuition: Evaluation of REM e on data graph G is:• pairs (v , v ′) of nodes linked by a data path ρ such that ρ |= e, where:

◮ ρ |= e[c] if and only if

τ ∈ Dk st (κ(vk+1), τ) |= c

κ(v1)a1−→ κ(v2) · · · κ(vk)

ak−→ κ(vk+1)︸ ︷︷ ︸

can be parsed wrt estarting from empty registers

ρD

finishing in register value

Page 211: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REMs: Syntax and semantics (Intuition)

REMs over Σ and {x1, . . . , xk} are defined by grammar:

e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e

where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}

Intuition: Evaluation of REM e on data graph G is:• pairs (v , v ′) of nodes linked by a data path ρ such that ρ |= e, where:

◮ ρ |=↓ x .e if and only ifρD

κ(v1)a1−→ κ(v2) · · · κ(vk)

ak−→ κ(vk+1)︸ ︷︷ ︸

can be parsed wrt estarting from the register valuethat assigns κ(v1) to each x ∈ x

Page 212: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REM: No closure under complement

Consider the REM Σ∗ · (↓x .Σ+[x=]) · Σ∗:

Page 213: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REM: No closure under complement

Consider the REM Σ∗ · (↓x .Σ+[x=]) · Σ∗:

◮ Defines pairs of nodes linked by data path ρ such that:• ρ contains the same data value twice.

◮ The complement of this language is the NP-complete propertymentioned before

Page 214: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

REM: No closure under complement

Consider the REM Σ∗ · (↓x .Σ+[x=]) · Σ∗:

◮ Defines pairs of nodes linked by data path ρ such that:• ρ contains the same data value twice.

◮ The complement of this language is the NP-complete propertymentioned before

Corollary:

◮ REMs are not closed under complement

Page 215: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of REM evaluation

Theorem (Libkin, Vrgoc, 2012)

Evaluation for REMs is Pspace-complete

◮ But polynomial in data complexity

Page 216: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of REM evaluation

Theorem (Libkin, Vrgoc, 2012)

Evaluation for REMs is Pspace-complete

◮ But polynomial in data complexity

Both bounds extend to the class of conjunctive REMs

Page 217: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

PATH QUERIES

Page 218: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

CRPQs and path queries

CRPQs fall short of expressive power for applications that need:

◮ to include paths in the output of a query, and

◮ to define complex relationships among labels of paths.

Page 219: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

CRPQs and path queries

CRPQs fall short of expressive power for applications that need:

◮ to include paths in the output of a query, and

◮ to define complex relationships among labels of paths.

Examples:

◮ Semantic Web queries:• establish semantic associations among paths.

◮ Biological applications:• compare paths based on similarity.

◮ Route-finding applications:• compare paths based on length or number of occurences of labels.

◮ Data provenance and semantic search over the Web:• require returning paths to the user.

Page 220: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Path comparisons

We use a set S of relations on words.

◮ Example: S may contain• Unary relations: Regular, context-free languages, etc.• Binary relations: prefix, equal length, subsequence, etc.

◮ Comparisons among labels of paths = Pertenence to some S ∈ S.• Example: w1 is a substring of w2.

◮ We assume S contains all regular languages.

Page 221: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Extended CRPQs

The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:

Ans(z , ) ← (x1, L1, y1), . . . , (xm, Lm, ym),

◮ by joining each pair (xi , yi ) with a path variable πi ,

◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,

◮ projecting some of πi ’s as a tuple χ in the output.

Page 222: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Extended CRPQs

The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:

Ans(z, ) ← (x1, π1, y1), . . . , (xm, πm, ym),

◮ by joining each pair (xi , yi ) with a path variable πi ,

◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,

◮ projecting some of πi ’s as a tuple χ in the output.

Page 223: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Extended CRPQs

The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:

Ans(z, ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

1≤j≤t Sj(πj )

◮ by joining each pair (xi , yi ) with a path variable πi ,

◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,

◮ projecting some of πi ’s as a tuple χ in the output.

Page 224: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Extended CRPQs

The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:

Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

1≤j≤t Sj(πj )

◮ by joining each pair (xi , yi ) with a path variable πi ,

◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,

◮ projecting some of πi ’s as a tuple χ in the output.

Page 225: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Extended CRPQs and our requirements

ECRPQs meet our requirements:

Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

1≤j≤t Sj(πj)

Page 226: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Extended CRPQs and our requirements

ECRPQs meet our requirements:

Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

1≤j≤t Sj(πj)

◮ They allow to export paths in the output.

◮ They allow to compare labels of paths with relations Sj ∈ S.

Page 227: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Extended CRPQs and our requirements

ECRPQs meet our requirements:

Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

1≤j≤t Sj(πj )

◮ They allow to export paths in the output.

◮ They allow to compare labels of paths with relations Sj ∈ S.

Page 228: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of ECRPQs

Evaluation of the ECRPQ(S)

θ(z, χ) : Ans(z , χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

j Sj(πj )

is defined in terms of “matchings”

Page 229: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of ECRPQs

Evaluation of the ECRPQ(S)

θ(z, χ) : Ans(z , χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

j Sj(πj )

is defined in terms of “matchings”

Matching for ECRPQs: Same than for CRPQs but:

◮ Each πi is mapped to a path ρi in the graph DB.

◮ If πj = (πj1 , . . . , πjk ) then: (Lab(ρj1), . . . , Lab(ρjk ))︸ ︷︷ ︸

the labels of (ρj1 , . . . , ρjk )

∈ Sj .

Page 230: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of ECRPQs

Evaluation of the ECRPQ(S)

θ(z, χ) : Ans(z , χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧

j Sj(πj )

is defined in terms of “matchings”

Matching for ECRPQs: Same than for CRPQs but:

◮ Each πi is mapped to a path ρi in the graph DB.

◮ If πj = (πj1 , . . . , πjk ) then: (Lab(ρj1), . . . , Lab(ρjk )) ∈ Sj .

The evaluation θ(G) of θ(z, χ) on G is the set:

{(σ(z), σ(χ)) | σ is matching from θ to G}.

Page 231: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Considerations about ECRPQ(S)

• ECRPQ(S) extends the class of CRPQs.

◮ Ans(z)←∧

i(xi , Li , yi ) = Ans(z) ←∧

i(xi , πi , yi ), Li (πi ).

• Expressiveness and complexity of ECRPQ(S):

◮ Depends on the class S.

• We study two such classes with roots in formal language theory:

◮ Regular relations (Elgot, Mezei (1965)).

◮ Rational relations (Nivat (1968)).

Page 232: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

COMPARING PATHS WITH REGULAR RELATIONS

Page 233: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Introduction

• Regular relations: Regular languages for relations of any arity

◮ REG: Class of regular relations.

• Bottomline:

ECRPQ(REG): Reasonable expressiveness and complexity.

Page 234: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Regular relations

n-ary regular relation:

Set of n-tuples (w1, . . . ,wn) of stringsaccepted by synchronous automaton over Σn.

Page 235: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Regular relations

n-ary regular relation:

Set of n-tuples (w1, . . . ,wn) of stringsaccepted by synchronous automaton over Σn.

◮ The input strings are written in the n-tapes.

◮ Shorter strings are padded with symbol ⊥.

◮ At each step:The automaton simultaneously reads next symbol on each tape.

Page 236: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a

w3 = b b · · ·...

...

wn = a b b · · · a c

Page 237: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a ⊥ ⊥

w3 = b b ⊥ · · · ⊥ ⊥ ⊥...

...

wn = a b b · · · a c ⊥

Page 238: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a ⊥ ⊥

w3 = b b ⊥ · · · ⊥ ⊥ ⊥...

...

wn = a b b · · · a c ⊥

Page 239: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a ⊥ ⊥

w3 = b b ⊥ · · · ⊥ ⊥ ⊥...

...

wn = a b b · · · a c ⊥

Page 240: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a ⊥ ⊥

w3 = b b ⊥ · · · ⊥ ⊥ ⊥...

...

wn = a b b · · · a c ⊥

Page 241: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a ⊥ ⊥

w3 = b b ⊥ · · · ⊥ ⊥ ⊥...

...

wn = a b b · · · a c ⊥

Page 242: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a ⊥ ⊥

w3 = b b ⊥ · · · ⊥ ⊥ ⊥...

...

wn = a b b · · · a c ⊥

Page 243: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Synchronous automata

w1 = a a b · · · a b c

w2 = a b a · · · a ⊥ ⊥

w3 = b b ⊥ · · · ⊥ ⊥ ⊥...

...

wn = a b b · · · a c ⊥

Page 244: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of regular relations

• All regular languages.

• The prefix relation defined by:

( ⋃

a∈Σ

(a, a))∗·( ⋃

a∈Σ

(a,⊥))∗.

• The equal length relation defined by:

( ⋃

a,b∈Σ

(a, b))∗.

• Pairs of strings at edit distance at most k , for fixed k ≥ 0.

Page 245: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of regular relations

• All regular languages.

• The prefix relation defined by:

( ⋃

a∈Σ

(a, a))∗·( ⋃

a∈Σ

(a,⊥))∗.

• The equal length relation defined by:

( ⋃

a,b∈Σ

(a, b))∗.

• Pairs of strings at edit distance at most k , for fixed k ≥ 0.

Proposition

The subsequence, subword and suffix relations are not regular.

Page 246: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

ECRPQ(REG)

ECRPQ(REG): Class of queries of the form

Ans(z, χ) ←∧

i (xi , πi , yi ),∧

j Sj(πj),

where each Sj is a regular relation (B., Libkin, Lin, Wood (2012)).

Page 247: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

ECRPQ(REG)

ECRPQ(REG): Class of queries of the form

Ans(z, χ) ←∧

i (xi , πi , yi ),∧

j Sj(πj),

where each Sj is a regular relation (B., Libkin, Lin, Wood (2012)).

Example: The ECRPQ(REG) query

Ans(x , y) ← (x , π1, z), (z , π2, y), a∗(π1), b

∗(π2), equal length(π1, π2)

computes pairs of nodes linked by a path labeled in {anbn | n ≥ 0}.

Page 248: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

ECRPQ(REG)

ECRPQ(REG): Class of queries of the form

Ans(z, χ) ←∧

i (xi , πi , yi ),∧

j Sj(πj),

where each Sj is a regular relation (B., Libkin, Lin, Wood (2012)).

Example: The ECRPQ(REG) query

Ans(x , y) ← (x , π1, z), (z , π2, y), a∗(π1), b

∗(π2), equal length(π1, π2)

computes pairs of nodes linked by a path labeled in {anbn | n ≥ 0}.

Corollary

ECRPQ(REG) properly extends the class of CRPQs.

Page 249: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Complexity of evaluation of ECRPQ(REG)

Theorem (B., Libkin, Lin and Wood, 2012)

Evaluation for ECRPQ(REG) is Pspace-complete.

◮ But polynomial in data complexity

Page 250: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

COMPARING PATHS WITH RATIONAL RELATIONS

Page 251: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Introduction

ECRPQ(REG) queries are still short of expressive power:

◮ RDF or biological networks:• Compare strings based on subsequence and subword relations.

◮ These relations are rational: Accepted by “asynchronous” automata.• RAT: Class of rational relations.

Bottomline:

◮ ECRPQ(RAT) evaluation:• Undecidable or very high complexity.

◮ Restricting the syntactic shape of queries yields tractability.

Page 252: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Rational relations

n-ary rational relation:Set of n-tuples (w1, . . . ,wn) of stringsaccepted by asynchronous automaton with n heads.

Page 253: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Rational relations

n-ary rational relation:Set of n-tuples (w1, . . . ,wn) of stringsaccepted by asynchronous automaton with n heads.

◮ The input strings are written in the n-tapes.

◮ At each step:The automaton enters a new state and move some tape heads.

Page 254: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Rational relations

n-ary rational relation:Set of n-tuples (w1, . . . ,wn) of stringsaccepted by asynchronous automaton with n heads.

◮ The input strings are written in the n-tapes.

◮ At each step:The automaton enters a new state and move some tape heads.

n-ary rational relation:Described by regular expression over alphabet (Σ ∪ {ǫ})n.

Page 255: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of rational relations

• All regular relations.

• The subsequence relation �ss defined by:

(( ⋃

a∈Σ

(a, ǫ))∗

b∈Σ

(b, b)

)∗

·( ⋃

a∈Σ

(a, ǫ))∗.

• The subword relation �sw defined by:

( ⋃

a∈Σ

(a, ǫ))∗·( ⋃

b∈Σ

(b, b))∗·( ⋃

a∈Σ

(a, ǫ))∗.

Page 256: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Examples of rational relations

• All regular relations.

• The subsequence relation �ss defined by:

(( ⋃

a∈Σ

(a, ǫ))∗

b∈Σ

(b, b)

)∗

·( ⋃

a∈Σ

(a, ǫ))∗.

• The subword relation �sw defined by:

( ⋃

a∈Σ

(a, ǫ))∗·( ⋃

b∈Σ

(b, b))∗·( ⋃

a∈Σ

(a, ǫ))∗.

Proposition

The pairs (w1,w2) such that w1 is the reversal of w2 is not rational.

Page 257: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

ECRPQ(RAT)

ECRPQ(RAT): Class of queries of the form

Ans(z, χ) ←∧

i (xi , πi , yi ),∧

j Sj(πj),

where each Sj is a rational relation (B., Figueira, Libkin (2012)).

Example: The ECRPQ(RAT) query

Ans(x , y) ← (x , π1, z), (y , π2,w), π1 �ss π2

computes x , y that are origins of paths ρ1 and ρ2 such that:

◮ λ(ρ1) is a subsequence of λ(ρ2).

Page 258: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of ECRPQ(RAT) queries

Evaluation of queries in ECRPQ(RAT) is undecidable, but:

◮ True if we allow only practically motivated rational relations?• For example, �ss and �sw.

Page 259: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of ECRPQ(RAT) queries

Evaluation of queries in ECRPQ(RAT) is undecidable, but:

◮ True if we allow only practically motivated rational relations?• For example, �ss and �sw.

Adding subword relation to ECRPQ(REG) leads to undecidability:

Theorem (B., Figueira and Libkin, 2012)

Evaluation for ECRPQ(REG ∪{�sw})) is undecidable.

Page 260: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Evaluation of ECRPQ(RAT) queries

Evaluation of queries in ECRPQ(RAT) is undecidable, but:

◮ True if we allow only practically motivated rational relations?• For example, �ss and �sw.

Adding subword relation to ECRPQ(REG) leads to undecidability:

Theorem (B., Figueira and Libkin, 2012)

Evaluation for ECRPQ(REG ∪{�sw})) is undecidable.

Adding subsequence preserves decidability, but at a very high cost:

Theorem (B., Figueira and Libkin, 2012)

Evaluation for (ECRPQ(REG ∪{�ss})) is decidable,but non-primitive-recursive.

Page 261: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Acyclic ECRPQ(RAT) queries

Acyclic ECRPQ(RAT) queries yield tractable data complexity.

◮ Queries of the form:

Ans(z)←∧

i≤k

(xi , πi , yi ), Li (πi ),∧

j

Sj(πj1 , πj2),

where the graph on {1, . . . , k} defined by edges (πj1 , πj2) is acyclic.

Page 262: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

Acyclic ECRPQ(RAT) queries

Acyclic ECRPQ(RAT) queries yield tractable data complexity.

◮ Queries of the form:

Ans(z)←∧

i≤k

(xi , πi , yi ), Li (πi ),∧

j

Sj(πj1 , πj2),

where the graph on {1, . . . , k} defined by edges (πj1 , πj2) is acyclic.

Acyclic ECRPQ(RAT) is not more expensive than ECRPQ(REG):

Theorem (B., Figueira and Libkin, 2012)

Evaluation of acyclic ECRPQ(RAT) queries is Pspace-complete.

◮ But polynomial in data complexity

Page 263: Centro de Investigacio´n de la Web Sem´antica Center for ...aidanhogan.com/teaching/ba/slides/pablo-baires-2016.pdf · Centro de Investigacio´n de la Web Sem´antica

What are your conclusions?