centro de investigacio´n de la web sem´antica center for...
TRANSCRIPT
Centro de Investigacion de la Web SemanticaCenter for Semantic Web Research
www.ciws.cl, @ciwschile
Four institutions
PUC, Chile
◮ Marcelo Arenas (Boss)◮ SW, data exchange,
semistrucured data
◮ Juan Reutter◮ SW, graph DBs, DLs
◮ Cristian Riveros◮ data exchange, semistr. data
◮ Jorge Baier◮ planning, search
Univ. of Talca
Renzo Angles (SW)
UTFSM
Carlos Buil (SW)
Univ. of Chile
◮ Pablo Barcelo (Deputy)◮ graph DBs, DB theory
◮ Claudio Gutierrez◮ SW, graph DBs
◮ Jorge Perez◮ SW, data exchange
◮ Aidan Hogan◮ SW, semistr. data
◮ Barbara Poblete◮ web mining, SNA
◮ Benjamın Bustos◮ multimedia
Query Languages for Graph DBs(Bridging the Gap Between Theory and Practice)
Pablo Barcelo and Aidan HoganDCC, Universidad de Chile
Center for Semantic Web Research (www.ciws.cl)
BACKGROUND AND OBJECTIVES
Graph databases
Trendy applications:
◮ Social network analysis
◮ Semantic web
◮ Scientific databases
◮ Software bug localization
◮ Geo-routing
Graph databases
Trendy applications:
◮ Social network analysis
◮ Semantic web
◮ Scientific databases
◮ Software bug localization
◮ Geo-routing
More in general:
◮ Wherever connections are as important as data
What is a graph database?
A data management system that exposes a graph data model.1
1Graph databases. Robinson, Webber, & Eifrem. O’Reilly, 2013.
What is a graph database?
A data management system that exposes a graph data model.1
Several existing graph DB engines and query languages:
◮ DEX/Sparksee - basic algebra
◮ IBM System G - Gremlin
◮ Neo4J - Cypher
◮ Oracle PGX - PGQL
◮ RDF stores (Virtuoso, AllegroGraph, Oracle, IBM) - SPARQL
1Graph databases. Robinson, Webber, & Eifrem. O’Reilly, 2013.
What graph databases are good for?
◮ Flexible modelling of interconnected data
◮ Agile evolution of the data model
◮ Scalable evaluation of join-intensive queries
My personal story
◮ Since 2009: Working on theory of query languages for graph DBs
◮ Since 2015: Working group of LDBC for the design of such language
My personal story
◮ Since 2009: Working on theory of query languages for graph DBs
◮ Since 2015: Working group of LDBC for the design of such language
Conclusion:
◮ Theory and practice are more connected than expected
Objectives
◮ Identify topics of common interest for theoreticians and developers
◮ Formalize relevant concepts (syntax, semantics, terminology, etc)
◮ Understand tradeoff expressiveness/efficiency
Overview of the course
////////Aidan/////will///////teach///////////Monday,///////////Tuesday,//////and///////////////Wednesday:
◮ ///////////Semantic//////web,////////RDF,/////and///////////SPARQL
◮ /////Half//////class////////based////on////////////////presentation,//////half///////based////on/////lab
Pablo will teach on Thursday and Friday:
◮ Property graphs, graph patterns, path queries
◮ Lectures presenting how theory of graph DBs relates to practice
Syllabus
Syllabus
//////////Monday:////////////Semantic///////web,//////RDF//////data/////////model,////////////Wikidata
//////////Tuesday:////////////SPARQL/////1.0,///////basic////////graph///////////patterns,///////basic/////////////operations//////////(union,/////////////projection,////////////optional),//////////solution/////////////modifiers,///////graph////////////construct/
//////////////Wednesday:////////////SPARQL/////1.1,///////////property////////paths,////////////negation,////////////////aggregation,//////////////distribution,///////////updates
Syllabus
//////////Monday:////////////Semantic///////web,//////RDF//////data/////////model,////////////Wikidata
//////////Tuesday:////////////SPARQL/////1.0,///////basic////////graph///////////patterns,///////basic/////////////operations//////////(union,/////////////projection,////////////optional),//////////solution/////////////modifiers,///////graph////////////construct/
//////////////Wednesday:////////////SPARQL/////1.1,///////////property////////paths,////////////negation,////////////////aggregation,//////////////distribution,///////////updates
Thursday: Graph DBs, property graphs, graph patterns, expressiveness,complexity of evaluation
Syllabus
//////////Monday:////////////Semantic///////web,//////RDF//////data/////////model,////////////Wikidata
//////////Tuesday:////////////SPARQL/////1.0,///////basic////////graph///////////patterns,///////basic/////////////operations//////////(union,/////////////projection,////////////optional),//////////solution/////////////modifiers,///////graph////////////construct/
//////////////Wednesday:////////////SPARQL/////1.1,///////////property////////paths,////////////negation,////////////////aggregation,//////////////distribution,///////////updates
Thursday: Graph DBs, property graphs, graph patterns, expressiveness,complexity of evaluation
Friday: Path query languages, regular path queries, ungrouping,expressiveness, complexity
Evaluation
35% in-class labs, 15% applied assignment, 50% theoretical assignment
THE DATA MODEL:PROPERTY GRAPHS
The data model is important as it must be:
◮ Flexible enough to accomodate scenarios of practical interest
◮ Simple enough to allow for a clean presentation
◮ Expressive enough for theoretical issues to appear in full force
The data model is important as it must be:
◮ Flexible enough to accomodate scenarios of practical interest
◮ Simple enough to allow for a clean presentation
◮ Expressive enough for theoretical issues to appear in full force
This is accomplished by the model of property graphs
A bit of history and context is needed
A bit of history and context is needed
The simplest possible graph data model: Directed graphs
A bit of history and context is needed
The simplest possible graph data model: Directed graphs
Clint EastwoodThe Bridges of
Madison County
Meryl Streep Out of Africa
Any problems with this data model?
Any problems with this data model?
It does not allow to classify relationships
Any problems with this data model?
It does not allow to classify relationships
Clint EastwoodThe Bridges of
Madison County
C. Eastwood acts in and directed this movie
Edge-labeled graphs
Allow for such classification
Edge-labeled graphs
Allow for such classification
Clint EastwoodThe Bridges of
Madison County
acts in
directs
Meryl Streepacts in
Formal definition of edge-labeled graphs
An edge-labelled graph G is a pair (V ,E ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of labeled edges; formally,
E ⊆ V × Lab× V ,
where Lab is a set of labels.
What if actors play different roles in movies?
What if actors play different roles in movies?
Peter Sellers
Lionel Mandrake
plays
Merkin Muffley
plays
Dr.Strangelove
movie
movie
18 minutes
34 minutes
screentime
screentime
Any problems with this approach?
Any problems with this approach?
We might need to modify the schema every time we add attributes
Any problems with this approach?
We might need to modify the schema every time we add attributes
More convenient to allow attributes in nodes/edges explicitly
Any problems with this approach?
We might need to modify the schema every time we add attributes
More convenient to allow attributes in nodes/edges explicitly
◮ This corresponds to what property graphs do
A property graph
role = Scorpio
ref = IMDb
name = Clint Eastwood
gender = male
n1 : Person
title = The Bridges of
Madison County
n2 : Movie
role = Robert Kincaid
ref = IMDb
e1 : acts in
e2 : directs
title = Dirty Harry
n5 : Movie
role = Harry Callahan
ref = IMDb
e5 : acts in
name = Andy Robinson
gender = male
n6 : Person
e6 : acts in
Another property graph
firstName = Julie
lastName = Freud
country = Chile
n1 : Person
firstName = John
lastName = Cook
gender = male
country = Chile
n2 : Person
e1 : knows
e2 : knows
name = U2
n3 : Tag
e3 : hasInterest
content = I love U2
language = en
n4 : Post
e4 : hasTag
content = Queen is awesome
n5 : Post
date = 14-09-15
e5 : likes
date = 15-03-14
e6 : likes
date = 23-10-15
e7 : likes
Yet another property graph
authored by
Author
name: Leonid Libkin
affil: U. of Edinburgh
Author
name: Robert Tarjan
affil: Princeton Univ.
Article
name: jacmHopcroft74
year: 1974
Venue
name: FOCS67
authored by
corresponding: YES
Author
Author
Author
Author
name: John Hopcroft
affil: Cornell Univ.
name: Jeff Ullman
affil: Stanford Univ.
name: Ron Fagin
affil: IBM Almaden
name: Moshe Vardi
affil: Rice University
Author
name: Limsoon Wong
affil: NU Singapore
Article
name: FOCSHopcroft67
year: 1967
Article
name: PODSUllman89
year: 1989
Article
name: PODSFaginUV83
year: 1983
Article
name: PODSVardi95
year: 1995
Article
name: PODSLibkin95
year: 1995
Article
name: IPLLibkinW95
year: 1995
Journal
name: JACM
Venue
Venue
Venue
Journal
name: PODS89
name: PODS83
name: PODS95
name: IPL
journal
journal
part of
part of
part of
part of
Conference
Conference
name: FOCS
name: PODS
series
series
part of
serie
s
series
authored by
authored by
authored by
authored by
authored by
corresponding: YES
corresponding: YES
authored by
authored by
cooresponding: YES
cooresponding: YES
authored by
authored by
authored by
cooresponding: YES
What is a property graph then?
◮ It is a graph
◮ It is directed
◮ It is labeled in nodes and edges
◮ Nodes and edges can be attributed
What is a property graph then?
◮ It is a graph
◮ It is directed
◮ It is labeled in nodes and edges
◮ Nodes and edges can be attributed
What is a property graph then?
◮ It is a graph
◮ It is directed
◮ It is labeled in nodes and edges
◮ Nodes and edges can be attributed
What is a property graph then?
◮ It is a graph
◮ It is directed
◮ It is labeled in nodes and edges
◮ Nodes and edges can be attributed
What is a property graph then?
◮ It is a graph
◮ It is directed
◮ It is labeled in nodes and edges
◮ Nodes and edges can be attributed
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
A formal definition
A property graph G is a tuple (V ,E , ρ, λ, σ), where:
1. V is a finite set of vertices (or nodes).
2. E is a finite set of edges.
3. ρ : E → (V × V ) is a total function.◮ ρ(e) = (v1, v2) indicates that e is an edge from v1 to v2.
4. λ : (V ∪ E )→ Lab is a total function with Lab a set of labels.◮ If λ(m) = ℓ, then ℓ is the label of node or edge m.
5. σ : (V ∪ E )× Prop→ Val is a partial functionwith Prop a finite set of properties and Val a set of values.
◮ If σ(m, p) = v , then v is the value of property p in m.
How would you represent property graphs?
How would you represent property graphs?
Labeled graphs: set of triples
Property graphs:set of triples + special triples describing property/value pairs of objects
GRAPH PATTERNS
Basic graph patterns are:
The basic unit for querying property graphs
Basic graph patterns are:
The basic unit for querying property graphs
A graph-structured query that is to be matched against a property graph
Basic graph patterns are:
The basic unit for querying property graphs
A graph-structured query that is to be matched against a property graph
Definition of basic graph pattern (bgp):
◮ A property graph that uses variables and constants
◮ Variables can appear as:node and edge names, labels, properties, and values
Examples of basic graph patterns
x3
x1 x2
acts in acts in
firstName = x1
lastName = x2
x10 : Person
firstName = x3
lastName = x4
x11 : Person
x12 : knows
x13 : knows
x8 = x9
x16 : x7
date = x5
x14 : likes
date = x6
x15 : likes
Evaluation of bgps
Set p(G ) of all possible “matchings” of bgp p in property graph G
Evaluation of bgps
Set p(G ) of all possible “matchings” of bgp p in property graph G
Matching:
◮ Replacement of variables in p by constants such that resultingproperty graph G ′ is “contained” in G
Example of evaluation
Example of evaluation
name = Clint Eastwood
x1 : Person
title = x5
x4 : Movie
x2 : x3title = x9
x8 : Movie
x6 : x7
Example of evaluation
name = Clint Eastwood
x1 : Person
title = x5
x4 : Movie
x2 : x3title = x9
x8 : Movie
x6 : x7
x1 x2 x3 x4 x5 x6 x7 x8 x9
n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry
n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C.
n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry
n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C.
n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.
n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.
n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.
n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.
n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry
Which semantic is the right semantics?
◮ Homomorphism-based semantics: All matchings are valid
◮ Isomorphism-based semantics: Only 1-1 matchings are valid
◮ Node/edge injective semantics: Self-defining
Which semantic is the right semantics?
◮ Homomorphism-based semantics: All matchings are valid
◮ Isomorphism-based semantics: Only 1-1 matchings are valid
◮ Node/edge injective semantics: Self-defining
Which semantic is the right semantics?
◮ Homomorphism-based semantics: All matchings are valid
◮ Isomorphism-based semantics: Only 1-1 matchings are valid
◮ Node/edge injective semantics: Self-defining
Which semantic is the right semantics?
◮ Homomorphism-based semantics: All matchings are valid
◮ Isomorphism-based semantics: Only 1-1 matchings are valid
◮ Node/edge injective semantics: Self-defining
Different semantics by example
Homomorphism-based semantics:
x1 x2 x3 x4 x5 x6 x7 x8 x9
n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X
n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X
n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. X
n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. X
n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. X
n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry X
Different semantics by example
Isomorphism-based semantics:
x1 x2 x3 x4 x5 x6 x7 x8 x9
n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X
n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry ✗
n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. ✗
n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗
n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗
n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗
n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗
n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry ✗
Different semantics by example
Node-injective semantics:
x1 x2 x3 x4 x5 x6 x7 x8 x9
n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X
n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X
n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗
n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗
n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗
n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗
n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry ✗
Different semantics by example
Edge-injective semantics:
x1 x2 x3 x4 x5 x6 x7 x8 x9
n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X
n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry X
n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. X
n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. X
n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C. ✗
n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C. ✗
n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry ✗
To make bgps usable we need more features
◮ Projection: Only some variables appear in the output, i.e., π(p(G ))
◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))
◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )
◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )
◮ Optional: Extend matchings of p1 with those in p2, if possible
To make bgps usable we need more features
◮ Projection: Only some variables appear in the output, i.e., π(p(G ))
◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))
◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )
◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )
◮ Optional: Extend matchings of p1 with those in p2, if possible
To make bgps usable we need more features
◮ Projection: Only some variables appear in the output, i.e., π(p(G ))
◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))
◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )
◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )
◮ Optional: Extend matchings of p1 with those in p2, if possible
To make bgps usable we need more features
◮ Projection: Only some variables appear in the output, i.e., π(p(G ))
◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))
◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )
◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )
◮ Optional: Extend matchings of p1 with those in p2, if possible
To make bgps usable we need more features
◮ Projection: Only some variables appear in the output, i.e., π(p(G ))
◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))
◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )
◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )
◮ Optional: Extend matchings of p1 with those in p2, if possible
To make bgps usable we need more features
◮ Projection: Only some variables appear in the output, i.e., π(p(G ))
◮ Filter: Prune based on the values of variables, i.e., σ(π(G ))
◮ Union: Take the union of two bgps, i.e., p1(G ) ∪ p2(G )
◮ Difference: Take the difference of two bgps, i.e., p1(G ) \ p2(G )
◮ Optional: Extend matchings of p1 with those in p2, if possible
Projection
Some variables in the output are distinguished
Projection
Some variables in the output are distinguished
Evaluation under projection:
◮ Consider all possible matchings of the graph pattern
◮ Remove columns that do not correspond to distinguished variables
Example of projection
name = Clint Eastwood
x1 : Person
title = x5
x4 : Movie
x2 : x3title = x9
x8 : Movie
x6 : x7
Example of projection
name = Clint Eastwood
x1 : Person
title = x5
x4 : Movie
x2 : x3title = x9
x8 : Movie
x6 : x7
x1 x2 x3 x4 x5 x6 x7 x8 x9
n1 e2 directs n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry
n1 e5 acts in n5 Dirty Harry e2 directs n2 The Bridges of Madison C.
n1 e1 acts in n2 The Bridges of Madison C. e5 acts in n5 Dirty Harry
n1 e5 acts in n5 Dirty Harry e1 acts in n2 The Bridges of Madison C.
n1 e1 acts in n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.
n1 e2 directs n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.
n1 e1 acts in n2 The Bridges of Madison C. e1 acts in n2 The Bridges of Madison C.
n1 e2 directs n2 The Bridges of Madison C. e2 directs n2 The Bridges of Madison C.
n1 e5 acts in n5 Dirty Harry e5 acts in n5 Dirty Harry
Example of projection
name = Clint Eastwood
x1 : Person
title = x5
x4 : Movie
x2 : x3title = x9
x8 : Movie
x6 : x7
x3 x5 x7 x9
directs The Bridges of Madison C. acts in Dirty Harry
acts in Dirty Harry directs The Bridges of Madison C.
acts in The Bridges of Madison C. acts in Dirty Harry
acts in Dirty Harry acts in The Bridges of Madison C.
acts in The Bridges of Madison C. directs The Bridges of Madison C.
directs The Bridges of Madison C. acts in The Bridges of Madison C.
acts in The Bridges of Madison C. acts in The Bridges of Madison C.
directs The Bridges of Madison C. directs The Bridges of Madison C.
acts in Dirty Harry acts in Dirty Harry
Example of filter
name = Clint Eastwood
x1 : Person
title = x5
year = z
x4 : Movie
x2 : x3title = x9
year = y
x8 : Movie
x6 : x7
FILTER x5 6= x9 AND y < 1985 AND z < 1985
Examples of union and difference
◮ Movies in which C. Eastwood acted or directed
◮ Movies in which C. Eastwood acted but not directed
Examples of union and difference
◮ Movies in which C. Eastwood acted or directed
◮ Movies in which C. Eastwood acted but not directed
Examples of union and difference
◮ Movies in which C. Eastwood acted or directed
◮ Movies in which C. Eastwood acted but not directed
OPTIONAL operator
Allows to match parts of the data only if available
◮ Important in the context of semistructured data
◮ Developed by the RDF community
◮ Corresponds to left-outer join in relational algebra
An example with OPTIONAL
The property graph
evaluated in
Paper
year: 1985
name: AKV85
Conference
name: STOC85
published in
Paper
year: 1985
name: Saf85
Review
text: "This paper..."
publ
ished
in
An example with OPTIONAL
The property graph
evaluated in
Paper
year: 1985
name: AKV85
Conference
name: STOC85
published in
Paper
year: 1985
name: Saf85
Review
text: "This paper..."
publ
ished
in
The pattern with optional
(x ,published in, y) OPTIONAL (y , evaluated in, z)
An example with OPTIONAL
A matching
name: STOC85
Paper
year: 1985
name: AKV85
published in
Paper
year: 1985
name: Saf85
Review
text: "This paper..."
publ
ished
in
evaluated in
Conference
The pattern with optional
(x ,published in, y) OPTIONAL (x , evaluated in, z)
An example with OPTIONAL
Another matching
name: STOC85
published in
Paper
year: 1985
name: Saf85
Review
text: "This paper..."
publ
ished
in
evaluated in
Paper
year: 1985
name: AKV85
Conference
The pattern with optional
(x ,published in, y) OPTIONAL (x , evaluated in, z)
Pattern trees
A way to formalize bgps + optional operator
Pattern trees
A way to formalize bgps + optional operator
A well-designed pattern tree (WDPT) is a tuple (T , λ) such that:
1. T is a rooted tree
2. λ maps each node t in T to a bgp
3. for every variable y , the set of nodes of T where y appears isconnected (well-designedness)
Example of WDPT
WDPT t
{(x , rec by , y), (x , publ in, “after2010”)}
{(x ,NMErank , z)} {(y , formed in, z ′)}
Example of WDPT
WDPT t
{(x , rec by , y), (x , publ in, “after2010”)}
{(x ,NMErank , z)} {(y , formed in, z ′)}
It represents the following graph pattern:
((x , rec by , y), (x , publ in, “after2010”) OPT (x ,NMErank , z)
)
OPT (y , formed in, z ′)
Evaluation of WDPTs
Consider WDPT t = (T , λ) with root r and subtree T ′ of T rooted in r :
Evaluation of WDPTs
Consider WDPT t = (T , λ) with root r and subtree T ′ of T rooted in r :
pT ′ := bgp formed by all triples in the nodes of T ′.
Evaluation of WDPTs
Consider WDPT t = (T , λ) with root r and subtree T ′ of T rooted in r :
pT ′ := bgp formed by all triples in the nodes of T ′.
A matching from p to G is a replacement hof some variables of p with constants such that:
1. There is subtree T ′ of T rooted in r such thath is a matching from pT ′ to G .
2. h cannot be extended any further.
COMPLEXITY OF EVALUATION FOR BGPS
A simplified scenario
◮ We consider edge-labeled graphs only
◮ Variables only occur as node names
◮ Bgps are equipped with projection
The decision problem we study
PROBLEM : Evaluation
INPUT : An edge-labeled graph G and a bgp p
QUESTION : Is it the case that p(G ) 6= ∅?
The decision problem we study
PROBLEM : Evaluation
INPUT : An edge-labeled graph G and a bgp p
QUESTION : Is it the case that p(G ) 6= ∅?
Recall that:
◮ p(G ) 6= ∅ ⇐⇒ there is a matching from p to G .
◮ Matching: replacement of variables in p by constants such thatresulting graph G ′ is contained in G (homomorphism)
How many such replacements we have to consider?
How many such replacements we have to consider?
At most nm, where:
◮ n = number of nodes in G
◮ m = number of variables in p
How many such replacements we have to consider?
At most nm, where:
◮ n = number of nodes in G
◮ m = number of variables in p
This is exponential in the size of the input!
How many such replacements we have to consider?
At most nm, where:
◮ n = number of nodes in G
◮ m = number of variables in p
This is exponential in the size of the input!
But can we do better?
How many such replacements we have to consider?
At most nm, where:
◮ n = number of nodes in G
◮ m = number of variables in p
This is exponential in the size of the input!
But can we do better?
◮ Most likely no, under complexity-theoretical assumptions
The complexity of evaluation
Proposition (Chandra and Merlin, 1977)
The evaluation problem is NP-complete
The complexity of evaluation
Proposition (Chandra and Merlin, 1977)
The evaluation problem is NP-complete
◮ For the upper bound:Guess a replacement, check it is a matching
◮ For the lower bound:Reduce from 3-colorability
Proof of the lower bound
Given an undirected graph G = (V ,E ), where:
◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)
Proof of the lower bound
Given an undirected graph G = (V ,E ), where:
◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)
Represent G as the bgp p:
{(xi ,E , xj ) | (i , j) ∈ E}
Proof of the lower bound
Given an undirected graph G = (V ,E ), where:
◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)
Represent G as the bgp p:
{(xi ,E , xj ) | (i , j) ∈ E}
Consider edge-labeled graph G ′ given as:
{(a,E , r ), (r ,E , a), (a,E , g ), (g ,E , a), (g ,E , r ), (r ,E , g)}
Proof of the lower bound
Given an undirected graph G = (V ,E ), where:
◮ V = {1, . . . , n} and E ⊆ V × V (symmetric)
Represent G as the bgp p:
{(xi ,E , xj ) | (i , j) ∈ E}
Consider edge-labeled graph G ′ given as:
{(a,E , r ), (r ,E , a), (a,E , g ), (g ,E , a), (g ,E , r ), (r ,E , g)}
=⇒ p(G ′) 6= ∅ iff G is 3-colorable
Does this proof represent a practical situation?
Does this proof represent a practical situation?
The bgp p is bigger than the edge-labeled graph G ′
◮ In practice, G ′ is large and p is much smaller
Does this proof represent a practical situation?
The bgp p is bigger than the edge-labeled graph G ′
◮ In practice, G ′ is large and p is much smaller
Question:Should p and G be be assigned the same “relevance” in evaluation?
A radical solution
Consider only G to be part of the input: The bgp p is fixed
A radical solution
Consider only G to be part of the input: The bgp p is fixed
This corresponds to the notion of data complexity (Vardi, 1981)
A radical solution
Consider only G to be part of the input: The bgp p is fixed
This corresponds to the notion of data complexity (Vardi, 1981)
Data complexity of bgp evaluation:
◮ Can be solved in polynomial time O(|G ||p|), since now p is fixed
◮ Actually, it belongs to a low complexity class inside LOGSPACE
◮ Evaluation of bgps in data complexity is “parallelizable”
A radical solution
Consider only G to be part of the input: The bgp p is fixed
This corresponds to the notion of data complexity (Vardi, 1981)
Data complexity of bgp evaluation:
◮ Can be solved in polynomial time O(|G ||p|), since now p is fixed
◮ Actually, it belongs to a low complexity class inside LOGSPACE
◮ Evaluation of bgps in data complexity is “parallelizable”
Data complexity: good explanation to success of evaluation in practice
Still, if G is very large...
... a running time of O(|G ||p|) is not so good
Still, if G is very large...
... a running time of O(|G ||p|) is not so good
A different idea:
◮ Understand the structure of bgps used in practice
◮ We have good indications that many of them are nearly-acyclic
Acyclic bgps
Those whose underlying undirected graph is acyclic
Acyclic bgps
Those whose underlying undirected graph is acyclic
◮ (x , a, y), (y , b, z), (x , c , z ′) is acyclic
◮ (x , a, y), (y , b, z), (x , c , z ′), (z ′, d , x), (x , e, x) is also acyclic
◮ (x , a, y), (y , b, z), (x , c , z) is not acyclic
Evaluation of acyclic bgps
Theorem (Yannakakis, 1981)
Acyclic bgps can be evaluated in linear time O(|G | · |p|).
Evaluation of acyclic bgps
Theorem (Yannakakis, 1981)
Acyclic bgps can be evaluated in linear time O(|G | · |p|).
Idea of the proof:
◮ Bottom-up dynamic programming algorithm
◮ Labels each node v with the set
{h(v) | h is a matching of subtree of p rooted in v}
In practice bgps are not acyclic, but “nearly” acyclic
Question: How do we measure the degree of acyclicity of a bgp?
In practice bgps are not acyclic, but “nearly” acyclic
Question: How do we measure the degree of acyclicity of a bgp?
Approach taken in the literature:
◮ Use notion of treewidth: measures the degree of acyclicity of a graph
◮ Measure the degree of acyclicity of the underlying graph of the bgp
Treewidth of a graph
Measures how much an undirected graph “resembles” a tree:
◮ Acyclic graphs have treewidth one
◮ Cycles have treewidth two
◮ The complete graph on k elements has treewidth (k − 1)
◮ The k × k grid has treewidth k
Tree decompositions
Notion used to define treewidth
◮ It decomposes a graph as a tree while preserving connectivity
Tree decompositions
Notion used to define treewidth
◮ It decomposes a graph as a tree while preserving connectivity
A tree decomposition of G = (V ,E ) is a pair (T , λ) such that:
◮ T is a tree
◮ λ assigns a set of nodes in V to each node t in T
◮ If {u, v} ∈ E , then u, v appear together in one such λ(t)
◮ For u ∈ V , the set of λ(t)’s that contain u is connected
Tree decompositions
Notion used to define treewidth
◮ It decomposes a graph as a tree while preserving connectivity
A tree decomposition of G = (V ,E ) is a pair (T , λ) such that:
◮ T is a tree
◮ λ assigns a set of nodes in V to each node t in T
◮ If {u, v} ∈ E , then u, v appear together in one such λ(t)
◮ For u ∈ V , the set of λ(t)’s that contain u is connected
The width of (T , λ) is(max {|λ(t)| : t ∈ T} − 1
)
Definition of treewidth
The treewidth of G = (V ,E ) is
◮ The minimum width of its tree decompositions
Definition of treewidth
The treewidth of G = (V ,E ) is
◮ The minimum width of its tree decompositions
The treewidth of a bgp is the treewidth of its underlying graph
Bgps of small treewidth are well-behaved
Theorem (Chekuri and Rajaraman, 1997)
Evaluation for bgps of treewidth k can be solved in time
O(|G |k · |p|).
Bgps of small treewidth are well-behaved
Theorem (Chekuri and Rajaraman, 1997)
Evaluation for bgps of treewidth k can be solved in time
O(|G |k · |p|).
Idea of the proof:
◮ Compute a tree decomposition (T , λ) of p of width k
◮ Perform a dyn programming procedure on (T , λ) as for acyclic bgps
Can one improve on bounded treewidth?
Can one improve on bounded treewidth?
Grohe’s Theorem (Grohe, 2002): The following are equivalent
◮ A class C of bgps is tractable
◮ The treewidth of the bgps in C is bounded modulo equivalence
OPTIMIZATION OF BGPS
A theoretical notion of optimization
Find an equivalent bgp of minimal size
A theoretical notion of optimization
Find an equivalent bgp of minimal size
Bgps p and p′ are equivalent if
◮ p(G ) = p′(G ), over every edge-labeled graph
A theoretical notion of optimization
Find an equivalent bgp of minimal size
Bgps p and p′ are equivalent if
◮ p(G ) = p′(G ), over every edge-labeled graph
We assume that bgps are equipped with projection
◮ We write p(x) to denote that x are the distinguished variables of p
Equivalence of bgps is the same than evaluation
Theorem (Chandra and Merlin, 1977)
The following are equivalent for bgps p(x) and p′(x ′):
◮ p(x) and p′(x ′) are equivalent
◮ x ∈ p′(p) and x ′ ∈ p(p′)
Equivalence of bgps is the same than evaluation
Theorem (Chandra and Merlin, 1977)
The following are equivalent for bgps p(x) and p′(x ′):
◮ p(x) and p′(x ′) are equivalent
◮ x ∈ p′(p) and x ′ ∈ p(p′)
As a corollary:
◮ Checking equivalence of bgps is NP-complete
Equivalence of bgps is the same than evaluation
Theorem (Chandra and Merlin, 1977)
The following are equivalent for bgps p(x) and p′(x ′):
◮ p(x) and p′(x ′) are equivalent
◮ x ∈ p′(p) and x ′ ∈ p(p′)
As a corollary:
◮ Checking equivalence of bgps is NP-complete
This is considered to be a positive result (bgps are small)
Minimization of bgps
Theorem (Hell and Nesetril, 1992)
A minimal equivalent bgp of p is
◮ unique (up to renaming of variables)
◮ always contained in p
◮ corresponds to the core of p
Minimization of bgps
Theorem (Hell and Nesetril, 1992)
A minimal equivalent bgp of p is
◮ unique (up to renaming of variables)
◮ always contained in p
◮ corresponds to the core of p
As a corollary:
◮ Minimization of bgps can be carried out in PNP (feasible)
NAVIGATION
Graphs are there to be navigated
Traversing the edges of the graph recursively is useful for:
◮ Detecting implicit relations over the data
◮ Allowing more flexibility at the moment of querying
Recall our property graph?
authored by
Author
name: Leonid Libkin
affil: U. of Edinburgh
Author
name: Robert Tarjan
affil: Princeton Univ.
Article
name: jacmHopcroft74
year: 1974
Venue
name: FOCS67
authored by
corresponding: YES
Author
Author
Author
Author
name: John Hopcroft
affil: Cornell Univ.
name: Jeff Ullman
affil: Stanford Univ.
name: Ron Fagin
affil: IBM Almaden
name: Moshe Vardi
affil: Rice University
Author
name: Limsoon Wong
affil: NU Singapore
Article
name: FOCSHopcroft67
year: 1967
Article
name: PODSUllman89
year: 1989
Article
name: PODSFaginUV83
year: 1983
Article
name: PODSVardi95
year: 1995
Article
name: PODSLibkin95
year: 1995
Article
name: IPLLibkinW95
year: 1995
Journal
name: JACM
Venue
Venue
Venue
Journal
name: PODS89
name: PODS83
name: PODS95
name: IPL
journal
journal
part of
part of
part of
part of
Conference
Conference
name: FOCS
name: PODS
series
series
part of
serie
s
series
authored by
authored by
authored by
authored by
authored by
corresponding: YES
corresponding: YES
authored by
authored by
cooresponding: YES
cooresponding: YES
authored by
authored by
authored by
cooresponding: YES
An example of navigation in such graph
Find pairs of authors linked by a coauthorship sequence
An example of navigation in such graph
Find pairs of authors linked by a coauthorship sequence
(x , (authored by−1 · authored by)∗ , y
)
Regular path queries (RPQs)
Expressions of the form (x , r , y), where r is a regular expression:
r , r ′ := ∅ | ǫ | a | a−1 | r ∪ r ′ | r · r ′ | r∗ (a ∈ Lab)
Regular path queries (RPQs)
Expressions of the form (x , r , y), where r is a regular expression:
r , r ′ := ∅ | ǫ | a | a−1 | r ∪ r ′ | r · r ′ | r∗ (a ∈ Lab)
Evaluation:
◮ Set r(G ) of pairs of nodes linked by a path whose label is in L(r)
Paths by example
A path in the edge-labeled graph:
:Ronald FagininPods:83
:John E. HopcroftinFocs:FOCS8
journal:jacm Jacm:HopcroftT74 :Robert E Tarjan
:Jeffrey Ullman
conf:focs Focs:HopU67a
:Moshe Y. Vardi
series
series
journal
partOf
partOf
creatorcreator
creatorcreator
creator
creator
Pods:FaginUV83creator
creator:Leonid Libkin
partOf
creatorcreator
:Limsoon Wongjournal
inPods:89
series
Pods:Ullman89creatorpartOf
Pods:Vardi95
conf:pods
partOf
IPL:LibkinW95
inPods:95
Pods:Libkin95
journal:IPL
creator
series
Labels of path by example
The label of such path is partOf series:
:Ronald FagininPods:83
:John E. HopcroftinFocs:FOCS8
journal:jacm Jacm:HopcroftT74 :Robert E Tarjan
:Jeffrey Ullman
conf:focs Focs:HopU67a
:Moshe Y. Vardi
series
series
journal
partOf
partOf
creatorcreator
creatorcreator
creator
creator
Pods:FaginUV83creator
creator:Leonid Libkin
partOf
creatorcreator
:Limsoon Wongjournal
inPods:89
series
Pods:Ullman89creatorpartOf
Pods:Vardi95
conf:pods
partOf
IPL:LibkinW95
inPods:95
Pods:Libkin95
journal:IPL
creator
series
More formally
Given an edge-labeled graph G = (V ,E ) such that
◮ E ⊆ V × Lab× V
More formally
Given an edge-labeled graph G = (V ,E ) such that
◮ E ⊆ V × Lab× V
A path in G is a sequence of consecutive edges in G , i.e.,
v1a1−→ v2
a2−→ v3 · · · vkak−→ vk+1
More formally
Given an edge-labeled graph G = (V ,E ) such that
◮ E ⊆ V × Lab× V
A path in G is a sequence of consecutive edges in G , i.e.,
v1a1−→ v2
a2−→ v3 · · · vkak−→ vk+1
The label of such path is the word
a1a2 · · · ak ∈ Lab∗
Evaluation problem for RPQs
PROBLEM : Evaluation for RPQs
INPUT : An edge-labeled graph G ,
a pair (u, v) of nodes in G , and a RPQ (x , r , y)
QUESTION : Is it the case that (u, v) ∈ r(G )?
(i.e., is there a path from u to v with label in L(r)?)
The complexity of evaluation for RPQs
Theorem (Folklore)
The evaluation of RPQs can be solved in linear time O(|G | · |r |)
The complexity of evaluation for RPQs
Theorem (Folklore)
The evaluation of RPQs can be solved in linear time O(|G | · |r |)
For the proof:Exploit the connection between edge-labeled graphs, RPQs, and NFAs
Non-deterministic finite automata (NFAs)
Tuple A = (Q, q0,F , δ), where:
◮ Q is a finite set of states
◮ q0 ∈ Q is the initial state
◮ F ⊆ Q are the final states
◮ δ ⊆ Q × Lab× Q is the transition relation
Acceptance by NFAs
L(A) = set of words “accepted” by A
Acceptance by NFAs
L(A) = set of words “accepted” by A
When is the worda1a2 . . . ak ∈ Lab∗
accepted by A?
Acceptance by NFAs
L(A) = set of words “accepted” by A
When there are states
q0a1q1a2q2 . . . qk−1akqk ∈ Lab∗
such that qk ∈ F and (qi−1, ai , qi ) ∈ δ for each 1 ≤ i ≤ k
The emptiness problem for NFAs
Given an NFA A, is L(A) 6= ∅?
The emptiness problem for NFAs
Given an NFA A, is L(A) 6= ∅?
◮ Can be solved in time O(|A|) by DFS
The emptiness problem for NFAs
Given an NFA A, is L(A) 6= ∅?
◮ Can be solved in time O(|A|) by DFS
Given NFAs A1 and A2, is L(A1) ∩ L(A2) 6= ∅?
The emptiness problem for NFAs
Given an NFA A, is L(A) 6= ∅?
◮ Can be solved in time O(|A|) by DFS
Given NFAs A1 and A2, is L(A1) ∩ L(A2) 6= ∅?
◮ Construct the cross product A1 ×A2 in time O(|A1| · |A2|)
◮ Check if L(A1 ×A2) 6= ∅ in time O(|A1| · |A2|)
Edge-labeled graphs can be seen as NFAs
◮ Nodes are states
◮ Edges ua−→ v are transitions
◮ There are no initial and final states
Regular expressions can be turned into NFAs
Proposition
There is an algorithm that given a regular expression r
◮ constructs in time O(|r |) an NFA A such that L(A) = L(r)
Proof of the theorem on evaluation of RPQs
Theorem
The evaluation of RPQs can be solved in linear time O(|G | · |r |)
Proof of the theorem on evaluation of RPQs
Theorem
The evaluation of RPQs can be solved in linear time O(|G | · |r |)
Proof idea:
◮ Compute in linear time from r an equivalent NFA A◮ Compute in linear time the NFA (G , u, v):
◮ Obtained from G by setting u and v as initial and final state, resp.
◮ Then (u, v) ∈ r(G ) iff NFA (G , u, v)×A accepts a word
◮ Check the latter in time O(|G | · |A|) = O(|G | · |r |)
A semantics based on simple paths?
Simple paths: No node is repeated
A semantics based on simple paths?
Simple paths: No node is repeated
Simple paths semantics:
◮ Motivated by apps for which simple paths are more natural
◮ Studied in the context of semistructured data in the early 90s
◮ Revival due to application in early versions of SPARQL 1.1
RPQs under simple paths semantics
Problem: Evaluation for RPQs under simple paths
Input: An edge-labeled graph G ,
a pair (u, v) of nodes in G , and a RPQ (x , r , y)
Question: Is there a simple path from
u to v in G whose label is in L(r)?
Complexity of finding regular simple paths
Theorem (Mendelzon and Wood, 1989)
Evaluation for RPQs under simple paths is NP-complete
◮ This holds even if r = (aa)∗
◮ In fact, even if r = w∗, for w any word of length ≥ 2
Complexity of finding regular simple paths
Theorem (Mendelzon and Wood, 1989)
Evaluation for RPQs under simple paths is NP-complete
◮ This holds even if r = (aa)∗
◮ In fact, even if r = w∗, for w any word of length ≥ 2
Complexity of finding regular simple paths
Theorem (Mendelzon and Wood, 1989)
Evaluation for RPQs under simple paths is NP-complete
◮ This holds even if r = (aa)∗
◮ In fact, even if r = w∗, for w any word of length ≥ 2
Complexity of finding regular simple paths
Theorem (Mendelzon and Wood, 1989)
Evaluation for RPQs under simple paths is NP-complete
◮ This holds even if r = (aa)∗
◮ In fact, even if r = w∗, for w any word of length ≥ 2
Proof:
◮ The following is NP-complete (Lapaugh and Papadimitriou, 1984)• Given a digraph G = (V ,E ) and two nodes u, v ,
is there a directed path from u to v of even length?
In conclusion
◮ Evaluation of RPQs under simple paths is NP-complete
◮ This holds even in data complexity
◮ This casts serious doubts on the applicability of the semantics
EXTENDING BGPS WITH NAVIGATION
Conjunctive regular path queries (CRPQs)
Obtained by combining bgps + RPQs
Example of C2RPQ
The C2RPQ
(x , creator−, y), (y , partOf · series, z), (y , creator , u)
computes pairs (a1, a2) that are coauthors of a conference paper
:Ronald FagininPods:83
:John E. HopcroftinFocs:FOCS8
conf:pods
journal:jacm Jacm:HopcroftT74 :Robert E Tarjan
:Jeffrey Ullman
conf:focs Focs:HopU67a
:Moshe Y. Vardi
series
series
journal
partOf
partOf
creatorcreator
creatorcreator
creator
creator
Pods:FaginUV83creator
creator:Leonid Libkin
partOf
creatorcreator
:Limsoon Wongjournal
creatorPods:Ullman89inPods:89
partOf
series
Pods:Libkin95
IPL:LibkinW95
partOf creatorinPods:95
journal:IPL
series
Pods:Vardi95
Example of C2RPQ
The C2RPQ
(x , creator−, y), (y , partOf · series, z), (y , creator , u)
computes pairs (a1, a2) that are coauthors of a conference paper
creator
inPods:83
:John E. HopcroftinFocs:FOCS8
journal:jacm Jacm:HopcroftT74 :Robert E Tarjan
:Jeffrey Ullman
conf:focs Focs:HopU67apartOf
creatorcreator
creatorcreator
Pods:FaginUV83
creator:Leonid Libkin
partOf
creatorcreator
:Limsoon Wongjournal
creator
partOfseries creatorconf:pods :Ronald Fagin
zy
u
x
:Moshe Y. Vardi
series
journal
series
partOfPods:Ullman89
creatorinPods:89
Pods:Vardi95inPods:95partOf
IPL:LibkinW95
Pods:Libkin95
series
journal:IPL
creator
Example of C2RPQ
The C2RPQ
(x , creator−, y), (y , partOf · series, z), (y , creator , u)
computes pairs (a1, a2) that are coauthors of a conference paper
inPods:83
:John E. HopcroftinFocs:FOCS8
conf:pods
journal:jacm Jacm:HopcroftT74 :Robert E Tarjan
:Jeffrey Ullman
conf:focs Focs:HopU67a
:Moshe Y. Vardi
series
journal
partOf
creatorcreator
creatorcreator
creator
Pods:FaginUV83
creator:Leonid Libkin
partOf
creatorcreator
:Limsoon Wongjournal
creator
partOfseries creator:Ronald Fagin
a1
a2creator
inPods:89 Pods:Ullman89
series
partOf
IPL:LibkinW95
series
journal:IPL
creatorpartOf
Pods:Libkin95
Pods:Vardi95inPods:95
Formal definition of CRPQS
Set q of the form:
{(x1, L1, y1), . . . , (xm, Lm, ym)},
such that the xi , yi ’s are variables and the ri ’s reg exps
Formal definition of CRPQS
Set q of the form:
{(x1, L1, y1), . . . , (xm, Lm, ym)},
such that the xi , yi ’s are variables and the ri ’s reg exps
Evaluation: Set q(G ) of all possible “matchings” of q in G , i.e.,
◮ Replacements of the xi , yi by constants ci , disuch that (ci , di ) is linked by path with label in L(ri )
CRPQs vs RPQs
The CRPQ
(x , creator−, y), (y , partOf · series, z), (y , creator , u)
is not expressible as a RPQ over the graph database:
:Ronald FagininPods:83
:John E. HopcroftinFocs:FOCS8
conf:pods
journal:jacm Jacm:HopcroftT74 :Robert E Tarjan
:Jeffrey Ullman
conf:focs Focs:HopU67a
:Moshe Y. Vardi
series
series
journal
partOf
partOf
creatorcreatorcreatorcreator
creator
creator
Pods:FaginUV83creator
creator:Leonid Libkin
partOf
creatorcreator
:Limsoon Wongjournal
creatorPods:Ullman89inPods:89
partOf
series
Pods:Libkin95
IPL:LibkinW95
partOf creatorinPods:95
journal:IPL
series
Pods:Vardi95
Evaluation for CRPQs
PROBLEM : Evaluation for CRPQs
INPUT : An edge-labeled graph G and a CRPQ q
QUESTION : Is it the case that q(G ) 6= ∅?
Complexity of evaluation for CRPQs
Proposition (Folklore)
Evaluation of CRPQs is NP-complete
◮ But polynomial in data complexity
Complexity of evaluation for CRPQs
Proposition (Folklore)
Evaluation of CRPQs is NP-complete
◮ But polynomial in data complexity
Restrictions on degree of acyclicity yield tractability
QUERIES ON GRAPHS WITH DATA
Querying the data
So far queries only talk about the topology of the data
◮ We have restricted ourselves to data graphs
Querying the data
So far queries only talk about the topology of the data
◮ We have restricted ourselves to data graphs
Queries that combine topology and data are important in practice:
◮ Example: People of the same age connected by professional links
Querying the data
So far queries only talk about the topology of the data
◮ We have restricted ourselves to data graphs
Queries that combine topology and data are important in practice:
◮ Example: People of the same age connected by professional links
We need to develop navigational queries for property graphs
Data graphs and data paths
We work with data graphs as a simplification of property graphs
Data graphs and data paths
We work with data graphs as a simplification of property graphs
A data graph G is a tuple (V ,E , κ), where:
◮ (V ,E ) is an edge-labeled graph
◮ κ : V → D, where D is a set of data values
Data graphs and data paths
We work with data graphs as a simplification of property graphs
A data graph G is a tuple (V ,E , κ), where:
◮ (V ,E ) is an edge-labeled graph
◮ κ : V → D, where D is a set of data values
With each pathv1
a1−→ v2 · · · vkak−→ vk+1
we associate a data path
κ(v1)a1−→ κ(v2) · · · κ(vk)
ak−→ κ(vk+1)
The choice of a formalism
Formalism for querying data paths has to be chosen with care:
Theorem (Libkin and Vrgoc, 2012)
The following problem is NP-complete:
◮ Is there a data path from u to v with no repeated data value?
The choice of a formalism
Formalism for querying data paths has to be chosen with care:
Theorem (Libkin and Vrgoc, 2012)
The following problem is NP-complete:
◮ Is there a data path from u to v with no repeated data value?
If a language expresses this property:
◮ It is NP-hard in data complexity ⇒ impractical
Regular expressions with memory (REMs)
◮ They permit to specify when data values are remembered and used
◮ Data values are remembered in k registers {x1, . . . , xk}
◮ At any point we can compare a data value with one in the registers
◮ Based on the notion of register automata
REM: Example
Consider the REM ↓x .a+[x=].
Intuition:
◮ Store the current data value d in register x
◮ After reading a word in a+ check that d is seen again
Evaluation: Pairs (v , v ′) of nodes:
◮ Linked by a path labeled in a+.
◮ v and v ′ contain the same data value.
REM: Conditions
Conditions: Compare a data value with the ones in the registers
Conditions over {x1, . . . , xk} are given by the grammar:
c := x=i | ¬c | c ∧ c (1 ≤ i ≤ k)
Then (d , τ) |= c for d ∈ D and τ = (d1, . . . , dk) ∈ Dk :
◮ (d , τ) |= x=i iff d = di
◮ Boolean combinations are standard
REMs: Syntax and semantics (Intuition)
REMs over Σ and {x1, . . . , xk} are defined by grammar:
e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e
where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}
REMs: Syntax and semantics (Intuition)
REMs over Σ and {x1, . . . , xk} are defined by grammar:
e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e
where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}
Intuition: Evaluation of REM e on data graph G is:• pairs (v , v ′) of nodes linked by a data path ρ such that ρ |= e, where:
REMs: Syntax and semantics (Intuition)
REMs over Σ and {x1, . . . , xk} are defined by grammar:
e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e
where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}
Intuition: Evaluation of REM e on data graph G is:• pairs (v , v ′) of nodes linked by a data path ρ such that ρ |= e, where:
◮ ρ |= e[c] if and only if
τ ∈ Dk st (κ(vk+1), τ) |= c
κ(v1)a1−→ κ(v2) · · · κ(vk)
ak−→ κ(vk+1)︸ ︷︷ ︸
can be parsed wrt estarting from empty registers
ρD
finishing in register value
REMs: Syntax and semantics (Intuition)
REMs over Σ and {x1, . . . , xk} are defined by grammar:
e := ε | a | e ∪ e | e · e | e+ | e[c] | ↓ x .e
where a ∈ Lab, c condition, and x tuple in {x1, . . . , xk}
Intuition: Evaluation of REM e on data graph G is:• pairs (v , v ′) of nodes linked by a data path ρ such that ρ |= e, where:
◮ ρ |=↓ x .e if and only ifρD
κ(v1)a1−→ κ(v2) · · · κ(vk)
ak−→ κ(vk+1)︸ ︷︷ ︸
can be parsed wrt estarting from the register valuethat assigns κ(v1) to each x ∈ x
REM: No closure under complement
Consider the REM Σ∗ · (↓x .Σ+[x=]) · Σ∗:
REM: No closure under complement
Consider the REM Σ∗ · (↓x .Σ+[x=]) · Σ∗:
◮ Defines pairs of nodes linked by data path ρ such that:• ρ contains the same data value twice.
◮ The complement of this language is the NP-complete propertymentioned before
REM: No closure under complement
Consider the REM Σ∗ · (↓x .Σ+[x=]) · Σ∗:
◮ Defines pairs of nodes linked by data path ρ such that:• ρ contains the same data value twice.
◮ The complement of this language is the NP-complete propertymentioned before
Corollary:
◮ REMs are not closed under complement
Complexity of REM evaluation
Theorem (Libkin, Vrgoc, 2012)
Evaluation for REMs is Pspace-complete
◮ But polynomial in data complexity
Complexity of REM evaluation
Theorem (Libkin, Vrgoc, 2012)
Evaluation for REMs is Pspace-complete
◮ But polynomial in data complexity
Both bounds extend to the class of conjunctive REMs
PATH QUERIES
CRPQs and path queries
CRPQs fall short of expressive power for applications that need:
◮ to include paths in the output of a query, and
◮ to define complex relationships among labels of paths.
CRPQs and path queries
CRPQs fall short of expressive power for applications that need:
◮ to include paths in the output of a query, and
◮ to define complex relationships among labels of paths.
Examples:
◮ Semantic Web queries:• establish semantic associations among paths.
◮ Biological applications:• compare paths based on similarity.
◮ Route-finding applications:• compare paths based on length or number of occurences of labels.
◮ Data provenance and semantic search over the Web:• require returning paths to the user.
Path comparisons
We use a set S of relations on words.
◮ Example: S may contain• Unary relations: Regular, context-free languages, etc.• Binary relations: prefix, equal length, subsequence, etc.
◮ Comparisons among labels of paths = Pertenence to some S ∈ S.• Example: w1 is a substring of w2.
◮ We assume S contains all regular languages.
Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z , ) ← (x1, L1, y1), . . . , (xm, Lm, ym),
◮ by joining each pair (xi , yi ) with a path variable πi ,
◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,
◮ projecting some of πi ’s as a tuple χ in the output.
Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z, ) ← (x1, π1, y1), . . . , (xm, πm, ym),
◮ by joining each pair (xi , yi ) with a path variable πi ,
◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,
◮ projecting some of πi ’s as a tuple χ in the output.
Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z, ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
1≤j≤t Sj(πj )
◮ by joining each pair (xi , yi ) with a path variable πi ,
◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,
◮ projecting some of πi ’s as a tuple χ in the output.
Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
1≤j≤t Sj(πj )
◮ by joining each pair (xi , yi ) with a path variable πi ,
◮ comparing labels of paths in πj wrt Sj ∈ S• for πj a tuple of path variables among the πi ’s,
◮ projecting some of πi ’s as a tuple χ in the output.
Extended CRPQs and our requirements
ECRPQs meet our requirements:
Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
1≤j≤t Sj(πj)
Extended CRPQs and our requirements
ECRPQs meet our requirements:
Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
1≤j≤t Sj(πj)
◮ They allow to export paths in the output.
◮ They allow to compare labels of paths with relations Sj ∈ S.
Extended CRPQs and our requirements
ECRPQs meet our requirements:
Ans(z, χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
1≤j≤t Sj(πj )
◮ They allow to export paths in the output.
◮ They allow to compare labels of paths with relations Sj ∈ S.
Evaluation of ECRPQs
Evaluation of the ECRPQ(S)
θ(z, χ) : Ans(z , χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
j Sj(πj )
is defined in terms of “matchings”
Evaluation of ECRPQs
Evaluation of the ECRPQ(S)
θ(z, χ) : Ans(z , χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
j Sj(πj )
is defined in terms of “matchings”
Matching for ECRPQs: Same than for CRPQs but:
◮ Each πi is mapped to a path ρi in the graph DB.
◮ If πj = (πj1 , . . . , πjk ) then: (Lab(ρj1), . . . , Lab(ρjk ))︸ ︷︷ ︸
the labels of (ρj1 , . . . , ρjk )
∈ Sj .
Evaluation of ECRPQs
Evaluation of the ECRPQ(S)
θ(z, χ) : Ans(z , χ) ← (x1, π1, y1), . . . , (xm, πm, ym),∧
j Sj(πj )
is defined in terms of “matchings”
Matching for ECRPQs: Same than for CRPQs but:
◮ Each πi is mapped to a path ρi in the graph DB.
◮ If πj = (πj1 , . . . , πjk ) then: (Lab(ρj1), . . . , Lab(ρjk )) ∈ Sj .
The evaluation θ(G) of θ(z, χ) on G is the set:
{(σ(z), σ(χ)) | σ is matching from θ to G}.
Considerations about ECRPQ(S)
• ECRPQ(S) extends the class of CRPQs.
◮ Ans(z)←∧
i(xi , Li , yi ) = Ans(z) ←∧
i(xi , πi , yi ), Li (πi ).
• Expressiveness and complexity of ECRPQ(S):
◮ Depends on the class S.
• We study two such classes with roots in formal language theory:
◮ Regular relations (Elgot, Mezei (1965)).
◮ Rational relations (Nivat (1968)).
COMPARING PATHS WITH REGULAR RELATIONS
Introduction
• Regular relations: Regular languages for relations of any arity
◮ REG: Class of regular relations.
• Bottomline:
ECRPQ(REG): Reasonable expressiveness and complexity.
Regular relations
n-ary regular relation:
Set of n-tuples (w1, . . . ,wn) of stringsaccepted by synchronous automaton over Σn.
Regular relations
n-ary regular relation:
Set of n-tuples (w1, . . . ,wn) of stringsaccepted by synchronous automaton over Σn.
◮ The input strings are written in the n-tapes.
◮ Shorter strings are padded with symbol ⊥.
◮ At each step:The automaton simultaneously reads next symbol on each tape.
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a
w3 = b b · · ·...
...
wn = a b b · · · a c
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥...
...
wn = a b b · · · a c ⊥
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥...
...
wn = a b b · · · a c ⊥
⇑
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥...
...
wn = a b b · · · a c ⊥
⇑
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥...
...
wn = a b b · · · a c ⊥
⇑
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥...
...
wn = a b b · · · a c ⊥
⇑
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥...
...
wn = a b b · · · a c ⊥
⇑
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥...
...
wn = a b b · · · a c ⊥
⇑
Examples of regular relations
• All regular languages.
• The prefix relation defined by:
( ⋃
a∈Σ
(a, a))∗·( ⋃
a∈Σ
(a,⊥))∗.
• The equal length relation defined by:
( ⋃
a,b∈Σ
(a, b))∗.
• Pairs of strings at edit distance at most k , for fixed k ≥ 0.
Examples of regular relations
• All regular languages.
• The prefix relation defined by:
( ⋃
a∈Σ
(a, a))∗·( ⋃
a∈Σ
(a,⊥))∗.
• The equal length relation defined by:
( ⋃
a,b∈Σ
(a, b))∗.
• Pairs of strings at edit distance at most k , for fixed k ≥ 0.
Proposition
The subsequence, subword and suffix relations are not regular.
ECRPQ(REG)
ECRPQ(REG): Class of queries of the form
Ans(z, χ) ←∧
i (xi , πi , yi ),∧
j Sj(πj),
where each Sj is a regular relation (B., Libkin, Lin, Wood (2012)).
ECRPQ(REG)
ECRPQ(REG): Class of queries of the form
Ans(z, χ) ←∧
i (xi , πi , yi ),∧
j Sj(πj),
where each Sj is a regular relation (B., Libkin, Lin, Wood (2012)).
Example: The ECRPQ(REG) query
Ans(x , y) ← (x , π1, z), (z , π2, y), a∗(π1), b
∗(π2), equal length(π1, π2)
computes pairs of nodes linked by a path labeled in {anbn | n ≥ 0}.
ECRPQ(REG)
ECRPQ(REG): Class of queries of the form
Ans(z, χ) ←∧
i (xi , πi , yi ),∧
j Sj(πj),
where each Sj is a regular relation (B., Libkin, Lin, Wood (2012)).
Example: The ECRPQ(REG) query
Ans(x , y) ← (x , π1, z), (z , π2, y), a∗(π1), b
∗(π2), equal length(π1, π2)
computes pairs of nodes linked by a path labeled in {anbn | n ≥ 0}.
Corollary
ECRPQ(REG) properly extends the class of CRPQs.
Complexity of evaluation of ECRPQ(REG)
Theorem (B., Libkin, Lin and Wood, 2012)
Evaluation for ECRPQ(REG) is Pspace-complete.
◮ But polynomial in data complexity
COMPARING PATHS WITH RATIONAL RELATIONS
Introduction
ECRPQ(REG) queries are still short of expressive power:
◮ RDF or biological networks:• Compare strings based on subsequence and subword relations.
◮ These relations are rational: Accepted by “asynchronous” automata.• RAT: Class of rational relations.
Bottomline:
◮ ECRPQ(RAT) evaluation:• Undecidable or very high complexity.
◮ Restricting the syntactic shape of queries yields tractability.
Rational relations
n-ary rational relation:Set of n-tuples (w1, . . . ,wn) of stringsaccepted by asynchronous automaton with n heads.
Rational relations
n-ary rational relation:Set of n-tuples (w1, . . . ,wn) of stringsaccepted by asynchronous automaton with n heads.
◮ The input strings are written in the n-tapes.
◮ At each step:The automaton enters a new state and move some tape heads.
Rational relations
n-ary rational relation:Set of n-tuples (w1, . . . ,wn) of stringsaccepted by asynchronous automaton with n heads.
◮ The input strings are written in the n-tapes.
◮ At each step:The automaton enters a new state and move some tape heads.
n-ary rational relation:Described by regular expression over alphabet (Σ ∪ {ǫ})n.
Examples of rational relations
• All regular relations.
• The subsequence relation �ss defined by:
(( ⋃
a∈Σ
(a, ǫ))∗
⋃
b∈Σ
(b, b)
)∗
·( ⋃
a∈Σ
(a, ǫ))∗.
• The subword relation �sw defined by:
( ⋃
a∈Σ
(a, ǫ))∗·( ⋃
b∈Σ
(b, b))∗·( ⋃
a∈Σ
(a, ǫ))∗.
Examples of rational relations
• All regular relations.
• The subsequence relation �ss defined by:
(( ⋃
a∈Σ
(a, ǫ))∗
⋃
b∈Σ
(b, b)
)∗
·( ⋃
a∈Σ
(a, ǫ))∗.
• The subword relation �sw defined by:
( ⋃
a∈Σ
(a, ǫ))∗·( ⋃
b∈Σ
(b, b))∗·( ⋃
a∈Σ
(a, ǫ))∗.
Proposition
The pairs (w1,w2) such that w1 is the reversal of w2 is not rational.
ECRPQ(RAT)
ECRPQ(RAT): Class of queries of the form
Ans(z, χ) ←∧
i (xi , πi , yi ),∧
j Sj(πj),
where each Sj is a rational relation (B., Figueira, Libkin (2012)).
Example: The ECRPQ(RAT) query
Ans(x , y) ← (x , π1, z), (y , π2,w), π1 �ss π2
computes x , y that are origins of paths ρ1 and ρ2 such that:
◮ λ(ρ1) is a subsequence of λ(ρ2).
Evaluation of ECRPQ(RAT) queries
Evaluation of queries in ECRPQ(RAT) is undecidable, but:
◮ True if we allow only practically motivated rational relations?• For example, �ss and �sw.
Evaluation of ECRPQ(RAT) queries
Evaluation of queries in ECRPQ(RAT) is undecidable, but:
◮ True if we allow only practically motivated rational relations?• For example, �ss and �sw.
Adding subword relation to ECRPQ(REG) leads to undecidability:
Theorem (B., Figueira and Libkin, 2012)
Evaluation for ECRPQ(REG ∪{�sw})) is undecidable.
Evaluation of ECRPQ(RAT) queries
Evaluation of queries in ECRPQ(RAT) is undecidable, but:
◮ True if we allow only practically motivated rational relations?• For example, �ss and �sw.
Adding subword relation to ECRPQ(REG) leads to undecidability:
Theorem (B., Figueira and Libkin, 2012)
Evaluation for ECRPQ(REG ∪{�sw})) is undecidable.
Adding subsequence preserves decidability, but at a very high cost:
Theorem (B., Figueira and Libkin, 2012)
Evaluation for (ECRPQ(REG ∪{�ss})) is decidable,but non-primitive-recursive.
Acyclic ECRPQ(RAT) queries
Acyclic ECRPQ(RAT) queries yield tractable data complexity.
◮ Queries of the form:
Ans(z)←∧
i≤k
(xi , πi , yi ), Li (πi ),∧
j
Sj(πj1 , πj2),
where the graph on {1, . . . , k} defined by edges (πj1 , πj2) is acyclic.
Acyclic ECRPQ(RAT) queries
Acyclic ECRPQ(RAT) queries yield tractable data complexity.
◮ Queries of the form:
Ans(z)←∧
i≤k
(xi , πi , yi ), Li (πi ),∧
j
Sj(πj1 , πj2),
where the graph on {1, . . . , k} defined by edges (πj1 , πj2) is acyclic.
Acyclic ECRPQ(RAT) is not more expensive than ECRPQ(REG):
Theorem (B., Figueira and Libkin, 2012)
Evaluation of acyclic ECRPQ(RAT) queries is Pspace-complete.
◮ But polynomial in data complexity
What are your conclusions?