Querying Graph Databases (homepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf)
Querying Graph Databases
Graph DBs and applications
• Graph DBs are crucial when topology is as important as data itself.
• Renewed interest due to new applications:
◮ Semantic Web and RDF.
◮ Social networks.
◮ Security and crime detection.
◮ Knowledge representation.
◮ etc.
Querying graph DBs and relational technology
Why not use relational technology?
◮ Translate graph DB G → relational database D(G), and query D(G).
Problems:
1. Languages for graph DBs are navigational and require recursion.
2. They can be translated into Datalog, but there are problems:
(a) Implementation:
• SQL's recursion is hard to optimize, especially in complex queries on large databases.
(b) Complexity mismatch:
• Datalog evaluation is Ptime-complete in data complexity, whereas evaluation is in NLogspace for many graph languages.
• Basic static analysis tasks are undecidable for Datalog, but decidable for several graph languages.
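As a rough illustration of the translation G → D(G) and of why recursion is needed, the sketch below stores a graph as one relation of (source, label, target) triples and computes reachability by a naive fixpoint, in the spirit of a recursive Datalog rule. The relation layout, the sample data, and the names `edges` and `reachable` are assumptions made for this example, not something the slides prescribe.

```python
# D(G): a single ternary relation Edge(source, label, target) (assumed encoding).
edges = {
    ("p1", "partOf", "proc1"),
    ("proc1", "series", "conf1"),
    ("p1", "creator", "alice"),
}


def reachable(edge_rel):
    """Naive fixpoint for the recursive program
         Reach(x, y) :- Edge(x, _, y).
         Reach(x, z) :- Reach(x, y), Edge(y, _, z).
    """
    step = {(s, t) for (s, _, t) in edge_rel}
    reach = set(step)
    while True:
        # One round of the recursive rule: join Reach with Edge.
        new = {(x, z) for (x, y) in reach for (y2, z) in step if y == y2}
        if new <= reach:  # nothing new was derived: fixpoint reached
            return reach
        reach |= new


print(sorted(reachable(edges)))
```

Each round re-joins the whole `Reach` relation with `Edge`, which hints at why generic recursive evaluation is harder to optimize than a purpose-built graph traversal.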
Early graph query languages
Graph query languages flourished from the mid 80s to the late 90s:
◮ G, G+, and GraphLog for hypertext and semistructured data, late 1980s
◮ GOOD for graph-based models of object DBs, 1990
◮ Hyperlog for hypergraphs, 1994
◮ Languages for heterogeneous and unstructured data: Lorel, StruQL, etc. (late 1990s)
Features of graph query languages
◮ Navigation: Recursively traverse the edges of the graph.
◮ Pattern matching: Check if a pattern appears in the graph DB.
And more sophisticated features:
◮ Path comparisons.
◮ Comparisons of the underlying data.
Key problems theory studies:
Expressiveness: What can be said in a query language L?
Complexity of evaluation:
Problem: Eval(L)
Input: A graph DB G, a tuple t of objects, an L-query Q.
Question: Is t ∈ Q(G)?
◮ Combined complexity: Both G and Q are part of the input.
◮ Data complexity: Only G is part of the input and Q is fixed.
Containment: We study the problem Cont(L):
◮ Given L-queries Q1,Q2, is Q1(G) ⊆ Q2(G) for every graph DB G?
Graph data model
Different applications have given rise to many (slightly) different graph DB models. But the essence is the same:
Finite, directed, edge labeled graphs.
Despite the simplicity of the model:
◮ It is flexible enough to accommodate many other more complex models and express interesting practical scenarios.
◮ The most fundamental theoretical issues related to querying graph DBs appear in it already.
Graph databases
Definition
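A graph DB G over a finite alphabet Σ is a pair (V, E), where:
◮ V is a finite set of node ids;
◮ E is a set of edges of the form $v_1 \xrightarrow{a} v_2$ (with $v_1, v_2 \in V$ and $a \in \Sigma$).
• A path in G is a sequence of the form:
$\rho = v_1 \xrightarrow{a_1} v_2 \xrightarrow{a_2} v_3 \cdots v_k \xrightarrow{a_k} v_{k+1}$.
• The label of ρ is $\lambda(\rho) = a_1 a_2 \cdots a_k \in \Sigma^*$.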
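One possible way to make the definition concrete is sketched below: a graph DB as a set of labeled edges, a check that a sequence of edges is a path, and the label of a path as the sequence of its edge labels. The class and function names (`Edge`, `GraphDB`, `make_graph`, `is_path`, `label`) and the sample data are hypothetical choices for this illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    label: str   # a symbol of the alphabet Sigma
    target: str


class GraphDB(NamedTuple):
    nodes: frozenset
    edges: frozenset


def make_graph(edges):
    """Build (V, E) from an iterable of (source, label, target) triples."""
    edge_set = frozenset(Edge(*e) for e in edges)
    nodes = frozenset(n for e in edge_set for n in (e.source, e.target))
    return GraphDB(nodes, edge_set)


def is_path(graph, path):
    """A path is a list of edges of G in which consecutive edges share a node."""
    in_graph = all(e in graph.edges for e in path)
    chained = all(a.target == b.source for a, b in zip(path, path[1:]))
    return in_graph and chained


def label(path):
    """The label of a path: one alphabet symbol per edge, in order."""
    return [e.label for e in path]


G = make_graph([("p1", "partOf", "proc1"), ("proc1", "series", "conf1")])
rho = [Edge("p1", "partOf", "proc1"), Edge("proc1", "series", "conf1")]
print(is_path(G, rho), label(rho))
```

Labels are kept as a list of symbols rather than a concatenated string because alphabet symbols such as `partOf` are themselves multi-character.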
Graph DBs: Example
A graph DB representation of a fragment of DBLP:
[Figure: a fragment of DBLP as a graph DB. Nodes include authors (:Ronald Fagin, :John E. Hopcroft, :Robert E Tarjan, :Jeffrey Ullman, :Moshe Y. Vardi, :Leonid Libkin, :Limsoon Wong), publications (Pods:FaginUV83, Pods:Ullman89, Pods:Libkin95, Pods:Vardi95, Focs:HopU67a, Jacm:HopcroftT74, IPL:LibkinW95), proceedings (inPods:83, inPods:89, inPods:95, inFocs:FOCS8), and venues (conf:pods, conf:focs, journal:jacm, journal:IPL). Edge labels: creator, partOf, series, journal.]
Graph DBs: Example
A path in this graph DB:
[Figure: the DBLP graph DB above, with one path highlighted.]
Graph DBs: Example
The label of such path:
[Figure: the DBLP graph DB above, showing the label of the highlighted path.]
Graph DBs vs NFAs
Important: Graph DBs can be naturally seen as NFAs.
Recall: NFA = Nondeterministic finite automaton.
◮ Nodes are states.
◮ Edges $u \xrightarrow{a} v$ are transitions.
◮ There are no initial and final states.
Regular path queries
Basic building block for graph queries: Regular path queries (RPQs).
◮ First studied in 1989.
◮ An RPQ is a regular expression over Σ.
◮ Evaluation L(G) of RPQ L on graph DB G = (V, E):
• Pairs of nodes (v, v′) ∈ V × V linked by a path whose label is in L.
RPQs with inverse
More commonly studied is the extension of RPQs with inverses, known as 2RPQs.
◮ First studied in 2000.
◮ 2RPQs = RPQs over Σ±, where:
• Σ± = Σ extended with the inverse a− of each a ∈ Σ.
Evaluation L(G) of 2RPQ L over graph DB G = (V, E):
◮ Pairs of nodes in L(G±), that is, L evaluated as an RPQ over G±, where:
• G± is obtained from G by adding $u \xrightarrow{a^-} v$ for each $v \xrightarrow{a} u \in E$.
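The construction of G± is mechanical, as the following minimal sketch suggests. Writing the inverse of a label `a` as the string `a-`, and the function name `with_inverses`, are conventions assumed for this example only.

```python
def with_inverses(edges):
    """Return the edge set of G±: each edge (v, a, u) also yields (u, a-, v)."""
    return set(edges) | {(u, a + "-", v) for (v, a, u) in edges}


E = {("p1", "creator", "alice")}
print(sorted(with_inverses(E)))
```

With this in place, a 2RPQ over Σ can be treated as an ordinary RPQ over the enlarged alphabet, evaluated on the enlarged graph.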
Example of 2RPQ
The 2RPQ
creator− · ((partOf · series) ∪ journal)
computes pairs (a, v) such that author a published in conference or journal v.
[Figure: the DBLP graph DB from the earlier example.]
[Figure: the same graph with two answers highlighted: (a, v) = (:Ronald Fagin, conf:pods), via a path labeled creator− · partOf · series, and (a, v) = (:Robert E Tarjan, journal:jacm), via a path labeled creator− · journal.]
2RPQ evaluation
Problem: Eval(2RPQ)
Input: A graph DB G, nodes v, v′ in G, a 2RPQ L.
Question: Is (v, v′) ∈ L(G)?
It boils down to:
Problem: RegularPath
Input: A graph DB G, nodes v, v′ in G, a regular expression L over Σ±.
Question: Is there a path ρ from v to v′ in G± such that λ(ρ) ∈ L?
Complexity of finding regular paths
Theorem
RegularPath can be solved in time O(|G| · |L|).
Proof idea:
◮ Compute in linear time from L an equivalent NFA A.
◮ Compute in linear time (G±, v, v′): the NFA obtained from G± by setting v and v′ as initial and final states, respectively.
◮ Then (v, v′) ∈ L(G) iff L(G±, v, v′) ∩ L(A) ≠ ∅.
◮ For this we need to solve the nonemptiness problem for the product NFA (G±, v, v′) × A.
◮ This can be done in time O(|G±| · |A|) = O(|G| · |L|).
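The proof idea can be sketched as a breadth-first search over the product of G± and the query automaton. This is an illustrative sketch under several assumptions: the NFA is supplied directly as (initial state, final states, transitions) without ε-moves rather than compiled from a regular expression, inverse labels are written `a-`, and the sample edges are a guess at a small part of the DBLP figure. The names `regular_path`, `QUERY_NFA`, and `EDGES` are hypothetical.

```python
from collections import deque


def with_inverses(edges):
    """Edge set of G±: each edge (s, a, t) also yields (t, a-, s)."""
    return set(edges) | {(t, a + "-", s) for (s, a, t) in edges}


def regular_path(edges, v, v_prime, nfa):
    """Is there a path from v to v_prime in G± whose label the NFA accepts?"""
    init, finals, delta = nfa
    out = {}   # node -> list of (label, successor node) in G±
    for (s, a, t) in with_inverses(edges):
        out.setdefault(s, []).append((a, t))
    step = {}  # (state, label) -> list of successor states
    for (q, a, q2) in delta:
        step.setdefault((q, a), []).append(q2)

    start = (v, init)
    seen = {start}
    queue = deque([start])
    while queue:  # BFS over reachable product states (node, state)
        node, q = queue.popleft()
        if node == v_prime and q in finals:
            return True
        for (a, nxt) in out.get(node, []):
            for q2 in step.get((q, a), []):
                if (nxt, q2) not in seen:
                    seen.add((nxt, q2))
                    queue.append((nxt, q2))
    return False


# Hand-built NFA for creator- . ((partOf . series) U journal).
QUERY_NFA = (0, {3}, {(0, "creator-", 1), (1, "partOf", 2),
                      (2, "series", 3), (1, "journal", 3)})

# Assumed fragment of the DBLP example graph.
EDGES = {
    ("Pods:FaginUV83", "creator", ":Ronald Fagin"),
    ("Pods:FaginUV83", "partOf", "inPods:83"),
    ("inPods:83", "series", "conf:pods"),
    ("Jacm:HopcroftT74", "creator", ":Robert E Tarjan"),
    ("Jacm:HopcroftT74", "journal", "journal:jacm"),
}

print(regular_path(EDGES, ":Ronald Fagin", "conf:pods", QUERY_NFA))
```

Every product state is enqueued at most once, which is where a bound proportional to |G±| · |A| comes from.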
Complexity of 2RPQ evaluation
2RPQs can be evaluated in linear time:
Corollary
Eval(2RPQ) can be solved in linear time O(|G| · |L|).
Data complexity of 2RPQ evaluation
Data complexity of 2RPQs belongs to a parallelizable class:
Proposition
Let L be a fixed 2RPQ.
There is an NLogspace procedure that computes L(G) for each G.
Proof idea:
◮ Construct (G±, v , v ′) from G in NLogspace.
◮ Check nonemptiness of (G±, v , v ′)×A in NLogspace.
Conjunctive regular path queries (CRPQs)
RPQs still do not express arbitrary patterns over graph DBs.
◮ To do this we need to close RPQs under joins and projection.
This is the class of conjunctive regular path queries (CRPQs).
◮ Extended with inverses they are known as C2RPQs.
Example of C2RPQ
The C2RPQ
Ans(x, u) ← (x, creator−, y), (y, partOf · series, z), (y, creator, u)
computes pairs (a1, a2) that are coauthors of a conference paper.
[Figure: the DBLP graph DB from the earlier example.]
[Figure: the same graph with a match of the query variables x, y, z, u highlighted, followed by the resulting answer pair (a1, a2).]
C2RPQ: Formal definition
C2RPQ over Σ: a rule of the form
Ans(z) ← (x1, L1, y1), . . . , (xm, Lm, ym),
such that
◮ the xi, yi are variables,
◮ each Li is a 2RPQ over Σ,
◮ the output z is a tuple of variables among the xi, yi.
CRPQ: C2RPQ without inverse.
Evaluation of C2RPQs
To evaluate C2RPQ ϕ(z) of the form
Ans(z) ← (x1,L1, y1), . . . , (xm,Lm, ym),
simply evaluate the conjunctive query
Ans(z) ← L1(x1, y1), . . . ,Lm(xm, ym),
where each Li(xi, yi) is the result of evaluating the 2RPQ Li over G.
Can also see it as πz(L1 ⋈ . . . ⋈ Lm).
We write ϕ(G) for the result.
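The join-and-project reading can be sketched as follows, assuming the binary relations Li(G) have already been computed. The function name `eval_c2rpq`, the representation of atoms as (variable, relation, variable) triples, and the toy data are assumptions for illustration; this is a plain nested-loop join, not an optimized evaluator.

```python
def eval_c2rpq(head, atoms):
    """head: tuple of output variables.
    atoms: list of (x, rel, y), where rel is the set of pairs L_i(G)."""
    assignments = [{}]  # partial variable assignments built so far
    for (x, rel, y) in atoms:
        extended = []
        for env in assignments:
            for (a, b) in rel:
                consistent = env.get(x, a) == a and env.get(y, b) == b
                if consistent and (x != y or a == b):
                    new_env = dict(env)
                    new_env[x] = a
                    new_env[y] = b
                    extended.append(new_env)
        assignments = extended  # the join with this atom
    return {tuple(env[z] for z in head) for env in assignments}  # projection


# Toy data: paper P1 (authors A, B) appeared at conference C;
# paper J1 (authors B, D) appeared in a journal only.
creator = {("P1", "A"), ("P1", "B"), ("J1", "B"), ("J1", "D")}
creator_inv = {(a, p) for (p, a) in creator}
part_of_series = {("P1", "C")}

coauthors = eval_c2rpq(
    ("x", "u"),
    [("x", creator_inv, "y"), ("y", part_of_series, "z"), ("y", creator, "u")],
)
print(sorted(coauthors))
```

On this data the query also returns pairs such as (A, A), since nothing in the rule forces x and u to be different authors.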
C2RPQs vs 2RPQs
Proposition
The C2RPQ
Ans(x, u) ← (x, creator−, y), (y, partOf · series, z), (y, creator, u)
is not expressible as a 2RPQ L over the graph database:
[Figure: the DBLP graph DB from the earlier example.]
Conclusion: Binary C2RPQs are strictly more expressive than 2RPQs.
Complexity of evaluation of C2RPQs
Increase in expressiveness has a cost in evaluation.
Proposition
Eval(C2RPQ) is NP-complete, even if restricted to CRPQs.
◮ Upper bound by translation to evaluation of CQs.
◮ Lower bound holds since CRPQs contain CQs over graphs.
Data complexity of evaluation of (U)C2RPQs
But adding conjunctions is free in data complexity.
Proposition
Eval(C2RPQ) can be solved in NLogspace in data complexity.
Summary of basic query languages for graph DBs
◮ 2RPQs can be evaluated in linear time.
◮ 2RPQ evaluation is in NLogspace in data complexity.
◮ For C2RPQs:
• They retain the good data complexity of 2RPQs.
• Combined complexity is intractable.
◮ C2RPQs do not exhaust the NLogspace properties.
Complexity of C2RPQs revisited
C2RPQs can be evaluated in polynomial time in data complexity, but is this a good measure for massive datasets?
CRPQ evaluation is of the order |G|^O(|Q|), which is impractical if G is very big, even for small Q.
Idea: Look for languages that are tractable in combined complexity or, at least, fixed-parameter tractable (fpt).
◮ L is fpt if there is a computable function f : N → N and a constant c ≥ 0 such that L-queries ϕ can be evaluated in time O(|G|^c · f(|ϕ|)).
The landscape so far:
◮ 2RPQs are tractable in combined complexity (O(|G| · |L|)).
◮ CRPQs are intractable in combined complexity.
◮ CRPQs are not fpt (even CQs are not), under standard complexity-theoretic assumptions.
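To get a feel for the gap between these bounds, here is a back-of-the-envelope comparison. The sizes, the choice to set all hidden constants to 1, and the sample fpt bound with c = 1 and f(n) = 2^n are assumptions made purely for illustration.

```latex
|G| = 10^{6},\quad |Q| = 5:
\qquad
|G|^{|Q|} = (10^{6})^{5} = 10^{30},
\qquad
|G| \cdot 2^{|Q|} = 10^{6} \cdot 32 = 3.2 \times 10^{7},
\qquad
|G| \cdot |Q| = 5 \times 10^{6}.
```

Under these assumptions the first bound is out of reach, while the other two stay within a few tens of millions of steps.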
Structural restrictions of C2RPQs
Recall:
◮ Relational CQs are neither tractable in combined complexity nor fpt.
◮ Tractable cases of CQ evaluation can be obtained by restricting the syntactic shape of CQs.
◮ The most common such restriction is acyclicity.
◮ An acyclic CQ Q can be evaluated in linear time O(|D| · |Q|) over a relational DB D (Yannakakis (1981)).
◮ Other restrictions include bounded (hyper-)treewidth.
Acyclic C2RPQs
A UC2RPQ is acyclic if its underlying CQ is acyclic.
A different way of stating this:
A C2RPQ Q is acyclic iff its underlying simple and undirected graph U(Q) is acyclic, where U(Q) = (V, E) for:
◮ V = {x1, y1, . . . , xm, ym};
◮ E = {{xi, yi} | 1 ≤ i ≤ m and xi ≠ yi}.
Remark: Acyclicity allows cycles of length ≤ 2 in C2RPQs.
◮ The C2RPQ Ans()← (x , a, x), (x , b, y), (y , c , x) is acyclic.
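The acyclicity test on U(Q) can be sketched with a union-find structure: build the simple undirected graph (dropping self-loops and merging parallel edges), then report a cycle as soon as an edge joins two variables that are already connected. The function name `is_acyclic` and the representation of a query as a list of (x, y) variable pairs are assumptions for this example.

```python
def is_acyclic(atoms):
    """atoms: list of (x, y) variable pairs, one per atom (x_i, L_i, y_i)."""
    # U(Q) is simple: self-loops are dropped and {x, y} is counted once.
    edges = {frozenset((x, y)) for (x, y) in atoms if x != y}
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for edge in edges:
        a, b = tuple(edge)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:   # a and b already connected: this edge closes a cycle
            return False
        parent[root_a] = root_b
    return True


print(is_acyclic([("x", "x"), ("x", "y"), ("y", "x")]))  # query from the remark
print(is_acyclic([("x", "y"), ("y", "z"), ("z", "x")]))  # triangle
```

The first call illustrates the remark: the loop on x and the two atoms between x and y collapse to a single edge of U(Q).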
Acyclic C2RPQs: Examples
◮ The following C2RPQ is acyclic:
Ans(x, u) ← (x, creator−, y), (y, partOf · series, z), (y, creator, u).
◮ The following C2RPQ is not acyclic:
Ans() ← (x ,L1, y), (y ,L2, z), (z ,L3, x).
Evaluation of acyclic C2RPQs
Evaluation of acyclic C2RPQs is tractable in combined complexity:
Proposition
Evaluation of an acyclic C2RPQ Q over a graph DB G takes time O(|G|^2 · |Q|^2).
The simple path semantics
Simple paths: No node is repeated.
Simple paths semantics:
◮ Motivated by applications for which simple paths are more natural.
◮ Studied back in the late 1980s already.
◮ Revival due to application in early versions of SPARQL, a language for RDF.
RPQs under simple paths semantics
• RPQ evaluation in this context = Finding regular simple paths:
Problem: RegularSimplePath
Input: A graph database G, nodes v, v′ in G, a regular expression L.
Question: Is there a simple path ρ from v to v′ in G such that λ(ρ) ∈ L?
• RegularSimplePath(L): the same problem for a fixed L.
Complexity of finding regular simple paths
Theorem
The problem RegularSimplePath is in NP, and for some fixed L the problem RegularSimplePath(L) is NP-complete.
◮ RegularSimplePath((00)∗): is there a simple directed path of even length? This is NP-complete.
◮ Query evaluation can therefore be NP-complete in data complexity, hence impractical.
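A brute-force search makes the difference with ordinary path semantics visible: the set of visited nodes has to be carried along each branch, so product states can no longer be shared between branches. The sketch below is an exhaustive backtracking search with worst-case exponential running time, not an efficient algorithm; the NFA format (initial state, set of final states, transitions, no ε-moves) and the names `regular_simple_path` and `EVEN` are assumptions.

```python
def regular_simple_path(edges, v, v_prime, nfa):
    """Is there a simple path from v to v_prime whose label the NFA accepts?"""
    init, finals, delta = nfa
    out = {}
    for (s, a, t) in edges:
        out.setdefault(s, []).append((a, t))
    step = {}
    for (q, a, q2) in delta:
        step.setdefault((q, a), set()).add(q2)

    def search(node, states, visited):
        if node == v_prime and states & finals:
            return True
        for (a, nxt) in out.get(node, []):
            if nxt in visited:      # simple paths never repeat a node
                continue
            next_states = set()
            for q in states:
                next_states |= step.get((q, a), set())
            if next_states and search(nxt, next_states, visited | {nxt}):
                return True
        return False

    return search(v, {init}, {v})


# NFA for (00)*: accepts words over {0} of even length.
EVEN = (0, {0}, {(0, "0", 1), (1, "0", 0)})

print(regular_simple_path({("a", "0", "b"), ("b", "0", "c"), ("a", "0", "c")},
                          "a", "c", EVEN))
print(regular_simple_path({("a", "0", "c"), ("c", "0", "c")}, "a", "c", EVEN))
```

In the second graph an even-length path from a to c exists only by going around the loop on c, so it is not simple and the search rejects it.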
Static analysis: Containment for 2RPQs
Cont(L): Given L-queries Q1 and Q2,
◮ is Q1(G) ⊆ Q2(G) for each graph DB G?
Containment for 2RPQs is decidable:
Theorem
Cont(2RPQ) is Pspace-complete. It is Pspace-hard even for RPQs.
◮ For RPQs easy to prove:
• L1(G) ⊆ L2(G) for each G ⇐⇒ regular expression L1 is contained in regular expression L2.
• Containment of regular expressions: Pspace-complete (Stockmeyer, Meyer (1971)).
◮ For 2RPQs more work is required: Reason with two-way automata.
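For plain RPQs, the reduction to language containment can be illustrated with a small check of L(A1) ⊆ L(A2) that determinizes both automata on the fly and looks for a word accepted by A1 but not by A2. This is a sketch under assumptions: the NFAs are given directly as (initial state, set of final states, transitions) without ε-moves, and the name `contained` is hypothetical. The number of subset pairs explored can be exponential, in line with the Pspace lower bound.

```python
def contained(nfa1, nfa2, alphabet):
    """Decide L(nfa1) ⊆ L(nfa2) by exploring pairs of reachable state sets."""

    def move(delta, states, a):
        return frozenset(q2 for (q, b, q2) in delta if q in states and b == a)

    (init1, finals1, delta1), (init2, finals2, delta2) = nfa1, nfa2
    start = (frozenset({init1}), frozenset({init2}))
    seen = {start}
    stack = [start]
    while stack:
        s1, s2 = stack.pop()
        # Some word leads nfa1 to acceptance while nfa2 rejects: counterexample.
        if (s1 & finals1) and not (s2 & finals2):
            return False
        for a in alphabet:
            n1 = move(delta1, s1, a)
            if not n1:          # nfa1 is stuck, no counterexample down this branch
                continue
            nxt = (n1, move(delta2, s2, a))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


A_PLUS = (0, {1}, {(0, "a", 1), (1, "a", 1)})   # a . a*
A_STAR = (0, {0}, {(0, "a", 0)})                # a*

print(contained(A_PLUS, A_STAR, {"a"}), contained(A_STAR, A_PLUS, {"a"}))
```

The second check fails because the empty word is in a* but not in a · a*.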
Containment for C2RPQs
Containment of C2RPQs still decidable with exponential blow-up:
Theorem
Cont(C2RPQ) is Expspace-complete, even for CRPQs.
◮ Notice the contrast with the complexity of containment for CQs:
• NP-complete (Chandra, Merlin (1977)).
Summary of containment
◮ Containment of C2RPQs is decidable in double exponential time.
◮ For 2RPQs containment can be checked in single exponential time.
◮ High lower bounds are due to the presence of regular expressions.
42 / 83
![Page 51: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/51.jpg)
Path queries and comparisons
CRPQs fall short of expressive power for applications that need:
◮ to include paths in the output of a query, and
◮ to define complex relationships among labels of paths.
Examples:
◮ Semantic Web queries:
• establish semantic associations among paths.
◮ Biological applications:
• compare paths based on similarity.
◮ Route-finding applications:
• compare paths based on length or number of occurrences of labels.
◮ Data provenance and semantic search over the Web:
• require returning paths to the user.
43 / 83
![Page 52: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/52.jpg)
Path comparisons
We use a set S of relations on words.
◮ Example: S may contain
• Unary relations: Regular, context-free languages, etc.
• Binary relations: prefix, equal length, subsequence, etc.
◮ Comparisons among labels of paths
• Example: w1 is a substring of w2.
◮ We assume S contains all regular languages.
44 / 83
![Page 56: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/56.jpg)
Extended CRPQs
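The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ
$$\mathit{Ans}(\bar z) \leftarrow (x_1, L_1, y_1), \ldots, (x_m, L_m, y_m)$$
◮ by annotating each pair (xi, yi) with a path variable πi,
◮ comparing labels of paths in π̄j wrt Sj ∈ S
• for π̄j a tuple of path variables among the πi's,
◮ projecting some of the πi's as a tuple χ̄ in the output.
The resulting rule has the form
$$\mathit{Ans}(\bar z, \bar\chi) \leftarrow (x_1, \pi_1, y_1), \ldots, (x_m, \pi_m, y_m), \bigwedge_{1 \le j \le t} S_j(\bar\pi_j)$$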
45 / 83
![Page 59: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/59.jpg)
Extended CRPQs and our requirements
ECRPQs meet our requirements:
$$\mathit{Ans}(\bar z, \bar\chi) \leftarrow (x_1, \pi_1, y_1), \ldots, (x_m, \pi_m, y_m), \bigwedge_{1 \le j \le t} S_j(\bar\pi_j)$$
◮ They allow paths in the output.
◮ They allow comparing labels of paths with relations Sj ∈ S.
46 / 83
![Page 62: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/62.jpg)
Evaluation of ECRPQs
Evaluation of the ECRPQ(S) query
$$\theta(\bar z, \bar\chi) : \mathit{Ans}(\bar z, \bar\chi) \leftarrow (x_1, \pi_1, y_1), \ldots, (x_m, \pi_m, y_m), \bigwedge_j S_j(\bar\pi_j)$$
is the same as for CRPQs, except that:
◮ Each πi is mapped to a path ρi in the graph DB.
◮ For each j, if π̄j = (πj1, …, πjk), then the tuple of labels of (ρj1, …, ρjk) must belong to Sj:
$$(\lambda(\rho_{j_1}), \ldots, \lambda(\rho_{j_k})) \in S_j.$$
47 / 83
![Page 63: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/63.jpg)
Considerations about ECRPQ(S)
• ECRPQ(S) extends the class of CRPQs.
◮ $\mathit{Ans}(\bar z) \leftarrow \bigwedge_i (x_i, L_i, y_i)$ is the same as $\mathit{Ans}(\bar z) \leftarrow \bigwedge_i (x_i, \pi_i, y_i), L_i(\pi_i)$.
• Expressiveness and complexity of ECRPQ(S):
◮ Depends on the class S.
• We study two such classes with roots in formal language theory:
◮ Regular relations (Elgot, Mezei (1965)).
◮ Rational relations (Nivat (1968)).
48 / 83
![Page 64: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/64.jpg)
Comparing paths with regular relations
• Regular relations: Regular languages for relations of any arity.
◮ REG: Class of regular relations.
• Bottomline:
ECRPQ(REG): Reasonable expressiveness and complexity.
49 / 83
![Page 66: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/66.jpg)
Regular relations
n-ary regular relation:
Set of n-tuples (w1, …, wn) of strings accepted by a synchronous automaton over Σ^n.
◮ The input strings are written on the n tapes.
◮ Shorter strings are padded with the symbol ⊥.
◮ At each step, the automaton simultaneously reads the next symbol on each tape.
50 / 83
![Page 74: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/74.jpg)
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥
⋮
wn = a b b · · · a c ⊥
⇑ (head position: all n tapes are read one column at a time)
58 / 83
![Page 76: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/76.jpg)
Examples of regular relations
• All regular languages.
• The prefix relation defined by:
$$\Big(\bigcup_{a \in \Sigma} (a, a)\Big)^{*} \cdot \Big(\bigcup_{a \in \Sigma} (a, \bot)\Big)^{*}$$
• The equal length relation defined by:
$$\Big(\bigcup_{a, b \in \Sigma} (a, b)\Big)^{*}$$
• Pairs of strings at edit distance at most k, for fixed k ≥ 0.
Proposition
The subsequence, subword and suffix relations are not regular.
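To make the synchronous reading concrete, the sketch below pads a tuple of words with ⊥, turns it into a sequence of columns, and runs two small hand-written automata over those columns: one for the prefix expression and one for the equal length expression listed above. The function names are illustrative assumptions, and the code assumes ⊥ is not a letter of Σ. Read literally, the prefix expression accepts pairs (w1, w2) in which w2 is a prefix of w1, and the code follows that reading.

```python
from itertools import zip_longest

PAD = "⊥"

def convolution(*words):
    """Pad shorter words with ⊥ and return the list of columns."""
    return list(zip_longest(*words, fillvalue=PAD))

def is_prefix_pair(w1, w2):
    """Accept (w1, w2) iff its columns match (a,a)* followed by (a,⊥)*."""
    state = "same"              # still reading columns of the form (a, a)
    for x, y in convolution(w1, w2):
        if state == "same" and x == y:
            continue
        if x != PAD and y == PAD:
            state = "tail"      # from now on only columns (a, ⊥) are allowed
            continue
        return False
    return True

def equal_length_pair(w1, w2):
    """Accept (w1, w2) iff every column is of the form (a, b), with no padding."""
    return all(PAD not in column for column in convolution(w1, w2))
```

Both checks inspect one column at a time and keep only a constant amount of state, which is what makes these relations regular.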
59 / 83
![Page 79: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/79.jpg)
ECRPQ(REG)
ECRPQ(REG): Class of queries of the form
$$\mathit{Ans}(\bar z, \bar\chi) \leftarrow \bigwedge_i (x_i, \pi_i, y_i), \bigwedge_j S_j(\bar\pi_j),$$
where each Sj is a regular relation.
Example: The ECRPQ(REG) query
$$\mathit{Ans}(x, y) \leftarrow (x, \pi_1, z), (z, \pi_2, y), a^{*}(\pi_1), b^{*}(\pi_2), \mathit{equal\_length}(\pi_1, \pi_2)$$
computes pairs of nodes linked by a path labeled in $\{a^n b^n \mid n \ge 0\}$.
Corollary
ECRPQ(REG) properly extends the class of CRPQs.
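One way to evaluate the example query is to advance both path variables in lockstep, in the spirit of a synchronous automaton. The sketch below is an illustration under assumptions, not the evaluation algorithm of the slides: it tracks pairs (p, q) such that p is reached from x by n a-edges and y is reached from q by n b-edges, and answers yes as soon as p = q, which is then the middle node z. The edge-triple encoding and the name `anbn_pairs` are hypothetical.

```python
from collections import deque

def anbn_pairs(nodes, edges):
    """Pairs (x, y) linked by a path labeled a^n b^n for some n >= 0.

    nodes: iterable of node ids; edges: iterable of (u, label, v) triples.
    """
    nodes = list(nodes)
    a_succ = {u: set() for u in nodes}   # forward a-edges
    b_pred = {u: set() for u in nodes}   # backward b-edges
    for u, label, v in edges:
        if label == "a":
            a_succ[u].add(v)
        elif label == "b":
            b_pred[v].add(u)

    def linked(x, y):
        # State (p, q): x reaches p by n a-edges and q reaches y by n b-edges.
        seen = {(x, y)}
        queue = deque([(x, y)])
        while queue:
            p, q = queue.popleft()
            if p == q:                   # p plays the role of z
                return True
            for p2 in a_succ[p]:
                for q2 in b_pred[q]:
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append((p2, q2))
        return False

    return {(x, y) for x in nodes for y in nodes if linked(x, y)}
```

Each search visits at most |V|² states, so this sketch runs in polynomial time in the size of the graph. The case n = 0 puts every pair (x, x) in the answer.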
60 / 83
![Page 80: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/80.jpg)
Complexity of evaluation of ECRPQ(REG)
• Extending CRPQs with regular relations is free for data complexity.
• Combined complexity is that of relational calculus over relational databases.
Theorem
◮ Eval(ECRPQ(REG)) is Pspace-complete.
◮ Eval(ECRPQ(REG)) is in NLogspace in data complexity.
61 / 83
![Page 81: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/81.jpg)
Containment for ECRPQ(REG)
Theorem
Cont(ECRPQ(REG)) is undecidable.
◮ Notice contrast with CRPQs for which containment is decidable.
◮ But this is like for full relational algebra/calculus.
62 / 83
![Page 82: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/82.jpg)
Comparing with rational relations
ECRPQ(REG) queries are still short of expressive power:
◮ RDF or biological networks:
• Compare strings based on subsequence and subword relations.
◮ These relations are rational: Accepted by asynchronous automata.
• RAT: Class of rational relations.
Bottomline:
◮ ECRPQ(RAT) evaluation:
• Undecidable or very high complexity.
◮ Restricting the syntactic shape of queries yields tractability.
63 / 83
![Page 85: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/85.jpg)
Rational relations
n-ary rational relation:
Set of n-tuples (w1, …, wn) of strings accepted by an asynchronous automaton with n heads.
◮ The input strings are written on the n tapes.
◮ At each step, the automaton enters a new state and moves some tape heads.
Equivalently, an n-ary rational relation is described by a regular expression over the alphabet (Σ ∪ {ε})^n.
64 / 83
![Page 87: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/87.jpg)
Examples of rational relations
• All regular relations.
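• The subsequence relation ⪯ss defined by:
$$\Big(\Big(\bigcup_{a \in \Sigma} (a, \varepsilon)\Big)^{*} \bigcup_{b \in \Sigma} (b, b)\Big)^{*} \cdot \Big(\bigcup_{a \in \Sigma} (a, \varepsilon)\Big)^{*}$$
• The subword relation ⪯sw defined by:
$$\Big(\bigcup_{a \in \Sigma} (a, \varepsilon)\Big)^{*} \cdot \Big(\bigcup_{b \in \Sigma} (b, b)\Big)^{*} \cdot \Big(\bigcup_{a \in \Sigma} (a, \varepsilon)\Big)^{*}$$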
Proposition
The set of pairs (w1,w2) such that w1 is the reversal of w2 is not rational.
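The asynchronous behaviour can be illustrated with the subsequence relation: one head advances on every step while the other advances only on a match. The sketch below is a deterministic simplification (greedy matching is enough for subsequence, whereas the automaton in general is nondeterministic); the function and parameter names are illustrative assumptions and deliberately avoid fixing which tape holds the shorter word.

```python
def is_subsequence(short, long):
    """Return True iff `short` can be obtained from `long` by deleting letters."""
    i = 0                        # head on `short`
    for symbol in long:          # the head on `long` moves at every step
        if i < len(short) and short[i] == symbol:
            i += 1               # a (b, b) step: both heads advance
        # otherwise an (a, ε) step: only the head on `long` advances
    return i == len(short)

def is_subword(short, long):
    """Return True iff `short` occurs as a contiguous block of `long`."""
    return short in long
```

For instance, `is_subsequence("ace", "abcde")` holds, while `is_subword("ace", "abcde")` does not.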
65 / 83
![Page 88: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/88.jpg)
ECRPQ(RAT)
ECRPQ(RAT): Class of queries of the form
$$\mathit{Ans}(\bar z, \bar\chi) \leftarrow \bigwedge_i (x_i, \pi_i, y_i), \bigwedge_j S_j(\bar\pi_j),$$
where each Sj is a rational relation.
Example: The ECRPQ(RAT) query
$$\mathit{Ans}(x, y) \leftarrow (x, \pi_1, z), (y, \pi_2, w), \pi_1 \preceq_{ss} \pi_2$$
computes x, y that are origins of paths ρ1 and ρ2 such that:
◮ λ(ρ1) is a subsequence of λ(ρ2).
66 / 83
![Page 91: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/91.jpg)
Evaluation of ECRPQ(RAT) queries
Evaluation of queries in ECRPQ(RAT) is undecidable, but:
◮ Is this still true if we allow only practically motivated rational relations?
• For example, ⪯ss and ⪯sw.
Adding the subword relation to ECRPQ(REG) leads to undecidability:
Theorem
Evaluation of ECRPQ(REG ∪ {⪯sw}) queries is undecidable. The same is true for suffix in place of subword.
Adding subsequence preserves decidability, but at a very high cost:
Theorem
Evaluation of ECRPQ(REG ∪ {⪯ss}) queries is decidable, but non-primitive-recursive.
Primitive-recursive, informally: any function you can think of!
67 / 83
![Page 93: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/93.jpg)
Acyclic ECRPQ(RAT) queries
Acyclic ECRPQ(RAT) queries yield tractable data complexity.
◮ Queries of the form:
$$\mathit{Ans}(\bar z) \leftarrow \bigwedge_{i \le k} (x_i, \pi_i, y_i), L_i(\pi_i), \bigwedge_j S_j(\pi_{j_1}, \pi_{j_2}),$$
where the graph on {1, …, k} defined by edges (πj1, πj2) is acyclic.
Acyclic ECRPQ(RAT) is not more expensive than ECRPQ(REG):
Theorem
◮ Evaluation of acyclic ECRPQ(RAT) queries is Pspace-complete.
◮ It is in NLogspace in data complexity.
68 / 83
![Page 94: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/94.jpg)
Summary of path queries
◮ Usual query languages do not allow:
• exporting paths and comparing labels of paths.
◮ This has led to the introduction of ECRPQ(S) queries:
• They output paths and compare labels of paths with relations in S.
◮ Comparing paths with regular relations:
• Preserves tractable data complexity of evaluation.
• Leads to undecidability of containment.
◮ Comparing paths with practically motivated rational relations:
• Leads to undecidability or high complexity of evaluation.
• Tractable cases are found by restricting the syntactic shape of queries.
69 / 83
![Page 95: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/95.jpg)
Querying graphs with data
So far queries only talk about the topology of the data.
Queries that combine topology and data are important in practice:
◮ Example: People of the same age connected by professional links.
We present a language that expresses topological properties of the data:
◮ It requires an extension of the data model (data graphs).
◮ It talks about data paths: they summarize the topology and the underlying data of a path.
70 / 83
![Page 97: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/97.jpg)
Data graphs and data paths
We work with data graphs and paths over set of data values D.
Definition
A data graph G over Σ is a tuple (V ,E , δ), where:
◮ (V ,E ) is a graph database over Σ, and
◮ δ is a mapping that assigns a value in D to each node v ∈ V .
With each path $\rho = v_1 \xrightarrow{a_1} v_2 \cdots v_k \xrightarrow{a_k} v_{k+1}$ in (V, E) we associate a data path in G of the form
ρD = $\delta(v_1) \xrightarrow{a_1} \delta(v_2) \cdots \delta(v_k) \xrightarrow{a_k} \delta(v_{k+1})$,
that is obtained from ρ by replacing each node by its data value.
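As a small illustration of this definition, the sketch below turns a path into its data path. The encoding is an assumption made for the example: a path is a Python list alternating nodes and edge labels, and δ is a dictionary named `delta`.

```python
def data_path(path, delta):
    """Replace each node of a path by its data value.

    path:  [v1, a1, v2, ..., ak, v_{k+1}], alternating nodes and labels.
    delta: dict mapping each node to its data value.
    """
    # Even positions hold nodes, odd positions hold edge labels.
    return [delta[item] if i % 2 == 0 else item
            for i, item in enumerate(path)]
```

For example, with ages as data values, the path v1 a v2 b v3 over `{"v1": 30, "v2": 25, "v3": 30}` becomes 30 a 25 b 30.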
71 / 83
![Page 98: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/98.jpg)
Data paths and data words
Data paths are very close to data words:
◮ Object studied in XML and verification (Bojanczyk et al. (2006)).
◮ Data words are strings over Σ×D.
Mechanisms that query data words can be used for data paths:
◮ FO, MSO, and some versions of XPath (Bojanczyk et al. (2006)).
◮ Pebble automata (Neven, Schwentick, Vianu (2004)).
◮ Register automata (Kaminski, Francez (1994)).
72 / 83
![Page 100: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/100.jpg)
The choice of a formalism
Formalism for querying data paths has to be chosen with care:
Theorem
The problem DistinctValues is NP-complete:
◮ DistinctValues: Is there a path ρ from v to v′ s.t. no data value in ρD is repeated?
Conclusion:
◮ If a language expresses DistinctValues:
• It is NP-hard in data complexity ⇒ Impractical.
◮ Rules out all formalisms except for one:
• Register automata.
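To see why DistinctValues is costly, consider the naive backtracking search sketched below. It is only an illustration under assumptions (edge labels are ignored, the graph is a successor dictionary, and the function name is hypothetical): the set of values used so far has to be carried along each candidate path, so in the worst case the search explores exponentially many paths.

```python
def distinct_values_path(edges, delta, source, target):
    """Is there a path from source to target with no repeated data value?

    edges: dict mapping a node to an iterable of successor nodes.
    delta: dict mapping each node to its data value.
    """
    def search(node, used):
        if node == target:
            return True
        for nxt in edges.get(node, ()):
            # Extend the path only with a value not seen on it so far.
            if delta[nxt] not in used and search(nxt, used | {delta[nxt]}):
                return True
        return False

    return search(source, {delta[source]})
```

The search terminates because distinct data values force distinct nodes, so no path is longer than the number of nodes.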
73 / 83
![Page 102: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/102.jpg)
Regular expressions for register automata
Regular expressions with memory (REMs): Same as register automata.
◮ REMs let us specify when data values are remembered and used.
◮ Data values are remembered in k registers {x1, . . . , xk}.
◮ At any point we can compare a data value with one in the registers.
74 / 83
![Page 103: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/103.jpg)
REM: Example
Consider the REM $\downarrow x.\, a^{+}[x^{=}]$.
Intuition:
◮ Store the current data value d in register x.
◮ After reading a word in a⁺, check that d is seen again.
Semantics: Pairs (v, v′) of nodes:
◮ Linked by a path labeled in a⁺.
◮ v and v′ contain the same data value.
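The semantics of this particular REM can be computed directly with a reachability search. The sketch below is specific to this one expression and is not a general REM evaluator; the encoding of the data graph as a set of edge triples plus a dictionary `delta`, and the function name, are assumptions.

```python
def same_value_via_a_plus(edges, delta):
    """Pairs (v, w) linked by a path labeled in a+ with delta[v] == delta[w].

    edges: iterable of (u, label, v) triples; delta: dict node -> data value.
    """
    a_succ = {}
    for u, label, v in edges:
        if label == "a":
            a_succ.setdefault(u, set()).add(v)

    answers = set()
    for v in delta:
        stored = delta[v]                     # ↓x: store the value of the start node
        frontier = set(a_succ.get(v, ()))     # nodes reached by one a-edge
        reached = set()
        while frontier:                       # a+: follow one or more a-edges
            reached |= frontier
            frontier = {w for u in frontier
                        for w in a_succ.get(u, ())} - reached
        # [x=]: keep end nodes whose value equals the stored one
        answers |= {(v, w) for w in reached if delta[w] == stored}
    return answers
```

On the graph with edges 1 –a→ 2 –a→ 3 –b→ 4 and values 7, 5, 7, 7, the only answer is (1, 3): node 4 carries the right value but is not reachable through a-edges alone.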
75 / 83
![Page 104: Querying Graph Databaseshomepages.inf.ed.ac.uk/libkin/teach/atfd16/graphdb.pdf · Early graph query languages Graph query languages flourished from the mid 80s to the late 90s: G,](https://reader030.vdocuments.us/reader030/viewer/2022040202/5e73b559d9367d27e8146d35/html5/thumbnails/104.jpg)
REM: Conditions
• Conditions compare a data value with the ones in the registers.
• Conditions over $\{x_1, \ldots, x_k\}$ are given by the grammar:
$c := x_i^{=} \mid \neg c \mid c \wedge c \quad (1 \le i \le k)$
• We define $(d, \tau) \models c$ for $d \in D$ and $\tau = (d_1, \ldots, d_k) \in D^k$:
◮ $(d, \tau) \models x_i^{=}$ iff $d = d_i$.
◮ Boolean combinations are standard.
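The satisfaction relation for conditions is small enough to write out directly. The following is a minimal sketch, assuming a hypothetical encoding of conditions as Python dataclasses (`Eq`, `Not`, `And`) and of the register contents τ as a tuple; none of these names come from the slides.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Eq:
    """The atomic condition x_i^=, with i counted from 1 as on the slide."""
    i: int

@dataclass(frozen=True)
class Not:
    c: "Cond"

@dataclass(frozen=True)
class And:
    left: "Cond"
    right: "Cond"

Cond = Union[Eq, Not, And]

def satisfies(d, tau, c):
    """Return True iff (d, tau) |= c, for a data value d and register tuple tau."""
    if isinstance(c, Eq):
        return d == tau[c.i - 1]              # compare d with register i
    if isinstance(c, Not):
        return not satisfies(d, tau, c.c)
    if isinstance(c, And):
        return satisfies(d, tau, c.left) and satisfies(d, tau, c.right)
    raise TypeError(f"unknown condition: {c!r}")

# Hypothetical register contents: x1 holds 3, x2 holds 8.
tau = (3, 8)
cond = And(Eq(1), Not(Eq(2)))                 # "equals x1 and differs from x2"
print(satisfies(3, tau, cond), satisfies(8, tau, cond))
```

Inequality with a register is not a separate primitive here: it is obtained as `Not(Eq(i))`, mirroring ¬ applied to $x_i^{=}$ in the grammar.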
REMs: Syntax and semantics (Intuition)
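REMs over Σ and $\{x_1, \ldots, x_k\}$ are defined by the grammar:
$e := \varepsilon \mid a \mid e \cup e \mid e \cdot e \mid e^{+} \mid e[c] \mid \downarrow\bar{x}.e$
where $a \in \Sigma$, c is a condition, and $\bar{x}$ is a tuple of registers from $\{x_1, \ldots, x_k\}$.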
Intuition: the evaluation of a REM e on a data graph G is the set of:
• pairs (v, v′) of nodes linked by a path ρ whose data path ρD satisfies ρD ⊨ e, where:
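◮ ρD ⊨ e[c] if and only if the data path
ρD = $\kappa(v_1) \xrightarrow{a_1} \kappa(v_2) \cdots \kappa(v_k) \xrightarrow{a_k} \kappa(v_{k+1})$
can be parsed with respect to e, starting from empty registers and finishing in a register value $\tau \in D^k$ such that $(\kappa(v_{k+1}), \tau) \models c$.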
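◮ ρD ⊨ ↓x̄.e if and only if the data path
ρD = $\kappa(v_1) \xrightarrow{a_1} \kappa(v_2) \cdots \kappa(v_k) \xrightarrow{a_k} \kappa(v_{k+1})$
can be parsed with respect to e, starting from the register value that assigns $\kappa(v_1)$ to each $x \in \bar{x}$.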
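The parsing intuition can be made concrete with a small recursive evaluator over data paths. This is a sketch under several assumptions that are not stated on the slides: REMs are encoded as nested tuples with hypothetical tags (`"eps"`, `"sym"`, `"union"`, `"concat"`, `"plus"`, `"test"`, `"store"`), registers are numbered from 0, conditions are plain Python callables taking (d, τ), adjacent sub-paths share the data value where they meet, and in $e^{+}$ every iteration except possibly the last consumes at least one edge (which keeps the recursion finite).

```python
def parse(e, vals, labs, i, j, sigma):
    """Register values reachable by parsing the data path vals[i..j]
    with respect to e, starting from register value sigma (a tuple)."""
    kind = e[0]
    if kind == "eps":                          # empty word: no edge consumed
        return {sigma} if i == j else set()
    if kind == "sym":                          # exactly one edge with this label
        return {sigma} if j == i + 1 and labs[i] == e[1] else set()
    if kind == "union":
        return parse(e[1], vals, labs, i, j, sigma) | parse(e[2], vals, labs, i, j, sigma)
    if kind == "concat":                       # split at every middle position m
        out = set()
        for m in range(i, j + 1):
            for s in parse(e[1], vals, labs, i, m, sigma):
                out |= parse(e[2], vals, labs, m, j, s)
        return out
    if kind == "plus":                         # one iteration, or one plus more
        out = set(parse(e[1], vals, labs, i, j, sigma))
        for m in range(i + 1, j + 1):          # first iteration consumes >= 1 edge
            for s in parse(e[1], vals, labs, i, m, sigma):
                out |= parse(e, vals, labs, m, j, s)
        return out
    if kind == "test":                         # e[c]: check c on the last data value
        return {s for s in parse(e[1], vals, labs, i, j, sigma) if e[2](vals[j], s)}
    if kind == "store":                        # store the first data value, then parse
        new = tuple(vals[i] if r in e[1] else old for r, old in enumerate(sigma))
        return parse(e[2], vals, labs, i, j, new)
    raise ValueError(f"unknown REM node: {kind!r}")

def matches(e, vals, labs, k):
    """True iff the whole data path parses w.r.t. e from k empty registers."""
    return bool(parse(e, vals, labs, 0, len(labs), (None,) * k))

# Store the first value in register 0, read a+, then test equality with it.
rem = ("store", (0,), ("test", ("plus", ("sym", "a")), lambda d, tau: d == tau[0]))
print(matches(rem, [7, 5, 7], ["a", "a"], 1))   # data path 7 -a-> 5 -a-> 7
print(matches(rem, [7, 5], ["a"], 1))           # data path 7 -a-> 5
```

The evaluator tries every split point of the path, so it is exponential in the worst case and is meant only to mirror the definition, not to be an efficient evaluation procedure.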
REM: Example
Consider the REM $\Sigma^{*} \cdot (\downarrow x.\, \Sigma^{+}[x^{=}]) \cdot \Sigma^{*}$:
◮ It defines pairs of nodes linked by a path ρ such that:
• ρD contains the same data value twice.
◮ The complement of this language is DistinctValues.
Corollary
REMs are not closed under complement: the language above is definable by a REM, whereas its complement DistinctValues is not.
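To make the two languages concrete, here is a minimal sketch that works on the sequence of data values along a path. Edge labels are ignored because Σ matches any label, and the function names are hypothetical. The first function mirrors the REM: pick a position i where the value is stored, then a strictly later position j where the equality test is made.

```python
def has_repeated_value(values):
    """Mirror the REM: some value stored at position i is seen again at a
    later position j, so at least one edge lies between the two."""
    n = len(values)
    return any(values[i] == values[j] for i in range(n) for j in range(i + 1, n))

def distinct_values(values):
    """DistinctValues is exactly the complement of the language above."""
    return not has_repeated_value(values)

print(has_repeated_value([1, 2, 1]), distinct_values([1, 2, 1]))
print(has_repeated_value([1, 2, 3]), distinct_values([1, 2, 3]))
```

The asymmetry is visible in the code: witnessing a repetition needs only one remembered value (one register), while certifying that all values are distinct requires comparing every pair, which a fixed number of registers cannot do on arbitrarily long paths.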
Complexity of REM evaluation
• Data complexity of REM evaluation coincides with that of CRPQs.
• Combined complexity is the same as for FO over relational databases.
Theorem
◮ Eval(REM) is Pspace-complete.
◮ It is in NLogspace in data complexity.
• Both bounds extend to the class of conjunctive REMs.
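A back-of-the-envelope count suggests why the data-complexity bound is plausible. It is only a sketch, assuming that a fixed REM e is compiled into a register automaton with state set Q and k registers; the names Q, V (the node set of G) and $D_G$ (the set of data values occurring in G) are introduced here and do not appear on the slides. A configuration of the search consists of a node, a state, and the contents of the k registers, each of which is either empty or a value from $D_G$:

```latex
\[
  \#\,\text{configurations} \;\le\; |V| \cdot |Q| \cdot \bigl(|D_G| + 1\bigr)^{k},
  \qquad
  \text{bits per configuration} \;=\; O\bigl(\log |V| + \log |Q| + k \log(|D_G| + 1)\bigr).
\]
```

For a fixed query, |Q| and k are constants, so a configuration fits in $O(\log |G|)$ bits, and evaluation reduces to reachability among configurations, which can be explored nondeterministically in logarithmic space.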
Summary of queries on graphs with data
◮ Most query languages for graph DBs:
• talk about topology, but not about the underlying data.
◮ Query languages that combine topology and data:
• talk about data paths in data graphs.
◮ Choosing a formalism to query data paths must be done with care:
• intractability can be reached easily.
◮ To query data paths:
• We can use REMs, which are based on register automata.
• REMs can be evaluated efficiently in data complexity.
Comments on papers
◮ Isabel F. Cruz, Alberto O. Mendelzon, Peter T. Wood: A Graphical Query Language Supporting Recursion. SIGMOD Conference 1987: 323-330
◮ Mariano P. Consens, Alberto O. Mendelzon: Low Complexity Aggregation in GraphLog and Datalog. Theor. Comput. Sci. 116(1): 95-116 (1993)
Original papers introducing (C)RPQs.
◮ Pablo Barcelo: Querying graph databases. PODS 2013: 175-188
◮ Renzo Angles, Claudio Gutierrez: Survey of graph database models. ACM Comput. Surv. 40(1) (2008)
◮ Peter T. Wood: Query languages for graph databases. SIGMOD Record 41(1): 50-60 (2012)
Three surveys of graph languages; two are more theoretical, one is more practical.
◮ Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Moshe Y. Vardi: Rewriting of Regular Expressions and Regular Path Queries. J. Comput. Syst. Sci. 64(3): 443-465 (2002)
Introducing two-way queries.
◮ Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Moshe Y. Vardi: Reasoning on regular path queries. SIGMOD Record 32(4): 83-92 (2003)
◮ Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Moshe Y. Vardi: Containment of Conjunctive Regular Path Queries with Inverse. KR 2000: 176-185
Static analysis of regular path queries.
◮ Leonid Libkin, Wim Martens, Domagoj Vrgoc: Querying graph databases with XPath. ICDT 2013: 129-140
Adding data values to (C)RPQs.
◮ Pablo Barcelo, Leonid Libkin, Anthony Widjaja Lin, Peter T. Wood: Expressive Languages for Path Queries over Graph-Structured Data. ACM Trans. Database Syst. 37(4): 31 (2012)
Extending RPQs with regular relations; topics to concentrate on are those not covered in class.
◮ Pablo Barcelo, Diego Figueira, Leonid Libkin: Graph Logics with Rational Relations. Logical Methods in Computer Science 9(3) (2013)
Likewise for rational relations.
◮ Dominik D. Freydenberger, Nicole Schweikardt: Expressiveness and Static Analysis of Extended Conjunctive Regular Path Queries. AMW 2011
Resolving some of the questions on the containment of path queries.
◮ Jelle Hellings, Bart Kuijpers, Jan Van den Bussche, Xiaowang Zhang: Walk logic as a framework for path query languages on graph databases. ICDT 2013: 117-128
A different approach to expanding the power of path languages.
◮ Pablo Barcelo, Leonid Libkin, Juan L. Reutter: Querying Regular Graph Patterns. Journal of the ACM 61(1): 8:1-8:54 (2014)
Incomplete information in graph databases and querying it.
◮ Wenfei Fan, Xin Wang, Yinghui Wu: Querying big graphs within bounded resources. SIGMOD Conference 2014: 301-312
◮ Wenfei Fan: Graph pattern matching revised for social network analysis. ICDT 2012: 8-21
Two papers on making graph queries scalable.