![Page 1: Gábor Szárnyas szarnyas@mit.bme...LINKED DATA BENCHMARK COUNCIL (2012–) LDBC is a non-profit organization dedicated to establishing benchmarks, benchmark practices and benchmark](https://reader033.vdocuments.us/reader033/viewer/2022052002/6015855fe12acc0d5866f59e/html5/thumbnails/1.jpg)
What makes graph queries difficult?
Gábor Szárnyas
Budapest Neo4j Meetup – 2019/06/25
With contributions from Petra Várhegyi and Bálint Hegyi
The property graph data model
SIMPLE GRAPH
[Figure: a graph of five nodes, A to E, connected by edges]
5 people
Many of them know each other
This is a simple graph.
Algorithms:
breadth-first search
depth-first search
PageRank
connected components
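As a rough illustration of the first algorithm in the list, here is a minimal breadth-first search sketch in Python. The edge list is hypothetical: the slide only says that many of the five people know each other, so the specific "knows" pairs below are an assumption.

```python
from collections import deque

# Hypothetical "knows" edges among the five people (assumed, not from the slide).
KNOWS = {
    "A": ["B", "D"],
    "B": ["A", "C", "E"],
    "C": ["B", "E"],
    "D": ["A", "E"],
    "E": ["B", "C", "D"],
}


def bfs_levels(graph, source):
    """Return the hop distance from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


levels = bfs_levels(KNOWS, "A")
print(levels)
```

Only the structure of the graph is used here, which is why such algorithms sit at the lowest level of detail.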
ADD EDGE WEIGHTS
[Figure: the same five nodes, A to E, now with weighted edges]
5 people
Weight: communication cost
This is a weighted graph.
Algorithms:
shortest path algorithms
max-flow
Edge weights shown in the figure: 10, 8, 4, 2, 2, 9, 1, 6
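A shortest path computation on a weighted graph could be sketched as follows, using Dijkstra's algorithm. The weights are taken from the figure, but which weight belongs to which edge is not recoverable from the text, so the pairing below is an assumption made purely for illustration.

```python
import heapq

# Hypothetical assignment of the figure's weights to edges (assumed layout).
EDGES = [
    ("A", "B", 10), ("A", "D", 2), ("B", "C", 8), ("B", "E", 4),
    ("C", "E", 9), ("D", "E", 1), ("A", "C", 6), ("B", "D", 2),
]


def build_adjacency(edges):
    """Build an undirected adjacency list with weights."""
    adjacency = {}
    for u, v, weight in edges:
        adjacency.setdefault(u, []).append((v, weight))
        adjacency.setdefault(v, []).append((u, weight))
    return adjacency


def dijkstra(adjacency, source):
    """Return the minimum communication cost from source to each node."""
    cost = {source: 0}
    heap = [(0, source)]
    while heap:
        current, node = heapq.heappop(heap)
        if current > cost.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in adjacency[node]:
            candidate = current + weight
            if candidate < cost.get(neighbour, float("inf")):
                cost[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return cost


costs = dijkstra(build_adjacency(EDGES), "A")
print(costs)
```

In this assumed layout the direct A to B edge costs 10, while the detour through D costs only 4, which is the kind of result a structure-only search cannot find.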
ADD EDGE TYPES
[Figure: the five nodes, A to E, connected by edges of two types]
5 people
Business partners
Friends
Multiple edge types
but only a single node type.
This is an edge-typed graph.
ADD NODE AND EDGE TYPES
[Figure: persons A to E and comments c1 to c6, connected by typed edges]
5 people
Business partners
Friends
6 comments
Replying to another comment
Authored by a given person
This is a typed graph.
ADD PROPERTIES
[Figure: persons A to E and comments c1 to c6, with properties attached to nodes and edges]
5 people – name, age
Business partners
Friends – since
6 comments – content, date
Replying to another comment
Authored by a given person
This is a property graph.
Similar to object-oriented data.
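Example property values shown in the figure:
• Person (name: “Alice”, age: 25)
• Person (name: “Bob”, age: 26)
• Person (name: “Erin”, age: 30)
• Person (name: “Dan”, age: 47)
• Friends edge (since: 2014)
• Comment (content: “I totally agree”, date: 2017-02-02)
• Comment (content: “Great”, date: 2017-02-03)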
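The resemblance to object-oriented data can be made concrete with a small sketch. The `Node` and `Edge` classes below are hypothetical helpers, not part of any Neo4j API, and the choice of which nodes the edges connect is an assumption; only the property values come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # node type, e.g. "Person" or "Comment"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    edge_type: str                  # e.g. "FRIENDS" or "AUTHORED_BY"
    properties: dict = field(default_factory=dict)


nodes = [
    Node("A", "Person", {"name": "Alice", "age": 25}),
    Node("B", "Person", {"name": "Bob", "age": 26}),
    Node("c1", "Comment", {"content": "Great", "date": "2017-02-03"}),
]
edges = [
    # Which nodes these edges connect is assumed for illustration.
    Edge("A", "B", "FRIENDS", {"since": 2014}),
    Edge("c1", "B", "AUTHORED_BY"),
]

# A predicate on a property, restricted to one node type.
older_than_25 = [
    n.properties["name"]
    for n in nodes
    if n.label == "Person" and n.properties["age"] > 25
]
print(older_than_25)
```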
Graph processing: Queries and analytics
GRAPH QUERIES: LOCAL
[Figure: property graph with persons A to E and comments c1 to c6]
Local graph query:
Return “Dan” and his comments.
Well-researched topic.
Typical execution times are low.
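A local query of this kind might be evaluated roughly as sketched below: look up the start node through an index, then expand only its immediate neighbourhood. The data layout and the authorship of each comment are assumptions for illustration.

```python
# Hypothetical data: who authored which comment is assumed.
persons = {
    "D": {"name": "Dan", "age": 47},
    "E": {"name": "Erin", "age": 30},
}
comments = {
    "c1": {"content": "I totally agree", "date": "2017-02-02"},
    "c2": {"content": "Great", "date": "2017-02-03"},
}
authored_by = {"c1": "D", "c2": "E"}  # comment id -> person id

# Indexes of the kind a database would typically maintain.
person_by_name = {p["name"]: pid for pid, p in persons.items()}
comments_by_author = {}
for comment_id, person_id in authored_by.items():
    comments_by_author.setdefault(person_id, []).append(comment_id)


def person_with_comments(name):
    """Return the person and the comments they authored."""
    person_id = person_by_name[name]
    own = [comments[c] for c in comments_by_author.get(person_id, [])]
    return persons[person_id], own


dan, dan_comments = person_with_comments("Dan")
print(dan, dan_comments)
```

The work done is proportional to the size of Dan's neighbourhood, not to the size of the graph, which is consistent with the low execution times noted above.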
GRAPH QUERIES: GLOBAL
[Figure: property graph with persons A to E and comments c1 to c6]
Global graph query:
Find people who had no interaction
with “Cecil” through any comments,
neither replying nor receiving a reply.
The result is “Alice”.
Typical execution times are high.
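The global query can be sketched in the same style. The authorship and reply edges below are hypothetical, constructed so that the outcome matches the slide's result; the mapping of node C to “Cecil” is also an assumption.

```python
# Hypothetical data, chosen so that the outcome matches the slide.
person_name = {"A": "Alice", "B": "Bob", "C": "Cecil", "D": "Dan", "E": "Erin"}
authored_by = {"c1": "C", "c2": "B", "c3": "D", "c4": "C", "c5": "E", "c6": "A"}
reply_of = {"c2": "c1", "c3": "c2", "c4": "c3", "c5": "c4", "c6": "c3"}


def people_without_interaction(name):
    """People who neither replied to, nor received a reply from, the given person."""
    target = next(pid for pid, n in person_name.items() if n == name)
    interacted = set()
    # Every reply edge has to be inspected: the negation cannot be answered
    # from the target's neighbourhood alone.
    for reply, parent in reply_of.items():
        replier, receiver = authored_by[reply], authored_by[parent]
        if replier == target:
            interacted.add(receiver)
        if receiver == target:
            interacted.add(replier)
    return sorted(
        person_name[p]
        for p in person_name
        if p != target and p not in interacted
    )


result = people_without_interaction("Cecil")
print(result)
```

Unlike the local query, this one touches every person and every reply edge, which hints at why typical execution times are high.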
GRAPH ANALYTICS: NETWORK SCIENCE
Studies the structure of graphs
Pioneered by Albert-László Barabási et al.
Degree distributions, clustering coefficient, etc.
LOCAL CLUSTERING COEFFICIENT
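[Figure: the five-node graph, A to E, annotated with per-node values]
$\mathrm{LCC}(v) = \dfrac{\text{number of triangles containing } v}{\text{number of wedges centered at } v}$
[Plot: empirical cumulative distribution of the LCC values; x-axis: LCC (0, 0.33, 0.66, 1), y-axis: 0.0 to 1.0]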
The empirical cumulative distribution
function does not present much useful
information in this case.
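For reference, the local clustering coefficient can be computed as in the sketch below, which uses the standard definition: the number of connected neighbour pairs of a node divided by the number of all neighbour pairs. The edges are hypothetical, since the slide's graph is not recoverable from the text.

```python
from itertools import combinations

# Hypothetical undirected graph over the five nodes (assumed edges).
KNOWS = {
    "A": {"B", "D"},
    "B": {"A", "C", "E"},
    "C": {"B", "E"},
    "D": {"A", "E"},
    "E": {"B", "C", "D"},
}


def local_clustering_coefficient(graph, v):
    """Triangles through v divided by wedges centered at v."""
    neighbours = graph[v]
    degree = len(neighbours)
    if degree < 2:
        return 0.0  # no wedges, so the coefficient is defined as 0 here
    wedges = degree * (degree - 1) // 2
    triangles = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return triangles / wedges


lcc = {v: local_clustering_coefficient(KNOWS, v) for v in KNOWS}
print(lcc)
```

With only five nodes the coefficient takes just a handful of distinct values, which is in line with the remark that the cumulative distribution is not very informative here.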
TYPED CLUSTERING COEFFICIENT
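TCC(𝑣) = [formula given as pictograms of typed triangles and wedges at 𝑣; not recoverable from the extracted text]
[Plot: empirical cumulative distribution of the TCC values; x-axis: TCC (0, 0.33, 0.66, 1), y-axis: 0.0 to 1.0]
Conveys more information than the LCC distribution.
High combinatorial complexity:
• 𝑡 types → 𝑡 × (𝑡 − 1) triangles
• 𝒪(𝑡²) steps
[Figure: the five-node graph, A to E, with typed edges and per-node values]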
TYPED CLUSTERING COEFFICIENT
[Figure: the five-node graph, A to E, with three edge types; TCC(𝑣) is illustrated with pictograms of the typed triangles at 𝑣]
Business partners
Friends
Family member
3 types → 6 triangles
Petra Várhegyi:
Multidimensional Graph Analytics,
Master’s thesis, 2018
F. Battiston et al.:
Structural measures for multiplex networks,
Physical Review E, 2014
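The count on the slides (𝑡 types giving 𝑡 × (𝑡 − 1) triangle patterns) can be checked with a short enumeration. Treating a pattern as an ordered pair of distinct edge types is an assumption made here to reproduce the count; the exact shape of each pattern is defined in the cited works, not in this sketch. The type names are hypothetical identifiers for the three edge types on the slide.

```python
from itertools import permutations


def typed_triangle_patterns(edge_types):
    """Enumerate ordered pairs of distinct edge types: t * (t - 1) patterns."""
    return list(permutations(edge_types, 2))


patterns = typed_triangle_patterns(["BUSINESS_PARTNER", "FRIEND", "FAMILY_MEMBER"])
print(len(patterns))
for first, second in patterns:
    print(first, second)
```

Each pattern requires its own triangle count per node, so the amount of work grows quadratically with the number of edge types.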
GRAPH PROCESSING TECHNIQUES AND LANGUAGES
[Figure: techniques plotted by level of detail (structure, +types, +properties, +weights) against estimated evaluation time. Techniques shown: BFS, PageRank, Dijkstra, Floyd, Ford-Fulkerson, local clustering coeff., typed clustering coeff., local queries, global queries. Two tool regions are marked: Neo4j Graph Algorithms library and Neo4j Graph Database.]
Graph processing tools and challenges
GRAPH PROCESSING CHALLENGES / STRUCTURE
the “curse of connectedness”
data structures contemporary computer architectures are
good at processing are linear and simple hierarchical
structures, such as Lists, Stacks, or Trees
a massive amount of random data access is required […]
poor performance since the CPU cache is not in effect for
most of the time. […] parallelism is difficult to extract
because of the unstructured nature of graphs.
B. Shao, Y. Li, H. Wang, H. Xia (Microsoft Research):
Trinity Graph Engine and its Applications,
IEEE Data Engineering Bulletin 2017
Key terms: connectedness, computer architectures, caching and parallelization
GRAPH PROCESSING CHALLENGES / PROPERTIES
existing graph query methods […] focus on the topological
structure of graphs and few have considered attributed graphs.
applications of large graph databases would involve querying the
graph data (attributes) in addition to the graph topology.
answering queries that involve predicates on the attributes of
the graphs in addition to the topological structure […] makes
evaluation and optimization more complex.
S. Sakr, S. Elnikety, Y. He (Microsoft Research):
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs,
CIKM 2012
Key terms: topology, properties, complex optimization
GRAPH PROCESSING TOOLS
[Figure: logos of graph tools arranged between two groups, graph queries and graph analytics; legible names include Gelly, LynxKite and the Neo4j Graph Algorithms library]
Currently, there is a strong distinction between graph query
and analytical tools – this might change in the future.
János Szendi-Varga (GraphAware):
Graph Technology Landscape 2019
Benchmarks:
Defining a common understanding
TRANSACTION PROCESSING PERFORMANCE COUNCIL (1988–)
Many standard specifications
for benchmarking certain
aspects of relational DBs
LINKED DATA BENCHMARK COUNCIL (2012–)
LDBC is a non-profit organization dedicated to establishing benchmarks,
benchmark practices and benchmark results for graph data management
software.
LDBC’s Social Network Benchmark is an industrial and academic initiative,
formed by principal actors in the field of graph-like data management.
LDBC SOCIAL NETWORK BENCHMARK
Complex graph schema
14 node types, many edge types
Subgraphs
Network of persons
Arbitrary depth trees
o Comments
o TagClasses
Fixed depth trees
o City < Country < Continent
LDBC INTERACTIVE Q3
Friends and friends of friends that have been to countries X and Y
LDBC INTERACTIVE Q14
Trusted connection paths
BI WORKLOAD
[Figure: grid of the BI workload queries, numbered 1 to 25]
GraphBLAS:
A unified theory built on linear algebra
THE GRAPHBLAS APPROACH
GraphBLAS is an effort to define standard building blocks for graph algorithms
in the language of linear algebra
1979: BLAS (Basic Linear Algebra Subprograms)
2013: GraphBLAS
Key idea: separation of concerns
[Figure: two parallel stacks. Numerical stack: numerical applications, LINPACK/LAPACK, BLAS, HW architecture. Graph stack: graph analytical applications, LAGraph, GraphBLAS, HW architecture. Roles shown alongside: graph algorithm implementers, HPC experts, hardware vendors.]
S. McMillan: Graph algorithms on future architectures,
Research review @ CMU, 2015
Tim Mattson et al.: LAGraph,
GrAPL @ IPDPS 2019
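To give a feel for graph algorithms "in the language of linear algebra", the sketch below expresses breadth-first search as repeated vector-matrix products over a Boolean (OR, AND) semiring. It is plain Python and does not use the GraphBLAS API; the function names and the edge list are assumptions for illustration.

```python
NODES = ["A", "B", "C", "D", "E"]
# Hypothetical undirected edges (assumed).
EDGES = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "E"), ("C", "E"), ("D", "E")]


def adjacency_matrix(nodes, edges):
    """Build a dense Boolean adjacency matrix."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[False] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = True
        matrix[index[v]][index[u]] = True
    return matrix


def vxm_bool(vector, matrix):
    """Vector-matrix product where OR replaces + and AND replaces *."""
    n = len(vector)
    return [any(vector[i] and matrix[i][j] for i in range(n)) for j in range(n)]


def bfs_levels(matrix, source):
    """BFS where each step is one product, masked by the unvisited nodes."""
    n = len(matrix)
    level = [None] * n
    frontier = [i == source for i in range(n)]
    depth = 0
    while any(frontier):
        for i in range(n):
            if frontier[i]:
                level[i] = depth
        reached = vxm_bool(frontier, matrix)
        frontier = [reached[j] and level[j] is None for j in range(n)]
        depth += 1
    return level


levels = bfs_levels(adjacency_matrix(NODES, EDGES), 0)
print(dict(zip(NODES, levels)))
```

The algorithm itself only states which product and mask to apply; how the product is executed efficiently is left to the layer underneath, which is the separation of concerns the slide refers to.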
PARALLELIZATION ON SKEWED DISTRIBUTIONS
Using multiple processing units requires load balancing.
Very difficult to implement for real graphs.
This work is in progress and improvements are expected.
Gábor Szárnyas: Multiplex graph analytics
with GraphBLAS, FOSDEM 2019
Bálint Hegyi: Benchmarking scalable graph
query techniques, Master’s thesis, 2019
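Why skewed degree distributions make load balancing hard can be seen in a toy example. The degree sequence and both partitioning strategies below are hypothetical and only meant as a sketch; work per node is approximated by its degree.

```python
def split_by_vertex_count(degrees, workers):
    """Contiguous chunks with equal node counts; returns the work per chunk."""
    chunk = -(-len(degrees) // workers)  # ceiling division
    return [sum(degrees[i:i + chunk]) for i in range(0, len(degrees), chunk)]


def split_by_degree(degrees, workers):
    """Greedy: give each node, heaviest first, to the least loaded worker."""
    loads = [0] * workers
    for degree in sorted(degrees, reverse=True):
        loads[loads.index(min(loads))] += degree
    return loads


# Hypothetical skewed degree sequence: one hub and many low-degree nodes.
degrees = [100, 5, 4, 3, 2, 2, 1, 1]
naive = split_by_vertex_count(degrees, 2)
balanced = split_by_degree(degrees, 2)
print(naive, balanced)
```

Even the degree-aware strategy leaves one worker with far more work than the other, because the hub cannot be split at node granularity. Balancing such graphs well would need finer-grained partitioning, for example at the level of edges.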
Summary
SUMMARY: CHALLENGES IN GRAPH PROCESSING
No consensus on a unifying theory:
Relational algebra?
Linear algebra?
Performance:
Many random access operations
Difficult to cache
Difficult to parallelize
Handling properties introduces even more complexity
Many open research and implementation challenges.
CONTRIBUTIONS IN MY PHD DISSERTATION
[Figure: contributions P1, P2 and P3 positioned among five fields: database research, high-performance computing, network science, object-oriented SW engineering, semantic web]
Gábor Szárnyas:
Query, Analysis, and Benchmarking Techniques for Evolving Property Graphs of Software Systems,
PhD dissertation, 2019