EDBT 2015: Summer School Overview
![Page 1: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/1.jpg)
EDBT Summer School 2015
Badenes, Carlos
Garijo, Daniel
Priyatna, Freddy
Palamos, Spain
31/8 - 4/9 2015
![Page 2: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/2.jpg)
The Venue
(where we thought we would be)
(where we actually were)
![Page 3: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/3.jpg)
Overview
Graph Data Management
Part I: Theoretical
- Notes about lectures
Part II: Practical
- Sparksee Technology
- Challenges
![Page 4: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/4.jpg)
Part I: Theoretical
๏ Large Scale Graph Processing System - (C. Badenes)
  Sherif Sakr - National ICT Australia
๏ Graph Visualization - (C. Badenes)
  Peter Eades - University of Sydney
๏ Graph Data Management - (F. Priyatna)
  Claudio Gutierrez - Universidad de Chile
๏ Applications of Flexible Querying to Graphs - (F. Priyatna)
  Alexandra Poulovassilis - Birkbeck, University of London
๏ Graph Management Benchmarking - (F. Priyatna)
  Peter Boncz - CWI and Vrije Universiteit Amsterdam
๏ Graph Algorithms - (D. Garijo)
  Dennis Shasha - New York University
๏ Parallel Processing - (D. Garijo)
  Bin Shao - Microsoft Research, Beijing
![Page 5: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/5.jpg)
Graph Data Management
Dr. Claudio Gutierrez
Computer Science Department
Universidad de Chile
http://richard.cyganiak.de/blog/2006/06/perez-et-al-semantics-and-complexity-of-sparql/
(2-2, 1-1)
![Page 6: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/6.jpg)
Graph Data Management
A general view of the main features of current graph databases
A hypernode is a directed graph whose nodes can themselves be graphs (or hypernodes), allowing nesting of graphs.
A property graph is a directed, labelled, attributed multigraph. That is, a graph where the edges are directed, both nodes and edges are labelled and can have any number of properties (or attributes), and there can be multiple edges between any two vertices.
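The property graph definition above can be made concrete with a small in-memory sketch. This is only an illustration: the class and method names (`PropertyGraphSketch`, `addNode`, `addEdge`, `countEdges`) are hypothetical and do not belong to any graph database mentioned in these slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal property graph: directed, labelled, attributed multigraph.
class PropertyGraphSketch {
    // A node carries a label and any number of key/value properties.
    static final class Node {
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        Node(String label) { this.label = label; }
    }

    // An edge is directed (source -> target), labelled, and has its own properties.
    static final class Edge {
        final Node source;
        final Node target;
        final String label;
        final Map<String, Object> properties = new HashMap<>();
        Edge(Node source, Node target, String label) {
            this.source = source;
            this.target = target;
            this.label = label;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    // Edges are kept in a list, so several edges may join the same two nodes.
    final List<Edge> edges = new ArrayList<>();

    Node addNode(String label) {
        Node n = new Node(label);
        nodes.add(n);
        return n;
    }

    Edge addEdge(Node source, Node target, String label) {
        Edge e = new Edge(source, target, label);
        edges.add(e);
        return e;
    }

    // Counts the edges going from source to target, whatever their label.
    long countEdges(Node source, Node target) {
        return edges.stream()
                .filter(e -> e.source == source && e.target == target)
                .count();
    }

    public static void main(String[] args) {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Node a = g.addNode("User");
        a.properties.put("nickname", "User1");
        Node b = g.addNode("User");
        g.addEdge(a, b, "knows").properties.put("since", 2015);
        g.addEdge(a, b, "follows"); // second edge between the same pair
        System.out.println(g.countEdges(a, b));
    }
}
```

Storing edges as first-class objects rather than as an adjacency matrix is what lets the sketch hold parallel edges and edge attributes.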
![Page 7: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/7.jpg)
Applications of Flexible Querying to Graphs
Dr. Alexandra Poulovassilis
Department of Computer Science and Information Systems, Birkbeck, University of London
Reasoning in Event-Based Distributed Systems
Authors: Helmer, Sven; Poulovassilis, Alexandra; Xhafa, Fatos
Adapting to Change in Content, Size, Topology and Use
Editors: Levene, Mark; Poulovassilis, Alexandra (Eds.)
The Functional Approach to Data Management: Modeling, Analyzing and Integrating Heterogeneous Data
Editors: Gray, P.M.D.; Kerschberg, L.; King, P.J.H.; Poulovassilis, A. (Eds.)
![Page 8: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/8.jpg)
Applications of Flexible Querying to Graphs
Query relaxation, which generally returns additional answers compared to the exact form of the database query.
Query approximation, which returns potentially different answers compared to the exact form of the query.
Q2 = SELECT * WHERE {
?x :actedIn :Tea_with_Mussolini .
RELAX ( ?x :hasFamilyName ?z ) }
Q3 = SELECT * WHERE {
?x :actedIn :Tea_with_Mussolini .
?x :label ?z . }
Q3.1 = SELECT * WHERE {
?x :actedIn :Tea_with_Mussolini .
?x :hasGivenName ?z . }
Q3.2 = SELECT * WHERE {
?x :actedIn :Tea_with_Mussolini .
?x :hasFamilyName ?z . }
Q1 = SELECT * WHERE {
APPROX ( :Battle_of_Waterloo :happenedIn/(:hasLongitude|:hasLatitude) ?x ) }
Q1.1 = SELECT * WHERE {
:Battle_of_Waterloo :hasLongitude ?x }
Q1.2 = SELECT * WHERE {
:Battle_of_Waterloo :hasLatitude ?x }
hasFamilyName subPropertyOf label
hasGivenName subPropertyOf label
SparqlAR
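The step from Q2 to Q3 appears to replace a property by its super-property in the subPropertyOf hierarchy. The sketch below illustrates that single rewriting step only. It is not the SPARQLAR implementation, and the names `RelaxSketch`, `relaxPredicate` and `relaxTriple` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of one relaxation step over a subPropertyOf hierarchy.
class RelaxSketch {
    // Child property -> parent property, as drawn on the slide.
    static final Map<String, String> SUB_PROPERTY_OF = new HashMap<>();
    static {
        SUB_PROPERTY_OF.put(":hasFamilyName", ":label");
        SUB_PROPERTY_OF.put(":hasGivenName", ":label");
    }

    // Returns the direct super-property of a predicate, if the hierarchy has one.
    static Optional<String> relaxPredicate(String predicate) {
        return Optional.ofNullable(SUB_PROPERTY_OF.get(predicate));
    }

    // Rewrites a triple pattern by relaxing its predicate once.
    // A predicate without a super-property is left unchanged.
    static String relaxTriple(String subject, String predicate, String object) {
        String relaxed = relaxPredicate(predicate).orElse(predicate);
        return subject + " " + relaxed + " " + object + " .";
    }

    public static void main(String[] args) {
        // The RELAX-ed pattern of Q2 becomes the second pattern of Q3.
        System.out.println(relaxTriple("?x", ":hasFamilyName", "?z"));
    }
}
```

Because every match of `:hasFamilyName` or `:hasGivenName` is also a match of `:label`, the relaxed pattern can only add answers, which is consistent with the description of query relaxation above.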
![Page 9: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/9.jpg)
Graph Management Benchmarking
Dr. Peter Boncz
Centrum Wiskunde & Informatica (CWI)
![Page 10: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/10.jpg)
Graph Management Benchmarking
Description: Given a start Person, find the Forums which that Person’s friends and friends of friends (excluding start Person) became Members of after a given date. Return top 20 Forums, and the number of Posts in each Forum that was Created by any of these Persons. For each Forum consider only those Persons which joined that particular Forum after the given date. Sort results descending by the count of Posts, and then ascending by Forum identifier.
![Page 11: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/11.jpg)
Graph Management Benchmarking
![Page 12: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/12.jpg)
Graph Motifs
![Page 13: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/13.jpg)
Graph Motifs
![Page 14: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/14.jpg)
Parallel Processing
![Page 15: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/15.jpg)
Parallel Processing
![Page 16: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/16.jpg)
Large Scale Graph Processing System
Dr. Sherif Sakr
Associate Professor at College of Public Health and Health Informatics, King Saud bin Abdul-Aziz University
“Big Data (Graph) Processing Systems: State-of-the-art and open challenges”
![Page 17: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/17.jpg)
Large Scale Graph Processing System
Pregel Family
Bulk Synchronous Parallel (BSP) model
L. G. Valiant. A Bridging Model for Parallel Computation. Commun. ACM, 1990
GraphLab Family
Gather, Apply, Scatter (GAS) model
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning in the Cloud. PVLDB, 2012
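To give a feel for the BSP model, here is a minimal single-threaded simulation of vertex-centric supersteps that propagates the maximum value through a directed graph. It is a sketch under simplifying assumptions: real Pregel-family systems partition vertices across workers, whereas here the "barrier" is just the end of a loop iteration. The names `BspMaxValueSketch` and `propagateMax` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sequential simulation of BSP supersteps: compute, send messages, barrier, repeat.
class BspMaxValueSketch {
    static List<List<Integer>> emptyInboxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) boxes.add(new ArrayList<>());
        return boxes;
    }

    // outNeighbors[v] lists the vertices that v sends messages to.
    static int[] propagateMax(int[] initial, int[][] outNeighbors) {
        int n = initial.length;
        int[] value = Arrays.copyOf(initial, n);

        // Superstep 0: every vertex sends its own value along its out-edges.
        List<List<Integer>> inbox = emptyInboxes(n);
        for (int v = 0; v < n; v++) {
            for (int w : outNeighbors[v]) inbox.get(w).add(value[v]);
        }

        while (true) {
            List<List<Integer>> next = emptyInboxes(n);
            boolean sent = false;
            for (int v = 0; v < n; v++) {
                // Local computation: a vertex only sees messages from the previous superstep.
                int best = value[v];
                for (int m : inbox.get(v)) best = Math.max(best, m);
                if (best > value[v]) {
                    value[v] = best;
                    // Communication: changed vertices notify their out-neighbours.
                    for (int w : outNeighbors[v]) {
                        next.get(w).add(best);
                        sent = true;
                    }
                }
            }
            if (!sent) break; // no messages in flight: every vertex halts
            inbox = next;     // barrier: messages become visible in the next superstep
        }
        return value;
    }

    public static void main(String[] args) {
        int[][] cycle = { {1}, {2}, {0} }; // 0 -> 1 -> 2 -> 0
        System.out.println(Arrays.toString(propagateMax(new int[] {3, 6, 2}, cycle)));
    }
}
```

The key property of the model is visible in the two inbox lists: messages produced during one superstep are never read before the barrier.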
![Page 18: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/18.jpg)
Graph Visualization
Dr. Peter Eades
Research Professor at School of Information Technologies, The University of Sydney
Data -> (Visualization Function) -> Drawing -> (Perception Function) -> Human
faithful + readable
![Page 19: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/19.jpg)
Graph Visualization
Topology-Shape-Metric approach
Energy-based approach
Clustered Planarity:
Multilevel methods:
Fast Approximations:
scaling to large graphs
scaling to large graphs
![Page 20: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/20.jpg)
Part II: Practical
๏ Sparksee - Sparsity Technologies (C. Badenes)
  Universitat Politècnica de Catalunya
๏ Challenges :: OEG-Team - (D. Garijo)
  Similarities between Wikipedia Articles
![Page 21: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/21.jpg)
Sparksee
![Page 22: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/22.jpg)
Sparksee
schema
// User Node
int nodeUser = graph.newNodeType("User");
int userNickName = graph.newAttribute(nodeUser, "nickname", DataType.String, AttributeKind.Unique);
// knows edge
int edgeKnows = graph.newEdgeType("knows", true, true);
// User1
long user1 = graph.newNode(nodeUser);
graph.setAttribute(user1, userNickName, new Value().setString("User1"));
// edge 'knows'
long knows1 = graph.newEdge(edgeKnows, user1, user2);
query: Get common Messages for the given Hashtags
// Find out the OID of the Hashtags with the given hashtags' texts.
int tag = g.findType("Tag");
int tagName = g.findAttribute(tag, "name");
long tag1 = g.findObject(tagName, new Value().setString(ht1));
long tag2 = g.findObject(tagName, new Value().setString(ht2));
// Retrieve Messages with both hashtags and intersect the retrieved collection of Messages.
int tags = g.findType("tags");
Objects msgs1 = g.neighbors(tag1, tags, EdgesDirection.Ingoing);
Objects msgs2 = g.neighbors(tag2, tags, EdgesDirection.Ingoing);
long nums = msgs1.intersection(msgs2);
![Page 23: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/23.jpg)
Challenge
Similarities in Wikipedia
- Description
- To Evaluate
  - The design
  - A good proof of functionality
  - The efficiency, in terms of computation time
  - The originality of the proposed method
- Technical prerequisites of participants
  - Basic programming skills
  - To be familiar with some graph library
- Technical support provided to participants
  - English Wikipedia data (dump):
    - articles_ids.csv
    - articles_links.csv
    - articles_body.csv
    - articles_redirect.csv
    - categories_ids.csv
    - articles_category.csv
    - categories_relations.csv
![Page 24: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/24.jpg)
Problem
Similarity between Wikipedia Articles
Wikipedia Article: text
links
categories
![Page 25: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/25.jpg)
Hypothesis
Wikipedia Article: text
links
categories
[Diagram: simTxt, simLinks and simCtg are weighted by α, β and ɣ respectively and summed]
simWA(R1,R2) = α·simTxt(R1,R2) + β·simLinks(R1,R2) + ɣ·simCtg(R1,R2)
where α+β+ɣ=1
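The weighted sum above can be written as a small function. The sketch below is an illustration, not the code from the challenge repository: the names `WeightedSimilaritySketch` and `simWA` (as a Java method) are hypothetical. The sample inputs are one set of component values reported in the proof of concept later in the deck, with weight 0.6 on text and 0.2 each on links and categories.

```java
// Hypothetical helper that combines the three per-aspect similarities.
class WeightedSimilaritySketch {
    // simWA = alpha * simTxt + beta * simLinks + gamma * simCtg, with alpha + beta + gamma = 1.
    static double simWA(double simTxt, double simLinks, double simCtg,
                        double alpha, double beta, double gamma) {
        // Tolerance is needed because the weights are floating-point values.
        if (Math.abs(alpha + beta + gamma - 1.0) > 1e-9) {
            throw new IllegalArgumentException("weights must sum to 1");
        }
        return alpha * simTxt + beta * simLinks + gamma * simCtg;
    }

    public static void main(String[] args) {
        // 0.6 * 0.980 + 0.2 * 0.175 + 0.2 * 0.217
        double score = simWA(0.980, 0.175, 0.217, 0.6, 0.2, 0.2);
        System.out.println(score);
    }
}
```

Requiring the weights to sum to 1 keeps the combined score in the same range as its inputs, provided each component similarity is itself bounded.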
![Page 26: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/26.jpg)
Similarity based on Text
Latent Dirichlet Allocation
TOPIC_1, TOPIC_2, …, TOPIC_n
Ri: p = [0.5, 0.3,.., 0.7]
Rj: q = [0.2, 0.4,.., 0.9]
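The slide does not say how the two topic vectors are compared. One common option, assumed here purely for illustration, is cosine similarity between the per-topic weights of two articles. The class name `TopicSimilaritySketch` and the three-topic vectors in `main` are hypothetical and are not the values from the slide.

```java
// Cosine similarity between two topic-weight vectors (an assumed choice of measure).
class TopicSimilaritySketch {
    static double cosine(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("vectors must cover the same topics");
        }
        double dot = 0.0;
        double normP = 0.0;
        double normQ = 0.0;
        for (int i = 0; i < p.length; i++) {
            dot += p[i] * q[i];
            normP += p[i] * p[i];
            normQ += q[i] * q[i];
        }
        // An all-zero vector has no direction; treat it as sharing nothing.
        if (normP == 0.0 || normQ == 0.0) return 0.0;
        return dot / (Math.sqrt(normP) * Math.sqrt(normQ));
    }

    public static void main(String[] args) {
        double[] p = {0.7, 0.2, 0.1}; // made-up topic weights for article Ri
        double[] q = {0.6, 0.3, 0.1}; // made-up topic weights for article Rj
        System.out.println(cosine(p, q));
    }
}
```

Other measures over topic distributions, such as Jensen-Shannon divergence, would fit the same slot. Cosine is used here only because it is short to state.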
![Page 27: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/27.jpg)
Similarity based on Categories
Articles with multiple common categories are likely to be similar
Noise filtering is necessary (e.g., “All articles lacking in-text citations”).
See https://github.com/cbadenes/siminwikart-challenge4/blob/master/category/wikipedia_bad_categories.txt
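The effect of noise filtering can be shown with a small sketch. The slides do not give the formula behind simCtg, so the Jaccard ratio used below is an assumption, and the class name and all category names except the maintenance category quoted above are made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Category overlap after dropping maintenance ("noise") categories.
class CategorySimilaritySketch {
    // Jaccard ratio over the filtered sets; the choice of Jaccard is an assumption.
    static double simCtg(Set<String> catsA, Set<String> catsB, Set<String> noise) {
        Set<String> a = new HashSet<>(catsA);
        a.removeAll(noise);
        Set<String> b = new HashSet<>(catsB);
        b.removeAll(noise);

        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) return 0.0;

        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return (double) common.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> noise = new HashSet<>(Arrays.asList("All articles lacking in-text citations"));
        Set<String> a = new HashSet<>(Arrays.asList(
                "Living people", "Footballers", "All articles lacking in-text citations"));
        Set<String> b = new HashSet<>(Arrays.asList(
                "Living people", "Princesses", "All articles lacking in-text citations"));
        System.out.println(simCtg(a, b, noise));                  // with filtering
        System.out.println(simCtg(a, b, new HashSet<String>()));  // without filtering
    }
}
```

Without the filter, the shared maintenance category inflates the overlap between two otherwise unrelated articles.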
![Page 28: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/28.jpg)
Similarity based on Links
Sim(A,B) = |links(A) ∩ links(B)| / ( |links(A) ∪ links(B)| / 2 )
Articles with multiple common links are likely to be similar
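The link formula can be sketched directly over sets, reading links(A) ∩ links(B) and links(A) ∪ links(B) as set sizes. The class name `LinkSimilaritySketch` and the sample link sets are hypothetical. Note that, taken literally, the denominator is half the size of the union, so two identical non-empty link sets score 2.0. The slides do not say whether a different normalisation was intended.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Literal transcription of the slide's link similarity over outgoing-link sets.
class LinkSimilaritySketch {
    static double simLinks(Set<String> linksA, Set<String> linksB) {
        Set<String> union = new HashSet<>(linksA);
        union.addAll(linksB);
        // Two articles without any links share nothing; also avoids dividing by zero.
        if (union.isEmpty()) return 0.0;

        Set<String> common = new HashSet<>(linksA);
        common.retainAll(linksB);
        return common.size() / (union.size() / 2.0);
    }

    public static void main(String[] args) {
        Set<String> a = new HashSet<>(Arrays.asList("England", "Madrid", "Chile"));
        Set<String> b = new HashSet<>(Arrays.asList("Madrid", "Chile", "Argentina"));
        // 2 common links, 4 distinct links overall: 2 / (4 / 2)
        System.out.println(simLinks(a, b));
    }
}
```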
![Page 29: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/29.jpg)
Proof of Concept
Fernando Alonso
Lionel Messi
Iker Casillas
Princess Akiko
Weights: (simTxt) α = 0.6, (simLinks) β = 0.2, (simCtg) ɣ = 0.2
Combined similarity (simWA) per article pair:
[1] 0.062, [3] 0.075
[1] 0.666, [3] 0.683
[1] 0.058, [3] 0.069
[1] 0.043, [3] 0.072
[1] 0.019, [3] 0.023
[1] 0.068, [3] 0.069
Component similarities per article pair:
simTxt = 0.059, simLinks = 0.019, simCtg = [1] 0.117, [3] 0.181
simTxt = 0.065, simLinks = 0.0, simCtg = [1] 0.095, [3] 0.161
simTxt = 0.052, simLinks = 0.019, simCtg = [1] 0.166, [3] 0.172
simTxt = 0.980, simLinks = 0.175, simCtg = [1] 0.217, [3] 0.302
simTxt = 0.060, simLinks = 0.008, simCtg = [1] 0.030, [3] 0.172
simTxt = 0.069, simLinks = 0.004, simCtg = [1] 0.080, [3] 0.134
![Page 30: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/30.jpg)
Comparison
Lionel Messi
Princess Akiko
simTxt = 0.060 -> <common words>
simLinks = 0.008 -> (England, Buenos_Aires, Chile, Madrid, Argentina)
simCtg = [1] 0.030 -> living_person
![Page 31: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/31.jpg)
Proposal
Graph based on Links Graph based on Similarities
![Page 32: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/32.jpg)
Problem
Wikipedia links reliability (missing links)
Wikipedia Article: text
links
categories
![Page 33: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/33.jpg)
Further Refinement
Similarities between categories (as topics) can define relations between articles
Graph based on Links Graph based on Similarities
Subgraph Pattern Matching + Topic Model
![Page 34: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/34.jpg)
Code
https://github.com/cbadenes/siminwikart-challenge4
![Page 35: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/35.jpg)
Happy Ending
![Page 36: EDBT 2015: Summer School Overview](https://reader036.vdocuments.us/reader036/viewer/2022062822/58830c691a28ab31068b47ed/html5/thumbnails/36.jpg)
Kitkat Time
• Suggestions?
• Name for the system?
• Contributors?