Post on 31-Dec-2015


1

TOWARDS DATA ANALYTICS ON ATTRIBUTED GRAPHS
NGS QE Oral Presentation

Student: Qi Fan
Supervisor: Prof. Kian-Lee Tan

2

Outline

• Attributed Graph Analytic

• Graph Window Query

• Graph Window Query Processing

• Experiments

• Future Works

Graph Analytic Window Query Query Processing Experiments Future Work

3

Outline

Attributed Graph Analytic

• Graph Window Query

• Graph Window Query Processing

• Experiments

• Future Works


4

Data Analytics

• Data analytics plays an important part in business [1]:
  • Web analytics for advertising and recommendation
  • Customer analytics for market optimization
  • Portfolio analytics for risk control

• Analytics on data yields:
  • Data products
  • Data-driven decision support
  • Insights into the data model

[1] Analytics Examples: http://en.wikipedia.org/wiki/Analytics


5

Relational Data Analytic

• Tables as the data representation, SQL as the query language

• Analytic SQL:
  • Ranking
  • Windowing
  • LAG/LEAD
  • FIRST/LAST
  • SKYLINE
  • TOP-K
  • …


6

Emergence of Large Linked Data

• In the real world, linked data is increasingly common:
  • Facebook, LinkedIn, biological networks, phone-call networks, Twitter, etc.

• Modeling linked data relationally and querying it with SQL is inefficient:
  • Graph queries are often traversal-based
  • SQL-based traversal has been reported to be about 100 times slower than adjacency-list-based traversal [1]

• The graph model is a better fit for linked data

[1] http://java.dzone.com/articles/mysql-vs-neo4j-large-scale


8

Graph Data Model

[Figure: an attributed graph G = (V, E, A) combines a graph (vertices V and edges E) with an attribute table A whose rows are vertices and whose columns are attributes Attr1, Attr2, Attr3, …]

Attributed graph = graph structure + attribute dimensions
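As a rough illustration of G = (V, E, A), the sketch below keeps the graph structure as adjacency lists and the attribute table as one dict per vertex. The class name `AttributedGraph`, the choice of directed edges, and the example attributes are hypothetical; the presentation does not describe its storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class AttributedGraph:
    """Hypothetical attributed graph G = (V, E, A) with adjacency lists.

    out_adj / in_adj hold the directed edges E; attrs is the attribute
    table A (one row, i.e. one dict, per vertex).
    """
    out_adj: dict[str, set[str]] = field(default_factory=dict)
    in_adj: dict[str, set[str]] = field(default_factory=dict)
    attrs: dict[str, dict[str, object]] = field(default_factory=dict)

    def add_vertex(self, v: str, **attributes: object) -> None:
        # Create empty adjacency entries and merge any attribute values
        self.out_adj.setdefault(v, set())
        self.in_adj.setdefault(v, set())
        self.attrs.setdefault(v, {}).update(attributes)

    def add_edge(self, u: str, v: str) -> None:
        # Ensure both endpoints exist, then record the edge in both directions' lists
        self.add_vertex(u)
        self.add_vertex(v)
        self.out_adj[u].add(v)
        self.in_adj[v].add(u)

    @property
    def vertices(self) -> list[str]:
        return list(self.out_adj)


g = AttributedGraph()
g.add_vertex("alice", age=31, activeness=12)
g.add_vertex("bob", age=27, activeness=40)
g.add_edge("alice", "bob")  # e.g. alice follows bob
```

Keeping both in- and out-adjacency lets later window definitions (in-k-hop, out-k-hop, ancestors) be traversed without scanning the whole edge set.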


9

Graph Data Model

• Graph data:
  • Vertex: entities, e.g., users, webpages, molecules
  • Edge: relationships, e.g., follow, cite, depends-on, friends-of
  • Attribute: profile information for a vertex/edge

• The specific model depends on the data:
  • Edges: directed / undirected
  • Attributes: homogeneous / heterogeneous


10

Graph Data Model Example

People and follow relationships…

People and friendship relationships…

Biomolecules and depends-on relationships…

Attributed graphs model a wealth of information


11

Graph Data Analytics

• The graph database ecosystem is growing:
  • Neo4j, Titan, SPARQL, Pregel, etc.

• Graph data analytics is becoming popular:
  • Graph summarization [1], graph OLAP [2], etc.

• In our research, we focus on:
  • Identifying the need for native graph analytical queries
  • Processing graph analytical queries efficiently

[1] Y. Tian, R. A. Hankins, and J. M. Patel, “Efficient aggregation for graph summarization,” in Proceedings of the 2008 ACM SIGMOD, June 2008.

[2] C. Chen, X. Yan, F. Zhu, J. Han, and P. S. Yu, “Graph OLAP: Towards online analytical processing on graphs,” in Data Mining, 2008. ICDM’08.


12

Outline

• Attributed Graph Analytic

Graph Window Query

• Graph Window Query Processing

• Experiments

• Future Works


13

SQL Window Query

• A SQL window query:
  • Partitions a table
  • Sorts each partition
  • Implicitly forms a window for each tuple

[Figure: the window of tuple 7]

The window of a tuple contains the other tuples related to it
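The three steps of a SQL window query can be emulated in a few lines, which makes the "window of each tuple" idea concrete. This sketch assumes a query of the form SUM(amount) OVER (PARTITION BY dept ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING); the function name, the `id` field and the sample rows are hypothetical.

```python
from collections import defaultdict


def sql_window_sum(rows, partition_key, order_key, value_key, preceding=1, following=1):
    """Emulate SUM(value) OVER (PARTITION BY partition_key ORDER BY order_key
    ROWS BETWEEN `preceding` PRECEDING AND `following` FOLLOWING)."""
    partitions = defaultdict(list)
    for row in rows:                                   # 1. partition the table
        partitions[row[partition_key]].append(row)
    result = {}
    for part in partitions.values():
        part.sort(key=lambda r: r[order_key])          # 2. sort each partition
        for i, row in enumerate(part):
            # 3. the tuple's window: its neighbours in sorted order within the frame
            lo, hi = max(0, i - preceding), min(len(part), i + following + 1)
            result[row["id"]] = sum(r[value_key] for r in part[lo:hi])
    return result


sales = [
    {"id": 1, "dept": "x", "day": 1, "amount": 10},
    {"id": 2, "dept": "x", "day": 2, "amount": 20},
    {"id": 3, "dept": "x", "day": 3, "amount": 30},
    {"id": 4, "dept": "y", "day": 1, "amount": 5},
]
moving = sql_window_sum(sales, "dept", "day", "amount")
```

Here tuple 2's window is tuples 1 to 3 of partition "x", so its sum is 60; tuple 4 sits alone in partition "y".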


14

Graph Window Query

• In a graph, a vertex can likewise have a set of related vertices that forms its window.

• Aggregating over each vertex’s window gives a personalized analysis of that vertex.


15

Graph Window Examples

• These queries focus on each user’s neighborhood, so the neighborhood forms the vertex’s window

Summarizing the age distribution of each user’s friends

Summarizing the activeness of each user’s friends

Analyzing the industry distribution of each user’s potential connections


16

Graph Window Examples

• These queries focus on the ancestor-descendant relationships of molecules, so a vertex’s ancestors/descendants form its window

Find how many enzymes are in each molecule’s pathway

Find how many molecules are affected by each enzyme in the pathway


17

Graph Window Queries

• We thus identify two types of graph window queries:

• K-hop window (k-window):
  • A vertex’s k-hop window contains all vertices within k hops of it.

• Topological window (t-window):
  • A vertex’s topological window contains all of its ancestors / descendants.


18

Graph Window Queries

• K-hop window:
  • Similar to ego-centric analysis in the network analysis community
  • For an undirected graph:
    • all vertices connected to the vertex within k hops
  • For a directed graph:
    • In-k-hop: vertices that reach the vertex within k hops
    • Out-k-hop: vertices reached by the vertex within k hops
    • K-hop: the union of in-k-hop and out-k-hop

• T-window:
  • Requires the graph to be a DAG
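The window definitions above map directly onto bounded breadth-first search. The sketch below is a minimal illustration, assuming adjacency lists stored as Python dicts and assuming a vertex is not part of its own window (the slides do not say). `bounded_bfs`, `k_hop_windows` and `ancestor_window` are hypothetical helper names; the t-window is taken to be the ancestor set, computed as an unbounded BFS over reversed edges of a DAG.

```python
from collections import deque


def bounded_bfs(adj, source, k):
    """Vertices reachable from `source` in 1..k hops along `adj`.

    Assumption: the source vertex is excluded from its own window.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue                     # do not expand beyond k hops
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist) - {source}


def k_hop_windows(out_adj, in_adj, v, k):
    in_k = bounded_bfs(in_adj, v, k)     # in-k-hop: vertices that reach v
    out_k = bounded_bfs(out_adj, v, k)   # out-k-hop: vertices reached by v
    return in_k, out_k, in_k | out_k     # k-hop: the union of both


def ancestor_window(in_adj, v):
    """T-window (ancestors) of v in a DAG: unbounded BFS on reversed edges."""
    return bounded_bfs(in_adj, v, float("inf"))


# Path a -> b -> c -> d
out_adj = {"a": ["b"], "b": ["c"], "c": ["d"]}
in_adj = {"b": ["a"], "c": ["b"], "d": ["c"]}
```

For an undirected graph, the same adjacency dict can be passed as both `out_adj` and `in_adj`.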


19

Graph Window Queries

• Graph window query:
  • INPUT: a specific window (k-hop or topological) and an aggregation function
  • OUTPUT: the aggregated value over each vertex’s window
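The input/output contract can be written as a higher-order function: a window function plus an aggregation function in, one value per vertex out. This sketch evaluates it in the simplest way (recompute each window, then aggregate it), which is the strategy the following slides call Naïve Processing I. The function name and the small friend graph with activeness scores are hypothetical.

```python
def graph_window_query(vertices, window_of, attr, aggregate=sum):
    """Naive graph window query: recompute each vertex's window, then aggregate it.

    window_of(v) -> iterable of the vertices in v's window (k-hop or topological)
    attr(u)      -> attribute value of vertex u
    aggregate    -> any function over a list of values, e.g. sum or max
    """
    return {v: aggregate([attr(u) for u in window_of(v)]) for v in vertices}


friends = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}   # undirected 1-hop windows
activeness = {"A": 5, "B": 7, "C": 3}
totals = graph_window_query(friends, friends.__getitem__, activeness.__getitem__)
```

Because `window_of` is called once per vertex, overlapping windows are recomputed and re-aggregated independently; that repeated work is what the indexes later in the talk try to share.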


20

Outline

• Attributed Graph Analytic

• Graph Window Query

Graph Window Query Processing

• Experiments

• Future Works


21

Related Work

• In [1], the EAGr system was proposed to process neighborhood queries
  • Focuses on 1-hop neighbors
  • Uses iterative planning methods to share aggregation results between different vertices’ windows
  • However, it assumes that large intermediate data resides in memory, which is not reasonable for k-windows and t-windows

[1] J. Mondal and A. Deshpande, “EAGr: Supporting continuous ego-centric aggregate queries over large dynamic graphs,” SIGMOD, 2015.


22

Graph Window Query Processing

• Naïve Processing I:
  1. Compute each vertex’s window sequentially
  2. Aggregate each vertex individually

• Advantage:
  • No large intermediate data is generated

• Inefficiencies:
  • Repeated computation of every vertex’s window:
    • computing one k-window takes up to O(|V| + |E|) time in an arbitrary graph
    • computing one t-window takes up to O(|V| + |E|) time in an arbitrary graph
  • Slow individual aggregation:
    • each vertex may have a window of size O(|V|)
    • the total aggregation cost can be O(|V|^2)


23

Graph Window Query Processing

• Naïve Processing II:
  1. Materialize each vertex’s window
  2. During query processing, aggregate each vertex’s window individually

• Advantage:
  • No window computation at run time

• Inefficiencies:
  • Materialization is not memory-efficient
    • all vertices’ windows together can be as large as O(|V|^2)
  • Query processing is still as slow as in Naïve Processing I


25

Overview of our approach

• Two index schemes:
  • Dense Block Index: for general windows and k-hop windows
  • Parent Index: for topological windows

• The indexes achieve:
  • Complete preservation of each vertex’s window information
  • Space efficiency
  • Efficient run-time query processing


26

Dense Block Index – Matrix View

• Window matrix:
  • Records the vertex-window mapping
  • Rows represent vertices
  • Columns represent windows

   A B C D E F
A  1 1 1 1 1 1
B  1 1 0 1 0 1
C  1 0 1 1 1 1
D  1 1 1 1 0 0
E  1 0 1 0 1 0
F  1 1 1 0 0 1


27

Dense Block Index – Matrix View

• Window matrix properties:
  • Boolean matrix
  • Completely keeps the vertex-window information

• Equivalent matrices:
  • Row and column permutations can be applied to the window matrix
  • Invariant: the number of non-zero elements (nnz)

   A B C D E F
A  1 1 1 1 1 1
B  1 1 0 1 0 1
C  1 0 1 1 1 1
D  1 1 1 1 0 0
E  1 0 1 0 1 0
F  1 1 1 0 0 1

   A C B E D F
B  1 0 1 0 1 1
D  1 1 1 0 1 0
F  1 1 1 0 0 1
A  1 1 1 1 1 1
C  1 1 0 1 1 1
E  1 1 0 1 0 0


28

Dense Block Index – Matrix View

• Window-matrix-based aggregation (similar to Naïve Processing II):
  1. Traverse the matrix vertically (column by column)
  2. Aggregate the cells with value one; ignore cells with value zero

• Space and query complexity:
  • O(nnz) in sparse matrix format
  • O(|V|^2) in dense matrix format
  • Note that nnz can be as large as |V|^2
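Column-wise aggregation over a sparse window matrix can be sketched as follows, using the 6 × 6 example window matrix from these slides (rows are vertices, columns are windows). Storing only the 1-cells per column is what gives the O(nnz) bound. The helper names are hypothetical, and an all-ones attribute is used purely so that the aggregate equals each window's size.

```python
VERTICES = "ABCDEF"
# Rows = vertices, columns = windows, copied from the example window matrix
WINDOW_MATRIX = ["111111", "110101", "101111", "111100", "101010", "111001"]


def to_sparse(labels, matrix):
    """Sparse, column-oriented format: window -> set of vertices whose cell is 1."""
    columns = {w: set() for w in labels}
    for v, row in zip(labels, matrix):
        for w, bit in zip(labels, row):
            if bit == "1":
                columns[w].add(v)
    return columns


def aggregate_columns(columns, attr, agg=sum):
    """Traverse the matrix column by column; only 1-cells are stored, so zeros are skipped."""
    return {w: agg(attr[v] for v in members) for w, members in columns.items()}


columns = to_sparse(VERTICES, WINDOW_MATRIX)
nnz = sum(len(m) for m in columns.values())                       # 26 for this matrix
window_sizes = aggregate_columns(columns, {v: 1 for v in VERTICES})
```

Every 1-cell is visited once per query, so a window shared by many vertices still costs one accumulation per cell; dense blocks reduce exactly this cost.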


29

Dense Block Index

• Dense blocks:
  • Given a matrix, a dense block is a submatrix whose entries are all non-zero

• Properties of a dense block R × C:
  • Space complexity O(|R| + |C|), compared to O(|R| · |C|)
  • Query complexity O(|R| + |C|), compared to O(|R| · |C|)

Example: the dense block {A, B} × {A, B, C}

   A B C D
A  1 1 1 0
B  1 1 1 0
C  0 0 0 1
D  1 0 0 1

• Space: store the row ids and column ids, i.e., (A, B) and (A, B, C), rather than 6 elements
• Query: compute A + B first; the result is then shared by windows A, B and C
• Space and query cost have the same asymptotic bounds, so both can be optimized simultaneously
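The cost comparison for a single dense block can be spelled out. Assuming a block with r = |R| rows and c = |C| columns, and counting one operation per combination of two values (an illustrative cost model, not necessarily the one used in the original analysis):

```latex
\begin{aligned}
\text{space:}\quad & r + c \ \text{ids} && \text{vs.}\quad r\,c \ \text{non-zero cells}\\
\text{query:}\quad & (r - 1) + c \ \text{combines} && \text{vs.}\quad r\,c \ \text{per-cell accumulations}\\
\{A,B\}\times\{A,B,C\}:\quad & 2 + 3 = 5,\quad 1 + 3 = 4 && \text{vs.}\quad 2 \cdot 3 = 6
\end{aligned}
```

The (r - 1) term is the single partial aggregate A + B; the c term folds that shared result into each of the c windows.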


30

Dense Block Index

• Dense block index:
  • For every window to be computed, index all the dense blocks in the window matrix
  • The index forms a bipartite graph

[Figure: the dense block index drawn as a bipartite graph over the vertices and windows A through F, alongside the permuted window matrix]

   A C B E D F
B  1 0 1 0 1 1
D  1 1 1 0 1 0
F  1 1 1 0 0 1
A  1 1 1 1 1 1
C  1 1 0 1 1 1
E  1 1 0 1 0 0


31

Dense Block Index

• Properties:
  • Preserves every non-zero entry of the window matrix
  • No need to access the original window matrix during a query

• Query processing:
  1. Compute partial aggregates for each dense block
  2. Compute final aggregates for every window
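The two-phase query processing can be sketched as below, assuming a hypothetical index representation (a list of (vertices, windows) pairs), non-overlapping blocks that together cover every 1-cell, and a distributive aggregate such as SUM. The example reuses the 4 × 4 matrix from the dense-block slide: besides {A, B} × {A, B, C}, the blocks {C, D} × {D} and {D} × {A} are one possible cover of the remaining 1-cells, chosen for illustration.

```python
def dbi_query(dense_blocks, attr, combine=lambda x, y: x + y, zero=0):
    """Two-phase aggregation over a dense block index (hypothetical representation).

    dense_blocks: list of (vertices, windows) pairs; each pair is an all-ones submatrix
    of the window matrix, and every 1-cell is assumed to be covered by exactly one block.
    """
    result = {}
    for vertices, windows in dense_blocks:
        # Phase 1: one partial aggregate per block, computed once
        partial = zero
        for v in vertices:
            partial = combine(partial, attr[v])
        # Phase 2: the shared partial is folded into every window of the block
        for w in windows:
            result[w] = combine(result.get(w, zero), partial)
    return result


# Blocks covering the 4 x 4 example matrix: {A,B} x {A,B,C} plus two smaller blocks
blocks = [(("A", "B"), ("A", "B", "C")), (("C", "D"), ("D",)), (("D",), ("A",))]
sums = dbi_query(blocks, {"A": 1, "B": 2, "C": 4, "D": 8})
```

Window A, for instance, receives the shared partial A + B = 3 from the first block and D = 8 from the third, giving 11, the same as summing column A of the matrix directly.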


32

Dense Block Index Query Processing

Summarizing the activeness of each user’s friends: compute the aggregate on graph G over the 1-hop window

Vertex  Result
A       118
B       64
C       103
D       78
E       66
F       55


33

Dense Block Index

• Equivalent matrices may have different optimal partitions
• Goal: find the best dense block partition among all equivalent matrices
  • Fixed-size dense block partitioning is NP-hard [1]
  • Heuristics need to be applied

   A B C D E F
A  1 1 1 1 1 1
B  1 1 0 1 0 1
C  1 0 1 1 1 1
D  1 1 1 1 0 0
E  1 0 1 0 1 0
F  1 1 1 0 0 1

   A C B E D F
B  1 0 1 0 1 1
D  1 1 1 0 1 0
F  1 1 1 0 0 1
A  1 1 1 1 1 1
C  1 1 0 1 1 1
E  1 1 0 1 0 0

[1] V. Vassilevska and A. Pinar, “Finding nonoverlapping dense blocks of a sparse matrix,” Lawrence Berkeley National Laboratory, 2004


34

MinHash Clustering for DBI

• Heuristic:
  • Clusters similar windows together, then mines the dense blocks in each cluster
  • Clustering + mining

• Clustering:
  • The Jaccard coefficient measures the similarity between windows, since each window is a set of vertices
  • MinHash is an efficient way to perform Jaccard-coefficient-based clustering
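A minimal MinHash sketch for window sets is shown below. The hash family h(x) = (a·x + b) mod p on integer vertex ids is an assumption (the slides do not name one), and `bucket_windows` is a crude single-band LSH grouping shown only to illustrate how signatures can drive clustering; it is not necessarily the clustering procedure used to build the DBI.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, a Mersenne prime


def make_seeds(n, seed=0):
    """n random (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(n)]


def minhash_signature(window, seeds):
    """Keep the minimum hash value of the window's vertex ids for each hash function."""
    return tuple(min((a * x + b) % PRIME for x in window) for a, b in seeds)


def estimate_jaccard(sig1, sig2):
    """Fraction of agreeing minima estimates |W1 ∩ W2| / |W1 ∪ W2|."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def bucket_windows(windows, seeds, band=4):
    """Crude clustering: windows whose first `band` minima all agree share a bucket."""
    buckets = {}
    for name, members in windows.items():
        key = minhash_signature(members, seeds)[:band]
        buckets.setdefault(key, []).append(name)
    return list(buckets.values())


seeds = make_seeds(256, seed=1)
```

Each signature costs one pass over the window, so hashing every vertex's full window is proportional to the total window size; the bottleneck slide that follows points at exactly this cost.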


35

MinHash Clustering for DBI

• Mining:
  1. Build a partial window matrix for each cluster
  2. Condense rows with identical values
  3. For uncondensed rows, recursively apply clustering + mining until a stop condition is met
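One mining pass can be sketched as grouping the rows of a cluster's partial window matrix by their pattern. The sketch uses the window matrix of the worked MinHash clustering example (rows are vertices, columns are windows) and reproduces its outputs: cluster {C, D, E, F} yields the dense block {A, C, F} × {C, D, E, F} and leaves rows B, D and E for recursion. The function name, the matrix-as-dict representation and the rule "groups of two or more identical rows become a block" are assumptions made for illustration.

```python
from collections import defaultdict


def mine_cluster(matrix, cluster):
    """One mining pass over a cluster of windows (columns).

    matrix:  vertex -> set of windows whose cell is 1 (rows of the window matrix)
    cluster: set of windows grouped together by the clustering step
    Returns (dense_blocks, leftover_rows); leftover rows would be clustered again.
    """
    groups = defaultdict(list)
    for v, windows in matrix.items():
        pattern = frozenset(windows & cluster)   # this row of the partial matrix
        if pattern:                              # all-zero rows contribute nothing
            groups[pattern].append(v)
    blocks, leftover = [], {}
    for pattern, rows in groups.items():
        if len(rows) > 1:                        # identical rows condense into one block
            blocks.append((frozenset(rows), pattern))
        else:
            leftover[rows[0]] = set(pattern)
    return blocks, leftover


# Window matrix of the MinHash clustering example
MATRIX = {"A": set("CDEF"), "B": set("ABCDE"), "C": set("CDEF"),
          "D": set("ABCDF"), "E": set("CD"), "F": set("BCDEF")}
blocks, leftover = mine_cluster(MATRIX, set("CDEF"))
```

Running the same pass on the cluster {A, B} condenses rows B and D and leaves row F.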


36

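MinHash Clustering for DBI

Window matrix (rows: vertices, columns: windows):

   A B C D E F
A  0 0 1 1 1 1
B  1 1 1 1 1 0
C  0 0 1 1 1 1
D  1 1 1 1 0 1
E  0 0 1 1 0 0
F  0 1 1 1 1 1

MinHash clustering splits the windows into the clusters {A, B} and {C, D, E, F}, giving two partial window matrices:

   A B          C D E F
A  0 0       A  1 1 1 1
B  1 1       B  1 1 1 0
C  0 0       C  1 1 1 1
D  1 1       D  1 1 0 1
E  0 0       E  1 1 0 0
F  0 1       F  1 1 1 1

Condensing identical rows:
• Cluster {A, B}: rows B and D (1 1) are condensed into B,D and output; row F (0 1) remains
• Cluster {C, D, E, F}: rows A, C and F (1 1 1 1) are condensed and output as the dense block {A, C, F} × {C, D, E, F}; rows B (1 1 1 0), D (1 1 0 1) and E (1 1 0 0) are split off and clustered recursively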


37

MinHash Clustering for DBI

• DBI generation can be summarized in the following steps:
  • Clustering step:
    1. MinHash each vertex based on its window
  • Mining step:
    1. Generate the partial matrix for each cluster
    2. Group identical rows
    3. Cluster recursively

Bottlenecks:
• The MinHash cost and the window-generation cost (for both k-windows and t-windows) are too high in practice


38

Estimated MinHash Clustering

• For k-hop windows, we developed an estimation scheme to speed up index creation.

• Observation: as the number of hops grows, the windows of different vertices overlap more
  • Thus, lower-hop window information can be used in the clustering phase
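The estimation idea amounts to changing which set is hashed during clustering: MinHash the cheaper lower-hop window instead of the full k-hop window. The sketch assumes integer vertex ids, an undirected adjacency dict, windows that include the vertex itself (matching the 1s on the diagonal of the example window matrices), and the same hypothetical linear hash family as a plain MinHash sketch; `est_hop` and the helper names are illustrative, not from the original.

```python
import random
from collections import deque

PRIME = 2_147_483_647  # 2**31 - 1


def hop_window(adj, v, k):
    """All vertices within k hops of v (v included), via bounded BFS."""
    dist, queue = {v: 0}, deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] < k:                  # expand only vertices closer than k hops
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return set(dist)


def signature(items, seeds):
    """MinHash signature with hypothetical hash functions h(x) = (a*x + b) mod PRIME."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in seeds)


def estimated_signatures(adj, est_hop, seeds):
    """EMC-style clustering keys: hash the est_hop window (est_hop < k) instead of
    the k-hop window, so both BFS and hashing touch fewer vertices."""
    return {v: signature(hop_window(adj, v, est_hop), seeds) for v in adj}


rng = random.Random(0)
seeds = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(64)]
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
keys = estimated_signatures(path, est_hop=1, seeds=seeds)
```

Only the clustering keys are estimated; the mining step would still build partial matrices from the true k-hop windows, so the index remains complete.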


39

Comparison

• MinHash Clustering
  1. Clustering step:
     1. MinHash each vertex based on its window
  2. Mining step:
     1. Generate the partial matrix for each cluster
     2. Group identical rows
     3. Cluster recursively

• Estimated MinHash Clustering
  1. Clustering step:
     1. MinHash each vertex based on its lower-hop window
  2. Mining step:
     1. Generate the partial matrix for each cluster
     2. Group identical rows
     3. Cluster recursively

The estimation reduces indexing time because:
1. A lower-hop window has fewer elements, so MinHash is faster
2. Generating a lower-hop window takes less time


40

Topological Window Processing

• The dense block index can be used for topological windows as well
  • However, a more efficient index exists for t-window queries

• Containment relationship in t-windows:
  • If u is an ancestor of v, then W(u) ⊆ W(v), where W(v) is v’s ancestor window
  • Thus, when computing the window of v, u’s result can be used directly


41

Parent Index

• Given an ancestor u of v, to use u’s result when computing v’s, we need to materialize the difference W(v) \ W(u)

• For a given v, the ancestor with the smallest difference must be one of v’s parents

• Thus, for each vertex, we index only the parent with the smallest difference


42

Parent Index

• A parent index is a lookup table with three fields:
  • Vertex: the index entry
  • Parent: the id of the closest parent
  • Diff: the vertices in the difference between Vertex and Parent


43

Parent Index based Query Processing

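• Process each vertex’s window in topological order

• Use the formula agg(W(v)) = agg(W(p)) ⊕ agg(Diff(v)), where p is v’s indexed parent and ⊕ combines partial aggregates (e.g., + for SUM or COUNT)

• Topological order ensures that when a vertex is processed, its parents’ results are ready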
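Parent-index query processing can be sketched with Python's standard `graphlib.TopologicalSorter`, whose `static_order()` yields every vertex after its predecessors. The sketch assumes SUM as the aggregate, t-windows defined as ancestor sets (the vertex itself excluded), and a hypothetical index layout `vertex -> (closest_parent or None, diff_vertices)`; the small DAG and its index entries were worked out by hand for illustration.

```python
from graphlib import TopologicalSorter


def parent_index_query(parents, parent_index, attr):
    """SUM over each vertex's ancestor window using a parent index.

    parents:      vertex -> iterable of its parents (the DAG)
    parent_index: vertex -> (closest parent or None, vertices in W(v) minus W(parent))
    attr:         vertex -> numeric attribute
    """
    result = {}
    # Topological order guarantees that each parent's result is already available
    for v in TopologicalSorter(parents).static_order():
        p, diff = parent_index[v]
        base = result[p] if p is not None else 0
        result[v] = base + sum(attr[u] for u in diff)
    return result


parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"], "f": ["d", "e"]}
parent_index = {
    "a": (None, set()), "b": (None, set()),
    "c": ("a", {"a", "b"}), "d": ("c", {"c"}),
    "e": ("a", {"a"}), "f": ("d", {"d", "e"}),
}
attr = {"a": 1, "b": 2, "c": 4, "d": 8, "e": 16, "f": 32}
sums = parent_index_query(parents, parent_index, attr)
```

Vertex f, for example, reuses d’s result (1 + 2 + 4 = 7) and only adds its Diff {d, e}, giving 31; this reuse is valid because W(p) and Diff(v) are disjoint.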


44

Parent Index Creation

• Efficient creation based on a topological scan:
  • During the scan, each vertex passes its current ancestor information to its children
  • On receiving its parents’ ancestor information, a child takes the union of these ancestors
  • Once a child has received information from all its parents, it records the parent with the smallest difference
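The topological scan can be sketched by visiting vertices in topological order, so that every parent's ancestor set is complete before its children are processed (which stands in for "a child has received all parents' information"). The sketch assumes ancestor windows that exclude the vertex itself, keeps every ancestor set in memory for simplicity, and breaks ties by the first parent listed; `build_parent_index` is a hypothetical name.

```python
from graphlib import TopologicalSorter


def build_parent_index(parents):
    """Build vertex -> (closest parent or None, difference set) by a topological scan."""
    ancestors, index = {}, {}
    for v in TopologicalSorter(parents).static_order():
        ps = list(parents.get(v, ()))
        # Union of the parents' ancestor sets, plus the parents themselves
        ancestors[v] = set(ps).union(*(ancestors[p] for p in ps))
        if not ps:
            index[v] = (None, set())         # a source vertex has an empty window
            continue
        # Closest parent = the one with the smallest difference W(v) - W(p)
        best = min(ps, key=lambda p: len(ancestors[v] - ancestors[p]))
        index[v] = (best, ancestors[v] - ancestors[best])
    return index


parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["a"], "f": ["d", "e"]}
index = build_parent_index(parents)
```

For f, parent d leaves the difference {d, e} while parent e leaves {b, c, d, e}, so d is recorded. Storing full ancestor sets can take O(|V|^2) memory in the worst case, which a production version would need to manage.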


45

Outline

• Attributed Graph Analytic

• Graph Window Query

• Graph Window Query Processing

Experiments

• Future Works


46

Experiments

• Machine: 2.27 GHz CPU with 32 GB memory

• Synthetic data:
  • SNAP [1] generator for directed graphs
  • DAGGER [2] generator for DAGs

[1] Stanford Network Analysis Platform (SNAP), http://snap.stanford.edu/snap/index.html

[2] H. Yildirim, V. Chaoji, and M. J. Zaki, “Dagger: A scalable index for reachability queries in large dynamic graphs,” arXiv preprint arXiv:1301.0977, 2013.


47

Comparing Algorithms

• K-hop window:
  • MA: materialize-ahead algorithm (materializes the vertex-window mapping, then aggregates each vertex individually)
  • KBBFS: bounded BFS to compute the window of each vertex
  • MC: MinHash Clustering
  • EMC: Estimated MinHash Clustering

• Topological window:
  • MA
  • DBI: dense block index
  • TS: topological scan to compute the window of each vertex
  • PI: parent index


48

Effectiveness of Estimation

[Figure panels: Hop = 1, Hop = 2, Hop = 3, Hop = 4]


49

Benefit of Estimation

Degree 160

Hop  MC_HASH    MC_BFS     EMC_HASH  EMC_BFS    EMC/MC
2    157,885    241,072    1,666     120,931    0.307294
3    2,281,794  4,494,853  1,637     2,257,493  0.33337
4    4,355,439  8,633,192  1,631     4,414,207  0.339977

Degree 40

Hop  MC_HASH    MC_BFS     EMC_HASH  EMC_BFS    EMC/MC
2    33,611     19,559     484       9,974      0.19669
3    417,102    742,502    470       374,489    0.323351
4    964,521    1,843,078  471       927,751    0.330611

EMC/MC = (EMC_HASH + EMC_BFS) / (MC_HASH + MC_BFS)
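The EMC/MC column is consistent with the ratio of EMC's total (hash + BFS) cost to MC's total cost; the short check below recomputes it from the table values, reading the garbled entry "184,3078" as 1,843,078. The units of the raw columns are not stated on the slide, so the check compares only the ratios.

```python
# (hop, MC_HASH, MC_BFS, EMC_HASH, EMC_BFS, reported EMC/MC), copied from the tables
TABLES = {
    160: [(2, 157_885, 241_072, 1_666, 120_931, 0.307294),
          (3, 2_281_794, 4_494_853, 1_637, 2_257_493, 0.33337),
          (4, 4_355_439, 8_633_192, 1_631, 4_414_207, 0.339977)],
    40: [(2, 33_611, 19_559, 484, 9_974, 0.19669),
         (3, 417_102, 742_502, 470, 374_489, 0.323351),
         (4, 964_521, 1_843_078, 471, 927_751, 0.330611)],
}


def emc_over_mc(mc_hash, mc_bfs, emc_hash, emc_bfs):
    """Ratio of EMC's total (hash + BFS) cost to MC's total cost."""
    return (emc_hash + emc_bfs) / (mc_hash + mc_bfs)


for degree, rows in TABLES.items():
    for hop, *costs, reported in rows:
        print(degree, hop, round(emc_over_mc(*costs), 6), reported)
```

The recomputed ratios agree with the reported column to within about 1e-6, which also shows that most of EMC's saving comes from the much cheaper hashing step.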


50

Index size of MC and EMC

Degree = 40


51

Scalability of EMC

Panels: V = 100k, hop = 1; V = 100k, hop = 2


52

Effectiveness of PI

V = 10k


53

Index size of PI

Vertex = 10k


54

Indexing Time of PI

Degree = 20


55

Scalability of PI

Degree = 10


56

Outline

• Attributed Graph Analytic

• Graph Window Query

• Graph Window Query Processing

• Experiments

Future Works


57

Conclusion and Future Work

• Conclusion:
  • We proposed two graph window queries and two indexes for processing them efficiently

• Future work:
  • Extend query processing to handle large graphs (on parallel platforms / with disk-resident indexes)
  • More complex aggregation processing (including graph OLAP)
  • Dynamic graphs (handling updates)


58

Thank you!
