Compression-based Graph Mining Exploiting Structure Primitives

Compression-based Graph Mining Exploiting Structure Primitives. Seminar on Exploratory Data Analysis, Werner Hoffmann, 19.06.2015. Paper by Jing Feng, Xiao He, Nina Hubig, Christian Böhm and Claudia Plant.

Upload: werner-hoffmann

Post on 11-Feb-2017

Outline

- What?
- Why?
- How?
- Conclusion!

Context [1]

Graphs:
- unweighted
- undirected
- modelled as adjacency matrix
- sparse

Social Media Data

Is Facebook sparse?
-> 1.4 x 10^9 nodes¹
-> on average 340 friends² per node
-> about 238 x 10^9 edges (1.4 x 10^9 x 340 / 2, since each friendship is shared by two nodes)
-> possible edges: about 0.98 x 10^18
=> only about 0.000024% of all possible edges exist

Yes, Facebook is very sparse.

¹ https://en.wikipedia.org/wiki/Facebook
² http://www.statista.com/statistics/232499/americans-who-use-social-networking-sites-several-times-per-day/
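The estimate can be recomputed from the two input figures on the slide (1.4 x 10^9 nodes, 340 friends on average). The sketch below assumes every friendship is mutual and is counted once as an undirected edge; the variable names are chosen here for illustration.

```python
# Input figures taken from the slide; everything else is derived.
nodes = 1.4e9        # number of users (footnote 1)
avg_friends = 340    # average number of friends per user (footnote 2)

# Summing the friend counts over all users counts every undirected
# edge twice, once from each endpoint, hence the division by 2.
edges = nodes * avg_friends / 2

# Number of unordered node pairs in a graph without self-loops.
possible_edges = nodes * (nodes - 1) / 2

density = edges / possible_edges

print(f"edges:          {edges:.3e}")
print(f"possible edges: {possible_edges:.3e}")
print(f"density:        {density:.3e} ({density * 100:.6f} %)")
```

Whichever way the edges are counted, the density stays many orders of magnitude below 1, which is the point of the slide.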

Werner's Instagram network

51 friends, 13 edges [4]

Goal

Find values for the transitivity [2] and the hubness of a graph.
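One common definition of transitivity, the one discussed in [2], is three times the number of triangles divided by the number of connected triples. A minimal Python sketch of that ratio follows. The function name and the edge-list input format are choices made here, and hubness is left out because the slides do not define it.

```python
from itertools import combinations


def transitivity(edges):
    """Return 3 * (#triangles) / (#connected triples) of an undirected simple graph."""
    # Build an adjacency map: node -> set of neighbours.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # A connected triple is a centre node with two distinct neighbours.
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())

    # A triple is closed when its two outer nodes are adjacent as well.
    # Every triangle is found three times (once per centre), so this
    # count already equals 3 * (#triangles).
    closed = sum(
        1
        for nbrs in adj.values()
        for a, b in combinations(nbrs, 2)
        if b in adj[a]
    )
    return closed / triples if triples else 0.0


star = [("A", x) for x in "BCDEF"]          # pure hub: no triangles
clique = list(combinations("ABCDE", 2))     # every triple is closed
print(transitivity(star), transitivity(clique))
```

A pure star gives the lowest possible value and a complete graph the highest, which matches the intuition that triangles indicate transitivity and stars indicate hubness.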

What is the benefit of knowing the structure of a graph?

- deeper insights into the graph
- lossless compression is possible
- link prediction
- number of clusters
- graph partitioning

Basic regular substructures

- Triangles: transitivity
- Stars: hubness

Characteristics of CXprime (Compression-based eXploiting Primitives)

- Minimum Description Length (MDL) based [3]¹
- no input parameters (unsupervised)
- clustering is k-means-like

¹ https://en.wikipedia.org/wiki/Minimum_description_length
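As a reminder of what "MDL-based" means, the two-part form of the principle [3] can be written as below. The symbols M (model), D (data) and L (code length in bits) are notation chosen here; the slides do not spell out the exact cost function used by CXprime.

```latex
M^{*} = \arg\min_{M} \bigl( L(M) + L(D \mid M) \bigr)
```

L(M) is the cost of describing the model and L(D | M) the cost of describing the data once the model is known. A model that captures real regularity in the graph pays for itself by shortening the second term.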

Three different ways of coding

- edge

- hub (or star)

- mesh (or triangle)

Coding Example Hub

G={(A,B);(A,C);(A,D);(A,E);(A,F)}

G={HUB(A|B,C,D,E,F)}

Coding Example Mesh

G={(A,B);(A,C);(A,D);(A,E);(B,C);(B,D);(B,E);(C,D);(C,E);(D,E)}

G={HUB(A|B,C,D,E);HUB(B|C,D,E);HUB(C|D,E);HUB(D|E)}

G={M(A,B,C,D,E)}
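The hub and mesh notations in the examples above can be checked mechanically by expanding them back into edge lists. The following is a small Python sketch; the helper names `expand_hub`, `expand_mesh` and `as_edge_set` are made up for illustration and are not part of CXprime.

```python
from itertools import combinations


def expand_hub(center, leaves):
    """HUB(center|leaves): one edge from the center to every leaf."""
    return {frozenset((center, leaf)) for leaf in leaves}


def expand_mesh(nodes):
    """M(nodes): every pair of the listed nodes is connected."""
    return {frozenset(pair) for pair in combinations(nodes, 2)}


def as_edge_set(pairs):
    """Undirected edges, so (A,B) and (B,A) are the same edge."""
    return {frozenset(p) for p in pairs}


# The two example graphs from the slides, as plain edge lists.
star_edges = as_edge_set([("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")])
clique_edges = as_edge_set(combinations("ABCDE", 2))

# Hub coding of the star, and hub and mesh coding of the clique.
star_as_hub = expand_hub("A", "BCDEF")
clique_as_hubs = (
    expand_hub("A", "BCDE") | expand_hub("B", "CDE")
    | expand_hub("C", "DE") | expand_hub("D", "E")
)
clique_as_mesh = expand_mesh("ABCDE")

print(star_as_hub == star_edges)
print(clique_as_hubs == clique_edges)
print(clique_as_mesh == clique_edges)
```

All three codings describe exactly the same edges as the plain edge lists, so the choice between them only affects the length of the description, not its content.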

Coding Example Hub

G={(A,B);(A,C);(A,D);(A,E);(A,F)}

G={HUB(A|B,C,D,E,F)}

G={M(A,B);M(A,C);M(A,D);M(A,E);M(A,F)}

Outcomes 1

After coding the graph once with a star coding and once with a triangle coding, you can see which description is shorter, and therefore which basic structure is more common in the graph.
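To make the comparison concrete, the sketch below counts the node symbols each coding needs for the two example graphs from the coding slides. This symbol count is only a crude stand-in chosen here for illustration; it is not the MDL cost used by CXprime, and the function names are hypothetical.

```python
from itertools import combinations


def edge_cost(edges):
    # Plain edge list: two node symbols per edge.
    return 2 * len(edges)


def hub_cost(hubs):
    # HUB(center|leaves): one symbol for the center plus one per leaf.
    return sum(1 + len(leaves) for _, leaves in hubs)


def mesh_cost(meshes):
    # M(nodes): one symbol per member node.
    return sum(len(nodes) for nodes in meshes)


# Star graph: A connected to B..F.
star = {
    "edge": edge_cost([("A", x) for x in "BCDEF"]),
    "hub": hub_cost([("A", "BCDEF")]),
    "mesh": mesh_cost([("A", x) for x in "BCDEF"]),
}

# Complete graph on A..E.
clique = {
    "edge": edge_cost(list(combinations("ABCDE", 2))),
    "hub": hub_cost([("A", "BCDE"), ("B", "CDE"), ("C", "DE"), ("D", "E")]),
    "mesh": mesh_cost(["ABCDE"]),
}

for name, costs in (("star", star), ("clique", clique)):
    best = min(costs, key=costs.get)
    print(name, costs, "-> shortest:", best)
```

Under this toy measure the hub coding is shortest for the star and the mesh coding is shortest for the complete graph, which is the qualitative behaviour the slide describes.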

All possible connections of three nodes
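Three nodes have three possible edges, so there are 2^3 = 8 labelled configurations. A short Python sketch that enumerates them is given here, assuming labelled nodes A, B and C; the node names and category labels are chosen for illustration.

```python
from itertools import combinations

nodes = ("A", "B", "C")
possible_edges = list(combinations(nodes, 2))  # (A,B), (A,C), (B,C)

# With only three nodes, the number of edges alone determines the shape:
# any two edges must share a node, so they form an open triple (a small star).
SHAPES = {0: "no edge", 1: "single edge", 2: "open triple (star)", 3: "triangle"}

configs = []
for k in range(len(possible_edges) + 1):
    for edges in combinations(possible_edges, k):
        configs.append((edges, SHAPES[len(edges)]))

for edges, shape in configs:
    print(f"{shape:20s} {edges}")
```

The open triples and the triangle are the two cases that correspond to the star and triangle primitives.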

Outcomes 2

If you always use the shortest of the three possible codings, you get an overall minimal description of the graph. The graph is then clustered into areas of hubs and areas of triangles.

Criticism

- No example of what the coding actually looks like
- The given probabilities are not reproducible

Summary

The results reported in the paper are very good. The compression rate is much higher than that of the other graph compression algorithms, and the clustering results look convincing.

Thanks for your attention

[1] J. Feng, X. He, N. Hubig, C. Böhm, and C. Plant, “Compression-based graph mining exploiting structure primitives,” in 2013 IEEE 13th International Conference on Data Mining (ICDM), pp. 181–190, IEEE, 2013.

[2] T. Schank and D. Wagner, “Approximating clustering coefficient and transitivity,” J. Graph Algorithms Appl., vol. 9, no. 2, pp. 265–275, 2005.

[3] J. Rissanen, “An introduction to the MDL principle,” Helsinki Institute for Information Technology, Tech. Rep., 2005.

[4] Python, Pyplot, Instagram API
