finding all maximal cliques in very large social networks
TRANSCRIPT
Finding All Maximal Cliques in Very Large Social Networks
16 March 2016, EDBT 2016, Bordeaux, France
Alessio Conte°, Roberto De Virgilio§, Antonio Maccioni§, Maurizio Patrignani§, Riccardo Torlone§
°Università di Pisa §Università Roma Tre
Social Network Analysis
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 1
Community Detection
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 2
Cliques
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 3
A
J H
F
Z
A
J
A
J
F
A
J H
F
Maximal Clique Enumeration (MCE)
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 4
A
JH H
F
D D E
S
E Y
E G
S U
S W
Distributed MCE
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 5
[Cheng et al., KDD 2012][Cheng et al., SIGMOD 2010]
Block size m (Max number of nodes) = 5
A
JH
F
D
D
US
W
E
E
S D
GY
Distributed MCE
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 6
[Cheng et al., KDD 2012][Cheng et al., SIGMOD 2010]
Block size m (Max number of nodes) = 5
A
JH
F
D
D
US
W
E
E
S D
GY
Distributed MCE
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 7
[Cheng et al., KDD 2012][Cheng et al., SIGMOD 2010]
Block size m (Max number of nodes) = 5
A
JH
F
D
D
US
W
E
E
S D
GY
Distributed MCE
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 8
[Cheng et al., KDD 2012][Cheng et al., SIGMOD 2010]
Block size m (Max number of nodes) = 5
A
JH
F
Z RP
D
L
E
S
WU
X
G
Y
Distributed MCE
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 9
[Cheng et al., KDD 2012][Cheng et al., SIGMOD 2010]
Block size m (Max number of nodes) = 5
A
JH
F
D
Z RP
D E
E
S X
GY
D
L
S
W
U
Distributed MCE
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 10
[Cheng et al., KDD 2012][Cheng et al., SIGMOD 2010]
Block size m (Max number of nodes) = 5
A
JH
F
D
Z RP
D E
E
S X
GY
D
L
S
W
U
Distributed MCE
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 11
[Cheng et al., KDD 2012][Cheng et al., SIGMOD 2010]
Block size m (Max number of nodes) = 5
A
JH
F
D
Z RP
D E
ES
X
GY
D
L
S
W
U
D E
S
undetected cliques
Hub Nodes Effect
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 12
Block size m (Max number of nodes) = 5
A
JH
F
Z RP
D
L
E
S
WU
X
G
Y
The Block Size Effect
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 13
* Taken from [Cheng et al., KDD 2012]
efficiency vs completeness/correcteness
Overview of the Approach
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 14
G = (N, E)
C = c1, c
2, ...,
c
n
1st level decomposition
Induced graph2nd level decomposition
FIND MAX CLIQUES
FIND MAX CLIQUES
Block analysis
Nf N
h
Block 1 Block z...
UC
fC
h
1st Level Decomposition
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 15
Separate hubs from the rest of nodes in N- according to a maximum block size m
1st level decompositionFIND MAX CLIQUES
Nf N
h
G = (N, E)
A
JH
F
Z R
P
L WU
X
GY
D ES
1st Level Decomposition - Lemma
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 16
The set of all maximal cliques C of G can be obtained by computing C
f and C
h alone
- C = Cf U C
h
- Cf is the set of cliques with at least one node in N
f
- Ch is the set of cliques with all the nodes in N
h
- proof of this Lemma is in our paper
2nd Level Decomposition
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 17
2nd level decomposition
FIND MAX CLIQUES
Nf
Block 1 Block z...
2nd Level Decomposition
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 18
Kernel node Border nodeKernel node Visited node
B1
A
J
H2nd level decomposition
FIND MAX CLIQUES
Nf
Block 1 Block z...
2nd Level Decomposition
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 19
Kernel node Border nodeKernel node Visited node
B1
A
J
H2nd level decomposition
FIND MAX CLIQUES
Nf
Block 1 Block z...
2nd Level Decomposition
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 20
Kernel node Border nodeKernel node Visited node
B1
A
J
H
F
D 2nd level decomposition
FIND MAX CLIQUES
Nf
Block 1 Block z...
2nd Level Decomposition
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 21
Kernel node Border nodeKernel node Visited node
Maximal Clique Enumeration
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 22
There are many algorithms for MCE- no one outperforms the others- but each has specific advantages
Tomita et al. Eppstein et al....
FIND MAX CLIQUES
Block analysis
Block 1 Block z...
Cf
Block Analysis
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 23
We determine the best-fit MCE algorithm on each block
Block analysis
Select best-fit
Tomita et al. Eppstein et al....
Cf
Block
FIND MAX CLIQUES
Block analysis
Block 1 Block z...
Cf
Decision Tree
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 24
Recursion Over Hub Nodes
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 25
Induced graph
FIND MAX CLIQUES
FIND MAX CLIQUES
Nh
Ch
Recursion Over Hub Nodes
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 26
The induced graph of the hub nodes is recursively processed
B12
DE
S
Kernel node Border nodeKernel node Visited node
Induced graph
FIND MAX CLIQUES
FIND MAX CLIQUES
Nh
Ch
Convergence Guarantee
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 27
Given a degeneracy of the graph lower than the block size, the whole process converges
Theorem. Let G be a graph and let Gi, with i=1, 2, 3, ... be a sequence
of subgraphs of G such that G1 = G and G
i, for i > 1 is the graph induced
by the nodes of Gi−1
of degree greater or equal than m. Let the degeneracy d of G be strictly less than m + 1.
1. There is a value q such that all Gj , with j ≥ q, are empty graphs.
2. There exists a graph with n nodes for which q is Ω(n).
Degeneracy and Sparsity
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 28
A measure of the sparsity of a graph- it is the highest value d for which the network contains a d-core. A d-core is obtained by recursively removing nodes with degree less
than d- degeneracy is typically < 100 on scale-free real graphs- facebook has a degeneracy ~ 54
Experiments: Decomposition
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 29
Experiments: the Block Size Effect
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 30
Experiments: Decision Tree
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 31
Experiments: Effectiveness
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 32
Conclusion
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 33
Approach for computing maximal cliques over an arbitrarily large graph- taking advantage of distributed computation- leveraging on the advantages of different existing algorithms for MCE
Completeness and correcteness are not compromised by hub nodes- unlike state-of-the-art
Future Work
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 34
Take into account the “semantics” of the graph in order to search for specific kind of cliques
Extend our framework for computing more relaxed communities- k-plexes, k-clans, k-cores, etc.
Thanks For The Attention
Finding all Maximal Cliques in Very Large Social Networks @ EDBT 2016 35