Community Structure in Large Complex Networks
Liaoruo Wang and John E. Hopcroft, Dept. of Computer Engineering & Computer Science, Cornell University
In Proc. 7th Annual Conference on Theory and Applications of Models of Computation (TAMC), June 2010
Presented by Nam Nguyen
Motivation
Introduction
Contributions of the paper
Definitions
WHISKER is NP-Complete
Algorithms
Agenda
Community structure (C.S.) is a classical but still-hot topic in complex networks.
Previous studies: Communities were assumed to be densely connected inside but sparsely connected outside.
A different point of view: We should disregard “whiskers” and elaborate “cores” in the networks.
Motivation
Roughly speaking
◦ Whiskers: Subsets of vertices that are barely connected to the rest of the network.
◦ Cores: Connected subgraphs that are densely connected inside and well connected to the rest of the network, i.e., “real communities”.
Why?
◦ In real-world societies, communities are also well connected to the rest of the network.
◦ Imagine a close-knit community, e.g., the CISE Dept., with only one connection to the outer world.
Formal definitions follow.
Introduction
More concrete definitions of “whiskers” and “cores” in a network.
WHISKER is NP-Complete.
Three heuristic algorithms for finding approximate cores.
Simulation results.
Contributions
Undirected graph G = (V, E) with adjacency matrix A = (A_ij). For S ⊆ V, let S^C = V \ S.
Conductance of S
where A suitable cut
Definition
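The conductance formula on this slide did not survive extraction. As a rough illustration only, the sketch below assumes the standard definition: the weight of edges leaving S divided by the smaller of the two volumes (sums of weighted degrees) of S and S^C. The helper names `build_adjacency` and `conductance` are hypothetical and are not taken from the paper.

```python
def build_adjacency(edges):
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def conductance(adj, subset):
    """Assumed standard conductance: cut(S, S^C) / min(vol(S), vol(S^C))."""
    subset = set(subset)
    cut = 0.0       # total weight of edges with exactly one endpoint in S
    vol_s = 0.0     # sum of weighted degrees inside S
    vol_rest = 0.0  # sum of weighted degrees outside S
    for u, neighbors in adj.items():
        for v, w in neighbors.items():
            if u in subset:
                vol_s += w
                if v not in subset:
                    cut += w
            else:
                vol_rest += w
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 1)]
adj = build_adjacency(edges)
phi = conductance(adj, {0, 1, 2})  # cut = 1, vol(S) = vol(S^C) = 7
```

In this toy graph each triangle is tied to the rest of the network by a single edge, which is the "barely connected" situation the Introduction slide calls a whisker.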
A k-whisker
A maximal k-whisker
Definition(cont’d)
A whisker
A maximal whisker
Definition (cont’d)
A core
Definition (cont’d)
Lemmas
Proof
The only suitable cut has size 26 > |S ∪ T| = 25
Lemmas (cont’d)
Proof
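(1a) e_xr + e_xz + e_yr + e_yz ≤ v_x + v_y
(1b) e_yr + e_xy + e_zr + e_xz ≤ v_y + v_z
(1c) e_xr + e_yr + e_zr > v_x + v_y + v_z
Adding (1a) and (1b) and then using (1c) gives
e_xr + 2e_yr + e_zr + e_xy + e_yz + 2e_xz ≤ v_x + 2v_y + v_z < e_xr + e_yr + e_zr + v_y
Cancelling e_xr + e_yr + e_zr on both sides and dropping the nonnegative term 2e_xz gives
e_yr + e_xy + e_yz < v_y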
NAE-3-SAT: The problem of determining whether there exists a truth assignment for a 3-CNF Boolean formula such that each clause has at least one true literal and at least one false literal.
Fact: NAE-3-SAT is NP-Complete [1]
WHISKER: Given an unweighted undirected graph, determine whether there exists a whisker or not.
WHISKER is NP-Complete (by a reduction from NAE-3-SAT)
NP-Completeness
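To make the NAE-3-SAT definition above concrete, here is a minimal brute-force sketch. It assumes clauses are given as triples of signed integers (k for x_k, -k for ¬x_k); that encoding and the function names are illustrative choices, not something taken from the paper. The search is exponential in n and is only meant for tiny instances.

```python
from itertools import product


def nae_satisfied(clauses, assignment):
    """True if every clause has at least one true and at least one false literal."""
    for clause in clauses:
        values = [assignment[abs(lit)] if lit > 0 else not assignment[abs(lit)]
                  for lit in clause]
        if all(values) or not any(values):
            return False
    return True


def nae_3sat_brute_force(n, clauses):
    """Try all 2^n assignments; return a satisfying one as a dict, or None."""
    for bits in product([False, True], repeat=n):
        assignment = {i + 1: bits[i] for i in range(n)}
        if nae_satisfied(clauses, assignment):
            return assignment
    return None


# (x1 or x2 or x3) and (not x1 or x2 or x3)
found = nae_3sat_brute_force(3, [(1, 2, 3), (-1, 2, 3)])

# An instance with no not-all-equal assignment.
none_found = nae_3sat_brute_force(
    3, [(1, 2, 3), (1, 2, -3), (1, -2, 3), (-1, 2, 3)]
)
```

Flipping every variable in a not-all-equal assignment yields another not-all-equal assignment, which fits naturally with a construction that picks exactly one of x_i and ¬x_i in each row.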
Road map
◦ 1. Construct a special graph G of 2n vertices and show that G admits 2^n whiskers and no more.
◦ 2. Construct a G-like graph for the 3-SAT problem.
◦ 3. Make a reduction from the NAE-3-SAT problem to WHISKER.
WHISKER is NP-Complete
WHISKER is in NP.
Reduction from NAE-3-SAT to WHISKER
◦ Consider the following graph (constructed in polynomial time).
◦ In each row, pick exactly one vertex (i.e., either xi or ¬xi).
◦ The resulting subgraph of n vertices is a whisker.
◦ The total number of whiskers is 2^n, and no more than that.
NP-Completeness
2^n whiskers and no more than that! Why?
Suppose there is a whisker W of 2k+j vertices
Cut size of W
By definition of suitable cut size, we have
which implies !!!!
NP-Complete
NAE-3-SAT ≤P WHISKER
Consider an instance of NAE-3-SAT with n variables and c clauses.
Construct G1, G2, …, Gc as follows.
NP-Complete
NAE-3-SAT ≤P WHISKER
Now, combine all Gi’s and add up all edge weights to get G’.
Next
NP-Complete
[Diagram: G and G’ are each updated and then combined into G*; the 3CNF instance has a satisfying assignment if and only if G* contains a whisker.]
Update G ( )
Update G’
◦ Amplify all edge weights of G’ by a small amount δ, where cn^2 δ ≪ 1.
All whiskers in the new G are the same as in the old G.
NP-Complete
G* = G + G’
Goal: If the 3CNF instance has a satisfying truth assignment, then selecting the true literal from each row of G* gives us a whisker of size n, and vice versa.
For any truth assignment of 3SAT, rearrange the literals into TRUE and FALSE columns.
If there is a satisfying not-all-equal assignment for 3SAT
◦ Each clause must have at least one TRUE and at least one FALSE literal.
◦ Not all the literals in each clause can be in the same column.
◦ For the ith clause, Gi contains n^2 − 2 edges connecting its two columns.
◦ The total cut size is required to satisfy
NP-Complete
If there is NO satisfying not-all-equal assignment for 3SAT
◦ At least one clause i has all its literals in the same column, giving n^2 edges between the two columns of Gi.
◦ For the other (c − 1) clauses, there are at most (n^2 − 2) edges connecting their two columns. Total number of edges: (c − 1)(n^2 − 2) + n^2 = cn^2 − 2c + 2.
◦ Of course, we do not want selecting the true literal in each row to give us a whisker, thus
Combining the two inequalities, if ε and δ are chosen such that
then, if the 3CNF instance has a satisfying truth assignment, selecting the true literal from each row of G* gives us a whisker of size n, and vice versa.
◦ Hence, NAE-3-SAT ≤P WHISKER □
NP-Complete
Heuristic Algorithms
On random graphs
◦ Alg 2 finds an approximate core.
◦ Alg 3 fails to find an approximate core.
◦ The core size grows linearly with d = np (fixed n) and logarithmically with n (fixed d).
◦ Does G(n,p) display core structure with high probability when p > 1/n?
Results
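For readers who want to reproduce the random-graph setting, the sketch below samples a G(n,p) graph from n and the target average degree d, using p = d/n as on the slide. The function name `sample_gnp` and the seeding scheme are assumptions for illustration; the slides do not say how the authors generated their graphs.

```python
import random


def sample_gnp(n, d, seed=None):
    """Sample an Erdos-Renyi G(n, p) edge list with p = d / n."""
    if n <= 0 or not 0 <= d <= n:
        raise ValueError("need n > 0 and 0 <= d <= n so that p = d/n is a probability")
    p = d / n
    rng = random.Random(seed)
    # Each unordered pair {u, v} becomes an edge independently with probability p.
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]


edges = sample_gnp(200, 4, seed=0)
average_degree = 2 * len(edges) / 200
```

With p = d/n the expected degree is (n − 1)p, which is close to d for large n.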
Textual graph
◦ Vertices and edges: words and their semantic correlations
◦ Data is crawled from 10K scientific papers of KDD conf. (1992-2003)
◦ Pointwise mutual information
◦ Total: 685 vertices and 6,432 edges
Results
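The pointwise mutual information formula on the textual-graph slide was lost in extraction. The sketch below assumes the standard definition, PMI(x, y) = log( p(x, y) / (p(x) p(y)) ), with probabilities estimated from document-level co-occurrence. That estimation scheme and the name `pmi_scores` are assumptions; the slides do not state how the authors computed word correlations.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return {(word_a, word_b): PMI} for word pairs that co-occur in a document."""
    n_docs = len(documents)
    word_count = Counter()
    pair_count = Counter()
    for doc in documents:
        words = sorted(set(doc))  # count each word at most once per document
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    scores = {}
    for (x, y), n_xy in pair_count.items():
        p_xy = n_xy / n_docs
        p_x = word_count[x] / n_docs
        p_y = word_count[y] / n_docs
        scores[(x, y)] = math.log(p_xy / (p_x * p_y))
    return scores


docs = [["graph", "core"], ["graph", "core"], ["graph", "whisker"], ["data"]]
scores = pmi_scores(docs)
```

Pairs with high PMI could then be kept as edges of a word graph; any threshold for doing so would be a further assumption.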
Both Alg 2 and Alg 3 successfully find approximate cores.
Higher values of λ indicate smaller core sizes.
Fig. (b): the best community of the textual graph has a large conductance of 0.3, i.e., the best community has as many internal edges as cut edges.
Alg 3 is believed to be more useful.
Results
Does a “whisker” make sense?
Comment
[1] Schaefer, T. J. The complexity of satisfiability problems. In Proc. 10th Ann. ACM Symp. on Theory of Computing (1978), Association for Computing Machinery, pp. 216-226.
Reference