social media mining - chapter 6 (community analysis)
TRANSCRIPT
![Page 1: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/1.jpg)
CommunityAnalysis
SOCIAL
MEDIA
MINING
![Page 2: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/2.jpg)
2Social Media Mining
Measures and Metrics 2Social Media Mining
Community Analysishttp://socialmediamining.info/
Dear instructors/users of these slides:
Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate these slides into your presentations, please include the following note:R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining: An Introduction, Cambridge University Press, 2014. Free book and slides at http://socialmediamining.info/or include a link to the website: http://socialmediamining.info/
![Page 3: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/3.jpg)
3Social Media Mining
Measures and Metrics 3Social Media Mining
Community Analysishttp://socialmediamining.info/
Social Community
A group of individuals with common economic, social, or political interests or characteristics, often living in relative proximity.
[real-world] community
![Page 4: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/4.jpg)
4Social Media Mining
Measures and Metrics 4Social Media Mining
Community Analysishttp://socialmediamining.info/
Why analyze communities?
Analyzing communities helps better understand users– Users form groups based on their interests
Groups provide a clear global view of user interactions
• E.g., find polarization
Some behaviors are only observable in a group setting and not on an individual level
– Some republican can agree with some democrats, but their parties can disagree
![Page 5: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/5.jpg)
5Social Media Mining
Measures and Metrics 5Social Media Mining
Community Analysishttp://socialmediamining.info/
Social Media Communities
• Formation:– When like-minded users on social media form a link and start
interacting with each other
• More Formal Formation:1. A set of at least two nodes sharing some interest, and2. Interactions with respect to that interest.
• Social Media Communities– Explicit (emic): formed by user subscriptions– Implicit (etic): implicitly formed by social interactions
• Example: individuals calling Canada from the United States• Phone operator considers them one community for promotional offers
• Other community names: group, cluster, cohesive subgroup, or module
![Page 6: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/6.jpg)
6Social Media Mining
Measures and Metrics 6Social Media Mining
Community Analysishttp://socialmediamining.info/
Examples of Explicit Social Media Communities
Facebook has groups and communities. Users can – post messages and images, – can comment on other messages, – can like posts, and – can view activities of other users– Circles in Google+ represent
communities
– Communities form as lists. – Users join lists to receive
information in the form of tweets
– LinkedIn provides Groups and Associations.
– Users can join professional groups where they can post and share information related to the group
![Page 7: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/7.jpg)
7Social Media Mining
Measures and Metrics 7Social Media Mining
Community Analysishttp://socialmediamining.info/
Finding Implicit Communities: An Example
A simple graph in which three implicit
communities are found, enclosed by the dashed
circles
![Page 8: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/8.jpg)
8Social Media Mining
Measures and Metrics 8Social Media Mining
Community Analysishttp://socialmediamining.info/
Overlapping vs. Disjoint Communities
Overlapping Communities Disjoint Communities
![Page 9: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/9.jpg)
9Social Media Mining
Measures and Metrics 9Social Media Mining
Community Analysishttp://socialmediamining.info/
Implicit communities in other domains
Protein-protein interaction networks
– Communities are likely to group proteins having the same specific function within the cell
World Wide Web – Communities may correspond to groups of pages dealing
with the same or related topics
Metabolic networks – Communities may be related to functional modules such
as cycles and pathwaysFood webs
– Communities may identify compartments
![Page 10: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/10.jpg)
10Social Media Mining
Measures and Metrics 10Social Media Mining
Community Analysishttp://socialmediamining.info/
Real-world Implicit Communities
Collaboration network between scientists working at the Santa Fe Institute.
The colors indicate high level communities and correspond to research divisions of the institute
![Page 11: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/11.jpg)
11Social Media Mining
Measures and Metrics 11Social Media Mining
Community Analysishttp://socialmediamining.info/
What is Community Analysis?
• Community detection– Discovering implicit communities
• Community evolution– Studying temporal evolution of communities
• Community evaluation– Evaluating Detected Communities
![Page 12: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/12.jpg)
12Social Media Mining
Measures and Metrics 12Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Detection
![Page 13: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/13.jpg)
13Social Media Mining
Measures and Metrics 13Social Media Mining
Community Analysishttp://socialmediamining.info/
What is community detection?
• The process of finding clusters of nodes (‘‘communities’’) – With Strong internal connections and – Weak connections between different communities
• Ideal decomposition of a large graph– Completely disjoint communities – There are no interactions between different
communities.
• In practice, – find community partitions that are maximally
decoupled.
![Page 14: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/14.jpg)
14Social Media Mining
Measures and Metrics 14Social Media Mining
Community Analysishttp://socialmediamining.info/
Why Detecting Communities is Important?
• The club members split into two groups (gray and white)• Disagreement between the administrator of the club (node
34) and the club’s instructor (node 1), • The members of one group left to start their own club
The same communities can be found using community
detection
Zachary's karate clubInteractions between 34 members of a karate club for over two years
![Page 15: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/15.jpg)
15Social Media Mining
Measures and Metrics 15Social Media Mining
Community Analysishttp://socialmediamining.info/
Why Community Detection?
Network Summarization
– A community can be considered as a summary of the whole network
– Easier to visualize and understand
Preserve Privacy– [Sometimes] a
community can reveal some properties without releasing the individuals’ privacy information.
![Page 16: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/16.jpg)
16Social Media Mining
Measures and Metrics 16Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Detection vs. Clustering
Clustering• Data is often non-linked (matrix rows)• Clustering works on the distance or similarity
matrix, e.g., -means.• If you use -means with adjacency matrix rows,
you are only considering the ego-centric network
Community detection• Data is linked (a graph)• Network data tends to be “discrete”, leading to
algorithms using the graph property directly – -clique, quasi-clique, or edge-betweenness
![Page 17: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/17.jpg)
17Social Media Mining
Measures and Metrics 17Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Detection Algorithms
Group Users based on
Group attributes
Group Users based on Member
attributes
![Page 18: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/18.jpg)
18Social Media Mining
Measures and Metrics 18Social Media Mining
Community Analysishttp://socialmediamining.info/
Member-Based Community Detection
![Page 19: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/19.jpg)
19Social Media Mining
Measures and Metrics 19Social Media Mining
Community Analysishttp://socialmediamining.info/
Member-Based Community Detection
• Look at node characteristics; and• Identify nodes with similar characteristics and consider
them a community
Node Characteristics
A. Degree– Nodes with same (or similar) degrees are in one community– Example: cliques
B. Reachability – Nodes that are close (small shortest paths) are in one community– Example: -cliques, -clubs, and -clans
C. Similarity – Similar nodes are in the same community
![Page 20: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/20.jpg)
20Social Media Mining
Measures and Metrics 20Social Media Mining
Community Analysishttp://socialmediamining.info/
A. Node Degree
Most common subgraph searched for:• Clique: a maximum complete
subgraph in which all nodes inside the subgraph adjacent to each otherFind communities by searching
for 1. The maximum clique:
the one with the largest number of vertices, or
2. All maximal cliques: cliques that are not subgraphs of a larger clique; i.e., cannot be expanded furtherBoth problems are NP-
hard
To overcome this, we can
I. Brute ForceII. Relax cliques
III. Use cliques as the core for larger communities
![Page 21: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/21.jpg)
21Social Media Mining
Measures and Metrics 21Social Media Mining
Community Analysishttp://socialmediamining.info/
I. Brute-Force Method
Can find all the maximal cliques in the graph
For each vertex , we find the maximal clique that contains node
Impractical for large networks: • For a complete graph of only 100 nodes, the
algorithm will generate at least different cliques starting from any node in the graph
![Page 22: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/22.jpg)
22Social Media Mining
Measures and Metrics 22Social Media Mining
Community Analysishttp://socialmediamining.info/
Enhancing the Brute-Force Performance
Pruning can help:
• When searching for cliques of size or larger
• If the clique is found, each node should have a degree equal to or more than
• We can first prune all nodes (and edges connected to them) with degrees less than – More nodes will have degrees less than – Prune them recursively
• For large , many nodes are pruned as social media networks follow a power-law degree distribution
![Page 23: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/23.jpg)
23Social Media Mining
Measures and Metrics 23Social Media Mining
Community Analysishttp://socialmediamining.info/
Maximum Clique: Pruning…
Example. to find a clique , remove all nodes with degree – Remove nodes 2 and 9– Remove nodes 1 and 3– Remove node 4
Even with pruning, cliques are less desirable– Cliques are rare– A clique of 1000 nodes, has 999x1000/2 edges– A single edge removal destroys the clique– That is less than 0.0002% of the edges!
![Page 24: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/24.jpg)
24Social Media Mining
Measures and Metrics 24Social Media Mining
Community Analysishttp://socialmediamining.info/
II. Relaxing Cliques
• -plex. a set of vertices in which we have
• is the degree of in the induced subgraph– Number of nodes from that are connected to
• Clique of size is a -plex• Finding the maximum -plex: NP-hard
– In practice, relatively easier due to smaller search space.
Maximal -plexes
![Page 25: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/25.jpg)
25Social Media Mining
Measures and Metrics 25Social Media Mining
Community Analysishttp://socialmediamining.info/
III. Using Cliques as a seed of a Community
Clique Percolation Method (CPM)– Uses cliques as seeds to find larger communities– CPM finds overlapping communities
• Input– A parameter , and a network
• Procedure– Find out all cliques of size in the given network– Construct a clique graph.
• Two cliques are adjacent if they share nodes– Each connected components in the clique graph
form a community
![Page 26: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/26.jpg)
26Social Media Mining
Measures and Metrics 26Social Media Mining
Community Analysishttp://socialmediamining.info/
Clique Percolation Method: Example
Cliques of size 3:, ,
, , , , ,
Communities: , ,
![Page 27: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/27.jpg)
27Social Media Mining
Measures and Metrics 27Social Media Mining
Community Analysishttp://socialmediamining.info/
B. Node Reachability
The two extremesNodes are assumed to be in the same community
1. If there is a path between them (regardless of the distance) or
2. They are so close as to be immediate neighbors.
How? Find using BFS/DFS
How? Finding Cliques
Challenge: most nodes are in one community (giant component)
Challenge: Cliques are challenging to find and are rarely observed
Solution: find communities that are in between cliques and connected components in terms of
connectivity and have small shortest paths between their nodes
![Page 28: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/28.jpg)
28Social Media Mining
Measures and Metrics 28Social Media Mining
Community Analysishttp://socialmediamining.info/
Special Subgraphs
1. -Clique: a maximal subgraph in which the largest geodesic distance between any nodes is less than or equal to
2. -Club: follows the same definition as a -clique– Additional Constraint: nodes on the shortest paths should be
part of the subgraph
3. -Clan: a -clique where for all shortest paths within the subgraph the distance is equal or less than . – All -clans are -cliques, but not vice versa.
![Page 29: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/29.jpg)
29Social Media Mining
Measures and Metrics 29Social Media Mining
Community Analysishttp://socialmediamining.info/
C. Node Similarity
• Similar (or most similar) nodes are assumed to be in the same community.– A classical clustering algorithm (e.g., -means) is applied to node similarities to find
communities.
• Node similarity can be defined – Using the similarity of node neighborhoods (Structural Equivalence) – Ch. 3– Similarity of social circles (Regular Equivalence) – Ch. 3
Structural equivalence: two nodes are structurally equivalent iff. they are connecting to the same set of actors
Nodes 1 and 3 are structurally equivalent, So are nodes 5 and 7.
![Page 30: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/30.jpg)
30Social Media Mining
Measures and Metrics 30Social Media Mining
Community Analysishttp://socialmediamining.info/
Node Similarity (Structural Equivalence)
Jaccard Similarity
Cosine similarity
![Page 31: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/31.jpg)
31Social Media Mining
Measures and Metrics 31Social Media Mining
Community Analysishttp://socialmediamining.info/
Group-BasedCommunity Detection
![Page 32: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/32.jpg)
32Social Media Mining
Measures and Metrics 32Social Media Mining
Community Analysishttp://socialmediamining.info/
Group-Based Community Detection
• In group-based community detection, we are interested in communities that have certain group properties
Properties:I. Balanced: Spectral clusteringII. Robust: -connected graphsIII. Modular: Modularity MaximizationIV. Dense: Quasi-cliquesV. Hierarchical: Hierarchical clustering
![Page 33: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/33.jpg)
33Social Media Mining
Measures and Metrics 33Social Media Mining
Community Analysishttp://socialmediamining.info/
I. Balanced Communities
• Community detection can be thought of graph clustering
• Graph clustering: we cut the graph into several partitions and assume these partitions represent communities
• Cut: partitioning (cut) of the graph into two (or more) sets (cutsets) – The size of the cut is the number of edges that are being cut
• Minimum cut (min-cut) problem: find a graph partition such that the number of edges between the two sets is minimized
Min-cuts can be computed
efficiently using the max-flow min-
cut theoremMin-cut often returns an imbalanced partition, with one set being a
singleton
Min-cut
![Page 34: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/34.jpg)
34Social Media Mining
Measures and Metrics 34Social Media Mining
Community Analysishttp://socialmediamining.info/
Ratio Cut and Normalized Cut
• To mitigate the min-cut problem we can change the objective function to consider community size
• is the complement cut set• is the size of the cut
![Page 35: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/35.jpg)
35Social Media Mining
Measures and Metrics 35Social Media Mining
Community Analysishttp://socialmediamining.info/
Ratio Cut & Normalized Cut: Example
For Cut A
For Cut B
Both ratio cut and normalized cut prefer a balanced partition
A
B
![Page 36: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/36.jpg)
36Social Media Mining
Measures and Metrics 36Social Media Mining
Community Analysishttp://socialmediamining.info/
Spectral Clustering
• Both ratio cut and normalized cut can be reformulated in matrix format
• Let when node is member of community and 0, otherwise
• Let , , be the diagonal degree matrix
• Then the th entry on the diagonal of represents the number of edges that are inside community .
• Similarly, the th element on the diagonal of represents the number of edges that are connected to members of community .
• The th element on the diagonal of represents the number of edges that are in the cut that separates community from all other nodes.
• The th diagonal element of is equivalent to
![Page 37: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/37.jpg)
37Social Media Mining
Measures and Metrics 37Social Media Mining
Community Analysishttp://socialmediamining.info/
Spectral Clustering
So ratio cut is
![Page 38: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/38.jpg)
38Social Media Mining
Measures and Metrics 38Social Media Mining
Community Analysishttp://socialmediamining.info/
Spectral Clustering
Both ratio/normalized cut can be reformulated as
• Spectral relaxation: , , is a diagonal degree matrix
Optimal Solution is the top eigenvectors with the smallest eigenvalues
![Page 39: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/39.jpg)
39Social Media Mining
Measures and Metrics 39Social Media Mining
Community Analysishttp://socialmediamining.info/
Recovering Integer Membership Values
• Because we performed spectral relaxation, the matrix obtained is not integer valued
• To recover from we can run -means on
![Page 40: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/40.jpg)
40Social Media Mining
Measures and Metrics 40Social Media Mining
Community Analysishttp://socialmediamining.info/
Spectral Clustering: Example
𝐷=𝑑𝑖𝑎𝑔(2 ,2 ,4 ,4 ,4 , 4 ,4 ,3 ,1)
2 Eigenvectorsi.e., we want
2 communities
Two communities:
-mea
ns,
![Page 41: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/41.jpg)
41Social Media Mining
Measures and Metrics 41Social Media Mining
Community Analysishttp://socialmediamining.info/
II. Robust Communities
• The goal is find subgraphs robust enough such that removing some edges or vertices does not disconnect the subgraph
• -vertex connected (-connected) graph:– is the minimum number of nodes that must be
removed to disconnect the graph• -edge connected: at least edges must be
removed to disconnect the graph
• Examples:– Complete graph of size : unique -connected graph – A cycle: -connected graph
![Page 42: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/42.jpg)
42Social Media Mining
Measures and Metrics 42Social Media Mining
Community Analysishttp://socialmediamining.info/
III. Modular Communities
Consider a graph , where the degrees are known beforehand however edges are not
– Consider two vertices and with degrees and .
What is an expected number of edges between and ?• For any edge going out of randomly the probability
of this edge getting connected to vertex is
![Page 43: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/43.jpg)
43Social Media Mining
Measures and Metrics 43Social Media Mining
Community Analysishttp://socialmediamining.info/
Modularity and Modularity Maximization
• Given a degree distribution, we know the expected number of edges between any pairs of vertices
• We assume that real-world networks should be far from random. Therefore, the more distant they are from this randomly generated network, the more structural they are.
• Modularity defines this distance and modularity maximization tries to maximize this distance
![Page 44: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/44.jpg)
44Social Media Mining
Measures and Metrics 44Social Media Mining
Community Analysishttp://socialmediamining.info/
Normalized Modularity
Consider a partitioning of the data
For partition , this distance can be defined as
This distance can be generalized for a partitioning
The normalized version of this distance is defined as Modularity
![Page 45: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/45.jpg)
45Social Media Mining
Measures and Metrics 45Social Media Mining
Community Analysishttp://socialmediamining.info/
Modularity Maximization
Modularity matrix
is the degree vector for all nodes
Reformulation of the modularity
• is the indicator (partition membership) function: iff.
• Similar to Spectral clustering, • We relax to be orthogonal, i.e., matrix • The optimal solution for is the top eigenvectors of . • To recover the original , we can run -means on
• Difference: in Modularity, the top eigenvalues have to be positive!
![Page 46: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/46.jpg)
46Social Media Mining
Measures and Metrics 46Social Media Mining
Community Analysishttp://socialmediamining.info/
Modularity Maximization: Example
Modularity Matrix
-means
Two Communities:
{1, 2, 3, 4} and
{5, 6, 7, 8, 9}
2 eigenvectors
![Page 47: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/47.jpg)
47Social Media Mining
Measures and Metrics 47Social Media Mining
Community Analysishttp://socialmediamining.info/
IV. Dense Communities: -dense
• The density of a graph defines how close a graph is to a clique
• A subgraph is a -dense (or quasi-clique) if
• A -dense graph is a clique• We can find quasi-cliques using the brute force
algorithm discussed previously, but there are more efficient methods.
![Page 48: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/48.jpg)
48Social Media Mining
Measures and Metrics 48Social Media Mining
Community Analysishttp://socialmediamining.info/
Finding Maximal -dense Quasi-Cliques
We can use a two-step procedure consisting of “local search” and “heuristic pruning”
Local search• Sample a subgraph, and find a maximal -dense
quasi-clique • A greedy approach is to expand a quasi-clique by all of its
high-degree neighbors until the density drops below Heuristic pruning• For a -dense quasi-clique of size k, we recursively
remove nodes with degree less than k and incident edges• We can start from low-degree nodes and recursively
remove all nodes with degree less that k
![Page 49: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/49.jpg)
49Social Media Mining
Measures and Metrics 49Social Media Mining
Community Analysishttp://socialmediamining.info/
V. Hierarchical Communities
• Previous methods consider communities at a single level– Communities may have hierarchies.
• Each community can have sub/super communities. – Hierarchical clustering deals with this scenario
and generates community hierarchies.
• Initially members are considered as either 1 or communities in hierarchical clustering. These communities are gradually – merged (agglomerative hierarchical clustering) or – split (divisive hierarchical clustering)
![Page 50: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/50.jpg)
50Social Media Mining
Measures and Metrics 50Social Media Mining
Community Analysishttp://socialmediamining.info/
Hierarchical Community Detection
• Goal: build a hierarchical structure of communities based on network topology
• Allow the analysis of a network at different resolutions
• Representative approaches: – Divisive Hierarchical Clustering– Agglomerative Hierarchical clustering
![Page 51: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/51.jpg)
51Social Media Mining
Measures and Metrics 51Social Media Mining
Community Analysishttp://socialmediamining.info/
Divisive Hierarchical Clustering
• Divisive clustering– Partition nodes into several sets– Each set is further divided into smaller ones– Network-centric partition can be applied for the partition
• One particular example:
Girvan-Newman Algorithm: recursively remove the “weakest” links within a “community”
– Find the edge with the weakest link– Remove the edge and update the corresponding strength of each
edge
• Recursively apply the above two steps until a network is discomposed into a desired number of connected components.
• Each component forms a community
![Page 52: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/52.jpg)
52Social Media Mining
Measures and Metrics 52Social Media Mining
Community Analysishttp://socialmediamining.info/
Edge Betweenness
• To determine weakest links, the algorithm uses edge betweenness.
Edge betweenness is the number of shortest paths that pass along with the edge
• Edge betweenness measures the “bridgeness” of an edge between two communities
• The edge with high betweenness tends to be the bridge between two communities.
![Page 53: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/53.jpg)
53Social Media Mining
Measures and Metrics 53Social Media Mining
Community Analysishttp://socialmediamining.info/
Edge Betweenness: Example
The edge betweenness of is, as all the shortest paths from to have to either pass or , and is the shortest path between and
![Page 54: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/54.jpg)
54Social Media Mining
Measures and Metrics 54Social Media Mining
Community Analysishttp://socialmediamining.info/
The Girvan-Newman Algorithm
1.Calculate edge betweenness for all edges in the graph.
2.Remove the edge with the highest betweenness.
3.Recalculate betweenness for all edges affected by the edge removal.
4.Repeat until all edges are removed.
![Page 55: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/55.jpg)
55Social Media Mining
Measures and Metrics 55Social Media Mining
Community Analysishttp://socialmediamining.info/
Edge Betweenness Divisive Clustering: Example
the first edge that needs to be removed is e(4, 5) ( or e(4, 6) )
Initial betweenness value
By removing e(4, 5), we compute the edge betweenness once again; this time, e(4, 6) has the highest betweenness value: 20.This is because all shortest paths between
nodes {1,2,3,4} to nodes {5,6,7,8,9} must pass e(4, 6); therefore, it has betweenness 45 = 20.
![Page 56: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/56.jpg)
56Social Media Mining
Measures and Metrics 56Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Detection Algorithms
![Page 57: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/57.jpg)
57Social Media Mining
Measures and Metrics 57Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Evolution
![Page 58: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/58.jpg)
58Social Media Mining
Measures and Metrics 58Social Media Mining
Community Analysishttp://socialmediamining.info/
Network and Community Evolution
• How does a network change over time?• How does a community change over
time?• What properties do you expect to remain
roughly constant?• What properties do you expect to change?• For example,– Where do you expect new edges to form?– Which edges do you expect to be dropped?
![Page 59: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/59.jpg)
59Social Media Mining
Measures and Metrics 59Social Media Mining
Community Analysishttp://socialmediamining.info/
How Networks Evolve?
![Page 60: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/60.jpg)
60Social Media Mining
Measures and Metrics 60Social Media Mining
Community Analysishttp://socialmediamining.info/
Network Growth Patterns
1. Network Segmentation
2. Graph Densification
3. Diameter Shrinkage
![Page 61: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/61.jpg)
61Social Media Mining
Measures and Metrics 61Social Media Mining
Community Analysishttp://socialmediamining.info/
1. Network Segmentation
• Often, in evolving networks, segmentation takes place, where the large network is decomposed over time into three parts
1. Giant Component: As network connections stabilize, a giant component of nodes is formed, with a large proportion of network nodes and edges falling into this component.
2. Stars: These are isolated parts of the network that form star structures. A star is a tree with one internal node and n leaves.
3. Singletons: These are orphan nodes disconnected from all nodes in the network.
![Page 62: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/62.jpg)
62Social Media Mining
Measures and Metrics 62Social Media Mining
Community Analysishttp://socialmediamining.info/
2. Graph Densification
• The density of the graph increases as the network grows– The number of edges increases faster than the
number of nodes does
• Densification exponent: :– : linear growth – constant out-degree – : quadratic growth – clique
and are numbers of edges and nodes respectively at time
![Page 63: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/63.jpg)
63Social Media Mining
Measures and Metrics 63Social Media Mining
Community Analysishttp://socialmediamining.info/
Densification in Real Networks
V(t)
E(t)
1.69
Physics CitationsV(t)
E(t)
1.66
Patent Citations
![Page 64: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/64.jpg)
64Social Media Mining
Measures and Metrics 64Social Media Mining
Community Analysishttp://socialmediamining.info/
3. Diameter Shrinking
• In networks diameter shrinks over time
ArXiv citation graph Affiliation Network
![Page 65: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/65.jpg)
65Social Media Mining
Measures and Metrics 65Social Media Mining
Community Analysishttp://socialmediamining.info/
How Communities Evolve?
![Page 66: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/66.jpg)
66Social Media Mining
Measures and Metrics 66Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Evolution
• Communities also expand, shrink, or dissolve in dynamic networks
![Page 67: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/67.jpg)
67Social Media Mining
Measures and Metrics 67Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Detection in Evolving Networks
![Page 68: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/68.jpg)
68Social Media Mining
Measures and Metrics 68Social Media Mining
Community Analysishttp://socialmediamining.info/
Extending Previous Methods
1. Take t snapshots of the network, , where is a snapshot at time
2. Perform a static community detection algorithm (all methods discussed before) on all snapshots independently
3. Assign community members based on communities found in all time stamps. – E.g., Assign nodes to communities based on
voting (assign nodes to communities they belong to the most over time)
Unstable in highly dynamic networks as community memberships are always changing
![Page 69: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/69.jpg)
69Social Media Mining
Measures and Metrics 69Social Media Mining
Community Analysishttp://socialmediamining.info/
Evolutionary Clustering
• Assume that communities don’t change most of the time
• Minimize an objective function that considers– Snapshot Cost. Communities at different times
(SC)– Temporal Cost. How communities evolve (TC)
• Objective function is defined as
![Page 70: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/70.jpg)
70Social Media Mining
Measures and Metrics 70Social Media Mining
Community Analysishttp://socialmediamining.info/
Evolutionary Clustering
• If we use spectral clustering for each snapshot• Then, , the objective function at time is
• is the community membership matrix at • To define TC we can use
• Challenges with this definition– Assumes that we have the same number of communities at
time and – is non-unique (any orthogonal transformation is still a
solution)
![Page 71: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/71.jpg)
71Social Media Mining
Measures and Metrics 71Social Media Mining
Community Analysishttp://socialmediamining.info/
Evolutionary Clustering
• We can define TC as
• Hence cost will be
![Page 72: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/72.jpg)
72Social Media Mining
Measures and Metrics 72Social Media Mining
Community Analysishttp://socialmediamining.info/
Evolutionary Clustering
• Assuming Normalized Laplacian is used
Similar to Spectral Clustering can be obtained by taking the top eigenvectors of
![Page 73: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/73.jpg)
73Social Media Mining
Measures and Metrics 73Social Media Mining
Community Analysishttp://socialmediamining.info/
Community Evaluation
![Page 74: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/74.jpg)
74Social Media Mining
Measures and Metrics 74Social Media Mining
Community Analysishttp://socialmediamining.info/
Evaluating the Communities
• Evaluation with ground truth• Evaluation without ground truth
We are given objects of two different kinds (, ) • The perfect
communities would be that objects of the same type are in the same community.
![Page 75: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/75.jpg)
75Social Media Mining
Measures and Metrics 75Social Media Mining
Community Analysishttp://socialmediamining.info/
Evaluation with Ground Truth
• When ground truth is available – We have partial knowledge of what
communities should look like– We are given the correct community
(clustering) assignments
• Measures– Precision and Recall, or F-Measure– Purity– Normalized Mutual Information (NMI)
![Page 76: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/76.jpg)
76Social Media Mining
Measures and Metrics 76Social Media Mining
Community Analysishttp://socialmediamining.info/
Precision and Recall
False Negative (FN) : • When similar members are
assigned to different communities
• An incorrect decision
False Positive (FP) : • When dissimilar members are
assigned to the same communities
• An incorrect decision
True Positive (TP) : • When similar members are
assigned to the same communities
• A correct decision.
True Negative (TN) : • When dissimilar members are
assigned to different communities
• A correct decision
![Page 77: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/77.jpg)
77Social Media Mining
Measures and Metrics 77Social Media Mining
Community Analysishttp://socialmediamining.info/
Precision and Recall: Example
![Page 78: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/78.jpg)
78Social Media Mining
Measures and Metrics 78Social Media Mining
Community Analysishttp://socialmediamining.info/
F-Measure
Either or measures one aspect of the performance, – To integrate them into one measure, we can use
the harmonic mean of precision of recall
For the example earlier,
![Page 79: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/79.jpg)
79Social Media Mining
Measures and Metrics 79Social Media Mining
Community Analysishttp://socialmediamining.info/
Purity
We can assume the majority of a community represents the community– We use the label of the majority against the label
of each member to evaluate the communitiesPurity. The fraction of instances that have labels equal to the community’s majority label
• : the number of communities• : total number of nodes, • : the set of instances with label in all communities• : the set of members in community
Purity can be easily tampered by• Points being singleton communities (of size 1);
or by• Very large communities
![Page 80: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/80.jpg)
80Social Media Mining
Measures and Metrics 80Social Media Mining
Community Analysishttp://socialmediamining.info/
Mutual Information
• Mutual information (MI). The amount of information that two random variables share. – By knowing one of the variables, it measures the
amount of uncertainty reduced regarding the others
• and are labels and found communities; • and are the number of data points in community
and with label , respectively; • is the number of nodes in community and with
label ; and is the number of nodes
![Page 81: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/81.jpg)
81Social Media Mining
Measures and Metrics 81Social Media Mining
Community Analysishttp://socialmediamining.info/
Normalizing Mutual Information (NMI)
• Mutual information (MI) is unbounded• To address this issue, we can normalize MI
• How? We know that
• is the entropy function
![Page 82: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/82.jpg)
82Social Media Mining
Measures and Metrics 82Social Media Mining
Community Analysishttp://socialmediamining.info/
Normalized Mutual Information
Normalized Mutual Information
We can also define it as Note that
![Page 83: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/83.jpg)
83Social Media Mining
Measures and Metrics 83Social Media Mining
Community Analysishttp://socialmediamining.info/
Normalized Mutual Information
• NMI values close to one indicate high similarity between communities found and labels
• Values close to zero indicate high dissimilarity between them
• where and are known (with labels) and found communities, respectively
• and are the number of members in the community and , respectively,
• is the number of members in community and labeled , • is the size of the dataset
![Page 84: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/84.jpg)
84Social Media Mining
Measures and Metrics 84Social Media Mining
Community Analysishttp://socialmediamining.info/
Normalized Mutual Information: Example
Found communities (H)– [1,1,1,1,1,1, 2,2,2,2,2,2,2,2]
nh
h=1 6h=2 8
nl
77
nh,l
h=1 5 1h=2 2 6
𝑛=14
Actual Labels (L) – [2,1,1,1,1,1, 2,2,2,2,2,2,1,1]
![Page 85: Social Media Mining - Chapter 6 (Community Analysis)](https://reader036.vdocuments.us/reader036/viewer/2022062902/58ef46ad1a28abe1588b45a3/html5/thumbnails/85.jpg)
85Social Media Mining
Measures and Metrics 85Social Media Mining
Community Analysishttp://socialmediamining.info/
Evaluation without Ground Truth
• Evaluation with Semantics– A simple way of analyzing detected communities is to analyze other attributes
(posts, profile information, content generated, etc.) of community members to see if there is a coherency among community members
– The coherency is often checked via human subjects.• Or through labor markets: Amazon Mechanical Turk
– To help analyze these communities, one can use word frequencies. By generating a list of frequent keywords for each community, human subjects determine whether these keywords represent a coherent topic.
• Evaluation Using Clustering Quality Measures– Use clustering quality measures (SSE)– Use more than two community detection algorithms and compare the results and
pick the algorithm with better quality measure