0
Community Dynamics
Course: Analysis of Social Media
Shilpa Arora
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
1
Papers Presented
• Tanja Falkowski and Myra Spiliopoulou. Data Mining for Community Dynamics. Künstliche Intelligenz (Journal), (2007), pp. 23-29.
• Tanja Falkowski, Jörg Bartelheimer and Myra Spiliopoulou. Mining and Visualizing the Evolution of Subgroups in Social Networks. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence.
• Tanja Falkowski, Jörg Bartelheimer and Myra Spiliopoulou. Community Dynamics Mining. In Proc. of 14th European Conference on Information Systems (ECIS 2006), Göteborg, Sweden, 2006.
• Main References:
  - Michelle Girvan and M.E.J. Newman. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA, (2002).
  - Filippo Radicchi, Claudio Castellano, Federico Cecconi, Vittorio Loreto, and Domenico Parisi. Detecting and identifying communities in networks. Proc. Natl. Acad. Sci. USA, (2004).
2
Authors
• Tanja Falkowski: Research Associate, Department for Information Systems, Otto-von-Guericke-University, Magdeburg
• Myra Spiliopoulou: Professor of Business Informatics, Computer Science, Otto-von-Guericke-University, Magdeburg
• Jörg Bartelheimer: Viadeo S.A.
3
Outline
• Introduction
• Motivation
• Detecting Communities
• Community Dynamics
• Visualization Tools
• Results
• Conclusion
4
Introduction
• Community of Practice – group of people with common interests who interact to exchange their knowledge about the topic of interest
• Highly dynamic social network
  - Structure changes over time
  - Members and interactions fluctuate
  - Community: persistent structure in a graph of interactions among fluctuating members
  - Community instance: densely connected subgroup that is only loosely connected to the rest of the graph
  - Communities = clusters of similar community instances across time
5
Motivation
• Communities in:
  - social networks – social groupings
  - citation networks – related papers on a single topic
  - web – pages on related topics
• Organizations encourage community development to facilitate knowledge sharing
• Factors affecting communities
  - Internal: infrastructure, leadership
  - External: publicity
• Detecting points of structural change
  - Predict similar future behavior
6
Prior Work
• Community detection using aggregated data; drawbacks:
  - All interactions in time are treated equally, so large aggregates dominate rather than the most current interactions
  - Cannot observe transitions in interaction behavior, e.g. merging communities, periodically active communities
• Vertex- and edge-level tools
  - SoNIA (Moody et al., 2005) & TeCFlow (Gloor & Zhao, 2004)
  - Change in behavior of a single actor is captured; dynamics of groups cannot be observed
7
Proposed Approach
• Dynamic temporal observation of communities
  - Time windows
  - Detect community instances in each window
  - Compare community instances across time windows
  - Link together similar community instances to form a community
  - Interactively visualize changes in community structure
8
Community Detection
• Graph G = (V, E): V = nodes (members), E = edges (interactions)
• Weights: number of messages exchanged or total length of messages
  - Aggregated weights favor old members
  - Weights are assigned for each time window
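A minimal sketch of per-window weighting, assuming messages arrive as (sender, receiver, timestamp) tuples and that the edge weight is the number of messages exchanged in the window. The function name, the tuple format and the sample data are hypothetical; the 14-day window is the value reported for the data set later in the talk.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def build_window_graphs(messages, start, window_days=14):
    """Return {window_index: {(u, v): weight}}, one weighted graph per time window.

    messages: iterable of (sender, receiver, timestamp) tuples (hypothetical format).
    The weight of an undirected edge is the number of messages exchanged
    between its two endpoints inside that window.
    """
    width = timedelta(days=window_days)
    graphs = defaultdict(lambda: defaultdict(int))
    for sender, receiver, ts in messages:
        if ts < start or sender == receiver:
            continue  # ignore messages before the observation period and self-messages
        window = (ts - start) // width  # index of the equidistant time window
        edge = tuple(sorted((sender, receiver)))  # undirected edge key
        graphs[window][edge] += 1
    return {w: dict(edges) for w, edges in graphs.items()}


# Illustrative data only
messages = [
    ("ann", "bob", datetime(2004, 6, 1)),
    ("bob", "ann", datetime(2004, 6, 3)),
    ("ann", "cat", datetime(2004, 6, 20)),
]
graphs = build_window_graphs(messages, start=datetime(2004, 6, 1))
print(graphs)
```

Because every window gets its own weights, a long-standing member's old messages do not inflate the edges of the current window.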
9
Community Detection
• Hierarchical divisive clustering
  - Iterative removal of edges that do not contribute to a community
  - Two measures for finding edges to be removed:
    - Edge betweenness (Girvan & Newman, 2002)
    - Edge clustering coefficient (Radicchi et al., 2004)
10
Community Detection
• Edge betweenness (Girvan & Newman, 2002)
  - Edges that are least central to communities = edges between communities
  - Edge betweenness: number of shortest paths between pairs of vertices that run along the edge
  - Communities = densely connected subgraphs that are loosely connected to each other, so shortest paths between communities must go through the few inter-group edges
  - Global quantity: uses properties of the whole system
  - Complexity: O(m²n), m = number of edges, n = number of vertices
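A minimal sketch of edge betweenness for an unweighted, undirected graph, using breadth-first search with Brandes-style accumulation. This is one standard way to compute the quantity, not necessarily the exact procedure of the cited paper. The adjacency-dict format and the toy graph (two triangles joined by one bridge) are assumptions for illustration.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph.

    adj: dict mapping each vertex to the set of its neighbours.
    Returns {(u, v): value} with u < v; every unordered vertex pair counts once.
    """
    bet = defaultdict(float)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, pushing path shares onto edges
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[tuple(sorted((v, w)))] += share
                delta[v] += share
    # Each pair (s, t) was visited from both ends, so halve the totals
    return {edge: value / 2.0 for edge, value in bet.items()}


# Two triangles joined by the bridge c-d (illustrative graph)
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
bet = edge_betweenness(adj)
bridge = max(bet, key=bet.get)
print(bridge, bet[bridge])  # ('c', 'd') 9.0
```

In the divisive scheme, the edge with the highest value is removed and the betweenness is recomputed on the remaining graph before the next removal.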
11
Community Detection
• Edge clustering coefficient (Radicchi et al., 2004)
  $$C_{ij}^{(3)} = \frac{z_{ij}^{(3)} + 1}{\min(k_i - 1,\ k_j - 1)}$$
  - $z_{ij}^{(3)}$ = number of triangles to which the edge (i, j) belongs
  - $k_i$ = degree of vertex i
  - $\min(k_i - 1,\ k_j - 1)$ = maximum possible number of triangles including that edge
  - Intuition: edges between communities belong to fewer short loops; many such triangles occur within communities
  - Added advantage: nodes with only one connection are not split off as isolated communities, since the coefficient is infinite for their unique edges
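A minimal sketch of the triangle-based coefficient, assuming the definition from Radicchi et al. (2004): (number of triangles on the edge + 1) divided by min(k_i - 1, k_j - 1). The adjacency-dict format and the toy graph are illustrative assumptions.

```python
import math


def edge_clustering_coefficient(adj, u, v):
    """Triangle-based edge clustering coefficient of the edge (u, v).

    adj: dict mapping each vertex to the set of its neighbours.
    """
    triangles = len(adj[u] & adj[v])  # each common neighbour closes one triangle
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if max_triangles == 0:
        return math.inf  # an endpoint has degree 1: its only edge is never removed
    return (triangles + 1) / max_triangles


# Two triangles joined by the bridge c-d, plus a pendant vertex g on f
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e", "g"},
    "g": {"f"},
}
print(edge_clustering_coefficient(adj, "c", "d"))  # 0.5, the bridge scores lowest
print(edge_clustering_coefficient(adj, "a", "b"))  # 2.0, an edge inside a triangle
print(edge_clustering_coefficient(adj, "f", "g"))  # inf, a pendant edge
```

The divisive algorithm removes the edge with the smallest coefficient, which is why the infinite value protects single-connection nodes from being cut off.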
12
Community Detection
• Betweenness vs. clustering coefficient (Radicchi et al., 2004)
  - Compared on an artificial test graph with four communities
  - Comparable performance
  - Betweenness is a global quantity; the clustering coefficient is a local quantity
  - Complexity (m = # edges, n = # nodes): betweenness = O(m²n), clustering coefficient = O(m⁴/n²)
  - Strong but not perfect anti-correlation between the two measures
13
R0 = fraction of nodes correctly classified
p_out = probability with which pairs of nodes in different groups are connected
GN = edge betweenness approach; edge clustering approach with g = 3 (triangle loops) and g = 4 (square loops)
Average time needed to analyze a random graph of N nodes
14
Community Detection
• Result of hierarchical clustering: dendrogram
• Meaningful network partition (modularity; Newman & Girvan, 2004)
  - Fraction of edges connecting nodes within a community minus the expected value of the same quantity in a network with the same community structure but random edges
  - Look for peaks in modularity values: these are good split points
15
Community Detection
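• Modularity (Q):
  $$Q = \sum_i \left(e_{ii} - a_i^2\right) = \operatorname{Tr} e - \lVert e^2 \rVert$$
  - $e$ = k x k matrix, k = # communities; $e_{ij}$ = fraction of all edges in the network that link vertices in community i to vertices in community j
  - $\lVert x \rVert$ = sum of the elements of matrix x
  - $\operatorname{Tr} e = \sum_i e_{ii}$ = trace of e, the fraction of edges that connect vertices in the same community
  - $a_i = \sum_j e_{ij}$ = fraction of edges that connect to vertices in community i
  - In a network where edges fall between vertices without regard for the communities they belong to, we have $e_{ij} = a_i a_j$
  - Q is about 0 when within-community edges are no more frequent than at random and approaches its maximum of 1 for strong community structure; Q > 0.3 indicates significant community structure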
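A minimal sketch of the modularity computation, assuming the Newman & Girvan (2004) form Q = Tr e - ||e²||, i.e. the sum over communities of e_ii - a_i². The edge-list and label-dict input formats, as well as the toy graph, are assumptions for illustration.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, one per undirected edge.
    community: dict mapping each vertex to its community label.
    """
    labels = sorted(set(community.values()))
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    m = len(edges)
    e = [[0.0] * k for _ in range(k)]
    for u, v in edges:
        i, j = index[community[u]], index[community[v]]
        # Split each edge's share 1/m over e[i][j] and e[j][i] so that e stays
        # symmetric; an internal edge (i == j) therefore adds 1/m to e[i][i].
        e[i][j] += 0.5 / m
        e[j][i] += 0.5 / m
    trace = sum(e[i][i] for i in range(k))  # fraction of edges inside communities
    a = [sum(row) for row in e]  # a_i: fraction of edge ends attached to community i
    return trace - sum(x * x for x in a)


# Two triangles joined by one bridge, split into the two triangles
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_split = modularity(edges, split)
q_single = modularity(edges, {v: 0 for v in split})
print(round(q_split, 3), round(q_single, 3))
```

Here the two-triangle split gives Q = 6/7 - 1/2, about 0.357, while putting every vertex in one community gives Q = 0, so the split is the better cut.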
16
Community Detection
• Step 1: Time axis is separated into equidistant time windows
• Step 2: Detect community instances in each time window (hierarchical divisive clustering)
• Step 3: Find similar community instances across time windows (community survival)
  - Similarity = overlap in members
• Step 4: Community survival graph: connect matching community instances over time
  - Borders of clusters: if and when a community dies or merges with another community
• Step 5: Groups of community instances are discovered using hierarchical divisive clustering
17
Community Detection
18
Community Detection
• Overlap between two community instances:
  - |x| = number of vertices in a community instance x (or in an intersection of two instances)
• Similarity function:
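A hedged sketch of how the overlap and similarity test could be coded. The slide's own formulas did not survive extraction, so everything here is an assumption: overlap is taken as the number of shared members divided by the size of the smaller instance, and two instances count as similar when the overlap reaches a threshold and their windows lie within a maximum distance. The defaults 0.5 and 6 are borrowed from the thresholds reported for the data set; assigning them to overlap and window distance is a guess.

```python
def overlap(x, y):
    """Assumed overlap: shared members relative to the smaller instance."""
    return len(x & y) / min(len(x), len(y))


def similar(x, y, window_x, window_y, min_overlap=0.5, max_gap=6):
    """Assumed similarity test: enough overlap and windows close enough in time."""
    return overlap(x, y) >= min_overlap and abs(window_x - window_y) <= max_gap


def survival_graph(instances, min_overlap=0.5, max_gap=6):
    """Link similar community instances from different time windows.

    instances: list of (window_index, set_of_members) pairs (hypothetical format).
    Returns a list of index pairs (i, j) with i < j, the edges of the survival graph.
    """
    links = []
    for i, (wi, xi) in enumerate(instances):
        for j in range(i + 1, len(instances)):
            wj, xj = instances[j]
            if wi != wj and similar(xi, xj, wi, wj, min_overlap, max_gap):
                links.append((i, j))
    return links


# Illustrative instances only
instances = [
    (0, {"a", "b", "c"}),
    (1, {"b", "c", "d"}),
    (9, {"a", "b", "c"}),  # same members as instance 0, but too many windows later
    (2, {"x", "y"}),
]
links = survival_graph(instances)
print(links)  # [(0, 1)]
```

The resulting link list is the survival graph of Step 4, which Step 5 then clusters into communities.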
19
Visualizing Community Dynamics
• CoDyM: community discovering & dynamics mining
• Two ways to compare:
  - Fixed: chosen time window is compared with all other time windows
  - Periodic: chosen time window is compared to the previous time window
• Measures:
  - Stability: how stable is the composition of the group?
  - Fixed stability = number of members from the current window who are active in all other time windows
  - Low periodic stability indicates large changes in the membership structure of the subgroup
20
Visualizing Community Dynamics
• Density: edges inside the group / edges with outside group members
  - Indicates connectivity inside the group
• Cohesion: how connected the group is to members outside of the instance
  - Greater cohesion combined with lower density leads to an unstable subgroup
• Euclidean distance of the vectors representing subgroups
  - Euclidean distance = 0 => structurally equivalent
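A minimal sketch of the density ratio and the Euclidean distance. The slides do not say how a subgroup is turned into a vector, so the sketch assumes equal-length numeric vectors over a fixed vertex ordering (for example interaction counts). The edge-list format and the toy graph are also illustrative assumptions.

```python
import math


def density(edges, group):
    """Edges inside the group divided by edges that leave the group."""
    inside = sum(1 for u, v in edges if u in group and v in group)
    outside = sum(1 for u, v in edges if (u in group) != (v in group))
    return inside / outside if outside else math.inf


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length subgroup vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Two triangles joined by one bridge (illustrative graph)
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
print(density(edges, {"a", "b", "c"}))  # 3 internal edges / 1 outgoing edge = 3.0
print(euclidean_distance([1, 0, 2], [1, 0, 2]))  # 0.0: structurally equivalent
```

A group with no outgoing edges is reported as infinitely dense, which is a choice made for this sketch rather than something stated in the talk.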
21
Visualizing Community Dynamics
• Correlation coefficient: covariance of the vector representations of the graphs / product of their standard deviations
  - Structurally equivalent subgroups have correlation +1
• Group activity: number of interactions in each time window
  - Min internal and external group activities measure the reciprocity inside and outside the group, respectively
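A minimal sketch of the correlation coefficient as described: covariance divided by the product of the standard deviations (the Pearson coefficient). The vector representation is again an assumption: equal-length numeric vectors over a fixed vertex ordering.

```python
import math


def correlation(p, q):
    """Pearson correlation of two equal-length vectors."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q)) / n
    std_p = math.sqrt(sum((a - mean_p) ** 2 for a in p) / n)
    std_q = math.sqrt(sum((b - mean_q) ** 2 for b in q) / n)
    if std_p == 0 or std_q == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (std_p * std_q)


# Illustrative vectors only
same = correlation([1, 2, 3], [1, 2, 3])
reverse = correlation([1, 2, 3], [3, 2, 1])
print(round(same, 6), round(reverse, 6))
```

Identical vectors give a value of +1 up to rounding and mirrored ones give -1, matching the claim that structurally equivalent subgroups have correlation +1.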
22
Observing Community Transitions & Triggers
• Community persists, grows, evolves, disappears, matures, merges, splits
  - Observing vertices and edges
  - Comparing community instances across time windows
• Triggers
  - Community leadership change
    - Degree of vertex and vertex betweenness -> centrality
    - Low edge clustering but high edge betweenness implies a higher probability that the node acts as a bridge
  - External influences, e.g. public campaign -> immediate effect on community
  - Global properties: # vertices & edges, average shortest path, diameter of the graph, modularity of the graph, etc.
23
Data Set
• Online student community
  - Interactions: guest book entries
  - Edge: bilateral message exchange
  - 1,000 members + 250,000 guest book entries over 18 months (June 2004 - November 2005)
  - Time windows: 14 days (not chosen automatically but by experiments to ensure low standard deviation)
  - Similarity thresholds: 0.5 and 6
  - 1025 similar community instances, 4 communities after clustering
24
Rectangle = community instance; height = # members
Edge between similar communities
Different colors = different communities after clustering
25
Graph of similar community instances without clustering
First break in community evolution detected after 27 clustering iterations
26
2nd change in evolution -3 communities
3rd change in evolution -4 communities
27
Dendrogram - result of clustering
Best dendrogram cut based on modularity measure
28
List of subgroups for the user to choose from
29
Visualizing Community Evolution
• Student community: integration of international students who stay for 1 or 2 semesters
• High fluctuations at the end of a semester or the beginning of a new one
• Structural changes correspond to the beginning or end of a semester (Christmas '04, Summer '05, Winter '05), as expected
30
Conclusion & Critique
• Temporal evolution of online communities
• CoDyM - Community Dynamics Miner
  - Evolution of online communities
  - Triggers for structural changes
• Useful tool for community providers
  - Foster intra-organizational knowledge sharing
• Need for appropriate similarity measures that scale better with changing activity and density
• Quantitatively evaluating a community detection algorithm is difficult: it requires manual analysis. Need for better measures
• Automatic identification of time window size & thresholds
31
References
• Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69, 026113.
• Moody, J., McFarland, D. and Bender-deMoll, S. (2005). Dynamic Network Visualization. American Journal of Sociology, 110(4), 1206-1241.
• Gloor, P. A. and Zhao, Y. (2004). TeCFlow - A Temporal Communication Flow Visualizer for Social Networks Analysis. In: CSCW'04 Workshop on Social Networks, ACM.
32
Questions ?