On the Dynamics of Topic-Based Communities
in Online Knowledge-Sharing Networks
Anna Guimaraes, Ana Paula Couto da Silva, Jussara Almeida
Department of Computer Science - UFMG (Brazil)
September 21, 2015
Introduction
• Online Knowledge-Sharing Networks
– Wikis, Q&A sites, discussion forums
– User-created and maintained discussions
– Wealth of knowledge
• Prior research focuses on knowledge extraction by:
– Detecting quality content [Agichtein et al., 2008]
– Ranking questions and answers [Dalip et al., 2013]
– Identifying expert users [Ravi et al., 2014, Wang et al., 2013]
Introduction
• More than repositories for knowledge!
– Community structure surrounding discussions
– Topics and communities subject to temporal changes
– Multiple topics, multiple communities
• This study:
– Community approach to knowledge-sharing networks
– Characterization and modeling of community evolution
Case Study: Stack Overflow
[Figure: Stack Overflow example with the callout label "Tags"]
Topic-Based Communities in Stack Overflow
• Communities centered around topics
– Topics are explicitly defined
– Independent from social interaction graph
• Non-exclusive membership to multiple communities
Stack Overflow Dataset
• User activity
– User ID, Tag ID, Time stamp
• Data covering a six-year period
– 2008–2014
Tags    Posts           Users
400     19.8 million    1.7 million
Topic-Based Communities in Stack Overflow
• Temporal analyses of community activity in terms of:
– How user behavior affects community sustainability
– How users relate to communities in the long run
– How users divide their attention across different communities
– How communities affect one another
Communities in Stack Overflow: Findings
• Significant revisiting behavior
– Users continue to contribute to the same community
– Revisitors account for a growing share of a community's activity over time
Mean Fraction of Revisits
              1st month   6th month   12th month
Revisitors    0.20        0.44        0.50
Revisits      0.27        0.46        0.50
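The slide does not define how the two rows are computed. The sketch below shows one plausible reading, and its definitions are assumptions: a user counts as a revisitor in a month if they already posted in the same community in an earlier month, and a revisit is any post made by such a user. The record layout `(user_id, tag_id, month)` mirrors the dataset fields; the function name and the toy data are hypothetical.

```python
from collections import defaultdict

def revisit_fractions(records):
    """records: iterable of (user_id, tag_id, month) tuples, one per post.

    Returns {(tag_id, month): (revisitor_fraction, revisit_fraction)}.
    Assumption: a user is a revisitor in a month if they posted in the
    same community (tag) in any earlier month.
    """
    first_month = {}  # (user, tag) -> first month the user posted there
    posts = defaultdict(lambda: defaultdict(int))  # (tag, month) -> user -> posts
    for user, tag, month in records:
        key = (user, tag)
        first_month[key] = min(month, first_month.get(key, month))
        posts[(tag, month)][user] += 1

    result = {}
    for (tag, month), per_user in posts.items():
        returning = [u for u in per_user if first_month[(u, tag)] < month]
        revisitor_frac = len(returning) / len(per_user)
        revisit_frac = sum(per_user[u] for u in returning) / sum(per_user.values())
        result[(tag, month)] = (revisitor_frac, revisit_frac)
    return result

# Toy data: u1 returns in month 2 and posts twice, u3 is new in month 2.
toy = [
    ("u1", "python", 1), ("u2", "python", 1),
    ("u1", "python", 2), ("u1", "python", 2), ("u3", "python", 2),
]
fractions = revisit_fractions(toy)
```

Averaging these per-community values over all communities for a given month would give one number per table cell, again under the assumed definitions.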
Communities in Stack Overflow: Findings
• Participation in multiple communities
– 32% of users participate in up to 3 communities
– Average user participates in 17 communities
– Decaying pattern of activity over time
[Figure: two panels over months 1 to 12. Panel 1: number of communities (y-axis 5 to 30). Panel 2: number of posts (y-axis 0 to 80).]
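The participation figures above can be derived from the same activity records. This is a minimal sketch, assuming membership simply means "posted at least once under the tag"; the helper names and toy records are hypothetical and not taken from the study.

```python
from collections import defaultdict

def communities_per_user(records):
    """Number of distinct communities (tags) each user posted in."""
    tags = defaultdict(set)
    for user, tag, _month in records:
        tags[user].add(tag)
    return {user: len(user_tags) for user, user_tags in tags.items()}

def share_with_at_most(counts, k):
    """Fraction of users active in at most k communities."""
    return sum(1 for c in counts.values() if c <= k) / len(counts)

# Toy records: (user_id, tag_id, month).
toy = [
    ("u1", "python", 1), ("u1", "java", 1), ("u1", "css", 2), ("u1", "html", 3),
    ("u2", "python", 1),
    ("u3", "java", 2), ("u3", "css", 2),
]
counts = communities_per_user(toy)
share_up_to_3 = share_with_at_most(counts, 3)
mean_communities = sum(counts.values()) / len(counts)
```

Restricting `records` to a single month before calling `communities_per_user` would give the per-month counts needed for a decay curve.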
Communities in Stack Overflow: Findings
• Migrating behavior
– Users traverse different communities over time
– Shared member base across communities
[Figure: Ruby on Rails 3 → Ruby on Rails 4. Number of members per month, Feb 2013 to Aug 2014 (y-axis 0 to 900); series: Rails 3 Members, New Members]
[Figure: MySQL → PHP. Number of members per month, Feb 2013 to Aug 2014 (y-axis 0 to 6000); series: MySQL, New Members]
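A split like the one plotted in the migration figures could be computed as sketched below. The definition is an assumption, since the slides do not spell it out: a user active in the target community in a given month counts as coming from the source community if they had posted there in that month or earlier. The function name, the numeric month index, and the toy tags are hypothetical.

```python
from collections import defaultdict

def migration_counts(records, source, target):
    """Split the target community's monthly active users by origin.

    records: iterable of (user_id, tag_id, month) with integer months.
    Returns {month: (from_source, others)}.
    """
    first_in_source = {}               # user -> first month seen in source
    active_in_target = defaultdict(set)  # month -> users active in target
    for user, tag, month in records:
        if tag == source:
            first_in_source[user] = min(month, first_in_source.get(user, month))
        elif tag == target:
            active_in_target[month].add(user)

    counts = {}
    for month, users in sorted(active_in_target.items()):
        # Users never seen in source get a default later than this month.
        migrated = sum(
            1 for u in users if first_in_source.get(u, month + 1) <= month
        )
        counts[month] = (migrated, len(users) - migrated)
    return counts

toy = [
    ("u1", "rails3", 1), ("u2", "rails3", 1),
    ("u1", "rails4", 2), ("u3", "rails4", 2),
    ("u2", "rails4", 3), ("u3", "rails4", 3),
]
counts = migration_counts(toy, source="rails3", target="rails4")
```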
Communities in Stack Overflow: Findings
• Key aspects dictating community evolution
– Intra-community aspects
  – User revisits
  – Continued activity
– Inter-community aspects
  – Shared member base
  – User migration
How can we then describe community evolution?
CERIS Model
• CERIS
– Community Evolution model with
Revisits and Inter-community effectS
• Goal: describe community activity (number of posts) over time
• Incorporates revisits and community relationships
CERIS Model
• CERIS extends state-of-the-art models
– Phoenix-R evolution model with revisits [Figueiredo et al., 2014]
– Competition model [Beutel et al., 2012]
• Epidemiology approach to network dynamics
– Objects in the network are modeled as infections
CERIS Model
• Users are initially exposed to different communities
• Users become infected by participating in a community
• Users can recover by ceasing activity in a community
• Or they can be infected by additional communities
• Revisits to the same community are captured by hidden states
[Figure: CERIS state diagram for two communities. States: S, I1, I2, I1,2, V1, V2, V1,2. Transition labels: β1, β2, εβ1, εβ2, γ1, γ2, ω1, ω2, ω1,2]
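The slides do not give the CERIS equations, so the following is only an illustrative sketch of the kind of two-community epidemic dynamics the bullets describe, not the authors' model. Every transition is an assumption read off the diagram labels: S moves to I1 or I2 at rates β1 and β2, users already in one community join the other at the scaled rates εβ2 and εβ1, and γ1 and γ2 are recovery rates. The hidden V states and the ω rates are left out, and the discrete-time update and all names are hypothetical.

```python
def simulate(steps, n_users, beta1, beta2, gamma1, gamma2, eps,
             i1_0=1.0, i2_0=1.0):
    """Discrete-time sketch of a two-community SIS-style process.

    Compartments: s (inactive), i1, i2 (active in one community),
    i12 (active in both). Returns one (s, i1, i2, i12) tuple per step.
    """
    s, i1, i2, i12 = n_users - i1_0 - i2_0, i1_0, i2_0, 0.0
    history = []
    for _ in range(steps):
        p1 = (i1 + i12) / n_users  # share of users active in community 1
        p2 = (i2 + i12) / n_users  # share of users active in community 2
        s_to_i1 = beta1 * s * p1
        s_to_i2 = beta2 * s * p2
        i1_to_i12 = eps * beta2 * i1 * p2   # joins community 2 as well
        i2_to_i12 = eps * beta1 * i2 * p1   # joins community 1 as well
        i1_to_s = gamma1 * i1
        i2_to_s = gamma2 * i2
        i12_to_i2 = gamma1 * i12            # leaves community 1 only
        i12_to_i1 = gamma2 * i12            # leaves community 2 only
        s += i1_to_s + i2_to_s - s_to_i1 - s_to_i2
        i1 += s_to_i1 + i12_to_i1 - i1_to_i12 - i1_to_s
        i2 += s_to_i2 + i12_to_i2 - i2_to_i12 - i2_to_s
        i12 += i1_to_i12 + i2_to_i12 - i12_to_i1 - i12_to_i2
        history.append((s, i1, i2, i12))
    return history

history = simulate(steps=50, n_users=1000.0, beta1=0.3, beta2=0.2,
                   gamma1=0.05, gamma2=0.05, eps=0.5)
```

In this sketch every outflow from one compartment is an inflow to another, so the total number of users stays constant. A value of `eps` below 1 makes joining a second community harder than joining the first, and a value above 1 makes it easier.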
CERIS Model
[Figure: full CERIS diagram. Labels: states S, I1, I2, I1,2, V1, V2, V1,2; rates β1, β2, εβ1, εβ2, γ1, γ2, ω1, ω2, ω1,2; shocks s1 ... sn; sum nodes (+); output v1]
CERIS Model
• Analyzes the time series for the number of posts in the
communities simultaneously
• The contagion process is triggered by “shocks”
– Wavelet-based method identifies activity peaks as shock candidates
– e.g., when a new related community becomes active
• Model fitting with the Levenberg-Marquardt algorithm and
Minimum Description Length
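To make the fitting step concrete, here is a minimal pure-Python sketch of Levenberg-Marquardt on a stand-in problem. It fits a two-parameter exponential decay, not the CERIS model itself, whose parameterization is not given on the slides. The curve `a * exp(-b * t)`, the starting point, and the damping schedule are all assumptions chosen for illustration, and Minimum Description Length model selection is not covered.

```python
import math

def model(t, a, b):
    return a * math.exp(-b * t)

def sse(ts, ys, a, b):
    """Sum of squared errors between observations and the model."""
    return sum((y - model(t, a, b)) ** 2 for t, y in zip(ts, ys))

def fit_lm(ts, ys, a, b, lam=1e-3, iterations=100):
    """Levenberg-Marquardt for the stand-in curve y = a * exp(-b * t)."""
    err = sse(ts, ys, a, b)
    for _ in range(iterations):
        # Accumulate J^T J (2x2) and J^T r (2-vector) at the current point.
        jaa = jab = jbb = ga = gb = 0.0
        for t, y in zip(ts, ys):
            e = math.exp(-b * t)
            da, db = e, -a * t * e      # partial derivatives of the model
            r = y - a * e               # residual
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Damped normal equations, solved in closed form (Cramer's rule).
        m11, m22 = jaa * (1 + lam), jbb * (1 + lam)
        det = m11 * m22 - jab * jab
        if det == 0:
            break
        step_a = (ga * m22 - gb * jab) / det
        step_b = (gb * m11 - ga * jab) / det
        new_err = sse(ts, ys, a + step_a, b + step_b)
        if new_err < err:
            # Accept the step and trust the local quadratic model more.
            a, b, err = a + step_a, b + step_b, new_err
            lam /= 10
        else:
            # Reject the step and move toward gradient descent.
            lam *= 10
    return a, b

ts = list(range(10))
ys = [5.0 * math.exp(-0.3 * t) for t in ts]   # noise-free synthetic series
a_hat, b_hat = fit_lm(ts, ys, a=4.0, b=0.2)
```

The damping factor `lam` is what separates this from plain Gauss-Newton: it shrinks after a successful step and grows after a failed one.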
CERIS Model Results
[Figure: HTML and CSS. Time series for 2009 to 2014 (y-axis 0 to 70000); series: css, html, model]
[Figure: iOS versions. Time series with Jan and Jul ticks around 2012 to 2014 (y-axis 0 to 400); series: ios7, ios6, ios5, model]
CERIS Model Results
• Model results:
– Reasonably accurate fittings
– Captures different patterns of activity
– Captures concurrent evolution of related communities
RMSE
HTML and CSS    iOS versions    All (mean, daily)
3046.895        13.612          21.131
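For reference, the root mean squared error between an observed series and a fitted one can be computed as in the sketch below. The function name and the sample values are illustrative only and are not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equally long series."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

perfect = rmse([1, 2, 3], [1, 2, 3])
example = rmse([0, 0], [3, 4])
```

RMSE is expressed in the units of the series itself, so values are not directly comparable across communities whose activity levels differ by orders of magnitude.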
CERIS Model Results
• Model outputs used to quantify the relationship between
communities
• Flow of users between communities:
$\mathrm{flow}_{C_1,C_2}(t) = \varepsilon\,\beta_2(t)$
$\mathrm{flow}_{C_2,C_1}(t) = \varepsilon\,\beta_1(t)$
CERIS Model Results
[Figure: two heatmaps over pairs of communities. Panel "Top 100": communities 1 to 100 on both axes, color scale 0.0 to 0.9. Panel "Top 15": java, javascript, c#, php, android, jquery, python, html, c++, ios, mysql, css, asp.net, objective-c, .net on both axes, color scale 0.0 to 0.8]
Conclusions
• Knowledge-sharing networks as a community environment
– Topic-based communities defined by users interacting with topics
of their interest
• Investigation of topic-based communities in Stack Overflow
– User activity in terms of communities they belong to
– Impact of related communities
• New model to describe community evolution
– Incorporates key factors behind community activity
– Good portrayal of the co-evolution of multiple communities
References
Agichtein, E., Castillo, C., Donato, D., Gionis, A., and Mishne, G. (2008). Finding High-Quality Content in Social Media. In Proc. WSDM.
Beutel, A., Prakash, B. A., Rosenfeld, R., and Faloutsos, C. (2012). Interacting Viruses in Networks: Can Both Survive? In Proc. ACM SIGKDD.
Dalip, D. H., Goncalves, M. A., Cristo, M., and Calado, P. (2013). Exploiting User Feedback to Learn to Rank Answers in Q&A Forums: A Case Study with Stack Overflow. In Proc. ACM SIGIR.
Figueiredo, F., Almeida, J. M., Matsubara, Y., Ribeiro, B., and Faloutsos, C. (2014). Revisit Behavior in Social Media: The Phoenix-R Model and Discoveries. In Proc. PKDD.
Hansen, M. H. and Yu, B. (2001). Model Selection and the Principle of Minimum Description Length. Journal of the American Statistical Association, 96(454).
More, J. J. (1978). The Levenberg-Marquardt Algorithm: Implementation and Theory. In Numerical Analysis, pages 105–116. Springer.
Ravi, S., Pang, B., Rastogi, V., and Kumar, R. (2014). Great Question! Question Quality in Community Q&A. In Proc. ICWSM.
Wang, X., Butler, B. S., and Ren, Y. (2013). The Impact of Membership Overlap on Growth: An Ecological Competition View of Online Groups. Organization Science, 24(2):414–431.