Dynamics of Conversations

ACM SIGKDD ’10

By Ravi Kumar, Mohammad Mahdian, & Mary McGlohon

Presented by Annie T. Chen on March 29, 2011.

Overview

RQ: What is the structure of online conversations?

Method

Proposed a simple mathematical model for the structure of conversations

Extended it to account for factors such as recency and author identity that may affect conversations

Compared the predictions of these models against the empirical data for three datasets: Usenet groups, Yahoo! Groups, and Twitter

Properties of Conversations

Size and depth of thread

Depth: length of the maximum path from the root to a leaf in a thread

Size is roughly quadratic in depth

Degree distribution p

Close to a power law: $p(k) \propto k^{-\beta}$ for some exponent $\beta > 2$
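As a minimal sketch of how these three properties could be measured on a single thread, the snippet below assumes a thread is stored as a list of parent indices (a representation chosen here for illustration, not taken from the paper). The function name `thread_stats` and the toy thread are hypothetical.

```python
from collections import Counter

def thread_stats(parent):
    """Return (size, depth, p) for one thread.

    parent[i] is the index of the message that message i replies to;
    the root has parent None. Assumes every parent index is smaller
    than its child's index (messages are listed in arrival order).
    """
    n = len(parent)
    depth = [0] * n      # number of edges from the root to each message
    children = [0] * n   # number of direct replies to each message
    for i, p in enumerate(parent):
        if p is not None:
            depth[i] = depth[p] + 1
            children[p] += 1
    counts = Counter(children)
    # p[k] = fraction of messages with exactly k direct replies
    p_k = {k: c / n for k, c in sorted(counts.items())}
    return n, max(depth), p_k

# Toy thread: 0 is the root; 1 and 2 reply to 0; 3 replies to 1; 4 replies to 3.
size, depth, p_k = thread_stats([None, 0, 0, 1, 3])
print(size, depth, p_k)
```

On real data, p would be pooled over all threads of a group before checking how close it is to a power law.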

Branching Process Model (BP-Model) - 1

The Galton-Watson branching process is a classic model for generating a random tree.

At each step i of the process, each node generates a certain number of children according to the distribution p

p(k): fraction of nodes with k children in the data

$Z_i$: number of nodes at the ith level of the thread

Let $\mu = E[p]$, the mean of the distribution p

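Branching Process Model (BP-Model) - 2

Write $\mu$ for the mean of p and $Z = \sum_{i \ge 0} Z_i$ for the total thread size. By the definition of a branching process, $E[Z_i] = \mu^i$, so for $\mu < 1$ it can be shown that: $E[Z] = (1-\mu)^{-1}$

Since $\mu < 1$ for all datasets, the branching process dies out.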
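The expected-size formula can be checked numerically. The sketch below simulates a Galton-Watson tree level by level and compares the average thread size with 1/(1 - mu), where mu is the mean of p. The offspring distribution `p` is a made-up example, not one fitted to any of the datasets, and `simulate_thread_size` is a hypothetical helper name.

```python
import random

def simulate_thread_size(p, rng, max_nodes=100_000):
    """Total number of nodes in one Galton-Watson tree.

    p maps k -> probability that a node has k children.
    max_nodes is a safety cap (an assumption of this sketch).
    """
    ks = list(p.keys())
    weights = list(p.values())
    size = 1       # the root message
    frontier = 1   # number of nodes at the current level
    while frontier > 0 and size < max_nodes:
        # Each node at this level independently draws its number of children.
        next_frontier = sum(rng.choices(ks, weights=weights, k=frontier))
        size += next_frontier
        frontier = next_frontier
    return size

# Hypothetical offspring distribution with mean 0.6 (< 1, so threads die out).
p = {0: 0.6, 1: 0.25, 2: 0.1, 3: 0.05}
mu = sum(k * q for k, q in p.items())

rng = random.Random(0)
sizes = [simulate_thread_size(p, rng) for _ in range(20000)]
mean_size = sum(sizes) / len(sizes)
print(mean_size, 1 / (1 - mu))
```

With mu = 0.6 the simulated mean should land close to 2.5, up to sampling noise.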

[Figure: Empirical vs. Simulated]

Branching Process Model (BP-Model) - 3

Problems with the BP-Model

Model is not generative (degree distributions are stipulated)

Model does not capture the depth distributions that are observed in reality

Number of children is determined by a single distribution

Timestamps are left out

T-Model

Concept: new messages receive more attention than old ones

Probability of the decision to add a child to v is proportional to some function $h(\deg_v, r_v)$ of the degree and recency of v

Probability of death is proportional to a constant

$h(\deg_v, r_v) = \alpha \deg_v + \tau^{r_v}$ for constants $\alpha \ge 0$ and $\tau \in (0,1)$

Thus, both degree and recency play a role in generating different types of threads
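A rough sketch of how a thread could be grown under this rule follows. Several details are assumptions made for illustration: h is written as alpha * deg + tau ** r, the death constant is called `delta`, the degree of a message is its number of replies so far, and its recency is the number of time steps since it arrived. The function name `simulate_t_model` and the parameter values are hypothetical.

```python
import random

def simulate_t_model(alpha, tau, delta, rng, max_nodes=10_000):
    """Grow one thread; return its list of parent indices.

    At each step, either some existing message v receives a reply with
    probability proportional to alpha * deg[v] + tau ** r_v, or the
    thread stops with probability proportional to delta (> 0).
    """
    parent = [None]  # message 0 is the root
    deg = [0]        # number of replies each message has received
    while len(parent) < max_nodes:
        t = len(parent)  # current time; message v arrived at time v
        weights = [alpha * deg[v] + tau ** (t - v) for v in range(t)]
        weights.append(delta)  # last option: the thread dies
        choice = rng.choices(range(t + 1), weights=weights, k=1)[0]
        if choice == t:
            break
        parent.append(choice)
        deg.append(0)
        deg[choice] += 1
    return parent

rng = random.Random(1)
thread = simulate_t_model(alpha=0.5, tau=0.9, delta=1.0, rng=rng, max_nodes=2000)
print(len(thread), thread[:10])
```

Raising `alpha` favors replies to already popular messages, while `tau` controls how quickly the attention given to a message decays with age.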

TI-Model - 1

The TI-Model was developed to model author identity.

Concept: authors tend to respond to responses to their own earlier messages.

Based on the Pólya urn model

Original Pólya urn problem:

Initially, an urn has x balls of color 1 and y balls of color 2. At each time t, one ball is drawn out and returned to the urn with another ball of the same color.

“Rich get richer” process
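The urn process described above can be sketched in a few lines. The function name `polya_urn` and the starting counts are illustrative choices.

```python
import random

def polya_urn(x, y, steps, rng):
    """Simulate a two-color Polya urn.

    Start with x balls of color 1 and y balls of color 2. At each step,
    draw a ball uniformly at random and return it together with one
    extra ball of the same color.
    """
    counts = [x, y]
    for _ in range(steps):
        # Probability of drawing color 1 equals its current share of the urn.
        share = counts[0] / (counts[0] + counts[1])
        color = 0 if rng.random() < share else 1
        counts[color] += 1
    return counts

rng = random.Random(0)
final = polya_urn(1, 1, 1000, rng)
print(final)
```

Because each draw raises the share of the drawn color, an early lead tends to persist, which is the "rich get richer" behavior the slide refers to.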

TI-Model - 2

New message v arrives with u = parent(v)

The author of v is either:

an author on path(parent(u)) (the “identity copying” effect), or

a random author

[Figure: Empirical vs. Simulated]

Examples

Usenet, Yahoo! Groups, Twitter

Usenet

[Figure: Empirical vs. Simulated]

Usenet

Group                                    α
it.discussioni.leggende.metropolitane    10
it.politica.polo                         10
rec.games.chess.politics                 3
bln.politik.rassismus                    2
sk.politics                              1.5

High α (the degree weight in the T-Model): higher degree of preferential attachment. The top groups tended to be politically related.

Group                     τ
fa.linux.kernel           0.98
uk.politics.electoral     0.98
rec.arts.drwho            0.97
uk.politics.crime         0.97
chile.soc.politica        0.96

High τ (the recency base in the T-Model): high recency effect. Lower-traffic groups had a higher recency effect.

Usenet: identity copying rates

High parameter value (low copying rate): new authors tend to join in often

Low parameter value (high copying rate): authors of posts tend to have already authored an earlier post

High (low copying rate):

or.politics
alt.fan.cecil-adams
alt.marketplace.online.ebay
pl.misc.kolej
rec.arts.sf.written

Low (high copying rate):

linux.debian.bugs.dist
microsoft.public.excel.misc
microsoft.public.excel.programming
nctu.talk
tw.bbs.campus.nctu

Yahoo! Groups

Groups with “bushy” threads and high recency effects

“Bushy” threads (high α):

Group                         α
indianmedical                 10
IllinoisSpeakers
DetectiveRichardHead
Bodybuildersaverageguys
villageDesign

High recency effects (high τ):

Group                         τ
NorthCarolinaSpeakers         0.99
stbaseliosorthodoxchurch
LostnFoundEvents
PatriceVinci
molecular-biology-notebook

Twitter

Groups with “bushy” threads and high recency effects

“Bushy” threads (high α):

Group                   α
#mustsee                10
#twitterinreallife
#readingrainbow
#whathappenswhen
#vogueevolution

High recency effects (high τ):

Group                   τ
#yankees                0.99
#warriors
#tiff09
#iranelectioni
#followfriday

Conclusion

Employed various mathematical models to simulate patterns in online conversations

Strengths:

Incorporated time and author identity in the models

Were able to predict patterns that were found in actual datasets

Weaknesses / further directions:

Explanatory power: how well do these models explain differences between conversational environments and/or networks?

Could incorporate other elements of conversation:

• Topics

• Structural/semantic components of messages

• Actor characteristics/roles

How well do these models emulate different types of communication tools, e.g. Twitter?

References

Aldous, D. (2003). Lecture 2: Branching Processes. Accessed March 29, 2011 at http://www.stat.berkeley.edu/~aldous/Networks/lec2.pdf.

Kumar, R., Mahdian, M., & McGlohon, M. (2010). Dynamics of conversations. ACM SIGKDD 2010.

Zhu, T. (2009). Nonlinear Polya Urn Models and Self-Organizing Processes. Accessed March 29, 2011 at http://www.math.upenn.edu/grad/dissertations/tongzhudissertation.pdf.
