When the Wikipedians talk: network and tree structure of Wikipedia discussion pages



DESCRIPTION

Talk pages play a fundamental role in Wikipedia as the place for discussion and communication. In this work we use the comments on these pages to extract and study three networks, corresponding to different kinds of interactions. We find evidence of a specific assortativity profile which differentiates article discussions from personal conversations. An analysis of the tree structure of the article talk pages allows us to capture patterns of interaction, and reveals structural differences among the discussions about articles from different semantic areas.

TRANSCRIPT

Page 1: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

When the Wikipedians talk: Network and tree structure of Wikipedia discussion pages

David Laniado∗†, Riccardo Tasso†, Yana Volkovich∗ & Andreas Kaltenbrunner∗

∗Barcelona Media - Innovation Center

Information, Technology & Society Research Group

†Politecnico di Milano

Dipartimento di Elettronica e Informazione

ICWSM 2011, Barcelona, July 20th

Laniado, Tasso, Volkovich & Kaltenbrunner (Barcelona Media, Politecnico di Milano). When the Wikipedians talk. 20-07-2011. 1 / 29

Page 2: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Outline

1 Introduction

2 Data extraction

3 Social interaction networks: basic network parameters; directed assortativity

4 Discussion trees: statistics about discussions; differences between macro-categories

5 Conclusions

Page 4: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Introduction: Wikipedia's visible side

Page 5: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Introduction: Article talk pages

Page 6: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Introduction: The hidden side of Wikipedia

Since 2007, the growth of Wikipedia has slowed down notably [B. Suh et al.; “The singularity is not near: slowing growth of Wikipedia”; in Proc. of WikiSym ’09]

The hidden side of Wikipedia is gaining importance:
article talk pages → explicit coordination and discussion
user talk pages → personal communication (a sort of public inbox)

Article “Barack Obama”: discussion split into 72 pages; 22 000 comments in the article talk pages (compared with 17 500 edits to the article itself)

Page 7: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Introduction

Goal: detect patterns of interaction in the communication among Wikipedians

Unlike in other online discussion spaces, in Wikipedia the users discuss to reach consensus and to coordinate their activity with each other

Contribution: extract the structure of the discussions; analyze the interaction networks; study the discussion trees

Page 9: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Data extraction: Talk page “Presidency of Barack Obama”

Page 10: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Data extraction: Discussion tree for article “Presidency of Barack Obama”

Node colours: red → root (the article); blue → structural nodes; green → anonymous comments; grey → comments by registered users
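The slides do not describe how the parser reconstructs such a tree. As a rough illustration only, the sketch below assumes the usual talk-page convention that replies are indented with leading colons and that sections start with "== heading ==" lines; it treats every non-empty line as one comment and ignores signatures, unindented replies and the other irregularities a real parser has to handle. The names Node and build_tree are hypothetical.

```python
import re
from dataclasses import dataclass, field

HEADING = re.compile(r"^==+\s*(.*?)\s*==+\s*$")


@dataclass
class Node:
    kind: str                      # "root", "section" or "comment"
    text: str
    depth: int = 0                 # number of leading colons (comments only)
    children: list = field(default_factory=list)


def build_tree(article_title, wikitext):
    """Rebuild a simplified discussion tree from talk-page wikitext.

    Assumes one comment per line and colon indentation for replies.
    """
    root = Node("root", article_title)
    section = None
    stack = []  # currently open comments, by increasing indentation
    for line in wikitext.splitlines():
        if not line.strip():
            continue
        heading = HEADING.match(line)
        if heading:
            # Each section heading becomes a structural node under the root.
            section = Node("section", heading.group(1))
            root.children.append(section)
            stack = []
            continue
        if section is None:
            # Comments written before the first heading.
            section = Node("section", "")
            root.children.append(section)
        level = len(line) - len(line.lstrip(":"))
        comment = Node("comment", line.lstrip(":").strip(), depth=level)
        # Close comments indented at least as deeply: they cannot be the parent.
        while stack and stack[-1].depth >= level:
            stack.pop()
        (stack[-1] if stack else section).children.append(comment)
        stack.append(comment)
    return root


talk = """== Lead section ==
The lead is too long. --A
:Agreed, let's trim it. --B
::Not so fast. --A
:I disagree. --C
"""
tree = build_tree("Example", talk)
```

In this toy thread, the comments by B and C both become replies to A's opening comment, and A's second comment becomes a reply to B.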

Page 11: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Dataset: Basic quantities of the data analysed

Dump of the English Wikipedia, dated March 12th, 2010.

#articles                         3 210 039
#edits of article pages         402 851 686
#articles with talk page (ATP)      871 485 (27.1%)
#comments in ATP                  9 421 976
#users who comment on articles      350 958 (2.8%)
#registered users                12 651 636
#user talk pages (UTP)            1 662 818 (13.1%)
#comments in UTP                 13 670 980
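A quick arithmetic check of the percentages in the table. Our reading, which is an assumption, is that the ATP share is relative to all articles while the commenter and UTP shares are relative to registered users; the helper pct is hypothetical.

```python
# Values from the dataset table (English Wikipedia dump of 12 March 2010).
ARTICLES = 3_210_039
ARTICLES_WITH_TALK_PAGE = 871_485
COMMENTS_IN_ATP = 9_421_976
COMMENTING_USERS = 350_958
REGISTERED_USERS = 12_651_636
USER_TALK_PAGES = 1_662_818


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(ARTICLES_WITH_TALK_PAGE, ARTICLES))      # 27.1
print(pct(COMMENTING_USERS, REGISTERED_USERS))     # 2.8
print(pct(USER_TALK_PAGES, REGISTERED_USERS))      # 13.1
# Mean number of comments per article talk page.
print(round(COMMENTS_IN_ATP / ARTICLES_WITH_TALK_PAGE, 1))  # 10.8
```

The last line shows that an article talk page holds roughly 11 comments on average, although the distribution is very skewed.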

Page 13: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Social interaction networks: Network construction

article reply network → direct replies in article discussion pages

[Figure: two article discussions (Article 1 and Article 2). In the first, user B replies to user A; in the second, user B replies to user C. The resulting article reply network links B to A and B to C.]

user reply network → direct replies in user talk pages

wall network → personal messages posted on another user’s talk page

[Figure: a thread on the talk page of user A (“User talk: User A”) with comments by users C, B and A, and the corresponding user talk network and wall network over the nodes A, B and C.]
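A minimal sketch of how the three networks could be derived from parsed comments. The record fields, the page-type labels and the edge direction (from the replying user to the user replied to, and from the writer to the owner of the talk page) are assumptions for illustration; the slides only define which interactions each network captures. Comment and build_networks are hypothetical names, and the reply structure of the user-talk example is one plausible reading of the figure.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Comment(NamedTuple):
    author: str
    page_type: str                 # "article_talk" or "user_talk"
    page_owner: Optional[str]      # owner of a user talk page, else None
    parent_author: Optional[str]   # author replied to, None for a new thread


def build_networks(comments):
    """Weighted directed edges (as Counters) of the reply, talk and wall networks.

    Assumed edge direction: replier -> user replied to, writer -> page owner.
    """
    reply_nw, talk_nw, wall_nw = Counter(), Counter(), Counter()
    for c in comments:
        if c.parent_author is not None and c.parent_author != c.author:
            if c.page_type == "article_talk":
                reply_nw[(c.author, c.parent_author)] += 1
            elif c.page_type == "user_talk":
                talk_nw[(c.author, c.parent_author)] += 1
        # Any message written on someone else's user talk page is a wall edge.
        if c.page_type == "user_talk" and c.page_owner not in (None, c.author):
            wall_nw[(c.author, c.page_owner)] += 1
    return reply_nw, talk_nw, wall_nw


comments = [
    Comment("A", "article_talk", None, None),   # Article 1
    Comment("B", "article_talk", None, "A"),
    Comment("C", "article_talk", None, None),   # Article 2
    Comment("B", "article_talk", None, "C"),
    Comment("C", "user_talk", "A", None),       # User talk: User A (reply structure assumed)
    Comment("B", "user_talk", "A", "C"),
    Comment("A", "user_talk", "A", "B"),
]
reply_nw, talk_nw, wall_nw = build_networks(comments)
```

Keeping edge counts rather than plain sets leaves open whether the networks are later analysed as weighted or unweighted graphs.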

Page 14: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Basic network parameters: Global measures of the Wikipedia discussion and talk networks

variable                       reply-NW    talk-NW    wall-NW
#nodes with edges               204 017    114 258  1 861 702
#nodes with in-degree ≥ 1       121 682    103 147  1 832 168
#nodes with out-degree ≥ 1      182 881     63 334    177 331
#edges                        1 489 734    852 065  4 412 212
mean distance                      4.10       3.86       4.06
maximal distance                     15         11         12
mean in/out-degree                 7.30       7.46       2.37
clustering coeff.                 0.083      0.053      0.035
reciprocity                        0.44       0.45       0.15
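Some of the table's quantities can be recomputed directly from an edge list. The sketch below assumes reciprocity means the fraction of edges whose reverse edge also exists (a common definition; the slides do not state theirs) and treats the graph as unweighted; distances and clustering are omitted. The identity mean in-degree = mean out-degree = #edges / #nodes with edges reproduces the table values. basic_measures is a hypothetical name.

```python
def basic_measures(edges):
    """Size, mean in/out-degree and reciprocity of a directed graph.

    `edges` is an iterable of (source, target) pairs; duplicates and
    self-loops are ignored, so the graph is treated as unweighted.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    sources = {u for u, _ in edge_set}
    targets = {v for _, v in edge_set}
    nodes = sources | targets          # nodes with at least one edge
    reciprocated = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return {
        "nodes": len(nodes),
        "with_in_degree": len(targets),
        "with_out_degree": len(sources),
        "edges": len(edge_set),
        # Every edge adds one to an out-degree and one to an in-degree,
        # so the mean in-degree equals the mean out-degree.
        "mean_degree": len(edge_set) / len(nodes),
        "reciprocity": reciprocated / len(edge_set),
    }


m = basic_measures([("A", "B"), ("B", "A"), ("A", "C")])
# The same identity reproduces the table, e.g. for the reply network:
print(round(1_489_734 / 204_017, 2))  # 7.3
```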

Page 15: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Directed assortativity: Approach

Assortativity: a measure of degree mixing in networks. Do nodes with many connections preferentially interact with one another, or with poorly connected nodes?

Directed assortativity: correlation between the in- and out-degrees of source and target nodes [J. Foster et al.; “Edge direction and the structure of networks”; in PNAS, 2010]
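Following the definition in Foster et al., r(α, β) is the Pearson correlation, computed over all edges, between the α-degree of each edge's source and the β-degree of its target, with α, β ∈ {in, out}. A plain-Python sketch of the four coefficients is given below; the function names are hypothetical, and a coefficient is reported as NaN when one of the two degree sequences has no variance.

```python
from math import sqrt

PAIRS = [("out", "in"), ("in", "out"), ("out", "out"), ("in", "in")]


def degrees(edges):
    """In- and out-degree of every node of a directed edge list."""
    out_deg, in_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    return {"out": out_deg, "in": in_deg}


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


def directed_assortativity(edges):
    """r(alpha, beta) for the four (source, target) degree pairs."""
    edges = list(edges)
    deg = degrees(edges)
    result = {}
    for a, b in PAIRS:
        xs = [deg[a].get(u, 0) for u, _ in edges]   # alpha-degree of sources
        ys = [deg[b].get(v, 0) for _, v in edges]   # beta-degree of targets
        result[(a, b)] = pearson(xs, ys)
    return result


r = directed_assortativity([("a", "b"), ("a", "c"), ("d", "c")])
```

In this toy graph, the source with the larger out-degree also links to the less popular target, so r(out, in) is negative (−0.5).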

Results depend on the network structure. To assess statistical significance, the results are contrasted with an ensemble of N = 100 randomized networks having the same degree sequence

Normalization of the four assortativity measures for a network (Assortativity Significance Profile, ASP)
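The slides give the ensemble size (N = 100) but not the randomization or normalization details. The sketch assumes degree-preserving edge swaps as the null model, Z_i = (r_i − mean of r_i over the ensemble) / (its standard deviation), and the usual significance-profile normalization ASP_i = Z_i / sqrt(Σ_j Z_j²). The names randomize and significance_profile, the swap count and the seed are hypothetical; measure is any function returning a dict of finite values, such as the four r(α, β).

```python
import random
from math import sqrt
from statistics import mean, pstdev


def randomize(edges, n_swaps, rng):
    """Degree-preserving randomization of a directed graph.

    Each step picks edges a->b and c->d and rewires them to a->d and c->b,
    rejecting swaps that would create self-loops or duplicate edges, so
    every node keeps its in- and out-degree.
    """
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def significance_profile(edges, measure, n_random=100, n_swaps=None, seed=0):
    """Z-scores of `measure` against randomized networks, and their
    normalization to unit length (the significance profile)."""
    rng = random.Random(seed)
    edges = list(edges)
    if n_swaps is None:
        n_swaps = 10 * len(edges)
    observed = measure(edges)
    samples = [measure(randomize(edges, n_swaps, rng)) for _ in range(n_random)]
    z = {}
    for key, value in observed.items():
        values = [s[key] for s in samples]
        sd = pstdev(values)
        # Measures with zero spread in the ensemble get Z = 0.
        z[key] = (value - mean(values)) / sd if sd > 0 else 0.0
    norm = sqrt(sum(v * v for v in z.values()))
    asp = {k: (v / norm if norm > 0 else 0.0) for k, v in z.items()}
    return z, asp
```

Normalizing to unit length makes profiles of networks of very different sizes comparable, while the raw Z-scores are still needed for the |Z| < 2 significance check.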

Page 16: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Directed assortativity profiles: Comparison of the directed Assortativity Significance Profile

[Figure: Assortativity Significance Profile (ASP, y-axis from −1 to 1) for the four degree pairs (out,in), (in,out), (out,out) and (in,in), comparing the Slashdot reply network with the Wikipedia reply, talk and wall networks.]

Where an ASP score is not significant (|Z| < 2), it is marked with the corresponding symbol at the bottom of the figure.

Page 18: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Statistics about discussions: Distributions of users and comments by article discussion

[Figure, left: size of Wikipedia discussions; number of articles vs. number of users and number of comments per discussion (log-log). Right: distribution of chain lengths; number of chains vs. chain length (log scale); 1-chains 36.6%, 2-chains 40.7%, n-chains 22.7%; power-law fit y ∼ x^−6.]

85% of articles have ≤ 10 comments; 15 000 articles have ≥ 100 comments

Page 19: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Discussion trees: Size of the discussions

Number of users involved

Number of chains of length ≥ 3, i.e. sequences of consecutive replies between two users

Example chain of length 3: A ← B ← A. Chains are a good indicator of conflictive discussions
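The slides define a chain only by example. The sketch below encodes one possible reading, stated here as an assumption: a chain is a maximal run of consecutive comments along one branch of the tree in which two users alternate (A ← B ← A ← B …), and its length is the number of comments in the run. The tree is given as a parent map; chain_lengths and count_chains are hypothetical names.

```python
from collections import deque


def chain_lengths(parent, author):
    """Length of the alternating two-user reply run that ends at each comment.

    parent: comment id -> parent comment id (None for top-level comments)
    author: comment id -> user name
    """
    children = {c: [] for c in parent}
    roots = []
    for c, p in parent.items():
        (roots if p is None else children[p]).append(c)
    run = {}
    queue = deque(roots)               # parents are visited before children
    while queue:
        c = queue.popleft()
        p = parent[c]
        if p is None or author[p] == author[c]:
            run[c] = 1                 # no reply to another user: a new run
        elif run[p] >= 2 and author[parent[p]] == author[c]:
            run[c] = run[p] + 1        # A <- B <- A: the alternation continues
        else:
            run[c] = 2                 # first reply between this pair of users
        queue.extend(children[c])
    return run, children


def count_chains(parent, author, min_length=3):
    """Number of maximal alternating chains with at least `min_length` comments."""
    run, children = chain_lengths(parent, author)
    return sum(
        1
        for c, r in run.items()
        if r >= min_length and all(run[k] != r + 1 for k in children[c])
    )


# A opens; B and A then alternate twice more; C replies to the opening and A answers C.
parent = {1: None, 2: 1, 3: 2, 4: 3, 5: 1, 6: 5}
author = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C", 6: "A"}
```

With this reading, the example contains two chains of length ≥ 3 (A ← B ← A ← B and A ← C ← A), of which only the first has length ≥ 4.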

[Figure, left: number of chains per article (x: #chains, y: #articles, log-log), with a truncated log-normal fit: µ = −6.5, σ = 3.0, p-value = 0.997. Right: CCDF of #articles vs. #chains, with a power-law fit: α = 2.23, p-value = 0.087, cut-off x_min = 21 (95.75% of the data discarded).]
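The slide does not name its fitting procedure. The reported quantities (α, x_min, the fraction discarded and a goodness-of-fit p-value) are those produced by the Clauset, Shalizi and Newman method, so as an assumption the sketch shows only its maximum-likelihood step for a fixed x_min, using their approximation for discrete data: α ≈ 1 + n / Σ ln(x_i / (x_min − 1/2)), summed over the n observations with x_i ≥ x_min. powerlaw_alpha is a hypothetical name.

```python
from math import log


def powerlaw_alpha(data, xmin):
    """Approximate MLE of a discrete power-law exponent for x >= xmin.

    Returns (alpha, fraction of observations discarded below xmin).
    """
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    alpha = 1 + len(tail) / sum(log(x / (xmin - 0.5)) for x in tail)
    discarded = 1 - len(tail) / len(data)
    return alpha, discarded


alpha, discarded = powerlaw_alpha([1, 2, 3, 21, 21, 30], xmin=21)
```

Choosing x_min (for example by minimising the Kolmogorov-Smirnov distance between data and fit) and obtaining the p-value by bootstrap are separate steps not shown here.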

Page 20: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Discussion trees: Length of the discussions

Maximum depth: sensitive to the presence of a single long thread

h-index of the discussion tree: the maximum number h such that there are at least h comments at depth ≥ h
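A small sketch contrasting the two measures, using the slide's definition of the h-index. It assumes top-level comments have depth 1 (the slides do not fix the origin of depth) and takes the tree as a parent map; comment_depths, max_depth and tree_h_index are hypothetical names.

```python
def comment_depths(parent):
    """Depth of every comment; top-level comments (parent None) have depth 1."""
    depths = {}
    for c in parent:
        k, node = 0, c
        while node is not None:
            k += 1
            node = parent[node]
        depths[c] = k
    return depths


def max_depth(depths):
    return max(depths, default=0)


def tree_h_index(depths):
    """Largest h such that at least h comments lie at depth >= h."""
    ranked = sorted(depths, reverse=True)
    h = 0
    for i, d in enumerate(ranked, start=1):
        # The i-th deepest comment has depth >= i exactly when
        # at least i comments lie at depth >= i.
        if d >= i:
            h = i
        else:
            break
    return h


# A single thread of 10 nested replies.
parent = {1: None}
parent.update({i: i - 1 for i in range(2, 11)})
depths = comment_depths(parent)
```

For this single thread the maximum depth is 10 but the h-index is only 5, which is why the h-index is less sensitive to one isolated deep exchange.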

[Figure: histograms of the number of articles by maximum depth (left) and by h-index of the discussion structure (right), for all discussions and for discussions with more than 100 nodes only.]

Page 21: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Most discussed Wikipedia articles: Top 20 articles ordered by number of chains in the discussion

#   Title                   chains  comments    users      h-index   max. depth  edits
1   Intelligent design      2413    22454 (3)   954 (13)   16 (20)   20 (358)    9179 (53)
2   Gaza War                2358    17961 (6)   607 (47)   19 (2)    27 (28)     11499 (29)
3   Barack Obama            2301    22756 (2)   2360 (2)   18 (6)    21 (245)    17453 (6)
4   Sarah Palin             2182    19634 (4)   1221 (9)   17 (10)   25 (56)     12093 (24)
5   Global warming          2178    19138 (5)   1382 (5)   17 (10)   20 (358)    14074 (15)
6   Main Page               2065    32664 (1)   5969 (1)   15 (34)   22 (169)    4003 (674)
7   Chiropractic            1772    13684 (13)  243 (389)  18 (6)    29 (17)     6190 (204)
8   Race and intelligence   1764    13790 (12)  410 (126)  17 (10)   24 (74)     7615 (100)
9   Anarchism               1589    14385 (9)   496 (76)   20 (1)    28 (22)     12589 (19)
10  British Isles           1556    12044 (16)  576 (56)   17 (10)   23 (113)    4047 (658)
11  CRU¹ hacking incident   1551    11536 (17)  474 (88)   17 (10)   20 (358)    2346 (2364)
12  Jesus                   1397    17916 (7)   1239 (7)   13 (119)  16 (1383)   17081 (7)
13  Circumcision            1356    10469 (21)  436 (113)  17 (10)   26 (42)     7354 (117)
14  Homeopathy              1323    13509 (14)  516 (68)   17 (10)   25 (56)     6902 (151)
15  George W. Bush          1281    15257 (8)   1969 (3)   14 (65)   18 (676)    32314 (1)
16  September 11 attacks    1250    13830 (11)  1244 (6)   16 (20)   26 (42)     11086 (30)
17  Evolution               1165    13404 (15)  942 (16)   13 (119)  23 (113)    9780 (44)
18  Catholic Church         1162    14104 (10)  620 (43)   15 (34)   18 (676)    14082 (14)
19  Cold fusion             1098    8354 (29)   359 (174)  15 (34)   20 (358)    4320 (557)
20  2008 South Ossetia war  1075    10596 (20)  853 (20)   17 (10)   23 (113)    9930 (43)

In parentheses: rank according to the corresponding variable

¹ CRU: Climatic Research Unit

Page 22: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Assigning articles to macro-categories

Approach by [Kittur et al.; “What’s in Wikipedia?: mapping topics and conflict using socially annotated category structure”; in Proc. of CHI ’09]

21 macro-categories from Wikipedia, approximately the direct sub-categories of Category:Main_topic_classifications

assign each category to the closest macro-category

for each article: consider all the categories to which it is directly assigned; combine the macro-categories corresponding to these categories to assign a percentage

improved effectiveness by adding a weight that penalises links in the “wrong direction” in the category graph
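One way to implement "closest macro-category" with a penalty for wrong-direction links is a multi-source shortest-path search over the category graph. The sketch below makes several assumptions not stated on the slide: moving to a sub-category costs 1, moving up to a parent category costs a hypothetical penalty of 2, and ties are broken by macro-category name. assign_macro_categories is a hypothetical name.

```python
import heapq


def assign_macro_categories(subcats, macros, penalty=2.0):
    """Map every reachable category to its closest macro-category.

    subcats: category -> list of its direct sub-categories.
    Going down to a sub-category costs 1; going up to a parent category
    (the "wrong direction") costs `penalty`.
    Ties are broken deterministically by name.
    """
    parents = {}
    for cat, subs in subcats.items():
        for sub in subs:
            parents.setdefault(sub, []).append(cat)
    label = {}
    # Multi-source Dijkstra: all macro-categories start at distance 0.
    heap = [(0.0, m, m) for m in macros]
    heapq.heapify(heap)
    while heap:
        dist, cat, macro = heapq.heappop(heap)
        if cat in label:
            continue               # already reached by a shorter path
        label[cat] = macro
        for sub in subcats.get(cat, []):
            if sub not in label:
                heapq.heappush(heap, (dist + 1.0, sub, macro))
        for par in parents.get(cat, []):
            if par not in label:
                heapq.heappush(heap, (dist + penalty, par, macro))
    return label


subcats = {
    "Geography": ["Geography of Spain"],
    "Geography of Spain": ["Geography of Catalonia"],
    "History": ["History of Spain"],
    "History of Spain": ["Phoenician colonies in Spain"],
    "Orphan topics": ["Geography of Spain"],   # reachable only by going up
}
label = assign_macro_categories(subcats, ["Geography", "History"])
```

"Orphan topics" is reached only through an upward link (distance 1 + 2 = 3), so it is still labelled Geography, just at a higher cost than a downward path would have.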

Page 23: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Assigning articles to macro-categories: Example

Example: article “Barcelona” belongs to 3 categories:

Geography of Catalonia → Geography
Phoenician colonies in Spain → History
10s BC establishments → History

Barcelona → History (66.7%), Geography (33.3%)
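The percentages in the Barcelona example are consistent with each directly assigned category contributing one equal vote to its macro-category. The sketch below assumes exactly that (the slides do not say whether the path penalty also affects these weights); macro_shares is a hypothetical name.

```python
from collections import Counter


def macro_shares(article_categories, category_to_macro):
    """Percentage of each macro-category for one article: every category
    the article is directly assigned to contributes one equal vote."""
    votes = Counter(category_to_macro[c] for c in article_categories)
    total = sum(votes.values())
    return {m: 100 * n / total for m, n in votes.items()}


barcelona = ["Geography of Catalonia", "Phoenician colonies in Spain", "10s BC establishments"]
mapping = {
    "Geography of Catalonia": "Geography",
    "Phoenician colonies in Spain": "History",
    "10s BC establishments": "History",
}
shares = macro_shares(barcelona, mapping)
print({m: round(s, 1) for m, s in shares.items()})  # {'Geography': 33.3, 'History': 66.7}
```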

Page 24: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Article discussions categorisation

[Figure: % of articles with discussions in different categories, shown for all articles with discussion pages, for the top 1% and for the top 0.1% of articles with most comments. Macro-categories: Geography & places, History & events, Culture, People, Agriculture, Sports, Society, Politics, Technology & applied sciences, Education, Law, Environment, Business, Science, Language, Mathematics, Belief, Health, Philosophy, Computing, Arts. Bar labels: 23.3%, 23.4%, 12.1%, 8.1%, 3.4%, 2.6%, 4.8%, 2.4%, 3.2%, 3.1%, 1.0%, 1.2%, 1.6%, 1.9%, 2.0%, 0.7%, 2.0%, 1.2%, 1.0%, 0.5%, 0.6%.]

Page 25: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Article discussions by semantic area: Differences between categories

[Figure, left: # of chains vs. # of edits for each macro-category (x: # chains in discussion, 0 to 3; y: # edits in article, 90 to 180). Right: # of users vs. h-index for each macro-category (x: # users in discussion, 3 to 9; y: h-index of discussion, 1.6 to 2.3). Macro-categories: Technology and applied sciences, Geography and places, History and events, Culture, People, Agriculture, Sports, Society, Politics, Education, Law, Environment, Business, Science, Language, Mathematics, Belief, Health, Philosophy, Computing, Arts.]

Gray areas → 95% confidence intervals obtained with a bootstrap test
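The slide reports bootstrap confidence intervals but not the procedure. As an assumption, the sketch computes a percentile bootstrap interval for the mean of a per-article quantity (for example the number of chains) within one macro-category; bootstrap_ci and its defaults are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Means of n_boot resamples drawn with replacement, sorted.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[round((1 - level) / 2 * n_boot)]
    hi = means[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Hypothetical per-article chain counts for one macro-category.
chains_in_category = [0, 0, 1, 0, 3, 0, 2, 0, 0, 7]
low, high = bootstrap_ci(chains_in_category)
```

Resampling articles rather than assuming normality suits heavily skewed quantities such as chain counts, whose distribution is far from Gaussian.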

Page 27: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

Conclusions

Considerable effort went into data extraction; the parser will be released as free software

Patterns in the interaction among Wikipedians: a strong relationship between the core and the periphery of the community; the existence of different roles could be investigated further

Measures to characterize the discussions: the h-index allows us to study the depth of the discussion trees; reply chains help to detect conflictive topics

Evidence of structural differences among discussions from different macro-categories

Page 28: When the Wikipedians talk: network and tree structure of Wikipedia discussion pages

References

Crandall, D. J., Cosley, D., Huttenlocher, D. P., Kleinberg, J. M., and Suri, S. (2008). Feedback effects between similarity and social influence in online communities. In Proc. of KDD ’08.

Foster, J. G., Foster, D. V., Grassberger, P., and Paczuski, M. (2010). Edge direction and the structure of networks. PNAS, 107(24):10815–10820.

Gómez, V., Kappen, H. J., and Kaltenbrunner, A. (2011). Modeling the structure and evolution of discussion cascades. In Proc. of Hypertext 2011.

Kittur, A., Chi, E. H., and Suh, B. (2009). What’s in Wikipedia?: mapping topics and conflict using socially annotated category structure. In Proc. of CHI ’09, pages 1509–1512, New York, NY, USA. ACM.

Laniado, D. and Tasso, R. (2011). Co-authorship 2.0: Patterns of collaboration in Wikipedia. In Proc. of Hypertext 2011.

Szell, M., Lambiotte, R., and Thurner, S. (2010). Multirelational organization of large-scale social networks in an online world. PNAS, 107(31):13636–13641.
