it’s a network,not an encyclopedia
DESCRIPTION
Kane, G. (2009). It’s a Network,Not an Encyclopedia: A Social Network Perspective on Wikipedia Collaboration. Academy of Management Annual Meeting Proceedings.TRANSCRIPT
IT’S A NETWORK,NOT AN ENCYCLOPEDIA: A SOCIAL NETWORK PERSPECTIVE ON WIKIPEDIA COLLABORATION
Gerald C. Kane (and Sam Ransbotham)
Assistant Professor of Information Systems
Carroll School of Management
Boston College
Paper presented at the Academy of Management Annual Meeting 2009 – Chicago
Presentation adapted from http://www.profkane.com/uploads/7/9/1/3/79137/aom_2009.pptx
WIKIPEDIA AS KNOWLEDGE NETWORK
Wikipedia is a good environment for studying collaborative processes (Kane and Fichman 2009).
Research on collaboration on Wikipedia (Butler et
al 2008, Kittur and Kraut 2008, Wilkinson and Huberman 2008). Views articles as independent But contributors collaborate on multiple articles,
can transfer content, process, and reputational knowledge
Assumption: Wikipedia’s article are not independent, but interconnected
RQ: How the connections between WP articles influence article quality?
SOCIAL NETWORK ANALYSIS (SNA) SNA used to study collaborative environments (Borgatti and
Foster 2003, Constant et al 1996). Structure of the network and node’s position within
structure describes how knowledge flows through network.
Two-mode network is classic, but underused, network conceptualization One type of node (contributor) is viewed as the tie
connecting other types of node (articles). Article is unit of analysis.
Centrality is among most widely used measures, refers to whether an individual node is situated within core or periphery of network (Wasserman and Faust 1994, Scott 2000). Central nodes typically perform better, because have better
access to knowledge contained in network.
TWO-MODE NETWORKS (EXAMPLE)
“Mode” = “a distinct set of entities on which the structural variables are measured. […] Structural variables measured on a single set o actors give rise to one-mode network” (Wasserman & Faust, 1994, p. 29)
Two-mode network = a network dataset containing two sets of actors. Measure: which actors from one set have ties to actors in the other set
Affiliation network: special type of two-mode network, with one set of actors and one set of events. Relation: which actor participate in which events?
TWO-MODE NETWORKS (EXAMPLE)
People that participate in social events – film festivals (incidence matrix):
Adjacency matrix (events by events)
Cannes Berlin Venice Moscow
Shangai
Julia 1 1 1 1 0
Sean 1 1 1 0 1
Audrey 0 1 1 1 0
Tom 0 0 1 0 1
Cannes Berlin Venice Moscow
Shangai
Cannes 2 2 1 1
Berlin 2 3 2 1
Venice 2 3 2 2
Moscow 1 2 2 0
Shangai 1 1 2 0
TWO-MODE NETWORKS (EXAMPLE)
People that participate in social events – film festivals (incidence matrix):
Adjacency matrix (actor by actor)
Cannes Berlin Venice Moscow
Shangai
Julia 1 1 1 1 0
Sean 1 1 1 0 1
Audrey 0 1 1 1 0
Tom 0 0 1 0 1
Julia Sean Audrey Tom
Julia 3 3 1
Sean 3 2 2
Audrey 3 2 1
Tom 1 2 1
TWO TYPES OF CENTRALITY
Different measures of centrality assess node’s position at different levels of the network (Faust 1997).
Degree centrality = local network Number of nodes directly connected to a node and
strength of those relationships. H1: The degree centrality of an article in the
two-mode matrix of articles and contributors will be positively related to article quality.
Eigenvector centrality = global network. How important (central) are the nodes to which the
focal node is directly connected to. H2: The eigenvector centrality of an article in
the two-mode matrix of articles and contributors will be positively related to quality.
EIGENVECTOR CENTRALITY - EXAMPLE
A
B1
B2 B3
EIGENVECTOR CENTRALITY - EXAMPLE
A
B1
B2 B3
SETTING AND METHOD Sampled 300 (out of 15,000) medical articles on
Wikipedia in Wikiproject Medicine (WP:MED). Qualitative analysis confirmed that WP:MED was
largely independent sub-community on Wikipedia. 75% of most active contributors worked mostly or
exclusively on medical articles. 1/3 medical professionals, 1/3 patients, 1/3
Wikipedians. WP:MED assesses quality of all articles:
Featured Article (best): FA A-quality: A Good Article: GA B-quality: B Start-quality (worst): Start
Collected all FA, A, GA (~100)
Random sample of B, Start (~200)
Squares = contributorsCircles = articles
Red = Featured ArticlesOrange = A-quality ArticlesYellow = Good Articles
Light Blue = B-quality ArticlesDark Blue = Start-quality articles
TWO-MODE NETWORK OF ARTICLES AND CONTRIBUTORS
VARIABLES
Dependent variable: quality
Independent variables: Transformed 300 (articles) x 1800 (contributors) two-mode network into 300 x 300 matrix (article by article). Nodes are articles, ties are the n° of editors who contributed to each pair of articles.UCINet: Degree centrality Eigenvector centrality
VARIABLES
Control variables: Topic Importance
Most important article are likely to receive more attention, to attract divergent opinions and have a greater base of knowledge.
(1-4). Assigned by WP:MED. Popularity:
Past research: editors may be more likely to contribute to high-traffic articles; patients tend to click on the top search results.
Traffic: Average number page views/ month 1Q2008 (Alexa.com).
Google Rank: Is it the top Google result? (60% were). Direct collaboration (article and discussion pages).
Number of unique contributors, average edits/contributor, number of anonymous contributors.
DATA ANALYSIS
Ordinal Regression in SPSS 16
When there is a progressive relationship within a categorical dependent variable, but it is unclear the magnitude of difference between the categories.
Negative log-log link function chosen because high proportion of lower quality articles (200).
HYPOTHESIS
H1: The degree centrality of an article in the 2-mode incidence matrix of articles and editors will be positively related to article quality.
H2: The eigenvector centrality of an article in the 2-mode incidence matrix of articles and editors will be positively related to article quality.
ResultsMODEL 1 MODEL 2
Est. SE Wald Sig. Est. SE Wald Sig.
Traffic 0.00** 0.00 7.76 0.005 0.00 0.00 2.26 0.13
Google Rank 0.43** 0.15 7.97 0.005 0.43** 0.16 7.51 0.01
Importance -0.82*** 0.12 49.08 0.000 -0.91*** 0.12 53.72 0.00
Contributors (A) 0.00 0.00 1.17 0.279 0.00 0.00 0.15 0.69
Contributors (D) 0.00 0.00 0.08 0.776 0.00 0.00 0.68 0.41
Edits/Contributor(A) 0.63*** 0.12 28.59 0.000 0.61** 0.12 24.35 0.00
Edits/Contributor(D) -0.06 0.05 1.67 0.196 -0.08 0.05 2.16 0.14
Anon (Article) 4.15*** 1.04 16.00 0.000 3.47*** 1.06 10.72 0.00
Anon (Discussion) -2.06** 0.75 7.54 0.006 -1.46* 0.76 3.71 0.05
Deg. Cent. 0.00** 0.00 7.29 0.01
Eig. Cent. 0.33*** 0.09 12.60 0.00
Pseudo R-Square
Cox and Snell 0.379 0.474
Nagelkerke 0.407 0.510
RESULTS Both hypotheses supported.
Degree and Eigenvector centrality highly significant (p<.01+) in the expected direction.
Only one of the measures of direct collaboration is significant (edits/contributor), underscoring importance of network variables.
Topic importance. Most highly significant variable in model, but negatively correlated with quality. Implications for value of medical knowledge on WP.
Anonymity is positively related on article page, but negative related on talk page. Differential effect noted elsewhere (Sia et al., 2002).
IMPLICATIONS FOR THEORY AND PRACTICE
Theoretical Implications. Collaborative features associated with quality in
community-based knowledge creation settings. Network variables critically important for
predicting quality, should not examine as independent collaborative environments.
Validation in other settings needed. Managerial implications.
Companies are beginning to employ similar communities for strategic purposes (Tapscott 2006).
Should approach as integrated collaborative network, not simply independent efforts.
FUTURE DIRECTIONS Connect more directly with existing theories of
information quality (e.g. Constant et al. 1996). Volume of information = number of authors. Diversity of information = degree centrality Quality of information = eigenvector centrality.
Recruiting team of 4th year medical school students to validate WP rankings. Pilot (60 articles) = 90% accuracy and 85% IRR
Collecting entire WP:MED. full text history of 2,026,992 revisions to all of the
16,354 Wikipedia medical articles Eliminates question of biased sampling, reduces
multicollinearity. Additional Controls: Age, Length, Complexity, Sections,
References.